Incorrect IRQ timings in Oric emulators
Looks like Fabrice and Xeron are not quite done with their Oric work yet
After the Solskogen party I decided to see whether I could optimize the replay routine (in terms of both CPU time and sound quality), so I started coding a small profiler. Then I compared the sound quality between Euphoric and Oricutron... and noticed that the profiler did not return the same values on the two emulators.
To be sure, I ran it on two of my real Orics and got a third set of profiler values (both Orics agree on the same value). Finally I downloaded MESS to check, and MESS gives yet another value (but the closest to a real Oric).
So here are the results, in hexadecimal (the first number is with the IRQ routine called through the ROM, the second with the optimized IRQ running from overlay memory):
Oricutron: 289/279
Euphoric: 2C9/2BD
MESS: 2A3/xxx (MESS does not support floppy drive emulation)
Pravetz: 2A7/29B
Atmos: 2A7/xxx (my floppy drive does not work on that one)
So as you can see, the CPU time measured on the real Orics lies between what Oricutron and Euphoric report. I suspect the difference comes from the number of clock cycles counted for the IRQ sequence.
It would be cool if everybody who has a working Oric or any emulator could run the small test program, with or without a disk drive (or Cumulus) connected, and report the numbers that appear at the top left.
When the top-right message shows a red ROM, the program is using the slow ROM-based IRQ; when you get a green OVR, it is using the faster overlay-RAM-based IRQ. I'm also interested in reports of crashes or missing sound.
Here is the link: http://www.defence-force.org/download/o ... chmark.tap
Thanks
Chema: Yes, I also have the 2A6/2A7 oscillation on my machines, so your experience matches mine.
About the sound quality: well, it's still 4-bit samples playing at 4 kHz. I have some ideas on how to improve the quality (basically error propagation instead of just truncating each individual sample value).
[This post was edited to show the results with version 1.1 of the test]
Some updates: I actually found another issue. I modified my program to use VIA timer 2 to accurately measure the time taken by an arbitrary routine while my sample player is running in the background (using an IRQ triggered by VIA timer 1).
Based on that I ran five tests:
- Compute the time taken to just set timer 2 and then immediately read it back, to get a good base value.
- Measure the time taken at startup on the Oric with just the normal 100 Hz system interrupt running (the one that reads the keyboard, blinks the cursor, handles internal timers, etc.).
- Measure the time when my sample player is not initialized, with interrupts disabled on the 6502 (SEI instruction before the test).
- Measure the time when my sample player is initialized (with timer 1 set up), but with interrupts disabled on the 6502 (SEI instruction before the test).
- Measure the time when everything is initialized and interrupts are enabled on the 6502 (CLI instruction before the test).
I have to say the results are entertaining.
For reference, here is a screenshot taken on my TV connected to a real Oric Atmos 48K:
And here are the results from three different emulators:
- Euphoric 1007:
- Oricutron 0.7:
- Mess 0143u2b:
As you can see the numbers are quite different, but we can draw some conclusions from them:
- The base test returns $11 on the Atmos; Euphoric and Oricutron are close with $12, and MESS is a bit faster with $0F. So the emulators are between +1 and -2 cycles off compared to the real machine.
- Taking this small difference in timer reading into account, MESS is remarkably similar to the real machine on the normal Oric setup: the same $13C3 spread between MIN and MAX ($640C-$5049), and $5047+2=$5049, $640A+2=$640C, which is consistent with what my Atmos shows. Euphoric has a worse MAX value, but the spread is still relatively close. Oricutron is the furthest off, with $14C3 instead of $13C3, which is 256 more clock cycles.
- Euphoric is the only emulator that gives different cycle counts for the tests with SEI, which means that, for some strange reason, even with the 6502 not accepting any IRQ it still loses some CPU time when you play with the VIA. Definitely a real bug here.
- With IRQs disabled, Oricutron is almost identical to the reference Atmos, just off by one ($465D instead of $465C), while MESS is off by minus two ($465A instead of $465C), which is consistent with the +1/-2 timer 2 reading difference we saw at the start.
- The last test is the most interesting. Instead of the single 100 Hz interrupt (two interrupts per 50 Hz frame) we now have a 4 kHz interrupt (80 interrupts per frame). MESS is still consistent here, showing the same values as the Atmos minus two cycles. Euphoric is 662 clock cycles higher ($6A9C-$6806) than the reference Atmos, which divided by 80 amounts to about 8 additional clock cycles per IRQ. Oricutron, on the other hand, is far too fast by 901 clock cycles ($6806-$6481), which would be about 11 missing clock cycles per IRQ.
Stay tuned
PS: You can find the source code on the SVN server: http://miniserve.defence-force.org/svn/ ... Benchmark/
The program itself is on the FTP: http://www.defence-force.org/ftp/forum/ ... Bench1.tap
Last edited by Dbug on Sat Aug 06, 2011 1:46 pm, edited 3 times in total.
From the above post, I understood that (as IRQs are disabled for some tests) there is a difference of one cycle between Oricutron and real Orics.
I'm not sure it would help, but maybe having the actual code of the routine would at least give a hint about which instruction could be the culprit? Since it is just one cycle, maybe it is one of those extra cycles due to page crossing?
About the bug in Euphoric: it might look uglier, but at least we know that something is eating up cycles when interrupts are disabled. I think you should email Fabrice about this. After all, he modified Euphoric after the tests with 1337, so he will probably be willing to look into it.
Finally, maybe it is not an issue with the cycles taken by interrupt handling. Nevertheless, for the record, I found in a book the cycles used by the interrupt sequence (just in case it helps): it takes 7 cycles, and the opcode of the first instruction of the service routine is fetched in the 8th.
BTW, the interrupt disable flag is set automatically by the CPU.
Actually I found a bug in my test: I did not reset the min/max values between tests. I'm making a new version that also checks the CPU time used by the normal IRQ you get when booting the Oric.
I will update the SVN and the numbers afterwards. The numbers changed, but they still point to real timing issues.
If anyone wants to help fix this in Oricutron, there is a function in 6502.c called m6502_set_icycles, which calculates the number of cycles the next instruction will take.
At the top of the function it checks for breakpoints; the cycle calculation is in the "switch( nextop )" block at line 547.
If we consider that with IRQs disabled there is only a one-cycle difference between Oricutron and the Atmos, I believe there is no problem with the base instruction timings. I think it has more to do with the VIA timer handling, possibly a one-cycle difference in the way the latch/start of counting is done.
My rationale is that the TIME2 OFFSET value is $12 instead of $11, and in terms of instructions executed by the processor that amounts to exactly this code:
Code:
    ; VIA_T2C_L / VIA_T2C_H: 6522 timer 2 counter registers ($0308/$0309 on the Oric)
    sei
    ; Start the timing
    ldy #0
    jsr _ProfilerReset
    ldy #0
    jsr _ProfilerRead
    ; At this point the timing is done
    lda _ProfilerTimer
    sta tmp0
    lda _ProfilerTimer+1
    sta tmp0+1
    ; _ProfilerTimerOffset = $ffff - timer value read back
    sec
    lda #<( $ffff)
    sbc tmp0
    sta tmp0
    lda #>( $ffff)
    sbc tmp0+1
    sta tmp0+1
    lda tmp0
    sta _ProfilerTimerOffset
    lda tmp0+1
    sta _ProfilerTimerOffset+1
    ;
    ; At this point _ProfilerTimerOffset contains
    ; $11 on an Oric Atmos
    ; $12 on Oricutron and Euphoric
    ; $0f on MESS

_ProfilerReset
    lda #$ff
    sta VIA_T2C_L
    sta VIA_T2C_H    ; writing the high byte loads the counter and starts timer 2
    rts

_ProfilerRead
    lda VIA_T2C_L
    ldx VIA_T2C_H
    sta _ProfilerTimer+0
    stx _ProfilerTimer+1
    rts

_ProfilerTimer
    .word $0
_ProfilerTimerMin
    .word $ffff
_ProfilerTimerOffset
    .word $0

As you can see there are no unusual or rarely used addressing modes, just immediate loads, JSR and STA: a grand total of 13 instructions.
OK, I've investigated the timer 2 test at the start. After writing $FFFF to the timer 2 latch, the following instructions are executed:
Code:
inst.     cyc   t2 before   t2 after   t2 diff   total
-------------------------------------------------------
rts        6    $ffff       $fff9        6        6 ($06)
ldy imm    2    $fff9       $fff7        2        8 ($08)
jsr abs    6    $fff7       $fff1        6       14 ($0e)
lda abs    4    $fff1       $ffed        4       18 ($12)

As you can see, this sequence takes 18 cycles ($12). I'm guessing that on the real Oric either the VIA takes an extra cycle when the counter is loaded from the latch, or the LDA instruction does the actual read one cycle before the instruction completes. Both make sense, so I need to find out which one is correct.
I investigated further last night and discussed it with Dbug on IRC. Just thought I'd update this thread.
After looking at the timing diagrams in the 6522 datasheet, the timer reload looks fine, so it seems the load instruction retrieves the value on the cycle before the end of the instruction. I have added support for this, and now the timer 2 value and SEI tests match the reference from real hardware.
Now we just need to figure out why the tests with interrupts enabled are so far off...
Well, I'm a bit closer to the real hardware result now.
The SVN commit message explains the change well enough:
"Fixed an issue with the 6502 IRQ emulation where the cycles for the next instruction were calculated, the machine was emulated for that many cycles, and then the 6502 instruction was executed.
The problem with this is that if an interrupt was raised during the cycles for that instruction, the instruction actually executed would be the first one of the IRQ handler. On real hardware, that's like an IRQ travelling back in time and causing the CPU to execute a different instruction. It also has the side effect that the rest of the machine is emulated for the wrong number of cycles for that instruction.
Now, the cycles are calculated for the next CPU instruction and the machine is emulated for that many cycles as before, but when the CPU instruction is executed, it is always the instruction that was used to calculate the cycles. This behaves like real hardware: an IRQ only happens once the current instruction has finished."