Novalight - very fast tape loading
Re: Novalight - very fast tape loading
The 'Multipart booster' option is working in the beta version. It saves around 0.32 seconds for each additional part after the first one.
Spotted another bug: the 'old loader' option is ignored if combined with F16 speed. Will be corrected.
I'm not going to hurry, as all this needs testing and double-checking before release.
Re: Novalight - very fast tape loading
I wonder whether I could get rid of this part:
I can't recall why I did this; maybe it has something to do with the loader / no loader option.
Code:
0103   6    20 62 01  JSR $0162  Read a byte, waiting for value $24 to start
0106        C9 24     CMP #$24   is it $24? (no bit sync, the signal begins with several stop bits!)
0108        D0 F9     BNE -7     No, then loop waiting for $24 to start
Getting rid of it would save 7 bytes that might be used to add a bit of complexity to the normal byte encoding, and thus save a few fractions of a second. There's another big IF, though: IF the execution time of a normal byte decoding allows adding complexity! (Each read + decode must be done within 69µs, as the shortest time between 2 sinusoids is 69µs!)
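For readers less used to 6502 listings, here is how the 7-byte loop above reads as a Python sketch. `read_byte` and `make_reader` are hypothetical stand-ins (the real routine lives at $0162); only the $24 marker value comes from the listing.

```python
from typing import Callable, Iterable, Iterator

SYNC_BYTE = 0x24  # marker value tested by CMP #$24 in the listing


def wait_for_sync(read_byte: Callable[[], int]) -> int:
    """Call read_byte() until it returns SYNC_BYTE; return how many bytes were read."""
    reads = 0
    while True:
        reads += 1
        if read_byte() == SYNC_BYTE:  # CMP #$24 / BNE: loop until equal
            return reads


def make_reader(data: Iterable[int]) -> Callable[[], int]:
    """Hypothetical helper: expose a list of byte values as a read_byte() callable."""
    it: Iterator[int] = iter(data)
    return lambda: next(it)


# Two junk bytes, then the marker, then the first real byte.
reader = make_reader([0x00, 0xFF, 0x24, 0x41])
skipped = wait_for_sync(reader)  # bytes consumed up to and including $24
next_byte = reader()             # first byte after the marker
```

Removing the loop would mean the data stream has to start at exactly the right moment, since nothing would skip the leading bytes any more.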
Re: Novalight - very fast tape loading
Still thinking about what could be done if those 7 bytes were removed.
Moments where there is some time left:
- after a normal byte decoding, something like 20µs is available
- after the last repeated byte of an RLE sequence, "loads" of time, let's say 50µs: unusable due to the code structure, as it would corrupt the RLE compression (no time to add a test to branch to another part of the code)
- after any kind of byte, if we know the next one is a dictionary byte or an RLE sequence, around 20µs
Penalty for adding bytes to the normal byte decoding: they would be loaded at normal speed at the beginning, while the 7 removed bytes were loaded at Novalight speed. So that means a slower loader (+0.03s at F16 speed); not sure it's worth it.
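The slack figures above translate directly into 6502 cycles if we assume the Oric's 1 MHz clock (1 cycle = 1µs). A small Python sketch of checking whether a candidate instruction sequence fits into a slack window follows. The candidate sequence is only an illustration; the byte and cycle counts are the standard 6502 figures for those addressing modes, with branches counted as not taken.

```python
CPU_HZ = 1_000_000  # assumption: 6502 clocked at 1 MHz, so 1 cycle = 1 microsecond

# (size in bytes, cycles) for a few 6502 instructions.
# Branch cost is for "not taken"; a taken branch costs one cycle more.
COST = {
    "CPX #imm": (2, 2),
    "BCS rel": (2, 2),
    "ASL A": (1, 2),
    "ROL A": (1, 2),
}


def budget(sequence, slack_us):
    """Return (size_bytes, cycles, fits) for a sequence within a slack window."""
    size = sum(COST[op][0] for op in sequence)
    cycles = sum(COST[op][1] for op in sequence)
    slack_cycles = slack_us * CPU_HZ // 1_000_000
    return size, cycles, cycles <= slack_cycles


# Illustrative candidate: one compare, one branch, two shifts.
candidate = ["CPX #imm", "BCS rel", "ASL A", "ASL A"]
size, cycles, fits = budget(candidate, slack_us=20)
```

With these figures the candidate takes 6 bytes and 8 cycles, so it would fit in a 20µs window with room to spare; the hard part is the byte cost, not the cycle cost.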
Re: Novalight - very fast tape loading
Well, nothing I tried today gave better than a small time saving in certain conditions, while making things a bit worse in others.
I'm running out of ideas; it's time to give up for the moment.
Re: Novalight - very fast tape loading
I'm still not familiar with the Novalight code - it's complex and requires a lot of concentration.
But these 7 bytes should be needed only if the next part starts with the standard header, right?
I'm not sure, but maybe it makes sense to have some kind of synchronization between parts... and 7 bytes is not that much.
Re: Novalight - very fast tape loading
Yes, that's one of the reasons it took me a while: after each break, I needed time to recall everything.
In the end, that's the kind of code that crashes if you move the slightest thing! And I must say some of my comments are not always very clear.
One puzzling thing is the reversed encoding: normal bytes are encoded so that they are decoded reversed and then reversed again.
This afternoon I had an idea involving inverting bytes again, but I ended up lost.
Well, I think it's here for historical reasons: my very first starting point was the ROM loading code! I honestly think it's pointless now; just a 3-sample gap and a start bit would work, I think. But you're right, these are Novalight-encoded bytes, and removing them saves something like... 0.003 seconds. Except to save room for the stack, and unless I find some significant new compression idea, they can stay.
Re: Novalight - very fast tape loading
I found something whose decoding could fit in 6 bytes/8µs: going back to one of Novalight's starting ideas, which was bit compression.
The 1111 sequence is coded by 6 + 6 samples. The idea is to use 7 samples instead of 6+6.
By modifying the code like this (see lines with "+"), it should be fine:
Estimated time saving for Zorgon's Revenge: from 14.5s, it would end up around 13.7s. Quite good for 6 bytes!
Code:
Reading data (normal byte)
0176   2    A9 FE     LDA #$FE   Set accumulator bits to 11111110 => after ROL, will always give C=1 (luckily avoids SEC before BCS) except for the last, which allows to exit => no need for a loop index.
0178   2/3  B0 FE     BCS -2     infinite loop (wait for interrupt with a 3-cycle precision; will stack PC and P)
017A*  2    E0 96     CPX #$96   Length < 5 samples, so C=1, else C=0   *** CHANGE HERE THRESHOLD 4/5 ***
017Cj  2/3  B0 06     BCS +6     jump if 3 or 4 samples (C=1)
017E   2    2A        ROL A      1st bit: add C to A (here C=0 for 5 or 6 samples)
+                     CPX        6/7 samples threshold
+                     BCS +2     if it's a 6, jump; if it's a 7...
+                     ASL        add a 0 bit
+                     ASL        add a 0 bit
017F*  2    E0 7C     CPX #$7C   2nd bit: test if length < 6 samples   *** CHANGE HERE THRESHOLD 5/6 ***
0181j  3    4C 87 01  JMP $0187  go to the last ROL: add C to the byte (5 samples = read '01', 6 samples = read '00')
0184   2    2A        ROL A      1st bit: add C to the byte (here, C=1 for 3 or 4 samples)
0185*  2    E0 A9     CPX #$A9   2nd bit: test if length < 4 samples   *** CHANGE HERE THRESHOLD 3/4 ***
0187   2    2A        ROL A      and directly add it to the byte (3 samples = '11', 4 samples = '10')
0188j  2/3  B0 EE     BCS -18    If C=1, loop; if 0: A has been filled => end of loop
                      PLP
                      BCC +2
018A   2    49 FF     EOR #$FF   invert decoded bits, as they were decoded inverted (to save time)
018C   6    60        RTS
OK, I just need to:
- modify the signal generation and confirm that it's a positive change
- remove the 7 bytes and see if it still works
- add the 6 bytes for decoding
- test
As this will take a while (if it ever works), I think I'll release a 1.1L version with the small bug fixes and the multipart booster first.
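For anyone trying to follow the listing, here is a Python sketch of the existing two-bits-per-sinusoid decoding (without the "+" lines). It is only my reading of the comments in the listing, not the loader itself. Assumptions: X holds a timer reading that is larger for shorter sinusoids (inferred from the order of the three thresholds), and CPX sets C=1 when X >= operand, which is standard 6502 behaviour. The function names are made up for the example.

```python
T_3_4 = 0xA9  # CPX #$A9: boundary between 3 and 4 samples
T_4_5 = 0x96  # CPX #$96: boundary between 4 and 5 samples
T_5_6 = 0x7C  # CPX #$7C: boundary between 5 and 6 samples


def decode_pair(x: int) -> int:
    """Return the two raw (still inverted) bits for one timer reading x."""
    if x >= T_4_5:                       # C=1: 3 or 4 samples, first bit is 1
        second = 1 if x >= T_3_4 else 0  # 3 samples -> '11', 4 samples -> '10'
        return 0b10 | second
    # C=0: 5 or 6 samples, first bit is 0
    return 1 if x >= T_5_6 else 0        # 5 samples -> '01', 6 samples -> '00'


def decode_byte(readings) -> int:
    """Decode four timer readings (one per sinusoid) into one byte."""
    if len(readings) != 4:
        raise ValueError("a normal byte is made of four 2-bit sinusoids")
    raw = 0
    for x in readings:
        raw = (raw << 2) | decode_pair(x)
    return raw ^ 0xFF                    # EOR #$FF: undo the inverted decoding


# One reading in each length class: 3, 4, 5 and 6 samples.
example = decode_byte([0xB0, 0xA0, 0x80, 0x70])
```

Here the raw bits are 11 10 01 00 ($E4), which gives $1B after the final inversion. Four 6-sample sinusoids decode to $FF, which is why 1111 currently costs 6 + 6 samples.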
Re: Novalight - very fast tape loading
Ok, I was optimistic: Zorgon only goes from 14.5 to 14.2 seconds.
That's a decent result, but far from what I was hoping for.
Re: Novalight - very fast tape loading
You lost me waaaay back in this thread, but I wanted to say that you are doing impressive and unbelievable work here... almost black magic.
Re: Novalight - very fast tape loading
Another idea for later, which would require a major redesign: a part of the "common area 0" could be re-used for a 2nd small bank system. It would free something like 30 bytes. Combined with an unused sinusoid (10 samples), this opens the door to an additional improvement (dictionary + RLE + ???).
I just need to find an idea that would pay off on the remaining uncompressed bytes.
Re: Novalight - very fast tape loading
To sum up, loading faster means optimizing 3 factors:
1- reduce the loader size: as it is loaded at normal (or F16) speed, adding new code makes it longer and could ruin the time this code could save. So the code needs to be compact.
2- reduce the WAV file size: by putting as much information as possible into the shortest combination of sinusoids. But not too fast for the Oric!
3- reduce the decoding time: if information goes fast in the WAV file, the data decoding and storing must be fast! Too slow: you will have to slow down the WAV file data rate; but if you are decoding fast and waiting for the next bit of information... that means you can speed up the WAV file rate, or implement more complex decoding code.
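Factors 1 and 2 pull against each other, and a back-of-the-envelope model makes that visible. The Python sketch below is only an illustration: the function names are invented, and the figures used (5000µs per loader byte, a 44100 Hz WAV) are hypothetical placeholders, not measurements of Novalight.

```python
def loader_cost_us(extra_bytes: int, us_per_loader_byte: int) -> int:
    """Time added at the start of the load by extra loader bytes, in microseconds."""
    return extra_bytes * us_per_loader_byte


def break_even_samples(extra_bytes: int, us_per_loader_byte: int, sample_rate_hz: int) -> int:
    """Minimum number of WAV samples the new code must save to pay for itself."""
    cost_us = loader_cost_us(extra_bytes, us_per_loader_byte)
    return -(-cost_us * sample_rate_hz // 1_000_000)  # ceiling division


def net_saving_us(extra_bytes: int, us_per_loader_byte: int,
                  samples_saved: int, sample_rate_hz: int) -> int:
    """Positive result: the load gets shorter overall; negative: it gets longer."""
    gain_us = samples_saved * 1_000_000 // sample_rate_hz
    return gain_us - loader_cost_us(extra_bytes, us_per_loader_byte)


# Hypothetical figures: 6 extra loader bytes at 5000 us each, 44100 Hz WAV.
needed = break_even_samples(6, 5000, 44100)
saving = net_saving_us(6, 5000, samples_saved=4410, sample_rate_hz=44100)
```

With these placeholder numbers, the new code must remove at least 1323 samples from the WAV before it saves anything at all. Short programs are where extra loader bytes hurt the most.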
Re: Novalight - very fast tape loading
I assume you have already checked whether any of the code of your loader happens to match sections of the ROM, in which case you could also do something like a generic "create" routine that copies snippets of code from ROM.
Re: Novalight - very fast tape loading
I checked, a bit quickly I admit... Since the "normal speed" kernel does very specific things in a small number of bytes, there isn't much to match.
And since Novalight works on both ROMs, the copied code would have to be present in both 1.0 and 1.1, and the copier would have to handle the different addresses. That makes the copier longer, so it needs a longer matching sequence to be worthwhile :-/
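For a more systematic check than my quick look, something like the Python sketch below could search for the longest run of loader bytes present in both ROM images. The function name and the byte strings are placeholders for the example (not real ROM contents), and the brute-force search is only reasonable because the loader is small.

```python
def longest_shared_run(loader: bytes, rom_a: bytes, rom_b: bytes, min_len: int = 4):
    """Find the longest slice of loader that occurs in both ROM images.

    Returns (loader_offset, length, offset_in_rom_a, offset_in_rom_b),
    or None if no slice of at least min_len bytes is shared.
    """
    best = None
    n = len(loader)
    for start in range(n):
        # Try the longest slice first and stop once it cannot beat the best so far.
        for length in range(n - start, min_len - 1, -1):
            if best is not None and length <= best[1]:
                break
            chunk = loader[start:start + length]
            pos_a = rom_a.find(chunk)
            if pos_a < 0:
                continue
            pos_b = rom_b.find(chunk)
            if pos_b < 0:
                continue
            best = (start, length, pos_a, pos_b)
            break
    return best


# Placeholder data: the same 4-byte run sits at different offsets in each "ROM".
loader = bytes([0xA9, 0xFE, 0x2A, 0xE0, 0x7C, 0x60])
rom_a = bytes([0x00, 0x2A, 0xE0, 0x7C, 0x60, 0xEA])
rom_b = bytes([0xEA, 0xEA, 0x2A, 0xE0, 0x7C, 0x60])
match = longest_shared_run(loader, rom_a, rom_b, min_len=3)
```

The two ROM offsets in the result differ, which is exactly the extra work the copier would have to handle, so the shared run must be long enough to pay for that.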
Re: Novalight - very fast tape loading
It does not work under Windows 7 64-bit.
Is there a way to make it work on this system?
Thank you.
Error: Access Denied