Novalight - very fast tape loading

Symoon · Post by **Symoon** » Wed Feb 13, 2019 12:25 am

'Multipart booster' option is working in beta version

It saves around 0.32 seconds for each additional part after the 1st one.
Spotted another bug: the 'old loader' option is ignored if combined with F16 speed. Will be corrected.

I'm not going to hurry, as all this needs testing and double-checking before release.

Symoon · Post by **Symoon** » Fri Feb 15, 2019 4:56 pm

I wonder if I can't get rid of this part:

Code:

0103	6	20 62 01	JSR $0162	Read a byte, waiting for value $24 to start
0106		C9 24		CMP #$24	is it $24? (no bit sync, the signal begins with several stop bits!)
0108		D0 F9		BNE -7		No, then loop waiting for $24 to start

I can't recall why I did this, maybe something to do with the loader / no loader option.
Getting rid of it would save 7 bytes that might be used to add a bit of complexity to the normal byte encoding, and thus save a few fractions of a second. There's another big IF: IF the execution time of a normal byte decoding allows adding complexity! (Each reading + decoding must be done within 69µs, as the shortest time between 2 sinusoids is 69µs!)
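
For readers who do not read 6502 assembly, the three instructions quoted above amount to "keep reading bytes until one equals $24". A minimal Python sketch of that behaviour follows; `skip_to_sync` and `byte_stream` are illustrative names, not part of Novalight, and the real routine at $0162 reads from the tape signal rather than from a Python iterator.

```python
SYNC_BYTE = 0x24  # value the loader waits for, per the listing above


def skip_to_sync(byte_stream):
    """Consume bytes until SYNC_BYTE is seen; return how many were discarded.

    byte_stream is a hypothetical iterator standing in for the
    'read a byte' routine called with JSR $0162.
    """
    skipped = 0
    for value in byte_stream:       # JSR $0162: read one byte
        if value == SYNC_BYTE:      # CMP #$24
            return skipped          # fall through: data follows
        skipped += 1                # BNE -7: not $24, read again
    raise EOFError("sync byte never arrived")


stream = iter([0x00, 0xFF, 0x24, 0x41])
print(skip_to_sync(stream))  # number of bytes discarded before $24
```

After the call, the iterator is positioned on the first byte that follows $24, which is where the data proper would begin.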

Symoon · Post by **Symoon** » Sat Feb 16, 2019 8:51 am

Still thinking about what could be done if those 7 bytes were removed.

Moments where there is some time left:
- after a normal byte decoding, something like 20µs are available
- after the last repeated byte of an RLE sequence, "loads" of time, let's say 50µs, but using it is impossible due to the code structure: it would corrupt the RLE compression (no time to add a test to branch to another part of the code)
- after any kind of byte, if we know the next one is a dictionary byte or a RLE sequence, around 20µs

Penalty for adding bytes to the normal byte decoding: they would be loaded at normal speed at the beginning, while the 7 removed bytes were loaded at Novalight speed. So that means a slower loader (+0.03s at F16 speed); not sure it's worth it.

Symoon · Post by **Symoon** » Sat Feb 16, 2019 9:27 pm

Well, nothing I tried today gave more than a small time saving in certain conditions, while making things a bit worse in others.
I'm running out of ideas, it's time to give up for the moment.

iss · Post by **iss** » Sat Feb 16, 2019 9:53 pm

I'm still not familiar with the Novalight code - it's complex and requires a lot of concentration.
But these 7 bytes should be needed only if the next part starts with the standard header, right?
I'm not sure, but maybe it makes sense to have a kind of synchronization between parts... and 7 bytes is not so much.

Symoon · Post by **Symoon** » Sat Feb 16, 2019 10:44 pm

iss wrote: ↑Sat Feb 16, 2019 9:53 pm I'm still not familiar with the Novalight code - it's complex and requires a lot of concentration.

Yes, that's one of the reasons it took me a while: after each break, it was taking me a while to recall everything.
In the end that's the kind of code that crashes if you move the slightest thing! And I must say some of my comments are not always very clear.

One puzzling thing is the reversed encoding, which encodes normal bytes reverse-decoded and reversed again.
This afternoon I had an idea involving inverting the bytes again, but ended up lost.

iss wrote: ↑Sat Feb 16, 2019 9:53 pm But these 7 bytes should be needed only if the next part starts with the standard header, right?
I'm not sure, but maybe it makes sense to have a kind of synchronization between parts... and 7 bytes is not so much.

Well, I think it's here for historical reasons: my very starting point was the ROM loading code! I honestly think it's pointless now; just a 3-sample gap and a start bit would work, I think. But you're right, these are Novalight-encoded bytes and removing them saves something like... 0.003 seconds. Except to save room for the stack, and unless I find some significant new compression idea, they can remain here.

Symoon · Post by **Symoon** » Sun Feb 17, 2019 9:17 am

I found something whose decoding could fit in 6 bytes/8µs: back to one of the starting ideas of Novalight which was bit compression.
The 1111 sequence is coded by 6 + 6 samples. The idea is to use 7 samples instead of 6+6.
By modifying the code like this (see lines with "+"), it should be fine:

Code:

	Reading data (normal byte)
0176	2	A9 FE		LDA #$FE	Set accumulator bits to 11111110 => after ROL, will always give C=1 (luckily avoids SEC 
						 before BCS) except for the last which allows to exit => no need for a loop index.
0178	2/3	B0 FE		BCS -2		infinite loop (wait for interrupt with a 3 cycles precision; will stack PC and P)
017A*	2	E0 96		CPX #$96	Length <= 5 samples, so C=1, else C=0  *** *** CHANGE HERE THRESHOLD 4/5 *** ***
017Cj	2/3	B0 06		BCS +6		jump if 3 or 4 samples (C=1)
017E	2	2A		ROL A		1st bit: add C to A (here C=0 for 5 or 6 samples)
+				CPX 6/7 samples threshold
+				BCS+2	if it's a 6, jump; if it's a 7...
+				ASL		add a 0 bit
+				ASL		add a 0 bit
017F*	2	E0 7C		CPX #$7C	2nd bit: test if length < 6 samples *** *** CHANGE HERE THRESHOLD 5/6 *** ***
0181j	3	4C 87 01	JMP $0187	go to the last ROL: add C to the byte (5 samples = read '01', 6 samples = read '00')
0184	2	2A		ROL A		1st bit: add C to the byte (here, C=1 for 3 or 4 samples)
0185*	2	E0 A9		CPX #$A9	2nd bit: test if length < 4 samples *** *** CHANGE HERE THRESHOLD 3/4 *** ***
0187	2	2A		ROL A		and directly add it to the byte (3 samples ='11', 4 samples = '10')
0188j	2/3	B0 EE		BCS -18		If c=1, loop; if 0: A has been filled => end of loop
PLP
BCC+2
018A	2	49 FF		EOR #$FF	invert decoded bits, as they were decoded inverted (to save time)
018C	6	60		RTS
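
The shift-and-sentinel logic of this listing can be followed more easily in a high-level model. The sketch below is only a reading of the listing's comments, under several assumptions: sinusoid lengths are passed directly as sample counts (3 to 7) instead of being compared through the X counter, the interrupt wait is replaced by a plain loop, the `PLP` / `BCC+2` lines are ignored, and the 7-sample symbol is the proposed, untested addition. The names `rol` and `decode_byte` are illustrative.

```python
def rol(a, carry_in):
    """6502 ROL A: shift left through carry; returns (new A, carry out)."""
    return ((a << 1) | carry_in) & 0xFF, (a >> 7) & 1


def decode_byte(lengths):
    """Decode one normal byte from an iterable of sinusoid lengths (in samples)."""
    a = 0xFE  # sentinel: the single 0 bit falls out on the 8th shift
    for n in lengths:
        if n <= 4:                      # CPX #$96 / BCS +6: 3 or 4 samples
            a, _ = rol(a, 1)            # 1st bit of the pair is 1
            carry = 1 if n == 3 else 0  # CPX #$A9: 3 -> '11', 4 -> '10'
        else:                           # 5, 6 or (proposed) 7 samples
            a, _ = rol(a, 0)            # 1st bit of the pair is 0
            if n == 7:                  # proposed 6/7 threshold test
                a, _ = rol(a, 0)        # ASL A (same as ROL with carry 0)
                a, _ = rol(a, 0)        # ASL A
            carry = 1 if n == 5 else 0  # CPX #$7C: 5 -> '01', 6 and 7 -> 0
        a, carry_out = rol(a, carry)    # last ROL of the group
        if carry_out == 0:              # sentinel shifted out: byte complete
            return a ^ 0xFF             # EOR #$FF: bits were decoded inverted
    raise ValueError("ran out of sinusoids before the byte was complete")


print(hex(decode_byte([6, 6, 6, 6])))  # four '11' pairs, current encoding
print(hex(decode_byte([7, 7])))        # same byte with the proposed symbol
```

In this model a 7-sample sinusoid produces four shifts in one pass, so it can only be used where at least four bits of the current byte remain to be filled.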

Estimated time saving for Zorgons Revenge: from 14.5s, it would end around 13.7s. Quite good for 6 bytes!
Ok, just got to:
- modify the signal generation and confirm if it's a positive change
- remove the 7 bytes and see if it still works
- add the 6 bytes for decoding
- test
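
On the signal-generation side, the potential gain can be estimated per byte before touching the loader. The sketch below counts samples for one normal byte with the current 2-bits-per-sinusoid coding (3 to 6 samples, as in the listing's comments once the final inversion is applied) and with the proposed 7-sample symbol for 1111. It assumes bit pairs are sent most significant first and that a 7-sample symbol never spans two bytes; dictionary and RLE bytes are not modelled, and all function names are illustrative.

```python
# Samples per bit pair after the final EOR #$FF: 00 -> 3, 01 -> 4, 10 -> 5, 11 -> 6
PAIR_TO_SAMPLES = {0b00: 3, 0b01: 4, 0b10: 5, 0b11: 6}
QUAD_ONES_SAMPLES = 7  # proposed symbol for the bit sequence 1111


def pairs_of(byte):
    """Split a byte into four 2-bit pairs, most significant pair first (assumed order)."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def samples_current(byte):
    """Samples needed for one byte with the current coding."""
    return sum(PAIR_TO_SAMPLES[p] for p in pairs_of(byte))


def samples_proposed(byte):
    """Samples needed when two adjacent '11' pairs are merged into one 7-sample symbol."""
    pairs = pairs_of(byte)
    total, i = 0, 0
    while i < len(pairs):
        if pairs[i] == 0b11 and i + 1 < len(pairs) and pairs[i + 1] == 0b11:
            total += QUAD_ONES_SAMPLES  # replaces 6 + 6 samples
            i += 2
        else:
            total += PAIR_TO_SAMPLES[pairs[i]]
            i += 1
    return total


for value in (0xFF, 0x3F, 0x1B):
    print(hex(value), samples_current(value), samples_proposed(value))
```

Running the same count over the normal (non-dictionary, non-RLE) bytes of a real program would give a rough upper bound on the saving, since only bytes containing 1111 benefit.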

As this will take a while (if it ever works), I think I'll release a 1.1L version with the little bug corrections and multipart booster before.

Symoon · Post by **Symoon** » Sun Feb 17, 2019 3:05 pm

Symoon wrote: ↑Sun Feb 17, 2019 9:17 am Estimated time saving for Zorgons Revenge: from 14.5s, it would end around 13.7s. Quite good for 6 bytes!
Ok, just got to:
- modify the signal generation and confirm if it's a positive change

Ok, I was optimistic: Zorgon only goes from 14.5 to 14.2 seconds.
That's a decent result, but far from what I was hoping for.

Chema · Post by **Chema** » Sun Feb 17, 2019 9:22 pm

You lost me waaaay back in this thread, but I wanted to say that you are doing impressive and unbelievable work here... almost black magic.

Symoon · Post by **Symoon** » Sun Feb 17, 2019 9:29 pm

Chema wrote: ↑Sun Feb 17, 2019 9:22 pm almost black magic.

You know, it's just compressing bats!
Erm, I mean, bytes!

Symoon · Post by **Symoon** » Mon Feb 18, 2019 8:21 am

Another idea for later, which would require a significant redesign: a part of the "common area 0" could be re-used for a 2nd small bank system. It would free something like 30 bytes. Combined with an unused sinusoid (10 samples), this opens a door for an additional improvement (dictionary + RLE + ???).
I just need to find an idea that would be effective on the remaining uncompressed bytes.

Symoon · Post by **Symoon** » Tue Feb 19, 2019 1:21 pm

To sum it up, being faster means optimizing 3 factors:
1- reduce the loader size: as it is loaded at normal (or F16) speed, adding new code makes it longer to load, which can cancel out the time that code would save. So the code needs to be compact.
2- reduce the WAV file size: by putting as much information as possible in the shortest combination of sinusoids. But not too fast for the Oric!
3- reduce the decoding time: if information goes fast in the WAV file, the data decoding and storing must be fast too! If decoding is too slow, you have to slow down the WAV file data rate; but if you decode fast and then wait for the next bit of information, you can either speed up the WAV file rate or implement more complex decoding code.

Post by **Dbug** » Tue Feb 19, 2019 1:24 pm

I assume you have already checked whether any of the code of your loader happens to match sections of the ROM, in which case you could also do something like a generic "create" routine that copies snippets of code from ROM.

Symoon · Post by **Symoon** » Tue Feb 19, 2019 4:26 pm

I checked, a bit quickly I admit... Since the "normal speed" kernel does very specific things in a small number of bytes, there isn't much to match.
And since Novalight works on both ROMs, the copied code would have to be present in both 1.0 and 1.1, and the copier would have to handle the different addresses. That makes the code longer, so it needs a longer matching sequence to be efficient :-/
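
A check of this kind can be automated. The sketch below scans a loader image for byte runs that appear in both ROM images and reports where each run sits in each ROM, since the addresses may differ. It is only an illustration: `shared_snippets`, `longest_match_at` and the `min_len` threshold are hypothetical, and the byte strings in the usage lines are made up, not real ROM contents.

```python
def longest_match_at(loader, start, rom):
    """Return (length, rom_offset) of the longest run loader[start:...] found in rom."""
    best_len, best_off = 0, -1
    length = 1
    while start + length <= len(loader):
        off = rom.find(loader[start:start + length])
        if off < 0:
            break
        best_len, best_off = length, off
        length += 1
    return best_len, best_off


def shared_snippets(loader, rom_a, rom_b, min_len):
    """List (loader_offset, length, offset_in_rom_a, offset_in_rom_b) for runs
    of at least min_len bytes that occur in both ROM images."""
    found = []
    i = 0
    while i < len(loader):
        len_a, _ = longest_match_at(loader, i, rom_a)
        len_b, _ = longest_match_at(loader, i, rom_b)
        n = min(len_a, len_b)           # the run must exist in both ROMs
        if n >= min_len:
            run = loader[i:i + n]
            found.append((i, n, rom_a.find(run), rom_b.find(run)))
            i += n
        else:
            i += 1
    return found


# Made-up data: the 7 bytes of the sync loop and two fake "ROM" fragments.
loader = bytes([0x20, 0x62, 0x01, 0xC9, 0x24, 0xD0, 0xF9])
rom_a = bytes([0x00, 0x00, 0xC9, 0x24, 0xD0, 0xF9, 0xEA])
rom_b = bytes([0xEA, 0xC9, 0x24, 0xD0, 0x00])
print(shared_snippets(loader, rom_a, rom_b, min_len=3))
```

Whether a match is worth using depends on `min_len`: the copier needs a source address per ROM plus a length for every snippet, so short matches cost more bytes than they save.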

Jack_Free · Post by **Jack_Free** » Thu Feb 21, 2019 11:14 am

It does not work under WIN7 64bit.
Is there a solution to work on this system?
Thank you.

error Access Denied
