
Re: Experimental very fast tape loading

Posted: Sun Apr 08, 2018 10:52 am
by Dbug
iss wrote:
Sun Apr 08, 2018 10:21 am
My proposal is modest but I think it makes sense - in short you don't need PHP/RTI in the interrupt routine.
Use only RTS instead - gain 3 cycles and 1 precious byte. :)
Nice one :)
Indeed, the only difference between RTI and RTS is that RTI also restores the status register, which you don't need to do since you have already pulled it off the stack! [EDIT: Actually, as shown in a later post, that does not work because the return address is off by one byte]

Re: Experimental very fast tape loading

Posted: Sun Apr 08, 2018 10:56 am
by Symoon
Hehe, I think I tried using RTS, but it crashed... IIRC the return address wasn't exactly right compared to RTI, it came back in the middle of an instruction?
Just tested using Euphoric, and it loads things, but not what was expected ;)
Actually, if the loop is at 0162, the interrupt's RTS goes back to 0163, right at the 2nd byte of the BCS instruction :cry:

Re: Experimental very fast tape loading

Posted: Sun Apr 08, 2018 11:08 am
by iss
Yep, Symoon you are right.
RTI retrieves the Processor Status Word (flags) and the Program Counter from the stack in that order (interrupts push the PC first and then the PSW).
Note that unlike RTS, the return address on the stack is the actual address rather than the address-1.
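
To make the off-by-one concrete, here is a small Python sketch of the stack behaviour described above. It is only a model, not real loader code: the class and function names are made up for illustration, the status value $20 is arbitrary, and the loop address $0162 is the one Symoon mentions.

```python
class Stack6502:
    """Only the 6502 hardware stack: page 1, growing downwards from $01FF."""

    def __init__(self):
        self.page1 = bytearray(256)
        self.sp = 0xFF

    def push(self, value):
        self.page1[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.page1[self.sp]


def irq(stack, pc, status):
    # An interrupt pushes the actual resume address (high byte, then low byte),
    # and the status register last.
    stack.push(pc >> 8)
    stack.push(pc & 0xFF)
    stack.push(status)


def rti(stack):
    # RTI pulls the status first, then the address, and resumes exactly there.
    status = stack.pull()
    lo = stack.pull()
    hi = stack.pull()
    return (hi << 8) | lo, status


def rts(stack):
    # RTS pulls an address and adds 1, because JSR stores "return address - 1".
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF


LOOP = 0x0162  # address of the wait loop, as in Symoon's test

s = Stack6502()
irq(s, LOOP, 0x20)
pc_rti, _ = rti(s)

irq(s, LOOP, 0x20)
s.pull()            # the handler pulls the status byte itself
pc_rts = rts(s)     # ...and then returns with RTS instead of RTI

print(f"RTI resumes at ${pc_rti:04X}, RTS resumes at ${pc_rts:04X}")
```

With the loop at $0162, the model resumes at $0162 with RTI and at $0163 with RTS, which matches the crash Symoon saw.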

Re: Experimental very fast tape loading

Posted: Sun Apr 08, 2018 5:38 pm
by Symoon
Program now runs in page 1.
Currently re-writing the WAV generator, heavily borrowing from TAP2CD source code, again, because it was already based on it :lol:

Re: Experimental very fast tape loading

Posted: Sun Apr 08, 2018 8:11 pm
by NekoNoNiaow
iss wrote:
Sun Apr 08, 2018 11:08 am
Yep, Symoon you are right.
RTI retrieves the Processor Status Word (flags) and the Program Counter from the stack in that order (interrupts push the PC first and then the PSW).
Note that unlike RTS, the return address on the stack is the actual address rather than the address-1.
I did try that one too but fixing the stack address ended up costing more cycles than it saved.
Since the 6502 is little endian, one cannot just pop the last byte of the address, increment it then push it back because it is the MSB which is on the stack :(.

It is really a pain that RTI and RTS store the address differently because otherwise that would be a great solution indeed.
Symoon wrote:
Sun Apr 08, 2018 7:59 am
Interesting variants, I'll have to read again but I'm not sure it can apply at Novalight. I have done so many variants and changes yesterday that I'm a bit lost!
Huhu. I know that feeling.
The first variants are basically just my initial exploration, the last one "final version" is the most interesting in my opinion. It costs a few bytes of memory but saves 14 cycles and should integrate relatively easily.
If you do not have much time, only look at that one. ;)
Symoon wrote:
Sun Apr 08, 2018 7:59 am
I should have posted the whole code: as you will notice, the infinite loop is used at several places (5 times I think) and the result is managed in very different ways, according to the byte type to decode.
It is called between two and 5 times to decode a byte (and an undetermined amount of times for RLE ;) )

Here is the very latest code version.
I will try to give it a look but alas this week promises to be hectic for me so I cannot promise anything.

Re: Experimental very fast tape loading

Posted: Mon Apr 09, 2018 12:31 am
by Symoon
Don't spend too much time on more optimizations, I'm now trying to finish a 1st decent version ;)

Managed to convert a multipart program with the PC converter!
No switches yet, and the converter is crashing with The Hellion, which doesn't have standard headers... Might not be easy but I'd like to investigate and prevent that (TAP2CD doesn't crash with it, even if the WAV file is useless).

Re: Experimental very fast tape loading

Posted: Tue Apr 10, 2018 4:27 am
by NekoNoNiaow
Symoon wrote:
Mon Apr 09, 2018 12:31 am
Don't spend too much time on more optimizations, I'm now trying to finish a 1st decent version ;)
I will try to resist but my inner kitten is already bouncing up and down in anticipation of giving a look at the sources. ;)
Symoon wrote:
Mon Apr 09, 2018 12:31 am
Managed to convert a multipart program with the PC converter!
No switches yet, and the converter is crashing with The Hellion, which doesn't have standard headers... Might not be easy but I'd like to investigate and prevent that (TAP2CD doesn't crash with it, even if the WAV file is useless).
Nice job, multipart loaders must be quite tricky.

Re: Experimental very fast tape loading

Posted: Tue Apr 10, 2018 6:05 am
by Symoon
NekoNoNiaow wrote:
Tue Apr 10, 2018 4:27 am
Symoon wrote:
Mon Apr 09, 2018 12:31 am
Managed to convert a multipart program with the PC converter!
No switches yet, and the converter is crashing with The Hellion, which doesn't have standard headers... Might not be easy but I'd like to investigate and prevent that (TAP2CD doesn't crash with it, even if the WAV file is useless).
Nice job, multipart loaders must be quite tricky.
Thanks, but I deserve zero credit here: I'm directly copying/pasting Fabrice's or Chema's code from TAP2CD!
I'm just slaughtering it a bit... Hence the bugs ;)
Now it's not crashing anymore, but produces an additional program of one byte :lol:
That's the effect of coding while falling asleep I guess ;)

EDIT: I wrote "salvaging", but meant "slaughtering"! I guess I should get some sleep indeed.

Re: Experimental very fast tape loading

Posted: Tue Apr 10, 2018 9:37 pm
by Symoon
Ok, the generator seems to be working :D
Time for me to compile for DOS 16 and 32 bits, clean the source code a bit, translate the comments, write the readme files and so on... But be warned, I think the C code will remain a bit messy, I'm too tired to go much further now ;)

Re: Experimental very fast tape loading

Posted: Fri Apr 13, 2018 12:46 pm
by Symoon
Hi there,
I've been trying to see how to make a ROM 1.0 version. Haven't checked byte for byte so far (I find it quite messy trying to code for 2 different ROMs), but I'm afraid there are too many routines that would need to be re-coded to fit in the page 1 space, even after removing a part of Novalight v1.1 (the dictionaries).
Will try to simplify more (it might not display "Searching" and things like that)

Re: Experimental very fast tape loading

Posted: Sat Apr 14, 2018 11:49 am
by Symoon
Ok, even by removing the dictionaries, I can't make a slower version for both ROM 1.0/1.1 (I'd need about 42 more bytes, which is wayyyyy too much to even try optimizing).

So now I will try to make a (still slower) ROM 1.0 only option. User will have to choose for which ROM he generates the signal (default being 1.1).
And if possible, the program will display an error message if being loaded on the wrong ROM (probably using the standard "Errors found" and "File error/load aborted")

Re: Experimental very fast tape loading

Posted: Sun Apr 15, 2018 4:44 am
by NekoNoNiaow
Symoon wrote:
Sat Apr 14, 2018 11:49 am
Ok, even by removing the dictionaries, I can't make a slower version for both ROM 1.0/1.1 (I'd need about 42 more bytes, which is wayyyyy too much to even try optimizing).
Will try tomorrow (Sunday 15/04). :P
Symoon wrote:
Sat Apr 14, 2018 11:49 am
So now I will try to make a (still slower) ROM 1.0 only option. User will have to choose for which ROM he generates the signal (default being 1.1).
And if possible, the program will display an error message if being loaded on the wrong ROM (probably using the standard "Errors found" and "File error/load aborted")
I would really try avoiding supporting two versions, this adds quite a hassle for users and thus increases the friction for using your system and tools.
That would be a shame since your routine/system is really a fantastic addition.

At least wait until we have tried to make it smaller. ;)

Re: Experimental very fast tape loading

Posted: Sun Apr 15, 2018 9:29 am
by Symoon
You're right about the different versions (there would only be one version of the WAV generator, just an option... But two versions of the decoder and thus of WAV compatibility :? ).

I've just had an idea.
I would like to try to set up a kind of bank system.

Today the decoder code is divided like this:
A/ Main program
1- Initialize VIA and set interrupt (a 1.0/1.1 version would require about 45 bytes)
2- Wait for start and load header and dictionaries (only "normal bytes" here, i.e. no compression)
3- Display "Loading" and name (3 bytes for ROM 1.1, about 19 required for 1.0, not counting things like JSR/RTS to call)
4- Load the program
5- Reset VIA, interrupt, launch the program (16 bytes for ROM 1.1, about 26 for ROM 1.0)

B/ Loading/Decoding routines (nothing specific ROM 1.0 or 1.1 here)
6- Synchronize
7- Read start byte and switch to adequate decoding method or end loading
8a- Decode byte in dictionary
8b- Decode a repeated byte (RLE)
8c- Decode a "normal" byte
9- Interrupt code

Well, it seems to me that I could first load a "common" (1.0/1.1) loader that would load the basic minimal code for fast loading "normal bytes", then load what it requires before executing it.
That would be:
A/ Main program
1- Call common area, filled with the common 1.0/1.1 "initialize VIA and set interrupt" code (about 45 bytes): should save 18 bytes compared to today's loader
2- Wait for start and load header, along with the "Loading" display + name code for 1.0/1.1
3- Call common area to display "Loading" and name: could save 10 or 12 bytes
4a- Load the RLE and dictionaries code, overwriting the common area
4b- Load program at full speed with compression
4c- Load "Reset VIA, interrupt, launch program" code for 1.0/1.1 in common area, overwriting dictionaries and RLE
5- Call common area to reset VIA and interrupt, then launch the program: could save 13 or 15 bytes

B/ Loading/Decoding routines (nothing specific ROM 1.0 or 1.1 here)
6- Interrupt code
7- Synchronize
8- Read start byte and switch to adequate decoding method or end loading
9- Decode a "normal" byte
10- Common area: RLE and dictionary decoding use 62 bytes
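
A quick tally of the estimates in the list above, as a rough Python sketch. The figures are the ones quoted in this post; the names are made up for illustration, and treating the three savings as simply additive is an assumption.

```python
# Estimated savings in bytes for the main program, as (low, high) ranges
# taken from steps 1, 3 and 5 of the proposed layout.
savings = {
    "initialize VIA / set interrupt": (18, 18),
    "display Loading and name": (10, 12),
    "reset VIA / launch program": (13, 15),
}

low = sum(lo for lo, _ in savings.values())
high = sum(hi for _, hi in savings.values())

print(f"Main program shrinks by roughly {low} to {high} bytes")
```

That gives roughly 41 to 45 bytes, the same order of magnitude as the "about 42 more bytes" mentioned in the previous post, though whether the two figures line up exactly has not been checked.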

I think each loading of common area would require in the signal between 0.05 and 0.1 seconds, depends on its size. Which will be totally compensated by the fact that the initial (standard speed) loader will be shorter. We might even end up being faster.
So the WAV would sequentially hold, for each part:
1- common loader (F16 speed)
2- header
3- display loading code
4- RLE and dictionaries code
5- Program
6- Exit code
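
The idea of reusing one memory area for several pieces of code can be sketched as follows. This is a toy Python model, not the actual decoder: the class name, the stage names and the NOP filler are hypothetical. Only the 62-byte figure comes straight from the list above; the other sizes are placeholders loosely based on the estimates in this post.

```python
COMMON_SIZE = 62  # the RLE + dictionary decoding code is the largest tenant


class CommonArea:
    """A fixed-size region that each loading stage overwrites in turn."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)
        self.owner = None

    def load(self, name, code):
        if len(code) > self.size:
            raise ValueError(f"{name} needs {len(code)} bytes, only {self.size} available")
        self.data[:len(code)] = code  # whatever was there before is lost
        self.owner = name


# (stage name, size in bytes); all sizes except 62 are placeholder values
stages = [
    ("init VIA + interrupt", 45),
    ("display Loading + name", 19),
    ("RLE + dictionaries", 62),
    ("reset VIA + launch", 26),
]

area = CommonArea(COMMON_SIZE)
history = []
for name, size in stages:
    area.load(name, bytes([0xEA]) * size)  # 0xEA (NOP) used as dummy content
    history.append(area.owner)

print(" -> ".join(history))
```

Each stage only has to fit in the common area at the moment it is needed, so the area's size is set by the largest piece rather than by the sum of all of them.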

Seems doable, though I have no idea yet how to do these intermediate loadings (bank switching), so no release date ;)
The only drawback I see is that one will have to load the loader for every part... Today it is perfectly possible to load the loader only once, and then CALL#100 as many times as you want for several parts in an adequate loader (saves 1.5 seconds for each part).
I think I will keep this as an option, that will be ROM 1.1 only (ok, I'm back at two versions, but most users don't use options anyway :lol: ).

Re: Experimental very fast tape loading

Posted: Fri Apr 20, 2018 11:57 am
by Symoon
Well Fabrice now has trouble loading programs. It could be that:
- I changed the thresholds a bit too much
- The F16 speed might not be liked by all machines, though it's fine on mine...

So I need to:
- be sure where problems lie
- do more tests with Fabrice's help

Don't hold your breath!

Re: Experimental very fast tape loading

Posted: Mon Apr 23, 2018 9:53 pm
by Symoon
Wohoooo, tonight Novalight loaded a basic program and Zorgons' Revenge on Euphoric, both with ROM 1.0 and ROM 1.1 :D

It now has to stand the test on real machines, and I still have to check the threshold values and F16 reliability...
But at least the bank system code is working and Oric-1 users might be able to test Novalight at full speed, too ;)
(provided that Oric-1 computers have the same threshold values as the Atmos)