Here's the readme: Novalight, very fast tape loader for Oric, 02/2019.
By Symoon, based on Fabrice's tools TAP2WAV and TAP2CD.
Version 1.1k
Novalight converts an Oric TAP file into a 44 kHz WAV file that should load very fast on most Orics, ROM 1.0 or ROM 1.1.
It must be used in MS-DOS, or Windows command line (see below).
Default version is compiled in 32 bits and should work on 32/64 bits Windows command line.
Can be compiled in 16-bits mode for old real MS-DOS, and I hope it could be compiled for Unix/Linux.
WARNING:
- requires a *perfect* WAV player (Audacity?).
- do not convert the WAV file, in any way (mp3 or whatever). Each and every sample of the WAV file is important.
In case of failure loading the file:
- keep cellphones away, switch off WiFi
- try different volume settings if loading fails, some Orics require high volume, others very low
- try rebooting your computer playing the WAV if loading fails - this happens with my PC when it's been running for a while!
- some Orics may not load this signal
Options:
-s 'standard speed': use standard speed instead of F16 speed for the loader (slower, but some Orics may not like F16?)
-o 'old loader': use the ROM 1.1-only loader instead of the latest 1.0/1.1 loader. The old one has an advantage: it can be loaded only once, then used with several CALL#100 to load multiple parts, as long as it remains intact. But no ROM 1.0.
-n 'no loader': generates the Novalight file(s), without loader. You should load an 'old Novalight' loader yourself first, then use CALL#100 to load each of the file(s). ROM 1.1 only.
-p 'long pause': generates a 5-second silence between each part of multipart programs (ERE Informatique programs, for instance, require time to draw a loading screen).
                 FAST standard   FAST F16    TAP2CD      Novalight
Oric-1           yes             yes         no          yes
Atmos            yes             yes         yes         yes
Emulator         yes             yes         yes         yes
Real tape        yes             no          no          no
Digital player   yes 11kHz       yes 44kHz   yes 22kHz   yes 44kHz
How does it work:
Novalight uses the TAP2CD bit encoding (2 bits per period: 00, 01, 10 and 11), but with shorter periods: 3, 4, 5 and 6 samples for 2 bits, while TAP2CD used the equivalent of 4, 6, 8 and 10 samples. Four periods are required to make a full byte.
Based on statistics gathered from about 1400 TAP files, showing that Oric files hold 60% of "0" and 40% of "1", Novalight encodes the zeroes on the shortest periods ("00" is 3 samples long, while on TAP2CD the shortest period was "11").
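As a rough illustration of the encoding described above, the sketch below counts how many samples the four data periods of one byte would take. The names are illustrative and not taken from the Novalight source. Only "00" = 3 samples is stated in the readme; the other three lengths are inferred from the decoder listing posted later in this thread. Start bits, RLE and the dictionaries are ignored.

```python
# Period length in samples for each pair of bits.
# "00" -> 3 is from the readme; the other entries are inferred from the
# decoder listing later in the thread (an assumption of this sketch).
NOVALIGHT_PAIR_SAMPLES = {0b00: 3, 0b01: 4, 0b10: 5, 0b11: 6}


def byte_data_samples(value):
    """Samples used by the four data periods of one byte (no start bit)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    total = 0
    for shift in (6, 4, 2, 0):
        pair = (value >> shift) & 0b11
        total += NOVALIGHT_PAIR_SAMPLES[pair]
    return total


if __name__ == "__main__":
    for value in (0x00, 0x55, 0xAA, 0xFF):
        print(f"${value:02X}: {byte_data_samples(value)} samples")
```

A byte made only of zeroes costs 12 samples and a byte made only of ones costs 24, which is why favouring the zeroes pays off when files hold more "0" than "1".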
An RLE compression has been added: repeated bytes will be encoded with the shortest possible period (3 samples, 4 samples for the last repeated one).
Two 7-byte dictionaries are set, stored with the program header and filled with the most repeated bytes that have not been handled by the RLE compression. Those bytes are also encoded in a shorter way.
Parity check has been removed.
Stop bits have been merged into the start bit, whose length determines the type of encoding that follows (normal byte, RLE compressed, or dictionary encoded).
Novalight's loader has been minimised: it uses F16 speed (compatible with standard ROM) and inverts bytes to save 10% of loading time (bits 1 coded as 0 and reversed). It first loads a minimal kernel able to load at Novalight speed without compression, then loads at fast speed small banks that do different things: VIA initialization, display, loading of banks, RLE and dictionary decoding code, VIA restore and program launch. This reduces the loader's loading time to about 0.6 seconds, plus the banks that follow (very, very short loading time).
Novalight is loaded in page 1, and leaves rather little room for the stack management (54 bytes). Beware if you ever call it in a sub-sub-program or in a loop.
While at it, Novalight also fixes the HIRES CLOAD bug on ROM 1.0: a HIRES file loaded on ROM 1.0 should not be damaged anymore by a black line once loaded (which was the result of erasing the displayed "Loading.." while in HIRES).
Acknowledgements:
Novalight's name is a tribute to Twilighte, the great Oric coder, and Paul Woakes, co-founder of Novagen company, author of Mercenary games and Novaload fast tape loader for C64.
RIP guys, I had the greatest respect for your work.
Thanks to Fabrice for having written the Oric tapes conversion tools long ago, for TAP2CD and for his constant help, giving ideas and answering my questions about tapes, WAVs, interrupts and Oric coding.
Thanks to the Oric community, especially on forums Defence-Force (http://forum.defence-force.org) and Oric.org (http://forums.oric.org). For months, many people here spent time on various coding questions, tests, advice: FredV60, Musepat, Kenneth, Dom50, Voyageur, Oric1-Atmos, Froggy, Godzil, DrPsy, ISS, Chema, Dbug, NekoNoNiaow.
I would *never* have been able to program Novalight without this help.
Enjoy!
Re: Novalight - very fast tape loading
Posted: Sat Feb 09, 2019 9:38 pm
by iss
Absolutely incredible work! Congratulations, Symoon!
Thanks for all your effort in pushing CLOAD to the limits.
Releasing the sources and the well documented details makes Novalight a huge event in this small Oric world!
Re: Novalight - very fast tape loading
Posted: Sat Feb 09, 2019 10:04 pm
by Chema
Impressive work indeed! A great achievement.
I must find a way to make audio loads again on my Atmos, because I lost the laptop which was really reliable and couldn't find a way to make turbo loads work again
Re: Novalight - very fast tape loading
Posted: Sat Feb 09, 2019 10:23 pm
by Symoon
Thank you guys
@Chema I'm sure you'll find a way. I'm using an old eeepc for this, those tiny computers are still great for such a job.
And I bet ISS will add it to TapOric at some point anyway - well I'm sure he will, I'm just worried about possible 44/48kHz issues.
Re: Novalight - very fast tape loading
Posted: Sun Feb 10, 2019 1:28 pm
by iss
Great piece of software, Symoon, I really can't imagine how much time and effort you put into Novalight but it's worth it! Chapeau bas!
The source compiles fine under Linux with GCC (just 2-3 harmless warnings - no problem at all) and the tool works nicely.
I spotted just one thing which seems like a bug:
IMO the "+5" should be "+6", right?
Re: Novalight - very fast tape loading
Posted: Sun Feb 10, 2019 2:19 pm
by Symoon
Ooooh very nice spotting, I think you're right indeed.
Thanks!
It seems I've had this bug since the very beginning. Strange, how can it not have crashed when I was running the code on ROM 1.0? Does the opcode $13 act like a NOP?
I'm correcting it right now and will upload the corrected version before tonight (train travelling this afternoon)
Re: Novalight - very fast tape loading
Posted: Sun Feb 10, 2019 2:59 pm
by Symoon
iss wrote: ↑Sun Feb 10, 2019 1:28 pm
I spotted just one thing which seems like a bug:
NL-branch.png
IMO the "+5" should be "+6", right?
Thanks a lot ISS; it's corrected, tested on Oric-1, and uploaded on SourceForge!
About the time spent on it, I think something like 3 years, with large breaks due to real life. That allows ideas, thinking and so on, to optimize.
The only really boring thing was testing on real machines to find the right working WAV form, and find the right thresholds that would work on most machines.
iss wrote: ↑Sun Feb 10, 2019 1:28 pm
The source compiles fine under Linux with GCC (just 2-3 harmless warnings - no problem at all) and the tool works nicely.
That's great news!
Re: Novalight - very fast tape loading
Posted: Sun Feb 10, 2019 6:59 pm
by Symoon
Symoon wrote: ↑Sun Feb 10, 2019 2:19 pm
It seems I've had this bug since the very beginning. Strange, how can it not have crashed when I was running the code on ROM 1.0? Does the opcode $13 act like a NOP?
I'm puzzled. I've tried a program with $13 $60, called it and it hangs.
But I've loaded again an old test file of Acherons Rage on Oric-1: no problem, though the bug is supposed to be here.
Re: Novalight - very fast tape loading
Posted: Sun Feb 10, 2019 8:04 pm
by Symoon
Ok I think I got it. Unbelievable luck.
$13: undocumented opcode (2 bytes):
ASO (ab),Y ;13 ab This opcode ASLs the contents of a memory location and then ORs the result with the accumulator.
$63: undocumented opcode (2 bytes):
RRA (ab,X) ;63 ab RRA RORs the contents of a memory location and then ADCs the result with the accumulator.
So 13 20 63 E5 does this:
ASO (20),Y
RRA (E5,X)
instead of calling the "clear status line" code. Which is not visible since this clearing clears nothing ("Loading" from the original CLOAD command is already cleared, and I removed the display of "Searching" in Novalight) before replacing it with "Loading .. PROGRAM NAME".
In conclusion:
- there was a bug indeed
- unbelievable luck made it invisible and not crashing
- I think I could remove this "clear line status" part!
Re: Novalight - very fast tape loading
Posted: Mon Feb 11, 2019 1:05 am
by Symoon
By testing heavily again this evening, I found 2 other small problems.
Updated SourceForge.
1- F16 speed was disabled. It's now back so you should see the loader is now much faster to load!
2- I had reduced a pause between two parts of the loader, that worked fine with real machines and emulators, but just found that apparently, it caused problems with multipart programs in Euphoric. Tried with Ere Informatique's Karate, and the 5th or 6th part ended with a syntax error, while it didn't if I put back the original pause. So did I, even though it's a bit unclear why.
Re: Novalight - very fast tape loading
Posted: Tue Feb 12, 2019 12:17 am
by NekoNoNiaow
Good job! I am eager to have a look at the sources to see what magic you had to resort to.
Re: Novalight - very fast tape loading
Posted: Tue Feb 12, 2019 12:35 am
by Symoon
I'm just thinking of a possible small optimization for multipart programs.
Once fully loaded, the kernel and common area 0 ($100-$18C) are not destroyed, unless the loaded program decides to use page 1.
So maybe it only needs to be loaded once, then for next parts only loading a reduced version of Common Area 1 is required, without the "load Common area 0" part. And then the other Common areas must be loaded as usual.
This would save I guess around 0.3 or 0.4 second for each additional program. But would require a new option to choose between this or having the full loader every time, which would make the options more complex...
Worth it?
Re: Novalight - very fast tape loading
Posted: Tue Feb 12, 2019 1:13 am
by NekoNoNiaow
How long is the loader (in bytes)?
Could you store it in a darkened part of graphic memory and then wipe it entirely once the loading is done?
Re: Novalight - very fast tape loading
Posted: Tue Feb 12, 2019 7:19 am
by Symoon
NekoNoNiaow wrote: ↑Tue Feb 12, 2019 1:13 am
How long is the loader (in bytes)?
It's a bit complicated to answer, because it's using banks that are loaded as the WAV goes on and crush each other. And they are loading at different speeds: F16 first for a part of the kernel, then a reduced Novalight speed.
So the required room is 202 bytes, but the loader is longer since it swaps data inside this room (the total size is 368 bytes).
Take a look at the XLS file in the Novalight ZIP, it explains how it is built and loaded
NekoNoNiaow wrote: ↑Tue Feb 12, 2019 1:13 am
Could you store it in a darkened part of graphic memory and then wipe it entirely once the loading is done?
Page 1 (Fabrice's idea!) seems to be the area least used by programs.
I had thought of choosing an area according to the programs' addresses, but it's not enough: once loaded, programs can use any RAM area! And there are one or two jumps that are not relative, so relocating it dynamically could be done but with a bit of rework.
Re: Novalight - very fast tape loading
Posted: Wed Feb 13, 2019 12:25 am
by Symoon
'Multipart booster' option is working in beta version. It saves around 0.32 second for each additional part after the 1st one.
Spotted another bug: 'old loader' option is ignored if combined with F16 speed. Will be corrected.
I'm not going to hurry as all this needs testing and double check before release.
0103 6 20 62 01 JSR $0162 Read a byte, waiting for value $24 to start
0106 C9 24 CMP #$24 is it $24? (no bit sync, the signal begins with several stop bits!)
0108 D0 F9 BNE -7 No, then loop waiting for $24 to start
I can't recall why I did this, maybe something to do with the loader / no loader option.
Getting rid of it would save 7 bytes that might be used to add a bit of complexity to the normal bytes encoding, and thus save a few fractions of a second. There's another big IF: IF the execution time of a normal byte decoding allows adding complexity! (each reading + decoding must be done within 69µs, as the shortest time between 2 sinusoids is 69µs!)
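To put the timing budget in perspective, here is a small sketch converting period lengths in samples into microseconds and CPU cycles. It assumes that "44 kHz" means the usual 44.1 kHz WAV rate, and it relies on the Oric's 6502 running at 1 MHz; the function names are illustrative. With these assumptions a 3-sample period lasts about 68 µs, close to the 69 µs quoted above.

```python
SAMPLE_RATE_HZ = 44_100    # assumption: "44 kHz" is the standard 44.1 kHz
CPU_CLOCK_HZ = 1_000_000   # the Oric's 6502 is clocked at 1 MHz


def period_microseconds(samples):
    """Duration of one period of the given length, in microseconds."""
    return samples * 1_000_000 / SAMPLE_RATE_HZ


def cpu_cycles(samples):
    """Whole CPU cycles available during one period of the given length."""
    return samples * CPU_CLOCK_HZ // SAMPLE_RATE_HZ


if __name__ == "__main__":
    for samples in (3, 4, 5, 6):
        print(f"{samples} samples: {period_microseconds(samples):.2f} us, "
              f"{cpu_cycles(samples)} cycles")
```

Since the clock is 1 MHz, one cycle is one microsecond, so the cycle counts in the listings can be compared directly with these durations.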
Re: Novalight - very fast tape loading
Posted: Sat Feb 16, 2019 8:51 am
by Symoon
Still thinking about what could be done if those 7 bytes were removed.
Moments where there is some time left:
- after a normal byte decoding, something like 20µs are available
- after the last repeated byte of an RLE sequence, "loads" of time, let's say 50µs: but using it is impossible due to the code structure, it would corrupt the RLE compression (no time to add a test to branch to another part of the code)
- after any kind of byte, if we know the next one is a dictionary byte or a RLE sequence, around 20µs
Penalty for adding bytes in the normal byte decoding: it would be loaded at normal speed at the beginning, while the 7 removed bytes were loaded at Novalight speed. So that means a slower loader (+0.03s at F16 speed), not sure it's worth it.
Processing
Re: Novalight - very fast tape loading
Posted: Sat Feb 16, 2019 9:27 pm
by Symoon
Well, nothing I tried today managed to give better than little time saving in certain conditions, and making things a bit worse in other conditions.
I'm running out of ideas, it's time to give up for the moment
Re: Novalight - very fast tape loading
Posted: Sat Feb 16, 2019 9:53 pm
by iss
I'm still not familiar with the Novalight code - it's complex and requires a lot of concentration.
But these 7 bytes should be needed only if the next part starts with the standard header, right?
I'm not sure but maybe it makes sense to have a kind of synchronization between parts... and 7 bytes are not so much
Re: Novalight - very fast tape loading
Posted: Sat Feb 16, 2019 10:44 pm
by Symoon
iss wrote: ↑Sat Feb 16, 2019 9:53 pm
I'm still not familiar with the Novalight code - it's complex and requires a lot of concentration.
Yes, that's one of the reasons it took me a while: after each break, it was taking me a while to recall everything.
In the end that's the kind of code that crashes if you move the slightest thing! And, I must say some of my comments are not always very clear
One puzzling thing is the reversed encoding that encodes normal bytes reverse-decoded and reversed again.
This afternoon I had an idea about inverting bytes again but ended up lost
iss wrote: ↑Sat Feb 16, 2019 9:53 pm
But these 7 bytes should be needed only if the next part starts with the standard header, right?
I'm not sure but maybe it makes sense to have a kind of synchronization between parts... and 7 bytes are not so much
Well I think it's here for historical reasons, my very starting point was the ROM loading code! I honestly think it's pointless now, just a 3-sample gap and a start bit would work I think. But you're right, these are Novalight encoded bytes and removing them saves something like... 0.003 seconds. Except to save room for the stack, and unless I find some significant new compression idea, they can remain here
Re: Novalight - very fast tape loading
Posted: Sun Feb 17, 2019 9:17 am
by Symoon
I found something whose decoding could fit in 6 bytes/8µs: back to one of the starting ideas of Novalight which was bit compression.
The 1111 sequence is coded by 6 + 6 samples. The idea is to use 7 samples instead of 6+6.
By modifying the code like this (see lines with "+"), it should be fine:
Reading data (normal byte)
0176 2 A9 FE LDA #$FE Set accumulator bits to 11111110 => after ROL, will always give C=1 (luckily avoids SEC
before BCS) except for the last which allows to exit => no need for a loop index.
0178 2/3 B0 FE BCS -2 infinite loop (wait for interrupt with a 3 cycles precision; will stack PC and P)
017A* 2 E0 96 CPX #$96 Length < 5 samples, so C=1, else C=0 *** *** CHANGE HERE THRESHOLD 4/5 *** ***
017Cj 2/3 B0 06 BCS +6 jump if 3 or 4 samples (C=1)
017E 2 2A ROL A 1st bit: add C to A (here C=0 for 5 or 6 samples)
+ CPX 6/7 samples threshold
+ BCS+2 if it's a 6, jump; if it's a 7...
+ ASL add a 0 bit
+ ASL add a 0 bit
017F* 2 E0 7C CPX #$7C 2nd bit: test if length < 6 samples *** *** CHANGE HERE THRESHOLD 5/6 *** ***
0181j 3 4C 87 01 JMP $0187 go to the last ROL: add C to the byte (5 samples = read '01', 6 samples = read '00')
0184 2 2A ROL A 1st bit: add C to the byte (here, C=1 for 3 or 4 samples)
0185* 2 E0 A9 CPX #$A9 2nd bit: test if length < 4 samples *** *** CHANGE HERE THRESHOLD 3/4 *** ***
0187 2 2A ROL A and directly add it to the byte (3 samples ='11', 4 samples = '10')
0188j 2/3 B0 EE BCS -18 If c=1, loop; if 0: A has been filled => end of loop
PLP
BCC+2
018A 2 49 FF EOR #$FF invert decoded bits, as they were decoded inverted (to save time)
018C 6 60 RTS
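To make the loop above easier to follow, here is a sketch emulating the unmodified routine (the lines without "+") in Python. It is only a model under simplifying assumptions: the timer value in X is replaced by the period length in samples, the interrupt wait is replaced by iterating over a list, and the function names are made up for this example.

```python
def rol(value, carry_in):
    """6502 ROL on an 8-bit value: returns (new_value, carry_out)."""
    carry_out = (value >> 7) & 1
    return ((value << 1) | carry_in) & 0xFF, carry_out


def decode_normal_byte(period_lengths):
    """Decode one byte from four period lengths (3 to 6 samples each)."""
    a = 0xFE                                  # LDA #$FE: the lone 0 bit ends the loop
    for n in period_lengths:                  # one measured period per pass
        if n not in (3, 4, 5, 6):
            raise ValueError("period length must be 3 to 6 samples")
        first = 1 if n < 5 else 0             # CPX #$96, threshold 4/5
        a, _ = rol(a, first)                  # ROL A: 1st bit
        if first:
            second = 1 if n < 4 else 0        # CPX #$A9, threshold 3/4
        else:
            second = 1 if n < 6 else 0        # CPX #$7C, threshold 5/6
        a, carry = rol(a, second)             # ROL A: 2nd bit
        if not carry:                         # BCS loops while C=1
            return a ^ 0xFF                   # EOR #$FF: undo the inversion
    raise ValueError("four periods are needed to fill a byte")


if __name__ == "__main__":
    print(hex(decode_normal_byte([5, 5, 4, 4])))
```

The initial $FE acts as the loop counter: the carry coming out of the second ROL stays at 1 until the single 0 bit is shifted out after eight ROLs, which is what the listing's comment on LDA #$FE describes.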
Estimated time saving for Zorgons Revenge: from 14.5s, it would end around 13.7s. Quite good for 6 bytes!
Ok, just got to:
- modify the signal generation and confirm if it's a positive change
- remove the 7 bytes and see if it still works
- add the 6 bytes for decoding
- test
As this will take a while (if it ever works), I think I'll release a 1.1L version with the little bug corrections and multipart booster before.
Re: Novalight - very fast tape loading
Posted: Sun Feb 17, 2019 3:05 pm
by Symoon
Symoon wrote: ↑Sun Feb 17, 2019 9:17 am
Estimated time saving for Zorgons Revenge: from 14.5s, it would end around 13.7s. Quite good for 6 bytes!
Ok, just got to:
- modify the signal generation and confirm if it's a positive change
Ok, I was optimistic: Zorgon only goes from 14.5 to 14.2 seconds.
That's a correct score, but far from what I was hoping.
Re: Novalight - very fast tape loading
Posted: Sun Feb 17, 2019 9:22 pm
by Chema
You lost me waaaay back in this thread, but I wanted to say that you are doing an impressive and unbelievable work here... almost black magic.
Re: Novalight - very fast tape loading
Posted: Sun Feb 17, 2019 9:29 pm
by Symoon
Chema wrote: ↑Sun Feb 17, 2019 9:22 pm
almost black magic.
You know, it's just compressing bats!
Erm, I mean, bytes!
Re: Novalight - very fast tape loading
Posted: Mon Feb 18, 2019 8:21 am
by Symoon
Another idea for later, that would require an important redesign: a part of the "common area 0" could be re-used for a 2nd small bank system. It would free something like 30 bytes. Combined with an unused sinusoid (10 samples) this opens a door for an additional improvement (dictionary + RLE + ???).
Just got to find an interesting idea, that would be interesting on the remaining uncompressed bytes.
Re: Novalight - very fast tape loading
Posted: Tue Feb 19, 2019 1:21 pm
by Symoon
To sum it up, the way to be faster needs optimizing 3 factors:
1- reduce the loader size: as it is loaded at normal (or F16) speed, adding new code makes it longer and could ruin the time this code could save. So the code needs to be compact
2- reduce the WAV file size: by putting as much information as possible in the shortest sinusoid combinations. But not too fast for the Oric!
3- reduce the decoding time: if information goes fast in the WAV file, the data decoding and storing must be fast! Too slow: you will have to slow down the WAV file data rate; but if you are decoding fast and waiting for the next bit of information... That means you can accelerate the WAV file rate, or implement more complex decoding code.
Re: Novalight - very fast tape loading
Posted: Tue Feb 19, 2019 1:24 pm
by Dbug
I assume you have already checked whether any of the code of your loader happens to match sections of the ROM, in which case you could also do something like a generic "create" routine that copies snippets of code from ROM.
Re: Novalight - very fast tape loading
Posted: Tue Feb 19, 2019 4:26 pm
by Symoon
I checked, a bit quickly I admit... Since the "normal speed" kernel does very specific things in a small amount of bytes, there isn't much to match.
And Novalight working on both ROMs, the copied code should be present in both 1.0 and 1.1, and the copier should handle the different addresses. That makes it a longer code, so needing a longer sequence to match to be efficient :-/
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 11:14 am
by Jack_Free
It does not work under WIN7 64bit.
Is there a solution to work on this system?
Thank you.
error Access Denied
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 12:15 pm
by Dbug
Jack_Free wrote: ↑Thu Feb 21, 2019 11:14 am
It does not work under WIN7 64bit.
Is there a solution to work on this system?
Thank you.
error Access Denied
Which version and command line are you using?
When I run novalight_1.1k.exe from the command line I get the list of parameters, so that tells me that at least the compiler used does work fine on Windows 7 64.
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 1:17 pm
by Symoon
IIRC, it's compiled in 32 bits (I'm not a professional of compilers so forgive the potential awkward way to explain).
32 bits is supposed to work on 32/64 bits Windows command lines, but not on real old MS-DOS.
Compiling in 16 bits would allow old MS-DOS + Win 32 compatibility, but not Win 64.
So for instance, I suppose things like DosBox must be avoided.
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 3:39 pm
by Jack_Free
So I apologize, my mistake, I downloaded another program that is on the top page.
Download Latest Version
wrtdsk23.exe (56.8 kB)
Get Updates
That made me nervous, novalight_1.1k.exe works, of course.
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 3:58 pm
by Symoon
No need to apologize, good news
Re: Novalight - very fast tape loading
Posted: Thu Feb 21, 2019 10:28 pm
by Symoon
Symoon wrote: ↑Sun Feb 17, 2019 9:17 am
I found something whose decoding could fit in 6 bytes/8µs: back to one of the starting ideas of Novalight which was bit compression.
The 1111 sequence is coded by 6 + 6 samples. The idea is to use 7 samples instead of 6+6.
Ok, just got to:
- modify the signal generation and confirm if it's a positive change
- remove the 7 bytes and see if it still works
- add the 6 bytes for decoding
- test
Well, 1st working test tonight, success on emulators and real Atmos.
Still some work ("old loader" needs to be adapted) and tests (by changing the few bytes in the code, I made about 7 mistakes...)
The good news is that it does improve the loading of big files by about 0.3s, and that the added bytes in the loader have no negative impact on small files, which is good since, on small files, making the loader a little bigger is sometimes not compensated by the little time saved on the short program to load. For instance my usual HIRES screen is loaded at the exact same speed as before (ok, actually 0.001s faster).
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 1:20 pm
by Symoon
Are there any mathematicians around? I think my dictionary in Novalight can be optimized.
What I'm doing now (the RLE compression has already been performed and the concerned bytes are ignored):
1- calculate the time taken by each uncompressed byte in the Novalight signal, which is: occurrences*encoding time. For instance, if "Z" lasts 19 samples, and is present 100 times in the TAP file, its total time is 19*100 = 1900
2- the dictionary has 14 entries, so I take the 14 highest "total times" calculated. This way I'm removing a maximum of time from the uncompressed signal.
3- in the dictionary, each entry will last a different time (between 9 and 16 samples), so I'm optimizing the dictionary assignment: among the 14 bytes, the bytes with the highest occurrences will be assigned to the shortest new encoding time. This way, I'm minimizing the new time I re-insert in the signal.
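The three steps above can be sketched as follows. This is only an illustration, not the code of the Novalight generator: `byte_samples` (the uncompressed length of a byte in samples) and `slot_samples` (the length of each dictionary entry) are hypothetical inputs supplied by the caller, and the dictionary size is simply the number of slots given.

```python
from collections import Counter


def pick_dictionary(data, byte_samples, slot_samples):
    """Return (byte value, slot length) pairs following steps 1 to 3.

    data: bytes remaining after RLE compression
    byte_samples: callable giving the uncompressed length of a byte value
    slot_samples: lengths in samples of the available dictionary entries
    """
    counts = Counter(data)
    # Step 1: total time of each byte = occurrences * encoding time
    total_time = {v: n * byte_samples(v) for v, n in counts.items()}
    # Step 2: keep the bytes with the highest total times
    ranked = sorted(total_time, key=lambda v: (-total_time[v], v))
    chosen = ranked[:len(slot_samples)]
    # Step 3: the most frequent chosen bytes get the shortest entries
    chosen.sort(key=lambda v: (-counts[v], v))
    return list(zip(chosen, sorted(slot_samples)))


if __name__ == "__main__":
    lengths = {ord("A"): 22, ord("B"): 12}
    data = b"A" * 4 + b"B" * 8
    print(pick_dictionary(data, lengths.__getitem__, [4]))
```

With a single 4-sample slot, a byte "A" of 22 samples present 4 times and a byte "B" of 12 samples present 8 times, this method keeps "B" because its total time (96) is higher than that of "A" (88).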
But I realize this way to go can be false!
For instance: in a TAP file, a byte "A" that lasts 22 samples is present 4 times, and another one "B" that lasts 12 samples is present 8 times.
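So we get, uncompressed: "A" = 4 × 22 = 88 samples and "B" = 8 × 12 = 96 samples.
Imagine there's room in the dictionary for ONE of the two, which will last 4 samples.
If I pick "B", as it takes the longest time in the signal (removing 96 samples instead of 88 if I had taken "A"), I get 88 + 8 × 4 = 120 samples in the final signal, whereas picking "A" would have given 4 × 4 + 96 = 112 samples.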
So I save time with my method but it may not be optimal... But I really can't figure how to calculate the best result. What is the best way to do? Is there a magic formula, with 256 possibles bytes combined in 14 possible bytes of a dictionary?
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 4:26 pm
by Symoon
To sum it up, the best choice is the one where (removed_length - added_length) is maximized. My main problem is that I don't know the added length since it depends on what I choose to remove.
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 5:34 pm
by Chema
I am quite sure I made something similar in my text compression routine. I should check the sources of my compressor in the space 1999 folder.
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 5:40 pm
by Symoon
That would be cool
I tried to work with average values and there is no real difference. Maybe I'm just wasting time here.
But the problem is interesting and my statistics/mathematics lessons are way too far away for me to find the right formula.
Symoon wrote: ↑Sat Feb 23, 2019 4:26 pm
To sum it up, the best choice is the one where (removed_length - added_length) is maximized. My main problem is that I don't know the added length since it depends on what I choose to remove.
Actually it would rather be: the choice I have to make depends on the whole, but the whole depends on the choice I make... Argh. Choosing to put a byte in a dictionary prevents other bytes from going there, which could have had a better positive effect. How to be sure? Calculating 256*256*256... (14 times) combinations to find the best won't be possible
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 6:49 pm
by Symoon
Using an average value for the dictionary simplifies the problem, and gives a good-if-not-perfect solution.
It allows me to fix a variable parameter for good, and to easily calculate the real benefit (what I remove - what I add)
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 8:07 pm
by Chema
Mmmm no, my routine goes on a similar path, but can work only with selecting the pair of letters which repeats most. In fact it is a bit more complex, as can code combinations of several letters if they repeat often enough with just one token. But your problem is the calculation of the savings, so won't be of any help.
Decompression is very fast but makes use of the stack... Nothing that could really help you, I'm afraid.
Re: Novalight - very fast tape loading
Posted: Sat Feb 23, 2019 8:24 pm
by Symoon
Well, thanks for having looked
I suspect anyway that this might not bring drastic changes :-/ I just noticed it by spotting a small difference after a few modifications.
So it's corrected with an average value to calculate the real potential benefit of the dictionary, hence the choice of bytes to use for it.
Another interesting thing could be to try and mix Novalight with Filepack. Just talked about it with Dbug a bit, as I'm a newbie in Filepack use.
Re: Novalight - very fast tape loading
Posted: Mon Feb 25, 2019 10:27 pm
by Symoon
Remember the Atmos had a 1st ROM version that was a bit bugged with tape loading? (IIRC Chema, you got this version?).
Oric released a small program (called "ALC" on the demo tape Welcome to Oric Atmos) designed to fix this, and loaded first before many Atmos programs.
I just realised that loading this little program will probably crash with Novalight if you have this ROM version or an Oric-1.
So, as Novalight doesn't use the buggy part of the Atmos ROM, I will probably change the generator so it recognizes this ALC program and... Trash it!
Re: Novalight - very fast tape loading
Posted: Mon Feb 25, 2019 11:36 pm
by Chema
Well, in fact what I have is the first Atmos ROM version, which does the right thing, and it is not bugged It checks for errors and stops loading if it finds them. What the Oric people did, after realizing that there were loading errors still, was to remove the error checking altogether in the second version 1.1b!
That is what the small program does... sets the IRQ vector to point to a small routine that clears the error flag... good job guys!
We found all that after disassembling both ROMs when testing Fabrice's tap2cd version for SkoolDaze.
Re: Novalight - very fast tape loading
Posted: Mon Feb 25, 2019 11:51 pm
by Symoon
Yes, sorry I was writing a bit fast by saying bugged (though it could be debated if it could be considered as a bug, not a technical one, but a design one )
I need to study my solution further. Removing this program is not so OK, as some publishers (such as Loriciels) used a modified version of this patch to insert a little protection. So this is also a problem for Novalight, as it crashes, too.
Well, I think I made a design flaw by trying to be smart and load the 14-byte dictionary with the header (in 2B2-2BF).
I have room to store it on top of the program, which would then occupy $100-$1D6 (instead of $100-$1C8 right now)... Really doesn't leave much room for the stack!
I could try and load it elsewhere but then need to find room to code the loading, and it would probably hurt the "multipart booster" option.
S**t!
Re: Novalight - very fast tape loading
Posted: Tue Feb 26, 2019 11:18 pm
by Symoon
I think I found a way to save room.
This will be at the price of a little additional fast-speed loading time and more bank swapping (sorry ISS, the structure will be more complex :p).
But this should remove the dictionary from page 2, thus making Novalight compatible with ALC and Loriciels programs.
So I'm changing my plans: next release will be v1.2 with these heavy structure changes, the multipart booster, and probably no more "old loader / no loader" option.
Re: Novalight - very fast tape loading
Posted: Fri Mar 01, 2019 4:56 am
by NekoNoNiaow
Symoon wrote: ↑Sat Feb 23, 2019 1:20 pm
Are there any mathematicians around? I think my dictionary in Novalight can be optimized.
I think this is less about maths and more about logic.
Symoon wrote: ↑Sat Feb 23, 2019 1:20 pm
What I'm doing now (the RLE compression has already been performed and the concerned bytes are ignored):
1- calculate the time taken by each uncompressed byte in the Novalight signal, which is: occurrences*encoding time. For instance, if "Z" lasts 19 samples, and is present 100 times in the TAP file, its total time is 19*100 = 1900
2- the dictionary has 14 entries, so I take the 14 highest "total times" calculated. This way I'm removing a maximum of time from the uncompressed signal.
3- in the dictionary, each entry will last a different time (between 9 and 16 samples), so I'm optimizing the dictionary assignment: among the 14 bytes, the bytes with the highest occurrences will be assigned to the shortest new encoding time. This way, I'm minimizing the new time I re-insert in the signal.
Assigning variable lengths to symbols as a function of their frequency of occurrence is a well known technique and is called "entropy encoding" (cf https://en.wikipedia.org/wiki/Entropy_encoding).
Symoon wrote: ↑Sat Feb 23, 2019 1:20 pm
For instance: in a TAP file, a byte "A" that lasts 22 samples is present 4 times, and another one "B" that lasts 12 samples is present 8 times.
Which is better!
So I save time with my method but it may not be optimal... But I really can't figure how to calculate the best result. What is the best way to do? Is there a magic formula, with 256 possibles bytes combined in 14 possible bytes of a dictionary?
It is not a question of formula, you cannot know in advance how much space you save unless you actually compute it.
What should be done is compute for each candidate, NOT how much space it *currently* takes, but how much space it *will* remove.
With this, it is fairly clear that A is the better candidate since it gains you more bytes than B will.
What you are currently doing is assuming that the byte that currently takes the most space will get you better gains but as you proved yourself, this is false, what matters is to take the byte that will shrink the most.
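A minimal Python sketch of that comparison, using the A/B figures from the quoted example. The new encoded length of 9 samples is an assumption (it is the shortest dictionary entry quoted elsewhere in this thread), and `total_time` and `gain` are illustrative names, not Novalight functions.

```python
def total_time(length, count):
    """Samples the byte currently occupies in the signal."""
    return length * count

def gain(length, count, new_length):
    """Samples saved if every occurrence is re-encoded at new_length."""
    return (length - new_length) * count

NEW_LENGTH = 9  # assumed dictionary entry length, in samples

a = {"length": 22, "count": 4}
b = {"length": 12, "count": 8}

# Ranking by current size favours B...
print(total_time(**a), total_time(**b))  # 88 96
# ...but ranking by what is actually removed favours A.
print(gain(**a, new_length=NEW_LENGTH), gain(**b, new_length=NEW_LENGTH))  # 52 24
```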
After a few hours of headache, I had finally realised that it was logical to use the same "unit" to evaluate what I remove (frequency*time) and what I add (I was only considering frequency!).
A problem remained: calculating the real gain for a 14-byte dictionary and 256 possible byte values (thus encoding times) depends on the chosen bytes, which choice depends itself on the new length, which depends itself on the position I assign in the dictionary to the chosen bytes (each position having its own time length).
So if I'm not mistaken, this simply meant finding "the best dictionary", which meant calculating something like 256 to the power of 14 combinations (a 34-digit number!).
I simplified it by ignoring the dictionary position, and using an average "new encoding time" (13 samples for a byte) instead of finding the exact one (between 9 and 16 according to its position)
So in the end in the code, I'm simply checking the program to load, and each time I find a byte, I'm updating its gain (in a table) by adding (its length in samples - 13).
I could see that just adding this "-13" led to one different choice for Zorgons Revenge and saved a little loading time.
I don't think it's worth trying to be 100% exact with the new length (I have no idea how anyway) as it would bring a crazy complexity to save a tiny fraction of seconds (something like 0.00x)
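The "-13" shortcut described above could look roughly like this in Python. This is only a sketch: the byte length formula (a 5-sample start plus four periods of 3 to 6 samples) is the one given later in this thread, the MSB-first pair order is an assumption that does not affect the total, and the input is assumed to have had its RLE-compressed runs removed already.

```python
from collections import Counter

AVERAGE_DICT_LENGTH = 13  # average dictionary encoding time used in the post

def normal_byte_samples(byte):
    """Uncompressed length: 5-sample start + four 2-bit periods of 3..6 samples."""
    pairs = [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]
    return 5 + sum(3 + pair for pair in pairs)

def pick_dictionary(data, entries=14):
    """Add (length - 13) to a byte's gain each time it is met, keep the largest gains."""
    gains = Counter()
    for byte in data:
        gains[byte] += normal_byte_samples(byte) - AVERAGE_DICT_LENGTH
    return [byte for byte, _ in gains.most_common(entries)]
```

For example, `pick_dictionary(bytes([0xFF] * 10 + [0x00] * 10 + [0x55] * 3), entries=2)` keeps `0xFF` and `0x00`, because ten 29-sample bytes and ten 17-sample bytes both outweigh three 21-sample bytes.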
Re: Novalight - very fast tape loading
Posted: Sat Mar 02, 2019 6:13 pm
by NekoNoNiaow
You do not need to consider all possible combinations!
Just sorting each candidate with the method I described and putting them into the dictionary in order is sufficient to obtain an optimal choice. Any other method is mathematically guaranteed to result in a bigger size.
Think about it this way:
The best choice for the first selection in the dictionary is the byte with the biggest gain, right?
This much is obvious. So the first choice is a no brainer.
Now, what is the best choice for the second position in the dictionary?
Well, this is the exact same problem as for the first choice.
The only thing that changed is the encoded length of that byte.
So, just redo the gain computation for all remaining candidates and select the best one.
Third one?
Same problem again, with a new encoded length, so same method.
In the end, if you have n bytes to put in the dictionary, then you only need to apply that method n times, no need to test every combination since you have a method that gives you the best choice at every step.
This is exactly how entropy coding works, if you sort bytes by their frequency * size in decreasing order, and give the corresponding place in the dictionary in that same order (with encoded length increasing progressively in the dictionary) then you have the most optimal encoding possible.
I could explain this more formally with mathematical induction if needed but I am typing on my iPad so that is a bit of a pain right now.
In any case: ditch the average computation, it does not work, just loop on your list of candidates and apply the size*frequency sort at each step and you will have your best possible combination almost instantly. (O(n^2*log(n)) to be precise .)
Re: Novalight - very fast tape loading
Posted: Sat Mar 02, 2019 9:37 pm
by Symoon
NekoNoNiaow wrote: ↑Sat Mar 02, 2019 6:13 pm
Think about it this way:
The best choice for the first selection in the dictionary is the byte with the biggest gain, right?
This much is obvious. So the first choice is a no brainer.
You're right if you consider the gain as the time removed less the time added with the dictionary encoding. But you don't know this dictionary time yet since you don't know the position of your byte in the dictionary... So you can't know the exact gain at this first step, so you can't know if it's worth being selected for the dictionary
In the end, if you have n bytes to put in the dictionary, then you only need to apply that method n times, no need to test every combination since you have a method that gives you the best choice at every step.
Yep but I just don't know how to calculate the 1st step; I may be misunderstanding you, but for me finding the real optimal choice forbids splitting the problem into different steps!
That's what I did 1st, and it showed I was wrong.
This is exactly how entropy coding works, if you sort bytes by their frequency * size in decreasing order, and give the corresponding place in the dictionary in that same order (with encoded length increasing progressively in the dictionary) then you have the most optimal encoding possible.
Potentially wrong :p
Imagine you have A, lasting 10 and repeated 1000 times. That makes 10000. You'll consider it as a better choice than B lasting 90 and repeated 110 times (which is 9900).
So you select A. Imagine it's the 14th byte you selected (14th in the final decreasing order). So you drop B, no more room in the dictionary.
And then you have the new value for A, lasting 6. So you removed 10000, and now add 6*1000 = 6000. Gain = 10000-6000 = 4000. What if you had chosen B ? Removing 9900, and adding 6*110 (660) = a gain of 9240 instead of 4000. That's much better, because your 1st step didn't take into consideration the real potential gain, but just what you removed, which is not always optimal.
In any case: ditch the average computation, it does not work
It did give a better result with Zorgons Revenge by selecting a byte instead of another
To be honest I thought about this problem for about 2 or 3 days and changed my mind almost every hour. I also found myself silly when I typed this "-13" in the code, but it did give a shorter loading time.
Re: Novalight - very fast tape loading
Posted: Sat Mar 02, 2019 11:39 pm
by NekoNoNiaow
NekoNoNiaow wrote: ↑Sat Mar 02, 2019 6:13 pm
Think about it this way:
The best choice for the first selection in the dictionary is the byte with the biggest gain, right?
This much is obvious. So the first choice is a no brainer.
You're right if you consider the gain as the time removed less the time added with the dictionary encoding. But you don't know this dictionary time yet since you don't know the position of your byte in the dictionary... So you can't know the exact gain at this first step, so you can't know if it's worth being selected for the dictionary
I think you missed the fact that this method must be used to fill the entire dictionary.
For every empty entry in the dictionary (the first being the one with the smallest encoding, the last being the one with the biggest encoding, and the size increasing constantly), you apply the selection method:
1 - compute the gain at this stage for all byte candidates using:
2 - put the candidate with the best gain in the first entry (with the smallest encoded length) of the dictionary
3 - remove the chosen candidate from your candidate list and go back to step 1 until there are no available entries in the dictionary
If your dictionary encodes bytes with gradually increasing sizes, then choosing the best gain at each step guarantees to choose the best gain overall. That is a mathematical consequence of that ordering and it can be formally proven by induction or even visually.
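The 1-2-3 steps above can be sketched in Python as follows. The gain formula is elided in the post, so `count * (length - entry_length)` is an assumption based on the "time removed less the time added" wording; the default entry lengths are the ones Symoon lists later in the thread. Note that a later post in this thread retracts the claim that this greedy selection is always optimal.

```python
# Dictionary entry lengths in samples, 1st to 14th position (from a later post).
DICT_LENGTHS = [9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16]

def greedy_dictionary(candidates, dict_lengths=DICT_LENGTHS):
    """candidates: {byte: (count, length_in_samples)}. Returns bytes in dictionary order."""
    remaining = dict(candidates)
    chosen = []
    for entry_length in dict_lengths:
        if not remaining:
            break

        # Step 1: gain of every remaining candidate for this entry (assumed formula).
        def gain(byte):
            count, length = remaining[byte]
            return count * (length - entry_length)

        # Step 2: the best candidate takes the current (shortest free) entry.
        best = max(remaining, key=gain)
        chosen.append(best)
        # Step 3: remove it from the candidates and move to the next entry.
        del remaining[best]
    return chosen

# Symoon's A/B case with a single free entry of 6 samples: B is selected.
print(greedy_dictionary({"A": (1000, 10), "B": (110, 90)}, dict_lengths=[6]))  # ['B']
```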
In the end, if you have n bytes to put in the dictionary, then you only need to apply that method n times, no need to test every combination since you have a method that gives you the best choice at every step.
Yep but I just don't know how to calculate the 1st step; I may be misunderstanding you, but for me finding the real optimal choice forbids splitting the problem into different steps!
That's what I did 1st, and it showed I was wrong.
This is not what I suggested.
What I suggested implies to redo the best candidate computation for every empty dictionary entry.
As I explained, you were not maximizing the gain at every step, you were maximizing using the size currently taken by each candidate. But as you proved yourself, that does not work, you have to maximize by the best size gain obtainable with the dictionary entry with the smallest encoding length.
This is exactly how entropy coding works, if you sort bytes by their frequency * size in decreasing order, and give the corresponding place in the dictionary in that same order (with encoded length increasing progressively in the dictionary) then you have the most optimal encoding possible.
Potentially wrong :p
This is in every book on compression, I doubt they are wrong.
Symoon wrote: ↑Sat Mar 02, 2019 9:37 pm
Imagine you have A, lasting 10 and repeated 1000 times. That makes 10000. You'll consider it as a better choice than B lasting 90 and repeated 110 times (which is 9900).
So you select A. Imagine it's the 14th byte you selected (14th in the final decreasing order). So you drop B, no more room in the dictionary.
And then you have the new value for A, lasting 6. So you removed 10000, and now add 6*1000 = 6000. Gain = 10000-6000 = 4000. What if you had chosen B ? Removing 9900, and adding 6*110 (660) = a gain of 9240 instead of 4000. That's much better, because your 1st step didn't take into consideration the real potential gain, but just what you removed, which is not always optimal.
You just proved that you did not read my method, because it dictates precisely to choose B for these exact reasons.
These are the gains to use to select the best candidate!
That means you select B for the current entry.
Then for the remaining candidates, you redo the same computations for all of them, using the encoding length of the next dictionary entry.
Go back to the 1-2-3 steps above I listed and that is exactly what should be done.
The great part of this method is that if you do that for selecting every entry in the dictionary (in increasing encoding length), then this guarantees to choose the optimal candidates on the whole.
As I explained, entropy coding is a case where maximizing the result at every step maximizes the result globally.
In any case: ditch the average computation, it does not work
It did give a better result with Zorgons Revenge by selecting a byte instead of another
To be honest I thought about this problem for about 2 or 3 days and changed my mind almost every hour. I also found myself silly when I typed this "-13" in the code, but it did give a shorter loading time.
This is by pure chance: your first method was suboptimal and you just happened to have a case where this new method works better, but I guarantee you that in most cases my method will work even better. Since it is optimal for this particular encoding method, there is no way you can get any better than it.
It looks to me that you did not understand fully what I was suggesting since the example you gave above is exactly what I suggest to not do.
Re: Novalight - very fast tape loading
Posted: Sun Mar 03, 2019 12:12 am
by Symoon
NekoNoNiaow wrote: ↑Sat Mar 02, 2019 11:39 pm
For every empty entry in the dictionary
Aaaah, this is the key that I missed! I kept wanting to have the right order before assigning to the dictionary... Sorry it took so long to understand, I feel stupid.
It all makes sense now, thanks a lot
I'll give it a try
Re: Novalight - very fast tape loading
Posted: Sun Mar 03, 2019 3:46 am
by NekoNoNiaow
NekoNoNiaow wrote: ↑Sat Mar 02, 2019 11:39 pm
For every empty entry in the dictionary
Aaaah, this is the key that I missed! I kept wanting to have the right order before assigning to the dictionary... Sorry it took so long to understand, I feel stupid.
It all makes sense now, thanks a lot
I'll give it a try
Ahem, the fault does not lie with you, the reason is probably that I did not explain myself very well the first time.
However, I am very ashamed to admit that unfortunately I was wrong in saying this is optimal.
After writing my last post, I decided to illustrate visually how that method was optimal and in the process I actually demonstrated to myself that this was incorrect.
An important point though, is that the method, although not optimal, is probably overall close to optimal as the data below will show. A combinatorial approach would work better (although it could be prohibitively expensive) but maybe an intermediate approach between the two would work well enough. (I will explain that more in detail below.)
Now to the test data: I generated some random stats corresponding to a file to compress, selecting a few compression candidates and decided some arbitrary encoding values for the dictionary entries and graphed visually the gain obtained by each candidate for each entry.
Here is the data itself, all gain values were computed using the formula given in previous posts:
Since the gain numbers are difficult to interpret, I graphed the gains for each candidate for every entry of the dictionary.
Each line in the graph below corresponds to the gains of a given candidate and logically, they all decrease as the entries in the dictionary increase in encoded size.
As you can see, the data proves me absolutely wrong: for several dictionary entries, there are cases where inverting the choices given by my method is the better choice.
You will notice however that in most cases, there is not much difference between the choice suggested by my method and the inverted choice so the method globally is not too bad, but it is certainly not optimal.
This said, most of the lines of the graph decrease almost in parallel which generally means that my method kinda works well, however it is when they cross or are very close to one another that my method makes the wrong choice.
What to do then?
What I would recommend would be to mix my method with just one level of combinatorial exploration. The data seems to suggest that inversions usually concern only crossing or very close lines and these do not seem to involve more than two points (at least in my sample) so that may be enough.
Obviously more real-life testing would be needed to compare that with other methods you experimented with so far.
Here is what the algorithm could look like:
1 - For each candidate, compute their gain for the current empty dictionary entry, write down the candidate with the biggest gain and the candidate with the second best gain (respectively called entry1_c1 and entry1_c2).
2 - For each candidate, compute their gains for the next empty dictionary entry, once again write down the gains of the best two candidates (respectively entry2_c1 and entry2_c2).
3 - Compute which of the choices is the best by computing the gain over the two entries:
4 - If normal_gain_over_two_entries > inverted_gain_over_two_entries, then simply select the best candidate in this round, otherwise, select the second best candidate instead.
5 - repeat for the next dictionary entry.
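A possible Python reading of these five steps. The formula of step 3 is missing from the post, so the two sums below (keeping the two best candidates in order versus swapping them over the current and next entries) are an assumption, as is the gain formula `count * (length - entry_length)`. On a tie this sketch keeps the best candidate.

```python
def gain(candidate, entry_length):
    count, length = candidate
    return count * (length - entry_length)

def lookahead_dictionary(candidates, dict_lengths):
    """candidates: {byte: (count, length)}. Greedy choice with one level of look-ahead."""
    remaining = dict(candidates)
    chosen = []
    for k, entry_length in enumerate(dict_lengths):
        if not remaining:
            break
        # Steps 1-2: rank the candidates for the current entry.
        ranked = sorted(remaining, key=lambda b: gain(remaining[b], entry_length),
                        reverse=True)
        pick = ranked[0]
        if len(ranked) > 1 and k + 1 < len(dict_lengths):
            next_length = dict_lengths[k + 1]
            first, second = ranked[0], ranked[1]
            # Step 3: gain over the two entries, in normal and inverted order.
            normal = (gain(remaining[first], entry_length)
                      + gain(remaining[second], next_length))
            inverted = (gain(remaining[second], entry_length)
                        + gain(remaining[first], next_length))
            # Step 4: take the second best only when inverting really pays off.
            if inverted > normal:
                pick = second
        chosen.append(pick)
        del remaining[pick]  # Step 5: repeat for the next entry.
    return chosen
```

With two hypothetical candidates `{"K": (860, 20), "I": (940, 18)}` and entry lengths `[4, 6]`, plain greedy selection would take K first, while this version returns `['I', 'K']`.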
Here you go.
I must apologize again for insisting that my method was optimal when the data proves me completely wrong. I am not sure why I (wrongly) remembered that this greedy algorithm (Wikipedia) was optimal but clearly this memory of mine was completely false.
So, sorry for that again, I will be more careful from now on.
Re: Novalight - very fast tape loading
Posted: Sun Mar 03, 2019 7:32 am
by Symoon
Nooooooo
Ha ha, this problem drives me crazy as I'm sure its solution is not so hard, but each time we find something, it proves to be a bit more complicated :p
Edit: I didn't see how the graph proved you wrong, but OK I see now.
That's what my brain tried painfully to show me last week and ended up telling me, in a simple way, before I collapsed: all combinations should be calculated as every gain depends on all the other choices one could make.
Re: Novalight - very fast tape loading
Posted: Sun Mar 03, 2019 8:02 am
by Symoon
NekoNoNiaow wrote: ↑Sun Mar 03, 2019 3:46 am
I must apologize again for insisting that my method was optimal when the data proves me completely wrong.
Please don't, this problem is evil (and I also tend to read a bit fast too as I work on short-spare-times).
BTW, I also realize maybe I should have given more details of the real values with Novalight:
- each "normal" byte is encoded between 17 and 29 samples (actually 5+4*[3;6])
- each entry of the dictionary is encoded between 9 and 16 samples (actually [6;7]+[3;9])
So dictionary lengths are, from 1st to 14th position: 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16.
This way:
- we are sure any byte can be selected in the dictionary and be shorter
- maybe calculating with a two-step depth can be neglected, as the gain would really be minimal with such dictionary lengths, which makes your 1st solution really good enough? (that's probably also why using an average seems to give good results, which it would not with much more difference between the 1st and the last)
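These figures can be re-derived in a few lines of Python. The constant names are illustrative; only the resulting lengths come from the post, not how Novalight maps a dictionary position to a given start/position pair.

```python
NORMAL_START = 5                # start sinusoid of a normal byte, in samples
PERIODS = range(3, 7)           # 3..6 samples for one 2-bit period
DICT_STARTS = (6, 7)            # start sinusoid of a dictionary byte
DICT_POSITIONS = range(3, 10)   # 3..9 samples select the entry

normal_min = NORMAL_START + 4 * min(PERIODS)
normal_max = NORMAL_START + 4 * max(PERIODS)
dict_lengths = sorted(start + pos for start in DICT_STARTS for pos in DICT_POSITIONS)

print(normal_min, normal_max)  # 17 29
print(dict_lengths)            # [9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16]
```

Since the longest dictionary entry (16 samples) is still shorter than the shortest normal byte (17 samples), any byte gets shorter once it is in the dictionary.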
Re: Novalight - very fast tape loading
Posted: Mon Mar 04, 2019 12:35 am
by NekoNoNiaow
NekoNoNiaow wrote: ↑Sun Mar 03, 2019 3:46 am
I must apologize again for insisting that my method was optimal when the data proves me completely wrong.
Please don't, this problem is evil (and I also tend to read a bit fast too as I work on short-spare-times).
Evil, hence absolutely fascinating.
Symoon wrote: ↑Sun Mar 03, 2019 8:02 am
BTW, I also realize maybe I should have given more details of the real values with Novalight:
- each "normal" byte is encoded between 17 and 29 samples (actually 5+4*[3;6])
- each entry of the dictionary is encoded between 9 and 16 samples (actually [6;7]+[3;9])
So dictionary lengths are, from 1st to 14th position: 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16.
This way:
- we are sure any byte can be selected in the dictionary and be shorter
- maybe calculating with a two-step depth can be neglected, as the gain would really be minimal with such dictionary lengths, which makes your 1st solution really good enough? (that's probably also why using an average seems to give good results, which it would not with much more difference between the 1st and the last)
Interesting, thanks for these details.
May I ask what determines these sizes? Both the non-encoded byte sizes and the sizes of dictionary-encoded bytes.
And I guess this question also implies to delve into the details of how you select which bits to use to form all these "normal" and "dictionary" bytes.
I assume that your goal when deciding which bits to emit for each byte is to guarantee that no sequence of bits can ever be mistaken for another?
So the reason these "normal" bytes have the sizes you list above are that you have somehow determined that this condition is always verified?
Re: Novalight - very fast tape loading
Posted: Mon Mar 04, 2019 6:53 am
by Symoon
NekoNoNiaow wrote: ↑Mon Mar 04, 2019 12:35 am
May I ask what determines these sizes? Both the non-encoded byte sizes and the sizes of dictionary-encoded bytes.
Sure
With Novalight, a byte begins with a sinusoid coding a 'stop/start' bit, which is there to give time to the Oric to work on building and storing the previous byte once all its bits have been read. This 'stop/start' bit must last at least 5 samples (115µs) to let the Oric work.
But to save time, I thought this could also be a 'switch bit': if it lasts 5 samples, what follows is a 'normal byte', 6 or 7 then it's a dictionary byte, 8 means a RLE repeated byte, and 9 means 'end of program'. This way, the time lost to let the Oric work becomes useful.
Then follow other sinusoids, the whole is encoded like this:
Coding format in WAV signal:
Sinusoid 1                                 Sinusoid 2 (and if required, 3, 4 and 5)
length in samples                          length in samples
  3   Stop
  4   (unused)
  5   Start normal byte                      3   00   \
      (Oric needs at least                   4   01   |__ Four times to code a byte
      5 samples time to work)                5   10   |
                                             6   11   /
                                             7   1111 -- replaces 6+6 samples for 1111 bits
  6   Start byte from dictionary 1           [3;9]: byte position in dictionary
  7   Start byte from dictionary 2           [3;9]: byte position in dictionary
  8   Start repeat previous byte             RLE; for N repeats: (N-1)*3 + 4 (4 being for the last repetition)
  9   End of program
Why not use 2 samples? Because the Oric does not react below 3 samples.
Why code '00' on the shortest period? Because statistics on thousands of TAP files show Oric programs are made of 62% zeroes.
Why not use 8 and 9 samples to code more complexity in a 'normal byte'? Because there is no more room in the code, nor working time: a more complex decoding would last longer than the 3-sample sinusoid time (which is the major time constraint), so information would be lost.
The same goes for all kinds of bytes: sometimes the decoding loop only lasts 1 or 2 µs less than the 3-sample length.
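To make the format above more concrete, here is a small Python sketch of the sinusoid lengths for a normal byte and of the 'switch bit' meaning. It is not taken from the Novalight source: the MSB-first order of the bit pairs is an assumption, the 7-sample shortcut for '1111' is left out, and `rle_samples` only returns the total given by the formula in the table.

```python
# Meaning of the first sinusoid of a byte, by length in samples.
START_KINDS = {5: "normal", 6: "dictionary 1", 7: "dictionary 2", 8: "repeat", 9: "end"}

def encode_normal_byte(byte):
    """Sinusoid lengths for an uncompressed byte: start (5), then four 2-bit periods."""
    pairs = [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]  # assumed MSB first
    return [5] + [3 + pair for pair in pairs]  # 00->3, 01->4, 10->5, 11->6

def rle_samples(repeats):
    """Samples following the 8-sample start for N repeats: (N-1)*3 + 4."""
    return (repeats - 1) * 3 + 4

print(encode_normal_byte(0x00))  # [5, 3, 3, 3, 3]
print(encode_normal_byte(0x1B))  # [5, 3, 4, 5, 6]
print(rle_samples(3))            # 10
```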
Re: Novalight - very fast tape loading
Posted: Thu Mar 07, 2019 10:34 pm
by Symoon
Currently working on heavily restructuring Novalight so the dictionary is stored in page 1, with the code, thus improving compatibility.
Not there yet...
[Image: 1.2_working_screen.png]
Re: Novalight - very fast tape loading
Posted: Fri Mar 08, 2019 6:07 am
by Symoon
Done! Successfully loaded 3D Fongus this morning. Still needs testing with all the options and ROM 1.0.
But I'm glad I'm not using page 2 anymore, that's a relief, and the loading penalty for this patch is only 0.013 second. It also saves 8 more bytes for the stack.
Novalight is now cut into 8 parts!
Next steps before v1.2 release:
- try adding a memory relocation if a loading in page 1 is detected (actually, should I make it automatic or set up a user option to choose the memory page?)
- optimizing the dictionary as discussed with NekoNoNiaow
Re: Novalight - very fast tape loading
Posted: Fri Mar 08, 2019 1:27 pm
by Symoon
Symoon wrote: ↑Fri Mar 08, 2019 6:07 am
- try adding a memory relocation if a loading in page 1 is detected (actually, should I make it automatic or set up a user option to choose the memory page?)
Well, thinking about it, I'll try to make it automatic, but also add a user option (that will have to indicate the memory page number to use), that will override the automatic relocation. Not sure I know how to program this yet, but I'll try!
Re: Novalight - very fast tape loading
Posted: Sat Mar 09, 2019 9:14 am
by Symoon
Chema wrote: ↑Mon Feb 25, 2019 11:36 pm
Well, in fact what I have is the first Atmos ROM version, which does the right thing, and it is not bugged. It checks for errors and stops loading if it finds them. What the Oric people did, after realizing that there were loading errors still, was to remove the error checking altogether in the second version 1.1b!
That is what the small program does... sets the IRQ vector to point to a small routine that clears the error flag... good job guys!
Back to this subject.
Actually there are two modifications that were done on the "1.1b" ROM in tape routines:
1- stop checking parity errors during the synchro (bytes 16 16 16 ... 16 16 24)
2- only display "Errors Found" and don't stop the program execution
Chema, I now understand better why you say it was not bugged while I said it was: I had in mind the 1st change, you had in mind the second
BTW that's quite funny to try fixing a loading problem with... A program to load
I will investigate all this ROM/patch story in detail and try to write an article about it.
Now, a poll: this 'ALC' program is useless with Novalight. Should the WAV generator keep it anyway (for historical reasons or whatever), or should it detect it and remove it from the WAV file?
Re: Novalight - very fast tape loading
Posted: Sat Mar 09, 2019 5:03 pm
by iss
For the poll: If it's not too difficult - detect and discard it during WAV generation.
Here is the quick disassembly of ALC:
It intercepts the IRQ and clears the error flag - so not useful (especially) with Novalight I think.
For historical and archive purposes we can comment the above source and upload it (for instance) on oric.org.
@Chema: I'm interested to have your ROM if possible. Do you know if it is a factory PROM or a custom patched EPROM?
Do you know if it's in the list already here: http://oric.free.fr/ROMS/romident.txt
You can use testfbs testing utility to find ROM's crc32 which I posted here.
BTW, did your friend have a chance to try the same testfbs with Jasmin?
Re: Novalight - very fast tape loading
Posted: Sat Mar 09, 2019 8:42 pm
by Symoon
iss wrote: ↑Sat Mar 09, 2019 5:03 pm
For the poll: If it's not too difficult - detect and discard it during WAV generation.
Here is the quick disassembly of ALC.
It intercepts the IRQ and clears the error flag - so not useful (especially) with Novalight I think.
This leads to another question: would it be useful to have an option to insert ALC before the "normal speed" Novalight loader? What I don't get yet is whether the 1st generation of Atmos, which might detect parity errors during the synchro, may in certain conditions report loading errors without ALC, even with a good sound. Maybe this is why Chema has trouble loading it...
Edit: mmmh, I probably tested it on an Atmos with the 1st version of the ROM already (no idea which of my machines have this 1st ROM version), so this is probably useless.
Re: Novalight - very fast tape loading
Posted: Sun Mar 10, 2019 11:10 am
by Symoon
1st working tests with user memory relocation!
Sadly doesn't help Psychiatric loading - it was using page 1 but I suspect it uses specific loading routines or protection.
Anyway.
Now, removing ALC and optimizing dictionary. I think I need some holidays
Re: Novalight - very fast tape loading
Posted: Tue Mar 12, 2019 10:57 pm
by Symoon
The (now useless) ALC program is now detected and skipped by Novalight!
Seems like French editors (Cobra Soft, Loriciels, Infogrames) enjoyed modifying it to store a part of their protection; in such cases of course the modified ALC program is kept.
Re: Novalight - very fast tape loading
Posted: Sat Mar 16, 2019 1:54 am
by Symoon
After a rest, I'm back to this dictionary matter.
Reading my existing code again, I see something interesting from my thoughts of 2 years ago.
Once the code considers it has chosen rightly the 14 values to put in the dictionary (wrongly or not actually), it sorts them again in the dictionary to make it optimal in a quite obvious way: you need to re-assign each selected value according to its frequency only. The way you selected them doesn't matter anymore: if you selected 500 "A", 30 "B" and 200 "C", you will obviously minimize the newly inserted length by assigning the shortest time to "A", then "C", then "B".
If you look at your Novalogic table BTW, it seems each time you choose to invert the selection order of two values, the 2nd one actually has more occurrences (is it just by chance, I don't know! Only exception is the last "TRUE" which I actually don't understand why it says "TRUE" ).
You also can see that, from the new signal length point of view, it's obvious that it would be much better to use "I" (frequency 940) from 3rd or 2nd, to 1st position, than K (frequency 860)
=> 940*4+860*6=8920, is better than K 1st and I 2nd: 940*6+860*4=9080
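The final re-assignment by frequency can be written in a few lines of Python. `assign_by_frequency` is an illustrative name; the I/K figures and the entry lengths 4 and 6 are the ones used in the calculation above.

```python
def assign_by_frequency(freqs, entry_lengths):
    """Give the shortest entries to the most frequent bytes.

    Returns the dictionary order and the new signal time in samples."""
    order = sorted(freqs, key=freqs.get, reverse=True)
    lengths = sorted(entry_lengths)
    total = sum(freqs[b] * length for b, length in zip(order, lengths))
    return order, total

print(assign_by_frequency({"K": 860, "I": 940}, [4, 6]))  # (['I', 'K'], 8920)
```

Putting K first instead costs 940*6 + 860*4 = 9080 samples, that is 160 samples more.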
So I wonder if the problem solution could not actually be simplified/optimized:
1- remove the biggest frequency*gain for the first 13 values: if a choice had to be inverted, it will be selected next anyway
2- 14th value has to take the next choice into consideration to select the best one (if you don't, you may drop the best choice, just like my very first example with "A" and "B") EDIT: mmmh, no, this was another problem actually (I was not using gain, but initial length only)
3- the gain is finally optimized by sorting the 14 values by frequency
... Or am I again on a totally wrong track? EDIT: going to bed, I'm actually a bit lost again in how to initially select the right 14 values! I tend to think we are close to perfection with your initial selection: frequency*gain (which means calculation for each dictionary entry), and with my final optimisation which is just sorting again by frequency the 14 selected items, avoiding this "do I switch my choice or not" step during the selection.
Re: Novalight - very fast tape loading
Posted: Sat Mar 16, 2019 10:05 am
by Symoon
Symoon wrote: ↑Sat Mar 16, 2019 1:54 am
I tend to think we are close to perfection with your initial selection: frequency*gain (which means calculation for each dictionary entry), and with my final optimisation which is just sorting again by frequency the 14 selected items, avoiding this "do I switch my choice or not" step during the selection.
Ok, just tried this solution, and it's actually better than my previous "use an average value to compute the gain"!
=> Zorgon is 0.01 second faster, choosing $CE (frequency 271), over $05 (frequency 613) with the previous method. That may sound ridiculous, but it just shows it's better, and compensates about 50% of the time lost with the new Novalight structure. I like that
Maybe with other programs the gain will be better.
EDIT: arrrgh, nope, 3D Fongus is slower with this new version. Something is still wrong somewhere. I think I got it: the last value to be chosen must be compared between a gain, and a length..
For instance if I take A and leave B out of the dictionary, I will gain what I calculated for A, but I will leave for B its initial length, not gain. The last comparison is different from the previous ones.
Re: Novalight - very fast tape loading
Posted: Sun Mar 17, 2019 12:47 pm
by Symoon
Ok, after another evening/morning working at it, I realise there is no simplified way to be optimal with the dictionary: each attempt (tried about 8 of them) gave better results on several programs, and inferior results on others. None was "always better".
The only solution IMHO would be to compute and combine all possibilities. Forget it, the gain is too small for such a complexity to deal with
So I will keep the code as it was, to avoid complicating it too much for a random minor gain. That being said, anyone willing to give it a try as we did with NekoNoNiaow, please do
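For anyone who wants to experiment, "compute and combine all possibilities" can at least be tried on a handful of candidates. The Python sketch below is exhaustive and only meant as a reference to compare other selection methods against on small inputs; it is far too slow for 256 byte values and 14 entries. The gain formula `count * (length - entry_length)` is an assumption, and the byte lengths in the example are hypothetical.

```python
from itertools import permutations

def best_dictionary(candidates, dict_lengths):
    """candidates: {byte: (count, length)}. Try every ordered selection of candidates."""
    def total_gain(order):
        return sum(candidates[b][0] * (candidates[b][1] - length)
                   for b, length in zip(order, dict_lengths))

    size = min(len(dict_lengths), len(candidates))
    return max(permutations(candidates, size), key=total_gain)

# Hypothetical lengths (20 and 18 samples) for the K and I bytes discussed above.
print(best_dictionary({"K": (860, 20), "I": (940, 18)}, [4, 6]))  # ('I', 'K')
```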
Holds the following changes since previously released v1.1k:
V1.2a (released 2019-03-24):
- auto-relocation in memory if page 1 is used by the program to load. /!\ heavy ASM/C dependency
- ALC program is now detected and skipped (it's a now useless patch for old tape players)
- public release, cumulating all previous changes from v1.1l to v1.1n.
V1.1n (not released)
!!! incompatibility with ALC program and Loriciels software using it as copy protection!
=> new version required to store the dictionary in page 1.
- stopped using page 2 to store the dictionary, it caused compatibility problems (especially with Loriciels
software). This leads to split Common Area 0 with additional 0a and 0b parts.
- this means the old loader is obsolete, so "old loader" option removed. Same for "no loader", as the old
loader was required to load a Novalight file without loader.
- "multipart booster" option allows to use a reduced Common Area 1 if the Kernel and Common Area 0 are kept
intact. Requires re-loading of Common Area 0a at the end of a part.
- optimized the dictionary by not only taking care of the time wasted by the most frequent isolated
bytes, but also by taking care of how much time they use with their new encoding in the dictionary.
(sometimes, it is worth selecting a byte that occupies a bit less time in the original signal,
but that will occupy much less time with its new encoding!)
V1.1m (not released)
- removed the $24 starting byte. This was here for historical versions and had become useless.
- the room saved by this removal (7 bytes) is now used for a new change: in an uncompressed byte,
the bit sequence '1111', which should be coded by 6 + 6 samples, is now coded by 7 samples.
!!! OLD LOADER not working anymore.
V1.1l (not released)
- removed 3 useless bytes in Common Area 2: in ROM 1.0, clearing the status line is
done by the CLOAD (it was historically required because Novalight used to display
its own "Searching", but this has been removed before the 1st public release)
- added 'multipart booster' option
- corrected a bug: oldloader was ignored if combined with F16 speed
- just for the joke, modified the options letters a bit (-s, -i, -m, -o and -n)
'No Loader' is superior to all the other options: standard speed, old loader and multipart booster will be ignored.
'Old Loader' will ignore 'Multipart Booster', as the booster only works with the normal loader.
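The precedence between options described for V1.1l can be sketched as below. This is only an illustration: the Options structure and the resolve function are hypothetical names, not taken from the Novalight sources, and the option letters are left out on purpose since the readme does not say which letter matches which option.

```cpp
// Hypothetical option set, for illustration only.
struct Options
{
    bool standardSpeed;
    bool oldLoader;
    bool multipartBooster;
    bool noLoader;
};

// Applies the precedence rules stated above and returns the effective options.
Options resolve(Options o)
{
    if (o.noLoader)
    {
        // 'No Loader' wins over everything else.
        o.standardSpeed = false;
        o.oldLoader = false;
        o.multipartBooster = false;
    }
    else if (o.oldLoader)
    {
        // The booster only works with the normal loader.
        o.multipartBooster = false;
    }
    return o;
}
```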
Re: Novalight - very fast tape loading
Posted: Wed Mar 27, 2019 11:23 am
by Symoon
PS: ISS, unless someone finds bugs, I don't plan to change Novalight anymore, not before a while I mean
So you can go for it with TapOric
Re: Novalight - very fast tape loading
Posted: Wed Mar 27, 2019 12:22 pm
by iss
Cool! Congrats for the new release. I'll try to integrate it asap.
Re: Novalight - very fast tape loading
Posted: Wed Mar 27, 2019 12:27 pm
by Symoon
Good luck with the code
Re: Novalight - very fast tape loading
Posted: Fri Mar 29, 2019 12:27 am
by Symoon
Symoon wrote: ↑Wed Mar 27, 2019 11:23 am
PS: ISS, unless someone finds bugs, I don't plan to change Novalight anymore, not before a while I mean
So you can go for it with TapOric
(oh, well, I've just had a new idea... Might require some testing though... But very tempting... That would make of me a liar )
EDIT: will probably try, though I don't like much the idea of reading a signal with a margin limited to 1 or 2µs.
Re: Novalight - very fast tape loading
Posted: Sun Mar 31, 2019 4:44 pm
by NekoNoNiaow
Symoon wrote: ↑Sat Mar 16, 2019 1:54 am
After a rest, I'm back to this dictionary matter.
[snip]
If you look at your Novalogic table BTW, it seems each time you choose to invert the selection order of two values, the 2nd one actually has more occurrences (is it just by chance, I don't know! Only exception is the last "TRUE" which I actually don't understand why it says "TRUE").
If you look at the graph which shows the frequencies of each candidate, this is much clearer.
Choosing K is definitely the best choice as first candidate because it gives the best gain compared to the original size.
Symoon wrote: ↑Sat Mar 16, 2019 1:54 am
You also can see that, from the new signal length point of view, it's obvious that it would be much better to use "I" (frequency 940) from 3rd or 2nd, to 1st position, than K (frequency 860)
=> 940*4+860*6=8920, is better than K 1st and I 2nd: 940*6+860*4=9080
Yes, but what matters is what gives you the best size reduction overall by comparing how much you gain by encoding candidates using the dictionary compared to their original size.
Encoding K first, gives a (140-4)*860 = 116 960 bytes gain over the original non-compressed size.
Encoding I first, gives a (90-4)*940 = 80840 bytes gain over the original non-compressed size.
Even if encoding I first takes less place than encoding K first, this does not matter because encoding K gives a much better size reduction.
K can still take more place than I and have a better size reduction overall.
But even then, check at how the INVERT field is computed:
It sums the gains of both possible choices and compares them : K 1st + I 2nd < K 2nd + I 1st.
And the result is : K 1st gives a better gain.
As I mentioned in my previous post, for the subsequent entries, this is less true. But note that the graph is very helpful in visualizing when inverting is better than the greedy "always encode the candidate with the best gain at each step" strategy.
Symoon wrote: ↑Sat Mar 16, 2019 1:54 am
So I wonder if the problem solution could not actually be simplified/optimized:
1- remove the biggest frequency*gain for the first 13 values: if a choice had to be inverted, it will be selected next anyway
This does not work, there is no general rule that says that inverting is good, this solely depends on the data.
And as you mention in your next post, only exhausting all combinations can tell you what is the best choice.
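For readers who want to redo the arithmetic, the pairwise comparison discussed in this post can be sketched as follows. It uses the same gain formula, freq * (size - slot cost), and takes the figures quoted in the thread at face value (K: frequency 860, size 140; I: frequency 940, size 90; first two dictionary slots costing 4 and 6). The actual sheet may use other values, and the names gain, OrderGains and compareOrders are made up for this example.

```cpp
// Gain of giving a dictionary slot costing slotCost to a byte that occurs
// freq times and costs size in the original, non-compressed signal.
long gain(long freq, long size, long slotCost)
{
    return freq * (size - slotCost);
}

// Total gains of the two possible orders for a pair of candidates A and B.
struct OrderGains
{
    long aThenB;  // A in the first slot, B in the second
    long bThenA;  // B in the first slot, A in the second
};

OrderGains compareOrders(long freqA, long sizeA, long freqB, long sizeB,
                         long slot1, long slot2)
{
    OrderGains g;
    g.aThenB = gain(freqA, sizeA, slot1) + gain(freqB, sizeB, slot2);
    g.bThenA = gain(freqB, sizeB, slot1) + gain(freqA, sizeA, slot2);
    return g;
}
```

With those figures, gain(860, 140, 4) is 116960 and gain(940, 90, 4) is 80840, as stated above. compareOrders(860, 140, 940, 90, 4, 6) gives 195920 for K then I and 196080 for I then K: a difference of 160, the same as between 9080 and 8920 in the quoted post, since the original size is identical for both orders.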
Re: Novalight - very fast tape loading
Posted: Sun Mar 31, 2019 5:46 pm
by Symoon
To be honest, I tried all the solutions we talked about and always found a program which ended being slower
So after a full day/night spent on it for a difference of something like 0.005s, I ended up choosing to keep a simple code and make a break with this - really needed to.
I'm working on another optimization which, should it work, may save (I hope) about 1 second for Zorgons Revenge. And it changes the dictionary too :p
Re: Novalight - very fast tape loading
Posted: Tue Apr 02, 2019 2:45 am
by NekoNoNiaow
Symoon wrote: ↑Sun Mar 31, 2019 5:46 pm
To be honest, I tried all the solutions we talked about and always found a program which ended being slower
So after a full day/night spent on it for a difference of something like 0.005s, I ended up choosing to keep a simple code and make a break with this - really needed to.
I'm working on another optimization which, should it work, may save (I hope) about 1 second for Zorgons Revenge. And it changes the dictionary too :p
Yup, there will always be a program which makes your heuristic wrong since the structure of data in the file ultimately determines what is the best dictionary.
I have been working on writing a combinatorial exploration algorithm with a user specified coverage parameter:
1 = simply take the best candidate at each round, that is explore only one branch of the combinatorial tree (1 over n! possible choices).
This is simply the greedy strategy I initially recommended.
2 = explore two branches of the tree, that is explore n!/(n-2)! choices (for n = 100 that is 100 * 99 = 9900 explorations).
3 = explore three branches of the tree, n! / (n-3)! choices (for n = 100, = 100 * 99 * 98 = 970200)
...
n = explore n branches of the tree : n! choices (for n=100 that's 9.33e+157 choices)
I will add it to the thread when I am done but it really is not that complicated.
Note that exploring the entire tree will be impossible obviously but that is fine:
playing with the Google sheet I posted some time ago shows that in general, exploring two branches is very often enough to select quite good choices. The sheet generates random data every time so it is very quick to have a look at the graph and get a grasp of how deep one generally needs to go.
Moreover, these numbers are probably worse than actual Oric data since I use a completely random distribution whereas structured data such as Oric programs should probably observe a Normal distribution and thus offer simpler choices.
I would expect that going three or four levels deep would give very good results.
I will post that soon.
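As a quick check of the figures above, the number of ordered choices n!/(n-depth)! can be computed without ever building the full factorials. This is just a sketch; the name orderedChoices is made up for this example.

```cpp
// Number of ordered ways to pick depth items out of n candidates,
// that is n! / (n - depth)!, computed as n * (n-1) * ... * (n-depth+1).
// Note: the result overflows 64 bits quickly (around depth 10 for n = 100).
unsigned long long orderedChoices(unsigned n, unsigned depth)
{
    unsigned long long count = 1;
    for (unsigned i = 0; i < depth && i < n; i++)
    {
        count *= (n - i);
    }
    return count;
}
```

orderedChoices(100, 2) gives 9900 and orderedChoices(100, 3) gives 970200, matching the numbers in the list.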
Re: Novalight - very fast tape loading
Posted: Wed Apr 03, 2019 5:51 am
by NekoNoNiaow
So, here is the exhaustive exploration algorithm.
Note that this is:
- a mix of c and c++ idioms
- untested, just written in an editor and a sheet of paper
- and thus is bound to be buggy if not entirely algorithmically faulty.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <list>
#include <vector>

struct S_Element {
    int id;   // uniquely identifies this element
    int freq; // number of occurrences of this element in the program
    int size; // size of one occurrence in the original, non-compressed signal
};
struct S_Examined {
    int id;   // comes from S_Element.id, is used to identify S_Element-s.
    int gain; // gain obtained in the dictionary spot given to this element
};
struct S_Chosen {
    int totalgain = 0;
    std::list<S_Examined> elements;
};

// Ugly globals.
const int DictionarySizes[] = { 4, 6, 9, 14, 21, 32, 48, 72 }; // replace with real values.
const int DictCount = 8; // replace with real values.

double Factorial(int n)
{
    // Note: a double loses precision for large n, do not use too large values of width.
    double top = 1.0;
    for (int i = 2 ; i <= n ; i++)
    {
        top *= (double)i;
    }
    return top;
}

int ComputeGain(const int* dictionary, const S_Element& element)
{
    // Gain of encoding this element in the first free spot of the dictionary.
    // Note: this could be negative
    return element.freq * (element.size - dictionary[0]);
}

// dict points to the first free dictionary spot, dictLeft is the number of free spots.
// candidates is taken by value because it is sorted locally.
S_Chosen ChooseElements(const int* dict, int dictLeft, std::vector<S_Element> candidates, int width)
{
    S_Chosen chosen; // Will be returned with RVO so declared first is better.
    if (candidates.empty() || dictLeft == 0)
    {
        return chosen; // No candidates or dictionary entries left, return an empty list.
    }
    // We sort by gain, this allows us to make the greedy choice of "best immediate candidate"
    // when we are asked to examine the minimum number of choices (width = 1).
    std::sort(candidates.begin(), candidates.end(),
              [dict](const S_Element& a, const S_Element& b)
              { return ComputeGain(dict, a) > ComputeGain(dict, b); }); // highest gain first.
    // Below is the loop where we examine "width" possible branches starting from the
    // current best gain candidate. If width is 1, then we only examine the first best
    // candidate, if width is 2, then we examine the two best candidates,
    // if width == count, then we examine all possible candidates.
    int branches = std::min(width, (int)candidates.size());
    for (int b = 0 ; b < branches ; b++)
    {
        S_Examined current = { candidates[b].id, ComputeGain(dict, candidates[b]) };
        // Remove the current choice from the candidates left for the other spots.
        std::vector<S_Element> remaining(candidates);
        remaining.erase(remaining.begin() + b);
        // Obtain the rest of the entries to put in the dictionary (recursive call).
        // The width shrinks at each level but never goes below 1 (greedy choice),
        // so that every dictionary spot still gets filled.
        S_Chosen rest = ChooseElements(dict + 1, dictLeft - 1, remaining, std::max(width - 1, 1));
        // If this list has a better total gain than the current one, we choose it.
        if (b == 0 || current.gain + rest.totalgain > chosen.totalgain)
        {
            chosen.totalgain = current.gain + rest.totalgain;
            chosen.elements.swap(rest.elements);
            chosen.elements.push_front(current);
        }
    }
    return chosen;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        printf("usage: carrot.exe <width>\n");
        return 1;
    }
    int maxWidth = atoi(argv[1]);
    std::vector<S_Element> candidates;
    candidates.push_back({ /* A */ 0, 800, 50 });
    candidates.push_back({ /* B */ 1, 390, 180 });
    candidates.push_back({ /* C */ 2, 280, 160 });
    candidates.push_back({ /* D */ 3, 430, 10 });
    // and so on...
    int slots = std::min(DictCount, (int)candidates.size());
    if (maxWidth > slots)
    {
        maxWidth = slots; // Do not explore more choices than candidates or dictionary entries.
    }
    if (maxWidth < 1)
    {
        maxWidth = 1; // Always explore at least the greedy branch.
    }
    S_Chosen chosen = ChooseElements(DictionarySizes, DictCount, candidates, maxWidth);
    double examinedCount = Factorial(maxWidth);
    double totalCount = Factorial((int)candidates.size()) / Factorial((int)candidates.size() - slots);
    printf("Examined %.0f orderings out of a total of %.0f.\n", examinedCount, totalCount);
    printf("Size reduction obtained: %d\n", chosen.totalgain);
    printf("Fill your dictionary with (in order):\n");
    for (auto it = chosen.elements.begin() ; it != chosen.elements.end() ; ++it)
    {
        printf("id %d\n", it->id);
    }
    return 0;
}
Re: Novalight - very fast tape loading
Posted: Wed Apr 03, 2019 10:04 am
by Symoon
Ouch, thanks
I promise I'll test it once I finish (or drop!) v1.3, which is half-coded and will require heavy testing, as I'm changing many more things than expected.
Re: Novalight - very fast tape loading
Posted: Fri Apr 05, 2019 11:53 am
by Symoon
Okay, so the new version is promising but not working yet, and not sure it will.
Loading time saved on Zorgon would be more than 1 second
So far, "normal" (uncompressed) bytes seem to load correctly, but I have bugs and have not been able to check yet if it's a global timing problem, or specific to compressed bytes - RLE or dictionary.
Oh, and it seems it's not working with Euphoric anymore - no idea why but I don't care yet as it's not the main target.
Re: Novalight - very fast tape loading
Posted: Wed Apr 10, 2019 7:38 pm
by Symoon
Giving up Novalight 1.3.
Its main goal was to try reducing the Oric working time to save a decoded byte in memory by 1 sample (22.67µs). This also opened a door to extend the dictionary.
But timings are too short for real machines. It worked at 95% on emulators, but definitely doesn't on real Orics.
That being said, in the process I found a few more small optimisations that might bring new ideas. Maybe.
Re: Novalight - very fast tape loading
Posted: Mon Apr 22, 2019 6:36 am
by Symoon
Couldn't help trying again
And got a prototype of Novalight 1.3 to work on Oricutron and Euphoric... But not on a real Oric.
The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators
That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.
Re: Novalight - very fast tape loading
Posted: Mon Apr 22, 2019 9:36 pm
by NekoNoNiaow
Symoon wrote: ↑Mon Apr 22, 2019 6:36 am
The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators
Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate.
Symoon wrote: ↑Mon Apr 22, 2019 6:36 am
That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.
Maybe someone will find some optimizations eventually.
Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.
Symoon wrote: ↑Mon Apr 22, 2019 6:36 am
The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators
Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate.
I tried with both Euphoric, and Oricutron (1.2). Both load correct bytes, while real Orics don't seem to.
What I fail to understand, now that I've spent 1 more hour on it, is that I was wrong: it's not a 3µs margin but rather something like 12µs (based on Oricutron), which should be more than enough to work.
How I wish I could see the Oric memory content!
NekoNoNiaow wrote: ↑Mon Apr 22, 2019 9:36 pm
Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.
Not yet, sorry. I was focused on this new version which could have saved a full second, and took me about 15 different test versions - too messy to start another change
But I will eventually! What I'd need first would be to understand why it doesn't work on real Orics to definitely get this 1.3 version out of my mind
Re: Novalight - very fast tape loading
Posted: Tue Apr 23, 2019 9:12 am
by Symoon
Ok, by loading bytes on screen, I've been able to understand the beginning of something: it's actually not the timing problem I was thinking of, but a silence between two sub-parts that causes a wrong reading of the following byte, on real machines only. The same silence that, IIRC, I had tried to remove but put back a while ago as it caused problems without it on some programs.
I have no clear explanation yet but it's obvious there's something there.
So I'm back with multiple testings... When I have time to
Re: Novalight - very fast tape loading
Posted: Wed Apr 24, 2019 9:45 am
by Symoon
Yes!
Calculations showed I was apparently less than 1µs short on real machines (playing the WAV file at 43500Hz instead of 44100 worked ), and I found a 2µs optimisation in the 'guilty' part of code.
So now, I have to finish the page 2 swapping to place the interrupt there and restore page 2 once loaded, and I hope Novalight 1.3 will work. Only drawback: it's apparently not working on Euphoric anymore.
On optimising: I might end up mystic or crazy, believing that all that is required to find optimisations is to stare at the code with a threatening look, for one hour, then the code gets tired of it and surrenders, and you find a way to code it better
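The effect of slowing the playback down can be estimated with a little arithmetic. The sketch below computes the duration of one sample at a given playback rate, and the extra time gained over a period of a few samples when the WAV is played at 43500Hz instead of 44100Hz. The 3-sample period used in the example is the shortest period given in the readme for Novalight 1.x; whether v1.3 uses the same value is an assumption. The function names are made up for this example.

```cpp
// Duration of one sample, in microseconds, at the given playback rate.
double samplePeriodUs(double rateHz)
{
    return 1e6 / rateHz;
}

// Extra time, in microseconds, gained over a period of `samples` samples
// when the file is played at slowedHz instead of nominalHz.
double slackUs(int samples, double nominalHz, double slowedHz)
{
    return samples * (samplePeriodUs(slowedHz) - samplePeriodUs(nominalHz));
}
```

samplePeriodUs(44100.0) is about 22.68µs and samplePeriodUs(43500.0) about 22.99µs, so slackUs(3, 44100.0, 43500.0) comes out a little under 1µs, which is in line with the "less than 1µs short" estimate.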
Re: Novalight - very fast tape loading
Posted: Fri Apr 26, 2019 9:37 pm
by Symoon
Found a way to, I hope, avoid using page 2.
So far, Novalight 1.3 works with Oricutron, and some programs work on real Atmos, others load but don't work. Got to find why.
Symoon wrote: ↑Wed Apr 24, 2019 9:45 am
Only drawback: it's apparently not working on Euphoric anymore.
Then tell F. Frances to fix Euphoric! If it works on real hardware and not emulators, the fault lies with the emulator, not your code.
And in any case, you should also tell the Oricutron guys to fix it as well if it does not behave like the actual hardware.
If the emulator was correct, you would have detected the issue while using it and it would have been much easier to debug.
Oh, and by the way, have you published the sources to 1.2?
Re: Novalight - very fast tape loading
Posted: Sat Apr 27, 2019 8:37 am
by Symoon
I'm afraid I've been optimistic yesterday. I still don't understand why, but I get few random loading errors. I know something must be too slow somewhere (slowing down from 44100Hz to 43500 loads perfectly), but errors occur on various bytes types (compressed or not), strange. And a bit depressing!
Symoon wrote: ↑Wed Apr 24, 2019 9:45 am
Only drawback: it's apparently not working on Euphoric anymore.
Then tell F. Frances to fix Euphoric! If it works on real hardware and not emulators, the fault lies with the emulator, not your code.
And in any case, you should also tell the Oricutron guys to fix it as well if it does not behave like the actual hardware.
If the emulator was correct, you would have detected the issue while using it and it would have been much easier to debug.
I do agree, but I have to check my code first, it's more likely that the problem is on this side. Anyway, as I can't check exactly what happens on the real machine, it's hard to tell where the problem is!
Also, Euphoric might not be working because I'm using an old computer with XP (not plain DOS), and the difference may be due to slowdowns... As we're talking about 1 or 2µs differences, it might be the problem.
NekoNoNiaow wrote: ↑Sat Apr 27, 2019 3:19 am
Oh, and by the way, have you published the sources to 1.2?
Of course, I don't release the thing without sources
Good luck reading them though, sorry for that
Re: Novalight - very fast tape loading
Posted: Sat Apr 27, 2019 10:35 pm
by Symoon
I am still trying to understand the problem with Novalight 1.3 on real machines. Doing this, I had loaded Oricium, without success: I left it as it was, i.e. stuck near the end of the loading. So while the PC audio was still plugged into the Oric, I tried to reproduce the problem with Oricutron, and while typing CLOAD on the PC... Oricium on the real machine started!
Each time I pressed a key on Oricutron, I could see the scrolling of Oricium shift by one char.
So, so far, I turned Novalight 1.3 into an Oricium scrolling controller from PC! Not really what I'm expecting, I must say
Re: Novalight - very fast tape loading
Posted: Sun Apr 28, 2019 7:42 am
by Chema
In the last version I included a Press Key To Start just to avoid that, not sure if you tried an old version or pressed a key.
The thing is that the VSync hack detection routine detects noise coming from the tape in, it wrongly assumes the hack is present and identifies every pulse as the VSync signal to which all the drawing routines are tied.
Re: Novalight - very fast tape loading
Posted: Sun Apr 28, 2019 7:54 am
by Symoon
Ha ha, thanks for the explanation
Chema wrote: ↑Sun Apr 28, 2019 7:42 am
In the last version I included a Press Key To Start just to avoid that, not sure if you tried an old version or pressed a key.
There is the "press a key" at start indeed! Not sure what happened there, anyway the loading probably wasn't 100% good so it could be anything. EDIT: actually it was probably loaded 100%, but Novalight was waiting for one or two last sinusoids, which it got when I pressed an Oricutron key. And then Oricium carried on the same way with the sound input (the initial message being actually "insert vsync cable or press a key").
BTW Oricium's scrolling on a CRT screen is really a killer.
Re: Novalight - very fast tape loading
Posted: Sun Apr 28, 2019 9:13 am
by Symoon
Ok, back to Novalight 1.3. Corrected a few bugs:
- a new threshold for v1.3 wasn't correctly set for real machines
- making "start-stop-bit" shorter required an additional wait between the two last loaded parts
Facing other problems now:
- on real machines, the 4-bits compression ('1111' compressed by 7 samples) seems too long now when ending a byte, still because "start-stop-bit" is shorter in v1.3. Works fine with Oricutron though. Got to see if I remove it when ending a byte, or if I can find another optimisation (don't think I can, I really already changed many things to save as much time as possible)
- got a "?SYNTAX ERROR" with Oricutron when using the standard speed for the loader (works fine on real Atmos and Oric-1!). This one is tough to find why, so far.
Re: Novalight - very fast tape loading
Posted: Tue Apr 30, 2019 2:52 am
by NekoNoNiaow
Symoon wrote: ↑Sat Apr 27, 2019 8:37 am
Also, Euphoric might not be working because I'm using an old computer with XP (not plain DOS), and the difference may be due to slowdowns... As we're talking about 1 or 2µs differences, it might be the problem.
This would still qualify as a bug of the emulator, it should work fine even if it is not capable of emulating at full speed.
That is, the timing of the host machine should have no influence on the timing of the guest machine.
NekoNoNiaow wrote: ↑Sat Apr 27, 2019 3:19 am
Oh, and by the way, have you published the sources to 1.2?
Of course, I don't release the thing without sources
Good luck reading them though, sorry for that
I tried to get them on the Sourceforge site, but I always get lost on Sourceforge, their organization is such a mess.
Could you post a link to them? Thanks!
NekoNoNiaow wrote: ↑Sat Apr 27, 2019 3:19 am
Oh, and by the way, have you published the sources to 1.2?
Of course, I don't release the thing without sources
Good luck reading them though, sorry for that
I tried to get them on the Sourceforge site, but I always get lost on Sourceforge, their organization is such a mess.
Could you post a link to them? Thanks!
Symoon wrote: ↑Sun Apr 28, 2019 9:13 am
Facing other problems now:
(...)
- got a "?SYNTAX ERROR" with Oricutron when using the standard speed for the loader (works fine on real Atmos and Oric-1!). This one is tough to find why, so far.
The problem is that apparently, when loading a WAV file, Oricutron stops reading the WAV right at the end of the last byte... But doesn't read all the last stop bits that follow this last byte.
So, when it starts reading again the WAV, Oricutron begins by reading apparently the last stop bit, before reading the next program.
So when Novalight begins to load after the standard loader, Oricutron reads the remaining "standard fast speed" stop bit, and it means something else for Novalight, hence the strange behaviour.
I suspected then that on real machines, this didn't happen because the WAV kept playing, hence Novalight would start reading after those standard stop bits have been played. I'm not quite sure of this anymore, but it does work on real machines - it could be what I thought, or simply luck that gives a non-error timing!
I think the real solution would be to set only ONE stop bit for the last standard byte, before the Novalight signal begins (NekoNoNiaow, you were right when saying I should have done something more robust here. Lack of room for code sadly).
Re: Novalight - very fast tape loading
Posted: Thu May 02, 2019 9:53 am
by Symoon
Ok, at the moment, playing with stop bits of the last "standard speed byte", I have a solution that works on Oricutron but not on real Oric, or on real Oric but not Oricutron.
I sure don't see myself asking Oricutron to keep the WAV file playing like a real tape player does. In fact, I want to sort things out, there's a timing thing on my side as it works fine (luck again?) with F16 speed, but not with standard speed.
Re: Novalight - very fast tape loading
Posted: Thu May 02, 2019 11:18 am
by iss
Symoon, about the issue with Oricutron turn off both options "Turbo tape" and "VSync hack".
Else keep in mind that Oricutron emulates the raw tape in "granularity" of CPU instruction lengths,
i.e. it's possible that somehow short pulses (with length less than current instruction cycles) are missed.
I'm working now exactly on this problem related to OricExos and I have an idea for a "high precision" emulation mode ,
but this will take some time...
Re: Novalight - very fast tape loading
Posted: Sat May 04, 2019 11:16 am
by Symoon
iss wrote: ↑Thu May 02, 2019 11:18 am
Symoon, about the issue with Oricutron turn off both options "Turbo tape" and "VSync hack".
Else keep in mind that Oricutron emulates the raw tape in "granularity" of CPU instruction lengths,
i.e. it's possible that somehow short pulses (with length less than current instruction cycles) are missed.
I'm working now exactly on this problem related to OricExos and I have an idea for a "high precision" emulation mode ,
but this will take some time...
Thanks! I should have thought about that; actually, playing with "turbo tape" is bringing more headaches
At the moment, I'm empirically trying to understand what happens. I got a signal that works on Oricutron but not real Oric, and I don't really understand why yet - it should work.
So it brings interesting questions anyway!
Re: Novalight - very fast tape loading
Posted: Sun May 05, 2019 6:07 am
by Symoon
Ok, so the problem was that, after a standard speed program, the transition to Novalight wasn't working on Oricutron because of the stop bits (I haven't figured out why it worked with F16 speed actually, probably luck).
=> I tried to remove the stop bits on the very last byte of the "standard speed" program: it worked fine on Oricutron, but failed on real Orics. Still haven't understood why: the last parity bit wasn't correctly decoded though the timing on this last bit was correct (see picture below).
=> I added ONE standard stop bit to this last byte, and it worked again on real Orics... But not on Oricutron anymore
So I finally made a risky optimisation in the code: at the beginning of Novalight decoding, instead of initialising the interrupts and so on, I assume the interrupt flag is set (since Novalight is supposed to always begin with "stop pulses", it should be ok). The room saved allows beginning with TWO initial pulse readings that are trashed, instead of one. This way, the last stop bit is read but not processed, so real Orics and Oricutron are all happy now!
I still don't understand everything clearly but it seems to work fine so far. In the process, I found another 4-bytes possible optimisation (no real timing gain though, but useless code complexity)
Re: Novalight - very fast tape loading
Posted: Mon May 06, 2019 5:32 pm
by Symoon
Symoon wrote: ↑Sun May 05, 2019 6:07 am
So I finally made a risky optimisation in the code: at the beginning of Novalight decoding, instead of initialising the interrupts and so on, I assume the interrupt flag is set (since Novalight is supposed to always begin with "stop pulses", it should be ok). The room saved allows beginning with TWO initial pulse readings that are trashed, instead of one. This way, the last stop bit is read but not processed, so real Orics and Oricutron are all happy now!
Well, real Orics are actually randomly happy!
This is becoming tiresome, I will switch back to the previous code I think. Anyway Oricutron doesn't require standard speed loading. And actually, the problem could be elsewhere as, still when playing the WAV at 43500Hz, it all runs fine again. Very strange.
Re: Novalight - very fast tape loading
Posted: Mon May 06, 2019 6:13 pm
by Chema
You've done a thorough research here, and tried many many different options to squeeze every little bit of loading time. Your work is incredible and I really think gaining yet another 0.001% is not worth it. Unless you are having a really good time by trying
I always thought that Fabrice's tap2CD was the fastest speed possible, but for compression (and your smart trick with encoding 1s and 0s the other way around based on statistics), and you've gone much further.
I find your work here incredible and praiseworthy. Save your energy for other projects, please
Congratulations indeed.
Re: Novalight - very fast tape loading
Posted: Mon May 06, 2019 6:32 pm
by Symoon
Thanks a lot Chema
The "problem" is that new ideas keep popping up! That, I think, will solve the problems. I got another new one now
Would it be for 0.001, sure, I'd give up. But one full second... It keeps being tempting. One last try maybe
Re: Novalight - very fast tape loading
Posted: Mon May 06, 2019 8:05 pm
by Symoon
BTW, that still makes me laugh Hey maybe that could be a way to control something (though just one button is not much!)
Re: Novalight - very fast tape loading
Posted: Mon May 06, 2019 9:36 pm
by Chema
Lol that's the price to add support for the sync hack... never again
Re: Novalight - very fast tape loading
Posted: Sun May 19, 2019 8:33 am
by Symoon
1st successful loading with Novalight 1.3!
Oricium now loads in 15.3 seconds (instead of 16.3 with Novalight 1.2).
Some options are still broken and need to be fixed, and testing on a single file is not enough, so I'm being careful with the "happy register" so far. But I'm glad it finally worked, with a significant speed improvement.
Re: Novalight - very fast tape loading
Posted: Tue May 21, 2019 12:50 pm
by Symoon
Ouch, having LOADS of new optimization ideas... If things work as I hope (no guarantee), it might save about 18 bytes, extend the dictionary from 18 to 21 bytes, and save time on repeats.
Re: Novalight - very fast tape loading
Posted: Wed May 22, 2019 12:47 pm
by Symoon
One of the ideas might be, instead of a dictionary for a whole program only, to use a part of the dictionary for a sub-dictionary for each page loaded. For instance a 6-bytes long dictionary, that would be loaded after each page crossing in memory, and specific to the next page.
If you take the 1st 1024 bytes from Zorgon, that makes four pages, so four dictionaries of 6 bytes of the "most used bytes in the page".
Well among those 24 bytes (6*4 = 24), there are 17 different values. Only 7 are found in two pages or three, none in the four.
What has to be checked is whether the time spent loading those 6 bytes is actually outweighed by the gain brought by this dictionary.
Just an idea so far, for the moment I have other issues to solve, that could help saving room, to code this new loading.
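The counting described above (the most used bytes of each page, and how many different values they add up to) can be sketched as follows. This is only an illustration of the idea, not Novalight code: it assumes the program data starts on a page boundary and that a page is 256 bytes, and the names topBytes and distinctAcrossPages are made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Most frequent byte values found in data[begin, end), at most `count` of them,
// most frequent first (ties keep the lowest value first).
std::vector<std::uint8_t> topBytes(const std::vector<std::uint8_t>& data,
                                   std::size_t begin, std::size_t end, std::size_t count)
{
    std::array<int, 256> freq{};
    for (std::size_t i = begin; i < end; i++)
    {
        freq[data[i]]++;
    }
    std::vector<int> values;
    for (int v = 0; v < 256; v++)
    {
        if (freq[v] > 0) values.push_back(v);
    }
    std::stable_sort(values.begin(), values.end(),
                     [&freq](int a, int b) { return freq[a] > freq[b]; });
    if (values.size() > count) values.resize(count);
    return std::vector<std::uint8_t>(values.begin(), values.end());
}

// Builds one sub-dictionary of `perPage` bytes for each 256-byte page and
// returns how many different values appear across all the sub-dictionaries.
// ASSUMPTION: data[0] sits on a page boundary.
std::size_t distinctAcrossPages(const std::vector<std::uint8_t>& data, std::size_t perPage)
{
    std::set<std::uint8_t> seen;
    for (std::size_t page = 0; page < data.size(); page += 256)
    {
        std::size_t end = std::min(page + 256, data.size());
        for (std::uint8_t v : topBytes(data, page, end, perPage))
        {
            seen.insert(v);
        }
    }
    return seen.size();
}
```

Calling distinctAcrossPages(data, 6) on the first 1024 bytes of a program is the kind of count that gives the "17 different values among 24" figure quoted above for Zorgon.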
Re: Novalight - very fast tape loading
Posted: Mon May 27, 2019 8:54 am
by Symoon
Symoon wrote: ↑Wed May 22, 2019 12:47 pm
One of the ideas might be, instead of a dictionary for a whole program only, to use a part of the dictionary for a sub-dictionary for each page loaded. (...)
What has to be checked is whether the time spent loading those 6 bytes is actually outweighed by the gain brought by this dictionary.
I simulated the new WAV size with such page dictionaries, and Zorgon was actually a bit slower, even before adding the bytes to load. Probably due to the fact that it makes the big dictionary less efficient. Idea dropped!
Back to technical optimization, ideas will be for later.
Re: Novalight - very fast tape loading
Posted: Wed Jun 12, 2019 12:41 am
by NekoNoNiaow
So many improvements one after another that I am now officially completely lost as to what the actual progression is.
Maybe you should set up a page somewhere (possibly http://wiki.defence-force.org/doku.php? ... :novalight ?) so the current status is clear.
Also, I still do not understand why you modify your code to also work with Oricutron, I think you should definitely not do so.
If Oricutron does not behave like the real Orics then Oricutron just needs to be updated, but you should probably not modify your code, since that increases the risk of it being compatible with fewer actual machines.
You may actually be losing some optimization opportunities by trying to make it work with Oricutron.
Re: Novalight - very fast tape loading
Posted: Wed Jun 12, 2019 6:38 am
by Symoon
NekoNoNiaow wrote: ↑Wed Jun 12, 2019 12:41 am
So many improvements one after another that I am now officially completely lost as to what the actual progression is.
Well, I'm currently stuck in a "nothing works" warp: Orics not working, phones not working, computer not working as expected, and similar things at work, so I can't work at all on my projects
Also, I still do not understand why you modify your code to also work with Oricutron, I think you should definitely not do so.
That's because it's a major debugging tool for me.
Euphoric already doesn't work anymore, and without Oricutron I'd be blind when trying to understand what happens in case of failure (even with it, it's not so easy).
Also, I still do not understand why you modify your code to also work with Oricutron, I think you should definitely not do so.
That's because it's a major debugging tool for me
Euphoric already doesn't work anymore, and without Oricutron I'd be blind to understand what happens in case of failure (even with it, it's not so easy)
Can I strangle you? With all my love?
I understand that it is useful when the hardware and emulator behave the same, but when Oricutron is not accurate then your debugging is not either.
One thing that would solve it though: debugging the original hardware via a minimal hardware interface between your PC and the Oric.
On consoles (PS4, Xbox, Switch, etc.) we use devkits which allow us to remote-debug and trace directly on the original hardware from our dev PCs. If the Oric expansion connector exposes the signals necessary to freeze the CPU, it should be possible to do the same with a relatively minimal hardware interface (an Arduino or a Raspberry Pi would be more than enough).
And maybe Jede's current WIP expansion could be used for that purpose?
Sorry for the off-topic. Dreaming out loud. (I have been thinking about designing such an interface for my Amiga 500 for quite some time.)
(Someone made one for the MegaDrive/Genesis: https://hackaday.com/2014/06/18/the-seg ... e-dev-kit/.)
Symoon wrote: ↑Wed Jun 12, 2019 6:38 am
That's because it's a major debugging tool for me
Euphoric already doesn't work anymore, and without Oricutron I'd be blind to understand what happens in case of failure (even with it, it's not so easy)
Can I strangle you? With all my love?
I understand that it is useful when the hardware and emulator behave the same, but when Oricutron is not accurate then your debugging is not either.
Ha ha, wait wait wait
I'm not talking about validating new improvements. For this, of course I use real Orics, about 5 of them!
I'm talking about debugging my code, in which I often make mistakes in a new version.
Novalight code is self-modifying; it's real hell (for me) to follow on paper and to change while being sure I'm not breaking anything, especially when I can only work on it half an hour every 8 days. So being able to reach any part of the code in a monitor and execute it step by step really saves time.
One thing that would solve it though: debugging the original hardware via a minimal hardware interface between your PC and the Oric.
On consoles (PS4, Xbox, Switch, etc.) we use devkits which allow us to remote-debug and trace directly on the original hardware from our dev PCs. If the Oric expansion connector exposes the signals necessary to freeze the CPU, it should be possible to do the same with a relatively minimal hardware interface (an Arduino or a Raspberry Pi would be more than enough).
And maybe Jede's current WIP expansion could be used for that purpose?
Well, if there's something that can help me see the VIA state and stop the incoming WAV file after each instruction (so I can measure the time after each of them), I'd buy it.
Re: Novalight - very fast tape loading
Posted: Mon Jun 17, 2019 12:44 pm
by Symoon
Yesterday, I corrected a bug in the 1.3h version.
It became visible at the latest Oric meeting when I tried to convert the TAP of Retroric's game (Electroric). It seemed to load but didn't run.
The reason: when the last byte of a memory page used a specific encoding (1 sinusoid for '1111', which takes longer to decode), there was not enough time to both decode it and change the memory page address.
I added back a small delay in the tape signal at the end of each memory page (it existed before), and everything is back to normal in such cases.
Such cases weren't easy to find, because the byte has to:
- be the last byte of a memory page
- not be a RLE byte
- not be a dictionary byte
- have a specific value
With later versions, it might be possible to remove this delay again. I now have all I need to test it, anyway
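Editor's sketch of the "small delay at the end of each memory page" fix, to show how often it kicks in. The pause length is a made-up placeholder (the post does not give the real value), the function names are hypothetical, and this is not taken from the Novalight source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pause length in samples; the real value is not given. */
#define PAGE_END_PAUSE_SAMPLES 4

/* True when addr is the last byte of a 256-byte memory page ($xxFF),
   i.e. the loader has to bump its page address right after storing it. */
int is_last_byte_of_page(unsigned long addr)
{
    return (addr & 0xFFul) == 0xFFul;
}

/* Number of extra silent samples added to the WAV when loading the bytes
   start..end (inclusive), with one pause after the last byte of each page. */
size_t page_end_pause_samples(unsigned long start, unsigned long end)
{
    size_t pauses = 0;

    assert(start <= end && end <= 0xFFFFul);
    for (unsigned long addr = start; addr <= end; addr++)
        if (is_last_byte_of_page(addr))
            pauses++;
    return pauses * PAGE_END_PAUSE_SAMPLES;
}
```

Since the pause is added once per 256 bytes, its cost stays small: a program loaded at $0500-$08FF crosses four page ends, so it only gets four pauses in total.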
Re: Novalight - very fast tape loading
Posted: Tue Jun 18, 2019 1:12 pm
by NekoNoNiaow
Symoon wrote: ↑Thu Jun 13, 2019 6:47 am
Ha ha, wait wait wait
I'm not talking about validating new improvements. For this, of course I use real Orics, about 5 of them!
I'm talking about debugging my code, in which I often make mistakes in a new version.
Novalight code is self-modifying; it's real hell (for me) to follow on paper and to change while being sure I'm not breaking anything, especially when I can only work on it half an hour every 8 days. So being able to reach any part of the code in a monitor and execute it step by step really saves time.
One thing that would solve it though: debugging the original hardware via a minimal hardware interface between your PC and the Oric.
[...]
And maybe Jede's current WIP expansion may be able to be used for that purpose?
Well, if there's something that can help me see the VIA state and stop the incoming WAV file after each instruction (so I can measure the time after each of them), I'd buy it.
I am not sure I correctly understand your needs, can you elaborate a bit?
To me, it seems that what you want is:
- assuming the wav file is represented by a stream of analog values sent to the VIA's input pin
- that this stream can be frozen at the same time as the CPU and VIA so you can inspect values (current time, etc.) while step-tracing through your code
Symoon wrote: ↑Thu Jun 13, 2019 6:47 am
Well, if there's something that can help me see the VIA state and stop the incoming WAV file after each instruction (so I can measure the time after each of them), I'd buy it.
I am not sure I correctly understand your needs, can you elaborate a bit?
To me, it seems that what you want is:
- assuming the wav file is represented by a stream of analog values sent to the VIA's input pin
- that this stream can be frozen at the same time as the CPU and VIA so you can inspect values (current time, etc.) while step-tracing through your code
Is this correct?
Well, I was describing what I'm doing with Oricutron, which helps a lot: executing my new code step by step while it reads the WAV file I just produced with a beta version of Novalight.
The code can be wrong, but the WAV file too (I'm sometimes changing the signal).
I can't see how to do this on real hardware, especially since the WAV file would carry on playing while I'd be watching a monitor.
Doing this helped me with debugging and optimizing, and with finding where I have spare execution time I could use (the goal being to minimize the time spent by the Oric "waiting" for the VIA to detect the next pulse). It also helps me understand when I've been too optimistic about some processing that takes too long and leads to information loss. Not forgetting that Oricutron is more optimistic than the real hardware, which requires more time.
Re: Novalight - very fast tape loading
Posted: Wed Jul 10, 2019 2:04 am
by NekoNoNiaow
Symoon wrote: ↑Thu Jun 20, 2019 9:10 am
Well, I was describing what I'm doing with Oricutron, which helps a lot: executing my new code step by step while it reads the WAV file I just produced with a beta version of Novalight.
The code can be wrong, but the WAV file too (I'm sometimes changing the signal).
I can't see how to do this on real hardware, especially since the WAV file would carry on playing while I'd be watching a monitor.
Doing this helped me with debugging and optimizing, and with finding where I have spare execution time I could use (the goal being to minimize the time spent by the Oric "waiting" for the VIA to detect the next pulse). It also helps me understand when I've been too optimistic about some processing that takes too long and leads to information loss. Not forgetting that Oricutron is more optimistic than the real hardware, which requires more time.
Oki, gotcha. Thanks for the clarification!
Is there a way on the Oric to freeze the entire machine (ULA, CPU, VIA) at the same time?
Re: Novalight - very fast tape loading
Posted: Sat Oct 26, 2019 7:53 pm
by Symoon
Seems something's wrong between the TAP version of Pulsoids and Novalight.
Loading the game directly works fine, but loading it after the loading screens doesn't. I wonder if the execution of Twilighte's animation could have something to do with that, or if it's a completely different reason.
The good news is that it crashes both on the real machine and on Oricutron, so understanding the puzzle should be easier.
I tried relocating Novalight, and discovered along the way that my working beta version of Novalight 1.3 has some bugs with the relocation parameter. Ouch, more work on the TO DO list; don't hold your breath for Novalight 1.3.
Re: Novalight - very fast tape loading
Posted: Tue Nov 12, 2019 11:44 pm
by Symoon
Symoon wrote: ↑Sat Oct 26, 2019 7:53 pm
Seems something's wrong between Pulsoids TAP version and Novalight.
Bug solved; it probably affected all Novalight versions: when a multipart program had one part beginning with a byte equal to the last byte of the previous part, this byte was encoded as an RLE repetition, but the value was wrong since other things were loaded in between.
So I now reinitialise the "previous byte value" at the beginning of each new part.
Pulsoids works on Oricutron now
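Editor's sketch of the bug and of the fix, using a deliberately simplified RLE model: each byte is just marked as a literal ('L') or as a repeat of the previous byte ('R'). Novalight's real RLE format is not described in this thread, so the names and the marking scheme below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PREVIOUS_BYTE (-1)   /* sentinel: no byte has been encoded yet */

/* Marks each byte of one part in out[] as 'R' (repeat of the previous
   byte) or 'L' (literal). *prev carries the encoder's "previous byte
   value" and is updated as bytes are processed. */
void mark_repeats(const unsigned char *part, size_t len, int *prev, char *out)
{
    assert(part != NULL && prev != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[i] = (*prev == (int)part[i]) ? 'R' : 'L';
        *prev = part[i];
    }
}

/* Encodes several parts one after another into out[], resetting the
   "previous byte value" at the start of each part (the fix described
   above), so the first byte of a part can never be marked as a repeat. */
void mark_repeats_multipart(const unsigned char *const parts[],
                            const size_t lens[], size_t nparts, char *out)
{
    for (size_t p = 0; p < nparts; p++) {
        int prev = NO_PREVIOUS_BYTE;
        mark_repeats(parts[p], lens[p], &prev, out);
        out += lens[p];
    }
}
```

With parts {1, 2, 2} and {2, 3}, carrying prev across the boundary marks the first byte of the second part as 'R', which is the faulty case; with the reset it is a literal, as it should be.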
Re: Novalight - very fast tape loading
Posted: Sun Nov 24, 2019 12:19 pm
by Vyper68
Great news Symoon thanks for such a great program, it has made my life a lot better as a tape based user
Re: Novalight - very fast tape loading
Posted: Thu Nov 19, 2020 11:04 pm
by iss
Well, it's almost one year later but I think that the 'last word' about the max speed has not yet been spoken.
@Symoon: Can you remember the reason for using this code to send a bit in the Novalight/F16 source:
// F16 is an optimised WAV signal working a bit faster, with all Oric standard ROM routines.
void emit_F16_bit(int bit)
{
    switch (speed) {
    case 44100:
        if (bit) {
            emit_level(1,48);
            emit_level(2,208);
            emit_level(2,48);
        } else {
            emit_level(1,48);
            emit_level(2,208);
            emit_level(20,48);
        }
        break;
    }
}
____
| |______ - for '1'
____
| |__... x21 ...____ - for '0'
Or: why do you start with 1 low-level sample instead of starting directly with the high level, assuming that the previous bit ended with a low level?
BTW, your code works fine, I just want to find the clue.
Re: Novalight - very fast tape loading
Posted: Fri Nov 20, 2020 9:46 am
by Symoon
Ah, this is for the F16 speed bytes, right? (it's used for the loader part, loading with the standard ROM routines). EDIT: ok, you said it in your post, doh!
IIRC, for Novalight, the best results (on something like 10 machines) for the four base waveforms were like:
_--
_--_
_--__
_--___
So I suspect I kept this "best" waveform for the F16 format.
PS: there's a working Novalight 1.3, which was a bit faster. I never released it, being close to another speed upgrade, but I was chasing a bug I didn't understand, and had to leave it aside in 2020. Hope I can get back to it soon
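Editor's sketch of the four base waveforms drawn above (_--, _--_, _--__, _--___), which match the 3, 4, 5 and 6 sample periods given in the readme. Which 2-bit value maps to which length, the bit order inside a byte, and the reuse of the 48/208 levels from emit_F16_bit are all assumptions; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define LOW_LEVEL   48   /* sample values borrowed from emit_F16_bit above */
#define HIGH_LEVEL 208

/* Writes the waveform for one 2-bit pair into out[] and returns its
   length in samples: 1 low sample, 2 high samples, then `pair` low
   samples (so 3, 4, 5 or 6 samples in total).
   Mapping pair value N to N trailing low samples is an assumption. */
size_t emit_pair(unsigned pair, unsigned char *out)
{
    size_t n = 0;

    assert(pair <= 3 && out != NULL);
    out[n++] = LOW_LEVEL;
    out[n++] = HIGH_LEVEL;
    out[n++] = HIGH_LEVEL;
    for (unsigned i = 0; i < pair; i++)
        out[n++] = LOW_LEVEL;
    return n;
}

/* Writes the four waveforms of one byte into out[] (which must hold at
   least 24 samples) and returns the number of samples written.
   Most significant pair first, which is also an assumption. */
size_t emit_byte(unsigned char byte, unsigned char *out)
{
    size_t n = 0;

    for (int shift = 6; shift >= 0; shift -= 2)
        n += emit_pair((byte >> shift) & 3u, out + n);
    return n;
}
```

Under these assumptions a byte takes between 12 samples (all pairs 00) and 24 samples (all pairs 11), which shows why the loading time depends on the data being loaded.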
Re: Novalight - very fast tape loading
Posted: Sat Oct 21, 2023 3:54 pm
by iss
Symoon wrote: ↑Fri Nov 20, 2020 9:46 am
Hope I can get back to it soon
Any chance for 1.3 in 2023?
FYI, joking aside, I'm playing with Novalight 1.2a, trying to make it work in real time...
Symoon wrote: ↑Fri Nov 20, 2020 9:46 am
Hope I can get back to it soon
Any chance for 1.3 in 2023?
I'm afraid not
There's a 1.3g that goes a bit faster but I seem to recall it wasn't as "robust" as the 1.2a, I mean it was working fine with some machines but it seemed to have loading errors more often.
I also had internal code speed improvements in mind, but put them aside for other projects, especially since, in the end, all those improvements would have made a 12-second load "drop" to something like 11.5 seconds. I mean, the amount of work and testing was only bringing a marginal gain, so I chose to work on other things.
iss wrote: ↑Sat Oct 21, 2023 3:54 pm
FYI, joking aside, I'm playing with Novalight 1.2a, trying to make it work in real time...