Novalight - very fast tape loading

Anything related to the tools Tap2Wav, Tap2CD, Tap2Dsk, Sedoric Disc Manager, Tape Header Creator, WriteDsk, and generaly speaking tools related to the management of Oric data files and devices.
User avatar
NekoNoNiaow
Flying Officer
Posts: 218
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Novalight - very fast tape loading

Post by NekoNoNiaow » Tue Apr 02, 2019 2:45 am

Symoon wrote:
Sun Mar 31, 2019 5:46 pm
To be honest, I tried all the solutions we talked about and always found a program which ended being slower ;)
So after a full day/night spent on it for a difference of something like 0.005s, I ended up choosing to keep a simple code and make a break with this - really needed to.

I'm working on another optimization which, should it work, may save (I hope) about 1 second for Zorgons Revenge. And it changes the dictionary too :p
Yup, there will always be a program which makes your heuristic wrong since the structure of data in the file ultimately determines what is the best dictionary.

I have been working on writing a combinatorial exploration algorithm with a user specified coverage parameter:
  • 1 = simply take the best candidate at each round, that is explore only one branch of the combinatorial tree (1 over n! possible choices).
    This is simply the greedy strategy I initially recommended.
  • 2 = explore two branches of the tree, that is explore n!/(n-2)! choices (for n = 100 that is 100 * 99 = 9900 explorations.
  • 3 = explore three branches of the tree, n! / (n-3)! choices (for n = 100, = 100 * 99 * 98 = 970200)
  • ...
  • n = explore n branches of the tree : n! choices (for n=100 that's 9.33e+157 choices :D :) :( :o :shock: :? 8) :lol: :x :P :oops: :cry: :evil: :twisted: :roll: :wink: :mrgreen: )
I will add it to the thread when I am done but it really is not that complicated. ;)

Note that exploring the entire tree will be impossible obviously but that is fine:
playing with the Google sheet I posted some time ago shows that in general, exploring two branches is very often enough to select quite good choices. The sheet generates random data every time so it is very quick to have a look at the graph and get a grasp of how deep one generally needs to go.
Moreover, these numbers are probably worse than actual Oric data since I use a completely random distribution whereas structured data such as Oric programs should probably observe a Normal distribution and thus offer simpler choices.

I would expect that going three or four levels deep would give very good results.
I will post that soon. ;)

User avatar
NekoNoNiaow
Flying Officer
Posts: 218
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Novalight - very fast tape loading

Post by NekoNoNiaow » Wed Apr 03, 2019 5:51 am

So, here is the exhaustive exploration algorithm.

Note that this is:
  • a mix of c and c++ idioms
  • untested, just written in an editor and a sheet of paper ;)
and thus is bound to be buggy if not entirely algorithmically faulty. ;)

Code: Select all

struct S_Element {
  int id;   // uniquely identifies this element
  int freq;
  int size;
};
struct S_Examined {
  int id;   // comes from S_Element.id, is used to identify S_Element-s.
  int gain;
};
struct S_Chosen {
  int totalgain;
  std::list<S_Examined> elements;
};

// Ugly globals.
int DictionarySizes[] = { 4, 6, 9, 14, 21, 32, 48, 72 }; // replace with real values.
int DictCount         = 8;                               // replace with real values.
int UsedDictEntries   = 0;

int main(argc, argv)
{
  int maxWidth = atoi( argv[1] );  // run as "carrot.exe <width>"

  std::vector<S_Element> candidates;
  candidates.push_back( { /* A */ 0, 800,  50, /*bytes...*/ } ); // invalid syntax but you get the idea
  candidates.push_back( { /* B */ 1, 390, 180, /*bytes...*/ } );
  candidates.push_back( { /* C */ 2, 280, 160, /*bytes...*/ } );
  candidates.push_back( { /* D */ 3, 430,  10, /*bytes...*/ } );
  // and so on...

  if (maxWidth > candidates.size())
  {
    maxWidth = candidates.size(); // Do not explore more choices than exist.
  }
  if (maxWidth > DictCount)
  {
    maxWidth = DictCount; // No need to return more results than dictionary entries.
  }

  S_Chosen chosen = ChooseElements( DictionarySizes, &candidates[0], candidates.size(), maxWidth );

  double examinedCount = Factorial(maxWidth);
  examinedCount *= examinedCount; // Because we are re-examining all entries at each step.
  double totalCount = Factorial(candidates.size());
  totalCount *= totalCount;
  printf("Examined %.2f combinations out of a total of %.2f.\n", examinedCount);
  printf("Size reduction obtained: %d\n", chosen.totalgain);
  printf("Fill your dictionary with (in order):\n");
  for (auto it = chosen.elements.begin() ; *it ; ++it )
  {
    printf("id %d\n", (*it).id);
  }
}

long Factorial(int n)
{
  // Note: this might overflow, do not use too large values of width.
  double top = 1.f;
  for (int i = 2 ; i <= n ; i++ )
  {
    top *= (double)i;
  }

  return top;
}

bool CompareElements(S_Examined const& a, S_Examined const& b)
{
  return a.gain < b.gain; // Sort by increasing gain.
}

S_Chosen ChooseElements( const int* const dict, const S_Element* const candidates, int count, int width )
{
  S_Chosen chosen; // Will be returned with RVO so declared first is better.
  chosen.totalgain = ~0; // largest negative number, can also use std::limits<int>::Min() or something like that.

  if (count == 0 || width == 0)
  {
    chosen.totalgain = 0; // No candidates or dictionary entries, return an empty list.
    return chosen;
  }

  std::vector<S_Examined> examined;

  for (int c = 0 ; c < count ; c++)
  {
    // Compute the gain obtained by encoding this candidate in the first spot of the dictionary.
    S_candidate& cand = *candidates[i];
    int gain = ComputeGain(dict, cand);
    examined.push_back( { c, gain } );  // Add to vector for sorting.
  }

  // We sort by gain, this allows us to make the greedy choice of "best immediate candidate"
  // when we are asked to examine the minimum number of choices (width = 1).

  std::sort(examined, CompareElements); // Sort by gain, highest last.

  // Below is the loop where we examine "width" possible branches starting from the
  // current best gain candidate. If width is 1, then we only examine the first best
  // candidate, if width is 2, then we examine the two best candidates,
  // if width == count, then we examine all possible candidates.

  int index = count - 1;
  S_Examined current = examined.last();
  examined.pop_back();  // Remove from the array (since we are choosing it for the first iteration).

  do
  {
    // Obtain the rest of the entries to put in the dictionary (recursive call).
    std::vector<S_Element>&
    S_Chosen rest = ComputeGains( dict + 1, &examined[0], count - 1, width - 1 );

    // If this returned list has better or equal total gain than the current one, we choose it.
    if (current.gain + rest.totalgain >= bestTotalGain)
    {
      chosen.totalgain = current.gain + rest.totalgain;
      chosen.elements.swap(rest);
      chosen.elements.push_front(current);
    }

    // Get the "next current" and place the "current current" back into the vector in its place.
    S_Examined next = examined[--index];
    examined[index] = current;
    current = next;

  } while (width-- >= 0); // We do at least one run of that loop.

  return chosen;
}

int ComputeGain( int* dictionary, S_Element& element)
{
  // Note: this could be negative
  int gain = element.freq * (element.size - dictionary[0]);

  return gain;
}

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Wed Apr 03, 2019 10:04 am

Ouch, thanks :D

I promise I'll test it once I finish (or drop!) v1.3, which is half-coded and will require heavy testing, as I'm changing many more things than expected.

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Fri Apr 05, 2019 11:53 am

Okay, so the new version is promising but not working yet, and not sure it will.
Loading time saved on Zorgon would be more than 1 second :)

So far, "normal" (uncompressed) bytes seem to load correctly, but I have bugs and not been able to check yet if it's a global timing problem, or specific to compressed bytes - RLE or dictionary.
Oh, and it seems it's not working with Euphoric anymore - no idea why but I don't care yet as it's not the main target.

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Wed Apr 10, 2019 7:38 pm

Giving up Novalight 1.3.
Its main goal was to try reducing the Oric working time to save a decoded byte in memory by 1 sample (22.67µs). This also opened a door to extend the dictionary.

But timings are too short for real machines. It worked at 95% on emulators, but definitely doesn't on real Orics.

That being said, in the process I found a few more small optimisations that might bring new ideas. Maybe.

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Mon Apr 22, 2019 6:36 am

Couldn't help trying again ;)
And got a prototpye of Novalight 1.3 to work on Oricutron and Euphoric... But not on a real Oric.

The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators ;)

That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.

User avatar
NekoNoNiaow
Flying Officer
Posts: 218
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Novalight - very fast tape loading

Post by NekoNoNiaow » Mon Apr 22, 2019 9:36 pm

Symoon wrote:
Mon Apr 22, 2019 6:36 am
The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators ;)
Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate. ;)
Symoon wrote:
Mon Apr 22, 2019 6:36 am
That's too bad, it really was opening doors to new things, but I suppose it has to stop at some point! Also, this very last version put the whole interrupt in page 2 (to save a JMP, 3µs), which was already risky as version 1.1 showed.
Maybe someone will find some optimizations eventually. ;)
Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Mon Apr 22, 2019 10:08 pm

NekoNoNiaow wrote:
Mon Apr 22, 2019 9:36 pm
Symoon wrote:
Mon Apr 22, 2019 6:36 am
The decoding is probably too demanding, there's actually only a 3µs margin left on emulators, and I suppose real hardware doesn't cope with it. Hey, that's a difference between real machines and emulators ;)
Is this the case for all emulators? Out of curiosity which ones did you try?
Also, I guess you could submit the issue to emulator writers since I suppose they would be happy to know when the emulators are not accurate. ;)
I tried with both Euphoric, and Oricutron (1.2). Both load correct bytes, while real Orics don't seem to.
What I fail to understand, now that I've spent 1 more hour on it, is that I was wrong: it's not a 3µs margin but rather somthing like 12µs (based on Oricutron), which should be far enough to work.
How I wish I could see the Oric memory content!
NekoNoNiaow wrote:
Mon Apr 22, 2019 9:36 pm
Have you implemented the combinatorial exploration algorithm I posted some time ago? It should definitely help create smaller files.
Not yet, sorry. I was focused on this new version which could have saved a full second, annd took me about 15 different test versions - too messy to start another change ;)
But I will eventually! What I'd need first would be to understand why it doesn't work on real Orics to definitely get this 1.3 version out of my mind :x

User avatar
Symoon
Archivist
Posts: 1539
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France
Contact:

Re: Novalight - very fast tape loading

Post by Symoon » Tue Apr 23, 2019 9:12 am

Ok, by loading bytes on screen, I've been able to understand the beginning of something: it's actually not the timing problem I was thinking of, but a silence between two sub-parts that causes a wrong reading of the following byte, on real machines only. The same silence that, IIRC, I had tried to remove but put back a while ago as it caused problems without it on some programs.

I have no clear explanation yet but it's obvious there's something there.
So I'm back with multiple testings... When I have time to :?

Post Reply