Line routine - Second attempt

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Sat Feb 06, 2010 12:00 pm

I worked on optimizing a little bit this morning.

It turns out that the memory layout is a big performance killer. Not the 6 bits/byte, but the 40 bytes/row.

If the screen buffer had been organized in columns instead of rows, we wouldn't have to update 16-bit pointers with each step in the y-direction. And in the x-direction we would only have to update them every 6th pixel.
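
To make the cost concrete, here is a minimal Python sketch of the address arithmetic behind a single pixel. It assumes the usual Oric HIRES layout (screen at $A000, 40 bytes per row, 6 pixels per byte with the leftmost pixel in bit 5). The function name is made up for illustration and is not taken from the actual routine.

```python
SCREEN_BASE = 0xA000      # assumed start of the HIRES screen
BYTES_PER_ROW = 40        # 240 pixels / 6 pixels per byte
PIXELS_PER_BYTE = 6

def pixel_address(x, y):
    """Return (byte address, bit mask) for the pixel at (x, y)."""
    column, bit = divmod(x, PIXELS_PER_BYTE)
    address = SCREEN_BASE + y * BYTES_PER_ROW + column
    mask = 0x20 >> bit    # assumption: bit 5 is the leftmost pixel
    return address, mask

# One step in y moves the address by 40, which needs a 16-bit add.
# One step in x only moves the address when the mask wraps (every 6th pixel).
print(pixel_address(0, 0), pixel_address(0, 1), pixel_address(6, 0))
```

With a column-major layout, a step in y would be a plain increment of an 8-bit index, which is the point made above.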

I wonder if there is an efficient way to handle this problem.
Have fun!
thrust26

JamesD
Flight Lieutenant
Posts: 352
Joined: Tue Nov 07, 2006 7:38 am

Post by JamesD » Sat Feb 06, 2010 3:33 pm

You have been programming the 6502 for how long and you are just now figuring out 16 bit pointers are a problem? :D :D :D
Seriously, there is a reason I try to stick to the 6803/9 CPUs. I ported a simple music player to the 6502 (it's somewhere on this forum) just to re-familiarize myself with the CPU, and the code size was at least 3 times that of the 6803 version (not that I claim my 6502 code is fully optimized).

I don't know that you'll get more efficient than a page 0 (16bit) pointer for Y and then index off of it for X.
Adding/Subtracting 40 to/from the Y pointer shouldn't cause all 16 bits to be updated more than every 4 lines or so, but it does look ugly in the code.

FWIW, this is one of the strongest cases for adding a special case for horizontal lines (Y1 = Y2). You can remove the 16 bit Y code from inside the loop.

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Sat Feb 06, 2010 3:42 pm

JamesD wrote:You have been programming the 6502 for how long and you are just now figuring out 16 bit pointers are a problem? :D :D :D
Not at all. They are just time consuming. And with a clever memory setup you can reduce that time. Unfortunately the Oric setup is not that clever. :(
JamesD wrote:I don't know that you'll get more efficient than a page 0 (16bit) pointer for Y and then index off of it for X.
Adding/Subtracting 40 to/from the Y pointer shouldn't cause all 16 bits to be updated more than every 4 lines or so, but it does look ugly in the code.
Every 6 lines. But instead of simply indexing with Y, you have to add some extra logic. And that costs time.
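
A quick way to check how often the high byte of the pointer really changes when stepping down by 40 bytes per row. This is a Python sketch that assumes the pointer starts at the top of a 200-row HIRES screen at $A000; the function name is hypothetical.

```python
def rows_with_carry(start=0xA000, rows=200, stride=40):
    """List the rows where adding `stride` carries into the high byte."""
    carries = []
    address = start
    for row in range(1, rows):
        if (address & 0xFF) + stride > 0xFF:   # low-byte add overflows
            carries.append(row)
        address += stride
    return carries

carries = rows_with_carry()
gaps = {b - a for a, b in zip(carries, carries[1:])}
print(len(carries), sorted(gaps))
```

Since 256 / 40 = 6.4, the carry shows up every 6 or 7 rows, not every 4.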
JamesD wrote:FWIW, this is one of the strongest cases for adding a special case for horizontal lines (Y1 = Y2). You can remove the 16 bit Y code from inside the loop.
That's all already in the code. :)
Have fun!
thrust26

JamesD
Flight Lieutenant
Posts: 352
Joined: Tue Nov 07, 2006 7:38 am

Post by JamesD » Sat Feb 06, 2010 4:31 pm

thrust26 wrote:
JamesD wrote:You have been programming the 6502 for how long and you are just now figuring out 16 bit pointers are a problem? :D :D :D
Not at all. They are just time consuming. And with a clever memory setup you can reduce that time. Unfortunately the Oric setup is not that clever. :(
I was just giving you a hard time.
Just remember that what is clever for one application might be a nightmare for another.

Chema
Game master
Posts: 2067
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN

Post by Chema » Mon Feb 08, 2010 10:20 am

Hi.

Of course I am very interested in this routine and the new optimizations... I just updated from the svn server and cannot compile the new version. _TableMod6 is missing.

Anything wrong here?

Could you please post here when there is a more or less final version, so I can integrate it into my programs to check the improvements in speed?

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Mon Feb 08, 2010 10:28 am

Chema wrote:Hi.

Of course I am very interested in this routine and the new optimizations... I just updated from the svn server and cannot compile the new version. _TableMod6 is missing.

Anything wrong here?
My fault. Committed display.s now too.
Chema wrote:Could you please post here when there is a more or less final version, so I can integrate it into my programs to check the improvements in speed?
That should be sometime today; only minor improvements from then on.
Have fun!
thrust26

Chema
Game master
Posts: 2067
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN

Post by Chema » Mon Feb 08, 2010 10:29 am

Thanks thanks thanks...

:)

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Mon Feb 08, 2010 10:56 am

Chema wrote:Thanks thanks thanks...

:)
Don't expect too much overall improvement, there are other things which cost a lot of time.

BTW: How did you do your tests without the double buffer? Did you calculate everything (all points, polygons etc.) before you finally erased and redrew line by line, so that the screen updates are as close together as possible?

If not, try that once please. If yes, try again with the much faster line draw code. :)
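
The two update orders being compared can be sketched like this in Python. The line lists and function names are placeholders, not TINE's actual data structures; the sketch only counts how many screen operations pass between erasing a line and drawing its replacement.

```python
def erase_all_then_draw_all(old_lines, new_lines):
    """Erase every old line first, then draw every new line."""
    ops = [("erase", line) for line in old_lines]
    ops += [("draw", line) for line in new_lines]
    return ops

def erase_and_draw_line_by_line(old_lines, new_lines):
    """Erase old line i and immediately draw new line i."""
    ops = []
    for i in range(max(len(old_lines), len(new_lines))):
        if i < len(old_lines):
            ops.append(("erase", old_lines[i]))
        if i < len(new_lines):
            ops.append(("draw", new_lines[i]))
    return ops

def longest_gap(ops):
    """Largest number of operations between the i-th erase and the i-th draw."""
    erase_at = [n for n, (kind, _) in enumerate(ops) if kind == "erase"]
    draw_at = [n for n, (kind, _) in enumerate(ops) if kind == "draw"]
    return max(d - e for e, d in zip(erase_at, draw_at))

old = ["a", "b", "c"]
new = ["A", "B", "C"]
print(longest_gap(erase_all_then_draw_all(old, new)))
print(longest_gap(erase_and_draw_line_by_line(old, new)))
```

The shorter the gap, the less time a line is missing from the screen, at the price of the precalculation mentioned above.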
Have fun!
thrust26

Chema
Game master
Posts: 2067
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN

Post by Chema » Mon Feb 08, 2010 4:45 pm

thrust26 wrote: Don't expect too much overall improvement, there are other things which cost a lot of time.
I know :)

I gave it a try and noticed a couple of things. It is quite a bit faster, and that is noticeable in the overall game speed. It looks much nicer: better lines and again nice ships in sight :)

But I also found that it alters attributes when drawing totally horizontal lines. Beware that I have replaced the eor(tmp0) with ora(tmp0), so I might have altered something in the process. Or maybe something is not working with chunking?

And finally... where did my memory go? I have lost 2K of my free space!!! Surely it is not just for the new tables, so I suppose I will need to have a look at how things are aligned and how much space I am losing there, because this really does give me trouble.
thrust26 wrote:BTW: How did you do your tests without the double buffer? Did you calculate everything (all points, polygons etc.) before you finally erased and redrew line by line, so that the screen updates are as close together as possible?

If not, try that once please. If yes, try again with the much faster line draw code. :)
OK, I will. This means rewriting many things in the program, so I will give it a try on the ship demo code. In fact, I saved the origin and destination of all lines drawn and cleared them just before drawing the new ones, thus using two lists of line endpoints. And the flickering was horrible. But the line routine was much slower then, so I will try again.

Another ugly side effect of drawing with eor is that when two lines converge, they erase each other, so some ship corners and details are lost.
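
The effect is easy to reproduce on a single screen byte. A small Python sketch follows; the starting byte value $40 (bit 6 set, no pixels lit) and the function name are assumptions for illustration.

```python
def plot(screen_byte, mask, mode):
    """Combine a pixel mask into a screen byte with EOR or ORA."""
    if mode == "eor":
        return screen_byte ^ mask
    if mode == "ora":
        return screen_byte | mask
    raise ValueError(mode)

EMPTY = 0x40   # assumed empty HIRES byte: bit 6 set, no pixels lit
PIXEL = 0x08   # one pixel shared by two converging lines

for mode in ("eor", "ora"):
    byte = plot(EMPTY, PIXEL, mode)   # first line passes through the pixel
    byte = plot(byte, PIXEL, mode)    # second line hits the same pixel
    print(mode, hex(byte))
```

With eor the second line switches the shared pixel off again; with ora it stays lit. The flip side is that a line drawn with eor can be erased simply by drawing it a second time.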

And I will have to work out a similar process for circles, stars and all the other on-screen info...

Regards and thanks indeed again for your help here...

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Mon Feb 08, 2010 5:02 pm

Chema wrote:I gave it a try and noticed a couple of things. It is quite faster, and that is noticeable in the game overall speed. It is much nicer, better lines and again nice ships on sight :)
Sounds good.

BTW: I checked out TINE and built it, but the result does not seem to work. What am I missing?
Chema wrote:But I also found that it alters attributes when drawing totally horizontal lines. Beware that I have replaced the eor(tmp0) with ora(tmp0), so I might have altered something in the process. Or maybe something is not working with chunking?
I will have a look at the horizontal line code. Though I didn't touch it, other changes may have influenced it.
Chema wrote:And finally... where did my memory go? I have lost 2K of my free space!!!
Oh, that's just me. :)
Chema wrote:Surely it is not just for the new tables, so I suppose I will need to have a look at how things are aligned and how much space I am losing there, because this really does give me trouble.
Understood. I am always optimizing to the limit when possible. But if you want me to save memory, I need to know what the limit is. So where would you rather save memory instead of speeding up the graphics?

How much memory do you have to work with in total? And how much is currently left?
Chema wrote:OK, I will. This means rewriting many things in the program, so I will give it a try on the ship demo code. In fact, I saved the origin and destination of all lines drawn and cleared them just before drawing the new ones, thus using two lists of line endpoints. And the flickering was horrible. But the line routine was much slower then, so I will try again.
So did you first clear all lines and then redraw all? Or did you clear and redraw line by line?

I really would like to see your experiments with my own eyes (well, emulated), too.
Chema wrote:Another ugly side effect of drawing with eor is that when two lines converge, they erase each other, so some ship corners and details are lost.
Yes and no. Yes, they erase each other, but when too many lines come close together, you don't get an unstructured pixel blob. With double buffering I would still go for OR, but XOR is IMO not that bad, especially if using it gives a massive speed improvement.

It's all just a matter of compromises and which ones are easier for YOU to accept.
Chema wrote:And I will have to work out a similar process for circles, stars and all the other on-screen info...
Sure, if you decide on a change, you will have to touch quite a lot of code again.
Chema wrote:Regards and thanks indeed again for your help here...
I am having fun, so it is a pleasure. :)
Have fun!
thrust26

Chema
Game master
Posts: 2067
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN

Post by Chema » Mon Feb 08, 2010 5:28 pm

thrust26 wrote:BTW: I checked out TINE and built it, but the result does not seem to work. What am I missing?
Mmmm, not sure. Did you get any error message? I am using a utility called taptap to set up the filenames inside the disk correctly. If you don't have it (which is possible), then you end up with a disk with nonamexxx.com instead of the correct name, so the game is not launched. You can launch it manually though.

I am not sure where taptap is... I think in the repository.
thrust26 wrote:I will have a look at the horizontal line code. Though I didn't touch it, other changes may have influenced it.
That is quite strange... I will also have a look then. If you did not touch that code, then the bug could be there or maybe I introduced it without noticing...

EDIT: OK, it was an old bug that has reappeared.
In draw_totaly_horizontal8 there is this code:

    ldx _OtherPixelX
    sta __auto_cpx+1    ; bug: stores A, but the value was loaded into X
which should read:

    ldx _OtherPixelX
    stx __auto_cpx+1    ; store X into the operand byte at __auto_cpx+1
thrust26 wrote:Understood. I am always optimizing to the limit when possible. But if you want me to save memory, I need to know what the limit is. So where would you rather save memory instead of speeding up the graphics?

How much memory do you have to work with in total? And how much is currently left?
Well, I always prefer a bigger routine which is optimized to the maximum, and stick to it, but I had only 3.7K left in main memory, which I planned to use for missions, and now I have 1.6K left, which might be a bit low.

I can use from $500 to $9fff in main memory, plus page 4 (already mostly used) and page 2 (I have plans for it). In overlay I am using nearly all of the 16K.

Well, I tend to eat up all the memory, as you do, but in my case it is due to my fat code... well structured, but fat :)
thrust26 wrote:So did you first clear all lines and then redraw all? Or did you clear and redraw line by line?

I really would like to see your experiments with my own eyes (well, emulated), too.
I can't really remember... I think I erased all, then drew all, as I only had the rotating ship in sight. Doing it ship-by-ship could be easy, doing it line-by-line... I am not sure.
thrust26 wrote:Yes and no. Yes, they erase each other, but when too many lines come close together, you don't get an unstructured pixel blob. With double buffering I would still go for OR, but XOR is IMO not that bad, especially if using it gives a massive speed improvement.

It's all just a matter of compromises and which ones are easier for YOU to accept.
You are also right here, but whenever I compare the ship pics between the 6502 eor versions and the speccy, I end up with the same conclusion: the speccy version looks nicer.

But I am biased, so I have to try. Gimme some time to sort it out and I will send you a demo with eor, right?
thrust26 wrote:I am having fun, so it is a pleasure. :)
That is the spirit, but you are anyway being of great help :)

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Mon Feb 08, 2010 5:57 pm

Chema wrote:Mmmm not sure. Did you get any error message? I am using a utility called taptap to correctly setup the filenames inside the disk. If you don't have it (which is possible) then you end up with a disk with nonamexxx.com instead of the correct name, so the game is not launched. You can launch it manually though.

I am not sure where taptap is... I think in the repository.
It compiles without error into a .tap file. But shouldn't this be a .dsk-file?

Also I got the latest files (e.g. taptap) from dBug, so I should have everything I need.
Chema wrote:EDIT: OK, it was an old bug that has reappeared.
Fixed.
Chema wrote:Well, I always prefer a bigger routine which is optimized to the maximum, and stick to it, but I had only 3.7K left in main memory, which I planned to use for missions, and now I have 1.6K left, which might be a bit low.
I see. I think one page can be regained by rearranging code alignments. But the code got quite a lot larger and I added 3 new 240-byte tables. Not sure why this sums up to 2.1K, I would expect maybe 1.5K.
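
For a sense of where 240-byte tables come from: with 240 pixels per row, a lookup table indexed by X needs one entry per column. The Python sketch below builds three such tables. Only the name _TableMod6 appears in this thread, so the other two tables and the exact contents are guesses for illustration.

```python
SCREEN_WIDTH = 240   # pixels per HIRES row

# Hypothetical contents: one byte per X coordinate in each table.
table_div6 = [x // 6 for x in range(SCREEN_WIDTH)]         # byte column of X
table_mod6 = [x % 6 for x in range(SCREEN_WIDTH)]          # pixel index inside the byte
table_bit = [0x20 >> (x % 6) for x in range(SCREEN_WIDTH)] # pixel mask, bit 5 leftmost

table_bytes = len(table_div6) + len(table_mod6) + len(table_bit)
print(table_bytes)
```

Three tables of this size come to 720 bytes, so most of the 2.1K would have to come from the larger code and from alignment padding.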
Chema wrote:I can use from $500 to $9fff in main memory, plus page 4 (already mostly used) and page 2 (I have plans for it). In overlay I am using nearly all of the 16K.
Too many details for my little hardware knowledge. ;)

That's ~56k, right? And you are down to just 2k now?
Chema wrote:Well, I tend to eat up all the memory, as you do, but in my case it is due to my fat code... well structured, but fat :)
You can't have both. Since I am coding for the Atari 2600, my code cannot afford much structure. Especially subroutines have to be avoided there, since each level eats up 2 bytes of the available 128 bytes of RAM.
Chema wrote:I can't really remember... I think I erased all, then drew all, as I only had the rotating ship in sight. Doing it ship-by-ship could be easy, doing it line-by-line... I am not sure.
In theory ship-by-ship should cause a bit more flicker and a bit less tearing than line-by-line. I don't know your data structure, but I suppose line-by-line requires quite some reorganization. But maybe you can do this for your test code.
Chema wrote:You are also right here, but whenever I compare the ship pics between the 6502 eor versions and the speccy, I end up with the same conclusion: the speccy version looks nicer.
Close-ups, definitely. But in most cases ships are quite far away, and then the differences soon disappear.
Chema wrote:But I am biased, so I have to try. Gimme some time to sort it out and I will send you a demo with eor, right?
OR, XOR, everything you test with, please.
Have fun!
thrust26

JamesD
Flight Lieutenant
Posts: 352
Joined: Tue Nov 07, 2006 7:38 am

Post by JamesD » Mon Feb 08, 2010 6:01 pm

I tried to build Space1999 this morning and I was missing taptap. I don't think it's in the OSDK.

The horizontal line code looks like it can be sped up. It appears to do a pixel at a time where end bytes should come from a table(s) and middle bytes should be doable 6 bits (a byte) at a shot like I mentioned before. I might tinker with it a bit.
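
The idea can be sketched in Python as follows. This is only an illustration of table-driven end bytes plus full middle bytes, assuming 6 pixels per byte with the leftmost pixel in bit 5; the table and function names are invented here and are not from the existing routine.

```python
PIXELS_PER_BYTE = 6
FULL_BYTE = 0x3F   # all six pixels of a byte set

# LEFT_MASK[b]: pixels from position b to the right edge of the byte.
LEFT_MASK = [FULL_BYTE >> b for b in range(PIXELS_PER_BYTE)]
# RIGHT_MASK[b]: pixels from the left edge of the byte up to position b.
RIGHT_MASK = [(FULL_BYTE << (5 - b)) & FULL_BYTE for b in range(PIXELS_PER_BYTE)]

def horizontal_span(x1, x2):
    """Return (byte column, mask) pairs covering pixels x1..x2 of one row."""
    if x1 > x2:
        x1, x2 = x2, x1
    c1, b1 = divmod(x1, PIXELS_PER_BYTE)
    c2, b2 = divmod(x2, PIXELS_PER_BYTE)
    if c1 == c2:                       # both ends inside the same byte
        return [(c1, LEFT_MASK[b1] & RIGHT_MASK[b2])]
    span = [(c1, LEFT_MASK[b1])]       # left end byte from a table
    span += [(c, FULL_BYTE) for c in range(c1 + 1, c2)]  # six pixels at a time
    span.append((c2, RIGHT_MASK[b2]))  # right end byte from a table
    return span

print([(c, hex(m)) for c, m in horizontal_span(4, 13)])
```

Each pair would then be combined into the screen byte with ORA or EOR, so a full-width line takes 40 byte operations instead of 240 pixel operations.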

thrust26
Officer Cadet
Posts: 49
Joined: Wed Jan 27, 2010 7:34 pm
Location: Düsseldorf, Germany

Post by thrust26 » Mon Feb 08, 2010 6:09 pm

JamesD wrote:The horizontal line code looks like it can be sped up. It appears to do a pixel at a time where end bytes should come from a table(s) and middle bytes should be doable 6 bits (a byte) at a shot like I mentioned before. I might tinker with it a bit.
Yes, that should definitely work.

Didn't touch the code there yet, because completely horizontal (or vertical) lines should occur very rarely. I doubt the benchmark will even go down by 1, even after this optimization.
Have fun!
thrust26

JamesD
Flight Lieutenant
Posts: 352
Joined: Tue Nov 07, 2006 7:38 am

Post by JamesD » Mon Feb 08, 2010 6:19 pm

I'm getting an "Unresolved External : osdk_stack" message on the latest code so I'm dead in the water.
