Polygon Routine - PolyBench is here

Here you can ask questions or share insights about how to use 6502 assembly code efficiently on the Oric.
Dbug
Site Admin
Posts: 4437
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Post by Dbug »

Hello.

Some time ago (well, YEARS ago really) I made some attempts at drawing polygons on screen, and since some people are looking into vector graphics, I think it's worth doing with this routine what we did with the LineBench thing.

I modified my old routine to add a timer counter, which lets people modify the code and see whether it got faster or not. Just by doing that I found a huge number of places where the code can be optimised, and I will probably release a version 2 with new numbers quite soon, but still, here it is, in all its non-optimised glory: PolyBench!

Image

The code itself is here:
http://www.defence-force.org/ftp/forum/ ... ench-1.zip
(The demo TAPE file is in the Build folder)

Here are some comments about the code:
  • What ClearAndSwapFlag actually does is load X and Y with one ink color attribute each (one that matches the paper color, the other one that does not), and then execute a series of STX $A000, STY $A000+40, STX $A000+40*2, etc.
    It's really a quick way to show/hide scanlines.
  • Then a second pass runs through each of the 100 odd or even lines and erases the content of the line.
    This could definitely be improved a lot by having 100 STA $axxx,X instructions, and then running X from 39 down to 1 to erase the even lines, and X from 79 down to 41 to erase the odd lines (not 0 and 40, since we should not erase the color attribute).

    I expect a huge boost!
  • One of the reasons why it's very slow is that AddTriangle is actually a C routine that calls the AddLineAsm routine three times, and finally calls the filler itself (FillTablesASM).
    Having the whole AddTriangle in assembler would speed things up significantly.
  • X0,Y0,X1,Y1 are not in zero page, they are bog-standard globals. They will never win any speed contest :)
  • AddLineAsm is quite an old routine. It predates the one used in LineBench; the only difference is that this one always draws segments vertically. The idea is to update the MinX/MaxX table with the X position on each scanline. This will be used later to draw horizontal segments. This routine is also inefficient because it always checks whether x is the leftmost or the rightmost value. Normally, with a well-constructed triangle routine, you know beforehand whether a line is on the left or on the right, which is a lot faster. And for the first line you don't have to compare anything at all: just store the coordinates.

    go_compute_left and go_compute_right are essentially the same routine, just one doing dex and the other one inx... yeah yeah I know... old code :p
  • FillTablesASM is the actual scanline drawing code. It goes from top to bottom, and for each scanline draws a segment from minx to maxx. Managing to optimise that one would speed up the whole thing a lot.
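
For readers who have not opened the archive, here is a rough C sketch of the MinX/MaxX idea described in the comments above. It is not the code from the zip: the names (min_x, max_x, add_line, fill_tables, add_triangle) are made up for the illustration, the screen is simplified to one byte per pixel instead of the real Oric HIRES layout, and coordinates are assumed to be already inside the 240x200 area (no clipping).

```c
#include <string.h>

#define WIDTH  240   /* assumed HIRES width in pixels  */
#define HEIGHT 200   /* assumed HIRES height in lines  */

/* Hypothetical tables: leftmost and rightmost X seen on each scanline. */
unsigned char min_x[HEIGHT];
unsigned char max_x[HEIGHT];

/* Simplified screen, one byte per pixel (NOT the real Oric layout). */
unsigned char screen[HEIGHT][WIDTH];

/* Mark every scanline as empty: min greater than max means "no span". */
void reset_tables(void)
{
    int y;
    for (y = 0; y < HEIGHT; y++) {
        min_x[y] = WIDTH - 1;
        max_x[y] = 0;
    }
}

/* Widen the span of scanline y so that it includes x.
   These are the two compares the post says a better routine can avoid. */
void record_x(int y, int x)
{
    if (x < min_x[y]) min_x[y] = (unsigned char)x;
    if (x > max_x[y]) max_x[y] = (unsigned char)x;
}

/* Walk one edge from top to bottom, one X position per scanline. */
void add_line(int x0, int y0, int x1, int y1)
{
    int y, dy, t;
    if (y0 > y1) {               /* always go downwards */
        t = x0; x0 = x1; x1 = t;
        t = y0; y0 = y1; y1 = t;
    }
    dy = y1 - y0;
    if (dy == 0) {               /* horizontal edge: both ends count */
        record_x(y0, x0);
        record_x(y0, x1);
        return;
    }
    for (y = y0; y <= y1; y++) {
        record_x(y, x0 + (x1 - x0) * (y - y0) / dy);
    }
}

/* Draw one horizontal segment per non-empty scanline.
   Returns the number of pixels written, handy for checking. */
int fill_tables(void)
{
    int y, count = 0;
    for (y = 0; y < HEIGHT; y++) {
        if (min_x[y] <= max_x[y]) {
            int len = max_x[y] - min_x[y] + 1;
            memset(&screen[y][min_x[y]], 1, (size_t)len);
            count += len;
        }
    }
    return count;
}

/* Three edges, then the filler, as in the structure described above. */
int add_triangle(int xa, int ya, int xb, int yb, int xc, int yc)
{
    reset_tables();
    add_line(xa, ya, xb, yb);
    add_line(xb, yb, xc, yc);
    add_line(xc, yc, xa, ya);
    return fill_tables();
}
```

Calling add_triangle(10, 10, 20, 10, 10, 20) fills a small right-angled triangle. The division in add_line is only there to keep the sketch short; an incremental error term would be the usual way to do it on a 6502.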
Have fun !

Post by Dbug »

Made some optimisations yesterday and this morning (mostly what I described in the comments in the first post), and here are the results:

Image

The code itself is here:
http://www.defence-force.org/ftp/forum/ ... ench-2.zip

The first version was showing 3871 3407 3546 as benchmark results (time taken to render a batch of 200 frames of animation), and the new version takes 3230 2769 2906, which is between 16% and 19% less time depending on the measurement :)
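
If you want to compare your own timer readings the same way, the arithmetic is simply (before - after) / before. A tiny C helper, with a made-up name (saved_per_mille), that returns the saving in tenths of a percent, truncated, so it needs no floating point:

```c
/* Time saved between two benchmark readings, in tenths of a percent.
   Hypothetical helper, not part of PolyBench. Uses long so the
   multiplication does not overflow with 16-bit ints. */
long saved_per_mille(long before, long after)
{
    return (before - after) * 1000L / before;
}
```

With the three pairs of numbers above this gives 165, 187 and 180, that is 16.5%, 18.7% and 18.0% less time.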

To get that, I moved some things to zero page, but the big speed-up mostly came from the improved ClearAndSwapFlag routine.
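
The indexed clear suggested in the first post (100 stores indexed by X, with X from 39 to 1 for even lines and from 79 to 41 for odd lines) can be modelled in C like this. It is only a sketch of the idea, not the routine from the zip: clear_half and the constants are invented for the example, the screen is a plain 8000-byte array standing in for the memory at $A000, and I assume $40 is the value of an empty pixel byte.

```c
#define LINE_BYTES 40     /* bytes per HIRES scanline            */
#define LINES      200    /* scanlines in the modelled area      */
#define EMPTY_BYTE 0x40   /* assumed "no pixel set" byte value   */

/* Clear all even lines (odd == 0) or all odd lines (odd != 0),
   leaving byte 0 of each line alone because it holds the color
   attribute. The inner loop stands for the 100 unrolled
   "STA line_base,X" instructions, one per even line; the outer
   loop is the X register counting down. */
void clear_half(unsigned char *screen, int odd)
{
    int first = odd ? 41 : 1;    /* 41..79 lands on the next (odd) line */
    int last  = odd ? 79 : 39;
    int x, pair;

    for (x = last; x >= first; x--) {
        for (pair = 0; pair < LINES / 2; pair++) {
            screen[pair * 2 * LINE_BYTES + x] = EMPTY_BYTE;
        }
    }
}
```

The point of the trick is that the same 100 base addresses serve both halves: since lines are 40 bytes apart, offsets 41 to 79 from an even line reach the odd line just below it.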

I also cleaned up the code a bit to make it more readable, but there are still a lot of things to do :)

I really wonder how much faster it can be in the end !

Post by Dbug »

Made some more optimisations this evening. Here are the results:

Image

The code itself is here:
http://www.defence-force.org/ftp/forum/ ... ench-3.zip

The previous version was showing 3230 2769 2906 as benchmark results (time taken to render a batch of 200 frames of animation), and the new version takes 3049 2640 2774, which is roughly 5% less time. Not as impressive, I must say, but every single optimisation and simplification is worth taking :)

Comments are welcome ;)