Fast sextet shift techniques for the hires ?

Post by Euphoric »

I'm posting this subject here but it is related to Chema's question about moving big software sprites in the (so-cumbersome) HIRES mode of the Oric.

As you already know, the HIRES screen is made of 200 lines of 40 bytes, but only six bits of each byte are actually displayed.

As soon as you want to shift these bits in order to have a smooth animation, it becomes a nightmare...

E.g.: let's say there's an object in memory you want to transfer to screen.
Without any shift, a sequence of bytes (one line of the object) would normally be transferred like this:

Code:

    ldy #WIDTH_IN_BYTES-1
xferloop:
    lda (obj_line_ptr),y
    sta (screen_ptr),y
    dey
    bpl xferloop
So, no problem here (I'm assuming that the object already has all its 7th and 8th bits correctly set)... and of course, you can unroll the loop to save some cycles.

When you wish to shift the bytes one position to the left, it is a bit more complicated, but still manageable through the Carry flag, like in:

Code:

    ldy #WIDTH_IN_BYTES-1
    clc                     ; nothing to shift into the rightmost byte
xferloop:
    lda (obj_line_ptr),y
    rol a                   ; shift left, bringing in the bit coming from the right
    tax                     ; keep the shifted byte
    asl a                   ; two more shifts move the original bit 5 (the 6th bit)
    asl a                   ; into carry, so it can be propagated to the next byte
    txa
    and #$3F                ; keep only the 6 pixel bits
    ora #$40                ; make sure the byte is not an attribute
    sta (screen_ptr),y
    dey
    bpl xferloop
But when you have to shift the objects two bit positions or more, it becomes a nightmare, and I've not even talked about masked graphics yet...
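
For checking any faster routine against a known-good result, a small host-side model can help. The sketch below is in Python and is not Oric code; the function names are made up for illustration. It mimics the carry-based loop above and also gives a generic n-pixel reference, assuming bit 5 is the leftmost pixel of each byte and bit 6 is set on every output byte.

```python
def shift_row_left_1(row):
    """Model of the ROL-based loop: one-pixel left shift of a row of HIRES bytes."""
    out = list(row)
    carry = 0                                  # clc
    for i in range(len(row) - 1, -1, -1):      # rightmost byte first, like dey/bpl
        pixels = row[i] & 0x3F
        out[i] = (((pixels << 1) | carry) & 0x3F) | 0x40
        carry = pixels >> 5                    # bit 5 goes to the byte on the left
    return out


def shift_row_left(row, n):
    """Reference n-pixel left shift: treat the row as one long run of 6-bit groups."""
    width = len(row)
    value = 0
    for byte in row:
        value = (value << 6) | (byte & 0x3F)
    # Pixels pushed past the left edge are simply lost.
    value = (value << n) & ((1 << (6 * width)) - 1)
    return [0x40 | ((value >> (6 * (width - 1 - i))) & 0x3F) for i in range(width)]


row = [0x41, 0x60]                             # pixels 000001 100000
print([hex(b) for b in shift_row_left_1(row)]) # ['0x43', '0x40']
print([hex(b) for b in shift_row_left(row, 2)])
```

Shifting twice by one pixel and shifting once by two give the same bytes, which is the property a table-driven or preshifted routine has to reproduce.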

We don't have a hardware shifter like there was on the early Midway arcade machines (Space Invaders, etc). So, an obvious fast technique is to store all objects (and their masks) six times, once per shift position. This means that a single 24x40 sprite would require something like 2400 bytes (5 bytes per line x 40 lines x 2 for graphic and mask x 6 shifts)... 15 sprites and you eat all the memory...
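
The arithmetic behind that figure can be written down as a tiny Python helper. This is only a sketch: `preshifted_size` is a made-up name, and it assumes one extra byte per line to hold the pixels pushed out by the shift.

```python
import math


def preshifted_size(width_px, height, shifts=6, with_mask=True):
    """Bytes needed to store a sprite preshifted at every pixel offset."""
    bytes_per_line = math.ceil(width_px / 6) + 1   # +1 byte for the shifted-out pixels
    planes = 2 if with_mask else 1                 # graphic plus mask
    return bytes_per_line * height * planes * shifts


print(preshifted_size(24, 40))        # 5 * 40 * 2 * 6 = 2400
print(15 * preshifted_size(24, 40))   # 36000
```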

So, come on, share your ideas for fast hires shift techniques on the Oric...

Cheers,

Fabrice

Post by Chema »

This is indeed a good point. What I am currently doing is (basically):

Code:

sps_noinvert
    lda (tmp4),y      ; Take scan of sprite graphic
    and #$bf           ; remove bit 6 
    sta reg10
    lda (tmp5),y       ; Take scan of mask
    eor #$bf            ; Complement it
    sta reg13
Obviously both the AND and the EOR could be saved if 1/ bit 6 were not set, as pictconv currently does, and 2/ I remembered to store the masks already complemented. I just need to reconvert *all* the graphics again... will do at some point.

reg* and tmp* are page zero addresses.

Then the rotation loop

Code:

sps_rotate
    ;; Initializations for rotation loop
    lda #0
    sta reg11   ; This is for the graphic
    lda #$ff    ; This (reg a) is for the mask
  
    ldx tmp1
    beq end_rot

sps_looprot
    lsr reg10
    ror reg11
    
    sec
    ror reg13
    ror 
   
    dex
    bne sps_looprot
end_rot
Now we have rotated both the mask and the graphic and stored the remains (what is not going to be painted in this scan) for future use.

The next step is to skip painting when it is unnecessary (outside the clip box), even though the rotation may still be needed to obtain what I called the "remains".

Code:

    cpy scans_to_draw
    bpl sps_nopaint
Now put it all together and prepare the next loop.

Code:

    ;; As a scan is made of the 6 least significant bits
    ;; of the byte, we should rotate two more times...
    ror 
    ror     
    lsr reg11
    ror reg11
   
    ;; Now we have:
    ;;  For the graphic:    
    ;;      in reg11 the new value for this scan
    ;;      in reg12 the value calculated previously for this scan (to be ORed)
    ;;      in reg10 the value to be ORed to the next scan to be calculated
    ;;  For the mask:
    ;;      in a the new value for this scan
    ;;      in reg15 the value calculated previously for this scan (to be ORed)
    ;;      in reg13 the value to be ORed to the next scan to be calculated

    ; mask is already in a
    and reg15       ; Complete mask for this scan
    ora tmp+1
    and tmp         ; Mask AND screen
    sta tmp
    lda tmp+1
    eor #$3f
    sta tmp+1
    lda reg11       ; OR Graphic    
    ora reg12       ;   Complete graphic
    and tmp+1
    ora tmp

    sta (tmp0),y    ; Put everything in screen

sps_nopaint
    
    ldx reg10
    stx reg12       
    ldx reg13
    stx reg15       ; Ready for next loop

   [...]
Oh, btw, in tmp+1 (I think) there is _another_ mask needed for correctly rendering objects that may lie between two tiles.

In addition, this function can invert graphics *on the fly* (using a table), so a door in the N/S walls uses the same graphic as a door in the E/W walls, and the same goes for characters, which saves a lot of space.

I am sure quite a few things can be improved here; unrolling the rotation loop is the most notable, as we will always rotate in pairs (0, 2 or 4 times).

Oh well... looking at this old code makes me tremble each time I remember I have to clean it up....
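
The idea of rotating graphic and mask together and carrying the "remains" into the next byte can be modelled on the host. The Python sketch below is not a transcription of the routine above: the extra tile mask kept in tmp+1 and the on-the-fly inversion are left out, the function names are made up, and it assumes a mask where a 1 bit means "keep the background" and bit 5 is the leftmost pixel.

```python
def shift_scan(graphic, mask, n):
    """Shift one sextet of graphic and mask right by n pixels (0..5).

    Returns (g_here, g_remains, m_here, m_remains). Zeros are shifted
    into the graphic and ones into the mask, like the sec/ror pair.
    """
    g_wide = (graphic & 0x3F) << (6 - n)              # 12-bit window, 0s shifted in
    m_wide = ~((~mask & 0x3F) << (6 - n)) & 0xFFF     # 12-bit window, 1s shifted in
    return g_wide >> 6, g_wide & 0x3F, m_wide >> 6, m_wide & 0x3F


def draw_masked_row(screen, graphics, masks, n):
    """Draw one sprite row over screen bytes; screen needs len(graphics) + 1 bytes."""
    out = list(screen)
    g_prev, m_prev = 0x00, 0x3F        # nothing carried into the first byte
    for i, (g, m) in enumerate(zip(graphics, masks)):
        g_here, g_rem, m_here, m_rem = shift_scan(g, m, n)
        g_byte = g_prev | g_here       # complete graphic for this byte
        m_byte = m_prev & m_here       # complete mask for this byte
        out[i] = 0x40 | (out[i] & m_byte & 0x3F) | g_byte
        g_prev, m_prev = g_rem, m_rem  # remains, used by the next byte
    last = len(graphics)
    out[last] = 0x40 | (out[last] & m_prev & 0x3F) | g_prev
    return out


# Opaque 6-pixel pattern 101010 drawn 2 pixels to the right over a filled background.
print([hex(b) for b in draw_masked_row([0x7F, 0x7F], [0b101010], [0b000000], 2)])
```

The background pixels on both sides of the sprite survive because ones, not zeros, are shifted into the mask.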

Post by Dbug »

Chema wrote: Obviously both the AND and the EOR could be saved if 1/ bit 6 were not set, as pictconv currently does, and 2/ I remembered to store the masks already complemented. I just need to reconvert *all* the graphics again... will do at some point.
Well, just tell me what you need in pictconv; if you need a flag that does not set the value of bit 6, I can do it :)
Any other requests ?

Are you just using PictConv, or also the whole OSDK including XA and RCC16 compiler ?

About the shifting, well, I'm basically what Twilighte would call, hum, I forgot, but it's not pretty :)

Basically I never rotate things in real time, I just use tables. I don't usually preshift all the graphics (well, I do it when I have time); I just use two 64-byte tables that contain, for each possible bit pattern, its pre-shifted form spread over two bytes.

Since we have only 6 usable bits for the graphics, it means that we have only 64 different bitmap patterns:

xx000000
xx000001
xx000010
xx000011
...
xx111111

("xx" should in our particular case always be forced to 00; otherwise we would need 256 entries instead of 64)

So let's say we want to shift "xx111111" by one bit to the right, it means that we are moving from this:

xx111111 xx000000

to that:

xx011111 xx100000

So for one 6-bit value, we generate two 6-bit values. Hence the need for two 64-entry tables, one byte per entry.

The code becomes quite simple:

Code:

ldx Sprite+0
lda TableShift1_Left,x
sta Screen+0
lda TableShift1_Right,x
sta Screen+1
Now of course it is not quite that simple, because our sprite is more than one byte wide: all the bytes have to be shifted and merged accordingly.

If our sprite is like this:

xx101111 xx111101

after shifting one bit to the right we should get that:

xx010111 xx111110 xx100000

Using our two 64-entry tables, we find the two-byte combination for each byte of our sprite:
xx101111 => xx010111 xx100000
xx111101 => xx011110 xx100000

and they have to be assembled this way:
xx010111 xx100000
xx011110 xx100000
====================
xx010111 xx111110 xx100000

So our code for the two-byte sprite becomes something like this:

Code:

ldx Sprite+0
lda TableShift1_Left,x
sta Screen+0
lda TableShift1_Right,x
ldx Sprite+1
ora TableShift1_Left,x
sta Screen+1
lda TableShift1_Right,x
sta Screen+2
Of course, in the 64-byte tables you have to set the two top bits to something sensible, for example "01" if you want to draw directly on screen :)
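
The two tables are easy to generate on the host. Here is a minimal Python sketch, assuming a right shift of n pixels (0 to 5) and "01" in the two top bits; `make_shift_tables` and `draw_row` are illustrative names, not part of any existing tool. `draw_row` does the same merge as the ldx/lda/ora sequence above.

```python
def make_shift_tables(n, top_bits=0x40):
    """Build the left/right tables for a right shift of n pixels (0..5)."""
    left, right = [], []
    for pattern in range(64):
        wide = pattern << (6 - n)                 # 12-bit window holding both sextets
        left.append(top_bits | (wide >> 6))       # what stays in the same byte
        right.append(top_bits | (wide & 0x3F))    # what spills into the next byte
    return left, right


def draw_row(sprite, left, right):
    """Shift a row of sprite bytes through the tables, merging neighbours with OR."""
    out = []
    carry = 0
    for byte in sprite:
        x = byte & 0x3F                           # ldx Sprite+i
        out.append(carry | left[x])               # ora TableShift_Left,x
        carry = right[x]                          # lda TableShift_Right,x
    out.append(carry)                             # last spill byte
    return out


left1, right1 = make_shift_tables(1)
# xx101111 xx111101 -> xx010111 xx111110 xx100000 (with "01" as top bits)
print([hex(b) for b in draw_row([0b101111, 0b111101], left1, right1)])
# ['0x57', '0x7e', '0x60']
```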

Post by Chema »

Dbug wrote: Well, just tell me what you need in pictconv; if you need a flag that does not set the value of bit 6, I can do it :)
Any other requests ?
That would be very nice indeed, thanks! I also considered writing a program that runs pictconv on a definition file, so that all the graphics are converted and the right files (data + structures + some #defines) are created, ready to be used directly in WHITE+NOISE.
Dbug wrote: Are you just using PictConv, or also the whole OSDK including XA and RCC16 compiler ?
I am using the whole OSDK indeed, from within UltraEdit, which is very, very comfortable.
Dbug wrote: Basically I never rotate things in real time, I just use tables.
[...]
This is an excellent idea! It is quite difficult to think in terms of look-up tables all the time when you are used to modern systems and compilers.

I am terrible at counting cycles: how big would the improvement be? I know rotating is quite slow, but if you have to fetch something and store it for each rotation, isn't that slower in the end?

Maybe we could benefit from the fact that rotations are mostly done in pairs, so you can have two tables for each scan value: one with 2 rotations and another with 4.

Cheers

Post by Dbug »

Chema wrote:
Dbug wrote: Well, just tell me what you need in pictconv; if you need a flag that does not set the value of bit 6, I can do it :)
Any other requests ?
That would be very nice indeed, thanks! I also considered writing a program that runs pictconv on a definition file, so that all the graphics are converted and the right files (data + structures + some #defines) are created, ready to be used directly in WHITE+NOISE.
Yep, that's what I'm doing in my programs. I have an "osdk_makedata.bat" file that does a bunch of data conversions, like this:

Code:

@ECHO OFF
%OSDK%\bin\PictConv -f0 -d0 -o2 picture\connectors_bw.png %OSDK%\tmp\picture.hir
%OSDK%\bin\FilePack -p %OSDK%\tmp\picture.hir %OSDK%\tmp\picture.pak
%OSDK%\bin\FilePack -u %OSDK%\tmp\picture.pak %OSDK%\tmp\pictureu.hir 
%OSDK%\bin\Bin2Txt -s1 -f2 %OSDK%\tmp\picture.pak picture.s _LabelPicture
I will try to release a new version of PictConv ASAP with the following two improvements:
- A flag to control whether bit 6 of the generated data is set or not
- The possibility to replace _LabelPicture with a user-defined name when exporting in source code mode.

Chema wrote: I am using the whole OSDK indeed, from within UltraEdit, which is very, very comfortable.
Cool :) One more user, one!
Seriously, don't hesitate to provide comments, bug reports or fixes for any part of the toolset. The whole idea is to get something that keeps getting better.
Chema wrote: I am terrible at counting cycles: how big would the improvement be? I know rotating is quite slow, but if you have to fetch something and store it for each rotation, isn't that slower in the end?
Well, I can see a number of reasons that make it interesting to use:
- You don't have to deal with masking bits 6 and 7 any more (because that is in the table data)
- The CPU time is the same whatever the amount of shift you perform, because you can have 6 sets of tables, one for each number of bits to rotate, for a grand total of 6x64x2=768 bytes :)
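
To put rough numbers on the cycle question, here is a back-of-the-envelope estimate in Python. It is a sketch only: it uses the usual NMOS 6502 base timings, assumes no page crossings, counts just the shifting part (not the merge with the screen), and assumes the mask would go through its own pair of tables.

```python
# Base cycle counts for the NMOS 6502, no page crossing assumed.
CYCLES = {
    "lsr zp": 5, "ror zp": 5, "ror a": 2, "sec": 2, "dex": 2,
    "bne taken": 3, "bne not taken": 2,
    "ldx abs": 4, "ora abs,x": 4, "lda abs,x": 4, "sta abs": 4,
}


def rotate_loop_cycles(n):
    """sps_looprot for n single-bit rotations (graphic and mask together)."""
    if n == 0:
        return 0
    body = sum(CYCLES[op] for op in
               ("lsr zp", "ror zp", "sec", "ror zp", "ror a", "dex"))
    return n * body + (n - 1) * CYCLES["bne taken"] + CYCLES["bne not taken"]


def table_cycles_per_byte(planes=2):
    """ldx Sprite+i / ora Left,x / sta Screen+i / lda Right,x, once per plane."""
    per_plane = sum(CYCLES[op] for op in
                    ("ldx abs", "ora abs,x", "sta abs", "lda abs,x"))
    return planes * per_plane


for n in (2, 4):
    print(n, "bits:", rotate_loop_cycles(n), "cycles rotating vs",
          table_cycles_per_byte(), "cycles with tables")
print("table storage:", 6 * 64 * 2, "bytes")
```

Under these assumptions the table version costs the same per byte for every shift amount, while the rotation loop grows with the number of bits.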

Post by Twilighte »

One interesting proposition is to use the 3:1 aspect ratio format instead of 2:1. This way you get slightly cruder movement (steps of 3 instead of 2), but the benefit (if using tables) would be speed when shifting to the correct position (only one shift process is ever done).
Apologies if I keep harping on about the 3:1 aspect ratio. I do realise it is a very odd aspect, since no other 8-bit (and possibly 16-bit) machine could achieve the same aspect as the Oric's 6-pixel-wide byte.
However, this aspect (so far) keeps coming back with advantages. :)

Post by Twilighte »

After that previous post I seem to have killed the topic a bit. However, I was recently trying to work out how to shift by 3 bits without slowing down the CPU too much.
I don't completely understand what was said above, but here is my two pennies' worth of code (perhaps it could be adapted to Chema's stuff?)

Code:

shift_3bitright
	;
	; Convert object width to gfx buffer width (object width + 1)
	;
	ldx object_width
	inx
	stx gfxbuffer_width
	;
	; Fetch gfx buffer base address
	;
	lda #<GFXBuffer
	sta gfx_buffer
	lda #>GFXBuffer
	sta gfx_buffer+1
	;
	; Fetch and store gfx buffer height
	;
	lda object_height
	sta row_counter
loop2	;
	; Reset column index
	;
	ldy #00
	;
	; Reset previous column's byte for each new row
	; (it is kept in the low byte of the table address at vector1)
	;
	ldx #00
	stx vector1+1
loop1	;
	; Fetch the current byte and keep a copy in X
	;
	lda (gfx_buffer),y
	tax
	;
	; Using byte from previous column, table converts Bits 0,1,2 to 3,4,5
	; (bot2top_shifttable must start on a page boundary for this to work)
	;
vector1	lda bot2top_shifttable		;B0-2(previous) >> B3-5   (64 bytes)
	;
	; Using the current byte, table converts 3,4,5 to 0,1,2 which is combined
	; Table also adds b6 (Normal bitmap flag)
	;
	ora top2bot_shifttable,x	;B3-5(X) >> B0-2 (+B6)	  (64 Bytes)
	;
	; Store back in current byte of Graphic
	;
	sta (gfx_buffer),y
	;
	; Current byte becomes the previous column's byte for the next column
	;
	stx vector1+1
	;
	; Advance to next column
	;
	iny
	;
	; Until end
	;
	cpy gfxbuffer_width
	bcc loop1
	;
	; Now move onto next row
	;
	lda gfx_buffer
	clc
	adc gfxbuffer_width
	sta gfx_buffer
	bcc skip1
	inc gfx_buffer+1
skip1	;
	; Decrement row counter
	;
	dec row_counter
	;
	; Until zero
	;
	bne loop2
	;
	; All Done
	;
	rts
Please note Graphics are stored to GFX buffer without b6 set.
And here are those all important 64 byte tables...

Code:

bot2top_shifttable	;B0-2 >> B3-5             (64 bytes)
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
 .byt 0,8,16,24,32,40,48,56
top2bot_shifttable	;B3-5(X) >> B0-2 (+B6)	  (64 Bytes)
 .byt 0+64,0+64,0+64,0+64,0+64,0+64,0+64,0+64
 .byt 1+64,1+64,1+64,1+64,1+64,1+64,1+64,1+64
 .byt 2+64,2+64,2+64,2+64,2+64,2+64,2+64,2+64
 .byt 3+64,3+64,3+64,3+64,3+64,3+64,3+64,3+64
 .byt 4+64,4+64,4+64,4+64,4+64,4+64,4+64,4+64
 .byt 5+64,5+64,5+64,5+64,5+64,5+64,5+64,5+64
 .byt 6+64,6+64,6+64,6+64,6+64,6+64,6+64,6+64
 .byt 7+64,7+64,7+64,7+64,7+64,7+64,7+64,7+64
This routine would benefit slightly by being stored in zero page (just a tad faster).
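
For anyone who wants to check a 3-pixel right shift on the host before trying it on the Oric, here is a small Python model built on the same two tables. It is a sketch under assumptions: bit 5 is taken as the leftmost pixel (as in the xx011111 xx100000 example earlier in the thread), buffer bytes have b6 clear, the last byte of each row is the spare column, and `shift_row_right_3` is a made-up name. With that bit order, the high three bits of the current byte move down and the low three bits of its left neighbour move up.

```python
# Same contents as the two 64-byte tables listed above.
BOT2TOP = [(i & 0x07) << 3 for i in range(64)]      # B0-2 >> B3-5
TOP2BOT = [(i >> 3) | 0x40 for i in range(64)]      # B3-5 >> B0-2 (+B6)


def shift_row_right_3(row):
    """Shift one buffer row right by 3 pixels; row includes the spare last byte."""
    out = []
    prev = 0                                        # nothing to the left of column 0
    for cur in row:
        out.append(BOT2TOP[prev] | TOP2BOT[cur])
        prev = cur                                  # original byte, not the shifted one
    return out


# 111000 000111 000000 -> 000111 000000 111000 (b6 added on output)
print([hex(b) for b in shift_row_right_3([0b111000, 0b000111, 0b000000])])
# ['0x47', '0x40', '0x78']
```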