Not-so-ultra fast division routine

Here you can ask questions or provide insights about how to use efficiently 6502 assembly code on the Oric.
Post Reply
User avatar
Chema
Game master
Posts: 1974
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:

Not-so-ultra fast division routine

Post by Chema » Tue Apr 15, 2008 2:48 pm

Greetings.

I have also adapted S. Judd's division routine (just copied it, really... little modifications needed).

It can divide two 16-bit numbers and generate a result of 8-bit integer division and the remainder as a fraction of 256 (so .5 is 128).

He is using a predictor/corrector method, first estimating the integer result with a table of logs, then adjusting it with two tables, one of exp and another of negative exp.

He also employs the fast multiplication routine I also posted before.

Some figures now. Running a loop like:

Code: Select all

for (i=10;i<1000;i+=10) 
     for (j=10;j<1000;j+=10)
        k=(int)(i/j);
Takes 1130 100ths of a second. Using this routine (also from C) it takes 388. Not sure how division is implemented in the C library, though.

Here is the code:

Code: Select all

.zero
DIVXLO   .byt 00           ;Division: DIVX/DIVY
DIVXHI   .byt 00
DIVY     .byt 00
DIVTEMP  .byt 00           ;High byte of DY
TEMP1    .byt 00
TEMP2    .byt 00

.text
DIVSHIFT
.( 
         LSR DIVTEMP
         ROR DIVY
         LSR DIVXHI
         ROR DIVXLO
.)
DIVXY
.(   
         LDA DIVTEMP      ;Div by 2 if dy>255
         BNE DIVSHIFT

         LDA DIVXHI
         CMP #2
         BCS DIVSHIFT     ;Or if dx>511

         LSR              ;Compute dx/2
         LDA DIVXLO
         ROR
         TAX
         LDA DIVY
         LSR              ;dy/2
         BEQ TWOSTEP     ;If Y=1 then handle special

         TAY
         LDA tab_log,X        ;This is the division part
         SEC
         SBC tab_log,Y
         BCC NEG
         TAX
         LDA tab_exp,X
         TAX              ;Now we have int estimate

         STA MultLo1
         STA MultHi1
         EOR #$FF
         ADC #00          ;Carry is guaranteed set
         STA MultLo2
         STA MultHi2
         LDY DIVY
         LDA (MultLo1),Y
         SEC
         SBC (MultLo2),Y  ;a=N*dy
         STA TEMP1
         LDA (MultHi1),Y
         SBC (MultHi2),Y
         STA TEMP2

         LDA DIVXLO       ;R=dx-a
         SBC TEMP1        ;C set
         STA TEMP1
         LDA DIVXHI
         SBC TEMP2
         LDA TEMP1        ;A=remainder
         BCC RNEG
                          ;If R>0 then assume R<255
                          ;(true unless dx>500 or so)

RPOS     CMP DIVY         ;If R>=dy then
         BCC DONE
L1       INX              ;a=a+1
         SBC DIVY         ;R=R-dy
         CMP DIVY
         BCS L1
DONE                      ;Now X contains integer, A rem
                          ;y=dy
;
; Compute remainder as a fraction of 256, i.e.
; 256*r/dy
;
; Currently, a small error may occur for large r
; (cumulative error of 1-2 pixels, up to 4 in rare cases)
;
FRACREM 
         STX TEMP1
         TAX
         BEQ ZERO
         LDA tab_log,X
         SEC
         SBC tab_log,Y
         TAX
         LDA tab_negexp,X
ZERO     LDX TEMP1
         RTS
                          ;And, if R<0 then assume
                          ;R>-255
RNEG     DEX
         ADC DIVY
         BCC RNEG
         JMP FRACREM

NEG      LDX #00          ;Since log is monotonic, and
         LDA DIVXLO       ;we /2, there is no chance
         LDY DIVY
         JMP FRACREM     ;of undershooting.

TWOSTEP  LDA DIVXHI       ;If Y=1
         LSR
         LDA DIVXLO       ;then just two steps of size
         ROR              ;dx/2
         TAX
         LDA #255
         RTS
.)
And the tables:

Code: Select all

tab_log
    .byt  $00,$00,$1f,$32,$3f,$4a,$52,$59,$5f,$65,$69,$6e,$72,$76,$79,$7c
    .byt  $7f,$82,$85,$87,$89,$8c,$8e,$90,$92,$94,$95,$97,$99,$9a,$9c,$9e
    .byt  $9f,$a0,$a2,$a3,$a4,$a6,$a7,$a8,$a9,$aa,$ac,$ad,$ae,$af,$b0,$b1
    .byt  $b2,$b3,$b4,$b4,$b5,$b6,$b7,$b8,$b9,$ba,$ba,$bb,$bc,$bd,$bd,$be
    .byt  $bf,$c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4,$c5,$c6,$c6,$c7,$c7,$c8,$c9
    .byt  $c9,$ca,$ca,$cb,$cb,$cc,$cc,$cd,$ce,$ce,$cf,$cf,$d0,$d0,$d1,$d1
    .byt  $d2,$d2,$d2,$d3,$d3,$d4,$d4,$d5,$d5,$d6,$d6,$d7,$d7,$d7,$d8,$d8
    .byt  $d9,$d9,$d9,$da,$da,$db,$db,$db,$dc,$dc,$dd,$dd,$dd,$de,$de,$de
    .byt  $df,$df,$df,$e0,$e0,$e1,$e1,$e1,$e2,$e2,$e2,$e3,$e3,$e3,$e4,$e4
    .byt  $e4,$e5,$e5,$e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7,$e8,$e8,$e8,$e8,$e9
    .byt  $e9,$e9,$ea,$ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec,$ec,$ec,$ed,$ed,$ed
    .byt  $ed,$ee,$ee,$ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0,$f0,$f0,$f1,$f1,$f1
    .byt  $f1,$f2,$f2,$f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4,$f4,$f4,$f4,$f5,$f5
    .byt  $f5,$f5,$f6,$f6,$f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7,$f8,$f8,$f8,$f8
    .byt  $f9,$f9,$f9,$f9,$f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb,$fb,$fb,$fb,$fc
    .byt  $fc,$fc,$fc,$fc,$fc,$fd,$fd,$fd,$fd,$fd,$fe,$fe,$fe,$fe,$fe,$ff

tab_exp
    .byt  $01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01
    .byt  $01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01
    .byt  $02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02
    .byt  $02,$02,$02,$03,$03,$03,$03,$03,$03,$03,$03,$03,$03,$03,$03,$03
    .byt  $04,$04,$04,$04,$04,$04,$04,$04,$04,$04,$04,$05,$05,$05,$05,$05
    .byt  $05,$05,$05,$06,$06,$06,$06,$06,$06,$06,$07,$07,$07,$07,$07,$07
    .byt  $08,$08,$08,$08,$08,$08,$09,$09,$09,$09,$0a,$0a,$0a,$0a,$0a,$0b
    .byt  $0b,$0b,$0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d,$0e,$0e,$0e,$0f,$0f,$0f
    .byt  $10,$10,$10,$11,$11,$11,$12,$12,$13,$13,$14,$14,$14,$15,$15,$16
    .byt  $16,$17,$17,$18,$18,$19,$1a,$1a,$1b,$1b,$1c,$1d,$1d,$1e,$1e,$1f
    .byt  $20,$21,$21,$22,$23,$24,$24,$25,$26,$27,$28,$29,$29,$2a,$2b,$2c
    .byt  $2d,$2e,$2f,$30,$31,$33,$34,$35,$36,$37,$38,$3a,$3b,$3c,$3e,$3f
    .byt  $40,$42,$43,$45,$46,$48,$49,$4b,$4d,$4e,$50,$52,$54,$56,$57,$59
    .byt  $5b,$5d,$5f,$62,$64,$66,$68,$6a,$6d,$6f,$72,$74,$77,$79,$7c,$7f
    .byt  $82,$84,$87,$8a,$8d,$90,$94,$97,$9a,$9e,$a1,$a5,$a8,$ac,$b0,$b4
    .byt  $b8,$bc,$c0,$c4,$c8,$cd,$d1,$d6,$db,$df,$e4,$e9,$ee,$f4,$f9,$fe


tab_negexp
    .byt  $fe,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01,$01
    .byt  $01,$01,$01,$01,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02
    .byt  $02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$02,$03,$03,$03,$03,$03
    .byt  $03,$03,$03,$03,$03,$03,$03,$03,$03,$03,$03,$04,$04,$04,$04,$04
    .byt  $04,$04,$04,$04,$04,$04,$04,$05,$05,$05,$05,$05,$05,$05,$05,$05
    .byt  $06,$06,$06,$06,$06,$06,$06,$07,$07,$07,$07,$07,$07,$07,$08,$08
    .byt  $08,$08,$08,$08,$09,$09,$09,$09,$09,$0a,$0a,$0a,$0a,$0a,$0b,$0b
    .byt  $0b,$0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d,$0e,$0e,$0e,$0f,$0f,$0f,$10
    .byt  $10,$10,$11,$11,$11,$12,$12,$12,$13,$13,$14,$14,$15,$15,$15,$16
    .byt  $16,$17,$17,$18,$18,$19,$1a,$1a,$1b,$1b,$1c,$1d,$1d,$1e,$1e,$1f
    .byt  $20,$20,$21,$22,$23,$23,$24,$25,$26,$27,$28,$28,$29,$2a,$2b,$2c
    .byt  $2d,$2e,$2f,$30,$31,$32,$33,$34,$36,$37,$38,$39,$3a,$3c,$3d,$3e
    .byt  $40,$41,$43,$44,$46,$47,$49,$4a,$4c,$4d,$4f,$51,$53,$55,$56,$58
    .byt  $5a,$5c,$5e,$60,$62,$65,$67,$69,$6b,$6e,$70,$73,$75,$78,$7a,$7d
    .byt  $80,$83,$85,$88,$8b,$8e,$92,$95,$98,$9b,$9f,$a2,$a6,$a9,$ad,$b1
    .byt  $b5,$b9,$bd,$c1,$c5,$ca,$ce,$d3,$d7,$dc,$e1,$e6,$eb,$f0,$f5,$fa

User avatar
Twilighte
Game master
Posts: 819
Joined: Sat Jan 07, 2006 12:07 am
Location: Luton, UK
Contact:

Post by Twilighte » Tue Apr 15, 2008 5:14 pm

This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.

User avatar
Chema
Game master
Posts: 1974
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:

Post by Chema » Tue Apr 15, 2008 5:41 pm

Twilighte wrote:This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.
Yep. It is not *really* a huge gain (3 times faster than the routine in the C library), but it is quite adequate. I would love to have something more efficient.

BTW I forgot to mention that it was designed for divisions that might fit in 8-bit. In fact, originally, it worked only for 9-bit numbers divided by 8-bit numbers, but it can handle larger numbers now, as it shiftes them (divide both by 2) until X<512 and Y<256... So it is not general.

Judd uses it in its filled polygon routine, so it should serve at least for that objective, as well as for clipping lines (looks at Dbug).

Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest