Not-so-ultra fast division routine

Here you can ask questions or provide insights about how to use efficiently 6502 assembly code on the Oric.
Chema
Game master
Posts: 2247
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:

Not-so-ultra fast division routine

Greetings.

I have also adapted S. Judd's division routine (just copied it, really... little modifications needed).

It can divide two 16-bit numbers and generate a result of 8-bit integer division and the remainder as a fraction of 256 (so .5 is 128).

He is using a predictor/corrector method, first estimating the integer result with a table of logs, then adjusting it with two tables, one of exp and another of negative exp.

He also employs the fast multiplication routine I also posted before.

Some figures now. Running a loop like:

Code: Select all

``````for (i=10;i<1000;i+=10)
for (j=10;j<1000;j+=10)
k=(int)(i/j);
``````
Takes 1130 100ths of a second. Using this routine (also from C) it takes 388. Not sure how division is implemented in the C library, though.

Here is the code:

Code: Select all

``````.zero
DIVXLO   .byt 00           ;Division: DIVX/DIVY
DIVXHI   .byt 00
DIVY     .byt 00
DIVTEMP  .byt 00           ;High byte of DY
TEMP1    .byt 00
TEMP2    .byt 00

.text
DIVSHIFT
.(
LSR DIVTEMP
ROR DIVY
LSR DIVXHI
ROR DIVXLO
.)
DIVXY
.(
LDA DIVTEMP      ;Div by 2 if dy>255
BNE DIVSHIFT

LDA DIVXHI
CMP #2
BCS DIVSHIFT     ;Or if dx>511

LSR              ;Compute dx/2
LDA DIVXLO
ROR
TAX
LDA DIVY
LSR              ;dy/2
BEQ TWOSTEP     ;If Y=1 then handle special

TAY
LDA tab_log,X        ;This is the division part
SEC
SBC tab_log,Y
BCC NEG
TAX
LDA tab_exp,X
TAX              ;Now we have int estimate

STA MultLo1
STA MultHi1
EOR #\$FF
ADC #00          ;Carry is guaranteed set
STA MultLo2
STA MultHi2
LDY DIVY
LDA (MultLo1),Y
SEC
SBC (MultLo2),Y  ;a=N*dy
STA TEMP1
LDA (MultHi1),Y
SBC (MultHi2),Y
STA TEMP2

LDA DIVXLO       ;R=dx-a
SBC TEMP1        ;C set
STA TEMP1
LDA DIVXHI
SBC TEMP2
LDA TEMP1        ;A=remainder
BCC RNEG
;If R>0 then assume R<255
;(true unless dx>500 or so)

RPOS     CMP DIVY         ;If R>=dy then
BCC DONE
L1       INX              ;a=a+1
SBC DIVY         ;R=R-dy
CMP DIVY
BCS L1
DONE                      ;Now X contains integer, A rem
;y=dy
;
; Compute remainder as a fraction of 256, i.e.
; 256*r/dy
;
; Currently, a small error may occur for large r
; (cumulative error of 1-2 pixels, up to 4 in rare cases)
;
FRACREM
STX TEMP1
TAX
BEQ ZERO
LDA tab_log,X
SEC
SBC tab_log,Y
TAX
LDA tab_negexp,X
ZERO     LDX TEMP1
RTS
;And, if R<0 then assume
;R>-255
RNEG     DEX
BCC RNEG
JMP FRACREM

NEG      LDX #00          ;Since log is monotonic, and
LDA DIVXLO       ;we /2, there is no chance
LDY DIVY
JMP FRACREM     ;of undershooting.

TWOSTEP  LDA DIVXHI       ;If Y=1
LSR
LDA DIVXLO       ;then just two steps of size
ROR              ;dx/2
TAX
LDA #255
RTS
.)
``````
And the tables:

Code: Select all

``````tab_log
.byt  \$00,\$00,\$1f,\$32,\$3f,\$4a,\$52,\$59,\$5f,\$65,\$69,\$6e,\$72,\$76,\$79,\$7c
.byt  \$7f,\$82,\$85,\$87,\$89,\$8c,\$8e,\$90,\$92,\$94,\$95,\$97,\$99,\$9a,\$9c,\$9e
.byt  \$b2,\$b3,\$b4,\$b4,\$b5,\$b6,\$b7,\$b8,\$b9,\$ba,\$ba,\$bb,\$bc,\$bd,\$bd,\$be
.byt  \$bf,\$c0,\$c0,\$c1,\$c2,\$c2,\$c3,\$c4,\$c4,\$c5,\$c6,\$c6,\$c7,\$c7,\$c8,\$c9
.byt  \$c9,\$ca,\$ca,\$cb,\$cb,\$cc,\$cc,\$cd,\$ce,\$ce,\$cf,\$cf,\$d0,\$d0,\$d1,\$d1
.byt  \$d2,\$d2,\$d2,\$d3,\$d3,\$d4,\$d4,\$d5,\$d5,\$d6,\$d6,\$d7,\$d7,\$d7,\$d8,\$d8
.byt  \$d9,\$d9,\$d9,\$da,\$da,\$db,\$db,\$db,\$dc,\$dc,\$dd,\$dd,\$dd,\$de,\$de,\$de
.byt  \$df,\$df,\$df,\$e0,\$e0,\$e1,\$e1,\$e1,\$e2,\$e2,\$e2,\$e3,\$e3,\$e3,\$e4,\$e4
.byt  \$e4,\$e5,\$e5,\$e5,\$e5,\$e6,\$e6,\$e6,\$e7,\$e7,\$e7,\$e8,\$e8,\$e8,\$e8,\$e9
.byt  \$e9,\$e9,\$ea,\$ea,\$ea,\$ea,\$eb,\$eb,\$eb,\$ec,\$ec,\$ec,\$ec,\$ed,\$ed,\$ed
.byt  \$ed,\$ee,\$ee,\$ee,\$ee,\$ef,\$ef,\$ef,\$ef,\$f0,\$f0,\$f0,\$f0,\$f1,\$f1,\$f1
.byt  \$f1,\$f2,\$f2,\$f2,\$f2,\$f3,\$f3,\$f3,\$f3,\$f4,\$f4,\$f4,\$f4,\$f4,\$f5,\$f5
.byt  \$f5,\$f5,\$f6,\$f6,\$f6,\$f6,\$f6,\$f7,\$f7,\$f7,\$f7,\$f7,\$f8,\$f8,\$f8,\$f8
.byt  \$f9,\$f9,\$f9,\$f9,\$f9,\$fa,\$fa,\$fa,\$fa,\$fa,\$fb,\$fb,\$fb,\$fb,\$fb,\$fc
.byt  \$fc,\$fc,\$fc,\$fc,\$fc,\$fd,\$fd,\$fd,\$fd,\$fd,\$fe,\$fe,\$fe,\$fe,\$fe,\$ff

tab_exp
.byt  \$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02
.byt  \$02,\$02,\$02,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03
.byt  \$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$05,\$05,\$05,\$05,\$05
.byt  \$05,\$05,\$05,\$06,\$06,\$06,\$06,\$06,\$06,\$06,\$07,\$07,\$07,\$07,\$07,\$07
.byt  \$08,\$08,\$08,\$08,\$08,\$08,\$09,\$09,\$09,\$09,\$0a,\$0a,\$0a,\$0a,\$0a,\$0b
.byt  \$0b,\$0b,\$0b,\$0c,\$0c,\$0c,\$0c,\$0d,\$0d,\$0d,\$0e,\$0e,\$0e,\$0f,\$0f,\$0f
.byt  \$10,\$10,\$10,\$11,\$11,\$11,\$12,\$12,\$13,\$13,\$14,\$14,\$14,\$15,\$15,\$16
.byt  \$16,\$17,\$17,\$18,\$18,\$19,\$1a,\$1a,\$1b,\$1b,\$1c,\$1d,\$1d,\$1e,\$1e,\$1f
.byt  \$20,\$21,\$21,\$22,\$23,\$24,\$24,\$25,\$26,\$27,\$28,\$29,\$29,\$2a,\$2b,\$2c
.byt  \$2d,\$2e,\$2f,\$30,\$31,\$33,\$34,\$35,\$36,\$37,\$38,\$3a,\$3b,\$3c,\$3e,\$3f
.byt  \$40,\$42,\$43,\$45,\$46,\$48,\$49,\$4b,\$4d,\$4e,\$50,\$52,\$54,\$56,\$57,\$59
.byt  \$5b,\$5d,\$5f,\$62,\$64,\$66,\$68,\$6a,\$6d,\$6f,\$72,\$74,\$77,\$79,\$7c,\$7f
.byt  \$82,\$84,\$87,\$8a,\$8d,\$90,\$94,\$97,\$9a,\$9e,\$a1,\$a5,\$a8,\$ac,\$b0,\$b4
.byt  \$b8,\$bc,\$c0,\$c4,\$c8,\$cd,\$d1,\$d6,\$db,\$df,\$e4,\$e9,\$ee,\$f4,\$f9,\$fe

tab_negexp
.byt  \$fe,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$01,\$01,\$01,\$01,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02
.byt  \$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$03,\$03,\$03,\$03,\$03
.byt  \$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$04,\$04,\$04,\$04,\$04
.byt  \$04,\$04,\$04,\$04,\$04,\$04,\$04,\$05,\$05,\$05,\$05,\$05,\$05,\$05,\$05,\$05
.byt  \$06,\$06,\$06,\$06,\$06,\$06,\$06,\$07,\$07,\$07,\$07,\$07,\$07,\$07,\$08,\$08
.byt  \$08,\$08,\$08,\$08,\$09,\$09,\$09,\$09,\$09,\$0a,\$0a,\$0a,\$0a,\$0a,\$0b,\$0b
.byt  \$0b,\$0b,\$0c,\$0c,\$0c,\$0c,\$0d,\$0d,\$0d,\$0e,\$0e,\$0e,\$0f,\$0f,\$0f,\$10
.byt  \$10,\$10,\$11,\$11,\$11,\$12,\$12,\$12,\$13,\$13,\$14,\$14,\$15,\$15,\$15,\$16
.byt  \$16,\$17,\$17,\$18,\$18,\$19,\$1a,\$1a,\$1b,\$1b,\$1c,\$1d,\$1d,\$1e,\$1e,\$1f
.byt  \$20,\$20,\$21,\$22,\$23,\$23,\$24,\$25,\$26,\$27,\$28,\$28,\$29,\$2a,\$2b,\$2c
.byt  \$2d,\$2e,\$2f,\$30,\$31,\$32,\$33,\$34,\$36,\$37,\$38,\$39,\$3a,\$3c,\$3d,\$3e
.byt  \$40,\$41,\$43,\$44,\$46,\$47,\$49,\$4a,\$4c,\$4d,\$4f,\$51,\$53,\$55,\$56,\$58
.byt  \$5a,\$5c,\$5e,\$60,\$62,\$65,\$67,\$69,\$6b,\$6e,\$70,\$73,\$75,\$78,\$7a,\$7d
.byt  \$b5,\$b9,\$bd,\$c1,\$c5,\$ca,\$ce,\$d3,\$d7,\$dc,\$e1,\$e6,\$eb,\$f0,\$f5,\$fa
``````

Twilighte
Game master
Posts: 819
Joined: Sat Jan 07, 2006 12:07 am
Location: Luton, UK
Contact:
This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.

Chema
Game master
Posts: 2247
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:
Twilighte wrote:This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.
Yep. It is not *really* a huge gain (3 times faster than the routine in the C library), but it is quite adequate. I would love to have something more efficient.

BTW I forgot to mention that it was designed for divisions that might fit in 8-bit. In fact, originally, it worked only for 9-bit numbers divided by 8-bit numbers, but it can handle larger numbers now, as it shiftes them (divide both by 2) until X<512 and Y<256... So it is not general.

Judd uses it in its filled polygon routine, so it should serve at least for that objective, as well as for clipping lines (looks at Dbug).

Who is online

Users browsing this forum: No registered users and 3 guests