## Not-so-ultra fast division routine

Here you can ask questions or provide insights about how to use efficiently 6502 assembly code on the Oric.
Chema
Game master
Posts: 2380
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:

### Not-so-ultra fast division routine

Greetings.

I have also adapted S. Judd's division routine (just copied it, really... little modifications needed).

It can divide two 16-bit numbers and generate a result of 8-bit integer division and the remainder as a fraction of 256 (so .5 is 128).

He is using a predictor/corrector method, first estimating the integer result with a table of logs, then adjusting it with two tables, one of exp and another of negative exp.

He also employs the fast multiplication routine I also posted before.

Some figures now. Running a loop like:

Code: Select all

``````for (i=10;i<1000;i+=10)
for (j=10;j<1000;j+=10)
k=(int)(i/j);
``````
Takes 1130 100ths of a second. Using this routine (also from C) it takes 388. Not sure how division is implemented in the C library, though.

Here is the code:

Code: Select all

``````.zero
DIVXLO   .byt 00           ;Division: DIVX/DIVY
DIVXHI   .byt 00
DIVY     .byt 00
DIVTEMP  .byt 00           ;High byte of DY
TEMP1    .byt 00
TEMP2    .byt 00

.text
DIVSHIFT
.(
LSR DIVTEMP
ROR DIVY
LSR DIVXHI
ROR DIVXLO
.)
DIVXY
.(
LDA DIVTEMP      ;Div by 2 if dy>255
BNE DIVSHIFT

LDA DIVXHI
CMP #2
BCS DIVSHIFT     ;Or if dx>511

LSR              ;Compute dx/2
LDA DIVXLO
ROR
TAX
LDA DIVY
LSR              ;dy/2
BEQ TWOSTEP     ;If Y=1 then handle special

TAY
LDA tab_log,X        ;This is the division part
SEC
SBC tab_log,Y
BCC NEG
TAX
LDA tab_exp,X
TAX              ;Now we have int estimate

STA MultLo1
STA MultHi1
EOR #\$FF
ADC #00          ;Carry is guaranteed set
STA MultLo2
STA MultHi2
LDY DIVY
LDA (MultLo1),Y
SEC
SBC (MultLo2),Y  ;a=N*dy
STA TEMP1
LDA (MultHi1),Y
SBC (MultHi2),Y
STA TEMP2

LDA DIVXLO       ;R=dx-a
SBC TEMP1        ;C set
STA TEMP1
LDA DIVXHI
SBC TEMP2
LDA TEMP1        ;A=remainder
BCC RNEG
;If R>0 then assume R<255
;(true unless dx>500 or so)

RPOS     CMP DIVY         ;If R>=dy then
BCC DONE
L1       INX              ;a=a+1
SBC DIVY         ;R=R-dy
CMP DIVY
BCS L1
DONE                      ;Now X contains integer, A rem
;y=dy
;
; Compute remainder as a fraction of 256, i.e.
; 256*r/dy
;
; Currently, a small error may occur for large r
; (cumulative error of 1-2 pixels, up to 4 in rare cases)
;
FRACREM
STX TEMP1
TAX
BEQ ZERO
LDA tab_log,X
SEC
SBC tab_log,Y
TAX
LDA tab_negexp,X
ZERO     LDX TEMP1
RTS
;And, if R<0 then assume
;R>-255
RNEG     DEX
BCC RNEG
JMP FRACREM

NEG      LDX #00          ;Since log is monotonic, and
LDA DIVXLO       ;we /2, there is no chance
LDY DIVY
JMP FRACREM     ;of undershooting.

TWOSTEP  LDA DIVXHI       ;If Y=1
LSR
LDA DIVXLO       ;then just two steps of size
ROR              ;dx/2
TAX
LDA #255
RTS
.)
``````
And the tables:

Code: Select all

``````tab_log
.byt  \$00,\$00,\$1f,\$32,\$3f,\$4a,\$52,\$59,\$5f,\$65,\$69,\$6e,\$72,\$76,\$79,\$7c
.byt  \$7f,\$82,\$85,\$87,\$89,\$8c,\$8e,\$90,\$92,\$94,\$95,\$97,\$99,\$9a,\$9c,\$9e
.byt  \$b2,\$b3,\$b4,\$b4,\$b5,\$b6,\$b7,\$b8,\$b9,\$ba,\$ba,\$bb,\$bc,\$bd,\$bd,\$be
.byt  \$bf,\$c0,\$c0,\$c1,\$c2,\$c2,\$c3,\$c4,\$c4,\$c5,\$c6,\$c6,\$c7,\$c7,\$c8,\$c9
.byt  \$c9,\$ca,\$ca,\$cb,\$cb,\$cc,\$cc,\$cd,\$ce,\$ce,\$cf,\$cf,\$d0,\$d0,\$d1,\$d1
.byt  \$d2,\$d2,\$d2,\$d3,\$d3,\$d4,\$d4,\$d5,\$d5,\$d6,\$d6,\$d7,\$d7,\$d7,\$d8,\$d8
.byt  \$d9,\$d9,\$d9,\$da,\$da,\$db,\$db,\$db,\$dc,\$dc,\$dd,\$dd,\$dd,\$de,\$de,\$de
.byt  \$df,\$df,\$df,\$e0,\$e0,\$e1,\$e1,\$e1,\$e2,\$e2,\$e2,\$e3,\$e3,\$e3,\$e4,\$e4
.byt  \$e4,\$e5,\$e5,\$e5,\$e5,\$e6,\$e6,\$e6,\$e7,\$e7,\$e7,\$e8,\$e8,\$e8,\$e8,\$e9
.byt  \$e9,\$e9,\$ea,\$ea,\$ea,\$ea,\$eb,\$eb,\$eb,\$ec,\$ec,\$ec,\$ec,\$ed,\$ed,\$ed
.byt  \$ed,\$ee,\$ee,\$ee,\$ee,\$ef,\$ef,\$ef,\$ef,\$f0,\$f0,\$f0,\$f0,\$f1,\$f1,\$f1
.byt  \$f1,\$f2,\$f2,\$f2,\$f2,\$f3,\$f3,\$f3,\$f3,\$f4,\$f4,\$f4,\$f4,\$f4,\$f5,\$f5
.byt  \$f5,\$f5,\$f6,\$f6,\$f6,\$f6,\$f6,\$f7,\$f7,\$f7,\$f7,\$f7,\$f8,\$f8,\$f8,\$f8
.byt  \$f9,\$f9,\$f9,\$f9,\$f9,\$fa,\$fa,\$fa,\$fa,\$fa,\$fb,\$fb,\$fb,\$fb,\$fb,\$fc
.byt  \$fc,\$fc,\$fc,\$fc,\$fc,\$fd,\$fd,\$fd,\$fd,\$fd,\$fe,\$fe,\$fe,\$fe,\$fe,\$ff

tab_exp
.byt  \$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02
.byt  \$02,\$02,\$02,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03
.byt  \$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$04,\$05,\$05,\$05,\$05,\$05
.byt  \$05,\$05,\$05,\$06,\$06,\$06,\$06,\$06,\$06,\$06,\$07,\$07,\$07,\$07,\$07,\$07
.byt  \$08,\$08,\$08,\$08,\$08,\$08,\$09,\$09,\$09,\$09,\$0a,\$0a,\$0a,\$0a,\$0a,\$0b
.byt  \$0b,\$0b,\$0b,\$0c,\$0c,\$0c,\$0c,\$0d,\$0d,\$0d,\$0e,\$0e,\$0e,\$0f,\$0f,\$0f
.byt  \$10,\$10,\$10,\$11,\$11,\$11,\$12,\$12,\$13,\$13,\$14,\$14,\$14,\$15,\$15,\$16
.byt  \$16,\$17,\$17,\$18,\$18,\$19,\$1a,\$1a,\$1b,\$1b,\$1c,\$1d,\$1d,\$1e,\$1e,\$1f
.byt  \$20,\$21,\$21,\$22,\$23,\$24,\$24,\$25,\$26,\$27,\$28,\$29,\$29,\$2a,\$2b,\$2c
.byt  \$2d,\$2e,\$2f,\$30,\$31,\$33,\$34,\$35,\$36,\$37,\$38,\$3a,\$3b,\$3c,\$3e,\$3f
.byt  \$40,\$42,\$43,\$45,\$46,\$48,\$49,\$4b,\$4d,\$4e,\$50,\$52,\$54,\$56,\$57,\$59
.byt  \$5b,\$5d,\$5f,\$62,\$64,\$66,\$68,\$6a,\$6d,\$6f,\$72,\$74,\$77,\$79,\$7c,\$7f
.byt  \$82,\$84,\$87,\$8a,\$8d,\$90,\$94,\$97,\$9a,\$9e,\$a1,\$a5,\$a8,\$ac,\$b0,\$b4
.byt  \$b8,\$bc,\$c0,\$c4,\$c8,\$cd,\$d1,\$d6,\$db,\$df,\$e4,\$e9,\$ee,\$f4,\$f9,\$fe

tab_negexp
.byt  \$fe,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01,\$01
.byt  \$01,\$01,\$01,\$01,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02
.byt  \$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$02,\$03,\$03,\$03,\$03,\$03
.byt  \$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$03,\$04,\$04,\$04,\$04,\$04
.byt  \$04,\$04,\$04,\$04,\$04,\$04,\$04,\$05,\$05,\$05,\$05,\$05,\$05,\$05,\$05,\$05
.byt  \$06,\$06,\$06,\$06,\$06,\$06,\$06,\$07,\$07,\$07,\$07,\$07,\$07,\$07,\$08,\$08
.byt  \$08,\$08,\$08,\$08,\$09,\$09,\$09,\$09,\$09,\$0a,\$0a,\$0a,\$0a,\$0a,\$0b,\$0b
.byt  \$0b,\$0b,\$0c,\$0c,\$0c,\$0c,\$0d,\$0d,\$0d,\$0e,\$0e,\$0e,\$0f,\$0f,\$0f,\$10
.byt  \$10,\$10,\$11,\$11,\$11,\$12,\$12,\$12,\$13,\$13,\$14,\$14,\$15,\$15,\$15,\$16
.byt  \$16,\$17,\$17,\$18,\$18,\$19,\$1a,\$1a,\$1b,\$1b,\$1c,\$1d,\$1d,\$1e,\$1e,\$1f
.byt  \$20,\$20,\$21,\$22,\$23,\$23,\$24,\$25,\$26,\$27,\$28,\$28,\$29,\$2a,\$2b,\$2c
.byt  \$2d,\$2e,\$2f,\$30,\$31,\$32,\$33,\$34,\$36,\$37,\$38,\$39,\$3a,\$3c,\$3d,\$3e
.byt  \$40,\$41,\$43,\$44,\$46,\$47,\$49,\$4a,\$4c,\$4d,\$4f,\$51,\$53,\$55,\$56,\$58
.byt  \$5a,\$5c,\$5e,\$60,\$62,\$65,\$67,\$69,\$6b,\$6e,\$70,\$73,\$75,\$78,\$7a,\$7d
.byt  \$b5,\$b9,\$bd,\$c1,\$c5,\$ca,\$ce,\$d3,\$d7,\$dc,\$e1,\$e6,\$eb,\$f0,\$f5,\$fa
``````

Twilighte
Game master
Posts: 819
Joined: Sat Jan 07, 2006 12:07 am
Location: Luton, UK
Contact:
This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.

Chema
Game master
Posts: 2380
Joined: Tue Jan 17, 2006 10:55 am
Location: Gijón, SPAIN
Contact:
Twilighte wrote:This looks similar to the division routine i used many years ago for converting 16 bit SID pitch to 12 bit AY pitch for the SID play project.
I remember its clever use of shifting to minimise the number of division cycles. Looks like the same thing done here.
It truelly reduced the time, from 100% cpu to produce a very slow SID conversion to an estimated 45% cpu and a very accurate SID conversion.
I was even able to do double speed tunes.
Yep. It is not *really* a huge gain (3 times faster than the routine in the C library), but it is quite adequate. I would love to have something more efficient.

BTW I forgot to mention that it was designed for divisions that might fit in 8-bit. In fact, originally, it worked only for 9-bit numbers divided by 8-bit numbers, but it can handle larger numbers now, as it shiftes them (divide both by 2) until X<512 and Y<256... So it is not general.

Judd uses it in its filled polygon routine, so it should serve at least for that objective, as well as for clipping lines (looks at Dbug).