In short, I made 64 copies of the volume conversion table (so the "and #3" is no longer necessary), moved the code that sets the PSG to accept data on register 8 out of the main loop, and swapped the bits around so that the two-bit values are more efficient to extract.
The inner loop now looks like this:
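To illustrate why the 64 copies make the "and #3" redundant, here is a small Python model of the lookup (not the actual routine). The four volume values in BASE_VOLUMES are hypothetical placeholders, and decode_byte is a name introduced only for this sketch; it assumes the low two bits of each byte are played first, as the shift order in the loop suggests.

```python
# Hypothetical 4-entry volume table: the real values are not given in the post.
BASE_VOLUMES = [0, 10, 13, 15]

# 64 back-to-back copies give a 256-entry table in which entry x equals
# BASE_VOLUMES[x & 3], so any byte value can be used directly as the index.
TABLE = BASE_VOLUMES * 64


def decode_byte(value):
    """Return the four volume values the inner loop would output for one byte."""
    out = []
    for _ in range(4):            # mirrors ldy #4 / dey / bne
        out.append(TABLE[value])  # mirrors lda TableVolumeConversionData,x
        value >>= 2               # mirrors lsr / lsr
    return out


if __name__ == "__main__":
    # Packed two-bit samples 0, 1, 2, 3 with the first sample in the low bits.
    print(decode_byte(0b11100100))
```

The trade-off is 252 extra bytes of table in exchange for dropping one instruction per sample.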
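; In: X/Y = low/high byte of the sample start address, A = high byte of the end address
PlaySample:
    stx RES
    sty RES+1
    sta _auto_end_sample_check+1    ; Patch the end-of-sample comparison below
loop_read_page:
    ldy #0
    lda (RES),y                     ; Fetch one byte = four 2-bit samples
    sta RESB
    ldy #4                          ; Four samples to play per byte
loop_decode_byte:
    lda RESB
    tax                             ; Whole byte is the index: the table is repeated 64 times
    lsr
    lsr
    sta RESB                        ; Next 2-bit sample is now in the low bits
    lda TableVolumeConversionData,x
    sta $030F                       ; Volume value on the VIA port
    lda #$FD
    sta $030C                       ; Tell the PSG to write the value to the selected register
    lda #$DD
    sta $030C                       ; Back to inactive
    nop                             ; 16 NOPs of timing padding
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    dey
    bne loop_decode_byte
    inc RES                         ; Advance the 16-bit sample pointer
    bne skip_high_byte
    inc RES+1
skip_high_byte:
    lda RES+1
_auto_end_sample_check
    cmp #$12 ; Self modified based on which samples play
    bne loop_read_page
    rts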
What this means is that it's probably viable to put the routine in an interrupt, so that other things can be done at the same time.
PS: There are other optimizations possible; I just wanted to check the main "bang for the buck" ones.
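As a rough check on how much time is actually free, the cycles of one pass through loop_decode_byte can be tallied with a few lines of Python. This is only a sketch under stated assumptions: standard NMOS 6502 timings, RESB in zero page, the table lookup and the branch not crossing a page boundary, and a 1 MHz clock (CPU_HZ is an assumed value, not something stated in the post). The per-byte overhead of fetching the next byte is ignored.

```python
CPU_HZ = 1_000_000  # assumption: 1 MHz 6502

# (instruction, cycles each, how many times it appears in loop_decode_byte)
INNER_LOOP = [
    ("lda RESB (zero page)", 3, 1),
    ("tax", 2, 1),
    ("lsr", 2, 2),
    ("sta RESB (zero page)", 3, 1),
    ("lda table,x (no page cross)", 4, 1),
    ("sta $030F", 4, 1),
    ("lda #imm", 2, 2),
    ("sta $030C", 4, 2),
    ("nop", 2, 16),
    ("dey", 2, 1),
    ("bne (taken, same page)", 3, 1),
]

total_cycles = sum(cycles * count for _, cycles, count in INNER_LOOP)
padding_cycles = sum(cycles * count for name, cycles, count in INNER_LOOP if name == "nop")
useful_cycles = total_cycles - padding_cycles
sample_rate_hz = CPU_HZ / total_cycles

if __name__ == "__main__":
    print("cycles per sample:", total_cycles)
    print("of which NOP padding:", padding_cycles)
    print("approximate sample rate (Hz):", round(sample_rate_hz))
```

Under these assumptions the loop takes 69 cycles per sample, 32 of which are NOP padding; an interrupt-driven version would also have to pay for interrupt entry, register saving and rti, which this tally does not include.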