A simple diskdrive IRQ-loader dissected by Cadaver
--------------------------------------------------

An IRQ-loader is a loader routine that lets interrupts run freely (music,
graphical effects) while loading. Here I'm talking about a loader with
1-bit transfer protocol, which also speeds up loading somewhat but not
as much as dedicated fastloaders using 2-bit transfer. But 2-bit transfer
is more limiting: interrupts will be turned off during the transfer of a
byte and sprites won't be allowed on screen.

This loader achieves roughly a 3.3x speedup compared to KERNAL loading routines,
when using sector interleave 15, leaving the screen on, and playing music on the
background.

I thank Marko Mäkelä (his original irqloader.s is available here) and K.M/TABOO,
for releasing their loader source codes. In fact they've been the sole source
of knowledge I have about custom loading routines. This loader contains a lot
of their code, too.

Here's the IRQ-loader disk image and source being discussed.

Let the dissection begin!


;COVERT BITOPS loader, simplified for Rant #5
;IRQ-loader based on work by K.M/TABOO & Marko Mäkelä
;(drive code init, main drive code, c64->drive communication by Marko Mäkelä)
;(drive->c64 communication routine by K.M/TABOO)

First I'll tell something about customized loading routines in general. They're
without exception based on running some own code with the disk drive's CPU.
That way, Commodore's clumsy official serial protocol that doesn't allow the
computer to be interrupted while receiving data can be replaced with something
smarter and faster. Naturally, there must also be code for the C64 that handles
its part of the communication. The cool thing is that you can completely define
the communication protocol the way you like!

Here we begin with some zeropage defines as usual. FA is the kernal variable
that tells the last device number used to load something. We'll be using that
device number when sending the IRQ-loader code to the disk drive.

status          = $90           ;Kernal zeropage variables
messages        = $9d
fa              = $ba

The loader itself uses three zeropage variables. The stack pointer's initial
value will be stored at the start of the load subroutine, so that the loader
can exit from any number of nested subroutines.

temp1           = $02           ;Temporary zeropage variables
temp2           = $03
stackptrstore   = $04

Then the KERNAL routine defines. There's a lot of unused ones but it doesn't
hurt.

ciout           = $ffa8         ;Kernal routines
listen          = $ffb1
second          = $ff93
unlsn           = $ffae
acptr           = $ffa5
chkin           = $ffc6
chkout          = $ffc9
chrin           = $ffcf
chrout          = $ffd2
ciout           = $ffa8
close           = $ffc3
open            = $ffc0
setmsg          = $ff90
setnam          = $ffbd
setlfs          = $ffba
clrchn          = $ffcc
getin           = $ffe4
load            = $ffd5
save            = $ffd8

Here comes the main program. Basically it initializes the fastloader,
initializes raster interrupts to play music located at $1000 while loading
(Eighties Megahit by Olli Niemitalo), performs the actual loading subroutine
call, then de-initializes raster interrupts and exits.

                processor 6502
                org 2049


;Example main program. Inits the fastloader and loads a file using it. After-
;wards the drive can be used normally.

sys:            dc.b $0b,$08           ;Address of next instruction
                dc.b $0a,$00           ;Line number(10)
                dc.b $9e               ;SYS-token
                dc.b $32,$30,$36,$31   ;2061 as ASCII
                dc.b $00
                dc.b $00,$00           ;Instruction address 0 terminates
                                       ;the basic program

start:          jsr initfastload
                jsr initmusicplayback   ;Now that we can play music while
                ldx #"D"                ;loading, let's not forget it...
                ldy #"A"                ;First letters of filename: DA
                jsr fastload
                jsr stopmusicplayback
                rts

initmusicplayback:
                sei
                lda #<raster
                sta $0314
                lda #>raster
                sta $0315
                lda #50                         ;Set low bits of raster
                sta $d012                       ;position
                lda $d011
                and #$7f                        ;Set high bit of raster
                sta $d011                       ;position (0)
                lda #$7f                        ;Set timer interrupt off
                sta $dc0d
                lda #$01                        ;Set raster interrupt on
                sta $d01a
                lda $dc0d                       ;Acknowledge timer interrupt
                lda #$00
                jsr $1000
                cli
                rts

stopmusicplayback:
                sei
                lda #<$ea31
                sta $0314
                lda #>$ea31
                sta $0315
                lda #$00
                sta $d01a
                lda #$81
                sta $dc0d
                inc $d019
                lda #$00
                sta $d418
                cli
                rts

raster:         inc $d020
                jsr $1003
                dec $d020
                dec $d019
                jmp $ea31

Here is the initialization routine, that "uploads" the custom code to the disk
drive's memory (with the Memory-Write, M-W command, 32 bytes at a time) and
once all code has been uploaded, starts it with the Memory Execute, M-E command.
For giving commands, the drive must be set to listen, and an unlisten actually
starts the execution of a command.

;INITFASTLOAD
;
;Uploads the fastloader to disk drive memory and starts it.
;This routine is completely Marko Mäkelä's work.
;
;Parameters: -
;Returns: -
;Modifies: A,X,Y


AMOUNT          = 32                    ;Bytes in one M-W command

initfastload:   lda #<drvprog           ;Initialize selfmodifying code
                sta il_mwbyte+1
                lda #>drvprog
                sta il_mwbyte+2
                lda #<drive
                sta mwcmd+2
                lda #>drive
                sta mwcmd+1
il_mwloop:      jsr il_device           ;Set drive to listen
                ldx #lmwcmd - 1
il_sendmw:      lda mwcmd,x             ;Send M-W command
                jsr ciout
                dex
                bpl il_sendmw
                ldx #0
il_mwbyte:      lda drvprog,x             ;Send AMOUNT bytes of drive
                jsr ciout                 ;code
                inx
                cpx #AMOUNT
                bne il_mwbyte
                jsr unlsn               ;Unlisten starts the command
                lda mwcmd+2
                clc
                adc #AMOUNT
                sta mwcmd+2
                bcc il_nohigh
                inc mwcmd+1
il_nohigh:      lda il_mwbyte+1
                clc                     ;Move pointers
                adc #AMOUNT
                sta il_mwbyte+1
                tax
                bcc il_nohigh2
                inc il_mwbyte+2
il_nohigh2:     lda il_mwbyte+2
                cpx #<drvprogend
                sbc #>drvprogend
                bcc il_mwloop

                jsr il_device           ;Set drive to listen again
                ldx #lmecmd - 1
il_sendme:      lda mecmd,x             ;Send M-E command
                jsr ciout
                dex
                bpl il_sendme
                jmp unlsn               ;Unlisten starts the command

il_device:      lda fa
                jsr listen
                lda #$6f
                jmp second

And now the fast loading routine itself.

;FASTLOAD
;
;Loads a file with fastloader. INITFASTLOAD must have been called first.
;Any normal KERNAL disk operations will cause the fastloader drive code to
;exit (as ATN line goes low) and after that, INITFASTLOAD has to be called
;again.
;
;Parameters: X: First letter of filename, Y: Second letter of filename
;Returns: C=0 OK, C=1 error
;Modifies: A,X,Y

We start by storing the filename (two first letters given in X & Y registers)

fastload:       stx filename
                sty filename+1

Because the Drive->C64 data sending depends on the C64 not going too fast, we
must initialize the slow mode of SuperCPU (for those who own it)

                sta $d07a               ;SCPU to slow mode

Next we store the stackpointer, to allow the loader to exit from any number
of nested subroutines.

                tsx                     ;Store stackpointer, needed when
                stx stackptrstore       ;finishing loading

Then we send the two bytes of the filename to the diskdrive (the custom code
uploaded is now running there). This byte sending protocol is Marko's invention
and it's completely asynchronous (so it could be run with fast mode of SuperCPU
and it would still work)

For each bit we do the following:
- First the C64 waits both CLK & DATA lines of the serial bus to go high.
- If a 0-bit is to be sent, C64 pulls the CLK-line low
- If a 1-bit is to be sent, C64 pulls the DATA-line low
- Now C64 waits for the drive to respond by pulling the other line low
- Then C64 puts both its CLK & DATA lines back high. We start from the
  beginning again...

                ldx #$01                ;Byte counter.
fastload_sendouter:
                ldy #$08                ;Bit counter
fastload_sendinner:
                bit $dd00               ;Wait for CLK & DATA high
                bvc fastload_sendinner
                bpl fastload_sendinner
                lsr filename,x          ;Rotate byte to be sent
                lda $dd00
                and #$ff-$30
                ora #$10
                bcc fastload_zerobit
                eor #$30
fastload_zerobit:
                sta $dd00
                lda #$c0                ;Wait for CLK & DATA low
fastload_sendack:
                bit $dd00
                bne fastload_sendack
                lda $dd00
                and #$ff-$30            ;Set DATA and CLK high
                sta $dd00
                dey
                bne fastload_sendinner
                dex                     ;All bytes sent?
                bpl fastload_sendouter

Next is a small delay. This is to be sure that the disk drive has finished
receiving the second byte of the filename, after which it pulls the CLK line
low to signal that it's not ready yet to send a byte of the file data.

fastload_delay: dex                     ;Give the drive some time to set CLK
                bne fastload_delay      ;low in preparation to sending bytes

We buffer a whole sector at a time from the disk. Mark the buffer to be
"empty" now, as we're starting the loading.

                lda #$00                ;Initialize buffer counter
                sta temp2

Get the first two bytes of the file, the startaddress.

                jsr fastload_getbyte    ;Get file start address
                sta fastload_sta+1
                jsr fastload_getbyte
                sta fastload_sta+2

Now loop, getting bytes and storing them to memory. The getbyte routine exits
automatically when all bytes have been received.

fastload_loop:  jsr fastload_getbyte    ;Then get bytes one by one. Getbyte
fastload_sta:   sta $1000               ;routine exits when all have been
                inc $d020               ;received.
                dec $d020               ;Just some flashing to know we're
                inc fastload_sta+1      ;loading...
                bne fastload_loop
                inc fastload_sta+2
                jmp fastload_loop

The getbyte subroutine. If there's bytes in the buffer, use them (in reverse
order), until buffer is empty.

fastload_getbyte:
                ldx temp2                ;Bytes still in buffer?
                beq fastload_fillbuffer
                lda loadbuffer-1,x
                dex
                stx temp2
                rts

Buffer is empty - we have to get bytes from the diskdrive. The diskdrive will
first send a code to indicate amount of bytes to transfer, or an error code, or
a "loading ended successfully" code. The codes the diskdrive will send are:

$00 - Load ended
$01 - File not found or sector read error
$02-$ff - Amount of bytes to be transferred+1

fastload_fillbuffer:
                jsr fastload_get        ;Get number of bytes to transfer
                cmp #$01                ;$00 indicates successful end of load
                bcc fastload_loadend    ;and $01 an error
                beq fastload_loadend    ;Carry is set already (error sign)
                sbc #$01                ;Carry is 1 here
                sta temp2                ;Store buffer length to bytecounter
                ldx #$00

Then we just loop to get all the bytes to the buffer. The 1541 will send the
bytes from the end of the sector to the start (reverse order). Because we
also use reverse order when getting the bytes from the buffer, this time
we must use the normal order.

fastload_gnbloop:
                jsr fastload_get        ;Get the buffer byte by byte
                sta loadbuffer,x
                inx
                cpx temp2
                bcc fastload_gnbloop
                bcs fastload_getbyte

When loading ends, get the stored stack pointer value, set SuperCPU back to
fast mode and exit.

fastload_loadend:
                ldx stackptrstore       ;Restore stackpointer & exit loader
                txs
                sta $d07b               ;SCPU to fast mode
                rts

The subroutine to get a byte from the diskdrive (by K.M/TABOO.) First we wait
the drive to become ready, it signals this by letting CLK go high. At that
point, it has also put the first databit on the DATA line (DATA is high for an
1-bit and low for a 0-bit)

We signal the drive to give the next databit by reversing the state of the
CLK line (it becomes low now). There must be some delay to allow the drive
to react (this is why the routine must not run too fast on the C64 side)
and we continue this until all 8 bits have been received. The byte transfer
ends by the drive pulling CLK back to low state.

fastload_get:   bit $dd00               ;Wait until 1541 is ready to send
                bvc fastload_get        ;(CLK=high)
                lda #$0f
                and $dd00
                sta $dd00
                nop
                ldy #$08                ;Bit counter
fastload_bitloop:
                nop
                nop
                lda #$10
                eor $dd00               ;Take databit from serialport and
                sta $dd00               ;store reversed clockbit
                asl
                rol temp1
                lda temp1
                dey
                bne fastload_bitloop    ;All bits done?
                rts

The last part of the IRQ-loader is the drive code itself. Note the use of
DASM's rorg directive to target code for the drive RAM at $0500.

;DRVPROG - Code executed in the disk drive.

RETRIES         = 5             ;Amount of retries when reading a sector
acsbf           = $01           ;Buffer 1 command
trkbf           = $08           ;Buffer 1 track
sctbf           = $09           ;Buffer 1 sector
iddrv0          = $12           ;Disk drive ID
id              = $16           ;Disk ID
datbf           = $14           ;Temp variable
buf             = $0400         ;Sector data buffer

drvprog:                        ;Address in C64's memory
                rorg $0500      ;Address in diskdrive's memory

Code execution starts here. As the C64->drive transfer of the filename is
asynchronous, interrupts don't matter. Interrupts are enabled to allow the
disk motor to eventually stop after loading a file.

drive:          cli             ;Enable interrupts while waiting the first byte
                jsr getbyte     ;(to allow motor to stop)
                sta namecmp2+1

After the 1st byte we forbid interrupts. That is because after receiving the
second byte of the filename is done, the rest of the communication becomes
timing-critical for the disk drive, and any unknown interruptions aren't
allowed.

                sei             ;Disable while waiting second byte
                jsr getbyte
                sta namecmp1+1
                lda #$08        ;Set CLK=low to tell C64 there's no data to
                sta $1800       ;be read yet

Next we read sector 1 from track 18 (the disk directory)

                ldx #18
                ldy #1	        ;Read disk directory
dirloop:        stx trkbf
                sty sctbf
                jsr readsect    ;Read sector
                bcc error       ;If failed, return error code
                ldy #$02

Then we go through all filename entries on the first directory block. The file
type must match that of a .PRG file and the two first letters of the filename
are compared. If these are satisfied, we consider the file to be found...

nextfile:       lda buf,y       ;File type must be PRG
                and #$83
                cmp #$82
                bne notfound
                lda buf+3,y     ;Check first letter
namecmp1:       cmp #$00
                bne notfound
                lda buf+4,y     ;Check second letter
namecmp2:       cmp #$00
                beq found

If not, move on to next filename entry on the directory block (each is 32
bytes).

notfound:       tya
                clc
                adc #$20
                tay
                bcc nextfile

After all names done, we read the next directory block, or give an error if
that was the last directory block and file still not found.

                ldy buf+1       ;Go to next directory block, go on until no
                ldx buf	        ;more directory blocks
                bne dirloop
error:          lda #$01        ;Send $01 - error in loading file
loadend:        jsr sendbyte

After the last byte has been sent, wait for CLK to become high to see that
the C64 has read the last databit. After that, also the DATA line can be
set high and we may start again from the beginning (waiting for the filename.)

                lda $1800       ;Set CLK=High
                and #$f7
                sta $1800
                lda #$04
loadend_wait:   bit $1800       ;Wait for CLK=High
                bne loadend_wait
                ldy #$00        ;Set DATA=High
                sty $1800
                jmp drive       ;Go back to wait for the filename

The file has been found. Get its starting track and sector (the same code is
reused for getting the next track§or link)

found:          iny
nextsect:       lda buf,y       ;File found, get starting track & sector
                sta trkbf
                beq loadend     ;If at file's end, send byte $00
                lda buf+1,y
                sta sctbf
                jsr readsect    ;Read the data sector
                bcc error

If this isn't the last block of a file (link to next track nonzero), we will
send the full 254 data bytes. Otherwise, the sector link byte contains the
amount of bytes in the last block+1.

                ldy #$ff        ;Amount of bytes to send - assume $ff
                lda buf
                bne sendblk
                ldy buf+1       ;Possibly less if it's the last block
sendblk:        tya

Here we loop to send all bytes of the block in reverse order.

sendloop:       jsr sendbyte    ;Send the amount of bytes that will be sent
                lda buf,y       ;Send the sector data in reverse order
                dey
                bne sendloop
                beq nextsect

Sector read subroutine. Not much to say about this, it will retry 5 times
before giving up and signaling error (carry set), otherwise it clears the carry
to tell the sector was read OK. Note that the drive led will be toggled during
the reading, and switched off afterwards.

readsect:       ldy #RETRIES    ;Retry counter
retry:          cli             ;Enable interrupts so that command can be
                jsr success     ;executed, turn on led
                lda #$80
                sta acsbf       ;Command:read sector
poll1:          lda acsbf       ;Wait until ready
                bmi poll1
                sei
                cmp #1
                beq success     ;Also sets carry flag to 1
                lda id	        ;Check for disk ID change
                sta iddrv0
                lda id+1
                sta iddrv0+1
                dey             ;Decrease retry counter
                bne retry
failure:        clc
success:        lda $1c00
                eor #$08
                sta $1c00
                rts

Send byte subroutine. First sets CLK high to signal a byte is ready, and puts
the first databit on the DATA line (DATA is high when an 1-bit is being sent).
Then it waits for the C64 to toggle the state of the CLK line before putting
the next data bit online, until all bits have been sent. The routine ends with
the drive pulling the CLK line back low. This routine is from K.M/TABOO.

sendbyte:       sta datbf       ;Store the byte to a temp variable
                tya             ;Store Y-register contents
                pha
                ldy #$04
                lda $1800
                and #$f7
                sta $1800
                tya
s1:             asl datbf       ;Rotate bit to carry and "invert"
                ldx #$02
                bcc s2
                ldx #$00
s2:             bit $1800
                bne s2
                stx $1800
                asl datbf
                ldx #$02
                bcc s3
                ldx #$00
s3:             bit $1800
                beq s3
                stx $1800
                dey
                bne s1
                txa
                ora #$08
                sta $1800
                pla
                tay
                rts

Receive byte subroutine by Marko Mäkelä. If ATN is pulled low, exit the custom
code, back to the drive's operating system ROM to allow normal operation.
Otherwise, wait for either line to become low (CLK for a 0-bit, DATA for an
1-bit). Then acknowledge the bit by pulling the other line low and wait for the
C64 to release the line it had pulled low. Finally set both lines high and loop
until all bits received.

getbyte:        ldy #8	        ;Counter: receive 8 bits
recvbit:
                lda #$85
                and $1800	;Wait for CLK==low || DATA==low
                bmi gotatn	;Quit if ATN was asserted
                beq recvbit
                lsr		;Read the data bit
                lda #2	        ;Prepare for CLK=high, DATA=low
                bcc rskip
                lda #8	        ;Prepare for CLK=low, DATA=high
rskip:          sta $1800	;Acknowledge the bit received
                ror datbf	;and store it
rwait:          lda $1800	;Wait for CLK==high || DATA==high
                and #5
                eor #5
                beq rwait
                lda #0
                sta $1800	;Set CLK=DATA=high
                dey
                bne recvbit	;Loop until all bits have been received
                lda datbf	;Return the data to A
                rts
gotatn:         pla		;If ATN gets asserted, exit to the operating
                pla		;system. Discard the return address.
                rend
il_ok:          rts

drvprogend:

Here's the data for the M-W and M-E command strings, in reverse order.

mwcmd:          dc.b AMOUNT,>drive,<drive,"W-M"
lmwcmd          = . - mwcmd

mecmd:          dc.b >drive,<drive,"E-M"
lmecmd          = . - mecmd

The filename and sector buffer.

filename:       dc.b 0,0
loadbuffer:     dc.b 254,0

Music data.

                org $1000

                incbin music.bin


Now some thoughts. What the loader could do better? The drive code could read
a sector using its own methods, faster than the one in the drive ROM. For that,
look at Krill's loader sources in http://noname.c64.org/csdb/release/?id=4462.

Also, the drive is waiting for the C64 to acknowledge each bit received
when sending sector data (1-bit transfer). 2-bit transfer would be faster,
as it would be using both CLK & DATA lines to transfer data.

2-bit transfer would only synchronize at the beginning of each byte to be transmitted
so the timing requirements are strict:

- The C64 must wait until there's no danger of a bad line stealing CPU
cycles and messing the timing during byte transfer, or blank the screen

- Interrupts must be disabled during byte transfer

- Sprites must not be displayed because they steal CPU cycles

- The timing is different for PAL & NTSC C64's, so the communication routines
have to be rewritten or modified for each case

But, this 1-bit loader is still quite fine for an IRQ-loader coding example.
Quite systemfriendly too, as no separate uninit routine is needed (automatic
uninit when normal serial protocol detected)


                                                  Lasse Öörni
                                                  loorni@gmail.com