2-bit transfer protocol in diskdrive IRQ-loaders by Cadaver
-----------------------------------------------------------

This is a continuation of the previous IRQ-loader rant, that focused on a 1-bit
transfer protocol. The loader discussed here is very much like the previous one,
so only the changes in the protocol are explained in detail.

2-bit transfer, as mentioned in the previous rant, requires good synchronization
between the C64 and diskdrive. Once the transfer of a byte is fired up, the
1541 sends 2 bits at a time on both the CLK & DATA lines of the serial bus
without any waiting or handshaking.

What gives problems is that synchronization between the C64 & diskdrive can't
be cycle-exact; usually the diskdrive is waiting in a loop for the C64 to reply
before firing up the strictly timed transfer, so the timing might be off as
much as the length of this waiting loop.

The other problem is that the diskdrive CPU runs at 1Mhz, but the C64 runs
either slower (PAL) or faster (NTSC). This has to be taken into account also.
In fact it seems that when the transfer routine is coded in a certain way, the
difference between correct PAL & NTSC timing is only 1 clock cycle on the C64
side; the disk drive code doesn't have to be adjusted at all.

The drawbacks with this kind of 2-bit transfer are:
- Interrupts will be disabled during the transfer of a byte, so any raster
effects might be displaced.
- Sprites are not allowed onscreen (would steal CPU cycles and mess with the
timing.)
- Hitting RESTORE while loading will cause an NMI interrupt and also mess the
timing. There are two ways I can think of to handle this: Extend the transfer
protocol to have a resend option (the NMI interrupt will set a resend flag) or
disable NMIs by triggering a one-shot CIA2 interrupt but never acknowledging
the NMI. None of these are in use in the loader, it will just load incorrectly
if NMIs are occurring.

I thank Marko Mäkelä for his C64->diskdrive asynchronous protocol, drive init
code and original main drive loop, K.M/TABOO for inspiration on the badline
detection and MuOn for testing the transfer routines on a real NTSC machine.
The 2bit transfer code in this loader is coded by me but heavily inspired by
various game loadersystems; for example Technocop.

2-bit loader disk image and source


;COVERT BITOPS 2bit fastloader for Rant #9
;Based on:
;Marko Mäkelä's IRQ-loader
;- Drive init code
;- Drive main loop
;- Asynchronous C64->drive communication
;Technocop loader and other game loadersystems
;- Drive->C64 2-bit communication
;K.M/Taboo's 2bit loaders
;- Badline detection

status          = $90           ;Kernal zeropage variables
messages        = $9d
fa              = $ba

bufferstatus    = $02
stackptrstore   = $03

ciout           = $ffa8         ;Kernal routines
listen          = $ffb1
second          = $ff93
unlsn           = $ffae
acptr           = $ffa5
chkin           = $ffc6
chkout          = $ffc9
chrin           = $ffcf
chrout          = $ffd2
ciout           = $ffa8
close           = $ffc3
open            = $ffc0
setmsg          = $ff90
setnam          = $ffbd
setlfs          = $ffba
clrchn          = $ffcc
getin           = $ffe4
load            = $ffd5
save            = $ffd8

                processor 6502
                org 2049

;Example main program. Inits the fastloader and loads a file using it. After-
;wards the drive can be used normally.

sys:            dc.b $0b,$08           ;Address of next instruction
                dc.b $0a,$00           ;Line number(10)
                dc.b $9e               ;SYS-token
                dc.b $32,$30,$36,$31   ;2061 as ASCII
                dc.b $00
                dc.b $00,$00           ;Instruction address 0 terminates
                                       ;the basic program

start:          jsr initfastload
                jsr initmusicplayback   ;Now that we can play music while
                ldx #"D"                ;loading, let's not forget it :-)
                ldy #"A"
                jsr fastload
                jsr stopmusicplayback
                rts

initmusicplayback:
                sei
                lda #<raster
                sta $0314
                lda #>raster
                sta $0315
                lda #50                 ;Set low bits of raster
                sta $d012               ;position
                lda $d011
                and #$7f                ;Set high bit of raster
                sta $d011               ;position (0)
                lda #$7f                ;Set timer interrupt off
                sta $dc0d
                lda #$01                ;Set raster interrupt on
                sta $d01a
                lda $dc0d               ;Acknowledge timer interrupt
                lda #$00
                jsr $1000
                cli
                rts

stopmusicplayback:
                sei
                lda #<$ea31
                sta $0314
                lda #>$ea31
                sta $0315
                lda #$00
                sta $d01a
                lda #$81
                sta $dc0d
                inc $d019
                lda #$00
                sta $d418
                cli
                rts

raster:         inc $d020
                jsr $1003
                dec $d020
                dec $d019
                jmp $ea31


;INITFASTLOAD
;
;Uploads the fastloader to disk drive memory and starts it.
;This routine is Marko Mäkelä's work, except for the 2-bit transfer
;preparations.
;
;Parameters: -
;Returns: -
;Modifies: A,X,Y

AMOUNT          = 32                    ;Bytes in one M-W command

The fastloader initialization code starts with PAL/NTSC detection. I didn't
want to rely on the value of $02a6 memory location so I implemented it with
raster-line based detection. This code measures the highest rasterline number
on the screen, and draws conclusions from that.

initfastload:   sei
                lda #$00
il_detectntsc:  ldx $d011              ;Get the biggest rasterline in the
                bmi il_detectntsc      ;area >= 256 to detect NTSC/PAL
il_detectntsc2: ldx $d011
                bpl il_detectntsc2
il_detectntsc3: cmp $d012
                bcs il_detectntsc4
                lda $d012
il_detectntsc4: ldx $d011
                bmi il_detectntsc3
                cli
                cmp #$20                ;PAL has 312 lines, but this check is
                bcc il_isntsc           ;somewhere in the middle...

For a PAL machine, the BPL instruction in the getbyte delay code (3 cycles,
takes the branch) is replaced with a BMI instruction (2 cycles, doesn't take
the branch)

                lda #$30                ;Adjust 2-bit fastload transfer
                sta fastload_delay      ;delay for PAL

The rest of the initialization is like in the 1-bit loader.

il_isntsc:      lda #<drvprog           ;Initialize selfmodifying code
                sta il_mwbyte+1
                lda #>drvprog
                sta il_mwbyte+2
                lda #<drive
                sta mwcmd+2
                lda #>drive
                sta mwcmd+1
il_mwloop:      jsr il_device           ;Set drive to listen
                ldx #lmwcmd - 1
il_sendmw:      lda mwcmd,x             ;Send M-W command
                jsr ciout
                dex
                bpl il_sendmw
                ldx #0
il_mwbyte:      lda drvprog,x             ;Send AMOUNT bytes of drive
                jsr ciout                 ;code
                inx
                cpx #AMOUNT
                bne il_mwbyte
                jsr unlsn               ;Unlisten starts the command
                lda mwcmd+2
                clc
                adc #AMOUNT
                sta mwcmd+2
                bcc il_nohigh
                inc mwcmd+1
il_nohigh:      lda il_mwbyte+1
                clc                     ;Move pointers
                adc #AMOUNT
                sta il_mwbyte+1
                tax
                bcc il_nohigh2
                inc il_mwbyte+2
il_nohigh2:     lda il_mwbyte+2
                cpx #<drvprogend
                sbc #>drvprogend
                bcc il_mwloop

                jsr il_device           ;Set drive to listen again
                ldx #lmecmd - 1
il_sendme:      lda mecmd,x             ;Send M-E command
                jsr ciout
                dex
                bpl il_sendme
                jmp unlsn               ;Unlisten starts the command

il_device:      lda fa
                jsr listen
                lda #$6f
                jmp second


;FASTLOAD
;
;Loads a file with fastloader. INITFASTLOAD must have been called first.
;Any normal KERNAL disk operations will cause the fastloader drive code to
;exit (as ATN line goes low) and after that, INITFASTLOAD has to be called
;again.
;
;Parameters: X: First letter of filename, Y: Second letter of filename
;Returns: C=0 OK, C=1 error
;Modifies: A,X,Y

fastload:       stx filename
                sty filename+1
                sta $d07a               ;SCPU to slow mode
                tsx                     ;Store stackpointer, needed when
                stx stackptrstore       ;finishing loading
                lda #$00                ;Reset buffer status
                sta bufferstatus
                ldx #$01                ;Byte counter.
fastload_sendouter:
                ldy #$08                ;Bit counter
fastload_sendinner:
                lsr filename,x          ;Rotate byte to be sent
                lda $dd00
                and #$ff-$30
                ora #$10
                bcc fastload_zerobit
                eor #$30
fastload_zerobit:
                sta $dd00
                lda #$c0                ;Wait for CLK & DATA low
fastload_sendack:
                bit $dd00
                bne fastload_sendack
                lda $dd00
                and #$ff-$30            ;Set DATA and CLK high
                sta $dd00
fastload_waitack:
                bit $dd00               ;Wait for CLK & DATA high
                bvc fastload_waitack
                bpl fastload_waitack
                dey
                bne fastload_sendinner
                dex                     ;All bytes sent?
                bpl fastload_sendouter

Here something has changed. In this protocol the computer first asks for data
by setting CLK=low; the drive responds with DATA=low to signal that data is 
available. In idle state both CLK and DATA lines are high. So, there is no
need for a pre-delay.

                lda #$00                ;Initialize buffer counter
                sta bufferstatus
                jsr fastload_getbyte    ;Get file start address
                sta fastload_sta+1
                jsr fastload_getbyte
                sta fastload_sta+2

fastload_loop:  jsr fastload_getbyte    ;Then get bytes one by one. Getbyte
fastload_sta:   sta $1000               ;routine exits when all have been
                inc $d020               ;received
                dec $d020
                inc fastload_sta+1
                bne fastload_loop
                inc fastload_sta+2
                jmp fastload_loop

fastload_getbyte:
                ldx bufferstatus                ;Bytes still in buffer?
                beq fastload_fillbuffer
                lda loadbuffer-1,x
                dex
                stx bufferstatus
                rts
fastload_fillbuffer:
                jsr fastload_get        ;Get number of bytes to transfer
                cmp #$01                ;$00 indicates successful end of load
                bcc fastload_loadend    ;and $01 an error
                beq fastload_loadend    ;Carry is set already (error sign)
                sbc #$01                ;Carry is 1 here
                sta bufferstatus        ;Store buffer length to bytecounter
                ldx #$00
fastload_gnbloop:
                jsr fastload_get        ;Get the buffer byte by byte
                sta loadbuffer,x
                inx
                cpx bufferstatus
                bcc fastload_gnbloop
                bcs fastload_getbyte
fastload_loadend:
                ldx stackptrstore       ;Restore stackpointer & exit loader
                txs
                sta $d07b               ;SCPU to fast mode
                rts

Here is the new getbyte routine for 2-bit transfer. It starts by pulling
CLK low and waiting for disk drive to respond by pulling DATA low.

fastload_get:   lda $dd00               ;CLK low
                ora #$10
                sta $dd00
fastload_waitdrive:
                bit $dd00               ;Wait for 1541 to signal data ready by
                bmi fastload_waitdrive  ;setting DATA low
                sei
                
Then, waiting for any badline to pass:

fastload_waitbadline:
                lda $d011
                clc                     ;Wait until a badline won't distract
                sbc $d012               ;the timing
                and #$07
                beq fastload_waitbadline
                
Now that we're certain, that a bad line won't disturb us for a while, we can
begin the actual byte transfer. We let CLK high to signal the disk drive that
we want to receive a byte. From here onwards timing is very important!

                lda $dd00
                and #$03
                sta $dd00               ;Set CLK high
                
After CLK has been pulled low, there has to be 14 clock cycles delay for PAL
and 15 cycles for NTSC (determined experimentally), before we start reading
the data bits. At the end of this delay, we set CLK back high so that we can
"see" what the disk drive is putting on the CLK line.

fastload_delay: bpl fastload_delay2
fastload_delay2:nop
                nop
                nop
                nop
                sta fastload_eor+1
                
And here comes the byte receiving, 2 bits at a time. The corresponding sending
code on the disk drive side has the same amount of cycles, except...

                lda $dd00
                lsr
                lsr
                eor $dd00
                lsr
                lsr
                eor $dd00
                lsr
                lsr
                
...for this EOR instruction. This is to ensure NTSC machines won't go ahead
of the disk drive. On the other hand, the disk drive will soon set CLK & DATA
back high, marking a return to idle state, so we can't be too slow either in
grabbing the last bits. The EOR is necessary because the video bank bits are
present in the lowest 2 bits of $dd00.

fastload_eor:   eor #$00
                eor $dd00
                cli
                rts

;DRVPROG - Code executed in the disk drive.

RETRIES         = 5             ;Amount of retries when reading a sector
acsbf           = $01           ;Buffer 1 command
trkbf           = $08           ;Buffer 1 track
sctbf           = $09           ;Buffer 1 sector
iddrv0          = $12           ;Disk drive ID
id              = $16           ;Disk ID
datbf           = $14           ;Temp variable
buf             = $0400         ;Sector data buffer

drvprog:                        ;Address in C64's memory
                rorg $0500      ;Address in diskdrive's memory
drive:          cli             ;Enable interrupts while waiting the first byte
                jsr getbyte     ;(to allow motor to stop)
                sta namecmp2+1
                jsr getbyte
                sta namecmp1+1

Also, now the readsect subroutine takes the track & sector in X & Y registers,
instead of them having to be stored on the zeropage by the caller.

                ldx #18
                ldy #1	        ;Read disk directory
dirloop:        jsr readsect    ;Read sector
                bcc error       ;If failed, return error code
                ldy #$02
nextfile:       lda buf,y       ;File type must be PRG
                and #$83
                cmp #$82
                bne notfound
                lda buf+3,y     ;Check first letter
namecmp1:       cmp #$00
                bne notfound
                lda buf+4,y     ;Check second letter
namecmp2:       cmp #$00
                beq found
notfound:       tya
                clc
                adc #$20
                tay
                bcc nextfile
                ldy buf+1       ;Go to next directory block, go on until no
                ldx buf	        ;more directory blocks
                bne dirloop
error:          ldx #$01        ;Send $01 - error in loading file
loadend:        txa
                jsr sendbyte
                jmp drive       ;Go back to wait for the filename

found:          iny
nextsect:       ldx buf,y       ;File found, get starting track & sector
                beq loadend     ;If at file's end, send byte $00
                lda buf+1,y
                tay
                jsr readsect    ;Read the data sector
                bcc error
                ldy #$ff        ;Amount of bytes to send - assume $ff
                lda buf
                bne sendblk
                ldy buf+1       ;Possibly less if it's the last block
sendblk:        tya
sendloop:       jsr sendbyte    ;Send the amount of bytes that will be sent
                lda buf,y       ;Send the sector data in reverse order
                dey
                bne sendloop
                beq nextsect

readsect:       stx trkbf
                sty sctbf
                ldy #RETRIES    ;Retry counter
                jsr success     ;Turn on led
retry:          lda #$80
                sta acsbf       ;Command:read sector

Here is the key to getting good loading speeds. Interrupts must only be enabled
when the command is already waiting in the command register (an interrupt has
probably been pending while we've been sending the last sector and now it will
be executed right after the CLI instruction, so sector reading commences as
fast as it can.)

                cli
poll1:          lda acsbf       ;Wait until ready
                bmi poll1
                sei
                cmp #1
                beq success     ;Also sets carry flag to 1
                lda id	        ;Check for disk ID change
                sta iddrv0
                lda id+1
                sta iddrv0+1
                dey             ;Decrease retry counter
                bne retry
failure:        clc
success:        lda $1c00
                eor #$08
                sta $1c00
                rts

And here's the disk drive side of the 2-bit transfer routine. It relies on a
table to convert 4 bits at a time to the CLK & DATA signals (a byte can be
shifted left to get the second bit pair of a nybble)

sendbyte:       sta datbf
                lsr
                lsr
                lsr
                lsr
                
                lda #$04
sendwait:       bit $1800               ;Wait for CLK==low
                beq sendwait
                
The DATA=low must not be set until the disk drive really is ready to send a
byte, because the C64 will not wait after that.

                lsr                     ;Set DATA=low
                sta $1800
                lda sendtbl,x           ;Get the CLK,DATA pairs for low nybble
                pha
                lda datbf
                and #$0f
                tax
                lda #$04
                
Here, wait for CLK to go high.

sendwait2:      bit $1800               ;Wait for CLK==high (start of high speed transfer)
                bne sendwait2
                lda sendtbl,x           ;Get the CLK,DATA pairs for high nybble
                
And start the bit-pair sending. 8 clock cycles per pair, just like on the C64
side.

                sta $1800
                asl
                and #$0f
                sta $1800
                pla
                sta $1800
                asl
                and #$0f
                sta $1800
                
Then, after some delay, set the CLK & DATA lines back to high state.

                nop
                nop
                lda #$00                ;CLK=DATA=high
                sta $1800
                rts

sendtbl:        dc.b $0f,$07,$0d,$05
                dc.b $0b,$03,$09,$01
                dc.b $0e,$06,$0c,$04
                dc.b $0a,$02,$08,$00

getbyte:        ldy #$08                ;Counter: receive 8 bits
recvbit:        lda #$85
                and $1800               ;Wait for CLK==low || DATA==low
                bmi gotatn              ;Quit if ATN
                beq recvbit
                lsr                     ;Read the data bit
                lda #2                  ;Prepare for CLK=high, DATA=low
                bcc rskip
                lda #8                  ;Prepare for CLK=low, DATA=high
rskip:          sta $1800               ;Acknowledge the bit received
                ror datbf               ;and store it
rwait:          lda $1800               ;Wait for CLK==high || DATA==high
                and #5
                eor #5
                beq rwait
                lda #0
                sta $1800               ;Set CLK=DATA=high
                dey
                bne recvbit             ;Loop until all bits have been received
                lda datbf               ;Return the data to A
                rts
gotatn:         pla                     ;If ATN goes low, exit to the operating
                pla                     ;system. Just discard the return address.
                rts

                rend

drvprogend:

mwcmd:          dc.b AMOUNT,>drive,<drive,"W-M"
lmwcmd          = . - mwcmd

mecmd:          dc.b >drive,<drive,"E-M"
lmecmd          = . - mecmd

Filename buffer, sector buffer and music data.

filename:       dc.b 0,0
loadbuffer:     dc.b 254,0

                org $1000

                incbin music.bin


With the standard sector interleave of 10, this loader achieves about 5x
loading speed compared to the KERNAL routines. Going below that interleave
results in a drop in loading speed as the disk has to spin one more round. The
next step for more speed is rewriting the sector read routine at least
partially, but that is totally outside my knowledge.

So, the end of this rant has been reached. Remember to do RESTORE protection in
actual production code!


                                                  Lasse Öörni
                                                  loorni@gmail.com