Sunday, November 5, 2017

Fast Bitmap Plotting

In order to devise a fast bitmap plotting algorithm, we need to first better understand, what does it actually take to plot a single pixel onto a bitmap. We will approach all the necessary calculations step by step, so that we can thoroughly examine any potential complications arising as we move away from the center of a screen's coordinate system with subsequent X and Y offset values.

I am assuming that a reader is already familiar with the basic concept of plotting dots onto a Commodore 64's bitmap, as I am not going to dive into specific details of a memory map organisation nor to explain why setting specific bits into destination memory locations makes a particular pixel visible on a display screen. Obviously none of the important details is going to remain untold nor omitted, however there are plenty tutorials available on the web that cover the very basics of this topic, so that we can move straight to what is really interesting.

Please make sure to put the following two constant assignments at the top of each source code file before attempting to compile any of the examples presented within this tutorial:

BITMAP      = $2000
BIT_MASK    = $fb
VECTOR      = $fd

Table data generator (explained later) requires prior memory allocation for the following lookup tables:

bitmap_y_hi   .dsb $100,$00
bitmap_y_lo   .dsb $100,$00
bitmap_shift  .dsb $100,$00
bit_and_mask  .dsb $100,$00

Plain simple approach to plotting


Given X and Y coordinates, let us begin with designing our subroutine's body that puts a single pixel onto a hires bitmap. Bear in mind that Y value must not exceed $c7, because bitmap's height is 200 pixels. We also do not want to slow down our rendering process with unnecessary validations, therefore from now on we will assume that our subroutine expects two 8-bit arguments provided in X and Y registers, having concrete location values already assigned.

For X and Y values between $00 and $07 things go as simple as:

            lda #<BITMAP
            sta VECTOR+0
            lda #>BITMAP
            sta VECTOR+1

            lda #$00
            sec
            ror
            dex
            bpl *-2
            sta BIT_MASK

            lda (VECTOR),y
            ora BIT_MASK
            sta (VECTOR),y

We first set a zero-page vector to point at the beginning of a bitmap data. Then we compute a bit mask that will be used to set a pixel at an appropriate position of a selected address. And eventually we use Y register as an index to an operation of applying a bit mask to a current byte value.

Well, that was simple, however it covers only the simplest case of putting a pixel in the top-left corner area of 8x8 pixels.

If you would like to have an accompanying method that clears a pixel, the middle part would have to be modified to generate an AND mask instead, and such mask would have to be applied in the lower part of a procedure:

            lda #<BITMAP
            sta VECTOR+0
            lda #>BITMAP
            sta VECTOR+1

            lda #$ff
            clc
            ror
            sec
            dex
            bpl *-3
            sta BIT_MASK

            lda (VECTOR),y
            and BIT_MASK
            sta (VECTOR),y

In order to be able to set pixels for any value entered into an X register, we have to add its higher 5 bits to a VECTOR pointing to a selected byte of a bitmap data, basically like this:

            lda #>BITMAP
            sta VECTOR+1
            txa
            pha
            and #$f8
            clc
            adc #<BITMAP
            sta VECTOR+0
            bcc *+4
            inc VECTOR+1

            pla
            and #$07
            tax
            lda #$00
            sec
            ror
            dex
            bpl *-2
            sta BIT_MASK

            lda (VECTOR),y
            ora BIT_MASK
            sta (VECTOR),y

Okay, this procedure has now got a little bit longer and more complicated, however it still covers only an area of 256x8 pixels, so in the next step we are going to further extend it into an entire target screen of 256x200 pixels. We will use a value entered into a Y register to move a VECTOR to point $0140 bytes farther in a memory for as many times as Y's higher 5 bits indicate:

            lda #>BITMAP
            sta VECTOR+1
            txa
            pha
            and #$f8
            clc
            adc #<BITMAP
            sta VECTOR+0
            bcc *+4
            inc VECTOR+1

            pla
            and #$07
            tax
            lda #$00
            sec
            ror
            dex
            bpl *-2
            sta BIT_MASK

            tya
            lsr
            lsr
            lsr
            tax
            beq *+19
            inc VECTOR+1
            clc
            lda VECTOR+0
            adc #$40
            sta VECTOR+0
            bcc *+4
            inc VECTOR+1
            dex
            jmp *-16           
            tya
            and #$07
            tay

            lda (VECTOR),y
            ora BIT_MASK
            sta (VECTOR),y

This is how our normal bitmap plotting procedure would look like. It is clean, it does its job, however it is slow. And for large values of Y even very slow. Thus it gives us some room to make it faster.

Please note that this procedure plots pixels only in a limited area of 256x200. Extending it to a full 320 pixels wide bitmap would not be a difficult task. Consider it an exercise to do at home to enhance it in a way to support an entire visible bitmap area of 320x200.

Tables, tables everywhere


And how are we going to optimise all of these heavy calculations, so that they ultimately require a lot less cycles to compute and set a desired pixel? The answer is provided within this paragraph, and tables are the answer! We are going to precompute as much information as we can and store it in a memory to be used as a few lookup tables.

You have probably already noticed a couple of patterns, how all of the computed values change with different X and Y coordinates. These regularities may be used to generate values for our lookup tables easily.

Let us begin with a bit mask used to determine which particular pixel needs to be set in a memory location indicated by a calculated VECTOR. This is a value that is assigned to a BIT_MASK variable (which is assumed here to be placed on a zero page). This is not the most expensive computation, but probably the easiest one to demonstrate the use of lookup tables, explain how to populate them with data, and illustrate referring to them inside our procedure. So, what is the pattern we are seeing here?

$80,$40,$20,$10,$08,$04,$02,$01

...which repeats over and over again. If we precompute a table of all possible bit mask values, indexed by an X coordinate of a target pixel, and then refer to it, our put pixel procedure simplifies to:

              lda #>BITMAP
              sta VECTOR+1
              txa
              pha
              and #$f8
              clc
              adc #<BITMAP
              sta VECTOR+0
              bcc *+4
              inc VECTOR+1

              tya
              lsr
              lsr
              lsr
              tax
              beq *+19
              inc VECTOR+1
              clc
              lda VECTOR+0
              adc #$40
              sta VECTOR+0
              bcc *+4
              inc VECTOR+1
              dex
              jmp *-16           
              tya
              and #$07
              tay

              pla
              tax

              lda (VECTOR),y
              ora bit_mask,x
              sta (VECTOR),y

The only reason an X coordinate is still put onto a stack and later pulled from it is that X register is used as an index of a loop performed to compute VECTOR position based on a Y coordinate (we are going to fix it soon). Pay attention to the most important change however. We no longer care about permanent calculation of a BIT_MASK, it is now simply fetched from a dedicated lookup table labelled "bit_mask", precomputed only once at the beginning of a program execution. This is a source code of an exemplary table generator:

              ldx #$00
              lda #$00
              sec
              ror
              bcs *+9
              sta bit_mask,x
              inx
              jmp *-7
              cpx #$00
              bne *-15

The most significant concept illustrated by this example is that an X value is taken just as it is and used only as an offset when picking up a value from a precalculated lookup table. This is crucial, because we can push this kind of simplifications much farther, and consequently speed up the code.

Moving on... In the next step let us take care of utilising a Y coordinate value to instantly set the correct pointer to a memory location holding a byte with selected pixel information. Here is the pattern you may have already noticed:

$0000,$0001,$0002,$0003,$0004,$0005,$0006,$0007
$0140,$0141,$0142,$0143,$0144,$0145,$0146,$0147
$0280,$0281,$0282,$0283,$0284,$0285,$0286,$0287

There are actually two different patterns to consider, which are superimposed onto each other. First, every next value is incremented by 1 when compared against its predecessor, with a cycle beginning at $00 and ending up with $07, and then repeating over. Second, a base value for each item is incremented by $140 for every 8th value, it is however not reset, it remains updated over the whole course of table data generation.

Also note the following fact: values we consider for a Y offset table must be split into two separate tables, since target values exceed the range of an $ff value (they grow above the $ff). Hence we end up with 16-bit offsets, which can only be stored in two separate bytes. Knowing this, here is how we are going to fix our pixel setting procedure:

              lda bitmap_y_hi,y
              sta VECTOR+1
              txa
              and #$f8
              clc
              adc bitmap_y_lo,y
              sta VECTOR+0
              bcc *+4
              inc VECTOR+1

              ldy #$00
              lda (VECTOR),y
              ora bit_mask,x
              sta (VECTOR),y

And here is an exemplary script to generate a precomputed lookup table with bitmap offsets (both low and high bytes):

              ldx #$00
              lda #<BITMAP
              sta VECTOR+0
              lda #>BITMAP
              sta VECTOR+1
              lda VECTOR+0
              sta bitmap_y_lo,x
              lda VECTOR+1
              sta bitmap_y_hi,x
              inx
              inc VECTOR+0
              lda VECTOR+0
              and #$08
              beq *-17
              inc VECTOR+1
              clc
              lda VECTOR+0
              adc #$38
              sta VECTOR+0
              bcc *+4
              inc VECTOR+1
              cpx #$00
              bne *-34

Keep in mind that each bitmap offset value is already calculated in the context of an actual memory location of a bitmap data, thus an initial assignment of a BITMAP address to a temporary vector generates pointers to subsequent values of actual bitmap offsets, and not just offsets to be added to any potential bitmap's start address. This allows us to get rid of a previously applied assignment of an initial VECTOR value pointing to the very beginning of a BITMAP, which is then shifted according to X and Y coordinates. With a bitmap lookup table we get all of this information for free.

Finally, let us address the issue of computing how much a VECTOR needs to be shifted towards the right side of a screen depending on a provided value of an X coordinate. First things first, know your pattern:

$00,$00,$00,$00,$00,$00,$00,$00
$08,$08,$08,$08,$08,$08,$08,$08
$10,$10,$10,$10,$10,$10,$10,$10

...And so on, and so on. We can see subsequent groups of 8 identical values, each next group increased by a value of $08. Our source code optimisation not only removes the need of calculation of a specific VECTOR shift. It also provides the correct Y index value for the final pixel setting procedure:

              lda bitmap_y_hi,y
              sta VECTOR+1
              lda bitmap_y_lo,y
              sta VECTOR+0

              ldy bitmap_shift,x
              lda (VECTOR),y
              ora bit_mask,x
              sta (VECTOR),y

And table data calculation is just as simple as:

              lda #$00
              tax
              ldy #$00
              sta bitmap_shift,x
              inx
              iny
              cpy #$08
              bne *-7
              clc
              adc #$08
              cpx #$00
              bne *-16

One of the greatest advantages of this procedure is its constant execution time, which allows you to plan execution time of a whole demo effect very precisely. And because it is also very fast, you can put as many as 10-20 times more pixels in the same period of time than you were able when utilising an original put pixel procedure. We ended up with an extremely simple algorithm, which is not only very fast, but also very easy to read and understand. Only eight lines of code are enough to set a pixel anywhere in a bitmap area of a 256x200 size. Pretty good, isn't it?

Further extensions


This procedure plots pixels only on a limited area of 256x200. As already mentioned earlier, consider it an exercise to extend it in a way to support an entire visible bitmap area of 320x200. You need to make sure that a 9-bit value representation is assigned an X coordinate before a subroutine call. This may be achieved by using two 8-bit variables, for example placed on a zero page. But you could also stick to an X register alone and use a single bit flag to indicate whether bit 9 is set or not (carry would be a good candidate here). If a bit is set, you will know that a higher VECTOR byte needs to be incremented by 1 in order to point $100 bytes farther.

Plotting into a multicolour bitmap could be resolved even without this kind of trickery due to the fact that widened multicolour pixels produce a bitmap of 160x200 size. A subtle difference between a hires and a multicolour mode involves the fact that we are not necessarily only setting particular bits in a bitmap memory area, we might also have to clear present bits of a previously displayed pixel colour in the same coordinate position. That was not the case in a hires mode, where a pixel is either set or clear. In a multicolour mode each pixel might assume one of four possible bit combinations representing different colour values:

00 - background colour ($d020)
01 - colour from bits 4-7 of a screen memory
10 - colour from bits 0-3 of a screen memory
11 - colour from bits 0-3 of a colour memory

Therefore, a slight modification of our put pixel algorithm would require one additional step of reseting an original pixel value before setting a selected one. We would also benefit from preparing three different versions of a subroutine, one for each possible colour combination:

              lda bitmap_y_hi,y
              sta VECTOR+1
              lda bitmap_y_lo,y
              sta VECTOR+0

              ldy bitmap_shift,x
              lda (VECTOR),y
              and bit_and_mask,x
              ora bit_and_mask_11,x
              sta (VECTOR),y

Please note that this particular example selects an ORA mask of a colour represented by "11" bits combination. I will leave experimenting with it as well as precomputed table data generation to you as an exercise to do by yourself at home.

Final conclusion


Thank you for surviving this guide up to this point. As you can see setting pixels on a C64 bitmap can be implemented in a fast and elegant way. A final algorithm presented in this tutorial is fast enough for playing around with demo effects requiring plotting of dozens or even hundreds of points per frame. You can use these techniques as a basis for further exploration. Good luck and have fun! Should any questions to this tutorial remain unanswered, please do not hesitate to contact me, and I will be happy to help you out.