January 2006, Issue 186
Third-Generation Rabbit
A Look at the Rabbit 4000
SPEEDING UP THE BUS
The maximum clock speed has increased with each new Rabbit
and has reached the point where memory that can be used
with zero wait states is becoming expensive. One obvious
solution to this problem was to increase the width of
the data bus.
A 16-bit bus had to be an option and not a requirement,
though, because the majority of Rabbit 4000 designs
will still use 8-bit memories for cost reasons. This
is what makes life interesting for me as a designer.
The 16-bit bus uses one of the parallel ports for the extra
8 bits of data to keep the package pin count low. Supporting
byte writes on a 16-bit bus often leads to a bit of
external glue logic. I wanted to avoid this, so a separate
port pin can be programmed to provide the necessary
byte-lane control signal for a glueless memory interface.
Rabbit wanted the option for 16-bit memories on two of the
device’s three chip selects, including the chip select
used as the default after reset. Because the default
is an 8-bit bus, this means that the first few bytes
of code in a 16-bit memory connected to this chip select
must be capable of switching the processor to 16-bit
mode.
An additional complication was that the code must execute
identically whether it’s in 16-bit mode or 8-bit mode
(after a reset). This limited me to using only pairs
of 1-byte instructions. The restriction comes about
because, until the switch to 16-bit mode occurs, the
CPU on the 8-bit bus actually fetches each instruction
twice (see Listing 1).
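To make the double-fetch restriction concrete, here is a minimal Python sketch of what the CPU sees in each mode. It is only a model under stated assumptions: the 16-bit memory is assumed to be wired so that, in 8-bit mode, both byte addresses of a word return the low byte lane. The function name and the opcode values are hypothetical placeholders, not real Rabbit opcodes.

```python
def fetch_stream(words, bus_width, count):
    """Return the first `count` bytes the CPU fetches from a 16-bit memory.

    `words` is a list of (low_byte, high_byte) tuples. ASSUMPTION: in 8-bit
    mode the upper byte lane is not connected yet, so both byte addresses
    inside a word return the low byte.
    """
    stream = []
    for addr in range(count):
        low, high = words[addr // 2]
        if bus_width == 16 and addr % 2 == 1:
            stream.append(high)   # odd address, upper byte lane in use
        else:
            stream.append(low)    # 8-bit mode, or even address
    return stream


# Placeholder 1-byte "instructions" (arbitrary values, not Rabbit opcodes).
OP_A, OP_B, OP_C = 0x11, 0x22, 0x33

# Boot code written as pairs: the same 1-byte instruction in both lanes.
paired = [(OP_A, OP_A), (OP_B, OP_B), (OP_C, OP_C)]

# Ordinary code that packs two different instructions into each word.
packed = [(OP_A, OP_B), (OP_C, OP_A), (OP_B, OP_C)]

paired_8 = fetch_stream(paired, 8, 6)
paired_16 = fetch_stream(paired, 16, 6)
packed_8 = fetch_stream(packed, 8, 6)
packed_16 = fetch_stream(packed, 16, 6)

print(paired_8 == paired_16)   # paired code looks the same in both modes
print(packed_8 == packed_16)   # packed code does not
```

In this model the paired code produces the same instruction stream on either bus width, while the packed code diverges as soon as the upper byte lane matters.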
The code first builds 0x02, which is the data value that
enables 16-bit operation, and stores it in B. Then, the
I/O address of 0x1D is built in L. I then take advantage
of the fact that multiple I/O prefixes (the IOI) are
interpreted as one to do the actual I/O write. The first
write goes to an I/O register, but the second write
goes to memory. This is fine because memory writes are
disabled after reset. The two NOPs allow time for the
actual switch, and away we go with a 16-bit bus!
Just providing a 16-bit bus doesn’t improve performance unless
the processor can actually use all 16 bits at once.
Although major changes have been made to the register
architecture and instruction set, the CPU itself still
takes in instructions 1 byte at a time. So along with
the 16-bit bus comes a 3-byte prefetch queue.
The prefetch mechanism is coupled with the instruction execution,
although wait states when prefetching don’t slow down
execution. Instead, the prefetch runs semi-autonomously,
attempting to always keep at least 1 byte in the prefetch
queue. But when the execution unit knows that a write
operation is coming up, it notifies the prefetch not
to start any new reads that might slow down the write.
In a similar fashion, when a branch instruction is recognized,
the prefetch is notified to stop when the branch address
has been completely fetched into the queue. All of this
leads to a measurable performance gain with a 16-bit
bus.
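A toy model can show where the gain comes from. The Python sketch below is not the Rabbit 4000's actual prefetch logic: it assumes a 3-byte queue, one byte consumed per step, and a new bus read whenever a full bus transfer fits in the queue. It ignores wait states, writes, and branches, and the function name is hypothetical.

```python
from collections import deque

QUEUE_DEPTH = 3  # bytes, from the article's 3-byte prefetch queue


def simulate_prefetch(code, bus_bytes):
    """Feed `code` to a byte-at-a-time CPU through a small prefetch queue.

    `bus_bytes` is 1 for an 8-bit bus or 2 for a 16-bit bus. Returns the
    bytes in execution order and the number of bus reads needed.
    ASSUMPTION: a read starts whenever a full transfer fits in the queue.
    """
    queue = deque()
    next_addr = 0
    reads = 0
    executed = []
    while len(executed) < len(code):
        # Prefetch side: one bus read brings in `bus_bytes` bytes.
        if next_addr < len(code) and len(queue) + bus_bytes <= QUEUE_DEPTH:
            queue.extend(code[next_addr:next_addr + bus_bytes])
            next_addr += bus_bytes
            reads += 1
        # Execution side: the CPU still takes in one byte at a time.
        if queue:
            executed.append(queue.popleft())
    return executed, reads


program = list(range(8))  # eight placeholder instruction bytes
bytes_8, reads_8 = simulate_prefetch(program, 1)
bytes_16, reads_16 = simulate_prefetch(program, 2)
print(reads_8, reads_16)
```

In this simplified model the CPU executes the same byte sequence either way, but the 16-bit bus needs half as many memory reads, which leaves more bus time free for data accesses.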
The final performance option is support for 16-byte Page
mode. Page mode memories are capable of supplying data
in the same page much faster than for ordinary random
accesses. However, Page mode memory requires that both
the chip select and the output enable remain active
with only the address changing to take advantage of
the faster access time. The Rabbit 4000 supports this
Page mode operation on either an 8-bit bus or a 16-bit
bus. It includes separate wait state values for initial
and subsequent memory reads. Separate control bits are
available for each chip select.
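As a rough illustration of how separate initial and subsequent wait states pay off, the following Python sketch counts clocks for a run of reads. The numbers are assumptions chosen for the example, not Rabbit 4000 register values: a base read of 2 clocks, 3 wait states for the first access in a page, and 1 wait state for later accesses in the same page. The function name is hypothetical.

```python
PAGE_SIZE = 16  # bytes per page, from the article's 16-byte Page mode


def read_clocks(addresses, base=2, initial_waits=3, page_waits=1):
    """Count clocks for a sequence of reads from a Page mode memory.

    ASSUMPTION: an access counts as a page hit when it falls in the same
    16-byte page as the previous access; every other access pays the
    initial wait states. All default timing values are illustrative.
    """
    total = 0
    prev_page = None
    for addr in addresses:
        page = addr // PAGE_SIZE
        waits = page_waits if page == prev_page else initial_waits
        total += base + waits
        prev_page = page
    return total


sequential = range(0, 16)                               # one full page
with_page_mode = read_clocks(sequential)
without_page_mode = read_clocks(sequential, page_waits=3)
scattered = read_clocks([0, 16, 32])                    # every read misses
print(with_page_mode, without_page_mode, scattered)
```

With these assumed values, sequential code within one page benefits the most, while accesses that hop between pages pay the initial wait states every time.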