November 1998, Issue 100
Smart Rocket
ZIPPING IN AND OUT
The
secret to SX performance is simple, relying as it does
on the traditional technique of pipelining. The four-stage
pipe in Figure 2 is a classic, similar to those found
in earlier (but typically larger, like 32-bit) machines.
Figure 2: SX performance is obtained via pipelining
using a time-honored fetch-decode-execute-writeback
design. Since a pipeline can only run as fast as
its slowest stage, great attention was paid to the
flash-memory design to achieve the 10-ns access
time required by the 50-MHz clock rate.
There is one, and only one, reason to use a pipeline, and
that’s to boost the clock rate, which ultimately is limited
by memory access time.
In
compatibility mode, the SX reverts to four clocks per
instruction (eight for JMPs and CALLs), same as a PIC.
Flip the turbo switch, and the pipeline kicks in.
Once
filled, the pipe delivers close to one instruction per
clock. However, as with all pipelined machines, there
are some caveats to be aware of.
The JMP and CALL penalty is relatively worse due to the
need to refill the pipe. Where such instructions require
two cycles (eight clocks) in compatible mode, they need
three cycles (three clocks) in turbo mode, derating
the advantage to about 2.67× (8 divided by 3) for those
instructions versus 4× for most others.
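To see how the JMP and CALL penalty dilutes the overall gain, here is a minimal Python sketch using only the clock counts quoted above. The 20% branch mix is a made-up illustration rather than a measured figure, and the model ignores IREAD and any other special cases.

```python
# Clocks per instruction as quoted in the text: 4 (8 for JMP/CALL) in
# compatibility mode, 1 (3 for JMP/CALL) in turbo mode.
COMPAT_CLOCKS = {"other": 4, "branch": 8}
TURBO_CLOCKS = {"other": 1, "branch": 3}


def average_clocks(clocks, branch_fraction):
    """Average clocks per instruction when branch_fraction are JMP/CALL."""
    return (branch_fraction * clocks["branch"]
            + (1.0 - branch_fraction) * clocks["other"])


def turbo_speedup(branch_fraction):
    """Turbo-mode speedup over compatibility mode at the same clock rate."""
    return (average_clocks(COMPAT_CLOCKS, branch_fraction)
            / average_clocks(TURBO_CLOCKS, branch_fraction))


if __name__ == "__main__":
    # 0.2 is a hypothetical instruction mix, chosen only for illustration.
    for fraction in (0.0, 0.2, 1.0):
        print(f"{fraction:.0%} JMP/CALL: {turbo_speedup(fraction):.2f}x")
```

With no branches at all the ratio is 4, with nothing but branches it is 8/3, and any real program lands somewhere in between.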
Another
example is IREAD, one of the ten new instructions added
by Scenix (see Table 1). IREAD
enables a program to read the instruction memory, something
that’s nontrivial in a Harvard design (separate program
and data memory).
Given
the complication involved, IREAD requires the same number
of clocks (four) in both compatibility and turbo mode.
But, it’s faster than previous data-lookup schemes and
can access the entire code space.
Pipelined machines are also subject to various hazards
(e.g., the problem of trying to read data at the same time
it’s being written) that must be obviated by hardware,
software, or both.
Consider
a sequence of instructions involving a back-to-back
write followed by a read of the same data. Instruction
n is writing data (write stage) even as instruction
n + 1 (execute stage) wants to read it.
For on-chip RAM, the SX includes forwarding logic that handles
such an obstacle transparently in hardware. Thus, one
instruction can write to RAM and the next one can safely
read from the same location. For I/O ports, however,
successive operations call for some precautions.
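The idea behind forwarding can be sketched with a toy Python model. This is not the SX's actual logic: the class name `PipelinedRam`, its methods, and the one-instruction write delay are all assumptions made for illustration. The point is only that a read issued right after a write sees stale data unless the in-flight value is bypassed to it.

```python
class PipelinedRam:
    """Toy model: a write reaches RAM one instruction after it executes."""

    def __init__(self, forwarding):
        self.cells = {}        # committed RAM contents
        self.pending = None    # (address, value) still in the write stage
        self.forwarding = forwarding

    def _retire(self):
        """Commit the write-stage result at the end of the current clock."""
        if self.pending is not None:
            address, value = self.pending
            self.cells[address] = value
            self.pending = None

    def write(self, address, value):
        """Instruction n: its result sits in the write stage next clock."""
        self._retire()
        self.pending = (address, value)

    def read(self, address):
        """Instruction n + 1: reads while instruction n is still writing."""
        if (self.forwarding and self.pending is not None
                and self.pending[0] == address):
            value = self.pending[1]             # take the in-flight value
        else:
            value = self.cells.get(address, 0)  # may be stale
        self._retire()
        return value

    def nop(self):
        """An unrelated instruction gives the pending write time to land."""
        self._retire()
```

In this model, `PipelinedRam(True)` returns the new value on a back-to-back write and read, while `PipelinedRam(False)` returns the old one unless a `nop()` separates the two.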
For
pins configured as outputs, the SX reads the actual
pin level, not the output latch. I think the SX approach
is superior because it enables the detection of external
problems such as a shorted or excessively loaded pin.
It’s
easy to confirm that the output-pin level is or isn’t
what it’s supposed to be. By contrast, reading the output
latch, rather than the pin, leaves you blind to outside
interference.
As a consequence, a write to a port may not propagate through
to the pin in time to be recognized by an immediate
read. Depending on the clock rate and pin loading, a
non-port instruction may need to be inserted to split up
a back-to-back port write and read. Similarly, the possible
difference between output latch and pin level calls
for care when using read-modify-write instructions
like SETB and CLRB.
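The read-modify-write caution is easiest to see in a small simulation. The Python sketch below is a simplified model, not SX code: the `OutputPort` class and its `forced_low` mask (pins held low by an external short or heavy load) are hypothetical, and all pins are assumed to be configured as outputs. Because SETB reads the pins rather than the latch, it can silently write a disturbed pin's level back into the latch.

```python
class OutputPort:
    """Toy 8-bit port whose reads return pin levels, not the output latch."""

    def __init__(self):
        self.latch = 0x00
        self.forced_low = 0x00  # hypothetical mask of pins held low externally

    def read_pins(self):
        """A pin reads high only if its latch bit is set and it isn't held low."""
        return self.latch & ~self.forced_low & 0xFF

    def write(self, value):
        """A plain port write loads the output latch."""
        self.latch = value & 0xFF

    def setb(self, bit):
        """Read-modify-write: read the pins, set one bit, write the latch."""
        self.write(self.read_pins() | (1 << bit))


if __name__ == "__main__":
    port = OutputPort()
    port.write(0b00000001)        # drive bit 0 high
    port.forced_low = 0b00000001  # an external fault drags bit 0 low
    port.setb(1)                  # set bit 1; bit 0 is read back as 0
    port.forced_low = 0x00        # the fault goes away
    print(f"{port.read_pins():08b}")  # bit 0 stays low: the latch lost it
```

The same effect applies to CLRB, and in this model a write that has not yet reached the pin looks exactly like a pin that is being held at its old level.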
The
I/O pins themselves (4-bit Port A, 8-bit Port B, and,
for 28-pin devices, 8-bit Port C) are versatile. Each
pin is individually programmable as input or output,
with or without an internal pull-up resistor. All inputs
are selectable as TTL or CMOS levels, and Port B and
Port C inputs can be individually defined as Schmitt
triggered.
Outputs
can sink and source 30 mA (subject to overall device
power limit), with those on Port A featuring symmetrical
drive (i.e., centered about VDD/2 under any load). This
feature is useful for driving speakers and other pseudoanalog
functions such as using a PWM to implement a DAC.
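As a rough illustration of the PWM-as-DAC idea, the ideal filtered output is just the duty cycle times the supply voltage. The Python sketch below assumes an 8-bit PWM, a 5-V supply, and a perfect low-pass filter; none of these values comes from the SX documentation, and the function name is made up.

```python
def pwm_dac_voltage(duty_count, resolution_bits=8, vdd=5.0):
    """Ideal average output voltage of a filtered PWM signal.

    duty_count is the number of counts per period that the pin is high.
    The 8-bit resolution and 5-V supply are illustrative defaults.
    """
    steps = 1 << resolution_bits
    if not 0 <= duty_count < steps:
        raise ValueError("duty_count out of range for this resolution")
    return vdd * duty_count / steps


if __name__ == "__main__":
    for count in (0, 64, 128, 255):
        print(count, pwm_dac_voltage(count))
```

A real output falls short of this ideal by whatever the driver drops under load, which is where a symmetrical output stage helps.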
As inputs, pins of Port B can be individually enabled to
act as wakeups (with programmable edge selection) from
low-power sleep mode. Or, three pins of Port B can be
configured as an analog comparator. Two inputs (RB1
and RB2) are compared, with the result (greater than
or less than) reflected on output RB3.
Besides
general-purpose I/O, the SX includes an 8-bit timer/counter
(RTCC) and watchdog timer, either of which (but not
both at the same time) can be mated with an 8-bit prescaler.
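For a feel of the time scales involved, the Python sketch below estimates how often an 8-bit counter rolls over. It assumes the RTCC counts once per clock at 50 MHz and that the prescaler divides by a whole number up to 256; the available ratios and clock sources are not given here, so treat the ratio and the function name as hypothetical.

```python
def rtcc_overflow_period(count_hz, prescale=1, timer_bits=8):
    """Seconds between rollovers of a free-running counter.

    count_hz is the assumed counting rate before the prescaler, and
    prescale is a hypothetical divide ratio (1 means no prescaler).
    """
    if not 1 <= prescale <= 256:
        raise ValueError("an 8-bit prescaler cannot divide by more than 256")
    return (2 ** timer_bits) * prescale / count_hz


if __name__ == "__main__":
    for ratio in (1, 16, 256):
        period_us = rtcc_overflow_period(50e6, ratio) * 1e6
        print(f"divide by {ratio}: {period_us:.2f} us")
```

Under these assumptions the counter rolls over every 5.12 µs without the prescaler and about every 1.31 ms with the full divide-by-256.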