PIPELINED EXECUTION
To increase instruction throughput, the xr16 has a three-stage pipeline: instruction fetch (IF), decode and operand fetch (DC), and execute (EX).
In the IF stage, the processor reads memory at the current PC address, captures the resulting instruction word in the instruction register IR, and increments PC for the next cycle. In the DC stage, the instruction is decoded, and its operands are read from the register file or extracted from an immediate field in the IR. In the EX stage, the function units act upon the operands. One result is driven through three-state buffers onto the result bus and is written back into the register file as the cycle ends.
Consider executing a series of instructions, assuming no memory wait states. In every pipeline cycle, you fetch a new instruction and write back its result two cycles later. You simultaneously prepare the next instruction address PC+2, fetch the instruction at PC, decode the instruction at PC-2, and execute the instruction at PC-4.
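To make the address arithmetic concrete, here is a minimal Python sketch that reports which address each stage handles in a cycle whose fetch address is PC. It assumes 2-byte instructions, byte addressing, and straight-line code with no wait states or branches; the helper name `stage_addresses` is hypothetical.

```python
def stage_addresses(pc: int) -> dict[str, int]:
    """Return the instruction address handled by each pipeline stage
    in a cycle whose IF stage fetches from `pc`.

    Assumes 2-byte instructions, no wait states, and no jumps, so the
    instruction fetched one cycle earlier sits at pc - 2 and the one
    fetched two cycles earlier sits at pc - 4.
    """
    return {
        "next_pc": pc + 2,  # address prepared for the next fetch
        "IF": pc,           # instruction being fetched now
        "DC": pc - 2,       # instruction being decoded
        "EX": pc - 4,       # instruction being executed
    }


if __name__ == "__main__":
    for pc in (0x0104, 0x0106, 0x0108):
        s = stage_addresses(pc)
        print(f"IF {s['IF']:#06x}  DC {s['DC']:#06x}  "
              f"EX {s['EX']:#06x}  next PC {s['next_pc']:#06x}")
```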
Table 1 shows a normal pipelined execution of four instructions. That's the simple case, but there are several pipeline complications to consider: data hazards, memory wait states, load/store instructions, jumps and branches, interrupts, and direct memory access (DMA).
| t1  | t2  | t3  | t4  | t5  |
|-----|-----|-----|-----|-----|
| IF1 | DC1 | EX1 |     |     |
|     | IF2 | DC2 | EX2 |     |
|     |     | IF3 | DC3 | EX3 |
|     |     |     | IF4 | DC4 |

Table 1: Here the processor fetches instruction I1 at time t1 and computes its result in t3, while I2 starts in t2 and ends in t4. The memory accesses are the IF stages.
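The staircase pattern in Table 1 follows from one rule: instruction k is fetched in cycle tk, decoded in t(k+1), and executed in t(k+2). A minimal Python sketch that regenerates the table from that rule, assuming no stalls, wait states, or branches (`pipeline_table` is a hypothetical helper name):

```python
STAGES = ("IF", "DC", "EX")


def pipeline_table(n_instr: int, n_cycles: int) -> list[list[str]]:
    """Build an ideal 3-stage schedule: one row per instruction and one
    column per cycle t1..t{n_cycles}. Instruction k (1-based) enters IF
    in cycle k and advances one stage per cycle."""
    rows = []
    for k in range(1, n_instr + 1):
        row = []
        for t in range(1, n_cycles + 1):
            stage = t - k  # 0 = IF, 1 = DC, 2 = EX
            row.append(f"{STAGES[stage]}{k}" if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    header = [f"t{t}" for t in range(1, 6)]
    print(" | ".join(f"{h:4}" for h in header))
    for row in pipeline_table(4, 5):
        print(" | ".join(f"{c:4}" for c in row))
```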
What happens when an instruction uses the result of the preceding instruction?

    I1: andi r1,7
    I2: addi r2,r1,1
Referring to time t3 in Table 1, EX1 computes r1=r1&7 while DC2 fetches the old value of r1. In t4, EX2 incorrectly adds 1 to this stale r1.
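The following Python sketch reproduces the hazard. It is a toy model, not the xr16 datapath: it assumes 16-bit registers, assumes DC reads the register file before the EX result is written back in the same cycle, and represents each instruction as a hypothetical `(dest, src_a, imm, op)` tuple. With r1 initially 42, the correct r2 would be (42 & 7) + 1 = 3, but the model computes 43.

```python
MASK = 0xFFFF  # assumption: 16-bit registers


def run_no_forwarding(program, regs):
    """Toy model of the DC and EX stages with no forwarding.

    `program` is a list of (dest, src_a, imm, op) tuples issued back to
    back (IF is implicit). In each cycle DC reads the register file
    first, then the EX result is written back as the cycle ends, so DC
    sees the old value of a register that EX is updating.
    """
    regs = dict(regs)
    ex = None  # operands latched for EX: (dest, a, b, op)
    for cycle in range(len(program) + 1):
        dc = None
        if cycle < len(program):
            dest, src_a, imm, op = program[cycle]
            dc = (dest, regs[src_a], imm, op)  # may read a stale value
        if ex is not None:
            dest, a, b, op = ex
            regs[dest] = op(a, b) & MASK       # write-back at cycle end
        ex = dc
    return regs


PROGRAM = [
    (1, 1, 7, lambda a, b: a & b),  # I1: andi r1,7     r1 = r1 & 7
    (2, 1, 1, lambda a, b: a + b),  # I2: addi r2,r1,1  r2 = r1 + 1
]

if __name__ == "__main__":
    regs = run_no_forwarding(PROGRAM, {1: 42, 2: 0})
    print(regs[1], regs[2])  # prints: 2 43 (correct r2 would be 3)
```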
This is a data hazard, and there are several ways to address it. The assembler can reorder instructions or insert nops to avoid the problem. Or the control unit can detect the hazard and stall the pipeline for one cycle, so that the result is written back to the register file before it is fetched as a source register. However, these techniques hurt performance, since every nop or stall cycle is a cycle in which no useful instruction completes.
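A rough Python sketch of the assembler-side remedy: scan the instruction list and put a nop between any instruction and an immediate successor that reads its destination register. It assumes that one intervening cycle is enough for write-back to complete (as the one-cycle stall described above implies) and that n instructions take n + 2 cycles in this three-stage pipeline; the `Instr` record and the helper names are hypothetical. A hardware stall would cost the same extra cycle under these assumptions.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[int]    # register written, or None
    srcs: tuple[int, ...]  # registers read


NOP = Instr("nop", None, ())


def adjacent_hazard(prev: Instr, cur: Instr) -> bool:
    """True if `cur` reads the register that `prev` (the instruction
    right before it) writes: the read-after-write case in the text."""
    return prev.dest is not None and prev.dest in cur.srcs


def insert_nops(program: list[Instr]) -> list[Instr]:
    """Separate each adjacent dependent pair with one nop."""
    out: list[Instr] = []
    for ins in program:
        if out and adjacent_hazard(out[-1], ins):
            out.append(NOP)
        out.append(ins)
    return out


def cycles(program: list[Instr]) -> int:
    """Cycles to finish `program` in an ideal 3-stage pipeline:
    one instruction completes per cycle after a 2-cycle fill."""
    return len(program) + 2


PROGRAM = [
    Instr("andi r1,7", 1, (1,)),
    Instr("addi r2,r1,1", 2, (1,)),
]

if __name__ == "__main__":
    fixed = insert_nops(PROGRAM)
    print([i.text for i in fixed])         # ['andi r1,7', 'nop', 'addi r2,r1,1']
    print(cycles(PROGRAM), cycles(fixed))  # 4 5 (the 4-cycle run computes a wrong r2)
```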
Instead, you use result forwarding, also known as register file bypass. The datapath's DC stage includes FWD, a 16-bit 2-to-1 multiplexer (mux) that selects either AREG (register file port A) or the result bus. Most of the time, FWD passes AREG to the A operand register, but when the control unit detects the hazard (the DC source register equals the EX destination register), it asserts its FWD output signal, and the A register receives the I1 result just in time for EX2 in t4.
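Here is a behavioral Python sketch of the FWD mux and its control condition, followed by the I1/I2 example run with forwarding. It is a toy model under stated assumptions, not the xr16's actual control logic: 16-bit values, an `ex_dest` of `None` when the EX instruction writes no register, and a hypothetical `(dest, src_a, imm, op)` instruction tuple. With r1 initially 42, it produces the correct r2 = 3.

```python
MASK = 0xFFFF  # assumption: 16-bit datapath


def fwd_select(dc_src_a, ex_dest, areg, result_bus):
    """Model of the FWD mux. FWD is asserted when the DC stage's A source
    register equals the EX stage's destination register; then the A
    operand comes from the result bus instead of register file port A."""
    fwd = ex_dest is not None and dc_src_a == ex_dest
    return (result_bus if fwd else areg), fwd


def run_with_forwarding(program, regs):
    """Toy model of DC and EX with A-operand forwarding. `program` holds
    (dest, src_a, imm, op) tuples issued back to back (IF is implicit)."""
    regs = dict(regs)
    ex = None  # operands latched for EX: (dest, a, b, op)
    for cycle in range(len(program) + 1):
        # EX: the function unit result is driven onto the result bus.
        result_bus, ex_dest = None, None
        if ex is not None:
            dest, a, b, op = ex
            result_bus, ex_dest = op(a, b) & MASK, dest
        # DC: read port A, then let FWD choose AREG or the result bus.
        dc = None
        if cycle < len(program):
            dest, src_a, imm, op = program[cycle]
            a, _ = fwd_select(src_a, ex_dest, regs[src_a], result_bus)
            dc = (dest, a, imm, op)
        # Write-back as the cycle ends.
        if ex_dest is not None:
            regs[ex_dest] = result_bus
        ex = dc
    return regs


PROGRAM = [
    (1, 1, 7, lambda a, b: a & b),  # I1: andi r1,7     r1 = r1 & 7
    (2, 1, 1, lambda a, b: a + b),  # I2: addi r2,r1,1  r2 = r1 + 1
]

if __name__ == "__main__":
    regs = run_with_forwarding(PROGRAM, {1: 42, 2: 0})
    print(regs[1], regs[2])  # prints: 2 3
```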
Unlike most pipelined CPUs, the xr16 forwards results only to the A operand, a speed/area tradeoff. The assembler handles the rare port B data hazards by swapping the A and B operands if possible, or by inserting nops if not.
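The assembler's port B fixup might look like the Python sketch below. The set of swappable (commutative) mnemonics, the `Op` record, and the helper name are assumptions for illustration; the rule itself follows the text: swap A and B when the operation allows it and the swap actually moves the dependency onto port A, otherwise insert a nop.

```python
from typing import NamedTuple, Optional

# Assumption: mnemonics whose A and B operands may be exchanged safely.
COMMUTATIVE = {"add", "and", "or", "xor"}


class Op(NamedTuple):
    mnemonic: str
    dest: Optional[int]  # register written, or None
    a: Optional[int]     # source register on port A (forwarded)
    b: Optional[int]     # source register on port B (not forwarded)


NOP = Op("nop", None, None, None)


def fix_port_b_hazards(program: list[Op]) -> list[Op]:
    """Return a copy of `program` in which no instruction reads, on
    port B, the register written by the instruction just before it."""
    out: list[Op] = []
    for ins in program:
        prev = out[-1] if out else None
        if prev is not None and prev.dest is not None and ins.b == prev.dest:
            if ins.mnemonic in COMMUTATIVE and ins.a != prev.dest:
                # Swap so the dependent register arrives on port A,
                # where forwarding covers it.
                ins = ins._replace(a=ins.b, b=ins.a)
            else:
                # Cannot swap: a nop gives write-back time to finish.
                out.append(NOP)
        out.append(ins)
    return out


PROGRAM = [
    Op("and", 1, 1, 2),  # r1 = r1 & r2
    Op("add", 3, 4, 1),  # r3 = r4 + r1  (r1 on port B)
    Op("sub", 5, 6, 3),  # r5 = r6 - r3  (r3 on port B)
]

if __name__ == "__main__":
    for ins in fix_port_b_hazards(PROGRAM):
        print(ins)
```

In this example, the add is repaired by swapping its operands, while the sub, which is not commutative, gets a nop.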