April 2000, Issue 117

Building a RISC System in an FPGA
Part 2: Pipeline and Control Unit Design by Jan Gray


PIPELINED EXECUTION

To increase instruction throughput, the xr16 has a three-stage pipeline—instruction fetch (IF), decode and operand fetch (DC), and execute (EX).

In the IF stage, it reads memory at the current PC address, captures the resulting instruction word in the instruction register IR, and increments PC for the next cycle. In the DC stage, the instruction is decoded, and its operands are read from the register file or extracted from an immediate field in the IR. In the EX stage, the function units act upon the operands. One result is driven through three-state buffers onto the result bus and is written back into the register file as the cycle ends.

Consider executing a series of instructions, assuming no memory wait states. In every pipeline cycle, you fetch a new instruction and write back its result two cycles later. You simultaneously prepare the next instruction address PC+2, fetch instruction I_PC, decode instruction I_PC-2, and execute instruction I_PC-4.

Table 1 shows a normal pipelined execution of four instructions. That’s the simple case, but there are several pipeline complications to consider—data hazards, memory wait states, load/store instructions, jumps and branches, interrupts, and direct memory access (DMA).

t1    t2    t3    t4    t5
IF1   DC1   EX1
      IF2   DC2   EX2
            IF3   DC3   EX3
                  IF4   DC4
Table 1—Here the processor fetches instruction I1 at time t1 and computes its result in t3, while I2 starts in t2 and ends in t4. Memory accesses are in boldface.
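
The schedule in Table 1 can be reproduced with a few lines of Python. This is only an illustrative sketch, assuming no wait states or stalls; the function name pipeline_schedule and the stage list are hypothetical and not part of the xr16 design.

```python
# Illustrative model of an ideal three-stage pipeline (no stalls, no wait states).
STAGES = ["IF", "DC", "EX"]

def pipeline_schedule(num_instructions, num_cycles):
    """Map each cycle number (1-based) to the stage labels active in that cycle."""
    schedule = {}
    for t in range(1, num_cycles + 1):
        active = []
        for i in range(1, num_instructions + 1):
            stage_index = t - i  # instruction i enters IF in cycle i
            if 0 <= stage_index < len(STAGES):
                active.append(f"{STAGES[stage_index]}{i}")
        schedule[t] = active
    return schedule

for t, active in pipeline_schedule(4, 5).items():
    print(f"t{t}: {' '.join(active)}")
```

For four instructions over five cycles, cycle t3 holds EX1, DC2, and IF3 at once, which is the column where the data hazard discussed next arises.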

What happens when an instruction uses the result of the preceding instruction?

I1: andi r1,7
I2: addi r2,r1,1

Referring to time t3 of Table 1, EX1 computes r1=r1&7, while DC2 fetches the old value of r1. In t4, EX2 incorrectly adds 1 to this stale r1.
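
To make the stale read concrete, here is a small behavioral sketch of cycles t3 and t4 for the two instructions above. It is not a model of the real xr16 datapath; the register dictionary and the function name run_without_forwarding are assumptions made for illustration.

```python
def run_without_forwarding(r1_initial):
    """Model I1: andi r1,7 followed by I2: addi r2,r1,1 with no hazard handling."""
    regs = {"r1": r1_initial, "r2": 0}

    # t3: EX1 computes r1 & 7 while DC2 reads r1 from the register file.
    ex1_result = regs["r1"] & 7
    dc2_operand_a = regs["r1"]   # stale: I1 has not written back yet
    regs["r1"] = ex1_result      # write-back happens as t3 ends

    # t4: EX2 adds 1 to the operand that was latched during t3.
    regs["r2"] = (dc2_operand_a + 1) & 0xFFFF
    return regs

regs = run_without_forwarding(0x1234)
print(hex(regs["r1"]), hex(regs["r2"]))
```

With r1 initially 0x1234, r1 ends up as 4 as intended, but r2 becomes 0x1235 rather than the correct value 5.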

This is a data hazard, and there are several ways to address it. The assembler can reorder instructions or insert nops to avoid the problem. Or, the control unit can detect the hazard and stall the pipeline one cycle, so that the result is written back to the register file before it is fetched as a source register. However, these techniques hurt performance.
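
As a rough sketch of the nop-insertion option, the pass below walks a program and inserts one nop whenever an instruction reads the register written by the instruction immediately before it. The Instr tuple and the insert_nops function are hypothetical; they do not reflect the actual xr16 assembler.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str                  # destination register, "" if none
    sources: Tuple[str, ...]   # source registers read in the DC stage

NOP = Instr("nop", "", ())

def insert_nops(program: List[Instr]) -> List[Instr]:
    """Insert one nop between a producer and an immediately following consumer."""
    out: List[Instr] = []
    for instr in program:
        prev = out[-1] if out else None
        if prev is not None and prev.dest and prev.dest in instr.sources:
            out.append(NOP)    # gives the producer one cycle to write back
        out.append(instr)
    return out

program = [
    Instr("andi", "r1", ("r1",)),  # I1: andi r1,7
    Instr("addi", "r2", ("r1",)),  # I2: addi r2,r1,1
]
print([i.op for i in insert_nops(program)])
```

Each inserted nop costs a full pipeline cycle, which is the performance penalty the text refers to.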

Instead, you use result forwarding, also known as register file bypass. The datapath DC stage includes FWD, a 16-bit 2-to-1 multiplexer (mux) that selects between AREG (register file port A) and the result bus. Most of the time, FWD passes AREG to the A operand register, but when the control unit detects the hazard (DC source register equals EX destination register), it asserts its FWD output signal, and the A register receives the I1 result just in time for EX2 in t4.
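
The forwarding path can be sketched behaviorally as follows, reusing the andi/addi pair from earlier. The function names fwd_mux, hazard_detected, and run_with_forwarding are assumptions for illustration, and the hazard test is reduced to the single comparison the text describes; a real control unit would likely include further qualifying conditions.

```python
def fwd_mux(areg, result_bus, fwd):
    """16-bit 2-to-1 mux: pass the result bus when fwd is asserted, else AREG."""
    return (result_bus if fwd else areg) & 0xFFFF

def hazard_detected(dc_source_a, ex_dest):
    """Simplified check: DC-stage A source equals EX-stage destination."""
    return dc_source_a == ex_dest

def run_with_forwarding(r1_initial):
    """Model I1: andi r1,7 followed by I2: addi r2,r1,1 with forwarding on port A."""
    regs = {"r1": r1_initial, "r2": 0}

    # t3: EX1 drives r1 & 7 onto the result bus while DC2 selects its A operand.
    ex1_result = regs["r1"] & 7
    fwd = hazard_detected(dc_source_a="r1", ex_dest="r1")
    a_operand = fwd_mux(regs["r1"], ex1_result, fwd)
    regs["r1"] = ex1_result      # write-back as t3 ends

    # t4: EX2 adds 1 to the forwarded operand.
    regs["r2"] = (a_operand + 1) & 0xFFFF
    return regs

regs = run_with_forwarding(0x1234)
print(regs["r1"], regs["r2"])
```

With r1 initially 0x1234, the A operand register receives 4 from the result bus instead of the stale 0x1234, so r2 ends up as 5 with no lost cycle.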

Unlike most pipelined CPUs, the xr16 only forwards results to the A operand—a speed/area tradeoff. The assembler handles any rare port B data hazards by swapping A and B operands, if possible, or inserting nops if not.
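
One way an assembler might handle a port B hazard is sketched below. The text does not say when a swap is possible, so this sketch assumes it is allowed only for commutative operations; that rule, the COMMUTATIVE_OPS set, the Instr tuple, and the function resolve_port_b_hazard are all hypothetical rather than taken from the xr16 tools.

```python
from typing import List, NamedTuple

# Assumption: only these operations may have their A and B operands exchanged.
COMMUTATIVE_OPS = {"add", "and", "or", "xor"}

class Instr(NamedTuple):
    op: str
    dest: str
    a: str   # register read through port A (forwarding available)
    b: str   # register read through port B (no forwarding)

NOP = Instr("nop", "", "", "")

def resolve_port_b_hazard(prev_dest: str, instr: Instr) -> List[Instr]:
    """Return the instruction(s) to emit so port B never reads a stale register."""
    if not prev_dest or instr.b != prev_dest:
        return [instr]                                  # no port B hazard
    if instr.op in COMMUTATIVE_OPS and instr.a != prev_dest:
        return [instr._replace(a=instr.b, b=instr.a)]   # move hazard to port A
    return [NOP, instr]                                 # fall back to a nop

print(resolve_port_b_hazard("r1", Instr("add", "r3", "r4", "r1")))
print(resolve_port_b_hazard("r1", Instr("sub", "r3", "r4", "r1")))
```

After a swap, the dependent register sits on port A, where the forwarding mux can supply it; when no swap is possible, the nop costs one cycle.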