INTERRUPTS
When an interrupt request
occurs, you must jump to the interrupt handler,
preserve the interrupt return address, retire the
current pipeline, execute the handler, and later
return to the interrupted instruction.
When INTREQ is asserted, the IRMUX simply overrides
the fetched instruction with int, that is,
jal r14,10(r0). This jumps to the interrupt handler
at 0x0010 and leaves the return address in r14, which
is reserved for this purpose. When the handler has
completed, it executes iret (that is, jal r0,0(r14)),
and execution resumes with the interrupted instruction.
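The entry and return sequence can be sketched in a few lines of Python. This is a behavioral illustration only, not the actual hardware: the tuple encoding, the function names irmux and execute_jal, and the sample address 0x0124 are all assumptions made for the example. It also assumes that r0 always reads as zero and that the jal offset 10 is hexadecimal, which matches the handler address 0x0010.

```python
# Instruction tuples are (mnemonic, rd, imm, ra); this encoding is hypothetical.
INT_INSN = ("jal", 14, 0x10, 0)   # int  = jal r14,10(r0), offset read as hex
IRET_INSN = ("jal", 0, 0, 14)     # iret = jal r0,0(r14)


def irmux(fetched, intreq):
    """Pass the fetched instruction through, or substitute int on INTREQ."""
    return INT_INSN if intreq else fetched


def execute_jal(regs, insn, link_addr):
    """Run one jal: write link_addr to rd and return the jump target.

    Assumes r0 always reads as zero, so writes to it are dropped.
    """
    _, rd, imm, ra = insn
    target = (regs[ra] + imm) & 0xFFFF   # compute before rd is overwritten
    if rd != 0:
        regs[rd] = link_addr & 0xFFFF
    return target


regs = [0] * 16
interrupted_pc = 0x0124                       # address of the overridden instruction
insn = irmux(("add", 1, 2, 3), intreq=True)   # the add is replaced by int
pc = execute_jal(regs, insn, link_addr=interrupted_pc)   # enter the handler
pc = execute_jal(regs, IRET_INSN, link_addr=pc + 2)      # iret: return via r14
```

The overridden add is not lost: because r14 holds its address, the iret refetches it and it executes normally after the handler.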
There are two pipeline
issues here. First, you must not interrupt an interlocked
instruction sequence (any add, sub, shift, or imm
followed by another instruction). If an interlocked
instruction is in the DC stage, the interrupt is
deferred one cycle.
Second, the int must not be inserted in a branch
or jump shadow, lest it be annulled. If a branch or
jump is in the DC stage, or if a taken branch or jump
is in the EX stage, the interrupt is deferred.
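The two deferral rules amount to a small predicate on the contents of the DC and EX stages. The sketch below is one possible reading, not the actual control logic: the Insn type, the class labels such as "add" and "branch", and the function name defer_interrupt are hypothetical, and a jump in EX is treated as always taken.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    kind: str            # instruction class label (hypothetical), e.g. "add"
    taken: bool = False  # only meaningful for branches


INTERLOCKED = {"add", "sub", "shift", "imm"}  # classes named in the text


def defer_interrupt(dc: Optional[Insn], ex: Optional[Insn]) -> bool:
    """Return True if the int must not be inserted this cycle."""
    if dc is not None:
        # Rule 1: never split an interlocked instruction sequence.
        if dc.kind in INTERLOCKED:
            return True
        # Rule 2a: a branch or jump in DC puts the int in its shadow.
        if dc.kind in ("branch", "jump"):
            return True
    if ex is not None:
        # Rule 2b: a taken branch or a jump in EX would annul the int.
        if ex.kind == "jump" or (ex.kind == "branch" and ex.taken):
            return True
    return False
```

For example, defer_interrupt(Insn("load"), Insn("branch", taken=False)) returns False, because an untaken branch in EX annuls nothing.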
The simplicity of the
process pays off once again. The time to take an
interrupt and then return from a null interrupt
handler is only six cycles.
You might be wondering about interrupt priorities,
non-maskable interrupts, nested interrupts, and
interrupt vectors. These
artifacts of the fixed-pinout era need not be hardwired
into our FPGA CPU. They are best done by collaboration
with an on-chip interrupt controller and the interrupt
handler software.
The last pipeline issue
is DMA. The PC/address unit doubles as a DMA engine.
Using a 16 × 16 RAM as a PC register file, you can
fetch either an instruction (AN ← PC0 += 2) or a
DMA word (AN ← PC1 += 2) per memory cycle.
After an instruction
is fetched, if DMAREQ has been asserted, you insert
one DMA memory cycle.
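A small model may help picture how one adder serves both address streams. It is a sketch under stated assumptions rather than the real datapath: the class AddressUnit, the generator memory_cycles, and the starting addresses are invented for illustration, and the PC0 += 2 and PC1 += 2 updates are read as C-style compound assignments whose result becomes the next address.

```python
class AddressUnit:
    """16-entry x 16-bit PC register file; entry 0 is PC0, entry 1 is PC1."""

    def __init__(self, pc=0, dma_addr=0):
        self.pcfile = [0] * 16
        self.pcfile[0] = pc        # instruction pointer
        self.pcfile[1] = dma_addr  # DMA address counter

    def next_address(self, sel):
        """Add 2 to the selected entry, store it back, and return it."""
        self.pcfile[sel] = (self.pcfile[sel] + 2) & 0xFFFF
        return self.pcfile[sel]


def memory_cycles(unit, dmareq_per_fetch):
    """Yield (kind, address) per memory cycle.

    dmareq_per_fetch holds one DMAREQ sample per instruction fetch;
    a True sample inserts one DMA cycle after that fetch.
    """
    for dmareq in dmareq_per_fetch:
        yield ("ifetch", unit.next_address(0))
        if dmareq:
            yield ("dma", unit.next_address(1))


unit = AddressUnit(pc=0x0100, dma_addr=0x2000)
trace = list(memory_cycles(unit, [False, True, False]))
```

Here the trace contains three instruction fetches with a single DMA cycle slotted in after the second one, and the two address streams advance independently.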
This PC register file
costs eight CLBs for the RAM, but saves 16 CLBs
(otherwise necessary for a separate 16-bit DMA address
counter and a 16-bit 2-1 address mux), and shaves
a couple of nanoseconds from the system's critical
path. It's a nice example of a problem-specific
optimization you can build with a customizable processor.
To recap, each instruction
takes three pipeline cycles to move through the
instruction fetch, operand fetch and decode, and
execute pipeline stages. Each pipeline cycle requires
up to three memory access cycles (mandatory instruction
fetch, optional DMA, and optional EX stage load
or store). Each memory access cycle requires one
or more clock cycles.
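The recap translates into simple arithmetic. As a rough sketch, the hypothetical helper below counts the clock cycles spent in one pipeline cycle, under the simplifying assumption that every memory access in that cycle takes the same number of clocks.

```python
def pipeline_cycle_clocks(dma=False, load_store=False, clocks_per_access=1):
    """Clock cycles consumed by one pipeline cycle.

    One instruction fetch is mandatory; a DMA access and an EX stage
    load or store are optional. clocks_per_access is assumed uniform.
    """
    if clocks_per_access < 1:
        raise ValueError("each memory access needs at least one clock")
    accesses = 1 + int(dma) + int(load_store)
    return accesses * clocks_per_access


best_case = pipeline_cycle_clocks()
worst_case = pipeline_cycle_clocks(dma=True, load_store=True,
                                   clocks_per_access=2)
```

With single-clock memory and no DMA or data access, a pipeline cycle is one clock; with all three accesses at two clocks each, it stretches to six.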