circuitcellar.com

August 2006, Issue 193

Turning the Core-ner



by Tom Cantrell


HUBBA-HUBBA

At this point, you’re probably thinking: “Looks pretty simple. What’s the big deal?”

The big deal is found in Photo 2 (p. 80). The Propeller packs eight cogs' worth of multicore machismo underneath its otherwise mild-mannered MCU exterior. Time to cue The Twilight Zone music because now it starts to get a little spooky.

Photo 2—Like the Cobra racecar of yore, Propeller crams a high-output eight-cylinder engine (i.e., eight cogs) in a small chassis to blow the doors off conventional MCUs.

In techspeak, Propeller is a symmetric multiprocessor, or SMP (i.e., the cogs are all the same), using “shared memory” as the communication medium. Said shared memory, comprising 32 KB of RAM and 32 KB of ROM, is found in the hub.

The mechanism by which the memory is actually shared is invariably one of the messier aspects of multiprocessor design. A traditional approach has individual processors contending for access to the memory when they want it with an arbitration mechanism imposed to resolve conflicts.

Now, I suppose arbitration is better than a jury trial (“Ladies and gentlemen, my client was unfairly denied access.”), but it’s still messy. First, the arbitration logic resides in the critical path between processor(s) and memory, thus slowing everything down. Second, it introduces timing uncertainty for all processors depending on the arbitration outcome (i.e., whether a processor is immediately granted access or has to wait for another processor to finish). Finally, although not a requirement, arbitration schemes often lead down the primrose path of architectural embellishments such as priority (some processors have more access rights than others), which themselves lead to potential problems (e.g., priority inversion), calling for more hack workarounds (e.g., dynamically programmable priority). It’s a death spiral of complexity, delay, and uncertainty.
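To put a number on that timing uncertainty, here is a minimal sketch of a fixed-priority arbiter. Everything in it is hypothetical: the function name, the rule that a lower index means higher priority, and the two-clock access time are illustrative assumptions, not a description of any particular chip.

```python
def priority_arbiter_wait(requests, me, access_clocks=2):
    """Clocks processor `me` waits when every processor in `requests` asks at once.

    Assumption: lower index = higher priority, and each granted access
    holds the memory for `access_clocks` clocks.
    """
    ahead = sum(1 for p in requests if p < me)
    return ahead * access_clocks


# Same processor, same code, different luck with its neighbors.
waits = [priority_arbiter_wait(reqs, me=3)
         for reqs in ({3}, {0, 3}, {0, 1, 2, 3})]
print(waits)  # [0, 2, 6]
```

Processor 3 never changes what it does, yet its wait swings from zero to six clocks depending on who else happens to be asking. That swing is the jitter.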

By contrast, the Propeller sharing scheme is brutally simple. Like the distributor in an old V8, the hub simply goes round and round granting access to each of the cogs in turn (see Figure 2). The obvious downside is that cogs get access even if they don’t need it, blocking others that possibly do. However, the distributor approach minimizes the jitter (i.e., lack of determinism) that plagues traditional arbitration schemes.

Figure 2—Multicores are fine when each core is doing its own thing. The challenge arises when they contend for access to shared resources. Propeller uses a round-robin hub that grants each core deterministic access to shared RAM and ROM.

From a cog’s perspective, the only uncertainty involves waiting for the first access to shared memory as the distributor spins around. After that first access is obtained, cogs can schedule their subsequent activity knowing precisely when subsequent accesses will be granted—no ifs, ands, or buts.
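The distributor behavior can be sketched in a few lines. The eight cogs come from the text above; the two-clock window per cog is an assumption made for illustration, and the function names are hypothetical.

```python
NUM_COGS = 8          # from the article
CLOCKS_PER_SLOT = 2   # assumption: how long each cog's hub window lasts
ROTATION = NUM_COGS * CLOCKS_PER_SLOT  # clocks for one full turn of the hub


def wait_for_hub(cog, clock):
    """Clocks until `cog`'s next window opens, counting from `clock`."""
    window_start = cog * CLOCKS_PER_SLOT
    return (window_start - clock) % ROTATION


def access_times(cog, first_request_clock, count):
    """Clock at which each of `count` back-to-back hub accesses is granted."""
    first = first_request_clock + wait_for_hub(cog, first_request_clock)
    return [first + i * ROTATION for i in range(count)]


print(wait_for_hub(3, 6))      # 0: asked just as the window opened
print(wait_for_hub(3, 7))      # 15: just missed it, wait a full turn
print(access_times(3, 7, 3))   # [22, 38, 54]
```

Note that neither function takes the other cogs' activity as an input. Only the first wait depends on when the cog asks; every access after that lands exactly one rotation later.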

The hub also includes eight semaphores. These have nothing to do with the basic sharing mechanism: the distributor itself guarantees there can be no sharing conflicts for a single (byte, word, or long word) access, and that guarantee is in fact exploited to implement the semaphores themselves. Rather, the semaphores are a way for applications to adjudicate shared access to higher-level structures (arrays and I/O) if necessary.
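As a rough model of how an application might use one of those semaphores, consider the sketch below. It assumes a test-and-set style operation that completes within a single hub access. The class and method names are hypothetical and are not the Propeller's own instruction names.

```python
class HubLocks:
    """Eight lock bits, loosely modelled on the hub's semaphores."""

    def __init__(self, count=8):
        self.bits = [False] * count

    def test_and_set(self, lock_id):
        """Set the bit and return its previous state.

        Assumption: this happens in one hub access, so no other cog
        can slip in between the test and the set.
        """
        previous = self.bits[lock_id]
        self.bits[lock_id] = True
        return previous

    def clear(self, lock_id):
        self.bits[lock_id] = False


locks = HubLocks()
shared_array = []  # stands in for a multi-long structure in hub RAM


def cog_append(cog, values, lock_id=0):
    """Append several values as one unit, or back off if the lock is taken."""
    if locks.test_and_set(lock_id):
        return False  # another cog holds the lock; the caller retries later
    shared_array.extend((cog, v) for v in values)
    locks.clear(lock_id)
    return True
```

The lock protects the multi-element update, which takes several hub accesses. Each individual access is already conflict-free thanks to the distributor.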

The hub is also where the clocks for the entire chip (cogs and hub) are derived and distributed. As I mentioned earlier, one option is an on-chip RC oscillator that offers nominal 12 MHz and 20 kHz selections. The 20 kHz option is useful as a sleepy mode because cogs only consume about 3 µA at that clock rate. The other option is an external crystal or oscillator, which feeds a programmable PLL, boosting the rate by up to a factor of 16.

Now is a good time to talk megahertz and MIPS. The first chips run at up to 80 MHz (e.g., 5-MHz crystal with 16× PLL clock multiplier). Virtually all cog instructions execute in four clocks, except conditional branches, which require four (branch taken) or eight (not taken) clocks. That means each cog peaks at 20 MIPS, so the performance for the entire Propeller chip approaches 160 MIPS, or roughly 16 MIPS per buck. Not bad at all.
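For anyone checking the math, the arithmetic behind those figures works out as follows. All inputs come from the text above; the variable names are just for illustration.

```python
crystal_hz = 5_000_000       # 5-MHz crystal
pll_multiplier = 16          # maximum PLL boost
clocks_per_instruction = 4   # the common case
num_cogs = 8

system_hz = crystal_hz * pll_multiplier
mips_per_cog = system_hz / clocks_per_instruction / 1e6
chip_mips = mips_per_cog * num_cogs

print(system_hz)     # 80000000
print(mips_per_cog)  # 20.0
print(chip_mips)     # 160.0
```

This is a peak figure. Any not-taken conditional branch costs eight clocks instead of four, which is why the chip "approaches" 160 rather than hitting it.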