Issue 146 September 2002
Build Your Own 8051 Web Server

RUN-TIME PROFILING

Web pages are for human consumption, and 100-ms response
times appear snappy. Using the Finisar (formerly Shomiti;
go to www.finisar.com for more information) Surveyor
Lite, I measured the time required for my 8051 web server
to serve the 7-KB web page to a 500-MHz Pentium machine
running MSIE 5.5. [2] It took 60 ms. In contrast, it
takes my 100-MHz Pentium server running Apache about
32 ms to serve the same page. This demonstrates that
the response of the 8051 is respectable. For these measurements,
I had to clear the browser’s page cache each time to
make sure my browser was actually transferring the web
page rather than just displaying a cached version.

To figure out how much time my web server spent doing various
tasks, I added debug code that set a port pin when the
CPU began the task and cleared the pin when it completed
the task. Because many of the tasks I was interested
in run a number of times while serving the web page,
I had to add up all the pulse on-times. This was hard
to do with an oscilloscope, so I used an 82C54 timer
chip. The 80C51 port pin output drives the 82C54 gate
input. While the gate is high, the 82C54 counts cycles
of a 1-MHz oscillator. This provides accumulated pulse-width
times with 1-µs resolution. I set up another 82C54 counter
to count the number of times an event ran. Run times
are summarized in Table 2.
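
The measurement described above can be modeled on a host PC. The
following is a minimal sketch, not the firmware used for the
article: the names profiler_t, task_begin(), and task_end() are
hypothetical, and the timestamps passed in stand for the 1-MHz
oscillator count. On the real hardware, the two functions would
simply set and clear the port pin, and the two 82C54 counters
would do the accumulating.

```c
#include <stdint.h>

/* Host-side model of the gated-counter measurement (an assumption,
   not the article's code). One tick equals 1 us at 1 MHz. */
typedef struct {
    uint32_t on_time_us;   /* accumulated time the pin was high */
    uint32_t event_count;  /* number of times the task ran */
    uint32_t start_us;     /* tick at which the pin last went high */
    int      pin_high;     /* current state of the modeled port pin */
} profiler_t;

/* Task starts: the pin goes high, so the gate opens. */
static void task_begin(profiler_t *p, uint32_t now_us)
{
    if (!p->pin_high) {
        p->pin_high = 1;
        p->start_us = now_us;
        p->event_count++;          /* second counter: one more event */
    }
}

/* Task ends: the pin goes low, so the gate closes and counting stops. */
static void task_end(profiler_t *p, uint32_t now_us)
{
    if (p->pin_high) {
        p->pin_high = 0;
        p->on_time_us += now_us - p->start_us;
    }
}
```

Starting from a zeroed profiler_t, wrapping every run of a task in
task_begin() and task_end() leaves the summed run time in
on_time_us and the number of runs in event_count.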

The total of 47.4 ms falls short of the 60 ms it takes to
transfer the page. When I added up the intervals between
the 8051 sending an Ethernet frame and the browser’s
500-MHz Pentium responding, I came up with 8.5 ms. That
brings the accounted time to 55.9 ms and explains most
of the difference. It’s mind-boggling, but true, that
the 8051 is waiting for a Pentium.

Searching for and replacing tags is the most time-consuming task.
My web page uses tags as placeholders for dynamic values,
such as temperature. When the server sends the page, it
searches for these tags and replaces each one with the
appropriate value.

It turns out that the strstr() function is the time hog.
After some investigation, I found that strstr() is slow
in general. This makes sense because it has to scan
through a lot of text, comparing each letter of the
text to the corresponding letter of the search string.
It may have many partial matches before it finally
finds a complete match. One way to speed up the process
would be to tightly limit the search range of strstr().
Another approach would be to keep an index of offsets
to the tags, but the index would need to be changed
each time a page was added or modified.
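
Limiting the search range might look like the sketch below. It is
an illustration only: find_tag_bounded() and replace_tag() are
hypothetical names, the tag text in the usage note is invented,
and the code assumes each tag is overwritten by a value of exactly
the same length. Instead of letting strstr() scan to the end of
the page, the search stops at the end of a caller-supplied window.

```c
#include <stddef.h>
#include <string.h>

/* Look for tag only inside buf[start .. start + window).
   Returns the offset of the match, or -1 if there is none. */
static long find_tag_bounded(const char *buf, size_t buf_len,
                             size_t start, size_t window,
                             const char *tag)
{
    size_t tag_len = strlen(tag);
    size_t end, i;

    if (tag_len == 0 || start >= buf_len)
        return -1;
    /* Clamp the window to the buffer without overflowing. */
    end = (window > buf_len - start) ? buf_len : start + window;
    if (end - start < tag_len)
        return -1;
    for (i = start; i + tag_len <= end; i++) {
        /* Cheap first-character test before the full compare. */
        if (buf[i] == tag[0] && memcmp(buf + i, tag, tag_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Overwrite the tag in place with a value of the same length.
   Returns 1 on success, 0 if the tag is absent or lengths differ. */
static int replace_tag(char *buf, size_t buf_len, size_t start,
                       size_t window, const char *tag,
                       const char *value)
{
    long off = find_tag_bounded(buf, buf_len, start, window, tag);

    if (off < 0 || strlen(value) != strlen(tag))
        return 0;
    memcpy(buf + off, value, strlen(tag));
    return 1;
}
```

For example, if a page is known to keep its temperature tag within
the first 64 bytes, calling replace_tag(page, len, 0, 64,
"TAG:TEMP1", "072.5 deg") touches at most 64 bytes rather than the
whole 7-KB page. The index-of-offsets approach would go one step
further and skip the search entirely.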

The second most time-consuming task is copying the web page
from flash memory to RAM, using memcpy(). Why not just
skip this step and copy directly from flash memory to
the CS8900A? Again, the tags are the problem; they need
to be replaced with actual values, and you can’t replace
them while they are in flash memory. Perhaps a faster
approach would be to copy directly from flash memory to
the CS8900A, looking for tags as you copy. But then you
would have a thorny problem with the TCP checksum. It’s
computed over the entire segment, but must be inserted in
the TCP header at the beginning of the segment.

It’s interesting to note that the checksum is computed a
whopping 38 times to transfer a single web page. This
transfer is made up of 19 Ethernet frames, 11 from my
8051 server and eight from the browser. It takes three
frames to establish the connection, two frames to transfer
the HTML page, eight frames to transfer the image, and
six frames that are just acks. For both incoming and
outgoing frames, two checksums are computed: one for
the IP header and the other for the TCP segment, which
makes 38 checksums. I was glad I used assembler
for the checksum code!
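
For reference, the checksum used by both IP and TCP is the 16-bit
one's complement of the one's complement sum of the data. The
article's routine is written in 8051 assembler and is not shown
here; the following is a portable C sketch of the same algorithm,
with inet_checksum() as a hypothetical name. For TCP, the sum also
covers a pseudo-header built from the IP addresses, protocol, and
segment length, which this sketch leaves to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes, treating the data as big-endian
   16-bit words. Returns the value to store in the checksum field,
   assuming that field was zero while summing. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;   /* pad odd byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A received header or segment is valid when running the same
function over it, checksum field included, returns zero. Because
the sum is taken 16 bits at a time, an 8-bit CPU has to do each
addition in two steps with carry, which is one reason this routine
rewards hand-tuned assembler.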

I can’t help but wonder how much a 16-bit CPU would speed
things up, just by virtue of its being 16 bits. The
checksum would certainly run faster because the sum
is done over 16-bit chunks. Also, CS8900A I/O is 16
bits. Other tasks, such as memcpy() and strstr(), may
need custom library code, because many 16-bit compilers
default to doing these operations 1 byte at a time.