circuitcellar.com
 
September 2006, Issue 194

Priority Interrupt
by Steve Ciarcia


My Kingdom for a Petabyte

 

If I had to pick the consumer device that has affected my lifestyle most in recent memory, it wouldn’t be a PDA, HDTV, satellite radio receiver, or a new car. It would be a flash drive. You know, it’s that little thumb-sized memory device that plugs into a USB port and that too many of us use to make our entire digital consciousness portable.

When I travel, I don’t always have access to the Internet, so it’s very handy being able to carry a significant portion of the My Documents folder and hundreds of pertinent stored web pages along with me. It’s amazing how much data we can stuff in a small package these days.

Of course, it wasn’t always like that. Back in the early ’80s, one of my most popular projects was a single-board CP/M computer called the SB180. It was physically mounted on top of a 5.25" 10-MB hard drive, the highest capacity available at the time. Today, a consumer hard drive with smaller dimensions and a lower price holds about 500 GB. To put that in perspective, if we were to try to create 500 gigs back then, it would have taken 50,000 of those 10-MB drives, and the volume would be equivalent to about 32 full-size refrigerators. Putting all that memory in the palm of my hand now is what I call progress.

Memory capacity grows about 40% to 90% per year, depending upon whom you listen to. (Wikipedia credits Mark Kryder, a Seagate engineer, with Kryder’s law, which says memory capacity doubles every 13 months.) Whatever the real statistics, the net result is that we have a bunch more data storage available now than we used to have, and it opens some very interesting possibilities.
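
For anyone who wants to reconcile those figures, a doubling period and an annual growth rate are two ways of stating the same thing. Here is a minimal Python sketch of the conversion. The function names are mine, made up for illustration, and it assumes smooth compound growth, which real drive capacities only approximate.

```python
import math


def annual_growth_from_doubling(months_to_double: float) -> float:
    """Fractional growth per year implied by a doubling period in months."""
    return 2 ** (12 / months_to_double) - 1


def doubling_months_from_growth(annual_growth: float) -> float:
    """Months needed to double at a given fractional growth per year."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


# Kryder's law as quoted above: capacity doubles every 13 months.
print(f"{annual_growth_from_doubling(13):.1%} growth per year")

# The 40% and 90% per-year figures, expressed as doubling periods.
for rate in (0.40, 0.90):
    print(f"{rate:.0%} per year doubles in {doubling_months_from_growth(rate):.1f} months")
```

By this arithmetic, doubling every 13 months works out to roughly 90% growth per year, while 40% per year means a doubling about every two years.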

My little flash drive filled with a few hundred web pages is nothing. Consider the Library of Congress. (For our foreign readers, this is the U.S.’s national library, and it has about 850 km of bookshelves.) It is estimated that the print holdings of the Library of Congress would be, if digitized and stored as plain text, about 20 terabytes of data. Let’s be magnanimous and say that adding all the graphics would double that. So, with 40 terabytes of storage, we have the whole place. It’s not exactly desktop-compatible, but I think we can probably stuff 40 terabytes in a manageable-sized equipment rack using today’s technology.

The increasing amounts of storage capacity we have today portend an interesting option if the trend continues. Archiving the Library of Congress on your laptop could be child’s play. The really interesting application would be to archive the whole HTML Internet on it!

Don’t laugh. While putting the web on a hard drive isn’t to be taken literally, in theory it’s just a matter of memory. It is estimated that there are currently about 10 billion web pages. If we can say that they average 100 KB per page, that’s a total of 1 million gigabytes (a petabyte, PB). OK, it will be a while before we all have 1-million-gigabyte hard drives, but it is conceivable. (And we’ll probably need more for an even larger web by then. Interestingly, Google is reported to currently have about 4 PB in its 450,000 web servers.)
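
That back-of-the-envelope estimate is easy to check, and to extend. The following Python sketch uses decimal units (1 KB = 1,000 bytes). Purely as an assumption, it also takes the 500-GB consumer drive and the 13-month doubling period mentioned earlier and guesses how long it would be before a single drive reaches a petabyte. The function names are hypothetical, and the projection is only as good as the trend behind it.

```python
import math

KB = 10**3
GB = 10**9
PB = 10**15


def web_size_bytes(pages: int, avg_page_bytes: int) -> int:
    """Total size of the web for a given page count and average page size."""
    return pages * avg_page_bytes


def years_to_reach(current_bytes: int, target_bytes: int, doubling_months: float) -> float:
    """Years until capacity grows from current to target, doubling at a fixed pace."""
    doublings = math.log2(target_bytes / current_bytes)
    return doublings * doubling_months / 12


# 10 billion pages at 100 KB each, as estimated in the text.
total = web_size_bytes(10 * 10**9, 100 * KB)
print(total // GB, "GB")
print(total / PB, "PB")

# Assumption: start from a 500-GB drive and double every 13 months.
print(f"{years_to_reach(500 * GB, PB, 13):.1f} years to a 1-PB drive")
```

Under those assumptions the answer comes out to a little under 12 years, if the trend holds that long.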

Getting increased storage like that in a single storage device will take some new technology. One promising avenue is an optical disc technology still in the research stage called Holographic Versatile Disc (HVD). HVD uses two lasers to greatly increase the storage density in a three-dimensional storage medium and significantly outpaces even the latest high-definition HD DVD and Blu-ray disc technology. While not quite there yet, an HVD is expected to hold up to 3.9 terabytes (equal to approximately 6,000 CD-ROMs or 830 DVDs). It would only take a dozen HVDs to contain the Library of Congress. More importantly for future iPod users, consider that one HVD could hold about 10,000 hours of MPEG-4 video. Just think what happens when we go from holographic storage to being able to address every atom in a one-cubic-inch crystal. How much memory is that?
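
The disc counts can be checked the same way. This Python sketch takes the projected 3.9-TB HVD capacity from above and, as my own assumptions, nominal capacities of 650 MB for a CD-ROM and 4.7 GB for a single-layer DVD. The helper name is made up for illustration.

```python
MB = 10**6
GB = 10**9
TB = 10**12

HVD_BYTES = 3900 * GB   # projected HVD capacity quoted in the text
CD_BYTES = 650 * MB     # assumption: nominal CD-ROM capacity
DVD_BYTES = 4700 * MB   # assumption: nominal single-layer DVD capacity


def discs_needed(data_bytes: int, disc_bytes: int) -> int:
    """Smallest whole number of discs that holds data_bytes (ceiling division)."""
    return -(-data_bytes // disc_bytes)


# How many CD-ROMs or DVDs does one HVD replace?
print(HVD_BYTES // CD_BYTES, "CD-ROMs per HVD")
print(round(HVD_BYTES / DVD_BYTES), "DVDs per HVD")

# HVDs needed for the Library of Congress: text only, then text plus graphics.
for size in (20 * TB, 40 * TB):
    print(size // TB, "TB needs", discs_needed(size, HVD_BYTES), "HVDs")
```

With the full 40 terabytes the count is 11 discs, so “a dozen” leaves a little headroom.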

OK, I’ll admit that it might be a while before notebooks have million-gigabyte hard drives so I can have the whole Internet offline, but apparently there is an interim alternative. In the “Why didn’t I think of that?” category, Webaroo.com is offering free software that fills the gap. Instead of putting a million gigabytes in the box, Webaroo heavily compresses portions of the web, sort of like Internet Explorer’s offline web caching. Webaroo doesn’t give you the whole Internet; instead, it concentrates on supplying preloaded “web packs” with cached pages on different subject areas, like big cities, daily newspapers, etc. The presumption is that we all use only the first 20 listings provided by a search engine, and Webaroo concentrates on making those predominant sources available offline. If these guys are successful, who needs a petabyte? Of course, I’ll have to get back to you on that. ;-)