CURRENT ISSUE
Contests
Test your eq
|
|
Issue #212 March 2008
Problem 4—For the data structure commonly known as a “hash” (“associative array” in some scripting languages), what is the difference in overall performance (in order-of-magnitude terms) between storing items into the array and reading them out again?
Answer 4 —The answer is somewhat dependent on the internal strategies used in the hash implementation.
A hash works by converting the index value for an item (its “key”) into a number that can be used to select a storage “bucket” for that item. The function that performs this operation is called a “hash function” and is designed to operate in constant time for a given length of key.
The storage bucket may contain more than one item, if the hash function comes up with the same value for more than one key. This is usually implemented as a linked-list structure associated with each bucket. A key design decision for a given hash storage system is whether the number of buckets is limited and/or whether the number of items per bucket is limited.
Reading data items out of a hash is generally acknowledged to be O(1), or constant time, as long as there is a fixed upper bound on the number of items per bucket. This implies that there is a mechanism for increasing the number of buckets as more items are stored in the hash. This strategy is a good overall choice as long as reads outnumber stores.
However, changing the number of buckets in a hash requires reallocating all of the items already in the hash to new buckets, requiring a pass through all of the data each time this occurs. For example, in the Perl language, the number of buckets is doubled each time the number of keys exceeds the number of buckets. This means that for large hashes, the performance of storing items is O(log N) rather than O(1).
On the other hand, if the number of buckets in the hash is fixed, the problem of reallocating all of the items from time to time is avoided, but now the size of the individual buckets can grow arbitrarily large. Storing or retrieving an item in the hash becomes O(N). If the linked list associated with each bucket is replaced with a tree structure, the performance for both reads and stores will be improved to O(log N).
Back to Issue #212 Questions | Test Your EQ Archive List
|