[Novalug] hash table theory

Clif Flynt CLIF@CFLYNT.COM
Sun Mar 15 09:27:15 EDT 2015


On Sun, Mar 15, 2015 at 03:05:59AM -0400, Jon LaBadie via Novalug wrote:
> ...
> I'm not versed in hash theory.  Is it typical for the
> number of keys to match the number of data elements?
> ...

  The trick with a hash is that you're taking a big "thing" and
reducing it to a small "thing".  Since we're using computers, our
"things" are all numbers (or ASCII strings, which are numbers).

  A hash is a fast lookup technique if 
  
  1) It's fast and easy to calculate the hash.

  2) You don't have collissions, where two "large things" convert
     to the same "small thing"
     
  3) If you do have collissions, it's fast and easy to find the
     correct element.

  Remember that 64K is 2^16 - a short word.  That's a pretty small
"thing".

  So, in this case, the code is configured to reduce a large "thing",
probably a text string to a 16 bit short word.

  The underlying code is probably allocating a "C" style array
to hold pointers to the actual data.

  ie: Say my string is "testing" at 0x1234 and that hashes to
0x0001, my "C" array would start as:

00000000
00001234
00000000

  So, my underlying code could calculate the hash, go to the table
and then straight to the original data.

  If you are very memory tight (unlikely nowadays) and have a very
sparse array of data, you might allocate two words for each hash
element and save a sorted list of hash values and pointers:

0001 00001234
0005 00005432

  Performance would be diminished, but memory efficiency would be
improved, given a small number of items in the hash table.

  Most hash libraries provide decent performance for common
size/complexity data. If you really need optimal data access speeds,
you examine the data you'll be hashing and either tweak the algorithm
or build your own hash library.

  Does this help?
  Clif

-- 
... Clif Flynt ... http://www.cwflynt.com ... clif@cflynt.com ...
.. Tcl/Tk: A Developer's Guide (3'd edition) - Morgan Kauffman ..
...22'nd Tcl/Tk Conference: Oct 19-23, 2015 - Northern VA, USA...
.............  http://www.tcl.tk/community/tcl2015/  ............








More information about the Novalug mailing list