DataStructure Hashing
When compiling, computer needs to manage variables:
- Inserting (new variables)
- Searching (references of variables)
Dynamic Searching Problems
Two main problems about Hashing:
- Calculating positions: creating a hashing function to locate the keyword
- Dealing with Collisions
Hashing Table
Name: SymbolTable
Data Object Set: Name-Attribute
Operation Set:
1 | SymbolTable InitializeTable(int TableSize); |
Loading Factor: total space of a hashing table m, the number of blocks filled by numbers is n then:
How to Create a Hashing Function?
A good hashing function must be:
- Easy enough
- The result must be distributed uniformly
Some methods:
- Direct Addressing
- Linear Congruential Method
- Digit Analysis: find the most random digits and combine them
- Folding Method: compress different parts in to a single one
- Mid-square Method
When the keyword is a char or string:
-
ASCII checksum: cannot deal with palindrome, vulnerable to collisions
-
One improvement is to move some digits forward (give them weight)
-
Another improvement is to transform the string into a number like Base Sort
-
QinJiuShao algorithm
-
Moving digits
1
2
3
4
5
6Index Hash(const char *Key, int TableSize) {
unsigned int h = 0;
while (*Key != '\0')
h = (h << 5) + *Key++;
return h % TableSize;
}
-
-
How to Deal with Collisions?
- Open Addressing
- Switch to another position!
- Separate Chaining
- Store things in the same place
Open Addressing
For different i, there are different solutions:
- Linear Probing: $ d_i=i $
ASL s and ASL u
- Quadratic Probing: $ d_i = \pm i^2 $
Theorem: if the table length is 4k+3 and is prime number, the quadratic probing can search the whole table space.
- Double Hashing: $ d_i = i\times h_2(key)
h_2(key) = p - (key\ \mod\ p) $
Code
1 | HashTable InitializeTable(int TableSize) { |
Rehashing
When the hash table is too large, the search efficiency will decrease.
0.5 < \alpha < 0.85
One solution is to double the size of the Hash Table and re calculate the whole table. This is called rehashing.
Separate Chaining
1 | typedef struct ListNode * Position, *List; |
The Efficiency of Hash Table
The factors that influence the efficiency of a hashing table can be divided into 3 parts:
- The Hashing function (uniformly)
- The way to deal with collisions
- The α
For linear probing: (insertion and unsuccessful search \ successful search)
For Quadratic probing and double hashing
For Separate Chaining:
Summary
The hashing method is actually consuming space for time.
The hashing method are not suitable for searching things in order and search in some scope or max or min.