How to implement linear hashing in Python

Hashing addresses the need to quickly locate or store an item in a collection. It speeds up searches by narrowing them to a small part of the collection instead of scanning every item.

What is hashing?

The process of translating keys and values into a hash table using a hash function is known as hashing. We use hashing for quicker access to elements.

The creation of hash tables is the most common application of hashing. A hash function turns a string of characters into a shorter, fixed-length value (the hash) that stands in for the original string. Because finding an item by this short hashed key takes less time than finding it by the original value, databases use hashing to index and retrieve data.
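To make the idea of a fixed-length value concrete, here is a small sketch using SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the helper name `fixed_length_key` are assumptions for illustration only. Hash tables and database indexes usually rely on faster, non-cryptographic hash functions.

```python
import hashlib


def fixed_length_key(text):
    # Hypothetical helper: SHA-256 always yields a 32-byte digest
    # (64 hex characters), however long the input string is
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_key = fixed_length_key("Ada")
long_key = fixed_length_key("Ada Lovelace, customer since 1843. " * 100)
print(len(short_key), len(long_key))         # 64 64
print(fixed_length_key("Ada") == short_key)  # True: same input, same key
```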

What is hashing used for?

Data retrieval: When looking for objects in a data map, a hash narrows the search to a single bucket instead of the whole collection.

For instance, developers store data such as customer records as key-value pairs in hash tables. The key is the input to the hash function, which produces an integer hash code; that code is then reduced to the table's fixed size to identify the slot that holds the value. Hash tables support the operations insert(key, value), get(key), and delete(key).

Digital signatures: Besides enabling quick data retrieval, hashing is used to create and verify the digital signatures that authenticate message senders and recipients. The sender runs the message through a hash function and signs the resulting hash value. The signature is sent to the recipient as a separate piece of data alongside the message, and the recipient recomputes the hash to verify it.
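The three hash table operations listed above can be sketched as a minimal hash table. This is an illustrative sketch, not a production design: the class name `HashTable` is hypothetical, `zlib.crc32` is assumed as the hash function (Python's built-in `hash()` of a string changes between runs unless `PYTHONHASHSEED` is set), and collisions are resolved by keeping a small list of key-value pairs in each bucket (separate chaining).

```python
import zlib


class HashTable:
    """Minimal hash table using separate chaining (illustrative sketch)."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list of [key, value] pairs
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # crc32 maps the key to a 32-bit hash code; the modulo reduces
        # that code to the table's fixed size
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:  # existing key: replace its value
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)


table = HashTable()
table.insert("cust-001", {"name": "Ada"})
table.insert("cust-002", {"name": "Grace"})
print(table.get("cust-001"))  # {'name': 'Ada'}
table.delete("cust-002")
```

Each operation computes one bucket index and scans only that bucket, which is why lookups stay fast as long as buckets stay short.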

What is linear hashing?

Linear hashing is a dynamic hashing scheme: a disk-based index structure that grows or shrinks one bucket at a time as records are inserted or deleted. The index is used to locate the record that corresponds to a given key, which makes exact-match searches efficient.

The method is called linear hashing because the number of buckets grows or shrinks linearly, one bucket per split or merge. It uses a family of hash functions, but at any moment at most two of them are in use: one for buckets that have not yet been split in the current round, and one with twice the range for buckets that have.
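A rough way to see both points is to list which hash function addresses each bucket during one round of splits. This sketch assumes N = 4 initial buckets and uses a hypothetical helper, `hash_function_per_bucket`. Level counts completed split rounds and Next is the next bucket to split. Buckets already split in the current round, and the new buckets those splits created, use h_{Level+1}; the rest still use h_Level.

```python
def hash_function_per_bucket(n_initial, level, next_bucket):
    """Return {bucket number: name of the hash function that addresses it}."""
    round_size = n_initial * 2 ** level  # buckets when this round started
    total = round_size + next_bucket     # each split so far added one bucket
    return {
        # split buckets (< next_bucket) and their new images (>= round_size)
        # use h_{level+1}; all other buckets still use h_level
        b: f"h{level + 1}" if b < next_bucket or b >= round_size else f"h{level}"
        for b in range(total)
    }


# One full round with N = 4 and Level = 0. Next = 4 marks the end of the
# round (in practice Level then becomes 1 and Next resets to 0).
for next_bucket in range(5):
    usage = hash_function_per_bucket(4, 0, next_bucket)
    print(next_bucket, len(usage), sorted(set(usage.values())))
# 0 4 ['h0']
# 1 5 ['h0', 'h1']
# 2 6 ['h0', 'h1']
# 3 7 ['h0', 'h1']
# 4 8 ['h1']
```

Each split adds exactly one bucket (4, 5, 6, 7, 8), and no more than two hash functions are active at once.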

How linear hashing differs from other hashing schemes

Linear hashing does not need a directory.
It tolerates overflow chains: a bucket that overflows does not have to be split immediately.
It is more flexible about when buckets are split.
It grows one bucket at a time.

Linear hashing terminology

As mentioned earlier, linear hashing is flexible about when buckets are split, and it lets us choose between two splitting criteria.

Load factor: the number of entries divided by the product of the number of buckets and the bucket capacity. We can set the split trigger to any load factor we want, which gives us greater control over the space utilization of the hash table.
For example, 4 entries in 3 buckets of capacity 3 give a load factor of $\frac{4}{3 \times 3} \approx 44\%$.
Overflow: a split is triggered as soon as an insertion causes a bucket to overflow.
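Both criteria reduce to simple checks. The helper names below are hypothetical, and the threshold of 0.75 for the load-factor policy is an arbitrary assumption. The first line reproduces the load factor example (4 entries, 3 buckets, capacity 3).

```python
def load_factor(num_entries, num_buckets, bucket_capacity):
    # Entries divided by total slots (buckets x capacity)
    return num_entries / (num_buckets * bucket_capacity)


def split_on_load(num_entries, num_buckets, bucket_capacity, threshold=0.75):
    # Criterion 1: split once the load factor exceeds a chosen threshold
    return load_factor(num_entries, num_buckets, bucket_capacity) > threshold


def split_on_overflow(bucket_size, bucket_capacity):
    # Criterion 2: split as soon as an insertion overflows a bucket
    return bucket_size > bucket_capacity


print(round(load_factor(4, 3, 3), 3))  # 0.444
print(split_on_load(4, 3, 3))          # False: 0.444 <= 0.75
print(split_on_overflow(5, 4))         # True: 5 entries, capacity 4
```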

Linear hashing implementation

Let's start with a simple example that hashes a list of integer keys into a fixed table of 10 slots using the hash function key mod 10:

```python
# keys to be hashed
init_list = [120, 111, 80, 260, 118, 110, 100, 97, 85]
# slot index computed for each key, in insertion order
HashValues = []

def linear_hash_function(init_list, size=10):
    # table of `size` slots; None marks an empty slot
    second_list = [None] * size
    for key in init_list:
        # hash function: h(key) = key mod size
        index = key % size
        HashValues.append(index)
        # a later key with the same index overwrites the earlier one
        second_list[index] = key
    return second_list

print(linear_hash_function(init_list))
print(HashValues)
```

Code explanation

init_list: We define a list named init_list that contains nine integer keys.
HashValues: A second list, HashValues, records the slot index computed for each key.
linear_hash_function: The function takes init_list as its parameter and creates a local table, second_list, with size slots (10 by default). Every slot starts as None, which marks it as empty.
The for loop: For each key, we compute index = key % size, append the index to HashValues, and store the key in second_list[index].
The last two lines: We print the table returned by linear_hash_function(init_list), followed by HashValues.

The output:

Running the code prints:

[100, 111, None, None, None, 85, None, 97, 118, None]
[0, 1, 0, 0, 8, 0, 0, 7, 5]

Slots that no key maps to remain None. Slot 0 holds only 100: the keys 120, 80, 260, and 110 also map to slot 0, and each was overwritten by the next key with the same index. This function does not resolve collisions, and its table never grows; linear hashing addresses growth by splitting buckets one at a time.

Note: None is not the same as an empty string, 0, or False; it is the only value of its own type, NoneType.
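The function above hashes into one fixed table and never splits, so it does not yet show linear hashing itself. Below is a minimal sketch of the split-on-overflow scheme under stated assumptions: integer keys, h(key) = key, N a power of two, and each bucket modeled as a Python list whose first `capacity` items stand for the primary page while any extra items stand for its overflow chain. The class and method names are hypothetical. The usage replays the keys of the worked example (N = 4, buckets of four entries).

```python
class LinearHashTable:
    """Minimal linear hashing sketch (split-on-overflow policy).

    Assumptions: integer keys, h(key) = key, and each bucket is a list whose
    first `capacity` items model the primary page and the rest its overflow.
    """

    def __init__(self, initial_buckets=4, capacity=4):
        self.n = initial_buckets  # N, assumed to be a power of two
        self.capacity = capacity
        self.level = 0            # Level: split rounds completed
        self.next = 0             # Next: bucket to split next
        self.buckets = [[] for _ in range(initial_buckets)]

    def _h(self, i, key):
        # h_i(key) = h(key) mod (2^i * N), with h(key) = key
        return key % (2 ** i * self.n)

    def _address(self, key):
        bucket = self._h(self.level, key)
        if bucket < self.next:    # already split in this round
            bucket = self._h(self.level + 1, key)
        return bucket

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:  # an overflow triggers one split
            self._split()

    def search(self, key):
        return key in self.buckets[self._address(key)]

    def _split(self):
        old_keys = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])          # new bucket: Next + 2^Level * N
        for key in old_keys:             # rehash with h_{Level+1}
            self.buckets[self._h(self.level + 1, key)].append(key)
        self.next += 1
        if self.next == 2 ** self.level * self.n:  # round complete
            self.level += 1
            self.next = 0


table = LinearHashTable(initial_buckets=4, capacity=4)
for key in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11, 37, 43]:
    table.insert(key)
print(table.buckets)
# [[32], [9, 25, 5, 37], [14, 18, 10, 30], [31, 35, 7, 11, 43], [44, 36]]
print(table.level, table.next)  # 0 1
```

Note that inserting 43 overflows bucket 3, yet the sketch splits bucket Next = 0; bucket 3 keeps its overflow entry until Next reaches it.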

Conclusion

In this Answer, we covered what hashing is and what it is used for, what linear hashing means, how it differs from other hashing techniques, its terminology, and its implementation.

Terminology in linear hashing

| Term | Description |
|---|---|
| $h_0, h_1, h_2, \ldots$ | A family of hash functions, where the range of each function is twice that of its predecessor |
| $N$ | Initial number of buckets |
| $d_0$ | Number of bits needed to address the $N$ initial buckets ($d_0 = \log_2 N$) |
| Level | Number of split rounds completed so far; initially 0 |
| Next | Pointer to the next bucket in line to be split |

Properties

| Symbol | Definition |
|---|---|
| $N$ | $N = 2^{d_0}$ for some $d_0$ |
| $d_i$ | $d_i = d_0 + i$ |
| $h_i(\text{key})$ | $h(\text{key}) \bmod (2^i N)$ |
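For the worked example (N = 4), and assuming h(key) = key for integer keys, the properties give the values below. The last line is the usual linear hashing rule for choosing between the two active functions, given Level and Next.

$$
\begin{aligned}
N &= 4 = 2^{2} \quad\Rightarrow\quad d_0 = 2,\; d_1 = 3 \\
h_0(\text{key}) &= \text{key} \bmod 4, \qquad h_1(\text{key}) = \text{key} \bmod 8 \\
h_0(43) &= 43 \bmod 4 = 3 = 11_2, \qquad h_1(44) = 44 \bmod 8 = 4 = 100_2 \\
\text{bucket}(\text{key}) &=
\begin{cases}
h_{\text{Level}}(\text{key}) & \text{if } h_{\text{Level}}(\text{key}) \ge \text{Next} \\
h_{\text{Level}+1}(\text{key}) & \text{otherwise}
\end{cases}
\end{aligned}
$$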

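N = 4
Level = 0
Split criterion: overflow
Next = 0 (the first bucket)

Each bucket holds up to four entries, and the row labels give h₀(key), the last two bits of the key.

Inserting 37 (100101): h₀(37) = 01, so 37 fills the last free entry of bucket 01.

| h₀ | Primary bucket | Overflow |
|---|---|---|
| 00 | 32 (100000), 44 (101100), 36 (100100) | |
| 01 | 9 (1001), 25 (11001), 5 (0101), 37 (100101) | |
| 10 | 14 (1110), 18 (10010), 10 (1010), 30 (11110) | |
| 11 | 31 (11111), 35 (100011), 7 (0111), 11 (1011) | |

Inserting 43 (101011): h₀(43) = 11, but bucket 11 is already full, so 43 goes to an overflow page. Because splits are triggered by overflow, this insertion causes bucket Next = 0 to be split, even though it was bucket 11 that overflowed.

| h₀ | Primary bucket | Overflow |
|---|---|---|
| 00 | 32 (100000), 44 (101100), 36 (100100) | |
| 01 | 9 (1001), 25 (11001), 5 (0101), 37 (100101) | |
| 10 | 14 (1110), 18 (10010), 10 (1010), 30 (11110) | |
| 11 | 31 (11111), 35 (100011), 7 (0111), 11 (1011) | 43 (101011) |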
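The row labels in these tables are just the last d₀ = 2 bits of each key, because h₀(key) = key mod 4 and 4 = 2². A quick check in Python, assuming h(key) = key, groups the example keys by that label:

```python
keys = [32, 44, 36, 9, 25, 5, 37, 14, 18, 10, 30, 31, 35, 7, 11, 43]
N = 4   # initial number of buckets
d0 = 2  # N = 2**d0

groups = {}
for key in keys:
    label = format(key % N, f"0{d0}b")                # h0(key) as a 2-bit label
    assert label == format(key, "b").zfill(d0)[-d0:]  # equals the key's last d0 bits
    groups.setdefault(label, []).append(key)

print(groups)
# {'00': [32, 44, 36], '01': [9, 25, 5, 37], '10': [14, 18, 10, 30], '11': [31, 35, 7, 11, 43]}
```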


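Splitting bucket 00 (Next = 0): its entries are rehashed with h₁(key) = key mod 8. Key 32 (h₁ = 000) stays, while 44 and 36 (h₁ = 100) move to the new bucket 100. Next then advances to 1. Bucket 11 keeps 43 on its overflow page; it is not split until Next reaches it.

| h₁ | h₀ | Primary bucket | Overflow |
|---|---|---|---|
| 000 | 00 | 32 (100000) | |
| 001 | 01 | 9 (1001), 25 (11001), 5 (0101), 37 (100101) | |
| 010 | 10 | 14 (1110), 18 (10010), 10 (1010), 30 (11110) | |
| 011 | 11 | 31 (11111), 35 (100011), 7 (0111), 11 (1011) | 43 (101011) |
| 100 | 00 | 44 (101100), 36 (100100) | |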
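Because N is a power of two, h₁ differs from h₀ only in one extra bit (bit d₀ + Level, counting from 0 at the least significant bit), so a split only has to look at that bit. Here is a sketch with a hypothetical helper, `split_by_bit`, reproducing the split of bucket 00:

```python
def split_by_bit(keys, bit):
    """Partition a bucket's keys by one bit (bit 0 = least significant).

    Keys whose bit is 0 stay in the bucket being split; keys whose bit is 1
    move to the new bucket. Assumes N = 2**d0, so that h_{Level+1}
    differs from h_Level only in bit d0 + Level.
    """
    stay = [k for k in keys if ((k >> bit) & 1) == 0]
    move = [k for k in keys if ((k >> bit) & 1) == 1]
    return stay, move


d0, level = 2, 0  # N = 4, first round
stay, move = split_by_bit([32, 44, 36], d0 + level)
print(stay, move)  # [32] [44, 36]
# Same partition as h_1(key) = key mod 8: stay -> 000, move -> 100
print([k % 8 for k in stay], [k % 8 for k in move])  # [0] [4, 4]
```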