In-depth analysis of the application of hashing in iOS

A hash table accesses records by mapping each key to a position in the table, which speeds up lookup.

First, the definition of the hash table

A hash table (also known as a hash map) is a data structure that accesses a memory location directly by key. The function that maps a key to a location is called a hash function, and the array that stores the records is called the hash table.


Given a table M and a function f(key): if, for any given key, substituting the key into the function yields the address in the table of the record that contains that key, then M is called a hash table and f(key) is called a hash function.


If a key is k, its value is stored at the location f(k), so the record being looked up can be fetched directly, without comparing keys. The correspondence f is called a hash function, and a table built on this idea is a hash table.


Different keys may produce the same hash address, that is, k1 ≠ k2 but f(k1) = f(k2). This is called a collision. Keys that share the same hash value are called synonyms with respect to that hash function.
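To make collisions and synonyms concrete, here is a minimal Swift sketch. It assumes the toy hash function f(k) = k mod 10, chosen only for illustration, and non-negative integer keys. It groups keys by hash address and keeps only the addresses shared by more than one key, i.e. the groups of synonyms.

```swift
// Toy hash function for illustration only: f(k) = k mod 10 (non-negative keys assumed).
func f(_ k: Int) -> Int {
    return k % 10
}

// Groups keys by hash address and keeps only addresses shared by two or more keys.
// Each remaining group is a set of synonyms for f.
func synonymGroups(_ keys: [Int]) -> [Int: [Int]] {
    var byAddress: [Int: [Int]] = [:]
    for k in keys {
        byAddress[f(k), default: []].append(k)
    }
    return byAddress.filter { $0.value.count > 1 }
}

let groups = synonymGroups([12, 22, 35, 47, 57])
// 12 and 22 collide at address 2; 47 and 57 collide at address 7; 35 is alone.
print(groups[2]!, groups[7]!)  // [12, 22] [47, 57]
```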

To sum up: using the hash function f(k) together with a method for handling collisions, a set of keys is mapped onto a finite, contiguous address range (interval), and each key's “image” in that range is used as the storage location of its record in the table. The resulting table is called a hash table, the mapping process is called building the hash table, or hashing, and the resulting storage location is called the hash address.


If every key in the key set is equally likely to be mapped by the hash function to any address in the address range, the function is called a uniform hash function. The idea is that the hash function gives each key a “random address”, which reduces collisions.
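One rough way to see whether a hash function spreads keys evenly is to count how many keys land on each address. The Swift sketch below assumes a hypothetical key set made of multiples of 10. With f(k) = k mod 10 every key lands on address 0, while f(k) = k mod 7 (7 shares no factor with 10) spreads the same keys almost evenly.

```swift
// Counts how many keys land on each of `buckets` addresses under hash function `h`.
func bucketCounts(_ keys: [Int], buckets: Int, _ h: (Int) -> Int) -> [Int] {
    var counts = [Int](repeating: 0, count: buckets)
    for k in keys {
        counts[h(k)] += 1
    }
    return counts
}

// Hypothetical key set: 0, 10, 20, ..., 990 (100 keys).
let multiplesOfTen = Array(stride(from: 0, to: 1000, by: 10))

// k mod 10 sends every key to address 0: as far from uniform as possible.
let poor = bucketCounts(multiplesOfTen, buckets: 10) { $0 % 10 }
// k mod 7 spreads the same keys over all 7 addresses (14 or 15 each).
let better = bucketCounts(multiplesOfTen, buckets: 7) { $0 % 7 }

print(poor)    // [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(better)  // [15, 14, 14, 15, 14, 14, 14]
```

The same keys behave very differently depending on the function, which is why the choice of modulus (and of hash function in general) matters.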


A hash table is essentially an array. Each element of the array is called a box (bucket), each box stores a key-value pair, and values are read from the array by subscript. The crucial part is obtaining the subscript, which requires a fixed function (the hash function) that converts a key into an index.
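The array-of-boxes picture can be sketched in Swift as follows. This is a minimal illustration, not how Swift's own `Dictionary` works: `Dictionary` and `Set` are hash tables driven by the `Hashable` protocol, but their hash values are randomly seeded for each run of a program. To keep the output reproducible, the sketch assumes a hypothetical, deliberately weak `simpleHash` (the sum of a string's UTF-8 bytes). It does not resolve collisions; it only reports them.

```swift
// One box holds one key-value pair.
struct Box {
    let key: String
    var value: Int
}

// Hypothetical hash for illustration: sum of the UTF-8 byte values.
// Deliberately weak: anagrams such as "ab" and "ba" always collide.
func simpleHash(_ key: String) -> Int {
    return key.utf8.reduce(0) { $0 + Int($1) }
}

struct BoxTable {
    private var boxes: [Box?]

    init(size: Int) {
        boxes = [Box?](repeating: nil, count: size)
    }

    // The hash function converts the key into an array subscript in 0..<size.
    func index(for key: String) -> Int {
        return simpleHash(key) % boxes.count
    }

    // Stores the pair. Returns false if a *different* key already occupies
    // the box (a hash collision, which this sketch does not resolve).
    mutating func put(_ key: String, _ value: Int) -> Bool {
        let i = index(for: key)
        if let existing = boxes[i], existing.key != key {
            return false
        }
        boxes[i] = Box(key: key, value: value)
        return true
    }

    // Reads by subscript; the key check guards against returning a synonym's value.
    func get(_ key: String) -> Int? {
        guard let box = boxes[index(for: key)], box.key == key else { return nil }
        return box.value
    }
}

var table = BoxTable(size: 8)
let storedApple = table.put("apple", 3)  // byte sum 530, 530 % 8 = 2
let storedAB = table.put("ab", 1)        // byte sum 195, 195 % 8 = 3
let storedBA = table.put("ba", 2)        // same byte sum as "ab": collision
print(storedApple, storedAB, storedBA)   // true true false
print(table.get("apple") ?? -1)          // 3
```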


No matter how well designed the hash function is, different keys can still end up with the same hash value, so hash collisions must be handled.

Second, the advantages and disadvantages of hash tables

Advantages: hash tables provide fast insertion, deletion, and lookup, close to O(1) on average.


Disadvantages:


Hash tables are usually built on arrays, and an array is difficult to expand once it has been created;


There is no easy way to traverse the items in a table in a particular order (for example, from smallest to largest).


To sum up, if the data does not need to be traversed in order and its size can be predicted in advance, a hash table is hard to beat for speed and ease of use.
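Both disadvantages can be seen in a small Swift sketch, assuming non-negative integer keys and the index rule key % n. Growing the array changes n, so keys no longer belong in the same boxes and every stored key must be re-placed (the hypothetical `rehash` helper below). Walking the boxes also yields keys in hash order, not key order, so an ordered traversal needs an extra sort.

```swift
// Re-places every key into a new array of `newSize` buckets using key % newSize.
// This is O(number of keys) work on each resize.
func rehash(_ keys: [Int], newSize: Int) -> [[Int]] {
    var buckets = [[Int]](repeating: [], count: newSize)
    for k in keys {
        buckets[k % newSize].append(k)
    }
    return buckets
}

let sampleKeys = [5, 13, 21]
print(sampleKeys.map { $0 % 8 })   // [5, 5, 5]   all in box 5 of an 8-box array
print(sampleKeys.map { $0 % 16 })  // [5, 13, 5]  13 moves after growing to 16 boxes

let grown = rehash(sampleKeys, newSize: 16)
// Walking the boxes gives hash order, not key order...
print(Array(grown.joined()))             // [5, 21, 13]
// ...so an ordered traversal needs an explicit sort.
print(Array(grown.joined()).sorted())    // [5, 13, 21]
```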

Third, hash lookup steps

First, use the hash function to map (convert) the key being searched for into an array index.

Ideally (when the hash function is well designed), different keys map to different subscripts, and every lookup takes O(1) time.

In practice this is not always the case, so the second step of a hash lookup is to handle hash collisions.


There are many ways to handle hash collisions, such as the zipper method (separate chaining) and the linear detection method (linear probing).
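The two lookup steps can be sketched in Swift with the zipper method (separate chaining). This is a minimal sketch, assuming non-negative `Int` keys, `String` values, and index = key % bucketCount as the hash function; the names are illustrative. Step 1 computes the index; step 2 resolves collisions by scanning only the chain stored at that index.

```swift
// Separate chaining ("zipper method"): each bucket holds a chain of key-value
// pairs whose keys map to the same index. Non-negative Int keys assumed.
struct ChainedTable {
    private var buckets: [[(key: Int, value: String)]]

    init(bucketCount: Int) {
        buckets = [[(key: Int, value: String)]](repeating: [], count: bucketCount)
    }

    // Step 1: map the key to an array index.
    private func index(_ key: Int) -> Int {
        return key % buckets.count
    }

    mutating func put(_ key: Int, _ value: String) {
        let i = index(key)
        if let j = buckets[i].firstIndex(where: { $0.key == key }) {
            buckets[i][j].value = value                  // key present: overwrite
        } else {
            buckets[i].append((key: key, value: value))  // new key: extend the chain
        }
    }

    // Step 2: resolve collisions by scanning only this bucket's chain.
    func get(_ key: Int) -> String? {
        return buckets[index(key)].first(where: { $0.key == key })?.value
    }
}

var chained = ChainedTable(bucketCount: 10)
chained.put(12, "a")
chained.put(22, "b")                 // same index (2) as 12: appended to that chain
print(chained.get(22) ?? "missing")  // b
print(chained.get(32) ?? "missing")  // missing
```

A chain can grow without limit, so lookups stay fast only while the chains stay short, which again depends on the hash function spreading keys evenly.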

Fourth, the hash table storage procedure

Use the hash function to compute the hash value h of the key;


If there are n boxes, the value should be stored in box number h % n; the range of h % n is [0, n-1];


If the box is not empty (a value is already stored there), a different key has been mapped to the same box, which is a hash collision. The collision must then be resolved with the zipper method or with open addressing using the linear detection method (linear probing).
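The storage steps above can be sketched in Swift using open addressing with linear probing. This is a minimal sketch, assuming non-negative `Int` keys whose hash value h is simply the key itself, and a fixed number of boxes n. The value goes into box h % n; if that box holds a different key, the next boxes (h + 1) % n, (h + 2) % n, ... are tried in turn.

```swift
// Open addressing with linear probing. h is the key itself (non-negative keys assumed).
struct LinearProbingTable {
    private var keys: [Int?]
    private var values: [String?]

    init(size: Int) {
        keys = [Int?](repeating: nil, count: size)
        values = [String?](repeating: nil, count: size)
    }

    // Box examined at probe step `step`: (h + step) % n, always in [0, n-1].
    private func slot(_ key: Int, _ step: Int) -> Int {
        return (key + step) % keys.count
    }

    // Stores the pair; returns false if every box is occupied (no resizing here).
    mutating func put(_ key: Int, _ value: String) -> Bool {
        for step in 0..<keys.count {
            let i = slot(key, step)
            if keys[i] == nil || keys[i] == key {
                keys[i] = key
                values[i] = value
                return true
            }
        }
        return false
    }

    // Follows the same probe sequence; reaching an empty box means the key is absent.
    func get(_ key: Int) -> String? {
        for step in 0..<keys.count {
            let i = slot(key, step)
            guard let k = keys[i] else { return nil }
            if k == key { return values[i] }
        }
        return nil
    }
}

var probing = LinearProbingTable(size: 10)
let putA = probing.put(12, "a")      // 12 % 10 = 2: stored in box 2
let putB = probing.put(22, "b")      // box 2 is taken, so it goes to box 3
print(putA, putB)                    // true true
print(probing.get(22) ?? "missing")  // b
```

This sketch has no delete operation: simply emptying a box would break the probe sequence for keys stored after it, so deletion under linear probing needs extra bookkeeping such as "deleted" markers.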

Fifth, commonly used hash functions

The first step in a hash lookup is to map keys to indexes, and the function that performs this mapping is the hash function.


If you have an array of size M, you need a hash function that converts any key into an index within that array's range (0 to M-1). The hash function should be easy to compute and should distribute all keys evenly.


To give a simple example, the last three digits of a mobile phone number make a better key than the first three, because many numbers share the same first three digits, while the last three digits vary far more.
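A quick Swift sketch of this comparison: it takes the first or last three digits of each number as its address and counts how many distinct addresses result (fewer distinct addresses means more collisions). The phone numbers are hypothetical sample values, made up for illustration only.

```swift
// Hypothetical sample numbers, for illustration only.
let phones = ["13800001234", "13800005678", "13811112345", "13922223456", "13800009012"]

// Use the first or the last three digits as the hash address.
func firstThree(_ s: String) -> String { return String(s.prefix(3)) }
func lastThree(_ s: String) -> String { return String(s.suffix(3)) }

// Number of distinct addresses a digit-selection function produces.
func distinctAddresses(_ items: [String], _ h: (String) -> String) -> Int {
    return Set(items.map(h)).count
}

print(distinctAddresses(phones, firstThree))  // 2  (only "138" and "139")
print(distinctAddresses(phones, lastThree))   // 5  (every number gets its own address)
```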


In practice, keys are not always numbers; they may be strings, combinations of several values, and so on, so you need to implement your own hash function. Common ways to construct one include:


direct addressing method;


digit analysis method;


mid-square method;


folding method;


random number method;


division-remainder method.
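Four of these methods fit in a few lines of Swift each. The sketch below uses textbook forms, with illustrative names and default parameters (`a`, `b`, `digits`, `width`, `p`) that are assumptions rather than fixed rules. It also assumes non-negative integer keys that are small enough for `k * k` not to overflow. For string keys, it first turns the string into an integer with a polynomial hash (h = h * 31 + byte over the UTF-8 bytes, one common choice among many), which can then go through the division-remainder method. Digit analysis (see the phone-number example) and the random number method are not shown.

```swift
// Direct addressing: f(k) = a * k + b, a linear function of the key itself.
func directAddress(_ k: Int, a: Int = 1, b: Int = 0) -> Int {
    return a * k + b
}

// Mid-square: square the key and keep `digits` digits from the middle.
func midSquare(_ k: Int, digits: Int = 3) -> Int {
    let square = String(k * k)
    let start = max(0, (square.count - digits) / 2)
    return Int(square.dropFirst(start).prefix(digits))!
}

// Folding (shift folding): cut the key's digits into `width`-digit parts,
// add the parts, and keep the last `width` digits of the sum.
func fold(_ k: Int, width: Int = 3) -> Int {
    let digits = Array(String(k))
    var sum = 0
    for start in stride(from: 0, to: digits.count, by: width) {
        let end = min(start + width, digits.count)
        sum += Int(String(digits[start..<end]))!
    }
    var modulus = 1
    for _ in 0..<width { modulus *= 10 }
    return sum % modulus
}

// Division-remainder: f(k) = k mod p, p usually a prime no larger than the table size.
func divisionRemainder(_ k: Int, p: Int = 97) -> Int {
    return k % p
}

// String keys: polynomial hash over the UTF-8 bytes with wrapping arithmetic;
// the sign bit is cleared so the result is a non-negative integer key.
func stringKey(_ s: String) -> Int {
    var h = 0
    for byte in s.utf8 {
        h = h &* 31 &+ Int(byte)
    }
    return h & Int.max
}

print(directAddress(5, a: 2, b: 1))                   // 11
print(midSquare(1234))                                // 227  (1234 * 1234 = 1522756)
print(fold(123456789))                                // 368  (123 + 456 + 789 = 1368)
print(divisionRemainder(1000))                        // 30
print(divisionRemainder(stringKey("hello"), p: 101))  // 17
```

The same recurrence with base 31 is the one Java's `String.hashCode` uses, which is why `stringKey("hello")` equals 99162322 here as well; combinations of several values can be hashed the same way by feeding each field's value into the recurrence in turn.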


It is not easy to design an excellent hashing algorithm.

Based on experience, we summarize several requirements that need to be met:


It must be infeasible to derive the original data from the hash value (which is why hash algorithms are also called one-way hash algorithms);


It is very sensitive to the input: changing even a single bit of the original data produces a very different hash value;


The probability of a hash collision is very small: different original data should only rarely produce the same hash value;


The algorithm should run as efficiently as possible, so that hash values can be computed quickly even for long texts.
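The sensitivity requirement can be observed with a small Swift sketch. It uses FNV-1a (64-bit), a fast and widely used hash that is *not* cryptographic: it shows how one flipped input bit changes the output, but it is not one-way. The one-way requirement is met by cryptographic hashes such as SHA-256 (on Apple platforms, for example, CryptoKit's `SHA256`). Hash tables themselves mainly need speed and an even spread, not one-wayness. The helper names `fnv1a` and `bitsChanged` are illustrative.

```swift
// FNV-1a, 64-bit: a simple, fast, non-cryptographic hash.
func fnv1a(_ bytes: [UInt8]) -> UInt64 {
    var h: UInt64 = 0xcbf29ce484222325  // FNV-1a 64-bit offset basis
    for b in bytes {
        h ^= UInt64(b)
        h = h &* 0x100000001b3          // FNV 64-bit prime, wrapping multiply
    }
    return h
}

// How many of the 64 output bits differ between two hash values.
func bitsChanged(_ x: UInt64, _ y: UInt64) -> Int {
    return (x ^ y).nonzeroBitCount
}

let original = Array("hash".utf8)
var flipped = original
flipped[0] ^= 1                          // flip a single bit of the input
let hashA = fnv1a(original)
let hashB = fnv1a(flipped)
print(String(hashA, radix: 16), String(hashB, radix: 16))
print("output bits changed:", bitsChanged(hashA, hashB))
```

Because each FNV-1a step (XOR with a byte, then multiplication by an odd prime modulo 2^64) is reversible for a fixed input byte, two inputs that differ only in one bit always produce different outputs here; how many output bits change depends on the input.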