Hashing


Hashing is the transformation of data of arbitrary size into a value of fixed, usually much shorter, length that stands in for the original data. A hash function takes a key (the element itself or an identifying part of it, such as a name or ID) and computes a hash value from it. The data can consist of strings, lists, files, or other content. In a hash table, the hash value determines the position at which the element linked to that key is stored.

The hash function maps each key to a hash value. In a hash-based data structure, the elements are distributed over small groups (buckets) according to their hash values. This makes it much faster to find elements in a database, for example, because only the bucket that the hash value points to needs to be searched, not the entire dataset. To find or modify an individual element, its hash value is recomputed from the key; the original data does not have to be reconstructed from the hash.
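The fixed output length can be illustrated with a minimal sketch. SHA-256 from Python's standard hashlib module is used here only as an example; the choice of algorithm and the helper name digest_hex are assumptions, not something the text above prescribes.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes (0 bytes, 1 byte, 20,000 bytes).
inputs = [b"", b"a", b"a much longer input " * 1000]

for data in inputs:
    d = digest_hex(data)
    # Each hex character encodes 4 bits, so 64 characters are 256 bits.
    print(len(data), "bytes in ->", len(d) * 4, "bits out:", d[:16] + "...")
```

Whatever the input size, the digest always has the same length, which is what makes hash values convenient as compact stand-ins for larger data.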

General information

Hashing is used in various areas to facilitate the handling of large amounts of data and to enable secure digital communication. Hashing is used, for example:

  • when storing large amounts of data in databases,
  • in checksums to check the integrity of data,
  • in programming, for looking up values by key (associative arrays),
  • in cryptography, to ensure the integrity and authenticity of messages, senders, and recipients.
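The checksum use case from the list can be sketched as follows. CRC-32 from Python's zlib module is chosen here as an assumed example; it detects accidental corruption only and offers no protection against deliberate tampering. The function names are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """Return the CRC-32 checksum of data as an unsigned 32-bit integer."""
    return zlib.crc32(data)


def is_intact(data: bytes, expected: int) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return checksum(data) == expected


original = b"hashing makes integrity checks cheap"
stored = checksum(original)  # saved or transmitted alongside the data

# A single corrupted byte ("cheap" became "cheep").
corrupted = b"hashing makes integrity checks cheep"

print("original intact: ", is_intact(original, stored))
print("corrupted intact:", is_intact(corrupted, stored))
```

Only the short checksum has to be stored or sent with the data; the recipient recomputes it and compares the two values.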

The application possibilities of hash functions and associated concepts are extremely diverse. However, the principle is usually similar: data is transformed into a short value so that storage space is saved and access to “hashed” datasets becomes faster.

In cryptography, hash functions also have special properties. They are one-way functions: with conventional computing power it is infeasible to reconstruct the original data from a hash value, and a hash function has no key that would allow it to be “decrypted”. They are also collision-resistant: it is infeasible to find two different datasets that produce the same hash value. To achieve this, the algorithm carries out various operations on the data, such as mixing and compression. An attacker is not supposed to be able to recover the data from the hash afterwards.

How it works

To clarify the principle of hashing, two important applications are explained:

Hashing in databases

In databases, hashing is used to build index or data structures. These structures are called hash tables; a hash function connects the entries in the table to the data elements. When an entry is added to a database, the hash function computes a hash value from the entry's key. This value indicates the position of the entry in the table and speeds up the search for it. If an entry is to be removed, the same computation is used to find the correct entry, which is then deleted. Hashing, however, is only one way of organizing and managing data in databases. The more entries exist, the more hash values have to be computed, and the probability of collisions (two keys mapped to the same position) increases. Very large databases can also be managed with hashing, but only if the hash table is enlarged and each entry is re-hashed. Hashing is used in a variety of ways in business intelligence, OLAP, and data warehousing.
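The insert, search, delete, and re-hash steps described above can be sketched as a toy hash table. This is an illustration under stated assumptions, not how any particular database implements its indexes: collisions are handled by separate chaining, the table doubles at an assumed load factor of 0.75, and the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (a list per bucket)."""

    MAX_LOAD = 0.75  # assumed threshold for enlarging the table

    def __init__(self, capacity: int = 8) -> None:
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    @property
    def capacity(self) -> int:
        return len(self._buckets)

    def _bucket_for(self, key):
        # The hash value, reduced modulo the capacity, gives the position.
        return self._buckets[hash(key) % self.capacity]

    def put(self, key, value) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > self.MAX_LOAD * self.capacity:
            self._resize(2 * self.capacity)

    def get(self, key, default=None):
        # Only one bucket is searched, not the whole table.
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key) -> bool:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self, new_capacity: int) -> None:
        # Enlarging the table changes the positions, so every entry is re-hashed.
        entries = [entry for bucket in self._buckets for entry in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for key, value in entries:
            self._buckets[hash(key) % new_capacity].append((key, value))

    def __len__(self) -> int:
        return self._size


table = ChainedHashTable(capacity=4)
for i in range(20):
    table.put(f"customer-{i}", i)

print(table.get("customer-7"), len(table), table.capacity)
```

Chaining keeps colliding entries in the same bucket, so a lookup still has to compare keys inside that bucket; the resize step keeps the buckets short as the number of entries grows.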

Example

Customers with names, addresses, and other attributes can be stored in a customer database. A search for a particular customer would take a relatively long time if the entire database had to be searched, because the computer would have to compare the search input with one record after another. To prevent this, a part of each record, such as the customer's name or number, is chosen as its key. A hash function translates these keys into hash values, and each hash value determines the position at which the record is stored. As a simple illustration, customers could be grouped by the first letter of their name; the first letter then acts as a very crude hash value.

This grouping is the data or index structure that simplifies the search for an entry. For a request, the computer computes the hash value of the search input and jumps directly to the corresponding position in the data structure. If the information found there matches the search input of the user, the search operation is completed. Otherwise, either several keys share that position (a collision) and the remaining candidates must be checked, or there is no entry for the search input.
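The customer example can be sketched with Python's built-in dict, which is itself a hash table. The records, the city values, and the function names are hypothetical and serve only to contrast a full scan with a hash-based lookup.

```python
# Hypothetical customer records; the name serves as the key.
customers = [
    {"name": "John Doe", "city": "Springfield"},
    {"name": "Jane Smith", "city": "Riverside"},
    {"name": "Alex Miller", "city": "Lakeview"},
]


def find_by_scan(records, name):
    """Search the whole list, comparing one record after another."""
    for record in records:
        if record["name"] == name:
            return record
    return None


def build_index(records):
    """Build a hash-based index that maps each name to its record."""
    return {record["name"]: record for record in records}


def find_by_hash(index, name):
    """Look the name up via its hash value; no full scan is needed."""
    return index.get(name)


index = build_index(customers)

print(find_by_hash(index, "Jane Smith"))
print(find_by_hash(index, "Nobody"))  # no entry for this search input
```

The index has to be built once, but afterwards each lookup touches only the position that the hash value points to.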

Hashing in encryption

In cryptography, hashing is used in the form of cryptographic hash algorithms. Digital documents, such as files of the most diverse content types (text, audio, and visual media, for example), can be written as a sequence of zeros and ones. When a digital document is hashed, the algorithm performs a series of operations that turn this sequence into a substantially shorter sequence of zeros and ones of fixed length. The information is mixed and compacted so that the document can be identified by its hash value but cannot be reconstructed from it. Cryptographic hash functions are therefore one-way functions. Because the output is shorter than the input, different documents can in principle share a hash value, but for a secure hash function it is practically infeasible to find such a collision, so each hash value can be treated as an unambiguous reference to one dataset.

If a message is transmitted together with its hash value, the recipient can check the integrity of the message, that is, whether it has been altered by third parties. Combined with a digital signature, the hash value also lets the recipient determine whether the message was actually sent by the claimed sender. The specific properties of cryptographic hash functions make it virtually impossible to compute the source information from a hash value. Hashing by itself does not keep a message secret, however: preventing a third party from “listening in” on a transmission requires encryption in addition.
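An integrity and authenticity check of this kind can be sketched with a keyed hash. Note the substitution: the sketch uses HMAC with SHA-256 and a shared secret key, which is a message authentication code and not a digital signature in the public-key sense. The key, the messages, and the function names are assumptions for illustration.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example key known to sender and recipient"  # hypothetical
message = b"Transfer 100 EUR to account 12345"

# The sender transmits the message together with its tag.
tag = make_tag(shared_key, message)

# The recipient recomputes the tag over what actually arrived.
print("unchanged message accepted:", verify(shared_key, message, tag))
print("altered message accepted:  ",
      verify(shared_key, b"Transfer 900 EUR to account 12345", tag))
```

Without the key, a third party who alters the message cannot produce a matching tag, so the change is detected. The message itself is still sent in readable form.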

Example

The MD5 hash algorithm (message-digest algorithm 5) is a cryptographic hash function that generates hash values that are always 128 bits long (32 hexadecimal characters), regardless of the length of the input string. A small change in a string results in a completely different hash value.

  • The string “John Doe” is transferred to the hash value 6f9ba3588f545844f2eeeaa71d6e5ada
  • “Jane Smith” gets assigned the hash value 93638fd8b5127c0ed5ec74549646b209
  • “Superhero” gets assigned the hash value a98c85f741fc5d7194fd9e0b9add2230
  • “Random samples” is translated into the hash value d4fd38d5ecf1359de8d795bf5c8b16ca.
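MD5 values for strings like these can be computed with Python's hashlib module, as in the following sketch. It assumes UTF-8 encoding and no trailing newline; a different encoding or extra whitespace yields a different hash, so the sketch makes no claim about reproducing the exact values listed above. The variant “John doe” is added here to show the effect of a one-letter change.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


samples = ["John Doe", "Jane Smith", "Superhero", "Random samples"]

for text in samples:
    print(f"{text!r:18} -> {md5_hex(text)}")

# Changing a single letter produces an entirely different hash value.
print(f"{'John doe'!r:18} -> {md5_hex('John doe')}")
```

Every output has 32 hexadecimal characters, that is, 128 bits, regardless of the length of the input.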

MD5 is no longer considered secure: collisions can be produced with little effort, so it should not be used for signatures or certificates. For storing passwords, a plain hash is also insufficient; it is typically combined with other methods, such as a salt and a deliberately slow derivation function. How secure a hash function is in practice depends not only on whether two strings can be found that produce the same hash (collision), but also on which methods are used for integrity and authenticity testing. For this reason, complete cryptosystems are now used instead of individual algorithms or hashing methods.
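Salting for password storage can be sketched as follows. The sketch uses PBKDF2 with HMAC-SHA256 from Python's hashlib rather than MD5; the iteration count, the salt length, and the function names are assumptions chosen for illustration, not recommendations taken from the text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch
SALT_BYTES = 16       # assumed salt length


def hash_password(password: str, salt=None):
    """Derive a hash from the password and a random per-user salt."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


# Both the salt and the derived hash are stored; the password is not.
salt, stored = hash_password("correct horse battery staple")

print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

Because each user gets a different salt, identical passwords lead to different stored hashes, and precomputed tables of hash values become useless to an attacker.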

Relevance to programming

Hashing is applied in many areas of information technology and security, and hashing algorithms exist in many different versions. Particularly in cryptography, various hashing methods and algorithms are used to increase security. In principle, however, the level of security is only relative: cryptologists are constantly working to increase the security of systems, while hackers and crackers try to uncover security gaps and to expose errors in the functioning of the algorithms.[1] Statements that a hashing algorithm is safe are therefore always based on the current state of research. The choice of hashing algorithm also depends on the available computing power and the time an attack would require. For other applications, such as databases or the verification of transmitted data, simpler non-cryptographic algorithms can be used whose hash values are significantly shorter than those of cryptographic hash functions.
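As an example of such a short, non-cryptographic hash, the following sketch implements the 32-bit FNV-1a function. FNV-1a is an assumed choice made for illustration; the text does not name a specific algorithm. It is fast and adequate for hash tables, but offers no security against deliberately constructed collisions.

```python
import hashlib

FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                           # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # multiply, keep the low 32 bits
    return h


key = b"customer-42"
short_hash = fnv1a_32(key)
long_hash = hashlib.sha256(key).digest()

print(f"FNV-1a:  {short_hash:08x} (32 bits)")
print(f"SHA-256: {long_hash.hex()} ({len(long_hash) * 8} bits)")
```

The 32-bit value is enough to pick a bucket in a hash table, whereas the 256-bit digest is sized so that collisions are infeasible to find.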

References

  1. An Illustrated Guide to Cryptographic Hashes. unixwiz.net. Accessed on 08/29/2016.
