Prefetching

In computer architecture, prefetching refers to the retrieving and storing of data into the buffer memory (cache) before the processor requires the data. When the processor wants to process the data, it is readily available and can be processed within a very short period of time. If the data were not stored in the cache, the processor would have to download directly from the memory address - accordingly, there would be delays. Prefetching aims to reduce access speed and is used at several levels of a system, for example, even with DDR, main memories or boot operations of operating systems. Which data will be loaded into the cache is often determined using a branch prediction algorithm.

General information on the topic

Processors run a program by retrieving data from memory and executing the instructions contained in the data. The operating speed of a processor (measured in clock cycles) is usually faster than the main memory. A clock cycle is usually made up of four phases:

Loading the data
Decoding the data
Executing the data
Writing out the results

Very often more data needs to be obtained so that the program can continue. At this point prefetching mechanisms come into play to reduce the number of clock cycles. The processor operates sequentially which means it processes one command at a time. If it is busy with an instruction, it cannot load data for the next command. If the processor would have to retrieve the data for the next command first, it could not carry out any further instructions and the CPU would be idle.

In order for the processor to be able to use its resources as effectively as possible, information which will most likely be needed next by the processor is loaded into the cache. Thus, the CPU has the data required for executing the next instruction beforehand. The general rule is:

Prefetching mechanisms can retrieve both data and instructions.
The number of clock cycles can be reduced by up to 30% with prefetching.
Prefetching can be utilized in the areas of hardware, software, and compilers.^[1]

Functionality

The calculation of which data or instructions are needed next occurs in hardware prefetching often via algorithms. Modern computer architectures use pipelines for parallel processing of tasks. If one pipeline is busy with an instruction, an additional pipeline can simultaneously retrieve data and instructions.

For example, many programs work with loops in the source code. These loops contain conditions that the processor has to verify first. The branch predictor acts according to the empirical information that loops always contain conditions and preloads the branch instruction and/or the branch target address to prefetch the data stored there in a parallel pipeline. An example:

 for (int i = 0; i <prerequisite; i ++) {
// Precondition is a variable that was first initialized.
    tueA();
}

tueB();

When the processor checks the prerequisite, it is faced with the question of whether it should keep the instructions for “tueA” in the cache or load the instructions for “tueB” into the cache. It uses a heuristic to predict which data will be needed in the intermediate storage. The reliability of the forecast significantly depends on which heuristic has been implemented. Mostly this is done using a special prefetcher module (prefetch input Que; PIQ). The identification of likely needed instructions or data is also known as branch prediction. It is a key factor for improving performance, because the prediction of instructions and memory addresses can also be wrong, even though with modern branch prediction algorithms it adds up to only about 2%.

Practical relevance

If prefetching is connected to branch prediction, the processor attempts to predict which instructions and data will be needed in the future. The processor anticipates the result of the calculation at runtime on the computer and retrieves the data or instructions which the algorithm deems relevant for the subsequent calculation. This may be typical command sequences or program procedures. In boot operations it is known which programs are necessary for the runtime of the system. The relevant instructions or data is loaded into the cache and accessed as needed by the processor to speed up the boot process. The computer starts up faster because the system programs are executed more quickly based on the available data.

However, there are various types of branch predictions. The simplest type works quite statically by predicting the result of an instruction. Since usually more conditions will be associated with the result, the processor can estimate what conditions should be checked next. More complex branch prediction algorithms work dynamically; and partly with a branch history and pattern recognition. Already running command lines or programs are stored in binary form and when retrieved, the processor knows the commands or data that these commands or programs refer to. In the areas of data mining and machine learning, methods are now being used that are based on neural networks and artificial intelligence.

Importance for programming

Chipmakers are constantly working on ways to optimize performance. Hardware prefetching is one of them. Programmers can take advantage of prefetching by adding “prefetching hints” manually in the program sequence. It is necessary for them to know what, when, where and how has to be prefetched. Because prefetching takes up a certain amount of memory resources it is important for the program flow that the correct data is retrieved in the right place.

The principle of prefetching is also applied to websites. Link prefetching can prefetch hyperlinks when it is likely that these links will be clicked by a user. Complicated algorithms do not get used in this context, web analytics statistics on user behavior or surfing history are usually utilized. Based on this data it is then decided by the webmaster how useful it would be to prefetch websites or other resources such as CSS files and media content.

References

↑ Computer Architecture Lecture 24: Prefetching ece.cmu.edu. Accessed on 10/26/2015

Weblinks

[1] Computer Architecture Lecture 24: Prefetching ece.cmu.edu. Accessed on 10/26/2015

[1]