Computer Organization and Architecture | Pipelining | Set 1 (Execution, Stages and Throughput)
Last Updated: 13 Sep, 2024
Pipelining is a technique used in modern processors to improve performance by executing multiple instructions simultaneously. It breaks down the execution of instructions into several stages, where each stage completes a part of the instruction. These stages can overlap, allowing the processor to work on different instructions at various stages of completion, similar to an assembly line in manufacturing.
This article gives a detailed overview of pipelining in Computer Organization and Architecture.
What is Pipelining?
Pipelining is an arrangement of the CPU’s hardware that raises overall performance. In a pipelined processor, instruction execution is divided into steps called ‘stages’ that operate in parallel, so more than one instruction is in progress at a time. A real-life example helps illustrate the concept. Consider a water bottle packaging plant in which each bottle passes through 3 processes: Inserting the bottle (I), Filling water in the bottle (F), and Sealing the bottle (S).
It is helpful to label these as stage 1, stage 2, and stage 3, and to let each stage take 1 minute. In non-pipelined operation, a bottle is first inserted into the plant and after 1 minute moves to stage 2, where it is filled; meanwhile, stage 1 does nothing. Likewise, while the bottle is in stage 3, both stage 1 and stage 2 are idle, and a new bottle enters only after the previous one has left stage 3. In pipelined operation, when the first bottle moves to stage 2, another bottle can be loaded into stage 1. In the same way, when a bottle is in stage 3, there can be one bottle each in stages 1 and 2. So once the first bottle has left stage 3, a finished bottle comes out every minute.
For 3 bottles, the average time taken to manufacture each bottle is:
Without pipelining = 9/3 minutes = 3 minutes
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)
With pipelining = 5/3 minutes ≈ 1.67 minutes
I F S | |
| I F S |
| | I F S (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
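The arithmetic of the bottle example can be checked with a short sketch. The function name `total_minutes` and its parameters are illustrative assumptions, not part of the original example; the model assumes every stage takes the same time and nothing ever waits.

```python
def total_minutes(bottles: int, stages: int,
                  minutes_per_stage: int = 1,
                  pipelined: bool = False) -> int:
    """Total time to finish `bottles` bottles in a plant with `stages` stages."""
    if bottles <= 0:
        return 0
    if pipelined:
        # The first bottle needs `stages` steps; each later bottle
        # finishes one step after the bottle in front of it.
        return (stages + bottles - 1) * minutes_per_stage
    # Without pipelining, a bottle enters only after the previous one leaves.
    return bottles * stages * minutes_per_stage


bottles, stages = 3, 3
serial = total_minutes(bottles, stages)
overlapped = total_minutes(bottles, stages, pipelined=True)
print(f"Without pipelining: {serial} min total, {serial / bottles:.2f} min per bottle")
print(f"With pipelining:    {overlapped} min total, {overlapped / bottles:.2f} min per bottle")
```

With 3 bottles and 3 stages this reproduces the 9-minute and 5-minute totals from the diagrams above.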
Design of a basic Pipeline
- In a pipelined processor, a pipeline has two ends, the input end and the output end. Between these ends, there are multiple stages/segments such that the output of one stage is connected to the input of the next stage and each stage performs a specific operation.
- Interface registers are used to hold the intermediate output between two stages. These interface registers are also called latches or buffers.
- All the stages in the pipeline along with the interface registers are controlled by a common clock.
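As a rough illustration of this design, the sketch below models each stage's contents as one slot in a list and treats every loop iteration as one tick of the common clock, on which each slot hands its contents to the next stage. The function name `run_pipeline` and the list-based model are assumptions made for illustration; real interface registers hold intermediate data, not instruction labels.

```python
from typing import List, Optional


def run_pipeline(instructions: List[str],
                 num_stages: int) -> List[List[Optional[str]]]:
    """Return one snapshot of stage occupancy per clock cycle."""
    stages: List[Optional[str]] = [None] * num_stages
    pending = list(instructions)
    history: List[List[Optional[str]]] = []
    while pending or any(slot is not None for slot in stages):
        # Common clock edge: the input end accepts the next instruction,
        # and every stage passes its contents on to the following stage.
        incoming = pending.pop(0) if pending else None
        stages = [incoming] + stages[:-1]
        if all(slot is None for slot in stages):
            break  # the last instruction has left the output end
        history.append(list(stages))
    return history


for cycle, snapshot in enumerate(run_pipeline(["I1", "I2"], 4), start=1):
    print(cycle, snapshot)
```

Each printed row lists what stages S1 to S4 hold during that cycle, with `None` marking an idle stage.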
Execution in a pipelined processor
The execution sequence of instructions in a pipelined processor can be visualized using a space-time diagram. For example, consider a processor with 4 stages and 2 instructions to be executed. The execution sequence can be visualized through the following space-time diagrams:
Non-Overlapped Execution
Stage / Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  |
S1            | I1 |    |    |    | I2 |    |    |    |
S2            |    | I1 |    |    |    | I2 |    |    |
S3            |    |    | I1 |    |    |    | I2 |    |
S4            |    |    |    | I1 |    |    |    | I2 |
Total time = 8 cycles
Overlapped Execution
Stage / Cycle | 1  | 2  | 3  | 4  | 5  |
S1            | I1 | I2 |    |    |    |
S2            |    | I1 | I2 |    |    |
S3            |    |    | I1 | I2 |    |
S4            |    |    |    | I1 | I2 |
Total time = 5 cycles
Pipeline Stages
A classic RISC processor uses a 5-stage instruction pipeline to execute the instructions in its instruction set. The 5 stages of the RISC pipeline and their respective operations are:
- Stage 1 (Instruction Fetch): In this stage, the CPU fetches the instruction from the memory address held in the program counter.
- Stage 2 (Instruction Decode): In this stage, the instruction is decoded and the register file is accessed to obtain the values of the registers used in the instruction.
- Stage 3 (Instruction Execute): In this stage, the operation specified by the instruction is carried out, for example an ALU operation.
- Stage 4 (Memory Access): In this stage, memory operands are read from or written to the memory address specified by the instruction.
- Stage 5 (Write Back): In this stage, the computed or fetched value is written back to the destination register specified by the instruction.
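The five stages can be laid out in a space-time diagram like the 4-stage ones shown earlier. The sketch below assumes an ideal pipeline with no hazards or stalls; the stage abbreviations (IF, ID, EX, MEM, WB) and the function names `space_time_rows` and `render` are illustrative choices rather than anything defined in the text.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def space_time_rows(num_instructions, stages=STAGES):
    """Return (stage_name, cells) pairs; cells[c] is the instruction in that stage at cycle c+1."""
    k = len(stages)
    total_cycles = k + num_instructions - 1 if num_instructions > 0 else 0
    rows = []
    for s, name in enumerate(stages):
        cells = []
        for cycle in range(total_cycles):
            i = cycle - s  # index of the instruction occupying stage s in this cycle
            cells.append(f"I{i + 1}" if 0 <= i < num_instructions else "")
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the rows as a plain-text table with one column per cycle."""
    if not rows:
        return ""
    n_cycles = len(rows[0][1])
    lines = ["Stage | " + " | ".join(f"{c:>2}" for c in range(1, n_cycles + 1))]
    for name, cells in rows:
        lines.append(f"{name:<5} | " + " | ".join(f"{cell:>2}" for cell in cells))
    return "\n".join(lines)


print(render(space_time_rows(3)))
```

For 3 instructions the table spans 5 + 3 - 1 = 7 cycles, and from cycle 5 onward one instruction reaches Write Back in every cycle.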
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time ‘Tp’, and let there be ‘n’ tasks (instructions) to complete. The first instruction takes ‘k’ cycles to come out of the pipeline, but each of the other ‘n – 1’ instructions completes just 1 cycle after its predecessor, adding ‘n – 1’ cycles in total. So, the time taken to execute ‘n’ instructions in a pipelined processor is:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’ tasks are executed on the same processor is:
S = Performance of non-pipelined processor /
Performance of pipelined processor
As the performance of a processor is inversely proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k, the term k – 1 in the denominator becomes negligible:
S ≈ n * k / n
S ≈ k
where ‘k’ is the number of stages in the pipeline.
Also, Efficiency = Given speedup / Max speedup = S / Smax
We know that Smax = k
So, Efficiency = S / k
Throughput = Number of instructions / Total time to complete the instructions
So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
Please see Set 2 for Dependencies and Data Hazard and Set 3 for Types of pipeline and Stalling.
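The formulas in this section can be evaluated directly. The following is a minimal sketch under the same ideal assumptions (no stalls, equal stage delays); the function name `pipeline_metrics` and the sample values n = 100, k = 5, Tp = 1 are hypothetical.

```python
def pipeline_metrics(n: int, k: int, tp: float) -> dict:
    """Evaluate execution time, speedup, efficiency and throughput for an ideal pipeline."""
    if n <= 0 or k <= 0 or tp <= 0:
        raise ValueError("n, k and tp must all be positive")
    et_pipeline = (k + n - 1) * tp       # ETpipeline = (k + n - 1) * Tp
    et_non_pipeline = n * k * tp         # ETnon-pipeline = n * k * Tp
    speedup = et_non_pipeline / et_pipeline
    return {
        "et_pipeline": et_pipeline,
        "et_non_pipeline": et_non_pipeline,
        "speedup": speedup,
        "efficiency": speedup / k,       # S / Smax, with Smax = k
        "throughput": n / et_pipeline,   # instructions per unit of Tp
    }


# Hypothetical workload: 100 instructions, 5 stages, cycle time of 1 time unit.
for name, value in pipeline_metrics(n=100, k=5, tp=1.0).items():
    print(f"{name}: {value:.4f}")
```

Raising n while keeping k fixed moves the speedup toward k and the efficiency toward 1, which matches the n >> k approximation.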
The performance of a pipeline is measured using two main metrics: throughput and latency.
What is Throughput?
- It measures the number of instructions completed per unit time.
- It represents the overall processing speed of the pipeline.
- Higher throughput indicates a faster pipeline.
- It is calculated as: Throughput = Number of instructions executed / Execution time.
- It can be affected by pipeline length, clock frequency, efficiency of instruction execution, and the presence of pipeline hazards or stalls.
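What is Latency?
- It measures the time taken for a single instruction to complete its execution.
- It represents the delay an instruction experiences as it passes through the pipeline stages.
- Lower latency indicates better performance.
- For a ‘k’ stage pipeline with cycle time ‘Tp’ and no stalls, it is calculated as: Latency = k * Tp. (Execution time / Number of instructions executed gives the average time per instruction, which is the reciprocal of throughput rather than the latency of one instruction.)
- It is influenced by pipeline depth (number of stages), clock cycle time, instruction dependencies, and pipeline hazards.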
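The difference between the two metrics shows up clearly in numbers. The sketch below takes latency as the time one instruction needs to pass through all k stages (k * Tp in an ideal pipeline, which is an assumption of this sketch) and throughput as n / [(k + n – 1) * Tp]; the function names and the values k = 5, Tp = 2.0 are hypothetical.

```python
def latency(k: int, tp: float) -> float:
    """Time for one instruction to traverse all k stages of an ideal pipeline."""
    return k * tp


def throughput(n: int, k: int, tp: float) -> float:
    """Instructions completed per unit time when n instructions run back to back."""
    return n / ((k + n - 1) * tp)


k, tp = 5, 2.0  # hypothetical: 5 stages, 2.0 time units per cycle
for n in (1, 10, 1000):
    print(f"n={n:>4}  latency={latency(k, tp):.1f}  throughput={throughput(n, k, tp):.4f}")
```

Latency stays the same however many instructions are issued, while throughput climbs toward 1 / Tp as the pipeline stays full for longer.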
Advantages of Pipelining
- Increased Throughput: Pipelining raises the throughput of a CPU by allowing several instructions to be processed at the same time, each in a different stage. More instructions complete in a given period, which improves the efficiency of the processor.
- Improved CPU Utilization: By overlapping instructions, pipelining keeps the different sections of the CPU busy. The pipeline segments spend less time idle, so hardware resources are used more fully.
- Higher Instruction Throughput: While one instruction is in the execute stage, other instructions can be in the fetch, decode, memory access, and write-back stages. Because of this concurrent processing, the CPU completes more instructions in a given time frame than a non-pipelined processor.
- Better Performance for Repeated Tasks: Pipelining is particularly effective when the workload consists of long runs of similar instructions, because the pipeline stays full and the average time per task falls.
- Scalability: Pipelining is implemented in many types of processors, so it scales from simple CPUs to advanced multi-core processors.
Disadvantages of Pipelining
- Pipeline Hazards: Pipelining can give rise to data hazards, where an instruction depends on the result of another instruction; control hazards, which arise from branch instructions; and structural hazards, where hardware resources are insufficient. These hazards can cause delays, so dedicated strategies are needed to manage them and keep the pipeline moving.
- Increased Complexity: Pipelining makes processor design and implementation more complex than non-pipelined structures. Managing the pipeline stages, handling hazards, and preserving the correct instruction sequence all add to the design and control effort.
- Stall Cycles: When hazards occur, pipeline stalls (bubbles) can be introduced, leaving some stages idle. These stalls give back some of the cycles gained by pipelining and reduce its efficiency.
- Instruction Latency: While pipelining increases instruction throughput, it does not necessarily reduce the latency of each instruction. Every instruction must still pass through all the pipeline stages, so the time for a single instruction does not decrease and may even increase slightly because of pipeline register overhead.
- Hardware Overhead: Pipelining needs extra hardware, namely the pipeline registers and the control logic that manages the stages and the data moving between them. This increases design complexity and hardware cost.
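To see how stall cycles eat into the gain, the ideal cycle count k + n – 1 can be extended with a count of extra stall cycles. This is a simplified model assumed here for illustration (every stall is taken to delay the whole pipeline by one cycle); the function name `speedup_with_stalls` and the stall counts are hypothetical.

```python
def speedup_with_stalls(n: int, k: int, stall_cycles: int = 0) -> float:
    """Speedup over a non-pipelined processor when stalls add extra cycles."""
    if n <= 0 or k <= 0 or stall_cycles < 0:
        raise ValueError("n and k must be positive, stall_cycles non-negative")
    pipelined_cycles = k + n - 1 + stall_cycles  # ideal count plus bubbles
    non_pipelined_cycles = n * k
    # The cycle time Tp appears in both execution times and cancels out.
    return non_pipelined_cycles / pipelined_cycles


# Hypothetical workload: 100 instructions on a 5-stage pipeline.
for stalls in (0, 10, 50):
    print(f"stall cycles = {stalls:>2}: speedup = {speedup_with_stalls(100, 5, stalls):.2f}")
```

With no stalls the result matches S = n * k / (k + n – 1); every added stall cycle lowers the speedup further below k.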
Conclusion
Pipelining is one of the most essential concepts in processor design: it lets the CPU work on several instructions at the same time, each in a different stage. By keeping the hardware busy, it substantially increases the system’s throughput and overall efficiency. Pipelining on its own raises processing speed, but careful handling of pipeline hazards is critical to realizing that gain. Architects building systems for high-performance computing (HPC) therefore need a solid set of pipelining strategies to draw on.
Frequently Asked Questions on Pipelining (Execution, Stages and Throughput)
What are the benefits of Pipelining?
Pipelining streamlines instruction processing by overlapping the execution of several instructions, which increases the number of instructions the CPU completes in a given time.
What are pipeline hazards?
Pipeline hazards are data, control, and structural conflicts that disrupt the normal flow of instruction execution and can cause the pipeline to stall.
How does pipelining affect latency and throughput?
Pipelining increases the number of instructions completed per unit time because execution is split into stages that work on different instructions at once. It does not reduce latency, however, since every instruction still has to pass through all the stages.
What is the difference between throughput and latency?
Throughput is the number of instructions completed in a given time interval, while latency is the time it takes to complete a single instruction.