Literature Review On History Of Risc Architecture
Published: 2020/12/25
INTRODUCTION
This write-up details implementation techniques that CPU manufacturers have adopted to improve the speed of information processing, especially pipelining and the Reduced Instruction Set Computer (RISC) architecture. Most high-performance computers use these techniques in some form. In the RISC architecture, fewer machine language instructions are used, and all instruction formats are kept the same length in order to simplify the fetching and decoding of instructions.
The history of the RISC architecture can be traced to the need for more efficient computer systems as far back as 1964. The conception of computer architecture in 1964 led to three main branches that evolved over the years: Complex Instruction Set Computing (CISC), RISC, and the IBM 360/370 line, as illustrated in Figure 1.
Figure 1: Main branches in development of computer architecture
The CISC branch evolved from the PDP-11 and VAX-11 machine architectures developed by Digital Equipment Corporation (DEC). The branch in the middle of the diagram is the IBM 360/370 line of computers, which combines CISC and RISC features. The RISC branch evolved from the Control Data Corporation (CDC) 6600, the Cyber series and ultimately the CRAY-1 supercomputer.
The pioneering work at IBM's Thomas J. Watson Research Center, starting in 1975 with the development of an emulator for System/360 code, resulted in the first prototype in 1980, and the superscalar RISC System/6000 (RS/6000) was introduced into the market in 1990. It succeeded in lowering the cost of high-performance computation for scientists and engineers.
The idea of a simpler computer that could be implemented on a single chip emerged in the early 1980s and gave birth to two architectures: Sun Microsystems' SPARC and Stanford University's MIPS.
RISC MACHINES CHARACTERISTICS
The motivation for the development of the RISC architecture was the need to obtain better performance than Complex Instruction Set Computer (CISC) processors offered. Aside from having a simple design, a load/store approach and fixed-length instructions, RISC processors were designed to exploit pipelining in order to optimize performance. Pipelining is a technique that maximizes processor efficiency by breaking the work to be performed into smaller subtasks, which are then executed in an overlapping manner. Before completing the execution of one instruction, the processor fetches the next, resulting in a higher number of instructions executed within a given number of clock cycles than when pipelining is not implemented. The concept is analogous to an assembly line in manufacturing. The pipelining technique benefits a great deal from the Harvard architecture present in most RISC processors, which helps ensure that instructions complete at a rate of one per clock cycle. Most microprocessors that implement a Harvard architecture do so with smaller on-chip memory arrays that store segments of program and data, fetched from and written back to a unified memory structure external to the microprocessor chip [5]. Disruptions to the pipeline are minimized using delayed control-transfer instructions.
Figure 2: Typical five stage RISC pipeline [2]
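The cycle counts behind the pipelining argument above can be sketched with a toy model. This is an illustrative calculation only: it assumes an ideal pipeline with one instruction entering per cycle and no stalls, branches or memory delays.

```python
# Illustrative model of pipelined vs. non-pipelined execution.
# Assumes an ideal pipeline: no stalls, branches, or memory delays.

def cycles_without_pipelining(n_instructions, n_stages):
    # Each instruction occupies the processor for all stages in turn.
    return n_instructions * n_stages

def cycles_with_pipelining(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return n_stages + (n_instructions - 1)

if __name__ == "__main__":
    n, k = 100, 5   # 100 instructions on a 5-stage pipeline
                    # (fetch, decode, execute, memory, write-back)
    plain = cycles_without_pipelining(n, k)   # 500 cycles
    piped = cycles_with_pipelining(n, k)      # 104 cycles
    print(f"speedup = {plain / piped:.2f}")   # approaches 5 as n grows
```

The longer the instruction stream, the closer the speedup gets to the number of stages, which is why sustained throughput of one instruction per cycle is the headline benefit.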
The characteristics of the RISC architecture with respect to CISC are based on a concept; as such, it is not expected that every processor based on the RISC architecture will possess all of the features, and these characteristics instead serve as a reference point for classifying and identifying processor architectures. "RISC is not a machine but a philosophy of machine design" [1]. With the design paradigm of making the hardware simpler and pushing complexity to the software (the compiler), the main distinguishing features of a RISC architecture are as follows. Instruction fetching is made simple using fixed-length instructions, and few instruction formats are utilized in order to simplify decoding before execution, enable efficient pipelining and simplify the hardware control unit [2]. Instruction decode logic for a typical RISC microprocessor can be much simpler than for a CISC counterpart, because there are fewer instructions to decode and fewer operand complexities to recognize and coordinate [5]. To optimize the processor for speed, aside from the pipelined implementation, RISC uses a hardwired control unit, which is simpler than a microprogrammed control unit and takes less chip space. The chip space freed by the hardwired control unit makes room for a large register set, which the compiler uses to optimize code for the load/store architecture, and for an on-chip cache that speeds up instruction fetching and minimizes the latency incurred in load/store operations.
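A small sketch can show why fixed-length instructions with few formats make decode logic simple: every field sits at a known bit position, so decoding is just masks and shifts. The field layout below follows the classic MIPS R-type format; the example encoding is `add $8, $9, $10`.

```python
# Decoding a fixed-length 32-bit instruction with simple bit masks.
# Field layout follows the classic MIPS R-type format:
#   opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]

def decode_r_type(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,   # first source register
        "rt":     (word >> 16) & 0x1F,   # second source register
        "rd":     (word >> 11) & 0x1F,   # destination register
        "shamt":  (word >> 6)  & 0x1F,   # shift amount
        "funct":  word         & 0x3F,   # ALU operation selector
    }

# add $8, $9, $10 encodes as 0x012A4020 in MIPS
fields = decode_r_type(0x012A4020)
print(fields)  # rd=8, rs=9, rt=10, funct=0x20 (add)
```

Because no field position ever depends on another field's value, all fields can be extracted in parallel in hardware, in a single stage of the pipeline.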
In the load/store approach of the RISC architecture, the internal operations are actually register-to-register operations, since all operands reside in the general-purpose registers (GPRs). Restricting the locations of the operands to the GPRs allows for determinism in the RISC operation [2], because a potentially multi-cycle and unpredictable memory reference has been divorced from the operation. Figure 3 below shows the pipeline flow of a register-to-register operation. Memory addressing is performed only by load and store instructions.
Figure 3: Pipeline flow of a register-to-register operation [2].
Furthermore, since complex addressing modes slow the computer down, only a few simple addressing modes are employed in the RISC architecture. The complexity is taken away from the hardware, where it would affect every program run on the machine, and placed in the compiler. To make code generation easier for the compiler, the RISC architecture uses "three operand instructions which are designed to make the things easier for compiler" [1].
The RISC architecture also exploits the principles of temporal and spatial locality to speed up execution. Spatial locality refers to the expectation that the next memory location to be referenced will be close to the last referenced location, while temporal locality refers to the expectation that a recently referenced memory location will soon be referenced again. To exploit this, recently referenced memory locations are kept in cache memory for faster access. The principle of locality is based on the observation that a program spends 90% of its time in 10% of its code [2]. Since only a small set of instructions is utilized, it was easy to determine the most efficient pipeline organization, especially using three classes of instructions: cache access, arithmetic/logical operations and branch operations. It is known that not all instructions in a CISC microprocessor are used with the same frequency; rather, there is a core set of instructions that are called most of the time [5]. Those that are used less often increase the permutations that the decode logic must handle in any given clock cycle. By removing the infrequently used operations, the microprocessor's control logic is simplified and can therefore be made to run faster.
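The effect of locality on a cache can be sketched with a toy simulation. The geometry here (16-byte lines, 64 lines, direct-mapped) is an illustrative assumption, not any particular processor's cache.

```python
# Toy direct-mapped cache illustrating the principle of locality.
# Geometry (16-byte lines, 64 lines = 1 KiB) is an illustrative assumption.

LINE_SIZE, NUM_LINES = 16, 64

def simulate(addresses):
    """Return the hit rate for a sequence of byte addresses."""
    lines = [None] * NUM_LINES               # one tag per cache line
    hits = 0
    for addr in addresses:
        tag, index = divmod(addr // LINE_SIZE, NUM_LINES)
        if lines[index] == tag:
            hits += 1                        # line already cached
        else:
            lines[index] = tag               # fill the line on a miss
    return hits / len(addresses)

# Spatial locality: a sequential byte scan misses once per 16-byte line.
print(simulate(list(range(1024))))           # 0.9375 (15 hits per 16 bytes)

# Temporal locality: re-reading a small buffer hits after the first pass.
print(simulate(list(range(64)) * 10))        # 0.99375
```

Both access patterns achieve high hit rates for the same underlying reason the 90/10 observation holds: real programs revisit a small working set of addresses.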
IMPROVING PIPELINED CPU
The processing performance of the hardware is no doubt improved by the pipelining technique. Further gains in pipeline performance can be achieved using the techniques presented below.
Superpipelined architecture
Pipelining performance is limited by the number of stages the work is divided into. "A three-stage pipeline can at best yield a speedup approaching a factor of 3; a five-stage pipeline can only approach a 5:1 speed ratio" [1]. To improve on this limitation, the pipeline must be divided into a larger number of smaller stages.
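The speedup limit quoted above can be stated compactly. Assuming an ideal k-stage pipeline with no stalls processing n instructions:

```latex
% Non-pipelined time: nk cycles; pipelined time: k + (n - 1) cycles.
S(n, k) = \frac{nk}{k + n - 1}, \qquad \lim_{n \to \infty} S(n, k) = k
```

The speedup approaches k only asymptotically, which is why a three-stage pipeline "approaches" rather than reaches a factor of 3.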
Superpipelining entails the use of a very high-speed pipeline for instruction processing, achieved by increasing the pipeline's temporal parallelism, i.e. its number of stages. A drawback of superpipelining is that "simulation studies have suggested that a pipeline depth of more than 8 stages tends to be counter-productive" [4]. The MIPS R4000, which has an eight-stage pipeline, was one of the first superpipelined processors and was introduced into the market in 1991. The R4000 was a typical superpipelined processor in that it exhibited both the advantages and the disadvantages of the architecture: a very high clock frequency and a small chip area on the one hand, and the increased branch delays of deeply pipelined instruction execution on the other.
Superscalar architectures
The features of superscalar machines include the use of a sequential programming model, multiple pipelines and the execution of multiple instructions per cycle using a conventional (von Neumann) instruction set. Whereas superpipelined CPUs increase temporal parallelism, superscalar CPUs increase the spatial parallelism that exists in sequential code, gaining higher instruction throughput from the use of multiple pipelines. Since a superscalar CPU overcomes the limit of one instruction per cycle by issuing to multiple pipelines, superscalar processors are classified by the number of instructions executed at the same time. In order to execute multiple instructions per cycle, however, the CPU needs to check for instructions whose execution depends on another instruction; this makes the design more complex and increases the overall chip size. Examples of superscalar CPUs are DEC's two-way (two-issue) Alpha 21064 and four-way (four-issue) Alpha 21164. Other examples of four-way superscalar processors are the MIPS R10000, R12000, R14000 and R16000. One major disadvantage of superscalar design, however, is the need to schedule which operations can be executed concurrently and which must be executed sequentially.
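The dependence check mentioned above can be sketched in a few lines. The three-field instruction representation (destination, two sources) is an assumption for illustration; a real issue unit performs this comparison in hardware, every cycle.

```python
# Sketch of the dependence check a superscalar issue unit must perform.
# Two instructions can issue together only if no register hazard exists.
# Instruction format (dest, src1, src2) is an illustrative assumption.

def can_dual_issue(i1, i2):
    d1, s1a, s1b = i1
    d2, s2a, s2b = i2
    raw = d1 in (s2a, s2b)   # read-after-write: i2 reads what i1 writes
    waw = d1 == d2           # write-after-write: both write one register
    war = d2 in (s1a, s1b)   # write-after-read: i2 overwrites i1's source
    return not (raw or waw or war)

add = ("r3", "r1", "r2")     # r3 = r1 + r2
mul = ("r5", "r3", "r4")     # r5 = r3 * r4  (needs the result in r3)
sub = ("r6", "r1", "r4")     # r6 = r1 - r4  (independent)

print(can_dual_issue(add, mul))  # False: RAW hazard on r3
print(can_dual_issue(add, sub))  # True: can issue in the same cycle
```

The cost of this logic grows with issue width, since every pair of candidate instructions must be compared, which is the chip-size penalty the text refers to.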
Very long instruction word (VLIW) architectures
The scheduling problem encountered in superscalar processors stems from having to determine the amount of instruction-level parallelism on the fly, within the limitations of the hardware. Obtaining the superscalar benefit of executing multiple instructions per cycle therefore requires complex on-chip circuitry. VLIW was designed to remove this requirement: it executes multiple instructions per cycle without complex circuitry to analyze the instruction stream for parallelism, because that analysis is performed ahead of time by the compiler.
Like RISC, VLIW uses a fixed-length machine language instruction format, but one much longer than the 64-bit maximum found in RISC or CISC architectures. Each instruction word is so long because it carries more than one machine operation to be executed simultaneously.
Figure 4: An example of a VLIW format [1]
Advantages of the very long instruction word format include the elimination of the scheduling task from the hardware, which improves hardware efficiency and frees chip space for more functional units and on-chip cache. The simplified control logic also translates into a higher clock speed than a counterpart superscalar CPU based on the same technology.
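The compile-time scheduling that VLIW relies on can be sketched as a naive packer that groups independent operations into long words. The three-slot word width and the (dest, src1, src2) operation format are illustrative assumptions; real VLIW compilers use far more sophisticated scheduling.

```python
# Sketch of compile-time VLIW scheduling: greedily pack independent
# operations, in program order, into long words of SLOTS operations each.
# Operation format (dest, src1, src2) is an illustrative assumption.

SLOTS = 3

def conflicts(op, word):
    # op cannot join the word if it reads or writes a register written
    # by an op already in the word, or writes one of their sources.
    for d, s1, s2 in word:
        if d in (op[1], op[2]) or op[0] in (d, s1, s2):
            return True
    return False

def pack(ops):
    words, current = [], []
    for op in ops:
        if len(current) == SLOTS or conflicts(op, current):
            words.append(current)            # emit the finished word
            current = []
        current.append(op)
    if current:
        words.append(current)
    return words

program = [
    ("r1", "r8", "r9"),
    ("r2", "r8", "r10"),   # independent: shares a word with the first op
    ("r3", "r1", "r2"),    # depends on both: starts a new word
]
print(len(pack(program)))  # 2 long words
```

All hazard analysis happens here, before the program ever runs, which is exactly the work a superscalar processor would otherwise repeat in hardware on every cycle.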
CONCLUSION
This write-up has presented in detail the characteristics of the Reduced Instruction Set Computer (RISC) architecture, which was developed to address the shortcomings of the Complex Instruction Set Computer (CISC) architecture and improve the overall performance of the CPU.
The implementation of RISC exploits several features to achieve an overall increase in processing speed and efficiency. The pipelining technique, in which new instructions are fetched while earlier ones are still executing, allows one instruction to complete per clock cycle. The RISC architecture also uses a fixed-length instruction format and few instruction formats for ease of decoding, simple addressing modes and a load/store architecture for ease of memory reference, and a hardwired control unit for simplicity of hardware.
Techniques to improve pipelining use in RISC computers were also considered. Techniques such as superpipelining, superscalar architectures and Very Long Instruction Word (VLIW) architectures are employed to improve pipelining.
RISC is an improvement over CISC; newer processors that aim to combine the strong points of CISC and RISC are already being designed and will improve on the current performance of RISC computers.
REFERENCES
[1] Joseph D. Dumas (2006). Computer Architecture: Fundamentals and Principles of Computer Design, CRC Press, Taylor & Francis Group.
[2] Vojin G. Oklobdzija (1999). Reduced Instruction Set Computers [PDF document]. Retrieved from http://www.ece.ucdavis.edu/~vojin/CLASSES/EEC180B/Fall99/Writings/RISC-Chaptr.PDF
[3] Mark Brehob, Travis Doom, Richard Enbody, William H. Moore, Sherry Q. Moore, Ron Sass, Charles Severance (n.d.). Beyond RISC - The Post-RISC Architecture. Retrieved from http://www.cse.msu.edu/~enbody/postrisc/postrisc2.htm
[4] John Morris (1998). Computer Architecture - The Anatomy of Modern Processors. Retrieved from https://www.cs.auckland.ac.nz/~jmor159/363/html/superpipelined.html
[5] Balch, M. (2003). Complete Digital Design: A Comprehensive Guide to Digital Electronics and Computer System Architecture. The McGraw-Hill Companies, Inc. ISBN: 0-07-140927-0.