Types of Computer Systems: Flynn's Taxonomy (Literature Review Example)
1.0 Introduction
A computer with a single CPU is often not sufficient to satisfy the demand for higher performance. Running several CPUs concurrently on an application's tasks can provide an improvement in performance over single-CPU operation. This write-up examines various arrangements and architectures of processors that achieve better performance, using special architectures on the one hand and multiple processors on the other.
The different processor architectures are based on the taxonomy published in an article by Flynn in 1966. Flynn's taxonomy classifies computer systems into four categories: Single Instruction Single Data stream (SISD) machines, Single Instruction Multiple Data stream (SIMD) machines, Multiple Instruction Single Data stream (MISD) machines and Multiple Instruction Multiple Data stream (MIMD) machines.
SISD machines, built around the von Neumann or Harvard architecture, process a single instruction on a single operand at a time and are also known as uniprocessor machines. SIMD machines, on the other hand, execute a sequential program whose instructions operate in parallel on more than one operand at once. MISD machines are expected to perform different operations on the same operands. MIMD machines execute different instructions on different operands simultaneously.
The different architectures discussed in this write-up each fall into one of these taxonomic classifications as published by Flynn.
1.1 Vector Processors
As the name implies, vector processors are processors that operate on an entire vector when executing an instruction. In the execution of an instruction, the opcode operates on vectors as operands instead of single-element operands. The instruction set of the CPU is designed to perform mathematical operations on multiple data elements simultaneously. This arrangement differs from a scalar processor, which handles a single element at a time using multiple instructions. According to Flynn's taxonomy, a vector processor is a single instruction multiple data stream machine.
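As a rough software-level sketch (an illustration assumed here, not taken from the source), the C loop below expresses an element-wise vector add; a scalar processor executes it one element at a time, while a vector processor can map the whole loop onto a handful of vector instructions, each operating on an entire vector register:

#include <stddef.h>

/* Element-wise add of two vectors, c = a + b.
   On a scalar processor this loop issues one load/load/add/store group
   per element (roughly 4*n instructions). A vector processor can map
   the same loop onto roughly four vector instructions (two loads, one
   add, one store), each covering a whole vector register at once. */
void vec_add(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}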
Figure 1: Architecture of a Vector processor
The architecture of a vector processor is shown in figure 1. The vector registers, which are first-in-first-out (FIFO) queues, hold the data and can each hold between 50 and 100 floating-point values. Memory references and computations on the vector processor are overlapped in order to achieve a multi-fold increase in throughput. The instruction set is such that a vector register is loaded from a location in memory, operations are performed on the elements held in the vector registers, and the results are then stored back into memory from the vector registers.
The vector instructions available to the processor depend on the functional units contained in the processor. For the architecture in figure 1, these include floating-point multiply, floating-point add/subtract, floating-point divide, and integer and logical units. The execution time of a vector instruction depends on the length of the operand vectors, data dependencies and structural hazards. Since a vector functional unit consumes one element per clock cycle, the execution time is approximately the length of the vector.
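As a rough worked illustration (the start-up figure is assumed for the example and is not given in the source): adding two 64-element vectors on an add unit with a start-up overhead of, say, 6 clock cycles takes roughly 6 + 64 = 70 cycles, close to the one-result-per-cycle rate, whereas for a short 8-element vector the same 6-cycle overhead is a much larger fraction of the 6 + 8 = 14 cycle total.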
Some terms used in describing a vector processor are initiation rate, convoy, chime and vector start-up time. The initiation rate is the rate at which operands are consumed and new results produced. A convoy is a set of vector instructions that could begin execution together in one clock period and must complete execution before a new convoy can begin. A chime is the unit of time taken to execute one convoy, so a vector sequence of n convoys executes in approximately n chimes. The vector start-up time is the overhead incurred before the first result of a vector instruction appears. A worked example using these terms is given after the list of vector instructions below.
The architecture in figure 1 is based on the Cray-1 supercomputer. Each vector register holds a 64-element vector with 64 bits per element, and the register file has 8 write ports and 16 read ports. The vector functional units are fully pipelined, with detection for data and structural hazards. In the vector load-store unit, words move between the vector registers and memory at a rate of one word per clock cycle. The scalar registers consist of 32 general-purpose registers and 32 floating-point registers.
Some vector instructions are shown below:
LV: vector load from address
ADDVV.D: add two vectors
ADDVS.D: add vector to a scalar
SV: vector store to address
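As a worked illustration of convoys and chimes (the register names and the multiply-vector-by-scalar mnemonic MULVS.D are assumed here in the style of the instructions above; they are not given in this write-up), consider computing Y = a*X + Y on 64-element vectors:
LV V1, Rx: load vector X from the address in Rx
MULVS.D V2, V1, F0: multiply each element of X by the scalar a held in F0
LV V3, Ry: load vector Y from the address in Ry
ADDVV.D V4, V2, V3: add the two vectors element by element
SV V4, Ry: store the result back to Y
With a single load-store unit, the two loads and the store cannot share a convoy, so the sequence falls into roughly three convoys, for example (LV, MULVS.D), (LV, ADDVV.D) and (SV). It therefore executes in about three chimes, or roughly 3 x 64 = 192 clock cycles for 64-element vectors, ignoring start-up overhead.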
1.2 Array Processors
A vector processor operates on vector operands, each of which is a single one-dimensional array. An array processor is a processor with the capability of processing the elements of an array. Array processors carry out a single instruction on multiple execution units in the same clock cycle, with each execution unit applying that same instruction to different data drawn from the array. Array processors are well suited to scientific calculations, especially ones involving two-dimensional matrices.
Figure 2: An example of Array processor machine
Array processors make use of many processing elements (PEs) to carry out instructions on multiple operands that are structured as multidimensional arrays. Each processing element operates on one element of the operand array, and the activities of the processing elements are coordinated by a single control unit.
Because a single control unit coordinates the actions of all the processing elements, they operate in a lockstep, synchronized manner, performing the same computation at the same time on different elements of the array operand.
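A minimal software sketch of this lockstep behaviour (the names and sizes are illustrative, not from the source): each simulated processing element holds one element of the operand array in its local memory, and a single control loop broadcasts the same operation to every element in the same step.

#include <stdio.h>

#define NUM_PE 8   /* number of processing elements (illustrative) */

/* Local memory of each simulated PE: one operand element and a result slot. */
typedef struct {
    double operand;
    double result;
} pe_t;

int main(void) {
    pe_t pe[NUM_PE];

    /* Distribute the operand array across the PEs' local memories. */
    for (int i = 0; i < NUM_PE; i++)
        pe[i].operand = (double)i;

    /* The control unit broadcasts one instruction ("multiply by 2.0");
       every PE applies it to its own element in the same lockstep step. */
    for (int i = 0; i < NUM_PE; i++)
        pe[i].result = pe[i].operand * 2.0;

    for (int i = 0; i < NUM_PE; i++)
        printf("PE %d: %.1f -> %.1f\n", i, pe[i].operand, pe[i].result);
    return 0;
}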
Array processors fall under the single instruction multiple data stream category in Flynn's taxonomic classification of computer systems. SIMD machines under Flynn's taxonomy fall into two classes based on the memory system used: shared memory or distributed memory computers. The shared memory architecture uses a single globally accessible memory, while in the distributed memory architecture each processing element has its own local memory unit.
Figure 3: The principle of a SIMD machine
The fundamental operation principle of a SIMD machine is as shown in figure 3. This arrangement is also known as a processor array. The processing element interconnection network is designed to be flexible in order to provide free choice of source and destination of the data read into the processing element and the data stored in memory from the processing element.
The processing elements can be arranged in more than one topology, depending on the requirements of the applications for which the architecture is designed. Some of these topologies are examined below.
Mesh connected arrays: this is the most common topology of the SIMD processor array, with the processing elements connected in a two-dimensional mesh. The first processor-array supercomputer to be implemented was the ILLIAC IV, designed by Daniel Slotnick at the University of Illinois. The original design called for 4 quadrants of 64 processing elements each, but only one quadrant was built. The topology of the ILLIAC IV, shown in figure 4, arranges the processing elements in a two-dimensional mesh that is effectively a one-dimensional nearest-neighbour structure because of the modified horizontal wrap-around connections.
Figure 4: ILLIAC IV topology
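A small sketch of this connectivity (the 8 x 8, 64-PE quadrant and the wiring rule are assumed from the usual descriptions of the machine rather than stated in this write-up): each PE i is wired to PEs i-1, i+1, i-8 and i+8 modulo 64, so the horizontal links wrap into the adjacent row and the whole array behaves like a 64-element nearest-neighbour ring with extra +/-8 shortcuts.

#include <stdio.h>

#define NUM_PE 64   /* one quadrant: an 8 x 8 grid of processing elements */

/* Fill out[4] with the four neighbours of PE i under the assumed
   wiring rule: i-1, i+1, i-8 and i+8, all taken modulo 64, so the
   horizontal links wrap around into the adjacent row. */
static void neighbours(int i, int out[4]) {
    out[0] = (i + NUM_PE - 1) % NUM_PE;  /* "left"  (previous PE)   */
    out[1] = (i + 1) % NUM_PE;           /* "right" (next PE)       */
    out[2] = (i + NUM_PE - 8) % NUM_PE;  /* "up"    (previous row)  */
    out[3] = (i + 8) % NUM_PE;           /* "down"  (next row)      */
}

int main(void) {
    int n[4];
    neighbours(7, n);   /* PE 7 sits at the right edge of row 0 */
    /* Its "right" neighbour is PE 8, the first PE of row 1, which is
       what makes the structure effectively one-dimensional. */
    printf("PE 7 -> %d %d %d %d\n", n[0], n[1], n[2], n[3]);
    return 0;
}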
The Distributed Array Processor (DAP) is another processor-array design, manufactured with 32 by 32 and 64 by 64 processing element arrays. A similar topology was used by the Massively Parallel Processor (MPP), which had a 128 by 128 processing element array.
SIMD array processors have a wide range of applications, owing to a parallel architecture that scales well as chip feature sizes shrink and more processing elements fit on a single die.
1.3 Multiprocessor systems
In a multiprocessor architecture, the memory system is a shared global memory module that is addressable by each of the processors. The organization of the processors is either loosely coupled or tightly coupled.
In a tightly coupled multiprocessor system, the central memory, which can be a single large memory module or a set of memory modules, provides the same access time for each of the processors. In addition to the global memory, each of the processors may also maintain a cache, which helps make the system more efficient by reducing memory contention: the memory access delays that occur when many processors issue requests to memory within a very short period of time. Contention is further reduced when the main memory is built from multiple memory modules, since simultaneous requests can then be served by different modules.
In a loosely coupled architecture, however, the memory system is partitioned among the processors so that each processor has its own local memory, as against sharing a global memory in a tightly coupled system. Each processor has direct access to its own local memory, while the local memories of the other processors are remote to it and are reached through the interconnection network.
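A minimal software-level analogy for the shared-memory, tightly coupled case described above (threads stand in for processors; all names and sizes are illustrative, not from the source): every "processor" directly addresses the same global array.

#include <pthread.h>
#include <stdio.h>

#define NUM_PROC 4           /* threads stand in for processors */
#define N        1024

/* Globally shared memory, addressable by every "processor". */
static double shared_data[N];
static double partial_sum[NUM_PROC];

static void *worker(void *arg) {
    int id = *(int *)arg;
    int chunk = N / NUM_PROC;
    double s = 0.0;
    /* Each processor reads its slice of the shared array directly. */
    for (int i = id * chunk; i < (id + 1) * chunk; i++)
        s += shared_data[i];
    partial_sum[id] = s;     /* each writes to a distinct slot: no race */
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_PROC];
    int ids[NUM_PROC];

    for (int i = 0; i < N; i++)
        shared_data[i] = 1.0;

    for (int i = 0; i < NUM_PROC; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }

    double total = 0.0;
    for (int i = 0; i < NUM_PROC; i++) {
        pthread_join(tid[i], NULL);
        total += partial_sum[i];
    }
    printf("total = %.1f\n", total);  /* expected: 1024.0 */
    return 0;
}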
The key factors determining the success or otherwise of a multiprocessor computer are the choice of the interconnection network between the processors and the mechanism for keeping each processor's cache consistent with the others. Some of the interconnection networks in use include shared bus, multiple bus, crossbar switch and ring interconnection networks.
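A toy illustration of the cache consistency problem (purely a software-level simulation with assumed names, not a description of any real protocol): two simulated processors each keep a private cached copy of a shared word, and a write by one leaves the other's copy stale until it is invalidated.

#include <stdio.h>
#include <stdbool.h>

/* Shared main memory: a single word for illustration. */
static int shared_word = 10;

/* A private cache entry for one simulated processor. */
typedef struct {
    int  value;
    bool valid;
} cache_t;

static int cache_read(cache_t *c) {
    if (!c->valid) {            /* miss: fetch from shared memory */
        c->value = shared_word;
        c->valid = true;
    }
    return c->value;
}

static void cache_write(cache_t *writer, cache_t *other, int v) {
    writer->value = v;
    writer->valid = true;
    shared_word = v;            /* write-through to main memory */
    other->valid = false;       /* invalidate the other cached copy */
}

int main(void) {
    cache_t p0 = {0, false}, p1 = {0, false};

    printf("P0 reads %d, P1 reads %d\n", cache_read(&p0), cache_read(&p1));
    cache_write(&p0, &p1, 42);  /* P0 updates the shared word */
    /* Without the invalidation above, P1 would keep returning 10. */
    printf("after P0 write: P1 reads %d\n", cache_read(&p1));
    return 0;
}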
Shared bus interconnection network: the processors and memory are connected to a single bus. A single line called the system link and interrupt controller (SLIC) is connected in parallel with the system bus to carry interrupts and other low-priority communications between the units connected to the system bus. The architecture of a shared bus multiprocessor is shown in figure 5.
Figure 5: Shared bus multiprocessor architecture
Multiple bus connection: for a multiple bus architecture, the components of the system are connected to multiple bus lines in a one-dimensional, two-dimensional or three-dimensional arrangement.
Figure 6: Multiple bus interconnection network