Access patterns, the statistical behavior of a sequence of memory operations 248—253
Access sequence, the sequence of memory addresses produced during the execution of a program 26—28
Access, a memory operation that is either a READ or a WRITE 23
Action at a distance, a physical force exerted at a point due to the influence of a remote source of the force 196
Address generation, during the execution of an instruction, the cycle in which an effective address is calculated by means of indexing or indirect addressing 105
Address mapper, the device that transforms a virtual address to a physical (real) address 70—71 74—81 165—167 see mapping"
Address trace, a recorded sequence of the memory addresses visited during the execution of a program 39—58
Address-reference stream, the sequence of memory addresses accessed during the execution of a program 40—41 see
Aho, A.E. 369
Algol 60 347
Algorithm (interaction with architecture) 17—19
Alignment network, a network that selects a subset of items read simultaneously from memory and permutes them to permit them to be manipulated in parallel 257—259
Allen, F. 375
Allen, J.R. 375 377
Amdahl Corporation 39 168
Amdahl, G.M. 19—20
AND, a boolean operation 203
Archibald, J. 326
Architecture see "Computer architecture"
Arithmetic pipeline, a multistage arithmetic unit that is capable of starting a new operation while one or more operations are currently in execution, with the time interval between successive outputs less than the total time required to produce a single output 236—239
Array processor, a parallel computer, usually with near-neighbor connections between processors and capable of executing a single stream of instructions broadcast simultaneously to all processors 118—121
Artificial Intelligence, the study of computational techniques for solving difficult problems for which human-like approaches are required in their solutions 12
Assignment problem, a combinatorial problem whose solution assigns N tasks to N workers such that each worker is assigned a single task and such that the sum of the values of the worker-task assignments is maximized 370
Associative access, a memory access in which the access is made to an item whose key matches an access key as distinct from an access to an item at a specific address in memory 32 see
Associative memory, a memory whose contents are accessed by key rather than by address 32
Atlas computer 25—26 70
Attached vector-processor, a processor specialized for vector computations that is designed to be connected to a general-purpose host processor, which supplies input/output functions, a file system, and other aspects of a computing system environment 261—266
Auxiliary memory, a bulk memory that is usually large, slow, and inexpensive, often a rotating magnetic or optical memory, whose main function is to store large volumes of data and programs that are not currently being accessed by a processor 72—73 85 90—94
Baer, J.L. 20 326
Balance (of a computer system's components), a state in which the processor bandwidth matches closely the bandwidths of the memory, interconnection network, and input/output system so that no specific component strongly limits the system throughput 340
Bandwidth of communication system 185 191 273 300—301 312—317 384
Bandwidth of input/output system 185 191 273 384
Bandwidth of memory 22 176 184—185 191 235 245 255 273 383—384
Bandwidth of processor 176 185 191 235 272—273 383
Bandwidth, the number of bits per second that can be processed by a memory, arithmetic unit, input/output processor or communication system 116 163—164
Bank (of memory), a module of memory that can sustain a single access to one physical cell of memory per memory cycle 116
Barrier synchronization, a means for synchronizing a set of processors in a multiprocessor system by halting processors in that set at a specified barrier point in a program until every processor in the set reaches the barrier 321 336—338 379 381 387
Base address (of a page), the physical address of the start of a page 75—76
Batcher, K.E. 213 223
BBN Butterfly 225—226
Beetem, J. 269
Belady, L. 52 84
Bell, C.G. 20
Benes network, a switching network proposed by V. Benes that is capable of producing an arbitrary permutation of its inputs at its outputs 269
Benes, V. 269
Berkeley RISC 168 170—171
Bidiagonal system of equations, a linear system in which the only nonzero coefficients lie on the major diagonal and on one diagonal immediately below or above the major diagonal 207
Binary search, a search algorithm in which the region to be searched shrinks by half at each step 367
Binomial distribution, the probability distribution that describes independent tosses of a fair coin 62 64
Bitonic sequence, a sequence of numbers that is the concatenation of an ascending and a descending sequence, or is a cyclic shift of such a sequence 223 232
Bitonic sorter, a sorting network whose subnetworks sort bitonic subsequences into fully sorted subsequences 223—225 232
Block (of a cache) 31 see
Bolt, Beranek, and Newman see "BBN"
Booth's algorithm, an efficient algorithm for integer multiplication 171
Booth, A.D. 171
Bottleneck 21 102 298 300—302 309 313—316 340—341 344 363 383
Branch prediction, the use of history, statistical methods, or heuristic rules to predict the outcome of conditional branches 152—153 155
Branch-and-bound search, a search technique in which the search eliminates large numbers of cases by determining that the solutions eliminated fall above a computed bound 369—373
Branch-history table, a hardware device that saves the recent history of conditional branches so that this information can be used for branch prediction 151 153—155
Breakeven point, the number of processors in a multiprocessor system whose combined throughput is equal to a single processor of the same power 296
Briggs, F.A. 326 358
Broadcast, a form of communication in which one transmitter sends one message simultaneously to many receivers 199—200 326 328
Budnik, P.P. 257
Buffer effects (in virtual memory), a phenomenon that causes a fraction of real memory to serve as a buffer for pages flowing to and from auxiliary memory 90—94
Buneman, O. 206
Burks, A.W. 103
Burroughs' B5700 172
Burroughs' B6700 172
Burroughs' Scientific Processor (BSP) 257—259 277
Bus (interconnection), an interconnection in which all transmitters and receivers are directly connected to a common set of interconnection lines that comprise the bus 299—303 309 323 331 384
Butterfly operation, the core operation of a Fast Fourier Transform that consists of forming the weighted sum and difference of two operands 312—314
Buzbee, B.L. 206
C.mmp multiprocessor 309
Cable density 226
Cache coherence, the state that exists when all caches within a multiprocessor have identical values for any shared variable that is simultaneously in two or more caches 324—383
Cache directory, the collection of tags in a cache that are used for associative access to cached data 32 67—68
Cache for data 116
Cache for instructions 116
Cache hit, a cache access that successfully finds in the cache the data requested 31 54—58
Cache miss, a cache access that fails to find in the cache the data requested 30 36—37 45—52 54—66 96 116
Cache, a small capacity, high-speed buffer memory 22 29—68 115 165—168 245 247 298—299 322—323
Cache, coherence 324—329 383
Cache, design of 94 98—99
Cache, miss ratio 80
Cache, replacement policy 43 52—58
Cache, set of data in 32—35
Cache, simulation of 41
Cache, structure of 29—39
Cache, tag (in directory) 31 35 41 47 57
Cache, techniques for analysis of 39—52
Cache, two-level 39
Cache, vector operands stored in 276—277
Cache, writing to 66—69
Cache-reload transient, the cache misses that occur when a program formerly in execution is restarted after other programs have used the cache 58—66
Carnegie — Mellon University 309
CDC 6600 103 110—112 114—116 161 163—164 172
CDC STAR 124 240 246 267 271
Central Limit Theorem, the theorem that states that the distribution of the sum of identical and independently distributed random variables asymptotically approaches a normal distribution 290
Chaining (of computations), the technique in which an output stream of vector results is directed to the input of another vector operation without being returned to intermediate storage between operations 125
Charlesworth, A.E. 262 264
Checkerboard ordering (for a mesh calculation), an ordering of operations in which an iterative calculation is performed first on the "Red" nodes and then on the "Black" nodes in the mesh 379—381
Chen, P.Y. 315
Chen, T.C. 117 122
Chickens 298
Chu, W.W. 87—88
Chunksize, the number of iterations to be grouped together as a single task in order to increase task granularity 341—345 375 379 386
CISC (Complex Instruction-Set Computer), a computer with an instruction set that includes complex (multicycle) instructions 168—171
Clark, D.W. 80
Coarse-grain parallelism, parallel execution in which the amount of computation per task is several times larger than the overhead and communication expended per task 283—297
Cocke John 168
Coffman, E.G., Jr. 41 86 90
Coherence (of cache) see "Cache coherence"
Collision vector, a binary control-vector whose bits indicate when an operation can be initiated safely in a pipeline computer 132 134—138 141 174
Collision, an event in which two or more different operations require the use of the same pipeline stage at the same clock cycle 134
Column access, a concurrent memory access to all elements of a column of a matrix 251—255 274—275 331
Colwell, R.P. 171
Combining switch, a switching element of an interconnection network that has the ability to combine certain types of requests into one request, and to produce a response that mimics serial execution of the requests 310—312 318—324 339 384
Common data-bus, a hardware mechanism for transmitting results produced by a collection of arithmetic units to machine registers and reservation stations 162—163
Communication cost 283—299
Compare-and-Swap, an instruction that is used for processor synchronization 348 355—362 378 387
Compatibility 19
compiler optimization 374—382
Complex instruction-set computer see "CISC"
Computer architecture and technology 2
Computer architecture, cost of 8—9
Computer architecture, evaluation of 8—9 19
Computer architecture, performance of 8—9
Computer architecture, special purpose vs. all purpose 13
Computer architecture, textbooks 20
Computer architecture, the study of computer structures, their applications, and their performance 1—2 13
Computer vision 12
Concert multiprocessor 346
Conditional branch in a pipeline 150—155
Conditional branch, a computer instruction that alters the sequence of execution if a condition is true, and otherwise falls through to the next instruction in sequence 107 113 176 230—231
Confidence interval, an interval based on statistical sampling that shows where a population of random variables lies to within a specified level of confidence 49
Conflict in a network 312—317
Conflict in a pipeline 113—115 142—148 156
Conflict, a situation in which two or more operations require the same resource, forcing one operation to wait for the other to complete 113—115 142—148 156 312—317 see "READ/WRITE "WRITE/READ "WRITE/WRITE
Connection Machine 279 284 324
Contention, interference among tasks caused by tasks competing for shared resources, thereby forcing one or more tasks to become idle momentarily while waiting for resources to become available 306—307 311—315 342 384
Context switch, the process of saving the state of one task and restoring the state of a second task to enable a computer system to change execution from one program to another 81
Continuum model, a model of physical systems in which continuous quantities are modeled at discrete points and physical interactions are modeled as interactions among neighboring mesh points 180—184 186—209 227—229 233 274
Cooley, J.W. 228
Coonen, J.T. 172
Cosmic Cube 194—195 202 210 228—229 231 324
Cost 4—11
Cost of development 5
Cost per-unit 5
Cost-performance ratio 10—11 15—16 235 296
Cray I 125 244—247 259 271 282 284
| Cray II 125—126 261
Cray XMP 282 284
Critical section, a section of a program that can be executed by at most one process at a time 308 311 318 320 347—365
Crossbar (interconnection), an interconnection in which each input is connected to each output through a path that contains a single switching node 305—310 313 323 331
Crosspoint, a switching node in a crossbar that connects a single input to a single output 306
Crowther, W. 225
Cvetanovic, Z. 312 314 316 344
Cycle (in reduced state-diagram), a path in a reduced state-diagram that specifies a steady-state schedule for introducing operations to a pipeline 140—141 174
Cycle (of computer clock), an electronic signal that counts a single unit of time within a computer 16 151 236—239 254 343—344
Cycle time, effective 38
Cycle time, the length of a single cycle of a computer function such as a memory cycle or processor cycle 24 29 31
Cyclic reduction, an algorithm used to solve linear systems that have a particular structure 208—209 229—230
Cytron, R.G. 375 377 382
Data cache, a cache that holds data, but does not hold instructions 116
Data flow (analysis of requirements), the sequence of processes and data transmissions that are performed on a collection of data during a computation 195—198
Database system 12
Davidson, E.S. 127 135 140
de Kooning, W. 3—4
Dead line, a line of a cache that will be discarded from cache before it will be the target of a cache access 55
Deadlock, the state in which two or more processes are deferred indefinitely because each process is awaiting another process to make progress, and no process is able to make progress 308 387
DEC PDP-11 19 309
DEC VAX 19 77 80—81 83—84 168 171
Decode-history table, a small cache-like memory that saves the recent history of decoding information for conditional-branch instructions so that this information can be used by a branch prediction mechanism 154—155
Decrement (for synchronization) 352—355 387
Delay (in pipeline), a logic device used to store and synchronize data in a pipeline 142—148 242—244
Delayed branch, a branch instruction that defers altering the flow of control until one more instructions that follow it have completed execution 151—152
Denneau, M. 269
Denning, P.J. 41 85—86 90
Dependency analysis, an analysis that reveals which portions of a program depend on the prior completion of other portions of the program 375—377 385—386
DEQUEUE, a high-level function that removes an item from a queue 351—352 355 358 360 363—365 387
Development cost 5
Dias, D.M. 315
Digital Equipment Corporation see "DEC"
Dijkstra, E.M. 347 370 374 387
Direct mapping, a cache that has a set associativity of one so that each item has a unique place in the cache at which it can be stored 34
Directory (of a cache), the portion of a cache that holds the access keys that support associative access 32 56 see tag"
Disk buffer, a high-speed buffer memory resident within a disk controller that is used as a private cache for the disk system 92—94 97—98
Disk cache see "Disk buffer"
Disk memory 90—94 see
Division 174
do par, a program statement that permits the iterations of a loop to be executed in parallel 335—336 345 386
do seq, a program statement that forces the iterations of a loop to be executed sequentially 335—336 345
Dubois, M. 326
ECL (emitter-coupled logic) 15
Efficiency of array computer 118—120
Efficiency of multiprocessor computer 280—299 309 338—341
Efficiency of pipeline computer 122—123
Emer, J.S. 80
ENQUEUE, a high-level function that adds an item to a queue 351—352 355 357 360—361 363—365 387
Exchange see "Pair-wise exchange" "Shuffle-exchange"
Exclusive access, a state in which some single process is granted the right to read, modify, and write a shared datum, and no other processor can access the datum while the first program has exclusive access to the shared datum 308 311 318 320 347—365 see
Execute stage, the stage in a pipelined processor at which an instruction is executed 112 157 161—162
EXPONENT 127—130
Fan-in, the number of-logic signals that directly drive a given logic gate 198
Fan-out, the number of logic gates driven by a specific logic gate 198
Feedback path, a path from the output of a functional unit to an input of the same unit 129
Fetch-and-Add, a computer instruction that updates a memory operand, returns the value of the operand before the update, and if executed concurrently by several processors simultaneously, produces a set of results as if the processors executed in some serial order 318—322 348 362—365 378 386—388
FFT (fast Fourier transform) see "Fourier transform"
Fine-grain parallelism, a form of parallel execution in which the amount of work per task is small compared to the amount of work per task required for communication and overhead 283—297
Finite-element method, a numerical technique in which physical systems are analyzed mathematically by modeling the system at the nodes of a mesh of data points 12 267
Floating-point arithmetic 127—137 169 177—178
Floating-point arithmetic, addition 129—134
Floating-point arithmetic, multiplication 127—129 132—134
Fluid flow 12
Flynn, M.J. 172 279—280
Footprint size, the number of lines of a process footprint held in a cache 60
Footprint, the distinct lines of a process held in an infinite cache that are touched during the execution of the process 60—66 85 99
Forbidden cell, a cell of a reservation table for one operation that cannot be used by another operation because of a timing conflict 142—148
Ford, L.R., Jr. 370
Forsythe, G. 173 252
FORTRAN 180 375
Forwarding register, a register that is temporarily assigned the role of a different register 157 see
Fourier transform 196—197 213 228—229 311—316 368
FPS-164 262—266 268—269 271
Free pool, a collection of registers available for use as forwarding registers 158—160
Freeable, the state of a forwarding register after its contents have been used and the register can be returned to the free pool 158—160
Fujitsu Corporation 168 262 375
Fulkerson, D.R. 370
Full-information function, a multi-output function each of whose outputs depends on every input 196—197
Fully associative, a cache structure in which every tag in the cache is compared to the tag of the datum being accessed 34
Gauss — Seidel iteration, an iterative scheme for solving linear equations in which each interior point is updated with two neighboring values from the present iteration and two neighboring values from the prior iteration 379—381
Gaussian elimination, a method for solving linear systems of equations 249—251 255 275 331—332
GCD (greatest common divisor) 256
GF-11 see "IBM GF-11"
Gflops (Gigaflops), a computation rate of one billion floating-point operations per second 269
Gigaflops see "Gflops"
Global memory, a memory directly accessible by every processor in a multiprocessor 299 342—343 see
Golub, G.H. 206
Goodman, J. 326
Gottlieb, A. 226 318 364
Granularity, a measure of the size of an individual task to be executed on a parallel processor 283—297 299—300 338—341
Gravitation 182
greatest common divisor (GCD) 256
Greedy strategy, a strategy that initiates a new pipeline operation at the earliest opportunity 138—140
Grosch's Law, an empirical rule that says that the cost of computer systems increases as the square root of the computational power of the systems 14
Gupta, S.C. 41
Gustafson, J.L. 262 264
Halstead, R. 346
Hash lookup, a search technique in which the search key is transformed to an address at which the search begins 268
Hayes, J.P. 20
Heller, D.E. 209 249
Hierarchy (of memory system), a multilevel memory structure in which successive levels are progressively larger, slower, and less costly 22 26 100—101
High-speed buffer memory, a memory that holds data en route between a large main memory and the registers of a high-speed processor 259 see
Hill, M. 170
Hillis, W.D. 279 324
Hit see "Cache hit"
Hit ratio, the ratio of the number of cache hits to the total number of cache accesses 38
Hitachi Corporation 39 262
Hoevel, L.W. 58
Hopcroft, J.E. 369
Hoshino, T. 178 180
Hot-spot contention, an interference phenomenon observed in multiprocessors due to memory access statistics being slightly skewed from a uniform distribution to favor a specific memory module 316—317 320 384
Hwang, K. 172 357
Hypercube, a parallel processor whose interconnection structure treats individual processors as the nodes of a multidimensional cube and interconnects two processors if the corresponding nodes of the cube are neighbors 231 323—324 see
IBM 3090 155 282 375
IBM 801 168
IBM Corporation 168 262
IBM GF-11 268—271 279 284
IBM RP3 226 311 322—324 339 344
IBM STRETCH 103
IBM System 360-370 19—20 168 357
IBM System 360/91 161—164
IEEE 802.5 Token-Ring Standard 304
IEEE Standard for Floating-Point Arithmetic 172
ILLIAC IV 123—124 179 189—195 199—200 269 271 275
Image processing, a computation performed on a digitized representation of an image whose purpose is to enhance the image or to extract information about the image 12
Increment (for synchronization) 352—355 387
Indurkhya, B. 283 290—291
Inferencing system, a programming system that produces results by following a logical chain of inferences 12
Initialization (of cache simulation) 41—42
Inner product, the sum of the component = by = component products of the elements of two vectors 149 263
Input/output overlap, the act of performing input/output processing concurrently with other processing 103
Input/output processor, a processor whose function is specialized to input/output processing 67—69 168
Instruction buffer, a small high-speed memory that holds instructions recently executed or about to be executed 245
Instruction cache, a cache memory dedicated to the storage of instructions 116
Instruction decode, the machine cycle during which an instruction is examined and the control signals required for the execution of the instruction are produced 105 156—157
Instruction fetch, the machine cycle dedicated to the access and retrieval of the next instruction to execute 105 116 156—157 169
Instruction set, the repertoire of instructions executable by a computer 2 see RISC"
INTEL 8086 103 194
Intel 8087 194
Intel 808X 19
Interconnection network, the system of logic and conductors that connects together the processors in a parallel computer system 199 210—226 280 see "Crossbar" "Hypercube" "Near-neighbor "Perfect "Ring" "Shuffle-exchange"
Interlock, a control device or signal that defers the execution of one function until a conflicting function has completed execution 114 156 161 263—264
Interlock, elimination of 148—150
Intermediate memory 245—248 see
Internal forwarding, an execution technique in which special registers are temporarily assigned the function of physical machine registers to hold operands while awaiting execution in order to reduce conflicts for machine resources that otherwise would occur 155—164 173
Interprocessor communication, the data and control information that passes among the processors of a parallel computer during the execution of a parallel program 281
Interrupt, a temporary suspension of the normal sequence of program execution to perform a function that has been initiated by an external event or by an internal trap or monitor function 59 357 360
Interrupt-driven, a program function that is initiated by an interrupt caused by an external event 59 64
Invalidate, the process that removes a cache entry by changing its directory entry into an empty entry 68
|