Inverse mapper, a device that computes a virtual address from a physical (real) address 166
Inverse perfect shuffle see "Perfect shuffle inverse
Jacobi iteration, an iterative method for solving linear equations that updates each point in a new iteration only after all points have been updated for the prior iteration 379
Jump, J.R. 315
Karp, R.M. 366—368 373
Kilburn, T. 25—26
Knowledge base, a collection of rules and data used by inferencing programs during computations 12
Kogge, P.M. 127 140 149 204 243 264
Kruskal, C.P. 315
Kuck, D.J. 257 375
Kung, H.T. 226
Lamport, L. 380—381
Latch, a one-bit storage device that saves the contents of its input at the instant a clock signal changes state 112
Latency, the delay between the request for information and the time the information is supplied to the requester 72 91
Lawler, E.L. 370
Lawrie, D.H. 311 315
Least-recently used see "LRU"
Leiserson, C.E. 226
Line (of a cache), a collection of contiguous data that are treated as a single entity of cache storage 31—32
Line size, the number of bytes in a cache line 34 46—52
Linear equation, an equation that depends on its variables only through the addition of a multiple of each variable 262
Linear programming, an optimization technique for solving constrained problems in which behavior equations and constraint equations are linear functions of the variables 262
Linear recurrence, a recurrence relation in which each successive result is a linear function of past results 203—205
Linear-equation solver, an algorithm for solving linear equations 175
Lisp 346
Livelock, a state in which actions taken by concurrently executing processes prevent computation from proceeding, but computation can proceed if some processes alter their execution behavior 354—355 364—365 387
Local memory, the private memory directly connected to a processor in a parallel computer 299 322—323 343
Locality (of memory references), the characteristic tendency for programs to access regions in the near future that were accessed in the recent past 26—29 74 81—84
Lock, a primitive operation that grants a process the exclusive right to continue execution only if no other processor currently holds that exclusive right 308 318 339—341 343 349—353 357 359 378 see
Loop interconnection see "Ring"
Losq, J.J. 154
LRU (Least-Recently Used) replacement policy, a memory management strategy that purges the least recently used candidate from memory, while retaining candidates used more recently 44 52—58 62 84—85 248
LU decomposition, a method for solving linear equations based on Gaussian elimination 173 207 249—251 330—331
Mantissa, the significant-bit field of a floating-point operand 127—130
Mapper see "Address mapper"
Mashburn, H.H. 309
Mattson, R.L. 43—46 52
MAX 203
Maximum compatible set, a set of integers, no two of which are incompatible and to which no other compatible integer can appended 147—148
Megaflops see "Mflops"
Memory 21—101 see
Memory access 26—29 248—253 see
Memory address, the unique location for each item in a memory by which that item is accessed 23—24
Memory hierarchy 22 26 100—101 see
Memory management, the process of controlling the flow of data among the levels of memory hierarchy 70—73 84—90 92—94
Memory, access patterns 26—29
Memory, bandwidth 22
Memory, bottleneck 21 102 309 344
Memory, cycle time 24 29 31
Memory, Hierarchy 22 26 100—101 see
Memory, Random Access 23—24 26
Memory, sequential access 24
Memory, structure for a pipeline computer 115—117
Mesh calculation 123 179 187—189 197—199 229—230 see "Finite-element
Mflops (Megaflops), an execution rate of millions of floating-point instructions per second 234 249
MIMD (Multiple Instruction-stream, Multiple Data-stream), a parallel computer structure composed of multiple independent processors 279—280 see "SIMD"
MIN 203
MIPS (Millions of Instructions per Second), a measure of the maximum computation rate of a computer 14—16 333 345 350—352
Miranker, W.L. 366—368 373
Miss ratio, steady-state 58
Miss ratio, the ratio of cache misses to total cache accesses 42 58 80 325 see "Hit
MIT Multics 172
Miura, K. 375
Model (of multiprocessor performance) 285—299 330—331 338—341 see
Moler, C.B. 173 252
Monte Carlo simulation, a computational method in which physical calculations are performed by simulating the statistical behavior of elementary components of a physical system 12
MOS (Metal-Oxide Semiconductor) 14
Motorola 680XX 19 168 171
MSYPS (Millions of SYnchronizations Per Second), a measure of the maximum rate at which a multiprocessor can perform synchronizations among its processors 333 339 345 350—352 363 384—385
MULTICS 172
Multiple instruction-stream, multiple data-stream see "MIMD"
Multiple-purpose architecture, a computer structure that can perform a broad variety of computations 185 186 191 273 385
Multiplier tree 171
Multiprocessor, a parallel computer composed of multiple independent processors and facilities for controlling their interaction and cooperation 278—287 see
Multiprocessor, cache coherence in 324—329 383
Multiprocessor, compiler optimization for 374—382
Multiprocessor, efficiency of 280—299 309 338—341 365—374
Multiprocessor, interconnections 299—324 341—345
Multiprocessor, parallel execution of 333—338 365—374
Multiprocessor, parallel search in 365—374
Multiprocessor, performance of 283—299 338—341 365—374
Multiprocessor, synchronization of 338—341 347—365
Multiprocessor, task initiation 345—347 386
Multiprogramming, a technique for executing more than one program at a time in a single processor by periodically changing the program currently being executed by the processor 71—72
Near-neighbor interconnection, an interconnection structure for a parallel processor in which each processor is connected directly to its near neighbors 183 198 200 226—228 233 274
NEC (Nippon Electric Corporation) 262
Nelson, V.P. 315
Newell, A. 20
Nicol, D.M. 291
Nielson, C.W. 206
NMOS Negatively doped MOS (Metal-oxide semiconductor) 14
Nonlinear systems of equations, a system of equations in which the variables are linked by one or more nonlinear relations 262
Normal distribution, the statistical distribution whose probability density follows a bell-shaped curve 64
Normalization, the process that transforms a floating-point number into a representation such that the leading digit of a nonzero mantissa is nonzero 127—130
Norton, V.A. 315—316
NP-complete, a class of problems for which there exists no current algorithm that can solve any problem in the class in a time guaranteed to be less than exponential in the size of the problem 333 369
NYU Ultracomputer 226 311 339
Offset, a small integer whose value is the relative displacement from a base address to the address at which an access is to be made 75—76
One-level store, a multilevel memory hierarchy that functions as if there were a single level in the memory hierarchy 25
Opderbeck, H. 87—88
Operand fetch, the machine cycle dedicated to the access and retrieval of an operand 105 116 156—157
OPT, a nonrealizable optimum replacement policy for cache and virtual memory 52—58
Optical transmission 226 305
OR 203
Organick, E.I. 172
Overflow, the state in which a numerical value exceeds the maximum representable numerical value 127—130 172
Overlap, the ability to perform two or more functions concurrently 123—126 283—299
Overlay 25
Padmanabhan, K. 315
Padua, D.A. 375 377
Page fault, an access to a page that is not resident in main memory 26 30 72—73 75 77 84—88 91
Page number, the field of a virtual address that identifies the page to be accessed 76—77
Page replacement, the process that determines which page to move from main memory to auxiliary memory to make room for a new page in main memory 84—90 92
Page size, the number of bytes in a page 29
Page table, a table used by a page mapper in a virtual memory system that contains the physical (real) address for each page, and is accessed by page number 77
Page, a contiguous region of memory that is treated as a single entity in virtual-memory systems 26
Page-fault frequency replacement (PFF), an algorithm for managing a virtual memory that increases the number of pages assigned to a process when page faults occur at a rate above a fixed threshold 87—90 96—97
Pair-wise exchange, an interconnection switch that swaps data between adjacent processors 219 222
Parallel architecture 18—19
Parallel computation 12 123—124 179 189—195 199—200 202 206—209 228—229 231 332—335
Parallel time, the elapsed execution time for a parallel computation 123
Partial differential equation, an equation that expresses the relations among variables and their partial derivatives 181
Particle model, a computational process in which physical behavior is modeled through the simulation of discrete particles acted upon by physical forces produced remotely 180—184 233 see
Partitioning (of programs to pages or segments), the process of grouping related portions of programs together to force them to reside in contiguous regions of memory so that they tend to be transferred together among the levels of a memory hierarchy 74
Patel, J.H. 127 140
Patterson, D.A. 169—170
PDP-11 see "DEC PDP-11"
Pease, M.C. 213—214 217
Per-unit cost, the manufacturing cost of one additional item 5
Perfect-shuffle interconnection, an interconnection structure that connects processors according to a permutation that corresponds to a perfect shuffle of a deck of cards 210—226 228—229 231—232 269—270 310—324
Perfect-shuffle interconnection, inverse of 217—218
Performance model, an idealized mathematical model that is useful for predicting the performance of a computer system 285—297
Performance model, fully overlapped communication 293—295
Performance model, linear communication costs 291—293
Performance model, multiple communication links 295—297
Performance model, N processors with overlapped communication 286—290
Performance model, stochastic 290—291
Performance model, two processors with overlapped communication 285—286
Permutation memory (in the GF-11), a memory that stores the control settings for a collection of permutations, each of which is to be used for routing information among processors and memories 271
Permutation, a one-to-one mapping from a set of objects onto the same set of objects 370—373
PFF see "Page-fault frequency"
Pfister, g. 226 316—317 322
Physical address, the address of an item in physical (real) memory 70 75 77—78 81 165—167
Pipeline (in a computer system) in vector computer 234—235
Pipeline (in a computer system), a structure that consists of a sequence of stages through which a computation flows with the property that new operations can be initiated at the start of the pipeline while other operations are in progress through the pipeline 102—176 234—243 263—266
Pipeline (in a computer system), adding delays to 140—148
Pipeline (in a computer system), arithmetic units 263—266
Pipeline (in a computer system), conditional branches in 150—155
Pipeline (in a computer system), conflicts in 113—115
Pipeline (in a computer system), control of 127—137 174—176
| Pipeline (in a computer system), design of 103—115
Pipeline (in a computer system), maximum performance of 138—148
Pipeline (in a computer system), performance of 117—127 298
Pipeline (in a computer system), streaming operation of 236—243 276—277
Pivot (in Gaussian elimination), the largest element in a region of an array, which is chosen to serve as the element around which a transformation of a subarray is performed 251 274
Poisson's equation, an equation that describes physical potential as a function of charge density 187—191 197 206 229—230 334—335 379
Polynomial 232
Pomerene, J. 56
Preparata, F. 225
Process tag (in a cache memory), a field that gives the identity of the specific process that created a particular line in the cache 81
Program partitioning 25 see
Propagation effects, physical effects that tend to degrade signal quality and to increase propagation delays 226
Purge (of cache and TLB), the process that removes all entries in a cache or cache-like memory that are associated with a process that has relinquished its use of a processor 81
Puzak, T.R. 45—52
Quantum chromodynamics, a branch of theoretical physics concerned with the behavior and properties of elementary particles 270
Queue (for shared access) 318—321 338—339 341 351—352 355—362 386
R/C ratio, the ratio of a task's running time to its overhead and communications time, a measure of task granularity 283—297 299—300 324 338—343 377—379 382
Radin, G. 168
RAM (Random-Access Memory) 23—24 72 100
Random-access memory (RAM), a memory in which the time required to access an item is independent of the past history of accesses 23—24 72 100
Rao, G.S. 154
Ray tracing, an algorithm used to render life-like graphic images by tracing the path of rays of light from source to illuminated object 13
READ/MODIFY/WRITE, a noninterruptible sequence of operations required for operations that synchronize access to shared variables 308 339—341 347—353 357 359
READ/WRITE conflict 113—115 161 376—377 379 386 see
Real address see "Physical address"
Real Time 12
Rechtschaffen, R. 393
Recurrence relation, a relation that expresses the next item of a sequence as a function of the earlier items in the sequence 175 200—205
Recursive doubling, a technique for parallel execution that at each stage doubles the number of variables that influence the partial results at that stage of the computation 206—207 209 229 231 276
Red-black ordering (for a mesh calculation) 379—381 see
Reduced instruction-set computer see "RISC"
Reduced state-diagram, a diagram that describes the possible sequences of initiation of operations in a pipelined processing unit 138—140 174
Reformatting (of data structures), the process of transforming a data structure from one storage representation to another to facilitate parallel access to substructures of the data structure 260 see "Row
Register windows, a processor mechanism in which sets of registers automatically change their function when procedures are entered and exited 170—171
Remote effects, physical effects caused by interactions that are not near-neighbor interactions 183 see
Replacement policy, a policy that governs which items are to be removed from one level of a memory hierarchy when new items are put there 43—45 52—58 74 84—90 92 see "OPT"
Reservation station, a collection of hardware registers that hold data or reservations for data to be used in a future operation 158 162
Reservation table, a table that describes which resources are needed at each step of a pipelined computation 131 133—134 141 174
Reverse-binary operation, a permutation that maps Item i to the item whose is index is obtained by reversing the bits in the binary representation of i 312—314
Ring (interconnection), an interconnection structure in which nodes are connected in a loop structure 304—305 330 384
RISC (Reduced instruction-set computer), a computer in which all instructions are simple instructions that take one cycle to execute, except for instructions that require conditional execution and for delays access memory 151—155 168—171
Routing register, in ILLIAC IV, a register used for exchanging data among neighboring processors 190—191
Row access, a concurrent memory access to all elements of a row of a matrix 251—255 274—275 331
RP3 see "IBM RP3"
Sachar, H.E. 154
Savage, J.E. 41
Scalar arithmetic, arithmetic operations that manipulate individual data as opposed to arithmetic operations in which one operation manipulates an entire vector or matrix 264—265
Scalar operation, any operation performed on individual data 178 245
Scalar processor, a processor whose basic operations manipulate individual data elements rather than vectors or matrices 266
Scalar register, a register whose function is to hold scalar operands 244—245
Scheduling 281 383
Scoreboard, a hardware device that maintains the state of machine resources to enable instructions to execute without conflict at the earliest opportunity to do so 114—115 161—162
Search techniques 365—374
Segment number, the field of a virtual address that specifies which segment of a program is to be accessed 76
Segment table, the table in a virtual-memory system that is used to translate segment references in a virtual address to physical (real) addresses in main memory 77—79
Segment, a method for partitioning data into variable-length blocks of memory so that items grouped together are logically related 76 82
Segmented memory, a virtual memory system whose address space is partitioned into a disjoint collection of regions known as segments 82
Seitz, C.L. 194
Semaphore, a variable that is used to control access to shared data 349—355
Sequential-access memory, a memory system such as a magnetic tape memory in which items must be accessed sequentially, and in which the access time to a random item depends on which item in memory was accessed immediately prior to the given access 24 248
Sequin, C.H. 169—170
Serial access see "Sequential access"
Serial correlation, the statistical correlation among the addresses in a sequence of addresses in an address trace from which it is possible to predict future accesses 27—29 see
Serial time, the time it takes to execute an efficient version of an algorithm on a serial computer 123
Serialization, the process that forces a collection of complex tasks to take place one at a time rather than in parallel 319 321
Set see "Cache set"
Set associative, a cache structure in which all tags in a particular set are compared with an access key in order to access an item in cache. The set may have as few as one element or as many elements as there are lines in the full cache 32—34 38 43—52
Shadow directory, a cache directory that contains cache tags only, and no data 56—58
Shadow miss, a cache miss for which an entry exists in a shadow directory 57
Shar, L.E. 127
Shared memory 299—301 325—329 339—341 347—365 see
Shared page, a page of a virtual memory system that is shared by two or more programs 79
Shared segment, a segment of a virtual memory system that is shared by two or more programs 83—84
Shemer, J.E. 41
Shift-register analogy, a method for predicting the trajectory of an item in a perfectshuffle network by observing the successive states of a cyclic shift register 215—217 222
Shift-register controller (for a pipeline) 135—139 see
Shippey, B. 41
Shortest-path problem, a problem that requires the discovery of the shortest path between two nodes of a graph 370 374 387—388
Shuffle-exchange (interconnection), an inter-connection network that consists of a perfect shuffles and pair-wise exchanges 310—324
Siewiorek, D.P. 20
SIMD (Single Instruction-stream, Multiple Data-stream), a processor structure in which a single instruction manipulates an entire data structure 279 324 346—347 see "Connection "Vector
Single Instruction-stream, multiple data-stream see "SIMD"
Singleton, R.C. 213 217
Sites, R. 357
Skewed storage, a technique for storing matrices to facilitate parallel access to rows and columns 255—259
Slotnick, D.L. 189 195
Smith, A.J. 37 326
Smith, D.R. 369 373—374
Snir, M. 315
Solomon 192
Sorting 196 198 213 223—225
Sparse matrix, a matrix whose elements are mostly zeros 12 250 266—268
Sparse vector, a technique used in the CDC STAR for representing vectors whose elements are mostly zeros 267
Speech recognition 12
Speedup, the ratio of the time to execute an efficient serial program for a calculation to the time to execute a parallel program for the same calculation on N processors identical to the serial processor 108 121 193 205 209 289 294 298 332 339—340 343—344 367 373
Spin lock, a implementation of the LOCK primitive that causes a processor to retest a semaphore continuously until the semaphore changes value 351
Spirn, J.R. 41
Stable (numerically), an algorithm that produces small changes in the numerical answers in response to small changes in input data 251
Stack-replacement policy, a memory-replacement policy for which items that are retained in a small memory are a subset of the items retained if the memory size is increased 45
Stage (of a pipeline) 106
Stale data, data that remain in a cache when a process is moved to a different processor 326
Startup transient, the period immediately after the initiation of a vector instruction during which a pipeline produces no results or produces results at a low rate 117
Statistical sampling, a trace-reduction technique that predicts full cache performance by sampling the performance on a small number of cache sets 49 see
Steele, G.L., Jr. 324
Sterbenz, P.H. 172
Stone, H.S. 20 59 204 206 214—215 283 290 364
Stream (of data), a set of successive data presented to a pipeline arithmetic unit 234—239 276
Strecker, W.D. 58
Stride, the constant difference between successive addresses in a stream of data generated by a vector access 256
Sullivan, H.T. 322
Sussenguth, E. 153
Sweazey, P. 326
Synch, an elementary synchronization operation 339
Synchronization, an operation in which two or more processors exchange information to coordinate their activity 185 191—192 273 281 302 321 332—333 335—341 347—365 384—385
Synonym (in a cache), a situation in which two different items have the same virtual address but reside at different physical addresses 166—167
Synthetic workload 73
SYPS (SYnchronizations Per Second), a processing rate of one synchronization per second 333 339 see
Systolic array, a parallel computer with a highly structured, iterative interconnection pattern 226
Tag see "Cache tag"
Tanenbaum, A.S. 20
Tapped delay-line, a device whose taps produce delayed versions of the input data with each tap associated with a different delay 242
Test-and-Set, a primitive instruction that performs a READ/MODIFY/WRITE operation for synchronization of processors 348—352 387
Thanawastien, S. 315
Thiebaut, D. 59
Thirty-percent rule 37
Thornton, J.E. 112 161 172
Thrashing, a state in which multiple programs compete for real memory and no program is able to obtain enough memory to reduce its fault rate to a low value 85—86
Three-address instruction format, an instruction format with two fields for input operands and one field for a result operand 114 163
Threshold (for page-fault frequency) 88
Threshold phenomenon, for some physical systems, the situation in which behavior changes dramatically when a parameter crosses a threshold 64
TLB see "Translation-lookaside buffer"
Token Ring see "Ring interconnection"
Token, a unique data symbol used to control transmission for a parallel computer system connected as a ring 304—305
Tomasulo, R.M. 161—162
Tour, a path on a graph that visits every node exactly once and terminates at the starting node 370—373
Trace reduction, a technique for reducing the number of address references on an address trace while retaining the ability to use the trace to analyze cache performance 45—52 95
Trace stripping see "Trace reduction"
Trace-driven analysis, a performance analysis technique based on simulating the behavior of a computer system responding to stimuli obtained from a program trace 80
Transaction system 12
Transient (of cache simulation), the misses that occur during the beginning of a cache simulation due to incorrect initialization of the cache 40—42 50
Transient miss, a cache miss due to the improper initialization of a cache during a cache simulation 57
Translation-lookaside buffer (TLB), a cache-like memory that holds recently used mappings of virtual addresses to physical (real) addresses 75—76 80
Travelling Salesman Problem, a problem whose solution is the shortest path among N cities such that the path begins and ends at City 1 and no city is visited twice 332—333 369—374
Treiber, R.K. 358
Triangular matrix, a matrix whose nonzero elements lie on the major diagonal and in a triangular region that lies either above or below the major diagonal 249—252
Tridiagonal system of equations, a linear system whose defining matrix is a triangular matrix 173 206—209 229 231 249—252
Trivedi, K. 40
Tukey, J.W. 228
Two-address instruction format, an instruction format in which one field specifies an operand and a second field specifies an operand that also receives the result of the operation 163 173
Two-level mapping, a mapping from virtual addresses to physical (real) addresses that requires two successive table accesses 75—76
|