Stone H.S. — High-performance Computer Architecture :: Электронная библиотека попечительского совета мехмата МГУ

Главная Ex Libris Книги Журналы Статьи Серии Каталог Wanted Загрузка ХудЛит Справка Поиск по индексам Поиск Форум

Авторизация

Поиск по указателям

Stone H.S. — High-performance Computer Architecture

Обсудите книгу на научном форуме

Нашли опечатку?
Выделите ее мышкой и нажмите Ctrl+Enter

Название: High-performance Computer Architecture

Автор: Stone H.S.

Аннотация:

This book on computer architecture presents design ideas embodied in many high-performance machines and stresses techniques for evaluating them. Stone develops a proper understanding of the design process by treating the various trade-offs that exist in design choices and shows how good designs make efficient use of technology.

Язык:

Рубрика: Computer science/

Статус предметного указателя: Готов указатель с номерами страниц

ed2k: ed2k stats

Издание: Repr. with corrections

Год издания: 1988

Количество страниц: 424

Добавлена в каталог: 03.02.2015

Операции: Положить на полку | Скопировать ссылку для форума | Скопировать ID

Предметный указатель

Access patterns, the statistical behavior of a sequence of memory operations      248—253
Access sequence, the sequence of memory addresses produced during the execution of a program      26—28
Access, a memory operation that is either a READ or a WRITE      23
Action at a distance, a physical force exerted at a point due to the influence of a remote source of the force      196
Address generation, during the execution of an instruction, the cycle in which an effective address is calculated by means of indexing or indirect addressing      105
Address mapper, the device that transforms a virtual address to a physical (real) address      70—71 74—81 165—167 see mapping"
Address trace, a recorded sequence of the memory addresses visited during the execution of a program      39—58
Address-reference stream, the sequence of memory addresses accessed during the execution of a program      40—41 see
Aho, A.E.      369
Algol 60      347
Algorithm (interaction with architecture)      17—19
Alignment network, a network that selects a subset of items read simultaneously from memory and permutes them to permit them to be manipulated in parallel      257—259
Allen, F.      375
Allen, J.R.      375 377
Amdahl Corporation      39 168
Amdahl, G.M.      19—20
AND, a boolean operation      203
Archibald, J.      326
Architecture      see "Computer architecture"
Arithmetic pipeline, a multistage arithmetic unit that is capable of starting a new operation while one or more operations are currently in execution, with the time interval between successive outputs less than the total time required to produce a single output      236—239
Array processor, a parallel computer, usually with near-neighbor connections between processors and capable of executing a single stream of instructions broadcast simultaneously to all processors      118—121
Artificial Intelligence, the study of computational techniques for solving difficult problems for which human-like approaches are required in their solutions      12
Assignment problem, a combinatorial problem whose solution assigns N tasks to N workers such that each worker is assigned a single task and such that the sum of the values of the worker-task assignments is maximized      370
Associative access, a memory access in which the access is made to an item whose key matches an access key as distinct from an access to an item at a specific address in memory      32 see
Associative memory, a memory whose contents are accessed by key rather than by address      32
Atlas computer      25—26 70
Attached vector-processor, a processor specialized for vector computations that is designed to be connected to a general-purpose host processor, which supplies input/output functions, a file system, and other aspects of a computing system environment      261—266
Auxiliary memory, a bulk memory that is usually large, slow, and inexpensive, often a rotating magnetic or optical memory, whose main function is to store large volumes of data and programs that are not currently being accessed by a processor      72—73 85 90—94
Baer, J.L.      20 326
Balance (of a computer system's components), a state in which the processor bandwidth matches closely the bandwidths of the memory, interconnection network, and input/output system so that no specific component strongly limits the system throughput      340
Bandwidth of communication system      185 191 273 300—301 312—317 384
Bandwidth of input/output system      185 191 273 384
Bandwidth of memory      22 176 184—185 191 235 245 255 273 383—384
Bandwidth of processor      176 185 191 235 272—273 383
Bandwidth, the number of bits per second that can be processed by a memory, arithmetic unit, input/output processor or communication system      116 163—164
Bank (of memory), a module of memory that can sustain a single access to one physical cell of memory per memory cycle      116
Barrier synchronization, a means for synchronizing a set of processors in a multiprocessor system by halting processors in that set at a specified barrier point in a program until every processor in the set reaches the barrier      321 336—338 379 381 387
Base address (of a page), the physical address of the start of a page      75—76
Batcher, K.E.      213 223
BBN Butterfly      225—226
Beetem, J.      269
Belady, L.      52 84
Bell, C.G.      20
Benes network, a switching network proposed by V. Benes that is capable of producing an arbitrary permutation of its inputs at its outputs      269
Benes, V.      269
Berkeley RISC      168 170—171
Bidiagonal system of equations, a linear system in which the only nonzero coefficients lie on the major diagonal and on one diagonal immediately below or above the major diagonal      207
Binary search, a search algorithm in which the region to be searched shrinks by half at each step      367
Binomial distribution, the probability distribution that describes independent tosses of a fair coin      62 64
Bitonic sequence, a sequence of numbers that is the concatenation of an ascending and a descending sequence, or is a cyclic shift of such a sequence      223 232
Bitonic sorter, a sorting network whose subnetworks sort bitonic subsequences into fully sorted subsequences      223—225 232
Block (of a cache)      31 see
Bolt, Beranek, and Newman      see "BBN"
Booth's algorithm, an efficient algorithm for integer multiplication      171
Booth, A.D.      171
Bottleneck      21 102 298 300—302 309 313—316 340—341 344 363 383
Branch prediction, the use of history, statistical methods, or heuristic rules to predict the outcome of conditional branches      152—153 155
Branch-and-bound search, a search technique in which the search eliminates large numbers of cases by determining that the solutions eliminated fall above a computed bound      369—373
Branch-history table, a hardware device that saves the recent history of conditional branches so that this information can be used for branch prediction      151 153—155
Breakeven point, the number of processors in a multiprocessor system whose combined throughput is equal to a single processor of the same power      296
Briggs, F.A.      326 358
Broadcast, a form of communication in which one transmitter sends one message simultaneously to many receivers      199—200 326 328
Budnik, P.P.      257
Buffer effects (in virtual memory), a phenomenon that causes a fraction of real memory to serve as a buffer for pages flowing to and from auxiliary memory      90—94
Buneman, O.      206
Burks, A.W.      103
Burroughs' B5700      172
Burroughs' B6700      172
Burroughs' Scientific Processor (BSP)      257—259 277
Bus (interconnection), an interconnection in which all transmitters and receivers are directly connected to a common set of interconnection lines that comprise the bus      299—303 309 323 331 384
Butterfly operation, the core operation of a Fast Fourier Transform that consists of forming the weighted sum and difference of two operands      312—314
Buzbee, B.L.      206
C.mmp multiprocessor      309
Cable density      226
Cache coherence, the state that exists when all caches within a multiprocessor have identical values for any shared variable that is simultaneously in two or more caches      324—383
Cache directory, the collection of tags in a cache that are used for associative access to cached data      32 67—68
Cache for data      116
Cache for instructions      116
Cache hit, a cache access that successfully finds in the cache the data requested      31 54—58
Cache miss, a cache access that fails to find in the cache the data requested      30 36—37 45—52 54—66 96 116
Cache, a small capacity, high-speed buffer memory      22 29—68 115 165—168 245 247 298—299 322—323
Cache, coherence      324—329 383
Cache, design of      94 98—99
Cache, miss ratio      80
Cache, replacement policy      43 52—58
Cache, set of data in      32—35
Cache, simulation of      41
Cache, structure of      29—39
Cache, tag (in directory)      31 35 41 47 57
Cache, techniques for analysis of      39—52
Cache, two-level      39
Cache, vector operands stored in      276—277
Cache, writing to      66—69
Cache-reload transient, the cache misses that occur when a program formerly in execution is restarted after other programs have used the cache      58—66
Carnegie — Mellon University      309
CDC 6600      103 110—112 114—116 161 163—164 172
CDC STAR      124 240 246 267 271
Central Limit Theorem, the theorem that states that the distribution of the sum of identical and independently distributed random variables asymptotically approaches a normal distribution      290
Chaining (of computations), the technique in which an output stream of vector results is directed to the input of another vector operation without being returned to intermediate storage between operations      125
Charlesworth, A.E.      262 264
Checkerboard ordering (for a mesh calculation), an ordering of operations in which an iterative calculation is performed first on the "Red" nodes and then on the "Black" nodes in the mesh      379—381
Chen, P.Y.      315
Chen, T.C.      117 122
Chickens      298
Chu, W.W.      87—88
Chunksize, the number of iterations to be grouped together as a single task in order to increase task granularity      341—345 375 379 386
CISC (Complex Instruction-Set Computer), a computer with an instruction set that includes complex (multicycle) instructions      168—171
Clark, D.W.      80
Coarse-grain parallelism, parallel execution in which the amount of computation per task is several times larger than the overhead and communication expended per task      283—297
Cocke John      168
Coffman, E.G., Jr.      41 86 90
Coherence (of cache)      see "Cache coherence"
Collision vector, a binary control-vector whose bits indicate when an operation can be initiated safely in a pipeline computer      132 134—138 141 174
Collision, an event in which two or more different operations require the use of the same pipeline stage at the same clock cycle      134
Column access, a concurrent memory access to all elements of a column of a matrix      251—255 274—275 331
Colwell, R.P.      171
Combining switch, a switching element of an interconnection network that has the ability to combine certain types of requests into one request, and to produce a response that mimics serial execution of the requests      310—312 318—324 339 384
Common data-bus, a hardware mechanism for transmitting results produced by a collection of arithmetic units to machine registers and reservation stations      162—163
Communication cost      283—299
Compare-and-Swap, an instruction that is used for processor synchronization      348 355—362 378 387
Compatibility      19
compiler optimization      374—382
Complex instruction-set computer      see "CISC"
Computer architecture and technology      2
Computer architecture, cost of      8—9
Computer architecture, evaluation of      8—9 19
Computer architecture, performance of      8—9
Computer architecture, special purpose vs. all purpose      13
Computer architecture, textbooks      20
Computer architecture, the study of computer structures, their applications, and their performance      1—2 13
Computer vision      12
Concert multiprocessor      346
Conditional branch in a pipeline      150—155
Conditional branch, a computer instruction that alters the sequence of execution if a condition is true, and otherwise falls through to the next instruction in sequence      107 113 176 230—231
Confidence interval, an interval based on statistical sampling that shows where a population of random variables lies to within a specified level of confidence      49
Conflict in a network      312—317
Conflict in a pipeline      113—115 142—148 156
Conflict, a situation in which two or more operations require the same resource, forcing one operation to wait for the other to complete      113—115 142—148 156 312—317 see "READ/WRITE "WRITE/READ "WRITE/WRITE
Connection Machine      279 284 324
Contention, interference among tasks caused by tasks competing for shared resources, thereby forcing one or more tasks to become idle momentarily while waiting for resources to become available      306—307 311—315 342 384
Context switch, the process of saving the state of one task and restoring the state of a second task to enable a computer system to change execution from one program to another      81
Continuum model, a model of physical systems in which continuous quantities are modeled at discrete points and physical interactions are modeled as interactions among neighboring mesh points      180—184 186—209 227—229 233 274
Cooley, J.W.      228
Coonen, J.T.      172
Cosmic Cube      194—195 202 210 228—229 231 324
Cost      4—11
Cost of development      5
Cost per-unit      5
Cost-performance ratio      10—11 15—16 235 296
Cray I      125 244—247 259 271 282 284

Cray II      125—126 261
Cray XMP      282 284
Critical section, a section of a program that can be executed by at most one process at a time      308 311 318 320 347—365
Crossbar (interconnection), an interconnection in which each input is connected to each output through a path that contains a single switching node      305—310 313 323 331
Crosspoint, a switching node in a crossbar that connects a single input to a single output      306
Crowther, W.      225
Cvetanovic, Z.      312 314 316 344
Cycle (in reduced state-diagram), a path in a reduced state-diagram that specifies a steady-state schedule for introducing operations to a pipeline      140—141 174
Cycle (of computer clock), an electronic signal that counts a single unit of time within a computer      16 151 236—239 254 343—344
Cycle time, effective      38
Cycle time, the length of a single cycle of a computer function such as a memory cycle or processor cycle      24 29 31
Cyclic reduction, an algorithm used to solve linear systems that have a particular structure      208—209 229—230
Cytron, R.G.      375 377 382
Data cache, a cache that holds data, but does not hold instructions      116
Data flow (analysis of requirements), the sequence of processes and data transmissions that are performed on a collection of data during a computation      195—198
Database system      12
Davidson, E.S.      127 135 140
de Kooning, W.      3—4
Dead line, a line of a cache that will be discarded from cache before it will be the target of a cache access      55
Deadlock, the state in which two or more processes are deferred indefinitely because each process is awaiting another process to make progress, and no process is able to make progress      308 387
DEC PDP-11      19 309
DEC VAX      19 77 80—81 83—84 168 171
Decode-history table, a small cache-like memory that saves the recent history of decoding information for conditional-branch instructions so that this information can be used by a branch prediction mechanism      154—155
Decrement (for synchronization)      352—355 387
Delay (in pipeline), a logic device used to store and synchronize data in a pipeline      142—148 242—244
Delayed branch, a branch instruction that defers altering the flow of control until one more instructions that follow it have completed execution      151—152
Denneau, M.      269
Denning, P.J.      41 85—86 90
Dependency analysis, an analysis that reveals which portions of a program depend on the prior completion of other portions of the program      375—377 385—386
DEQUEUE, a high-level function that removes an item from a queue      351—352 355 358 360 363—365 387
Development cost      5
Dias, D.M.      315
Digital Equipment Corporation      see "DEC"
Dijkstra, E.M.      347 370 374 387
Direct mapping, a cache that has a set associativity of one so that each item has a unique place in the cache at which it can be stored      34
Directory (of a cache), the portion of a cache that holds the access keys that support associative access      32 56 see tag"
Disk buffer, a high-speed buffer memory resident within a disk controller that is used as a private cache for the disk system      92—94 97—98
Disk cache      see "Disk buffer"
Disk memory      90—94 see
Division      174
do par, a program statement that permits the iterations of a loop to be executed in parallel      335—336 345 386
do seq, a program statement that forces the iterations of a loop to be executed sequentially      335—336 345
Dubois, M.      326
ECL (emitter-coupled logic)      15
Efficiency of array computer      118—120
Efficiency of multiprocessor computer      280—299 309 338—341
Efficiency of pipeline computer      122—123
Emer, J.S.      80
ENQUEUE, a high-level function that adds an item to a queue      351—352 355 357 360—361 363—365 387
Exchange      see "Pair-wise exchange" "Shuffle-exchange"
Exclusive access, a state in which some single process is granted the right to read, modify, and write a shared datum, and no other processor can access the datum while the first program has exclusive access to the shared datum      308 311 318 320 347—365 see
Execute stage, the stage in a pipelined processor at which an instruction is executed      112 157 161—162
EXPONENT      127—130
Fan-in, the number of-logic signals that directly drive a given logic gate      198
Fan-out, the number of logic gates driven by a specific logic gate      198
Feedback path, a path from the output of a functional unit to an input of the same unit      129
Fetch-and-Add, a computer instruction that updates a memory operand, returns the value of the operand before the update, and if executed concurrently by several processors simultaneously, produces a set of results as if the processors executed in some serial order      318—322 348 362—365 378 386—388
FFT (fast Fourier transform)      see "Fourier transform"
Fine-grain parallelism, a form of parallel execution in which the amount of work per task is small compared to the amount of work per task required for communication and overhead      283—297
Finite-element method, a numerical technique in which physical systems are analyzed mathematically by modeling the system at the nodes of a mesh of data points      12 267
Floating-point arithmetic      127—137 169 177—178
Floating-point arithmetic, addition      129—134
Floating-point arithmetic, multiplication      127—129 132—134
Fluid flow      12
Flynn, M.J.      172 279—280
Footprint size, the number of lines of a process footprint held in a cache      60
Footprint, the distinct lines of a process held in an infinite cache that are touched during the execution of the process      60—66 85 99
Forbidden cell, a cell of a reservation table for one operation that cannot be used by another operation because of a timing conflict      142—148
Ford, L.R., Jr.      370
Forsythe, G.      173 252
FORTRAN      180 375
Forwarding register, a register that is temporarily assigned the role of a different register      157 see
Fourier transform      196—197 213 228—229 311—316 368
FPS-164      262—266 268—269 271
Free pool, a collection of registers available for use as forwarding registers      158—160
Freeable, the state of a forwarding register after its contents have been used and the register can be returned to the free pool      158—160
Fujitsu Corporation      168 262 375
Fulkerson, D.R.      370
Full-information function, a multi-output function each of whose outputs depends on every input      196—197
Fully associative, a cache structure in which every tag in the cache is compared to the tag of the datum being accessed      34
Gauss — Seidel iteration, an iterative scheme for solving linear equations in which each interior point is updated with two neighboring values from the present iteration and two neighboring values from the prior iteration      379—381
Gaussian elimination, a method for solving linear systems of equations      249—251 255 275 331—332
GCD (greatest common divisor)      256
GF-11      see "IBM GF-11"
Gflops (Gigaflops), a computation rate of one billion floating-point operations per second      269
Gigaflops      see "Gflops"
Global memory, a memory directly accessible by every processor in a multiprocessor      299 342—343 see
Golub, G.H.      206
Goodman, J.      326
Gottlieb, A.      226 318 364
Granularity, a measure of the size of an individual task to be executed on a parallel processor      283—297 299—300 338—341
Gravitation      182
greatest common divisor (GCD)      256
Greedy strategy, a strategy that initiates a new pipeline operation at the earliest opportunity      138—140
Grosch's Law, an empirical rule that says that the cost of computer systems increases as the square root of the computational power of the systems      14
Gupta, S.C.      41
Gustafson, J.L.      262 264
Halstead, R.      346
Hash lookup, a search technique in which the search key is transformed to an address at which the search begins      268
Hayes, J.P.      20
Heller, D.E.      209 249
Hierarchy (of memory system), a multilevel memory structure in which successive levels are progressively larger, slower, and less costly      22 26 100—101
High-speed buffer memory, a memory that holds data en route between a large main memory and the registers of a high-speed processor      259 see
Hill, M.      170
Hillis, W.D.      279 324
Hit      see "Cache hit"
Hit ratio, the ratio of the number of cache hits to the total number of cache accesses      38
Hitachi Corporation      39 262
Hoevel, L.W.      58
Hopcroft, J.E.      369
Hoshino, T.      178 180
Hot-spot contention, an interference phenomenon observed in multiprocessors due to memory access statistics being slightly skewed from a uniform distribution to favor a specific memory module      316—317 320 384
Hwang, K.      172 357
Hypercube, a parallel processor whose interconnection structure treats individual processors as the nodes of a multidimensional cube and interconnects two processors if the corresponding nodes of the cube are neighbors      231 323—324 see
IBM 3090      155 282 375
IBM 801      168
IBM Corporation      168 262
IBM GF-11      268—271 279 284
IBM RP3      226 311 322—324 339 344
IBM STRETCH      103
IBM System 360-370      19—20 168 357
IBM System 360/91      161—164
IEEE 802.5 Token-Ring Standard      304
IEEE Standard for Floating-Point Arithmetic      172
ILLIAC IV      123—124 179 189—195 199—200 269 271 275
Image processing, a computation performed on a digitized representation of an image whose purpose is to enhance the image or to extract information about the image      12
Increment (for synchronization)      352—355 387
Indurkhya, B.      283 290—291
Inferencing system, a programming system that produces results by following a logical chain of inferences      12
Initialization (of cache simulation)      41—42
Inner product, the sum of the component = by = component products of the elements of two vectors      149 263
Input/output overlap, the act of performing input/output processing concurrently with other processing      103
Input/output processor, a processor whose function is specialized to input/output processing      67—69 168
Instruction buffer, a small high-speed memory that holds instructions recently executed or about to be executed      245
Instruction cache, a cache memory dedicated to the storage of instructions      116
Instruction decode, the machine cycle during which an instruction is examined and the control signals required for the execution of the instruction are produced      105 156—157
Instruction fetch, the machine cycle dedicated to the access and retrieval of the next instruction to execute      105 116 156—157 169
Instruction set, the repertoire of instructions executable by a computer      2 see RISC"
INTEL 8086      103 194
Intel 8087      194
Intel 808X      19
Interconnection network, the system of logic and conductors that connects together the processors in a parallel computer system      199 210—226 280 see "Crossbar" "Hypercube" "Near-neighbor "Perfect "Ring" "Shuffle-exchange"
Interlock, a control device or signal that defers the execution of one function until a conflicting function has completed execution      114 156 161 263—264
Interlock, elimination of      148—150
Intermediate memory      245—248 see
Internal forwarding, an execution technique in which special registers are temporarily assigned the function of physical machine registers to hold operands while awaiting execution in order to reduce conflicts for machine resources that otherwise would occur      155—164 173
Interprocessor communication, the data and control information that passes among the processors of a parallel computer during the execution of a parallel program      281
Interrupt, a temporary suspension of the normal sequence of program execution to perform a function that has been initiated by an external event or by an internal trap or monitor function      59 357 360
Interrupt-driven, a program function that is initiated by an interrupt caused by an external event      59 64
Invalidate, the process that removes a cache entry by changing its directory entry into an empty entry      68

1 2 3

О проекте