High-Performance Embedded Computing: Architectures, Applications, and Methodologies. Wayne Wolf. Morgan Kaufmann, September 2006. ISBN 012369485X.



About the Author

Wayne Wolf is a professor of electrical engineering and associated faculty in computer science at Princeton University. Before joining Princeton, he was with AT&T Bell Laboratories in Murray Hill, New Jersey. He received his B.S., M.S., and Ph.D. in electrical engineering from Stanford University. He is well known for his research in the areas of hardware/software co-design, embedded computing, VLSI, and multimedia computing systems. He is a fellow of the IEEE and ACM and a member of the SPIE. He won the ASEE Frederick E. Terman Award in 2003. He was program chair of the First International Workshop on Hardware/Software Co-Design. Wayne was also program chair of the 1996 IEEE International Conference on Computer Design, the 2002 IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, and the 2005 ACM EMSOFT Conference. He was on the first executive committee of the ACM Special Interest Group on Embedded Computing (SIGBED). He is the founding editor-in-chief of ACM Transactions on Embedded Computing Systems. He was editor-in-chief of IEEE Transactions on VLSI Systems (1999-2000) and was founding co-editor of the Kluwer journal Design Automation for Embedded Systems. He is also series editor of the Morgan Kaufmann Series in Systems on Silicon.

Preface

This book's goal is to provide a frame of reference for the burgeoning field of high-performance embedded computing. Computers have moved well beyond the early days of 8-bit microcontrollers. Today, embedded computers are organized into multiprocessors that can run millions of lines of code. They do so in real time and at very low power levels. To properly design such systems, a large and growing body of research has developed to answer questions about the characteristics of embedded hardware and software. These are real systems—aircraft, cell phones, and digital television—that all rely on high-performance embedded systems. We understand quite a bit about how to design such systems, but
we also have a great deal more to learn. Real-time control was actually one of the first uses of computers—Chapter 1 mentions the MIT Whirlwind computer, which was developed during the 1950s for weapons control. But the microprocessor moved embedded computing to the front burner as an application area for computers. Although sophisticated embedded systems were in use by 1980, embedded computing as an academic field did not emerge until the 1990s. Even today, many traditional computer science and engineering disciplines study embedded computing topics without being fully aware of related work being done in other disciplines. Embedded computers are very widely used, with billions sold every year. A huge number of practitioners design embedded systems, and at least a half million programmers work on designs for embedded software. Although embedded systems vary widely in their details, there are common principles that apply to the field of embedded computing. Some principles were discovered decades ago while others are just being developed today. The development of embedded computing as a research field has helped to move embedded system design from a craft to a discipline, a move that is entirely appropriate given the important, sometimes safety-critical, tasks entrusted to embedded computers. One reasonable question to ask about this field is how it differs from traditional computer systems topics, such as client-server systems or scientific computing. Are we just applying the same principles to smaller systems, or do we need to do something new?
I believe that embedded computing, though it makes use of many techniques from computer science and engineering, poses some unique challenges. First, most if not all embedded systems must perform tasks in real time. This requires a major shift in thinking for both software and hardware designers. Second, embedded computing puts a great deal of emphasis on power and energy consumption. While power is important in all aspects of computer systems, embedded applications tend to be closer to the edge of the energy-operation envelope than many general-purpose systems. All this leads to embedded systems being more heavily engineered to meet a particular set of requirements than those systems that are designed for general use. This book assumes that you, the reader, are familiar with the basics of embedded hardware and software, such as might be found in Computers as Components. This book builds on those foundations to study a range of advanced topics. In selecting topics to cover, I tried to identify topics and results that are unique to embedded computing. I did include some background material from other disciplines to help set the stage for a discussion of embedded systems problems. Here is a brief tour through the book:

• Chapter 1 provides some important background for the rest of the chapters. It tries to define the set of topics that are at the center of embedded computing. It looks at methodologies and design goals. We survey models of computation, which serve as a frame of reference for the characteristics of applications. The chapter also surveys several important applications that rely on embedded computing to provide background for some terminology that is used throughout the book.

• Chapter 2 looks at several different styles of processors that are used in embedded systems. We consider techniques for tuning the performance of a processor, such as voltage scaling, and the role of the processor memory hierarchy in embedded CPUs. We look at techniques used to optimize embedded CPUs,
such as code compression and bus encoding, and techniques for simulating processors.

• Chapter 3 studies programs. The back end of the compilation process, which helps determine the quality of the code, is the first topic. We spend a great deal of time on memory system optimizations, since memory behavior is a prime determinant of both performance and energy consumption. We consider performance analysis, including both simulation and worst-case execution time analysis. We also discuss how models of computing are reflected in programming models and languages.

• Chapter 4 moves up to multiple-process systems. We study and compare scheduling algorithms, including the interaction between language design and scheduling mechanisms. We evaluate operating system architectures and the overhead incurred by the operating system. We also consider methods for verifying the behavior of multiple-process systems.

• Chapter 5 concentrates on multiprocessor architectures. We consider both tightly coupled multiprocessors and the physically distributed systems used in vehicles. We describe architectures and their components: processors, memory, and networks. We also look at methodologies for multiprocessor design.

• Chapter 6 looks at software for multiprocessors and considers scheduling algorithms for them. We also study middleware architectures for dynamic resource allocation in multiprocessors.

• Chapter 7 concentrates on hardware and software co-design. We study different models that have been used to characterize embedded applications and target architectures. We cover a wide range of algorithms for co-synthesis and compare the models and assumptions used by these algorithms.

Hopefully this book covers at least most of the topics of interest to a practitioner and student of advanced embedded computing systems. There were some topics for which I could find surprisingly little work in the literature: software testing for embedded systems is a prime example. I tried to find representative articles
about the major approaches to each problem. I am sure that I have failed in many cases to adequately represent a particular problem, for which I apologize. This book is about embedded computing; it touches on, but does not exhaustively cover, several related fields:

• Applications—Embedded systems are designed to support applications such as multimedia, communications, and so on. Chapter 1 introduces some basic concepts about a few applications, because knowing something about the application domain is important. An in-depth look at these fields is best left to others.

• VLSI—Although systems-on-chips are an important medium for embedded systems, they are not the only medium. Automobiles, airplanes, and many other important systems are controlled by distributed embedded networks.

• Hybrid systems—The field of hybrid systems studies the interactions between continuous and discrete systems. This is an important and interesting area, and many embedded systems can make use of hybrid system techniques, but hybrid systems deserve their own book.

• Software engineering—Software design is a rich field that provides critical foundations, but it leaves many questions specific to embedded computing unanswered.

I would like to thank a number of people who have helped me with this book: Brian Butler (Qualcomm), Robert P. Adler (Intel), Alain Darte (CNRS), Babak Falsafi (CMU), Ran Ginosar (Technion), John Glossner (Sandbridge), Graham Hellestrand (VaST Systems), Paolo Ienne (EPFL), Masaharu Imai (Osaka University), Irwin Jacobs (Qualcomm), Axel Jantsch (KTH), Ahmed Jerraya (TIMA), Lizy Kurian John (UT Austin), Christoph Kirsch (University of Salzburg), Phil Koopman (CMU), Haris Lekatsas (NEC), Pierre Paulin (ST Microelectronics), Laura Pozzi (University of Lugano), Chris Rowen (Tensilica), Rob Rutenbar (CMU), Deepu Talla (TI), Jiang Xu (Sandbridge), and Shengqi Yang (Princeton). I greatly appreciate the support, guidance, and encouragement given by my editor Nate McFadden, as well as
the reviewers he worked with. The review process has helped identify the proper role of this book, and Nate provided a steady stream of insightful thoughts and comments. I'd also like to thank my longstanding editor at Morgan Kaufmann, Denise Penrose, who shepherded this book from the beginning. I'd also like to express my appreciation to digital libraries, particularly those of the IEEE and ACM. I am not sure that this book would have been possible without them. If I had to find all the papers that I have studied in a bricks-and-mortar library, I would have rubbery legs from walking through the stacks, tired eyes, and thousands of paper cuts. With the help of digital libraries, I only have the tired eyes. And for the patience of Nancy and Alec, my love.

Wayne Wolf
Princeton, New Jersey

Chapter 1 Embedded Computing

• Fundamental problems in embedded computing
• Applications that make use of embedded computing
• Design methodologies and system modeling for embedded systems
• Models of computation
• Reliability and security
• Consumer electronics

1.1 The Landscape of High-Performance Embedded Computing

The overarching theme of this book is that many embedded computing systems are high-performance computing systems that must be carefully designed so that they meet stringent requirements. Not only do they require lots of computation, but they also must meet quantifiable goals: real-time performance, not just average performance; power/energy consumption; and cost. The fact that these goals are quantifiable makes the design of embedded computing systems a very different experience than the design of general-purpose computing systems, whose users are unpredictable. When trying to design computer systems to meet various sorts of quantifiable goals, we quickly come to the conclusion that no one system is best for all applications. Different requirements lead to making different trade-offs between performance and power, hardware and software, and so on. We must create different implementations
to meet the needs of a family of applications. Solutions should be programmable enough to make the design flexible and long-lived, but need not provide unnecessary flexibility that would detract from meeting system requirements. General-purpose computing systems separate the design of hardware and software, but in embedded computing systems we can simultaneously design the hardware and software. Often, a problem can be solved by hardware means, software means, or a combination of the two. Various solutions can have different trade-offs; the larger design space afforded by joint hardware/software design allows us to find better solutions to design problems. As illustrated in Figure 1-1, the study of embedded system design properly takes into account three aspects of the field: architectures, applications, and methodologies. Compared to the design of general-purpose computers, embedded computer designers rely much more heavily on both methodologies and basic knowledge of applications. Let us consider these aspects one at a time. Because embedded system designers work with both hardware and software, they must study architectures broadly speaking, including hardware, software, and the relationships between the two. Hardware architecture problems can range from special-purpose hardware units as created by hardware/software co-design, microarchitectures for processors, multiprocessors, or networks of distributed processors. Software architectures determine how we can take advantage of parallelism and nondeterminism to improve performance and lower cost. Understanding your application is key to getting the most out of an embedded computing system. We can use the characteristics of the application to optimize the design. This can be an advantage that enables us to perform many powerful optimizations that would not be possible in a general-purpose system. But it also means that
we must have enough understanding of the application to take advantage of its characteristics and avoid creating problems for system implementers. Methodologies play an especially important role in embedded computing. Not only must we design many different types of embedded systems, but we must also do so reliably and predictably. The cost of the design process itself is often a significant component of the total system cost. Methodologies, which may combine tools and manual steps, codify our knowledge of how to design systems. Methodologies help us make large and small design decisions.

[Figure 1-1: Aspects of embedded system design. Methodologies: modeling, analysis and simulation (performance, power, cost), synthesis, verification. Hardware architectures: CPUs, co-design, multiprocessors, networks. Software architectures: processes, scheduling, allocation. Applications: characteristics, specifications, reference designs.]

The designers of general-purpose computers stick to a more narrowly defined hardware design methodology that uses standard benchmarks as inputs to tracing and simulation. The changes to the processor are generally made by hand and may be the result of invention. Embedded computing system designers need more complex methodologies because their system design encompasses both hardware and software. The varying characteristics of embedded systems—system-on-chip for communications, automotive network, and so on—also push designers to tweak methodologies for their own purposes. Steps in a methodology may be implemented as tools. Analysis and simulation tools are widely used to evaluate cost, performance, and power consumption. Synthesis tools create optimized implementations based on specifications. Tools are particularly important in embedded computer design for two reasons. First, we are designing an application-specific system, and we can use tools to help us understand the characteristics of the application. Second, we are often pressed for time when designing an embedded system,
and tools help us work faster and produce more predictable results. The design of embedded computing systems increasingly relies on a hierarchy of models. Models have been used for many years in computer science to provide abstractions. Abstractions for performance, energy consumption, and functionality are important. Because embedded computing systems have complex functionality built on top of sophisticated platforms, designers must use a series of models to have some chance of successfully completing their system design. Early stages of the design process need reasonably accurate simple models; later design stages need more sophisticated and accurate models. Embedded computing makes use of several related disciplines; the two core ones are real-time computing and hardware/software co-design. The study of real-time systems predates the emergence of embedded computing as a discipline. Real-time systems take a software-oriented view of how to design computers that complete computations in a timely fashion. The scheduling techniques developed by the real-time systems community stand at the core of the body of techniques used to design embedded systems. Hardware/software co-design emerged as a field at the dawn of the modern era of embedded computing. Co-design takes a holistic view of the hardware and software used to perform deadline-oriented computations.

Figure 1-2 shows highlights in the development of embedded computing.* We can see that computers were embedded early in the history of computing: one of the earliest computers, the MIT Whirlwind, was designed for artillery control. As computer science and engineering solidified into a field, early research established basic techniques for real-time computing. Some techniques used today in embedded computing were developed specifically for the problems of embedded systems while others, such as those in the following list, were adapted from general-purpose computing techniques.

[Figure 1-2: Highlights in the history of embedded computing, 1950-2005. Applications: fly-by-wire (1950s-1960s), cell phones (1973), automotive engine control (1980), flash MP3 player (1997), CD/MP3 (late 1990s), portable video player (early 2000s). Techniques: rate-monotonic analysis (1973), RTOS (1980), data flow languages (1987), Statecharts (1987), synchronous languages (1991), HW/SW co-design (1992), ACPI (1996). Central processing units: Whirlwind (1951), Intel 4004 (1971), Intel 8080 (1974), Motorola 68000 (1979), AT&T DSP-16 (1980), MIPS (1981), ARM (1983), PowerPC (1991), Trimedia (mid-1990s).]

* Many of the dates in this figure were found in Wikipedia; others are from http://www.motofuture.motorola.com and http://www.mvista.com.

• Low-power design began as primarily hardware-oriented but now encompasses both software and hardware techniques.

• Programming languages and compilers have provided tools, such as Java and highly optimized code generators, for embedded system designers.

• Operating systems provide not only schedulers but also file systems and other facilities that are now commonplace in high-performance embedded systems.

• Networks are used to create distributed real-time control systems for vehicles and many other applications, as well as to create Internet-enabled appliances.

• Security and reliability are an increasingly important aspect of embedded system design. VLSI components are becoming less reliable at extremely fine geometries while reliability requirements become more stringent. Security threats once restricted to general-purpose systems now loom over embedded systems as well.

1.2 Example Applications

Some knowledge of the applications that will run on an embedded system is of great help to system designers. This section looks at several basic concepts in three common applications: communications/networking, multimedia, and vehicles.

1.2.1 Radio and Networking

Modern
communications systems combine wireless and networking. As illustrated in Figure 1-3, radios carry digital information and are used to connect to networks. Those networks may be specialized, as in traditional cell phones, but increasingly radios are used as the physical layer in Internet protocol systems. The Open Systems Interconnection (OSI) model [Sta97a] of the International Standards Organization (ISO) defines the following model for network services.

• Physical layer—The electrical and physical connection.
• Data link layer—Access and error control across a single link.
• Network layer—Basic end-to-end service.
• Transport layer—Connection-oriented services.
• Session layer—Control activities such as checkpointing.
• Presentation layer—Data exchange formats.
• Application layer—The interface between the application and the network.

Although it may seem that embedded systems are too simple to require use of the OSI model, it is in fact quite useful. Even relatively simple embedded networks provide physical, data link, and network services.
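The layered service model above can be made concrete with a small encapsulation sketch. This is a toy illustration, not any real protocol stack: the bracketed headers and the `send`/`receive` helpers are invented for this example. Each layer wraps the data it receives from the layer above on the way down, and the receiver strips headers in the reverse order on the way up:

```python
# The seven OSI layers, from the application down to the wire.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]

def send(payload: str) -> str:
    """Encapsulate: each layer adds its own header around the data from above."""
    frame = payload
    for layer in LAYERS:             # application first, physical last...
        frame = f"[{layer}]{frame}"  # ...so the physical header ends up outermost
    return frame

def receive(frame: str) -> str:
    """Decapsulate: peel headers off in the opposite order, physical first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"malformed frame: expected {layer} header")
        frame = frame[len(header):]
    return frame

frame = send("hello")
print(frame)           # [physical][data link][network]...[application]hello
print(receive(frame))  # hello
```

As noted above, a simple embedded network might implement only the bottom three services; the same pattern applies with `LAYERS` truncated to `["network", "data link", "physical"]`.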
communication, 43-44 defined, 35 nondeterministic, 36-37 streams, 35-36 verification and, 36 Finite versus infinite state, 34-38 Firing rules, 214 First-come-first-served (FCFS) scheduling, 389-90 First-reaches table, 203 Flash-based memory, 256-57 NAND, 256 NOR, 256 virtual mapping, 257 wear leveling, 257 FlexRay, 316-24 active stars, 318-19 arbitration grid, 317 block diagram, 317 bus guardians, 316, 323 clock synchronization procedure, 324 communication cycle, 318 controller host interface, 323 defined, 316 dynamic segments, 321, 322 frame encoding, 320 frame fields, 320-21 frame format, 321 levels of abstraction, 318 macrotick, 316-17 microtick, 316 network stack, 318 physical layer, 319-20 static segments, 321-22 system startup, 322-23 timekeeping, 323 timing, 316-17 FlexWare compiler, 163, 164, 170 Floorplanning, 426 Flow control, interconnection networks, 296 Flow facts, 193, 194 508 Index Force-directed scheduling, 390-92 calculation, 391 defined, 390 distribution graph, 390, 391 forces, 391 predecessor/successor forces, 392 self forces, 392 See also Multiprocessor scheduling Forward channel, FTL-LITE, 258 General-purpose computing systems, 362 Genetic algorithms, 418 Giotto, 245-47 actuator ports, 245 defined, 245 execution cycle, 246-47 mode switch, 246 program implementation, 247 sensor ports, 245 switch frequency, 246 target mode, 246 Global Criticality/Local Phase (GCLP) algorithm, 408 Global optimizations, 174-76 Global slack, 406 Global time service, 364 GMRS, 375 GOPS, 417 Gordian knot, 407 Guards, 398 H.26x standard, 13 H.264 standard, 13, 302-4 Halting problem, 38 Hardware abstractions, 330-32 architectures, design methodologies, 25-26 event handlers, 20 resources, tabular representation, 396 Hardware abstraction layer (HAL), 367 Hardware radio, Hardware/software co-design, 26, 27, 32, 383-430 algorithms, 396-428 AnnapoUs Micro Systems WILDSTARII Pro, 386-87 ARM Integrator logic module, 386 custom integrated circuit, 385 custom-printed circuit 
board, 384 high-level synthesis, 387-92 memory management, 425 memory systems, 422-25 as methodology, 384 multi-objective optimization, 416-21 PC-based system, 384 performance analysis, 387-96 platforms, 384-87 Xiilnx Virtex-4 FX platform FPGA family, 385-86 Hardware/software co-simulation, 428-29 backplanes, 428 co-simulators, 428-29 with simulation backplane, 429 Hardware/software partitioning, 400 Hazard function, 49 Heterogeneously programmed systems, 34 Heterogeneous memory systems, 307-9 Heterogeneous multiprocessors, 274 in co-design, 385 problems, 329, 338 See also Multiprocessors Hierarchical co-synthesis, 414 High-level services, 58-60 High-level synthesis, 387-92 behavioral description, 387 control step, 389 cost reduction using, 406-7 critical-path scheduling, 390 Index for estimates, 411 FCFS scheduling, 389-90 force-directed scheduling, 390-92 goals, 387 list scheduling, 390 path-based scheduling, 392 register-transfer implementation, 388 Hot swapping, 234 HP DesignJet printer, 308 Huffman coding, 11, 12 defined, 101 illustrated, 101 Hunter/Ready OS, 247 IBM CodePack, 100 IBM Coral tool, 327 Ideal parallelism, 171 Index rewriting, 172 Instruction caching, 193-95 Instruction formation, 145 Instruction issue width, 68 Instruction-level paralleUsm, 45, 84 Instructions in combination, 147 large, 148 models for, 157-60 performance from, 187 scheduUng, 68 template generation, 145 template size versus utilization, 147 Instruction scheduling, 131, 163-66 Araujo and Malik algorithm, 166 constraint modehng, 164-65 defined, 157 Instruction selection, 157-60 defined, 156 as template matching, 158 See also Code generation Instruction set design space, 143 metrics, 144 search algorithms, 145 509 style, 68 synthesis, 143-50 Integer linear programming (ILP), 188 abstract interpretation and, 196 path analysis by, 188 Intel Flash File System, 258-59 mote, 18-19 XScale, 88 Interactive data language (IDL), 363 Interconnection networks, 289-304 application-specific, 
295-96 area, 289 buses, 292-93 Clos, 294 crossbars, 293-94 energy consumption, 289 flow control, 296 H.264 decoder, 302-4 latency, 289 link characteristics, 290 mesh, 294-95 metrics, 289 models, 290-92 NetChip, 302 NoCs, 296-304 Nostrum, 297-98 Poisson model, 292 QNoC, 301-2 routing, 296 Sonics SiliconBackplane III, 304 SPIN, 298 throughput, 289 TIMcBSP,291 topologies, 289, 292-96 xpipes, 302 See also Multiprocessors Inter-event stream context, 356 Interface co-synthesis, 422 Intermodule connection graph, 343-44 Internet, Internet Protocol (IP), Internetworking standard, 510 Index Interprocess communication (IPC) mechanisms, 254 Interprocessor communication modeling graph, 344, 345 Interrupt-oriented languages, 199-200 methodologies, 200 NDL, 199 video drivers, 199 Interrupt service routine (ISR), 250 Interrupt service thread (1ST), 250 Interval scheduling, 228 Intra-event stream context, 358 Iterative improvement schedulers, 224,409-10 Java, 211-14 bytecodes, 211-12 JIT, 212-13 memory management, 213-14 mnemonics, 212 Java Virtual Machine (JVM), 211 JETTY, 311 Jini, 58 Jitter event model, 353-54 Joint Tactical Radio System (JTRS), 10 Joumaling, 258 Joumaling Flash File System (JFFS), 258 JPEG standard, 11 DCT and, 12 JPEG 2000, 12 Just-in-time (JIT) compilers, 212 Kahn processes, 40-41 defined, 40 illustrated, 41 network, 41, 214 Latency service, 364 L-block, 189 Least common multiple (LCM), 348 Least-laxity first (LLF) scheduUng, 232 Lempel-Ziv coding, 116, 117 Lempel-Ziv-Welch (LZW) algorithm, 116 Lifetime cost, 22 Limited-precision arithmetic, 149-50 Linear-time temporal logic, 216, 259 Link access control (LAC), 10 LISA system, 140-42 hardware generation, 141-42 sample modeling code, 141 See also Configurable processors List scheduling, 227-28, 390 Load balancing, 360-61 algorithm, 361 defined, 360 Load threshold estimator, 358 Local slack, 406 Logical link control and adaptation protocol (L2CAP), 54, 55 Log-structured file systems, 257-58 Loop conflict 
factor, 181 fusion, 174 iterations, 191-92 nests, 171 padding, 172 permutation, 172, 173 splitting, 172 tiling, 172 unrolling, 172 Loop-carried dependencies, 171 Looped containers, 375 Loop transformations, 171-74 buffering and, 177 matrices and, 173 non-unimodular, 173-74 types of, 172 Low-density parity check (LDPC), Low-power bus encoding, 117-22 Low-power design, Lucent Daytona multiprocessor, 283-84 LUSTRE, 203-4 Index LYCOS, 404 BSBs, 406 partitioning, 406 profiling tools, 404 Macroblocks, 13 Macrotick, 316-17, 323 Mailboxes, 254 Main memory-oriented optimizations, 182-85 Markov models, 109-10 arithmetic coding and, 108-9 for conditional character probabilities, 110 defined, 108 of instructions, 110 uses, 109 Maximum distance function, 354 Maximum mutator utilization (MMU), 213 Mealy machine, 35 Mean time to failure (MTTF), 48 MediaBench suite, 84-85, 128 Medium access control (MAC), 10 Memory area model, 91 arrays, 94-95 banked, 182, 184-85 block structure, 90 bottleneck, 170 caches, 95-98 cells, 90 component models, 89-95 consistency, 311 delay model, 91 energy model, 91-92 flash-based, 256-57 hierarchy, 89-99 layout transformations, 174 long-term, 403 multiport, 92 MXT system, 116 non-RAM components, 175 paged, 182 511 register files, 95, 96 scratch pad, 98-99, 180-82 short-term, 403 system optimizations, 170 Memory management embedded operating systems, 248^9 hardware/software co-design, 425 Windows CE, 348-49 Memory-oriented optimizations, 170-85 buffer, data transfer, and storage management, 176-78 cache-/scratch pad, 178-82 global, 174-76 loop transformations, 171-74 main, 182-85 Memory systems, 304-12 average access rate, 305 consistent parallel, 309-12 hardware/software co-design, 422-25 heterogeneous, 307-9 models, 306-7 multiple-bank, 304-5 parallel, 304-5 peak access rate, 305 power consumption, 308 power/energy, 308-9 real-time performance, 307-8 See also Multiprocessors Mentor Graphics Seamless system, 429 Mesh networks, 294-95 MESH simulator, 
378 Message nodes, Metagenerators, 217 Metamodels, 216-17 Methodologies, 2-3 design, 3, 22-33 interrupt-oriented languages, 200 multiprocessor design, 276-77 standards-based design, 28-30 step implementation as tools, 512 Index Metropolis, 215 Microarchitecture-modeling simulators, 131-32 Microtick, 316 Middleware, 361-75 defined, 362 group protocol, 55 MPI, 366 for multiparadigm scheduling, 372 resource allocation, 362 SoC, 366-70 Minimum distance function, 354 Mobile supercomputing, 272 Modal processes, 422 Model-integrated computing, 216 Models of computation, 33^6 control flow versus data flow, 34, 38-41 defined, 33 finite versus infinite state, 34-38 reasons for studying, 33-34 sequential versus parallelism, 34, 41-46 MOGAC, 419-20 components, 419 constraints, 420 genetic algorithm, 419-20 genetic model, 419 optimization procedure, 420 Moore machine, 35 Motion compensation, 13 estimation, 13,14 vectors, 14 MP3 standard, 15 MPEG standards, 13, 14, 30 MPI (Multiprocessor Interface), 366 Multiplex, 368-69 Multihop routing, 18 Multimedia algorithms, 85 applications, 11-15 Multi-objective optimization, 416-21 Multiparadigm scheduling, 371 Multiple-instruction, multiple-data (MIMD), 68 Multiple-instruction, single-data (MISD), 68 Multiplexers, 389 Multiport memory, 92 Multiprocessing accelerators and, 274 real time and, 274 uniprocessing versus, 274 Multiprocessors, 267-333 architectures, 279-88 ARM MPCore, 311-12 connectivity graph, 399 core-based strategy, 326-27 design methodologies, 326-32 design techniques, 275-79 embedded, 268, 269-75 generic, 268 heterogeneous, 274, 329, 338 interconnection networks, 289-304 Lucent Daytona, 283-84 memory systems, 304-12 modeling and simulation, 278-79 MPSoC, 279 PES, 267, 288 Philips Nexperia, 281-83 Qualcomm MSM5100, 280-81 scheduling, 339 simulation as parallel computing, 278-79 specialization and, 274-75 STMicroelectronics Nomadik, 284-86 subsystems, 267 TIOMAP, 286-88 Multiprocessor scheduling, 342-58 AND activation, 
355-56 communication and, 341 contextual analysis, 358 cyclic task dependencies, 358 data dependencies and, 347-48 delay estimation algorithm, 349-51 Index distributed software synthesis, 358-59 with dynamic tasks, 359-61 event model, 353-54 event-oriented analysis, 353 intermodule connection graph, 343-44 interprocessor conmiunication modeling (IPC) graph, 344, 345 limited information, 340^1 models, 343 network flow, 343-44 NP-complete, 342-43 OR activation, 356-58 output event timing, 354-55 phase constraints, 351-53 preemption and, 347 static scheduling algorithm, 349 system timing analysis, 355 task activation, 355 See also Scheduling Multiprocessor software, 337-79 design verification, 376-78 embedded, 337-39 master/slave, 340 middleware, 361-75 PE kernel, 340 quality-of-service (QoS), 370-75 real-time operating systems, 339-61 role, 339-42 Multiprocessor system-on-chip (MPSoC), 279 Multitasking caches and, 239 scratch pads and, 240-41 Multithreading, 68 Mutators, 213 MXT memory system, 116 Myopic algorithm, 360 NAND memory, 256 NDL, 199 513 NetChip, 302 Network design, 32 Networked consumer devices, 56-57 Network layer, Networks, ad hoc, aircraft, 324-25 interconnection, 289-304 personal area, 54 physically distributed, 312-25 wireless, Networks-on-chips (NoCs), 296-304 defined, 297 design, 300 energy modeling, 299-300 Nostrum, 297-98 OCCN, 300-301 QoS and, 375 QoS-sensitive design, 300 services, 370 SPIN, 298 Nimble, 426-27 Nonblocking communication, 45 Nonconvex operator graph, 147 NOR memory, 256 Nostrum network, 297-98 resources, 298 stack, 370 Notification service, 374-75 Object Constraint Language (OCL), 217 Object request broker (ORB), 363 OCCN, 300-301 Open Systems Interconnection (OSI) model, Operating systems, 4, 223-64 design, 247-59 embedded, 248-49 Hunter/Ready, 247 IPC mechanisms, 254 memory management, 248-49 multiprocessor, 339-61 514 Index Operating systems (Cont'd.) 
  overhead, 251-52
  power management, 255
  real-time (RTOSs), 223, 247-48
  scheduling support, 253-54
  simulation, 251
  TI OMAP, 341-42
Operational faults, 48
OR activation, 356-58
  illustrated, 357
  jitter, 357-58
  period, 356
Ordered binary decision diagrams (OBDDs), 36
Output line energy, 93
Paged addressing mechanisms, 183-84
Page description language, 308
Paged memories, 182
Parallel execution mechanisms, 77-86
  processor resource utilization, 83-86
  superscalar processors, 80
  thread-level parallelism, 82-83
  vector processors, 80-81
  VLIW processors, 77-80
Parallelism
  architecture and, 42
  communication and, 41-45
  data-level, 45
  defined, 41
  ideal, 171
  instruction-level, 45
  Petri net, 42-43
  subword, 81
  task graphs, 42
  task-level, 46
  thread-level, 82-83
Parametric timing analysis, 192
Pareto optimality, 417
Path analysis, 186, 187, 188-90
  cache behavior and, 189
  by ILP, 188
  user constraints, 189-90
Path-based estimation, 393, 394
Path-based scheduling, 392
Path ratio, 85
Paths
  cache behavior and, 189
  crossing critical, 197
  execution, 186
Path timing, 186, 190-97
  abstract interpretation, 195-96
  clustered analysis, 192-93
  instruction caching, 193-95
  loop iterations, 191-92
  parametric analysis, 192
  simulation-based analysis, 196-97
PC sampling, 130
PEAS III, 142-43
  compiler generator, 163
  defined, 142
  hardware synthesis, 143
  model of pipeline stage, 142
  VHDL models, 143
Perceptual coding, 11
Performance
  average, 66
  compression, 114
  embedded microprocessors, 272-74
  hardware/software co-design, 387-96
  indices, 215
  peak, 66
  processor, 66-67
  worst-case, 67
Periodic admissible sequential schedule (PASS), 200
Personal area networks, 54
Petri nets, 42-43, 242
  behavior, 43
  defined, 42
  illustrated, 43
  maximal expansion/cut-off markings, 243
Phase constraints, 351-53
Philips Nexperia, 281-83
Physical faults, 48
Physical layer
  Bluetooth, 54
  defined,
  FlexRay, 319-20
Platform-based design, 26-28
  defined, 26
  illustrated, 27
  phases, 28
  platform programming, 28
  two-stage process, 26
Platform-dependent characteristics, 277
Platform-independent measurements, 277
Platform-independent optimizations, 277
Poisson model, 292
Polyhedral reduced dependence graphs (PRDGs), 206
Polytope model, 172
Post-assembly optimizations, 167
Post-cache decompression, 106-7
Power attacks, 125
  countermeasures, 126
  defined, 53
  See also Attacks
Power management, 370
Power simulators, 131-32
Predecessor/successor forces, 392
Preferred lists, 361
Prefetching, 178
Presentation layer,
Priority ceiling protocol, 233-34
Priority inheritance protocols, 233
Priority inversion, 176, 232-33
Priority schedulers, 225
  dynamic priorities, 230
  static priorities, 230
Priority service, 364
Procedure cache, 113
Procedure-splitting algorithm, 169
Processes
  completion time, 225
  concurrent execution, 259
  critical, 227
  deadlock, 259, 260
  defined, 224
  execution, 225
  execution time, 225
  initiation time, 225
  modal, 422
  real-time scheduling, 224-41
  response time, 225
  scheduling, 68
  slowdown factor, 235
  specifications, 226
Processing elements (PEs), 205
  defined, 267
  design methodology, 288
  kernel, 340
  multiprocessor, 267, 288
  See also Multiprocessors
Process migration, 360
Processors
  comparing, 66-69
  cost, 67
  customization, 133
  DSPs, 71-76
  embedded, 68-69
  energy, 67
  evaluating, 66-67
  general-purpose, 68-69
  instruction issue width, 68
  instruction set style, 68
  memory hierarchy, 89-99
  MIMD, 68
  MISD, 68
  performance, 66-67
  power, 67
  predictability, 67
  resource utilization, 83-86
  RISC, 67, 69-71
  SIMD, 67
  superscalar, 80
  taxonomy, 67-68
  vector, 80-82
  VLIW, 77-80
Procrastination scheduling, 238-39
Product machines, 36
Programming
  environments, 169-70
  models, 197-218
Program performance analysis, 185-97
  average performance, 185
  BCET, 185-86
  challenges, 186
  measures, 185
  models, 187-88
  WCET, 185-86
Programs, 155-219
  code generation, 156-70
  defined, 155
  execution paths through, 186
  flow statements, 188, 189
  memory-oriented optimizations, 170-85
  models of, 197-218, 259
  representations, 397-98
Property managers, 374
Protocol data units, 301
Ptolemy II, 215
QNoC, 301-2, 375
Qualcomm MSM5100, 280-81
Quality descriptive language, 374
Quality-of-service (QoS), 370-75
  attacks, 53
  CORBA and, 374
  management as control, 373
  model, 371
  NoCs and, 375
  notification service, 374-75
  resources, 371
  services, 370-75
Quenya model, 404-5
Radio and networking application, 5-10
RATAN process model, 346
Rate-monotonic analysis (RMA), 230
Rate-monotonic scheduling (RMS), 230
  critical instant, 231
  priority assignment, 230-31
  utilization, 232-33
Razor latch, 88, 89
Reachability, 36
Reactive systems, 198
Real-Time Connection Ordination Protocol, 365
Real-time daemon, 364
Real-time event service, 365
Real-time operating systems (RTOSs), 223, 247-48
  interrupts and scheduling, 250
  ISRs/ISTs, 250
  multiprocessor, 339-61
  structure, 250-51
  See also Operating systems
Real-time primary-backup, 365
Real-time process scheduling, 224-41
  algorithms, 227-34
  for dynamic voltage scaling, 234-39
  performance estimation, 239-41
  preliminaries, 224-26
Real-Time Specification for Java (RTSJ), 174-75
Reconfigurable systems
  CORDS, 426
  co-synthesis for, 425-28
  defined, 425
  Nimble, 426-27
Redundant active stars, 319, 320
Reference implementation, 29
Register allocation, 160-63
  cliques, 161-62
  conflict graph, 160, 161
  defined, 156
  graph coloring, 162
  illustrated, 160
  See also Code generation
Register files, 95, 96
  defined, 95
  parameters, 95
  size, 96
  VLIW, 162-63
Register liveness, 160
Register-transfer implementation, 332
Relative computational load, 404
Reliability, 46, 47-51
  demand for, 47
  function, 49
  methods, 50
  system design fundamentals, 48-51
Resource allocation
  middleware and, 362
  multiprocessor software and, 339
Resources
  defined, 298
  dependencies, 227
  QoS, 370
  utilization, 83-86
Response time theorem, 348-49
Reverse channel,
Rewriting rules, 159
RFCOMM, 55
Ripple scheduling, 375
RISC processors, 69-71
  architecture, 69
  ARM family, 70
  defined, 67
  embedded versus, 69
  MIPS architecture, 70
  PowerPC family, 70
Routing
  interconnection networks, 296
  store-and-forward, 296
  virtual cut-through, 296
  wormhole, 296
RT-CORBA, 363-65
RTM, 253-54
RTU, 253
SAFE-OPS, 124
Safety, 46
Sandblaster processor, 82-83
Scalar variable placement, 178-79
Schedulers
  constructive, 224, 227
  dynamic, 225
  instruction, 131
  iterative improvement, 224
  list, 227-28
  priority, 225
  Spring, 253
  static, 224
Schedules
  defined, 224
  failure rate, 252
  SDF, 202
  single-appearance, 201, 202
  unrolled, 348
Scheduling
  AFAP, 392
  caches and, 240
  checkpoint-driven, 237-38
  critical-path, 390
  data dependencies and, 227
  deadline-driven, 232
  for DVS, 234-39
  dynamic, 68, 224
  dynamic task, 360
  energy-aware, 421
  FCFS, 389-90
  fine-grained, 371-72
  force-directed, 390-92
  hard, 225
  instructions, 68
  interval, 228
  languages and, 241-47
  least-laxity first (LLF), 232
  list, 390
  multiparadigm, 371
  multiprocessor, 339, 340-41, 342-58
  OS support, 253-54
  processor, 68
  procrastination, 238-39
  rate-monotonic (RMS), 230-32
  real-time process, 224-41
  ripple, 375
  slack-based, 236-37
  soft, 225
  static, 68, 224
Scratch pads, 98-99
  allocation algorithm, 183
  evaluation, 182
  management, 180-81
  multitasking and, 240-41
  performance comparison, 184
  See also Memory
Search metrics, 360
Security, 5, 46, 122-26
Self forces, 392
Self-programmable one-chip microcomputer (SPOM) architecture, 123
Sensor networks, 18-21
  Intel mote, 18-19
  TinyOS, 19-20
  ZebraNet, 20-21
Serra system, 407
Service records, 55
Services
  ARMADA, 365
  ENSEMBLE, 369
  MultiFlex, 368-69
  NoC, 370
  Nostrum, 370
  ORB, 363
  QoS, 370-75
  RT-CORBA, 363-65
  SoC, 366-70
  standards-based, 363-66
Session layer,
Set-associative caches, 97
S-graphs, 241, 422
SHIM, 247
Short-term memory, 403
Side channel attacks, 123
SIGNAL, 204-5
Signal flow graph (SFG), 40
Signals
  analysis, 204-5
  composition, 204
  defined, 204
Simple event model, 353
Simple power analysis, 53
SimpleScalar, 131
Simulated annealing, 403
Simulation
  CPU, 126-32
  direct execution, 130-31
  multiprocessor, 277-79
  operating systems, 251
Simulation-based timing analysis, 196
Simulators
  communicating, 278
  heterogeneous, 279
  MESH, 378
  VastSystems CoMET, 376-77
Simulink, 217-18
Single-appearance schedules, 201, 202
Single-instruction, multiple data (SIMD), 67-68
Sink SCC, 346
Slack-based scheduling, 236-37
Slave threads, 403
Slowdown factor, 235
Smart cards, 123
SmartMIPS, 124
Snooping caches, 310
Software
  abstractions, 330-32
  architectures,
  multiprocessor, 337-79
  performance analysis, 32
  tool generation, 32
  verification, 32
Software-based decomposition, 111-13
Software-controlled radio,
Software-defined radio (SDR), 6-7
  ideal,
  tiers,
  ultimate,
Software radio,
Software thread integration (STI), 242
Sonics SiliconBackplane III, 304
Source SCC, 346
Sparse time model, 314
SPECInt benchmark set, 127-28
SpecSyn, 408-9
SPIN, 261, 298
Spiral model, 24-25
Spring scheduler, 253
Standards-based design methodologies, 28-30
  design tasks, 29-30
  pros/cons, 28-29
  reference implementation, 29
Standards-based services, 363-66
Starcore SC140 VLIW core, 80
Statecharts, 208-10
  hierarchy tree, 209
  interpretation, 209
  STATEMATE, 208-9
  variations, 208
  verification, 209-10
STATEMATE, 208-9
Static buffers, 203
Static scheduling, 68
  algorithms, 227
  defined, 224
  implementation, 227
  multiprocessor, 349
  See also Scheduling
STMicroelectronics Nomadik multiprocessor, 284-86
Store-and-forward routing, 296
Streams, 35-36, 214
Strongly connected component (SCC), 344
  sink, 346
  source, 346
Subtasks, 224
Subword parallelism, 81
Superscalar processors, 80
Symmetric multiprocessing (SMP), 368-69
SymTA/S, 353
Synchronous data flow (SDF), 40, 200, 201
  buffer management, 202
  graphs, 200, 201
Synchronous languages, 198
  as deterministic, 198
  rules, 198
SystemC, 279
System-level design flow, 329-30
System-on-chip (SoC), 279
  services, 366-70
  template, 326
System timing analysis, 355
Task graphs, 42, 398
  COSYN, 412
  large, 411
Task Graphs for Free (TGFF), 398
Task-level parallelism, 46
Tasks
  activation, 355
  assertion, 415
  compare, 415
  defined, 224
  dynamic, 359-61
  migration, 360
Technology library, 389
Template-driven synthesis algorithms, 400-407
  COSYMA, 401
  CoWare, 402-3
  hardware/software partitioning, 400
  LYCOS, 404
  Quenya model, 404-5
  Vulcan, 401
  See also Co-synthesis algorithms
Template-matching algorithm, 159
Temporal logic, 259-60
Tensilica Xpres compiler, 148
Tensilica Xtensa, 135-38
  core customization, 135
  defined, 135
  features, 136
  See also Configurable processors
Testing, 31
Texas Instruments
  C5x DSP family, 72-74
  C6x VLIW DSP, 79
  C55x co-processor, 75-76
  McBSP, 291
  OMAP multiprocessor, 286-88, 341-42
Thread-level parallelism, 82-83
Thread pool, 363, 364
Threads
  CDGs, 242-45
  defined, 224
  integrating, 244
  primary, 242
  secondary, 242
  slave, 403
  time-loop, 403
Throughput factors, 417
Timed distributed method invocation (TDMI), 364
Time-loop threads, 403
Time quantum, 224
Time-triggered architecture (TTA), 313-16
  cliques, 316
  communications controller, 315
  communications network interface, 314, 315
  defined, 313
  host node, 315
  interconnection topologies, 316
  sparse time model, 314
  timestamp, 313
  topologies, 315
Timing
  accidents, 188
  attacks, 53
  path, 186, 190-97
  penalty, 188
  simulation-based analysis, 196
TinyOS, 19-20
Tokens, 42
Token-triggered threading, 83
Toshiba MeP core, 138-40
Total conflict factor, 181
Trace-based analysis, 129-30
Traffic models, 291
Transition variables, 120
Transmission start sequence, 320
Transport group protocols, 54
Transport layer,
Triple modular redundancy, 50, 51
Turbo codes,
Turing machine, 37-38
  defined, 37
  halting problem, 38
  illustrated, 37
  operating cycle, 38
Twig code generator, 158
Ultimate software radio,
Unbuffered communication, 43
Unified Modeling Language (UML), 217
UNITY, 398
Unrolled schedules, 348
User-custom instruction (UCI), 140
Utilization
  CPU, 226
  processor resource, 83-86
  RMS, 231-32
Validation, 31
Variable-length codewords, 105-6
Variable-length coding, 11, 12
Variable lifetime chart, 160, 161
Variable-performance CPUs, 86-89
  better-than-worst-case design, 88-89
  dynamic voltage and frequency scaling (DVFS), 86-88
Variable-to-fixed encoding, 111, 112
VastSystems CoMET simulator, 376-77
Vector processing, 68
Vector processors, 80-81
Vehicle control/operation, 15-18
  harnesses, 16
  microprocessors, 16
  safety-critical systems, 16
  specialized networks, 16-17
  X-by-wire, 17
  See also Applications
Verification, 31, 259-63
  finite state and, 36
  multiprocessor design, 376-78
  software, 32
  statecharts, 209-10
  techniques, 31
Very long instruction word (VLIW) processors, 77-80
  defined, 77
  register files, 162-63
  split register files, 78
  Starcore SC140 VLIW core, 80
  structure, 77
  TI C6x DSP, 79
  uses, 78
  See also Processors
Video cameras, computation in, 271
Video compression standards, 13
Video drivers, 199
Video encoding standards, 13
Virtual channel flow control, 296
Virtual components, 327
Virtual cut-through routing, 296
Virtual mapping, 257
Virtual Socket Interface Alliance (VSIA), 326
Virtual-to-real synthesis, 328
Voting schemes, 50
Vulcan, 401
Watchdog timers, 50-51
Waterfall model, 24
Wavelets, 12
Wear leveling, 257
WiFi, 56
Windows CE
  memory management, 248-49
  scheduling and interrupts, 250-51
Windows Media Rights Manager, 59-60
Wireless
  co-synthesis, 421
  data,
Wolfe and Chanin architecture, 102
  architecture performance, 103-4
  efficiency comparison, 103
Word line energy, 92
Working-zone encoding, 119
Workload, 276
Wormhole routing, 296
Worst-case execution time (WCET), 185-86
Wrappers, 327-29
X-by-wire, 17
Xilinx Virtex-4 FX platform FPGA family, 385-86
Xpipes, 302
Yet Another Flash Filing System (YAFFS), 258
ZebraNet, 20-21
