Core C++: A Software Engineering Approach, Part 3

    #include <iostream>
    #include <cstring>
    using namespace std;

    union StreetOrPOB {
        char street[30];          // alternative interpretations
        long int POB;
    };

    struct Address {
        char first[30];
        int kind;                 // 0: street address; 1: P.O.B.
        StreetOrPOB second;       // either one or another meaning
        char third[30];
    };

    int main()
    {
        Address a1, a2;
        strcpy(a1.first, "Doe, John");              // address with street
        strcpy(a1.second.street, "15 Oak Street");
        a1.kind = 0;
        strcpy(a1.third, "Anytown, MA 02445");
        strcpy(a2.first, "King, Amy");
        a2.second.POB = 761;
        a2.kind = 1;                                // address with POB
        strcpy(a2.third, "Anytown, MA 02445");
        cout << a1.first << endl;
        if (a1.kind == 0)                           // check data interpretation
            cout << a1.second.street << endl;
        else
            cout << "P.O.B. " << a1.second.POB << endl;
        cout << a1.third << endl;
        cout << endl;
        cout << a2.first << endl;
        if (a2.kind == 0)                           // check data interpretation
            cout << a2.second.street << endl;
        else
            cout << "P.O.B. " << a2.second.POB << endl;
        cout << a2.third << endl;
        return 0;
    }

This is nice, but it introduces yet another level into the hierarchical structure of types. As a result, the programmer has to use names like a1.second.street, and this is no fun. Meanwhile, the only use of type StreetOrPOB in the program is with type Address.

To remedy this, C++ supports anonymous unions. They have no name, and no variable of this type can be defined; however, their fields can be used without any qualification. For example, we can define type Address without using type StreetOrPOB but using an anonymous union instead.

    struct Address {
        char first[30];
        int kind;                 // 0: street address; 1: P.O.B.
        union {
            char street[30];
            long int POB;
        };                        // no 'second' field of type StreetOrPOB
        char third[30];
    };

The union type is gone, but type Address now has two alternative fields, street[] and POB, and they can be referred to by name like any other field. Of course, it remains the responsibility of the programmer to know which one is which. Remember the story about the bagel and cream cheese? Data must be retrieved consistently with the way they were set. But the extra level of hierarchical notation is not needed anymore.

    if (a1.kind == 0)                           // check data interpretation
        cout << a1.street << endl;              // use one interpretation
    else
        cout << "P.O.B. " << a1.POB << endl;    // or use another one

This is a powerful programming style. However, the maintenance programmer has to spend extra time and effort to understand the code. There are extra conditional statements that increase the complexity of code. Presumably, the use of inheritance with virtual functions is good competition for this programming technique. We will discuss it later.

Enumerations

Enumeration types allow the programmer to define variables that accept values only from a defined set of identifiers. Without enumerations, we usually introduce integer symbolic constants (using either #define or const definitions) and set up conventions for using them.

For example, to emulate the behavior of a traffic light, we need the values that denote the red, green, and yellow colors of the light. Similar to the example with the days of the week, we can introduce character arrays "red", "green", and "yellow" and do assignments and comparisons using string manipulation library functions.

    char light[7] = { "green" };        // it is green initially
    if (strcmp(light, "green") == 0)    // next it is yellow
        strcpy(light, "yellow");        // and so on

This is nice and clear, and the maintenance programmer will have little trouble understanding the intent of the code designer, but these string operations are unnecessarily slow.
You do not want to move a lot of characters around (searching for the terminator inside the library functions) just to trace the state of the traffic light. Another drawback of this solution is the lack of protection. If somebody wants to make the light pink or magenta, there is no way to stop the programmer from doing so.

Another solution is to use integers to denote colors with numbers. I can assign 0 to green, 1 to red, and 2 to yellow. Notice how I introduced these values: 0, 1, and 2, not 1, 2, and 3. This is what dealing with C++ arrays and indices does to the way people think. When a C (or C++) programmer counts people in the room, he or she says: "Zero, one, two, three, four, five, six, seven, eight, nine; OK, there are ten people in the room." With this approach you avoid using the string manipulation functions.

    int light = 0;              // it is green initially
    if (light == 0)             // next it is yellow
        light = 2;              // and so on

The advantage of this approach is speed. This is the only advantage. This type of coding always requires comments, especially for complex algorithms with more-complicated systems of states and transitions between the states. If the comments are too cryptic or somewhat obsolete, the transmission of the designer's knowledge to the maintainer is not facilitated, to say the least.

One of the ways to make code more readable while keeping it fast is the use of symbolic constants. We can define symbolic constants whose names are appropriate for the application, for example, RED, GREEN, and YELLOW, and assign a distinct integer value to each constant.

    const int RED=0, GREEN=1, YELLOW=2;     // color constants

Now you can rewrite the example above using these constants.
The code is as fast as in the previous example and as clear as the original version with character strings.

    int light = GREEN;          // it is green initially
    if (light == GREEN)         // next it is yellow
        light = YELLOW;         // and so on

This solution does protect your steak from falling on the floor to begin with. But it does not protect your code from deterioration in the course of maintenance. If maintainers (or the original designer in the crunch) want to use numbers instead of symbolic constants, it is not a syntax error. If they assign to variable light a value that is outside of the agreed upon range of color values (e.g., light=42;), it is not a syntax error either. You can add these values (e.g., RED+GREEN), and do all kinds of things you do not actually do to colors.

Enumerations were introduced into the language to deal with these kinds of problems. The programmer can define a new type and explicitly enumerate all legal values that a variable of that type is allowed to assume. The keyword enum is used to introduce the programmer-defined type name (e.g., Color) similar to the way the keyword struct (or union) introduces programmer-defined types. The braces (followed by the semicolon) follow the type name, again, similar to struct or union. In the braces, the designer lists all values allowed for the type being defined. Often, programmers use uppercase (similar to constants introduced by #define and by const), but this is not mandatory. For our example, we can define type Color as the enum type.

    enum Color { RED, GREEN, YELLOW };      // Color is a type

Now we can use type Color to define variables that can only accept values RED, GREEN, and YELLOW.
These values are enumeration constants: they can be used as rvalues only and cannot be changed.

    Color light = GREEN;        // it is green initially
    if (light == GREEN)         // next it is yellow
        light = YELLOW;         // and so on

This solution removes the thumb from your steak. The only operations that are defined on the enumeration type itself are assignment and the relational operators. Arithmetic and output are not defined for the type as such: when an enumeration value appears in an expression such as RED+GREEN or in an output statement, it is silently converted to an integer, and the integer result cannot be assigned back to a Color variable without a cast. You can compare enumeration values for equality or inequality, and you can check whether one value is greater (or less) than another.

    if (light > RED) cout << "True\n";      // this prints 'True'

The reason is that under the hood, enumeration values are implemented as integers. The first value in the enumeration list is 0 (no surprise, as this is how we count things in C++), the next is 1, and so on. The program can access these values by casting enumeration values to integers.

    cout << (int) light << endl;            // this prints 0, 1, or 2

If the programmer wants to change this value to another value, one can do that explicitly in the enumeration list.

    enum Color { RED, GREEN=8, YELLOW };    // YELLOW is 9 now

After that, the assignment of values resumes (YELLOW is 9, and so on). If for some reason you want to set GREEN to 0, this is OK with the compiler, but the program will not be able to distinguish between RED and GREEN (not a big problem unless it tries to control traffic). This technique is useful when the enumeration values are going to be used as masks for bitwise operations and hence have to represent powers of two.

    enum Status { CLEAR = 2, FULL = 8, EMPTY = 64 };

Many programmers enthusiastically embrace this facility and use it for defining integer compile-time constants.

    enum { SIZE = 80 };         // use it to define arrays etc.
Notice that the enumeration that defines SIZE is anonymous (similar to an anonymous union). It does not have a name and hence you cannot define variables of this type, but this is not a big loss because all we need is the symbolic constant SIZE. The result is the same as defining the constant explicitly.

    const int SIZE = 80;        // same thing

It is a matter of personal taste (yours or your boss's) what method of defining constants to use.

Bit Fields

Similar to our discussion of unions and enumerations, we will start with examples of practical problems that can be solved using additional C++ user-defined types.

The smallest object that can be allocated and addressed in a C++ program is a character. Sometimes a program might need a value that is so small that using a full-size integer to store it looks like a waste. Often, we do not pay attention to the opportunity to save memory. When memory is scarce, we would like to pack small values together. Often, external data formats and hardware device interfaces force us to process parts of a machine word.

For example, a disk controller might manipulate memory addresses and their components: a page number (from 0 to 15) and the offset of the memory location on the page (from 0 to 4095). The algorithm might require manipulation of the page number (4 bits), offset (12 bits), and the total address (unsigned 16 bits); it should be able to combine the page number and offset into the address, and extract the page number and offset from the address.

Another example might be an input/output port where specific bits are associated with specific conditions and operations.
Bit 1 of the port might be set if the device is in the clear to send condition, bit 3 might be set if the receiving buffer is full, and bit 6 might be set if the transmit buffer is empty. The algorithm might require setting each bit in the status word individually and retrieving the state of each bit individually.

Each of these computational tasks requires bit manipulation and the use of bitwise logical operations. Combining the page number and the offset into the memory address requires shifting the page number 12 positions to the left and performing the bitwise OR operation on the result of the shift and the offset.

    unsigned int address, temp;     // they must be unsigned
    int page, offset;               // sign bit is never set to one
    temp = page << 12;              // make four bits most senior
    address = temp | offset;        // assume no extra bits there

Retrieving the page number and offset from the memory address is more complex. To get the page number, we shift the address right 12 positions to throw away the bits of the offset and move the page number into the least significant bits of the word. To get the offset, we use the bitwise AND operation with the mask 0x0FFF that has each of the 12 least significant bits set to 1.

    page = address >> 12;           // strip offset bits, get page bits
    offset = address & 0x0FFF;      // strip page bits from address

To set individual bits to 1, we use three masks: each mask has only 1 bit set to 1 and all other bits set to 0. By using the bitwise OR operation on the status word, we set the corresponding bit to 1 if it was 0 or leave it in the same state if it was already set to 1; all the other bits stay unchanged. The constants CLEAR, FULL, and EMPTY defined in the previous section are the masks that have only 1 bit set to 1 and other bits set to 0. The constant CLEAR has bit 1 set to 1, FULL has bit 3 set to 1, EMPTY has bit 6 set to 1.
    unsigned status=0;          // assume it is initialized properly
    status |= CLEAR;            // set bit 1 to 1 (if it is zero)
    status |= FULL;             // set bit 3 to 1 (if it is zero)
    status |= EMPTY;            // set bit 6 to 1 (if it is zero)

To reset individual bits to 0, we need masks with all bits set to 1 with the exception of 1 bit. Using the bitwise AND operation will leave all the other bits in the status word unchanged and will reset the bit that is 0 in the mask. To reset bit 1, we need a mask that has bit 1 reset to 0. To reset bit 3, we need a mask that has bit 3 reset to 0. To reset bit 6, we need a mask that has bit 6 reset to 0. In each mask, all other bits should be set to 1.

These masks are difficult to express as decimal or even hexadecimal constants. Also, on different platforms we might need masks of different sizes, and that affects code portability. It is common to invert (negate) the constants used to set these bits to 1 and use the result of the negation to reset these bits to zero in the AND operation.

    status &= ~CLEAR;           // reset bit 1 to 0 (if it is 1)
    status &= ~FULL;            // reset bit 3 to 0 (if it is 1)
    status &= ~EMPTY;           // reset bit 6 to 0 (if it is 1)

To access the value of individual bits, we use the AND operation with the masks that have all the bits reset to 0 with the exception of the 1 bit that is being accessed. If this bit is set in the status word, the result of the operation is not 0 (true). If this bit is reset to 0, the result of the operation is 0 (false). The masks that will work in these operations are exactly the same as those we used to set and reset status bits.
    int clear, full, empty;         // to test for True or False
    clear = status & CLEAR;         // True if bit 1 is set to one
    full = status & FULL;           // True if bit 3 is set to one
    empty = status & EMPTY;         // True if bit 6 is set to one

These low-level operations for packing and unpacking sequences of bits (addressing example) or individual bits (status example) are complex, counterintuitive, and prone to error.

C++ allows us to give names to segments of bits of different sizes. This is done using conventional structure definitions. For each field, the number of bits allocated to it (field width) is specified using a nonnegative constant after the colon.

    struct Address {
        int page : 4;
        int offset : 12;            // signed: not large enough for 0 to 4095
    };

Field members are packed into machine integers. One has to be careful with signed integers: one bit is usually allocated for the sign. If you want to use all the bits allocated for the field, the field has to be unsigned, as in this example.

    struct Address {
        unsigned int page : 4;
        unsigned int offset : 12;   // place for 12 bits
    };

Typically, a bit field does not straddle a word boundary. If it does not fit into the machine word, it is allocated to the next word and the remaining bits are left unused. In C, it is an error if the width of the field exceeds the size of the base type on the given platform (which can be different for different machines); C++ compilers accept such a width and treat the extra bits as padding.

Fields might save data space: there is no need to allocate a byte or a word for each value. However, the size of the code that manipulates these values increases because of the need to extract the bits. The end result is not clear.

The variables are defined in the same way as structure variables are.
Access to bit fields is the same as for regular structure fields.

    Address a; unsigned address;    // make sure that a is initialized
    address = (a.page << 12) | a.offset;

If you want to allocate 1 bit for a flag, make sure the field is unsigned rather than signed. Fields do not have to have names; unnamed fields are used for padding. (We still have to specify the type, colon, and width.)

    struct Status {
        unsigned : 1;               // bit 0
        unsigned Clear : 1;         // bit 1
        unsigned : 1;               // bit 2
        unsigned Full : 1;          // bit 3
        unsigned : 2;               // bits 4 and 5
        unsigned Empty : 1;         // bit 6
    };

The code for manipulating the status variables is very simple. Under the hood, it is implemented through shifts and bitwise logical operations similar to the examples we discussed at the beginning of this section.

    Status stat;                                // make sure it is initialized
    int clear, full, empty;                     // for testing for True or False
    stat.Clear = stat.Full = stat.Empty = 1;    // set bits to one
    stat.Clear = stat.Full = stat.Empty = 0;    // reset bits to zero
    clear = stat.Clear;                         // the values can be tested
    full = stat.Full;
    empty = stat.Empty;

A width of zero is allowed; it is the signal to the compiler to align the next field at the next integer boundary. Mixing fields of different integral types is allowed. Switching from a type of one size to a type of another size allocates the next field at the word boundary. Careless use of bit fields might not decrease the allocated space, as the next (contrived) example demonstrates. (This code is written for a 16-bit machine where integers are allocated two bytes.)
    struct Waste {
        long first : 2;             // this allocates all 4 bytes
        unsigned second : 2;        // this adds two more
        char third : 1;             // short starts on even address
        short fourth : 1;           // and this: 10 bytes total
    };

On some machines, fields are assigned left to right, and on others they are assigned right to left (so-called little-endian and big-endian machines). This is not a problem for internally defined data structures; however, this is significant for mapping externally defined data, for example, device I/O buffers. When external data come in one format and the computer uses another, the data in the bit fields might be saved incorrectly.

Before you decide to use bit fields, evaluate the alternatives. Remember that accessing a character or an integer is generally faster than accessing a bit field and takes less code.

Summary

In this chapter, we looked at major program-building tools that the programmer has for creating large complex programs. Most of these tools deal with aggregation of data into larger units: homogeneous containers (arrays) and heterogeneous objects (structures).

These aggregate data types do not have operations of their own with the exception of the assignment for structures. All operations over aggregate objects have to be programmed in terms of operations over individual elements. Since structure fields are accessed using individual field names, they are relatively safe. Array components are accessed using subscripts, and C++ provides neither compile-time nor run-time protection against illegal values of indices. This can easily lead to incorrect results or to memory corruption and is a source of concern for a C++ programmer.
This is especially true for character arrays where the end of valid data is specified by the zero terminator.

We also looked at such programmer-defined types as unions, enumerations, and bit fields. Unlike arrays and structures, they are not really necessary. Any program can be written without using these structures. Often, however, they simplify the appearance of the source code, convey more information to the maintainer about the designer's intent, and make the job of the maintainer (and the designer) easier.

Chapter 6. Memory Management: the Stack and the Heap

Topics in this Chapter

• Name Scope as a Tool for Cooperation
• Memory Management: Storage Classes
• Memory Management: Using Heap
• Input and Output with Disk Files
• Summary

In the previous chapter, we studied the tools for implementing programmer-defined data structures. Arrays and structures are the basic programming tools that allow the designers to express complex ideas about the application in a concise and manageable form, both for the designers themselves and for maintenance programmers. Unions, enumerations, and bit fields help the designer to represent code in the most understandable way.

All variables, of built-in and of programmer-defined types alike, that were used in the previous coding examples were named variables. The programmer has to choose the name and the place of the definition in source code. When the program needs memory for named variables, it is allocated and deallocated without further programmer participation, according to the language rules, in the area of memory called the stack. We pay for this simplicity with the lack of flexibility: the size of each data item is defined at compile time.

For flexible sizes of data structures, C++ allows the programmer to build dynamic arrays and linked data structures. We pay for this flexibility with the complexity of using pointers.
When the [...]

... variables in a file scope or for local variables defined in a block or in a function scope; used for variables kept in high-speed registers rather than in random-access memory. For objects (variables) of these classes, the language rules define allocation and deallocation: extern and static variables are allocated in the fixed data memory of the program, auto variables are allocated on the program stack,...

... for local variables. When resources are scarce, it is important to understand the consequences of a design decision. For example, in Listing 6.2 I define array num[] as a local variable in function main() and array amounts[] as a local variable in the body of the first loop. Both these arrays contain data for loading values into global array a[]. Defining arrays num[] and amounts[] in different places in...

... program stack, and register variables are allocated in registers if possible. If there are not enough registers available, these variables are allocated either in the fixed area (for global variables) or on the program stack (for local variables).

Automatic Variables

Automatic variables are local variables defined in functions or in blocks. The auto specifier is default and is not often used. For example, function...

... once, at the beginning of the function main() execution. Array amounts[] is allocated, initialized, and deallocated as many times as the loop body is executed. Array allocation and deallocation does not take much execution time (it involves manipulating the stack pointer), but copying values into array elements for initialization takes about as much time as does copying data from array amounts[] into array...

... memory at the same time.)
Another advantage of using global variables is speed. Since each global variable is allocated and deallocated only once rather than each time the scope is entered, this operation cannot slow down the program (of course, for many applications this is not important). Yet another advantage of using global variables is less demand on the program stack. The size of the stack that is...

... in the cout statement, I want to have a variable caption[], which contains the text "Average balance is $" (a common technique to facilitate internationalization of the program), and I want function printAverage() to call function printCaption(), which uses the variable caption[]. Again, I am using very small examples so that they are relatively easy to understand, but I introduce additional functions...

... programmer-defined names form a single name space. If a name is declared in a scope for any purpose, it should be unique in that scope among all the names declared in the same scope for any purpose. This means that if, for example, count is a name of a variable, then no type, function, parameter, or another variable can be named count in the same scope where the variable count is declared. Similar to most software engineering...

... the association between a name of a variable and its location in memory is valid, that is, when the storage is allocated for that variable. Unlike lexical scope, storage class is a run-time feature of program behavior. Program execution in C++ always starts with main(); the first executable statement in main() is usually the first statement executed by the program. Function main() calls other program functions,...
program statements (operator new). Dynamic variables are allocated on the program heap. In definitions of variables, C++ storage classes can be specified using the following keywords.

• auto: default for variables defined as local in a block scope or in a function scope (automatic variables)
• extern: can be applied to variables that are global in file scope
• static: can be used for global...
• register: ...

... allocating and deallocating memory for individual computational objects. For some tasks, this technique is not sufficient, and dynamic memory management is used instead. As you are going to see later in this chapter, dynamic memory management is more complex and error-prone. This is why automatic variables should be used (and are used) as much as possible. Memory allocated for an automatic variable in another...

... shows a simple example that loads account data, displays data, and computes total of account balances. For simplicity of the example, I do not load the data set from the keyboard, an external...