Neural Networks: Algorithms, Applications, and Programming Techniques (Part 5)

4.3 The Hopfield Memory

... causes a lateral inhibition between units on each column. The first delta ensures that this inhibition is confined to each column, where i = j. The second delta ensures that each unit does not inhibit itself.

The contribution of the third term in the energy equation is perhaps not so intuitive as the first two. Because it involves a sum of all of the outputs, it has a rather global character, unlike the first two terms, which were localized to rows and columns. Thus, we include a global inhibition, -C, such that each unit in the network is inhibited by this constant amount.

Finally, recall that the last term in the energy function contains information about the distance traveled on the tour. The desire to minimize this term can be translated into connections between units that inhibit the selection of adjacent cities in proportion to the distance between those cities. Consider the term

$$\frac{D}{2} \sum_X \sum_Y \sum_i \sum_j d_{XY}\, v_{Xi}\, v_{Yj}\,(\delta_{j,i+1} + \delta_{j,i-1})$$

For a given column, j (i.e., for a given position on the tour), the two delta terms ensure that inhibitory connections are made only to units on adjacent columns. Units on adjacent columns represent cities that might come either before or after the cities on column j. The factor $-D d_{XY}$ ensures that the units representing cities farther apart will receive the largest inhibitory signal.

We can now define the entire connection matrix by adding the contributions of the previous four paragraphs:

$$T_{Xi,Yj} = -A\,\delta_{XY}(1 - \delta_{ij}) - B\,\delta_{ij}(1 - \delta_{XY}) - C - D\,d_{XY}(\delta_{j,i+1} + \delta_{j,i-1}) \quad (4.30)$$

The inhibitory connections between units are illustrated graphically in Figure 4.11.

Figure 4.11 This schematic illustrates the pattern of inhibitory connections between PEs for the TSP problem: Unit a illustrates the inhibition between units on a single row, unit b shows the inhibition within a single column, and unit c shows the inhibition of units in adjacent columns. The global inhibition is not shown.

To find a solution to the TSP, we must return to the equations that describe the time evolution of the network. Equation (4.24) is the one we want:

$$C\,\frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij} v_j - \frac{u_i}{R_i} + I_i$$

Here, we have used N as the summation limit to avoid confusion with the n previously defined. Because all of the terms in $T_{ij}$ contain arbitrary constants, and $I_i$ can be adjusted to any desired values, we can divide this equation by C and write

$$\frac{du_i}{dt} = \sum_{j=1}^{N} T_{ij} v_j - \frac{u_i}{\tau} + I_i$$

where $\tau = RC$, the system time constant, and we have assumed that $R_i = R$ for all i.

A digital simulation of this system requires that we integrate the above set of equations numerically. For a sufficiently small value of $\Delta t$, we can write

$$\Delta u_i = \Delta t \left( \sum_{j=1}^{N} T_{ij} v_j - \frac{u_i}{\tau} + I_i \right) \quad (4.31)$$

Then, we can iteratively update the $u_i$ values according to

$$u_i(t+1) = u_i(t) + \Delta u_i \quad (4.32)$$

where $\Delta u_i$ is given by Eq. (4.31). The final output values are then calculated using the output function

$$v_i = g_i(u_i) = \frac{1}{2}\bigl(1 + \tanh(\lambda u_i)\bigr)$$

Notice that, in these equations, we have returned to the subscript notation used in the discussion of the general system: $v_i$ rather than $v_{Yj}$. In the double-subscript notation, we have

$$u_{Xi}(t+1) = u_{Xi}(t) + \Delta u_{Xi} \quad (4.33)$$

and

$$v_{Xi} = g_{Xi}(u_{Xi}) = \frac{1}{2}\bigl(1 + \tanh(\lambda u_{Xi})\bigr) \quad (4.34)$$

If we substitute $T_{Xi,Yj}$ from Eq. (4.30) into Eq. (4.31), and define the external inputs as $I_{Xi} = Cn'$, with n' a constant, and C equal to the C in Eq. (4.30), the result takes on an interesting form (see Exercise 4.11):

$$\Delta u_{Xi} = \Delta t \left[ -\frac{u_{Xi}}{\tau} - A \sum_{j \ne i} v_{Xj} - B \sum_{Y \ne X} v_{Yi} - C\left(\sum_Y \sum_j v_{Yj} - n'\right) - D \sum_Y d_{XY}\bigl(v_{Y,i+1} + v_{Y,i-1}\bigr) \right] \quad (4.35)$$

Exercise 4.11: Assume that n' = n in Eq. (4.35). Then, the sum of terms, $-A(\;) - B(\;) - C(\;) - D(\;)$, has a simple relationship to the TSP energy function in Eq. (4.29). What is that relationship?
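In code, one relaxation step of Eqs. (4.31) through (4.35) is a pair of nested loops over cities and tour positions. The sketch below is a minimal illustration of that step, not the book's simulator: the [city][position] array layout, the function name, the circular treatment of tour positions, and the assumption that d[X][X] = 0 are all choices made here for clarity.

#include <math.h>

#define N_CITIES 10                      /* n: number of cities and tour positions */

/* One discrete-time step of Eqs. (4.31)-(4.35).  u and v are n x n matrices      */
/* indexed [city][position]; d holds the intercity distances d_XY (d[X][X] = 0).  */
void tsp_update(double u[N_CITIES][N_CITIES],
                double v[N_CITIES][N_CITIES],
                const double d[N_CITIES][N_CITIES],
                double A, double B, double C, double D,
                double n_prime, double tau, double lambda, double dt)
{
    int X, Y, i, j;
    double total = 0.0;                  /* global sum of all outputs              */

    for (X = 0; X < N_CITIES; X++)
        for (j = 0; j < N_CITIES; j++)
            total += v[X][j];

    for (X = 0; X < N_CITIES; X++) {
        for (i = 0; i < N_CITIES; i++) {
            double row = 0.0, col = 0.0, dist = 0.0;
            int prev = (i + N_CITIES - 1) % N_CITIES;   /* adjacent tour positions */
            int next = (i + 1) % N_CITIES;              /* (positions wrap around) */

            for (j = 0; j < N_CITIES; j++)      /* same city, other positions      */
                if (j != i) row += v[X][j];
            for (Y = 0; Y < N_CITIES; Y++)      /* same position, other cities     */
                if (Y != X) col += v[Y][i];
            for (Y = 0; Y < N_CITIES; Y++)      /* distance-weighted neighbors     */
                dist += d[X][Y] * (v[Y][next] + v[Y][prev]);

            /* Eq. (4.35): change in the internal activation u_Xi */
            u[X][i] += dt * (-u[X][i] / tau
                             - A * row
                             - B * col
                             - C * (total - n_prime)
                             - D * dist);
        }
    }

    /* Eq. (4.34): squash the new activations to obtain the outputs v_Xi */
    for (X = 0; X < N_CITIES; X++)
        for (i = 0; i < N_CITIES; i++)
            v[X][i] = 0.5 * (1.0 + tanh(lambda * u[X][i]));
}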
Exercise 4.12: Using the double-subscript notation on the outputs of the PEs, $v_{C3}$ refers to the output of the unit that represents city C in position 3 of the tour. This unit is also element $v_{33}$ of the output-unit matrix. What is the general equation that converts the dual subscripts of the matrix notation, $v_{jk}$, into the proper single subscript of the vector notation, $v_i$?

Exercise 4.13: There are 25 possible connections to unit $v_{C3} = v_{33}$ from other units in the five-city tour problem. Determine the values of the resistors, $R_{ij} = 1/|T_{ij}|$, that form those connections.

To complete the solution of the TSP, suitable values for the constants must be chosen, along with the initial values of the $u_{Xi}$. Hopfield [6] provides parameters suitable for a 10-city problem: A = B = 500, C = 200, D = 500, $\tau = 1$, $\lambda = 50$, and n' = 15. Notice that it is not necessary to choose n' = n. Because n' enters the equations through the external inputs, $I_i = Cn'$, it can be used as another adjustable parameter. These parameters must be chosen empirically, and those for a 10-city tour will not necessarily work for tours of different sizes.

We might be tempted to make all of the initial values of the $u_{Xi}$ equal to a constant $u_{00}$ such that, at t = 0, $\sum_X \sum_i v_{Xi} = n$, because that is what we expect that particular sum to be when the network has stabilized on a solution. Assigning initial values in that manner, however, has the effect of placing the system on an unstable equilibrium point, much like a ball placed at the exact top of a hill. Without at least a slight nudge, the ball would remain there forever. Given that nudge, however, the ball would roll down the hill. We can give our TSP system a nudge by adding a random noise term to the $u_{00}$ values, so that $u_{Xi} = u_{00} + \delta u_{Xi}$, where $\delta u_{Xi}$ is the random noise term, which may be different for each unit. In the ball-on-the-hill analogy, the direction of the nudge determines the direction in which the ball rolls off the hill. Likewise, different random-noise selections for the initial $u_{Xi}$ values may result in different final stable states.

Refer back to the discussion of optimization problems earlier in this section, where we said that a good solution now may be better than the best solution later. Hopfield's solution to the TSP may not always find the best solution (the one with the shortest possible distance), but repeated trials have shown that the network generally settles on tours at or near the minimum distance. Figure 4.12 shows a graphical representation of how a network would evolve toward a solution.

Figure 4.12 This sequence of diagrams illustrates the convergence of the Hopfield network for a 10-city TSP tour. The output values, $v_{Xi}$, are represented as squares at each location in the output-unit matrix. The size of the square is proportional to the magnitude of the output value. (a, b, c) At the intermediate steps, the system has not yet settled on a valid tour. The magnitude of the output values for these intermediate steps can be thought of as the current estimate of the confidence that a particular city will end up in a particular position on the tour. (d) The network has stabilized on the valid tour, DHIFGEAJCB. Source: Reprinted with permission of Springer-Verlag, Heidelberg, from J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems." Biological Cybernetics, 52:141-152, 1985.

We have discussed this example at great length to show both the power and the complexity of the Hopfield network. The example also illustrates a general principle about neural networks: For a given problem, finding an appropriate representation of the data or constraints is often the most difficult part of the solution.
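The initialization strategy described above (a uniform $u_{00}$ plus a small random nudge for each unit) might be sketched as follows. The closed form for $u_{00}$, obtained by inverting the output function so that the n-squared equal outputs sum to n, and the uniform noise term are illustrative assumptions, not values prescribed by the text.

#include <math.h>
#include <stdlib.h>

#define N_CITIES 10

/* Set u_Xi = u00 + du_Xi, where u00 makes the summed outputs equal n when every */
/* unit starts at the same value, and du_Xi is a small random nudge.             */
void tsp_init(double u[N_CITIES][N_CITIES], double lambda, double noise)
{
    /* With all units equal, each output must be 1/n so that the n*n outputs     */
    /* sum to n.  Inverting v = 0.5*(1 + tanh(lambda*u)) gives u00.              */
    double u00 = atanh(2.0 / N_CITIES - 1.0) / lambda;
    int X, i;

    for (X = 0; X < N_CITIES; X++)
        for (i = 0; i < N_CITIES; i++) {
            double du = noise * (2.0 * rand() / (double)RAND_MAX - 1.0);
            u[X][i] = u00 + du;
        }
}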
4.4 SIMULATING THE BAM

As you may already suspect, the implementation of the BAM network simulator will be straightforward. The only difficulty is the implementation of bidirectional connections between the layers, and, with a little finesse, this is a relatively easy problem to overcome. We shall begin by describing the general nature of the problems associated with modeling bidirectional connections in a sequential memory array. From there, we will present the data structures needed to overcome these problems while remaining compatible with our basic simulator. We conclude this section with a presentation of the algorithms needed to implement the BAM.

4.4.1 Bidirectional-Connection Considerations

Let us first consider the basic data structures we have defined for our simulator. We have assumed that all network PEs will be organized into layers, with connections primarily between the layers. Further, we have decided that the individual PEs within any layer will be simulated by processing inputs, with no provision for processing output connections. With respect to modeling bidirectional connections, we are faced with the dilemma of using a single connection as input to two different PEs. Thus, our parallel array structures for modeling network connections are no longer valid.

As an example, consider the weight matrix illustrated on page 136 as part of the discussion in Section 4.2. For clarity, we will consider this matrix as being an R x C array, where R = rows = 6 and C = columns = 10. Next, consider the implementation of this matrix in computer memory, as depicted in Figure 4.13. Since memory is organized as a one-dimensional linear array of cells (or bytes, words, etc.), most modern computer languages will allocate and maintain this matrix as a one-dimensional array of R vectors, each C cells long, arranged sequentially in the computer memory. (FORTRAN, which uses a column-major array organization, is the notable exception.) In this implementation, access to each row vector requires at least one multiplication (row index x number of columns per row) and an addition (to determine the memory address of the row, offset from the base address of the array). However, once the beginning of the row has been located, access to the individual components within the vector is simply an increment operation.

In the column-vector case, access to the data is not quite as easy. Simply put, each component of the column vector must be accessed by performing a multiplication (as before, to access the appropriate row), plus an addition to locate the appropriate cell. The penalty imposed by this approach is such that, for the entire column vector to be accessed, R multiplications must be performed. To access each element in the matrix as a component of a column vector, we must do R x C multiplications, or one for each element, a time-consuming process.

Figure 4.13 The row-major structure used to implement a matrix is shown. In this technique, memory is allocated sequentially so that column values within the same row are adjacent. This structure allows the computer to step through all values in a single row by simply incrementing a memory pointer.
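The cost difference between row access and column access is easy to see in a short sketch. The 6 x 10 dimensions simply mirror the example above; this is illustrative code, not part of the simulator.

#include <stdio.h>

#define R 6                 /* rows    */
#define C 10                /* columns */

int main(void)
{
    int w[R * C];           /* the matrix as one linear, row-major block */
    int i, j;

    for (i = 0; i < R * C; i++)
        w[i] = i;

    /* Row access: locate the row once (one multiply), then just increment. */
    int *row = &w[2 * C];                   /* base address + row * columns */
    for (j = 0; j < C; j++)
        printf("%d ", row[j]);
    printf("\n");

    /* Column access: every element needs the full multiply-and-add, so     */
    /* touching a whole column costs R multiplications.                     */
    for (i = 0; i < R; i++)
        printf("%d ", w[i * C + 3]);        /* row i, column 3              */
    printf("\n");

    return 0;
}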
4.4.2 BAM Simulator Data Structures

Since we have chosen to use the array-based model for our basic network data structure, we are faced with the complicated (and CPU-time-consuming) problem of accessing the network weight matrix first as a set of row vectors for the propagation from layer x to layer y, then accessing weights as a set of column vectors for the propagation in the other direction. Further complicating the situation is the fact that we have chosen to isolate the weight vectors in our network data structure, accessing each array indirectly through the intermediate weight_ptr array. If we hold strictly to this scheme, we must significantly modify the design of our simulator to allow access to the connections from both layers of PEs, a situation illustrated in Figure 4.14. As shown in this diagram, all the connection weights will be contained in a set of arrays associated with one layer of PEs. The connections back to the other layer must then be individually accessed by indexing into each array to extract the appropriate element.

Figure 4.14 This bidirectional connection implementation uses our standard data structures. Here, the connection arrays located by the layer y structure are identical to those previously described for the backpropagation simulator. However, the pointers associated with the layer x structure locate the connection in the first weights array that is associated with the column weight vector. Hence, stepping through connections to layer x requires locating the connection in each weights array at the same offset from the beginning of the array as the first connection.

To solve this dilemma, let's now consider a slight modification to the conceptual model of the BAM. Until now, we have considered the connections between the layers as one set of bidirectional paths; that is, signals can pass from layer x to layer y as well as from layer y to layer x. If we instead consider the connections as two sets of unidirectional paths, we can logically implement the same network if we simply connect the outputs of the x layer to the inputs on the y layer and, similarly, connect the outputs of the y layer to the inputs on the x layer. To complete this model, we must initialize the connections from x to y with the predetermined weight matrix, while the connections from y to x must contain the transpose of the weight matrix. This strategy allows us to process only inputs at each PE and, since the connections are always accessed in the desired row-major form, allows efficient signal propagation through the simulator, regardless of direction.

The disadvantage to this approach is that it consumes twice as much memory as does the single-matrix implementation. There is not much that we can do to solve this problem other than reverting to the single-matrix model. Even a linked-list implementation will not solve the problem, as it will require approximately three times the memory of the single-matrix model. Thus, in terms of memory consumption, the single-matrix model is the most efficient implementation. However, as we have already seen, there are performance issues that must be considered when we use the single matrix. We therefore choose to implement the double matrix, because run-time performance, especially in a large network application, must be good enough to prevent long periods of dead time while the human operator waits for the computer to arrive at a solution.
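Before turning to the record definitions, a minimal sketch may make the double-matrix idea concrete. The first routine builds the y-to-x weight array as the transpose of the x-to-y array; the second propagates signals from x to y using only sequential, row-major accesses. The function names, argument layout, and the bipolar threshold convention (a net input of zero leaves the previous output unchanged) are assumptions for illustration, not the book's simulator routines.

/* Build the y->x weight array as the transpose of the x->y array so that       */
/* propagation in either direction steps through memory sequentially.           */
/* w_xy is stored row-major with one row of x_len weights per y-layer unit.     */
void build_transpose(const int *w_xy, int *w_yx, int x_len, int y_len)
{
    int i, j;

    for (i = 0; i < x_len; i++)              /* row i of w_yx feeds x unit i    */
        for (j = 0; j < y_len; j++)
            w_yx[i * y_len + j] = w_xy[j * x_len + i];
}

/* Propagate the x-layer outputs to the y layer: each y unit forms the dot      */
/* product of its own weight row with the x outputs, then thresholds.           */
void propagate_x_to_y(const int *w_xy, const int *x_out, int *y_out,
                      int x_len, int y_len)
{
    int i, j;

    for (i = 0; i < y_len; i++) {
        long net = 0;

        for (j = 0; j < x_len; j++)
            net += (long)w_xy[i * x_len + j] * x_out[j];

        if (net > 0)
            y_out[i] = 1;
        else if (net < 0)
            y_out[i] = -1;
        /* net == 0: y_out[i] is left unchanged */
    }
}

Propagation from y back to x is the same routine applied to the transposed array with the roles of the two layers exchanged, which is precisely the payoff of storing both matrices.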
The remainder of the network is completely compatible with our generic network data structures. For the BAM, we begin by defining a network with two layers:

record BAM =
    X : ~layer;    {pointer to first layer record}
    Y : ~layer;    {pointer to second layer record}
end record;

As before, we now consider the implementation of the layers themselves. In the case of the BAM, a layer structure is simply a record used to contain pointers to the outputs and weight_ptr arrays. Such a record is defined by the structure

record LAYER =
    OUTS    : ~integer[];     {pointer to node outputs array}
    WEIGHTS : ~~integer[];    {pointer to weight_ptr array}
end record;

Notice that we have specified integer values for the outputs and weights in the network. This is a benefit derived from the binary nature of the network, and from the fact that the individual connection weights are given by the dot product between two integer vectors, resulting in an integer value. We use integers in this model, since most computers can process integer values much faster than they can floating-point values. Hence, the performance improvement of the simulator for large BAM applications justifies the use of integers.

We now define the three arrays needed to store the node outputs, the connection weights, and the intermediate weight_ptr. These arrays will be sized dynamically to conform to the desired BAM network structure. In the case of the outputs arrays, one will contain x integer values, whereas the other must be sized to contain y integers. The weight_ptr array will contain a memory pointer for each PE on the layer; that is, x pointers will be required to locate the connection arrays for each node on the x layer, and y pointers for the connections to the y layer. Conversely, each of the weights arrays must be sized to accommodate an integer value for each connection to the layer from the input layer. Thus, each weights array on the x layer will contain y values, whereas the weights arrays on the y layer will each contain x values. The complete BAM data structure is illustrated in Figure 4.15.

Figure 4.15 The data structures for the BAM simulator are shown. Notice the difference in the implementation of the connection arrays in this model and in the single-matrix model described earlier.

4.4.3 BAM Initialization Algorithms

As we have noted earlier, the BAM is different from most of the other ANS networks discussed in this text, in that it is not trained; rather, it is initialized. Specifically, it is initialized from the set of training vectors that it will be required to recall. To develop this algorithm, we use the formula used previously to generate the weight matrix for the BAM, given by Eq. (4.6), and repeated here as the sum of outer products over the L training pairs:

$$\mathbf{w} = \sum_{i=1}^{L} \mathbf{y}_i \mathbf{x}_i^t$$
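Element by element, this outer-product sum translates into a triple loop over the training pairs and the weight positions. The following sketch assumes the training vectors are already in bipolar (+1/-1) form and stored contiguously; the routine name and argument layout are illustrative assumptions rather than the book's code.

/* Initialize the x->y weight matrix from n_pairs training pairs (Eq. 4.6):     */
/*   w[i][j] = sum over pairs of y_pair[i] * x_pair[j]   (bipolar values +/-1)  */
void bam_init_weights(int *w_xy, const int *x_pairs, const int *y_pairs,
                      int x_len, int y_len, int n_pairs)
{
    int p, i, j;

    for (i = 0; i < y_len; i++)              /* clear the matrix first          */
        for (j = 0; j < x_len; j++)
            w_xy[i * x_len + j] = 0;

    for (p = 0; p < n_pairs; p++)            /* accumulate one outer product    */
        for (i = 0; i < y_len; i++)          /* per training pair               */
            for (j = 0; j < x_len; j++)
                w_xy[i * x_len + j] +=
                    y_pairs[p * y_len + i] * x_pairs[p * x_len + j];
}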
[2] Michael R. Garey and David S. Johnson. Computers and Intractability. W. H. Freeman, New York, 1979.

[3] Morris W. Hirsch and Stephen Smale. Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York, 1974.

[4] John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA, 79:2554-2558, April 1982.

[5] John J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81:3088-3092, May 1984.

[6] John J. Hopfield and David W. Tank. "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52:141-152, 1985.

[7] John J. Hopfield and David W. Tank. Computing with neural circuits: A model. Science, 233:625-633, August 1986.

[8] Bart Kosko. Adaptive bidirectional associative memories. Applied Optics, 26(23):4947-4960, 1987.

... IEEE Transactions on Information Theory, 35(1):59-68, January 1989.

[13] David W. Tank and John J. Hopfield. Collective computation in neuronlike circuits. Scientific American, 257(6):104-114, December 1987.

[14] Gene A. Tagliarini and Edward W. Page. Solving constraint satisfaction problems with neural networks. In Proceedings of the First IEEE Conference on Neural Networks, San Diego, CA, III:741-747, June 1987.

5.1 Information Theory and Statistical Mechanics

... applies to both physical and information systems. We wish to extend the analogy along the following lines: If our neural networks have energy and entropy, is it possible to define a temperature parameter that has meaning for neural networks? If so, what is the benefit to be gained by defining such a parameter? Can we place our neural network in contact with a fictitious heat reservoir, and again, is there ...

... as in Eq. (5.5). Since $\log_2 p_{2i} = \log_2(1/n)$ is independent of i, and $\sum_i (p_{1i} - p_{2i}) = \sum_i p_{1i} - \sum_i p_{2i}$ ...

... The derivative of the energy function is

$$\frac{\partial E_{mn}}{\partial w_{ij}} = -x_i^{mn} x_j^{mn} \quad (5.29)$$

and the derivative of the partition function is

$$\frac{\partial Z}{\partial w_{ij}} = \frac{1}{T} \sum_{m,n} x_i^{mn} x_j^{mn}\, e^{-E_{mn}/T} \quad (5.30)$$

Substituting Eqs. (5.29) and (5.30) into Eq. (5.28) yields Eq. (5.31), where we have made use of the definition of $P^-(V_a \wedge H_b)$ and the definition of $P^-(V_a)$. Equation (5.31) can now ... $\frac{P^+(V_a \wedge H_b)}{P^+(V_a)}$ and $\frac{P^-(V_a \wedge H_b)}{P^-(V_a)} = \frac{P^+(V_a \wedge H_b)}{P^+(V_a)}$. Using these results, we can write

$$\frac{\partial G}{\partial w_{ij}} = -\frac{1}{T}\bigl(p_{ij}^+ - p_{ij}^-\bigr)$$

where

$$p_{ij}^+ = \sum_{a,b} P^+(V_a \wedge H_b)\, x_i^{ab} x_j^{ab} \quad (5.33)$$

and

$$p_{ij}^- = \sum_{a,b} P^-(V_a \wedge H_b)\, x_i^{ab} x_j^{ab} \quad (5.34)$$

The interpretation of Eqs. (5.33) and (5.34) will be given shortly. For now, recall that weight changes occur in the direction of the negative gradient of G. Weight updates are calculated according to

$$\Delta w_{ij} = \epsilon\bigl(p_{ij}^+ - p_{ij}^-\bigr) \quad (5.35)$$

where $\epsilon$ is a ...
