Part I
MATHEMATICAL PRELIMINARIES
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
2 Random Vectors and Independence
In this chapter, we review central concepts of probability theory, statistics, and random
processes. The emphasis is on multivariate statistics and random vectors. Matters
that will be needed later in this book are discussed in more detail, including, for
example, statistical independence and higher-order statistics. The reader is assumed
to have a basic knowledge of single-variable probability theory, so that fundamental
definitions such as probability, elementary events, and random variables are familiar.
Readers who already have a good knowledge of multivariate statistics can skip most
of this chapter. For those who need a more extensive review or more information on
advanced matters, many good textbooks ranging from elementary ones to advanced
treatments exist. A widely used textbook covering probability, random variables, and
stochastic processes is [353].
2.1 PROBABILITY DISTRIBUTIONS AND DENSITIES
2.1.1 Distribution of a random variable
In this book, we assume that random variables are continuous-valued unless stated
otherwise. The cumulative distribution function (cdf) F_x(x_0) of a random variable x at point x = x_0 is defined as the probability that x ≤ x_0:

F_x(x_0) = P(x ≤ x_0)   (2.1)

Allowing x_0 to change from -∞ to ∞ defines the whole cdf for all values of x.
Clearly, for continuous random variables the cdf is a nonnegative, nondecreasing (often monotonically increasing) continuous function whose values lie in the interval [0, 1].
Fig. 2.1 A gaussian probability density function with mean m and standard deviation σ.
From the definition, it also follows directly that F_x(-∞) = 0 and F_x(∞) = 1.
Usually a probability distribution is characterized in terms of its density function
rather than the cdf. Formally, the probability density function (pdf) p_x(x) of a continuous random variable x is obtained as the derivative of its cumulative distribution function:

p_x(x) = dF_x(x)/dx   (2.2)

In practice, the cdf is computed from the known pdf by using the inverse relationship

F_x(x_0) = ∫_{-∞}^{x_0} p_x(ξ) dξ   (2.3)
For simplicity, F_x(x) is often denoted by F(x), and p_x(x) by p(x), respectively. The subscript referring to the random variable in question must be used when confusion is possible.
Example 2.1 The gaussian (or normal) probability distribution is used in numerous
models and applications, for example to describe additive noise. Its density function
is given by

p_x(x) = (1/√(2πσ²)) exp( -(x - m)² / (2σ²) )   (2.4)
Here the parameter m (mean) determines the peak point of the symmetric density function, and σ (standard deviation) its effective width (flatness or sharpness of the peak). See Figure 2.1 for an illustration.
Generally, the cdf of the gaussian density cannot be evaluated in closed form using (2.3). The term 1/√(2πσ²) in front of the exponential in (2.4) is a normalizing factor that guarantees that the cdf becomes unity when x_0 → ∞. However, the values of the cdf can be computed numerically using, for example, tabulated values of the error function

erf(x) = (2/√π) ∫_0^x exp(-t²) dt   (2.5)

The error function is closely related to the cdf of a normalized gaussian density, for which the mean m = 0 and the variance σ² = 1. See [353] for details.
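In code, the connection between the error function and the gaussian cdf can be used directly. A minimal Python sketch using only the standard library; the identity F(x) = (1/2)[1 + erf((x - m)/(σ√2))] follows from a change of variables in (2.3):

```python
import math

def gaussian_pdf(x, m=0.0, sigma=1.0):
    """Density (2.4): (1/sqrt(2*pi*sigma^2)) * exp(-(x-m)^2 / (2*sigma^2))."""
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def gaussian_cdf(x, m=0.0, sigma=1.0):
    """cdf evaluated via the error function, since (2.3) has no closed form here."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))
```

For instance, `gaussian_cdf(0.0)` returns 0.5, as it must for the symmetric zero-mean density.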
2.1.2 Distribution of a random vector
Assume now that x is an n-dimensional random vector

x = (x_1, x_2, ..., x_n)^T   (2.6)

where T denotes the transpose. (We take the transpose because all vectors in this book are column vectors. Note that vectors are denoted by boldface lowercase letters.) The components x_i of the column vector x are continuous random variables.
The concept of probability distribution generalizes easily to such a random vector.
In particular, the cumulative distribution function of x is defined by

F_x(x_0) = P(x ≤ x_0)   (2.7)

where P(·) again denotes the probability of the event in parentheses, and x_0 is some constant value of the random vector x. The notation x ≤ x_0 means that each component of the vector x is less than or equal to the respective component of the vector x_0. The multivariate cdf in Eq. (2.7) has similar properties to that of a single random variable. It is a nondecreasing function of each component, with values lying in the interval [0, 1]. When all the components of x_0 approach infinity, F_x(x_0) achieves its upper limit 1; when any component x_{0,i} → -∞, F_x(x_0) = 0.
The multivariate probability density function p_x(x) of x is defined as the derivative of the cumulative distribution function F_x(x) with respect to all components of the random vector x:

p_x(x) = ∂^n F_x(x) / (∂x_1 ∂x_2 ··· ∂x_n)   (2.8)

Hence

F_x(x_0) = ∫_{-∞}^{x_{0,n}} ··· ∫_{-∞}^{x_{0,1}} p_x(ξ) dξ_1 ··· dξ_n   (2.9)
where ξ_i is the ith component of the vector ξ. Clearly,

∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} p_x(ξ) dξ_1 ··· dξ_n = 1   (2.10)

This provides the appropriate normalization condition that a true multivariate probability density must satisfy.
In many cases, random variables have nonzero probability density functions only
on certain finite intervals. An illustrative example of such a case is presented below.
Example 2.2 Assume that the probability density function of a two-dimensional random vector x = (x, y)^T is nonzero only in the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and zero elsewhere.

Let us now compute the cumulative distribution function of x. It is obtained by integrating over both x and y, taking into account the limits of the regions where the density is nonzero. When either x < 0 or y < 0, the density and consequently also the cdf is zero. In the region where 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, the cdf is given by the double integral of the density over [0, x] × [0, y]. In the region where 0 ≤ x ≤ 1 and y > 1, the upper limit in integrating over y becomes equal to 1, and the cdf is obtained by inserting y = 1 into the preceding expression. Similarly, in the region x > 1 and 0 ≤ y ≤ 1, the cdf is obtained by inserting x = 1 into the preceding formula. Finally, if both x > 1 and y > 1, the cdf becomes unity, showing that the probability density has been normalized correctly. Collecting these results yields the complete piecewise expression of the cdf.
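The region-by-region integration of Example 2.2 can be checked numerically. The sketch below assumes, purely for illustration, the density p(x, y) = x + y on the unit square (a hypothetical stand-in, chosen because it is a valid pdf there: it integrates to one); the cdf is approximated by a midpoint-rule Riemann sum:

```python
def cdf(x0, y0, n=200):
    """Riemann-sum approximation of F(x0, y0) = integral of p over [0, x0] x [0, y0],
    for the illustrative density p(x, y) = x + y on the unit square (an assumption,
    not the density of the book's example)."""
    # Clip to the support: the density is zero outside [0, 1] x [0, 1],
    # so integrating beyond the square adds nothing.
    x0, y0 = min(max(x0, 0.0), 1.0), min(max(y0, 0.0), 1.0)
    hx, hy = x0 / n, y0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * hx, (j + 0.5) * hy   # midpoint of each cell
            total += (u + v) * hx * hy
    return total
```

For this particular density the analytic cdf on the square is F(x, y) = xy(x + y)/2, so for example `cdf(1.0, 1.0)` is 1 (correct normalization) and `cdf(0.5, 0.5)` is 0.125.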
2.1.3 Joint and marginal distributions
The joint distribution of two different random vectors can be handled in a similar manner. In particular, let y be another random vector having in general a dimension different from the dimension of x. The vectors x and y can be concatenated to a "supervector" z = (x^T, y^T)^T, and the preceding formulas used directly. The cdf that arises is called the joint distribution function of x and y, and is given by

F_{x,y}(x_0, y_0) = P(x ≤ x_0, y ≤ y_0)   (2.11)

Here x_0 and y_0 are some constant vectors having the same dimensions as x and y, respectively, and Eq. (2.11) defines the joint probability of the event x ≤ x_0 and y ≤ y_0.
The joint density function p_{x,y}(x, y) of x and y is again defined formally by differentiating the joint distribution function F_{x,y}(x, y) with respect to all components of the random vectors x and y. Hence, the relationship

F_{x,y}(x_0, y_0) = ∫_{-∞}^{x_0} ∫_{-∞}^{y_0} p_{x,y}(ξ, η) dη dξ   (2.12)

holds, and the value of this integral equals unity when both x_0 → ∞ and y_0 → ∞. The marginal densities p_x(x) of x and p_y(y) of y are obtained by integrating over the other random vector in their joint density p_{x,y}(x, y):

p_x(x) = ∫_{-∞}^{∞} p_{x,y}(x, η) dη   (2.13)

p_y(y) = ∫_{-∞}^{∞} p_{x,y}(ξ, y) dξ   (2.14)
Example 2.3 Consider the joint density given in Example 2.2. The marginal densities of the random variables x and y are obtained by integrating the joint density over the other variable as in (2.13) and (2.14); each marginal is nonzero only on the interval [0, 1] and zero elsewhere.
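Continuing with the illustrative density p(x, y) = x + y on the unit square (again an assumption, not the density of the book's example), the marginal of x follows from (2.13) by integrating out y; the analytic result is p_x(x) = x + 1/2 on [0, 1]:

```python
def marginal_x(x, n=1000):
    """Marginal p_x(x) = integral over y of p(x, y), computed by the midpoint rule,
    for the assumed illustrative density p(x, y) = x + y on the unit square."""
    if not 0.0 <= x <= 1.0:
        return 0.0          # zero outside the support
    h = 1.0 / n
    # Sum p(x, y_j) * h over midpoints y_j of [0, 1]; exact here since p is linear in y.
    return sum((x + (j + 0.5) * h) * h for j in range(n))
```

As a check, `marginal_x(0.3)` agrees with the analytic value 0.3 + 0.5 = 0.8.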
2.2 EXPECTATIONS AND MOMENTS
2.2.1 Definition and general properties
In practice, the exact probability density function of a vector or scalar valued random
variable is usually unknown. However, one can use instead expectations of some
functions of that random variable for performing useful analyses and processing. A
great advantage of expectations is that they can be estimated directly from the data,
even though they are formally defined in terms of the density function.
Let g(x) denote any quantity derived from the random vector x. The quantity g(x) may be either a scalar, vector, or even a matrix. The expectation of g(x) is denoted by E{g(x)}, and is defined by

E{g(x)} = ∫_{-∞}^{∞} g(x) p_x(x) dx   (2.15)

Here the integral is computed over all the components of x. The integration operation is applied separately to every component of the vector or element of the matrix, yielding as a result another vector or matrix of the same size. If g(x) = x, we get the expectation E{x} of x; this is discussed in more detail in the next subsection.
Expectations have some important fundamental properties.
1. Linearity. Let x_1, x_2, ..., x_m be a set of different random vectors, and a_1, a_2, ..., a_m some nonrandom scalar coefficients. Then

E{a_1 x_1 + a_2 x_2 + ··· + a_m x_m} = a_1 E{x_1} + a_2 E{x_2} + ··· + a_m E{x_m}   (2.16)

2. Linear transformation. Let x be an n-dimensional random vector, and A and B some nonrandom m × n and n × m matrices, respectively. Then

E{Ax} = A E{x},   E{x^T B} = E{x^T} B   (2.17)

3. Transformation invariance. Let y = g(x) be a vector-valued function of the random vector x. Then

∫_{-∞}^{∞} y p_y(y) dy = ∫_{-∞}^{∞} g(x) p_x(x) dx   (2.18)

Thus E{y} = E{g(x)}, even though the integrations are carried out over different probability density functions.
These properties can be proved using the definition of the expectation operator and properties of probability density functions. They are important and very helpful in practice, allowing expressions containing expectations to be simplified without actually needing to compute any integrals (except possibly in the final phase).
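These properties are also easy to verify empirically, since expectations can be estimated by sample averages. A small sketch of property 1 (linearity) on simulated samples; the distributions and coefficients below are arbitrary illustrations:

```python
import random

random.seed(0)
N = 200_000
# Samples of two scalar random variables: x ~ N(1, 4), y ~ Uniform(0, 4).
xs = [random.gauss(1.0, 2.0) for _ in range(N)]
ys = [random.uniform(0.0, 4.0) for _ in range(N)]

def mean(v):
    return sum(v) / len(v)

# Linearity (2.16): E{a*x + b*y} = a*E{x} + b*E{y}, estimated from the samples.
a, b = 3.0, -2.0
lhs = mean([a * x + b * y for x, y in zip(xs, ys)])
rhs = a * mean(xs) + b * mean(ys)
```

Here `lhs` and `rhs` agree up to floating-point rounding, and both approximate the true value a·1 + b·2 = -1 with an error that shrinks as N grows.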
2.2.2 Mean vector and correlation matrix
Moments of a random vector x are typical expectations used to characterize it. They are obtained when g(x) consists of products of components of x. In particular, the first moment of a random vector x is called the mean vector m_x of x. It is defined as the expectation of x:

m_x = E{x}   (2.19)

Each component m_i of the n-vector m_x is given by

m_i = E{x_i} = ∫_{-∞}^{∞} x_i p_{x_i}(x_i) dx_i   (2.20)

where p_{x_i}(x_i) is the marginal density of the ith component x_i of x. This is because integrals over all the other components of x reduce to unity due to the definition of the marginal density.
Another important set of moments consists of correlations between pairs of components of x. The correlation r_ij between the ith and jth component of x is given by the second moment

r_ij = E{x_i x_j}   (2.21)

Note that correlation can be negative or positive. The correlation matrix

R_x = E{x x^T}   (2.22)

of the vector x represents in a convenient form all its correlations, r_ij being the element in row i and column j of R_x.
The correlation matrix has some important properties:
1. It is a symmetric matrix: R_x^T = R_x.

2. It is positive semidefinite:

a^T R_x a ≥ 0   (2.23)

for all n-vectors a. Usually in practice R_x is positive definite, meaning that for any nonzero vector a, (2.23) holds as a strict inequality.

3. All the eigenvalues of R_x are real and nonnegative (positive if R_x is positive definite). Furthermore, all the eigenvectors of R_x are real, and can always be chosen so that they are mutually orthonormal.
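A quick numerical check of these three properties, using a sample estimate of R_x computed from linearly mixed gaussian data (the mixing matrix below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# 5000 samples of a 3-dimensional random vector x = M^T * (gaussian noise).
M = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.7]])
X = rng.standard_normal((5000, 3)) @ M

R = (X.T @ X) / X.shape[0]          # sample estimate of R_x = E{x x^T}

# Property 1: symmetry, R_x^T = R_x.
assert np.allclose(R, R.T)
# Property 2: positive semidefiniteness, a^T R_x a >= 0 for any a.
a = rng.standard_normal(3)
assert a @ R @ a >= 0.0
# Property 3: real, nonnegative eigenvalues; orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(R)
assert np.all(eigvals >= 0) and np.allclose(eigvecs.T @ eigvecs, np.eye(3))
```

`numpy.linalg.eigh` exploits the symmetry of R, returning real eigenvalues and an orthonormal set of eigenvectors, exactly as property 3 promises.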
Higher-order moments can be defined analogously, but their discussion is postponed to Section 2.7. Instead, we shall first consider the corresponding central moments, and second-order moments for two different random vectors.
2.2.3 Covariances and joint moments
Central moments are defined in a similar fashion to usual moments, but the mean vectors of the random vectors involved are subtracted prior to computing the expectation. Clearly, central moments are only meaningful above the first order. The quantity corresponding to the correlation matrix R_x is called the covariance matrix C_x of x, and is given by

C_x = E{(x - m_x)(x - m_x)^T}   (2.24)

The elements

c_ij = E{(x_i - m_i)(x_j - m_j)}   (2.25)

of the n × n matrix C_x are called covariances, and they are the central moments corresponding to the correlations r_ij defined in Eq. (2.21).¹
The covariance matrix C_x satisfies the same properties as the correlation matrix R_x. Using the properties of the expectation operator, it is easy to see that

C_x = R_x - m_x m_x^T   (2.26)

If the mean vector m_x = 0, the correlation and covariance matrices become the same. If necessary, the data can easily be made zero mean by subtracting the (estimated) mean vector from the data vectors as a preprocessing step. This is a usual practice in independent component analysis, and thus in later chapters, we simply denote by C_x the correlation/covariance matrix, often even dropping the subscript x for simplicity.

For a single random variable x, the mean vector reduces to its mean value m_x = E{x}, the correlation matrix to the second moment E{x²}, and the covariance matrix to the variance of x:

σ_x² = E{(x - m_x)²}   (2.27)

The relationship (2.26) then takes the simple form σ_x² = E{x²} - m_x².
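Relationship (2.26) also holds exactly for sample estimates, provided the mean, correlation, and covariance estimates all use the same divisor n; this gives a convenient sanity check:

```python
import numpy as np

rng = np.random.default_rng(2)
# 100000 samples of a 2-dimensional gaussian vector with nonzero mean (1, -3).
X = rng.standard_normal((100_000, 2)) + np.array([1.0, -3.0])

m = X.mean(axis=0)                       # estimate of m_x
R = (X.T @ X) / X.shape[0]               # estimate of R_x = E{x x^T}
C = (X - m).T @ (X - m) / X.shape[0]     # estimate of C_x, divisor n

# (2.26): C_x = R_x - m_x m_x^T, an algebraic identity for these estimates.
assert np.allclose(C, R - np.outer(m, m), atol=1e-8)
```

Note the divisor n (not the unbiased n - 1); with n - 1 the identity (2.26) would only hold approximately for finite samples.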
The expectation operation can be extended for functions g(x, y) of two different random vectors x and y in terms of their joint density:

E{g(x, y)} = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) p_{x,y}(x, y) dx dy   (2.28)

The integrals are computed over all the components of x and y. Of the joint expectations, the most widely used are the cross-correlation matrix

R_xy = E{x y^T}   (2.29)
¹ In classic statistics, the correlation coefficients ρ_ij = c_ij / (σ_{x_i} σ_{x_j}) are used, and the matrix consisting of them is called the correlation matrix. In this book, the correlation matrix is defined by the formula (2.22), which is a common practice in signal processing, neural networks, and engineering.
Fig. 2.2 An example of negative covariance between the random variables x and y.

Fig. 2.3 An example of zero covariance between the random variables x and y.
and the cross-covariance matrix

C_xy = E{(x - m_x)(y - m_y)^T}   (2.30)

Note that the dimensions of the vectors x and y can be different. Hence, the cross-correlation and cross-covariance matrices are not necessarily square matrices, and they are not symmetric in general. However, from their definitions it follows easily that

R_xy = R_yx^T,   C_xy = C_yx^T   (2.31)
If the mean vectors of x and y are zero, the cross-correlation and cross-covariance matrices become the same. The covariance matrix C_z of the sum z = x + y of two random vectors x and y having the same dimension is often needed in practice. It is easy to see that

C_z = C_x + C_xy + C_yx + C_y   (2.32)
Correlations and covariances measure the dependence between the random vari-
ables using their second-order statistics. This is illustrated by the following example.
Example 2.4 Consider the two different joint distributions of the zero mean scalar random variables x and y shown in Figs. 2.2 and 2.3. In Fig. 2.2, x and y have a clear negative covariance (or correlation). A positive value of x mostly implies that y is negative, and vice versa. On the other hand, in the case of Fig. 2.3, it is not possible to infer anything about the value of y by observing x. Hence, their covariance c_xy = 0.
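The two situations of Figs. 2.2 and 2.3 are easy to reproduce by simulation; since both pairs below are zero mean, the sample mean of the product x·y estimates the covariance (the coefficients are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Fig. 2.2 situation: negatively correlated pair; y1 contains a -0.8*x1 component
# (the 0.6 factor keeps the variance of y1 equal to 0.64 + 0.36 = 1).
x1 = rng.standard_normal(n)
y1 = -0.8 * x1 + 0.6 * rng.standard_normal(n)

# Fig. 2.3 situation: independent pair, hence zero covariance.
x2 = rng.standard_normal(n)
y2 = rng.standard_normal(n)

c_neg = np.mean(x1 * y1)    # estimates c_xy = -0.8
c_zero = np.mean(x2 * y2)   # estimates c_xy = 0
```

As the sample size grows, `c_neg` converges to -0.8 and `c_zero` to 0, matching the visual impression of the two scatter plots.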
[...]

Each component of x is perfectly correlated with itself. The best that we can achieve is that different components of x are mutually uncorrelated, leading to the uncorrelatedness condition

C_x = E{(x - m_x)(x - m_x)^T} = D   (2.41)

Here D is an n × n diagonal matrix

D = diag(c_11, c_22, ..., c_nn) = diag(σ²_{x_1}, σ²_{x_2}, ..., σ²_{x_n})   (2.42)

whose n diagonal elements are the variances σ²_{x_i} = E{(x_i - m_{x_i})²} = c_ii of the components of x. Whitening of the original data can be made in infinitely many ways; it will be discussed in more detail in Chapter 6, because it is a highly useful and widely used preprocessing step in independent component analysis. There also exist infinitely many ways to decorrelate the original data, because whiteness is a special case of the uncorrelatedness property.

[...]

Often the components of the noise vector n are all uncorrelated and have equal variance σ², so that its correlation matrix is R_n = σ²I (2.51). Sometimes, for example in a noisy version of the ICA model (Chapter 15), the components of the signal vector are also mutually uncorrelated, so that the signal correlation matrix becomes the diagonal matrix D_s = diag(E{s_1²}, E{s_2²}, ..., E{s_m²}).

[...]

2.3.2 Statistical independence

A key concept that constitutes the foundation of independent component analysis is statistical independence. For simplicity, consider first the case of two different scalar random variables x and y. The random variable x is independent of y if knowing the value of y does not give any information on the value of x.

[...]

An important property of the gaussian pdf is that linear processing methods based on first- and second-order statistical information are usually optimal for gaussian data. For example, independent component analysis does not bring out anything new compared with standard principal component analysis (to be discussed later) for gaussian data. If y = Ax is a linear transformation of a gaussian random vector x, then y is gaussian with covariance matrix C_y = A C_x A^T; a special case of this result says that any linear combination of gaussian random variables is itself gaussian. This result again has implications in standard independent component analysis: it is impossible to estimate the ICA model for gaussian data, that is, one cannot blindly separate gaussian sources from their mixtures.

[...]

For independent and identically distributed random vectors z_i having a common mean m_z and covariance matrix C_z, the limiting distribution (as k → ∞) of the random vector

y_k = (1/√k) Σ_{i=1}^{k} (z_i - m_z)   (2.78)

is multivariate gaussian with zero mean and covariance matrix C_z. This central limit theorem has important consequences in independent component analysis and blind source separation.

[...]

Independent component analysis and blind source separation require the use of higher-order statistics, either directly or indirectly via nonlinearities.