32 Inverse Problems in Microphone Arrays


A. C. Surendran. "Inverse Problems in Microphone Arrays." 2000 CRC Press LLC. <http://www.engnetbase.com>.

Inverse Problems in Microphone Arrays

A. C. Surendran
Bell Laboratories, Lucent Technologies

32.1 Introduction: Dereverberation Using Microphone Arrays
32.2 Simple Delay-and-Sum Beamformers
     A Brief Look at Adaptive Arrays • Constrained Adaptive Beamforming Formulated as an Inverse Problem • Multiple Beamforming
32.3 Matched Filtering
32.4 Diophantine Inverse Filtering Using the Multiple Input-Output (MINT) Model
32.5 Results
     Speaker Identification
32.6 Summary
References

32.1 Introduction: Dereverberation Using Microphone Arrays

An acoustic enclosure usually reduces the intelligibility of the speech transmitted through it because the transmission path is not ideal. Apart from the direct signal from the source, the sound is also reflected off one or more surfaces (usually walls) before reaching the receiver. The resulting signal can be viewed as the output of a convolution in the time domain of the speech signal and the room impulse response. This phenomenon affects the quality of the transmitted sound in important applications such as teleconferencing, cellular telephony, and automatic voice activated systems (speaker and speech recognizers). Room reverberation can be perceptually separated into two broad classes. Early room echoes are manifested as irregularities or "ripples" in the amplitude spectrum. This effect dominates in small rooms, typically offices. Long-term reverberation is typically exhibited as an echo "tail" following the direct sound [1].

If the transfer function G(z) of the system is known, it might be possible to remove the deleterious multi-path effects by inverse filtering the output using a filter H(z) where

$$H(z) = \frac{1}{G(z)}. \tag{32.1}$$

Typically G(z) is the transform of the impulse response of the room g(n). In general, the transfer function of a reverberant environment is a non-minimum phase function, i.e., all the zeros of the function do not necessarily lie inside |z| = 1. A minimum phase function has a stable causal inverse, while the inverse of a non-minimum phase function is acausal and, in general, infinite in length.
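The minimum-phase condition above can be checked numerically by locating the zeros of G(z). The following is a minimal sketch, assuming the room response is available as a short FIR sequence; the function name `is_minimum_phase` and the two toy responses are hypothetical and chosen only for illustration.

```python
import numpy as np

def is_minimum_phase(g, tol=1e-9):
    """Return True if every zero of the FIR response g lies strictly inside |z| = 1."""
    # G(z) = g[0] + g[1] z^-1 + ... + g[m] z^-m has the same zeros as the
    # polynomial g[0] z^m + g[1] z^(m-1) + ... + g[m], which np.roots solves.
    zeros = np.roots(g)
    return bool(np.all(np.abs(zeros) < 1.0 - tol))

# Hypothetical response: direct sound plus a weaker echo (zero at z = -0.5).
g_min = np.array([1.0, 0.5])
# Hypothetical response: echo stronger than the direct sound (zero at z = -2).
g_nonmin = np.array([0.5, 1.0])

print(is_minimum_phase(g_min), is_minimum_phase(g_nonmin))
```

In this toy case the first response has a stable causal inverse, while the second does not, which is the situation the text describes for typical reverberant rooms.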
© 1999 by CRC Press LLC

In general, G(z) can be expressed as a product of a minimum-phase function and a maximum-phase function:

$$G(z) = G_{\min}(z) \cdot G_{\max}(z). \tag{32.2}$$

Many approaches have been proposed for dereverberating signals. The aim of all the compensation schemes is to bring the impulse response of the system after dereverberation as close as possible to an impulse function. Homomorphic filtering techniques were used to estimate the minimum phase part of G(z) [2, 3]. In [2], the minimum phase component was estimated by zeroing out the cepstrum for negative quefrencies (negative time indices). Then the output signal was filtered by the inverse of the minimum phase transfer function. But this technique still did not remove the reverberation contributed by the maximum-phase part of the room response. In [3], the inverse of the maximum-phase part was also estimated from the delayed and truncated version of the acausal inverse. But the delay can be inordinate and care must be taken to avoid temporal aliasing.

An alternate approach to dereverberation is to calculate, in some form, the least squares estimate of the inverse of the transmission path, i.e., calculate the least squares solution of the equation

$$h(n) * g(n) = d(n), \tag{32.3}$$

where d(n) is the impulse function and ∗ denotes convolution. Assuming that the system can be modeled by an FIR filter, Eq. (32.3) can be expressed in matrix form as

$$\begin{bmatrix}
g(0) & 0 & \cdots & 0 \\
g(1) & g(0) & \cdots & 0 \\
\vdots & g(1) & \ddots & \vdots \\
g(m) & \vdots & \ddots & g(0) \\
0 & g(m) & \cdots & g(1) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & g(m)
\end{bmatrix}
\begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(i) \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \tag{32.4}$$

or

$$GH = D, \tag{32.5}$$

where D is the unit impulse vector and G, H, and D are matrices of appropriate dimensions as shown in Eq. (32.4). The least squares method finds an approximate solution given by

$$\hat{H} = (G^T G)^{-1} G^T D. \tag{32.6}$$

Thus, the error vector can be written as

$$\epsilon = D - G\hat{H} = [I - G (G^T G)^{-1} G^T] D = ED,$$

where $E = I - G (G^T G)^{-1} G^T$.
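The least squares formulation of Eqs. (32.3) to (32.6) can be sketched numerically as follows. This is an illustration under stated assumptions, not the chapter's implementation: the response `g` is a hypothetical minimum-phase example, the helper name `ls_inverse_filter` is invented here, and `numpy.linalg.lstsq` is used in place of forming the normal-equation inverse explicitly (it returns the same minimizer when G has full column rank).

```python
import numpy as np

def ls_inverse_filter(g, n_taps):
    """Least-squares FIR inverse h of g: minimize ||D - G h||^2 with D a unit impulse."""
    m = len(g)
    rows = m + n_taps - 1                # length of the full convolution g * h
    G = np.zeros((rows, n_taps))
    for j in range(n_taps):
        G[j:j + m, j] = g                # column j holds g delayed by j samples
    d = np.zeros(rows)
    d[0] = 1.0                           # desired overall response: an impulse
    h, *_ = np.linalg.lstsq(G, d, rcond=None)
    return h, G, d

g = np.array([1.0, 0.5, 0.25])           # hypothetical room response (toy example)
h, G, d = ls_inverse_filter(g, n_taps=16)

error_energy = np.sum((d - G @ h) ** 2)  # energy of the error vector D - G h
equalized = np.convolve(g, h)            # overall response after inverse filtering
print(error_energy, equalized[:3])
```

For a non-minimum-phase response, the same code still runs, but a delayed impulse in `d` would generally be needed to obtain a small error, consistent with the acausal inverse discussed in the text.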
The mean square error or the energy in the error vector is

$$\|\epsilon\|^2 = \|ED\|^2 \le |E| \, \|D\|^2 \le \frac{\lambda_{\max}}{\lambda_{\min}} \|D\|^2, \tag{32.7}$$

where |E| is the norm of E and $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of E. The ratio between the maximum and minimum eigenvalues is called the condition number of a matrix and it specifies the noise amplification of the inversion process [4].

FIGURE 32.1: Modeling a room with a microphone array as a multiple output FIR system.

Typically, the operation is done on the full-band signal. Sub-band approaches have been proposed in [5, 7, 8]. All these approaches use a single microphone. The amplitude spectrum of the room response has "ripples" which produce pronounced notches in the signal output spectrum. As the location of the microphone in the room changes, the room response for the same source changes and, as a result, the position of the notches in the amplitude spectrum varies. This property was used to advantage in [1]. In this method, multiple microphones were located in the room. Then, the output of each microphone was divided into multiple bands of equal bandwidth. For each band, by choosing the microphone whose output has the maximum energy, the ripples were reduced. In [9], the signals from all the microphones in each band were first co-phased, and then weighted by a gain calculated from a normalized cross-correlation function based on the outputs of different microphones. Since the reverberation tails are uncorrelated, the cross-correlation-based gain turned off the tail of the signal. These techniques have had modest success in combating reverberation.

In recent years, great progress has been made in the quality, availability, and cost of high performance microphones. Fast digital signal processors that permit complex algorithms to operate in real time have been developed.
These advances have enabled the use of large microphone arrays that deploy more sophisticated algorithms for dereverberation. Figure 32.1 shows a generic microphone array system which can "invert" the room acoustics. Different choices of $H_i(z)$ lead to different algorithms, each with its own advantages and disadvantages. In this report, we shall discuss single and multiple beamforming, matched filtering, and Diophantine inverse filtering through multiple input-output (MINT) modeling. In all cases we assume that the source location and the room configuration or, alternatively, the $G_i(z)$s, are known.

32.2 Simple Delay-and-Sum Beamformers

Arrays that form a single beam directed towards the source of the sound have been designed and built [11]. In these simple delay-and-sum beamformers, the processing filter has the impulse response

$$h_i(n) = \delta(n - n_i), \tag{32.8}$$

where $n_i = d_i / c$, $d_i$ is the distance of the ith microphone from the source and c is the speed of sound in air. Sound propagation in the room can be modeled by a set of successive reflections off the surfaces (typically the walls) [10]. Figure 32.2 illustrates the impulse response of a single beamformer. The delay at the output of each microphone coheres the sound that arrives at the microphone directly from the source. It can be seen from Fig. 32.2 that in the resulting response of an N-microphone array, the strength of the coherent pulse is N and there are N(K − 1) distributed pulses. So, ideally, the signal-to-reverberant noise ratio (measured as the ratio of undistorted signal power to reverberant noise power) is $N^2 / [N(K - 1)]$ [13]. In a highly reverberant room, as the number of images K increases towards infinity, the SNR improvement, N/(K − 1), falls to zero.

FIGURE 32.2: A single beamformer. (Source: Flanagan, J.L., Surendran, A.C., and Jan, E.-E., Spatially selective sound capture for speech and audio processing, Speech Commun., 13: 207–222, 1993.
With kind permission of Elsevier Science - NL, Sara Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands).

The single-beamforming system reported in [11] can automatically determine the direction of the source and rapidly steer the array. But, as the beam is steered away from the broadside, the system exhibits a reduction in spatial discrimination because the beam pattern broadens [12]. Further, beamwidth varies with frequency, so an array has an approximate "useful bandwidth" given by the upper and lower frequencies [12]:

$$f_{\text{upper}} = \frac{c}{d \, |\cos\phi - \cos\phi'|_{\max}}, \tag{32.9}$$

and

$$f_{\text{lower}} = \frac{f_{\text{upper}}}{N}, \tag{32.10}$$

where c is the speed of sound in air, N is the number of sensors in the array, d is the sensor spacing, $\phi'$ is the steering angle measured with respect to the axis of the array, and $\phi$ is the direction of the source.

For example, consider an array with seven microphones and a sensor spacing of 6.5 cm. Further, suppose the desired range of steering is ±30° from broadside. Then, $|\cos\phi - \cos\phi'|_{\max} = 1.5$ and hence $f_{\text{upper}} \approx 3500$ Hz and $f_{\text{lower}} \approx 500$ Hz. So, to cover the bandwidth of speech, say from 250 Hz to 7 kHz, three harmonically nested arrays of spacing 3.25, 6.5, and 13 cm can be used.

Further, the beamwidth also depends on the frequency of the signal as well as the steering direction. If the beam is steered to an angle $\phi'$, then the direction of the source for which the beam response falls to half its power is [12]

$$\phi_{3\text{dB}} = \cos^{-1}\left[\cos\phi' \pm \frac{2.8}{N \omega d}\right], \tag{32.11}$$

where $\omega = 2\pi f$ and f is the frequency of the signal. Equation 32.11 shows that the smaller the array, the wider the beam. Since most of the energy of a typical room interfering noise lies at lower frequencies, it would be advantageous to build arrays that have higher directivity (smaller beamwidth) at lower frequencies. This, combined with the fact that the array spacing is larger for lower frequency bands, gives yet another reason to harmonically nest arrays (see Fig.
32.3).

FIGURE 32.3: Harmonically nested array that covers three frequency ranges.

Just as linear one-dimensional arrays display significant fattening of the beams when steered towards the axis of the array, two-dimensional arrays exhibit widening of the beams when steered at angles acute to the plane of the array. Three-dimensional microphone arrays can be constructed [13] that have essentially a constant beamwidth over 4π steradians. Multiple beamforming using three-dimensional arrays of sensors not only provides selectivity in azimuth and elevation but also selectivity in the direction of the beam, i.e., it provides range selectivity.

The performance of single beamformers can degrade severely in the presence of other interfering noise sources, especially if they fall in the direction of the sidelobes. This problem can be mitigated using adaptive arrays. Adaptive arrays are briefly discussed in the next section.

32.2.1 A Brief Look at Adaptive Arrays

Adaptive signal processing techniques can be used to form a beam at the desired source while simultaneously forming a null in the direction of the interfering noise source. Such arrays are called "adaptive arrays". Though adaptive arrays are not effective under conditions of severe reverberation, they are included here because problems in adaptive arrays can be formulated as inverse problems. Hence, we shall discuss adaptive arrays briefly without providing a quantitative analysis of them.

FIGURE 32.4: General form of an adaptive filter.

Broadband arrays have been analyzed in [14, 15, 16, 17, 18, 19]. In all these methods, the direction of arrival of the signal is assumed to be known. Let the array have N sensors and M delay taps per sensor. If $X(k) = [x_1(k) \ldots x_i(k) \ldots x_{NM}(k)]^T$ (see Fig. 32.4) is the set of signals observed at the tap points, then X(k) = S(k) + N(k), where
S(k) is the contribution of the desired signal at the tap points and N(k) is the contribution of the unknown interfering noise. The inputs to the sensors, $x_{(jM+1)}(k)$, $j = 0, \ldots, (N-1)$, are the noisy versions of g(k), the actual signal at the source. Now, the filter output is $y(k) = W^T X(k)$, where $W^T = [w_{11}, \ldots, w_{1M}, w_{21}, \ldots, w_{2M}, \ldots, w_{N1}, \ldots, w_{NM}]$ is the set of weights at the tap points. The goal of the system is to make the output y(k) as close as possible to the source g(k). One way of doing this is to minimize the error $E\{(g(k) - y(k))^2\}$. The weight $W^*$ that achieves this least mean square (LMS) error is also called the Wiener filter, and is given by

$$W^* = R_{XX}^{-1} C_{gX}, \tag{32.12}$$

where $R_{XX}$ is the autocorrelation of X(k) and $C_{gX}$ is the set of cross-correlations between g(k) and each element of X(k). If g(k) and N(k) are uncorrelated, then

$$C_{gX} = E\{g(k) X(k)\} = E\{g(k) S(k)\} + E\{g(k) N(k)\} = E\{g(k) S(k)\}$$

and

$$R_{XX} = E\{X(k) X^T(k)\} = E\{(S(k) + N(k))(S(k) + N(k))^T\} = R_{SS} + R_{NN},$$

where $R_{SS}$ and $R_{NN}$ are the autocorrelation matrices for the signal and noise.

Usually $R_{NN}$ is not known. In such cases, the exact inverse cannot be calculated and an iterative approach to update the weights is needed. In Widrow's approach [15], a known pilot signal g(k) is injected into the array. Then, the weights are updated using the Widrow-Hoff algorithm that increments the weight vector in the direction of the negative gradient of the error:

$$W_{k+1} = W_k + \mu [g(k) - y(k)] X(k),$$

where $W_{k+1}$ is the weight vector after the kth update and µ is the step size. Griffiths' method also uses the LMS approach, but minimizes the mean square error based on the autocorrelation and the cross-correlation values between the input and the output, rather than the signals themselves.
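The pilot-signal weight update described above can be sketched in a few lines. This is a minimal illustration, assuming synthetic noise-free data: the tap-point signals `X`, the "ideal" weights `w_true`, and the step size are all hypothetical, and the pilot g(k) is generated so that a perfect weight vector exists.

```python
import numpy as np

def lms(X, g, mu):
    """Widrow-Hoff LMS: W <- W + mu * (g(k) - y(k)) * X(k), with y(k) = W^T X(k)."""
    W = np.zeros(X.shape[1])
    for x_k, g_k in zip(X, g):
        y_k = W @ x_k                      # filter output for this sample
        W = W + mu * (g_k - y_k) * x_k     # step along the negative error gradient
    return W

rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.2, 0.1])   # hypothetical ideal weights
X = rng.standard_normal((5000, 4))         # synthetic tap-point signals X(k)
g = X @ w_true                             # pilot signal consistent with w_true

W = lms(X, g, mu=0.01)
print(np.round(W, 4))
```

The step size trades convergence speed against stability; with real array data, noise and correlated inputs would leave a residual misadjustment that this noise-free toy example does not show.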
Since the mean square error can be written as

$$E\{(g(k) - y(k))^2\} = R_{gg} - 2 C_{gS}^T W + W^T R_{XX} W,$$

where $R_{gg}$ is the autocorrelation of g(k) and $C_{gS}$ is the set of cross-correlations between g(k) and each element of S(k), the weight update can also be done by

$$\begin{aligned}
W_{k+1} &= W_k + \mu [C_{gS} - R_{XX} W_k] && (32.13) \\
&= W_k + \mu [C_{gS} - X(k) X^T(k) W_k] && (32.14) \\
&= W_k + \mu [C_{gS} - y(k) X(k)]. && (32.15)
\end{aligned}$$

In the above methods, significant distortion is observed in the primary beam due to null-steering. Constrained LMS techniques which place constraints on the performance of the main lobe can be used to reduce distortion [18, 19]. By specifying the broad-band response and the array beam characteristics as constraints, more robust beams can be formed. The problem can now be formulated as an optimization problem that minimizes the output power of the system. Given that the output power is

$$E\{y^2(k)\} = E\{W^T X(k) X^T(k) W\} = W^T R_{XX} W = W^T R_{SS} W + W^T R_{NN} W,$$

if W can be chosen such that $W^T R_{NN} W = 0$, the noise can be eliminated. It was proposed [18] that once the array is steered towards the source with appropriate delays, minimizing the output power is equivalent to removing directional interference, since in-phase signals add coherently. In an accurately steered array, the wavefronts arriving from the direction of steering generate identical signals at each sensor. Hence, the array may be collapsed to a single sensor implementation which is equivalent to an FIR filter [18], i.e., the columns of the broadband array sum to an FIR filter. Additional constraints can be placed on this FIR filter. If the weights of the filters can be written as a matrix

$$\hat{W} = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
w_{N1} & w_{N2} & \cdots & w_{NM}
\end{bmatrix},$$

then it can be specified that $\sum_{i=1}^{N} w_{ij} = f_j$, $j = 1, \ldots, M$, where $f_j$, $j = 1, \ldots, M$, are the taps of an FIR filter that provides the desired filter response.
Hence, using this method, directional interference can be suppressed by minimizing the output power and spectral interference can be suppressed by constraining the columns of the weight coefficients. Thus, the problem can be formulated as

$$\text{Minimize: } W^T R_{XX} W \tag{32.16}$$

$$\text{subject to: } C^T W = F, \tag{32.17}$$

where F is the desired FIR filter and

$$C = \begin{bmatrix}
1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & \cdots & 0 & 0 & \cdots & 1
\end{bmatrix}. \tag{32.18}$$

C has M rows with NM entries on each row. The first row of C in Eq. 32.18 has ones in positions 1, (M + 1), ..., (N − 1)M + 1; the second row has ones in positions 2, (M + 2), ..., (N − 1)M + 2, etc. The constrained problem in Eqs. 32.16 and 32.17 can be solved using Lagrange multipliers [18]. This optimization problem can alternatively be posed as an inverse problem.

32.2.2 Constrained Adaptive Beamforming Formulated as an Inverse Problem

Using a similar cost function and the same constraint, the system can be formulated as an inverse problem [19]. The function to be optimized, $W^T R_{XX} W = 0$, can be approximated by $X^T W = 0$. This, combined with the constraint in Eq. 32.17, is written as

$$\begin{bmatrix}
x_1 & \cdots & x_M & \cdots & x_{(N-1)M+1} & \cdots & x_{NM} \\
1 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} w_{11} \\ \vdots \\ w_{1M} \\ \vdots \\ w_{N1} \\ \vdots \\ w_{NM} \end{bmatrix}
=
\begin{bmatrix} 0 \\ f_1 \\ \vdots \\ f_M \end{bmatrix}, \tag{32.19}$$

$$AW = F. \tag{32.20}$$

This equation can be solved with any technique that can invert a matrix. There are several problems in solving Eq. 32.20. In general, the equation can be inconsistent. In addition, the system is rank deficient. Further, traditional methods used to solve Eq. 32.20 are not robust to errors such as round-off errors in digital computers, measurement inaccuracies, and noise corruption. In the least squares solution (Eq. 32.6), the noise amplification is dictated by the condition number of the error matrix, i.e., the ratio of the highest and the lowest eigenvalues of E.
In the extreme case when $\lambda_{\min} = 0$, the system is rank-deficient. In such cases, the pseudo-inverse solution can be used.

Any matrix A can be written using the singular value decomposition as $A = U D V^T$, where

$$D = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_N
\end{bmatrix},$$

then $A^{-1} = V D^{-1} U^T$, where

$$D^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1} & 0 & \cdots & 0 \\
0 & \frac{1}{\sigma_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sigma_N}
\end{bmatrix}.$$

$\sigma_i^2$, $i = 1, \ldots, N$, are the eigenvalues of $A A^T$. The matrices U and V are made up of the eigenvectors of $A A^T$ and $A^T A$, respectively. Extending this definition to rank-deficient matrices, the pseudo-inverse can be written as $A^{\dagger} = V D^{\dagger} U^T$, where

$$D^{\dagger} = \operatorname{diag}\left(\frac{1}{\sigma_1}, \frac{1}{\sigma_2}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0\right),$$

where r is the rank of the matrix A. The rank-deficient system has an infinite number of solutions. The pseudo-inverse solution can be shown to be the least squares solution with minimum energy. It can also be viewed as the projection of the least squares solution in the range space of A. An iterative technique called the Row Action Projection (RAP) algorithm [4, 19] can be used to solve Eq. 32.20.

Row Action Projection

An effective way to find a solution for Eq. 32.20 is to use the RAP method [4], which has been shown to be effective in providing a fast and stable solution to a system of simultaneous equations. Traditional least squares methods need a block of data to calculate the estimate. Most of these methods demand a lot of memory and processing power. RAP operates on only one row at a time, which makes it a useful sample-by-sample method in adaptive signal processing. Further, the matrix A in Eq. 32.20 is a sparse matrix. RAP has been shown to be effective in solving systems with sparse matrices [4]. For a given system of equations,

$$\begin{aligned}
a_{01} w_1 + a_{02} w_2 + \cdots + a_{0,NM} w_{NM} &= f_0 \\
a_{11} w_1 + a_{12} w_2 + \cdots + a_{1,NM} w_{NM} &= f_1 \\
&\;\;\vdots \\
a_{M1} w_1 + a_{M2} w_2 + \cdots + a_{M,NM} w_{NM} &= f_M,
\end{aligned}$$

[...]...
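The pseudo-inverse construction and the idea of processing one row at a time can be sketched as follows. Two loud assumptions apply: the exact RAP update is not reproduced in this excerpt, so the loop below uses the classical Kaczmarz row projection as a stand-in for a row-action method, and the small rank-deficient system `A`, `f` is a made-up example shaped like the constraint rows of Eq. 32.19, not data from the chapter. The function names are hypothetical.

```python
import numpy as np

def pinv_svd(A, rel_tol=1e-10):
    """Pseudo-inverse V D^+ U^T: invert only the singular values above a threshold."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rel_tol * s.max()           # the r "non-zero" singular values
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv[:, None] * U.T)

def row_projection_solve(A, f, sweeps=200, relax=1.0):
    """Kaczmarz-style iteration (assumed stand-in for RAP): use one row per update."""
    w = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for a_i, f_i in zip(A, f):
            norm2 = a_i @ a_i
            if norm2 > 0.0:
                # Project w onto the hyperplane a_i . w = f_i
                w = w + relax * (f_i - a_i @ w) / norm2 * a_i
    return w

# Hypothetical rank-deficient but consistent system: row 3 = row 1 + row 2.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])
f = np.array([1.0, 2.0, 3.0])

w_pinv = pinv_svd(A) @ f                   # minimum-energy least squares solution
w_rows = row_projection_solve(A, f)        # row-at-a-time iterate started from zero
print(w_pinv, w_rows)
```

For a consistent system and a zero starting point, the row-projection iterate stays in the span of the rows of A, which is why it lands on the same minimum-energy solution as the pseudo-inverse in this toy case; with inconsistent or noisy rows the two would generally differ.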
an exact inverse to the room transfer function

32.4 Diophantine Inverse Filtering Using the Multiple Input-Output (MINT) Model

Miyoshi and Kaneda [23] proposed a novel method to find the exact inverse of a point in a room by using multiple inputs and outputs, each input-output pair modeled by an FIR system. For example, a two-input single-output system is described by the two speaker-to-single-microphone...

obtaining the same SNR gain, the exact inverse requires fewer microphones than either the matched filter or the multiple beamformer. The Diophantine inverse filtering method does not suffer from the effects of spatial aliasing that may affect traditional beamformers using periodically spaced microphones. Finding the exact inverse is also more computationally intensive than matched filtering...

32.6 Summary

Microphone arrays can be successfully used in "inverting" room acoustics. A simple single beamformer is not effective in combating room reverberation, especially in the presence of interfering noise sources. Adaptive algorithms that project a null in the direction of the interferer can be used, but they introduce significant distortion in the main signal. Constrained adaptive arrays...

capability in severely reverberant environments. Processing algorithms such as multiple beamforming and matched filtering, combined with three-dimensional arrays of sensors, though only providing an approximation to the inverse, give robust dereverberant systems that provide selectivity in a spatial volume and thus immunity from interfering noise sources. An exact inverse using Diophantine inverse filtering using...
similar to the ones in this report and the data sets used for training and testing were the same. The performance under matched conditions for close talking microphone was 94.7% and for the matched filtered output was 88.4%. In the presence of an interfering source producing Gaussian noise at 15 dB signal-to-competing noise ratio levels, the performance when trained on close talking microphone and tested...

for a source located at the focus and minimizes the output power for all other sources, thus providing lower signal-to-noise ratio improvement, but higher levels of spatial discrimination.

FIGURE 32.12: Response of the Diophantine inverse filtering system (the delay involved is not shown).

FIGURE 32.13: Response of the Diophantine inverse filtering system for a source located away...

with the exact inverse discussed in the next section. The power of matched filtering in mitigating reverberation and suppressing interfering noise is demonstrated through examples in Section 32.5. Figure 32.11 shows the response of a matched filter system. It is clear that the matched filter response is similar to, but cannot be exactly, an ideal impulse, i.e., it cannot provide an exact inverse to the room...

dB signal-to-competing noise ratio levels was introduced at (3.0, 5.0, 1.0) m, the performance on the output of the exact inverse filtering system (9.5%) was worse than the single microphone (14.2%). Under matched training and testing conditions, the performance of the exact inverse system was significantly...

FIGURE 32.11: Response of a matched filtering system for a source located at the focus.

(see Fig. 32.8). Hence, the system can adapt to the varying conditions without having to recalculate the FIR filters. Figure 32.
8 shows the rate of convergence of the RAP algorithm when the number of microphones in the array is varied. The results suggest that increasing the number of microphones used in the array increases the speed of convergence and also provides more accurate results.

32.5 Results

In this...

exact inverse filters

For comparison, the SNR gains of single beamforming, multiple beamforming, and matched filter linear arrays using five microphones are presented below. The multiple beamformer has one beam directed at each image of the source.

Method                 SNR
Single beamformer      -1 dB
Multiple beamformer    11 dB
Matched filter         13 dB

Figure 32.9 shows the impulse response of the room using...

Date posted: 25/12/2013, 06:16
