A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids

A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation... IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 1549 A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids Hala As’ad , Martin Bouchard , and Homayoun Kamkar-Parsi Abstract—In this paper, a binaural beamforming algorithm for challenges in understanding and separating speech in noisy en- hearing aid applications is introduced. The beamforming algorithm vironments [1]–[3]. is designed to be robust to some error in the estimate of the tar- For noise reduction, single channel processing algorithms, get speaker direction. The algorithm has two main components: which rely on frequency and temporal information of the input a robust target linearly constrained minimum variance (TLCMV) signals, have been extensively researched such as in [4], [5]. algorithm based on imposing two constraints around the estimated direction of the target signal, and a post-processor to help with However, single channel algorithms suffer from several limi- the preservation of binaural cues. The robust TLCMV provides tations under low-SNR acoustic scenarios, especially for non- a good level of noise reduction and low level of target distortion stationary noise and multi-talkers conditions. Single channel so- under realistic conditions. The post-processor enhances the beam- lutions typically also introduce distortion and do not provide true former abilities to preserve the binaural cues for both diffuse-like speech intelligibility improvement. A notable exception is the background noise and directional interferers (competing speakers), while keeping a good level of noise reduction. The introduced algo- solution in [6] which has been found to improve speech intel- rithm does not require knowledge or estimation of the directional ligibility. 
The solution in [6] is based on deep neural networks interferers’ directions nor the second-order statistics of noise-only and a binary masking of some speech components in the T-F components. The introduced algorithm requires an estimate of the domain. This solution, however, does not preserve naturalness target speaker direction, but it is designed to be robust to some of the target speaker speech (high distortion), which is a concern deviation from the estimated direction. Compared with recently proposed state-of-the-art methods, comprehensive evaluations are for its use in hearing aids. It has also not been developed for the performed under complex realistic acoustic scenarios generated in case of one or two competing talkers. both anechoic and mildly reverberant environments, considering a As an alternative, microphone array processing (beamform- mismatch between estimated and true sources direction of arrival. ing) has been widely used in modern hearing aids, leading to Mismatch between the anechoic propagation models used for the directionally sensitive hearing aids [7]. Binaural hearing aids design of the beamformers and the mildly reverberant propagation models used to generate the simulated directional signals is also have also recently been introduced in the market. Binaural hear- considered. The results illustrate the robustness of the proposed ing aids have a hearing aid device at each ear, each possibly algorithm to such mismatches. equipped with multiple microphones, and the devices are ca- pable to transmit signals or information from one side to the Index Terms—Robust LCMV, propagation model mismatch, steering vector mismatch, binaural cues preservations, noise other through a “binaural wireless link”. Microphone arrays can reduction, binaural hearing aids. provide good noise reduction with low distortion, and the use of additional microphones and different microphone geome- try in binaural hearing aids can lead to further improvements I. 
INTRODUCTION in the directional response, compared to monaural single-sided HEARING aid is a common and effective solution to sen- beamforming. However, even binaural hearing aids have still not sorineural hearing loss. Despite enormous advances in achieved the required robustness in case of real-life complex en- hearing aid technology, the performance of hearing aids under vironments [8]. The performance of binaural beamformers can noisy environments remains one of the most common complaints be significantly affected by a mismatch or an error between the from hearing aid users [1], [2], and hearing-impaired people face target source propagation model assumed for the beamformer design and the actual physical target source propagation [9], Manuscript received December 19, 2018; revised April 24, 2019 and June [10]. This includes errors in the estimated target direction of 16, 2019; accepted June 17, 2019. Date of publication June 21, 2019; date of arrival (DOA) used in the beamformer algorithms, i.e., target current version July 1, 2019. This work was supported in part by a Natural Sci- ences and Engineering Research Council Discovery grant. The associate editor DOA mismatch. This kind of mismatch can be generated from coordinating the review of this manuscript and approving it for publication was imperfect target DOA estimation schemes, from small head Prof. Simon Doclo. (Corresponding author: Martin Bouchard.) movements of the hearing aid user, and from multipath prop- H. As’ad and M. Bouchard are with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada agation. To address this problem, several acoustic beamforming (e-mail: hasad056@uottawa.ca; martin.bouchard@uottawa.ca). methods robust to the mismatch in target propagation models H. Kamkar-Parsi is with WS Audiology 91058, Erlangen, Germany (e-mail: have been introduced in the literature [11]–[21], and some of homayoun.kamkarparsi@sivantos.com). 
Digital Object Identifier 10.1109/TASLP.2019.2924321 these solutions are not specifically for binaural hearing aids. This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ 1550 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 Unfortunately, most of the previous work rely on sophisticated performance when the number of sources increases. A relaxed Voice Activity Detection (VAD), speech presence probability version of the joint BLCMV has been proposed in [35]. In this estimation, and/or SNR estimation. These can become diffi- relaxed BLCMV, tunable parameters have been used for each cult to measure in complicated multi-talker reverberant envi- directional interferer, in order to separately control the trade-off ronments, with speakers having variable activity patterns. An between the binaural cues preservation and the noise reduction interesting solution for hearing aids based on inequality con- for each interferer. The BLCMV and all its extensions, i.e., [31]- strained optimization has been proposed in [22] and discussed [35], require knowledge of the propagation models for the di- in [23], to increase the robustness to target DOA mismatch. rectional interferers in addition to the target source (directivity However, since this design uses extra constraints for directional vectors, steering vectors, Relative Acoustic Transfer Functions sources to increase robustness to DOA mismatch, this can lead (RATF)). As will be shown in this paper, this can limit the per- to low degrees of freedom available for residual noise reduc- formance of these approaches, as they suffer from errors in the tion (e.g., low number of adaptive “nulls”) in case of limited estimated propagation models. In addition, the BLCMV and its number of available microphones signals. 
In addition, it re- variations do not have the ability to preserve the binaural cues quires an estimation of the DOA for the directional interferer of the diffuse-like background noise. As an attempt to design a sources. BLCMV beamformer that does not depend on the propagation All the beamforming designs in [11]–[21] were not designed models (and directions of arrival) of the directional sources, a set to preserve the binaural cues of the residual directional inter- of pre-determined RATFs distributed around the head have been ferers and diffuse-like noise in the binaural output signals. Sev- used for beamforming design in [36]. Each RATF is responsi- eral binaural beamforming solutions have been introduced to ble for preserving the binaural cues of the directional sources preserve some of the binaural cues of these components, while coming from certain directions. Increasing the number of pre- also preserving the target signal and achieving a good noise re- determined RATFs decreases the effect of the mismatch between duction level. Under some assumptions (e.g., accurate direction the true and the pre-determined RATFs, but it also requires a of arrival estimates), binaural beamforming processing such as larger number of microphones in order to achieve a good per- the second and third methods in [24] can provide directional formance. However, in hearing aids applications, only a small noise reduction and preserve the binaural cues of the target sig- number of microphones are normally available for the binaural nal and the directional interferers, depending on the number of beamformer. available microphones. However, this binaural beamforming is In order to preserve the binaural cues for directional inter- not designed to preserve the binaural cues of the diffuse-like ferers and diffuse-like noise components without a knowledge background noise. 
The Multichannel Wiener Filter (MWF) is of the propagation model of the directional interferers, a binary the basis of several proposed solutions that aim to preserve the decision/classifier algorithm common to the left and right beam- binaural cues. Extensions of the MWF have been proposed in former outputs for each time-frequency (T-F) bin was proposed [25]–[29] as attempts to preserve the binaural cues for the dif- in [37], [38]. A challenge for this classification algorithm is its ferent acoustic scene components. A potential challenge for the applicability in low input SNR environments, as most T-F bins MWF and its extensions is the need for an accurate estimate of can be classified as noise-dominant, resulting in low SNR im- the second order statistics for the noise-only components, which provement and an attenuated target output, as illustrated in [39]. can be difficult to achieve in complex acoustic environments, for As an attempt to enhance the performance of this method, the example multiple talkers with time-varying activity patterns and classification mechanism was later modified to use the output statistics. Detailed information of the MWF and its extensions SNR instead of the input SNR [40]. However, this method re- can be found in [30]. quires an estimation of the second order statistics of the noise The Binaural Linearly Constrained Minimum Variance and the target components, which, as previously described, can method (BLCMV) has been introduced in [31] and a comprehen- be challenging in some real-life time-varying multi-talker envi- sive theoretical analysis has been provided in [32]. The BLCMV ronments. In our recent work [41], an algorithm based on clas- is capable to provide a good trade-off between noise reduc- sification and mixing of binaural signals at each T-F bin was in- tion and cues preservations for a limited number of interferer troduced. Three classification criteria were proposed, based on sources. 
As an attempt to enhance the noise reduction abilities the power, power difference, and complex coherence computed of the BLCMV, an optimal BLCMV has also been proposed from: 1) binaural beamformer output signals (with good level of in [33]. However, the optimal BLCMV is capable to preserve noise reduction) and 2), original binaural noisy signals (or al- the binaural cues for just one directional interferer as well as ternatively, other binaural signals with cues preserved but with the target source. As another variation of the BLCMV, joint an intermediate level of noise reduction [42], [43]). The com- BLCMV, which jointly estimates the left and right beamformers plex coherence criterion provided better noise reduction over the of two hearing aids, has been introduced in [34] in order to en- other classification criteria. hance the binaural cues preservations abilities of the BLCMV. In this work, we contribute in 1) designing a binaural beam- The joint BLCMV needs one constraint per interferer to pre- former which is robust to mismatch in target propagation models, serve the binaural cues of the interferers, unlike the BLCMV 2) proposing a modified post-processor method preserving the which uses two constraints to preserve each interferer. However, binaural cues of all acoustic scene components (target, diffuse- since a limited number of microphones are available in binaural like background, directional interferers), with a good tradeoff hearing aids, the joint BLCMV can still face a degradation of between noise reduction and cues preservations. For the first AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1551 contribution, we introduce the Robust TLCMV which is robust the binaural wireless links: to a mismatch (error) in the directivity vector assumed for the tar- y (f, t)= x (f, t)+ v (f, t)+ n (f, t) (1) m in,m in,m in,m get signal. 
This is achieved by designing a binaural beamformer with a wider beam around the estimated target direction. For the where m is the microphone index, and second contribution, the binaural cues preservation are achieved by using a simplified and improved version of the coherence- m =1, ..., M/2,M/2+1, ..., M . based post-processor method in [41], for classification and mix- left side right side ing of binaural signals. Both the proposed Robust TLCMV and The front left (FL) microphone has index m =1, and the the post-processor do not rely on any assumption for the prop- front right (FR) microphone has index m = M/2+1. These agation model (or DOAs) of the interferers (competing speak- microphones are the reference microphone for the left- ers). The proposed TLCMV with post-processor is also found side beamformer and the right-side beamformer, respectively. to be robust to both target DOA mismatch and mismatch be- x ,v , and n at the mth microphone are the tar- tween the anechoic propagation model used for the beamformer in,m in,m in,m get speaker, the sum of directional interferer speakers, and the design and the mildly reverberant propagation models used to diffuse-like background noise components, respectively. f is the generate the directional signals in the simulations. The proposed frequency index and t is the time (frame) index. solution does not rely on target VAD detection, speech proba- By stacking the input microphone signals in M dimensional bility presence estimation, or SNR estimation, which can be dif- vectors, the input signals from the left and right microphones ficult to compute in complex real-life time-varying multi-talker can be written as in (2): environments. 
In order to study the robustness of the proposed algorithm y(f, t)= x(f, t)+ v(f, t)+ n(f, t) (2) to different types of mismatches, comprehensive validations are conducted through simulations using acoustic scenarios gener- where, y(f, t)=[y (f, t),y (f, t), ..., y (f, t)] 1 2 M ated in mildly reverberant environments and anechoic environ- ments (i.e., with and without mismatch in the sets of directiv- x(f, t)=[x (f, t),x (f, t), ..., x (f, t)] in,1 in,2 in,M ity/steering vectors), and for scenarios with and without DOA v(f, t)=[v (f, t),v (f, t), ..., v (f, t)] in,1 in,2 in,M mismatch. Comparisons are performed with the recently pro- posed state-of-the-art Binaural Minimum Variance Distortion- T n(f, t)=[n (f, t),n (f, t), ..., n (f, t)] . in,1 in,2 in,M less Response (BMVDR) beamformer, the BMVDR with partial noise estimation (BMVDR-n) beamformer [29], [44], [45] and Assuming that s is a target source signal coming from angle with the BLCMV beamformers which uses constraints to atten- θ , and s is the ith interferer source signal (competing talker) x vi uate interferers. coming from angle θ , the target component and the sum of vi This paper is organized as the following. Section II provides the directional interferer components at the microphones can be a detailed description of the system notations and the beam- written in terms of the directivity vectors d(f, θ) as in (3) and forming microphone configurations that are used throughout (4), respectively: this paper. Section III provides a summary of the previously x(f, t)= d(f, θ )s (f, t) (3) x x proposed BLCMV, BMVDR and BMVDR with partial noise estimation algorithms. Section IV provides some detailed in- N formation about the new beamforming algorithm and the post- v(f, t)= d(f, θ )s (f, t). (4) vi vi processing algorithm proposed in this work. Section V explains i=1 the performance metrics used in this work. 
Section VI explains The vector d(f, θ )=[d (f, θ ), ..., d (f, θ )] is the tar- x 1 x M x the experimental setup. Finally, Section VII provides the simu- get directivity vector, which is the frequency response between lation results of the proposed algorithms, and performance com- the target source and each microphone. Likewise, the vector parisons with the state-of-the-art algorithms. d(f, θ )=[d (f, θ ), ..., d (f, θ )] is the interference di- vi 1 vi M vi rectivity vector for source s , which is the frequency response vi between the interference source s and each microphone. N is vi II. SYSTEM NOTATIONS AND REFERENCE the number of directional interferers in the acoustic scenario con- BEAMFORMING PROCESS sidered. In hearing aids, the directivity vectors include the head A. System Notations shadow effect and other head/ear related effects (e.g., pinnae filtering), therefore Head-Related Transfer Functions (HRTFs) Binaural hearing aid units with two microphone arrays of are used for the directivity vectors in the beamformer designs. M/2 microphones at each ear, i.e., M microphones in total, and The input target signal at the reference microphone x (f, t) ref ideal binaural wireless links between the units (no jitter, delay, can be defined as in (5): packet loss, etc.) are considered. A Short Time Fourier Trans- form (STFT) is used in order to represent the input signals in x (f, t)= d (f, θ )s (f, t). (5) ref ref x x the Time-Frequency (T-F) domain. The input noisy microphone signals in the T-F domain can be written as in (1), with the micro- If the reference microphone is the FL, then x (f, t)= ref phone signals transmitted from one side to the other side through x (f, t). If the reference microphone is the FR, then in,1 1552 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 (10) and (11), to generate the left and right beamformer outputs z (f, t) and z (f, t), respectively. 
The binaural beamformer on l r the left side aims to extract the target signal as received at the FL microphone (i.e., using the FL microphone as a reference microphone). The binaural beamformer on the right side aims to extract the target signal as received at the FR microphone (i.e., using the FR microphone as a reference microphone). z (f, t)= w (f, t)y(f, t) (10) l l Fig. 1. 2 + 2 microphone configuration (dotted lines represent signals trans- mitted through a wireless link). z (f, t)= w (f, t)y(f, t) (11) x (f, t)= x (f, t). Likewise, d (f, θ ) is the tar- ref in,M/2+1 ref x III. REVIEW OF PREVIOUS BINAURAL BEAMFORMING get directivity vector (or HRTF) at the reference micro- ALGORITHMS phone, with d (f, θ )= d (f, θ ) if FL, and d (f, θ )= ref x 1 x ref x In this section, we review the BLCMV, the BMVDR (which d (f, θ ) if FR. M/2+1 x is a special case of the BLCMV) and the BMVDR extension A correlation matrix for the target component can be defined with partial noise estimation (BMVDR-n). as in (6): R (f)= E{x(f, t)x (f, t)} A. Binaural LCMV (BLCMV) H ∗ = E{d(f, θ )s (f, t)d (f, θ )s (f, t)} (6) x x x The BLCMV [31], [32] is a general form of the BMVDR [24], H 2 = d(f, θ )d (f, θ )E{|s (f, t)| }. x x x where both of these beamformers are based on the constrained minimization of the beamformer output power. However, the The superscript H refers to “Hermitian” which is the complex BLCMV is derived under multiple linear constraints, including conjugate transpose, the superscript “∗” refers to the complex a unity gain constraint in the target signal direction, which is also conjugate, and E{.} refers to the expectation operator. Similarly, used in the BMVDR. In the BLCMV, having multiple constraints the correlation matrix of the sum of directional interferer com- means that small gains are specified in directions corresponding ponents can be defined as in (7) and (8). Directional interferers to interferer sources. 
The left and right beamformer coefficients are assumed to be uncorrelated with each other. can be derived by the following constrained minimizations in R (f) (12) and (13), respectively. For simplicity, the f and t index are omitted here. = E{v(f, t)v (f, t)} ⎧ ⎫ H H min w (R )w subject to C w = g (12) N N l ⎨ ⎬ y l l l = E d(f, θ )s (f, t) d(f, θ )s (f, t) vi vi vi vi H H ⎩ ⎭ min w (R )w subject to C w = g (13) i=1 i=1 y r r r The constraint matrix C includes the directivity vectors H 2 = d(f, θ )d (f, θ )E{|s (f, t)| } (7) vi vi vi (HRTFs) of each constraint direction, i.e., C =[d(f, θ ), d i=1 (f, θ ), ..., d(f, θ )]. The left gain vector is g =[ςd (f, θ ), v1 vk l 1,l x The correlation matrix of the diffuse-like background noise ηd (f, θ ), ..., ηd (f, θ )] and the right gain vector is 1,l v1 1,l vk component is defined as in (8): g =[ςd (f, θ ),ηd (f, θ ), ..., ηd (f, r M/2+1,r x M/2+1,r v1 M/2+1,r θ )] . The scalars ς and η should be in the range between 0 and vk R (f)= E{n(f, t)n (f, t)}. (8) 1. In order to guarantee the near distortionless response of the Assuming that the target component, the sum of directional target, ς should be close to 1. The value of η controls the noise interferer components, and the diffuse-like background noise reduction level. The number of constraints k available for the component are uncorrelated, the correlation matrix of the input interferers depends on the number of available microphones, noisy signals can be written as in (9): such that k ≤ M − 2. In this work, as the 2 + 2 microphone configuration is used, k ≤ 2. In other words, assuming no DOA R (f)= R (f)+ R (f)+ R (f). (9) y x v n mismatch, the BLCMV [13], [14] can preserve the binaural cues for only two directional interferers when the 2 + 2 microphone B. Beamformer Microphone Configuration configuration is used. 
In this work, a binaural hearing aid with two microphones Using the complex Lagrangian multiplier method to solve the on each side of the head is used, as illustrated in Fig. 1. We constrained optimization problems in (12) and (13), the left and take advantage of the availability of two bidirectional binaural right binaural beamformer coefficients w and w are as in (14) l r wireless links to transmit two microphone signals from each and (15), respectively: side to the other side. Thus, the beamformer on each side has −1 H −1 −1 direct access to four microphone signals. We will refer to this w = R C(C R C) g (14) l y y l design as the 2 + 2 microphone configuration. The binaural −1 H −1 −1 w = R C(C R C) g . (15) r y y r beamformers are used to process the input noisy signals as in AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1553 Note that some level of diagonal loading may be required IV. THE PROPOSED BEAMFORMING ALGORITHM in practice, to regularize the matrix inversions [46]. Different In this section, a binaural TLCMV robust to target DOA mis- options for the choice of correlation matrices have previously match is first introduced, which does not require estimates of been introduced for the BLCMV [32]. In (14) and (15), the sim- the interferers’ DOAs or propagation models. It should be noted plest option from [32] which uses the noisy microphone signals that in some previous work such as [31]–[36], the name BLCMV correlation matrix R is considered. By using the noisy mi- is used for beamformers that use multiple constraints in order crophone signals correlation matrix R , there is no need for a to attenuate directional interferers (in addition to the constraint sophisticated target voice activity detector (VAD) to estimate to preserve the target). 
However, in this work, we use the name the noise components correlation matrices R and R .The two v n TLCMV (Target-LCMV) for beamformers that use more than other suggested options in [32] are using either the overall noise one (normally two) constraints for the target, and no constraint components correlation matrix (R + R ) or the background v n for the interferers. The proposed Robust TLCMV requires an es- diffuse-like noise correlation matrix R .Using R + R or n v n timate of the target DOA, but the true target DOA can be within R in the beamformer coefficients computation increases the +10 degrees of the estimated target DOA, as will be shown robustness to mismatch between the estimated target directivity through experiments. This is a realistic condition, however the vector and the actual target directivity vector [47], because us- actual estimation of the target DOA is not considered in this ing the noise components correlation matrix (either R + R v n paper. or R ) in the minimization criteria of (12) and (13) does not Since the Robust TLCMV distorts the binaural cues for the lead to target components minimization. At the opposite, dis- directional interferers and the background diffuse noise, a post- tortion/attenuation of the target component in the beamformer processor which does not require directivity vectors informa- output signal can occur in the presence of mismatch if R is tion is also proposed in this section, to provide a good level of includes the target component). used (since R binaural cues preservation while providing good overall noise However, estimating R is often a difficult task in non- reduction. stationary multiple talkers conditions. And even though R can be more easily estimated, for beamformers that do not rely on A. 
The Proposed Robust TLCMV Beamforming Algorithm constraints at interferer directions to reduce the interferers (such as the BMVDR or our proposed method, as we will explain later), Aiming to design a binaural beamformer that provides little using R leads to a solution that is not capable of significantly suppression for sources from angles within a small angular re- reducing the interferers. On the other hand, using R in a beam- gion around the estimated target direction, the Robust TLCMV is former such as the BLCMV [31], [32] can be sufficient as long introduced. Two constraints with unity gains are used in the mid- as there are constraints in the interferers directions, since the re- dle of each side of a target zone, which consists of +10 degrees duction of interferers is then determined by the value of a small around the estimated target DOA. For example, if the estimated constraint gain η. target direction is at 0 degree, the beamformer assumes that the target can be anywhere between −10 to 10 degrees, and two unity constraints are used at +5 degrees, in the middle of each B. Binaural MVDR (BMVDR) and Its BMVDR-n Extension side in the estimated target zone. The constraints of the Robust The BMVDR [48] is a special case of the BLCMV, with a TLCMV are as described in (16) and (17), with the beamformer single constraint in the estimated target direction. Therefore, the coefficients computed as in (14) and (15): constraint matrix C can be reduced to d(f, θ ), and the gain vec- C w = g C =[d(f, θ +Δ), d(f, θ − Δ)] l l x x tors g and g can be reduced to d (f, θ ) and d (f, θ ), l r 1,l x x M/2+1,r (16) respectively. The BMVDR preserves the binaural cues of the tar- g =[d (f, θ +Δ),d (f, θ − Δ)] l 1,l x 1,l x get in case of no target DOA mismatch; however, it distorts the C w = g C =[d(f, θ +Δ), d(f, θ − Δ)] r r x x binaural cues for the directional interferers and the background (17) noise. 
As an attempt to enhance the binaural cues preservation ability of the BMVDR for the noise components, a small portion of the original noisy signal can be added to the BMVDR, leading to the BMVDR with partial noise estimation (BMVDR-n). The idea of adding a small portion of the original noisy signal to the processed output was introduced in [29]. More details of the BMVDR and the BMVDR-n can be found in [44], [45]. Many extensions to the BMVDR beamformer were previously introduced in the literature, such as the work in [24]. However, in this work we will compare our proposed algorithm with the BMVDR-n, because of its ability to preserve the binaural cues for both the directional interferers and the diffuse-like background noise. For a fair comparison of our proposed algorithm with the BMVDR and BMVDR-n algorithms, the noisy correlation matrix $\mathbf{R}_y$ will be used as for the BLCMV.

$$\mathbf{g}_r = \left[\, d_{M/2+1,r}(f, \theta_x + \Delta),\; d_{M/2+1,r}(f, \theta_x - \Delta) \,\right]$$

where $\theta_x \pm \Delta$ are the directions of unity constraints in the middle of the assumed target zone. The gain values used in $\mathbf{g}_l$ and $\mathbf{g}_r$ ensure that the beamformer output for a source from DOAs $\theta_x + \Delta$ and $\theta_x - \Delta$ has the same level as the one found at the input reference microphone for that same source, which we will refer to as a "unit gain" (i.e., the gain is relative to the input reference microphone level). Using two unity constraints around the estimated target direction forces the beamformer to have a wider beam in the direction of the target. Figs. 2 and 3 illustrate beampatterns of a fixed BMVDR beamformer with a single constraint at 0 degree under 2-D (cylindrically isotropic) diffuse noise conditions and the beampatterns of a fixed Robust TLCMV beamformer with constraints at +5 and –5 degrees under 2-D diffuse noise conditions at different frequencies for the left and right side. The beampatterns are obtained with HRTFs measured from behind-the-ear (BTE) hearing aid units on a mannequin in an anechoic environment, using four microphone signals, i.e., 2 microphones at each ear. The same HRTFs are also used to produce the 2-D diffuse noise correlation matrix required to produce Figs. 2 and 3. The beampattern $BP_i(\theta)$ is computed as in (18):

$$BP_i(\theta) = \left|\mathbf{w}_i^H(f)\,\mathbf{d}(f,\theta)\right|^2 \tag{18}$$

where $\mathbf{w}_i$ is the left binaural beamformer coefficients $\mathbf{w}_l$ or the right binaural beamformer coefficients $\mathbf{w}_r$. Figs. 2 and 3 show that for higher frequencies the BMVDR has a narrow beam around the target direction, i.e., 0 degree. This narrow beam around the target direction indicates that the BMVDR is not robust to small target DOA mismatch. However, the Robust TLCMV has a wider beam around the target direction over all frequency components, therefore by design it is more robust to target DOA mismatch. There is a trade-off between robustness to target DOA mismatch and noise reduction, because the use of an additional constraint for the target in the TLCMV design leads to a reduction in the degrees of freedom available for noise reduction (e.g., the positioning of adaptive "nulls"). However, due to the sparsity and the disjoint properties of speech signals in practice [49], there are often only one or sometimes two dominant directional interferer sources active at each time-frequency bin, and having two degrees of freedom left for the beamformer can be sufficient for good adaptive noise reduction, as will be illustrated later in this paper.

Fig. 2. Beampatterns of BMVDR and Robust TLCMV at different frequencies, shown for left side.

Fig. 3. Beampatterns of BMVDR and Robust TLCMV at different frequencies, shown for right side.

B. Post-Processor Using Modified Coherence-Based Classification and Mixing Binaural Beamforming (CCMBB)

The proposed Robust TLCMV of the previous section can distort the binaural cues for the directional interferers and the diffuse-like background noise. In order to achieve better binaural cues preservation for these interferers and for diffuse noise components, while at the same time achieving a good level of overall reduction for interferers and diffuse noise, a post-processor based on time-frequency (T-F) classification and mixing of binaural signals is proposed in this section, as Fig. 4 shows. It is an updated version of our recent work [41], to provide a simpler and improved classification and mixing algorithm.

Fig. 4. The Robust TLCMV with CCMBB post-processor (dotted lines represent signals transmitted through a wireless link).

A complex coherence is computed for classification, as it gives the ability to exploit two classification decisions: one for the magnitude and one for the phase. We will thus refer to the post-processing algorithm as the Coherence-based Classification and Mixing for Binaural Beamforming (CCMBB). The complex coherence is computed on each side, between two signals locally available on each side. The first signal is the binaural beamformer output ($z_l(f,t)$ or $z_r(f,t)$, depending on the side), with a good level of interferers and diffuse noise reduction. The second signal is the front microphone noisy signal ($y_l(f,t)$ or $y_r(f,t)$), which fully preserves the binaural cues for all acoustic scene components. Alternatively, at the cost of increased complexity, the second signal could be a signal with an intermediate level of interferers and diffuse noise reduction but with binaural cues still preserved, such as the output from a common gain beamforming approach (e.g., [43], without the post-processing). The left complex coherence $C_{zl,yl}(f,t)$ and right complex coherence $C_{zr,yr}(f,t)$ are computed as in (19) and (20), respectively:

$$C_{zl,yl}(f,t) = \frac{\Gamma_{zl,yl}(f,t)}{\sqrt{\Gamma_{yl,yl}(f,t)\,\Gamma_{zl,zl}(f,t)}} \tag{19}$$

$$C_{zr,yr}(f,t) = \frac{\Gamma_{zr,yr}(f,t)}{\sqrt{\Gamma_{yr,yr}(f,t)\,\Gamma_{zr,zr}(f,t)}} \tag{20}$$

where $\Gamma_{yl,yl} = E\{|y_l(f,t)|^2\}$, $\Gamma_{yr,yr} = E\{|y_r(f,t)|^2\}$, $\Gamma_{zl,zl} = E\{|z_l(f,t)|^2\}$, $\Gamma_{zr,zr} = E\{|z_r(f,t)|^2\}$ are, respectively, auto-power spectral densities (auto-PSDs) for the front microphone noisy signals and the binaural beamformer outputs, and $\Gamma_{zl,yl} = E\{z_l(f,t)\,y_l^*(f,t)\}$, $\Gamma_{zr,yr} = E\{z_r(f,t)\,y_r^*(f,t)\}$ are cross-PSDs between the binaural beamformer outputs and the front microphone noisy signals.

For binaural cues preservation of a directional source, at low frequency components with wavelengths longer than the diameter of the head the interaural phase difference (IPD, defined in the next section) is more important than the interaural level difference (ILD, also defined in the next section) [50]. On the other hand the ILD is more important for high frequencies with wavelength components smaller than the head diameter, i.e., for frequencies higher than 1500 Hz. In the proposed CCMBB, on each side for low frequency components (<1500 Hz) the magnitude of the binaural output is simply the magnitude of the beamformer output (no mixing, no classification).
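The coherence of (19), (20) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the expectation $E\{\cdot\}$ is approximated by a causal mean over the last `n_frames` STFT frames (the paper reports 10 frames for these estimates, but the exact averaging window shape is not given here), and the function names and the small `eps` regularizer are hypothetical.

```python
import numpy as np

def moving_average(x, n_frames):
    """Causal mean over the last n_frames along the time axis (axis=1)."""
    c = np.cumsum(x, axis=1)
    c = np.concatenate([np.zeros_like(c[:, :1]), c], axis=1)
    t = np.arange(1, x.shape[1] + 1)
    start = np.maximum(t - n_frames, 0)
    return (c[:, t] - c[:, start]) / (t - start)

def complex_coherence(z, y, n_frames=10, eps=1e-12):
    """Complex coherence between beamformer output z and noisy reference y.

    z, y: complex STFT arrays of shape (n_freq, n_time) for one side.
    eps is an assumed regularizer to avoid division by zero.
    """
    cross = moving_average(z * np.conj(y), n_frames)   # cross-PSD estimate
    psd_z = moving_average(np.abs(z) ** 2, n_frames)   # auto-PSD of z
    psd_y = moving_average(np.abs(y) ** 2, n_frames)   # auto-PSD of y
    return cross / (np.sqrt(psd_z * psd_y) + eps)
```

The same function would be called once per side, e.g. `complex_coherence(z_l, y_l)` and `complex_coherence(z_r, y_r)`, since each side only uses signals that are locally available.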
This is because the output magnitude does not play a role in preserving the phase-based IPD binaural cues of the interferers (important at low frequencies), and the magnitude-based ILD is not important at low frequencies. Therefore, the magnitude of the binaural output at low frequencies keeps the emphasis on interferers/noise reduction. Similarly, in the proposed CCMBB, on each side for high frequency components (>1500 Hz) the phase of the binaural output is simply the phase of the beamformer output (no mixing, no classification). This is because the output phase does not play a role in preserving the magnitude-based ILD binaural cues of the interferers (important at high frequencies), and the phase-based IPD is not important at high frequencies. Therefore, the phase of the binaural output at high frequencies keeps the emphasis on interferers/noise reduction.

Another type of binaural cues will be considered in this work, for the preservation of the spatial impression of background diffuse noise: the Magnitude Squared Coherence (MSC, defined in the next section). The above processing implies that in the proposed CCMBB the magnitude information of binaural output signals is considered to be less important for preservation of MSC at low frequencies, and that the phase information of binaural output signals is considered to be less important for preservation of MSC at high frequencies.

Therefore, two classification and mixing systems need to be developed based on the complex coherence: one for the binaural output signal phase at low frequencies, and one for the binaural output signal magnitude at high frequencies. To better explain the rationale for the phase and magnitude classification performed at each T-F, a few additional equations are provided below. These equations are not required in the actual implementation of the CCMBB post-processor, unlike (19), (20). For simplicity, the left $l$ and right $r$ indices are dropped in these equations since the same equation applies to each side, and the time (frame) and frequency indices are also dropped. As before, $x_{\mathrm{ref}}$ represents the target component at the reference microphone, and we define $u_{\mathrm{ref}} = v_{\mathrm{ref}} + n_{\mathrm{ref}}$ as the sum of the directional interferers components $v_{\mathrm{ref}}$ and the diffuse noise components $n_{\mathrm{ref}}$ at the reference microphone. The corresponding components in the beamformer output signal are written as $z_x$ and $z_u$. Therefore, we have $y_{\mathrm{ref}} = x_{\mathrm{ref}} + u_{\mathrm{ref}}$ as the noisy input signal at the reference microphone, and $z = z_x + z_u$ as the beamformer output, on each side and for each time and frequency bin.

Considering $z$ and $y_{\mathrm{ref}}$ as zero-mean random variables and using the polar notation for these variables, the complex coherence becomes as follows, where $E\{\cdot\}$ refers to an averaging process over consecutive frames in each frequency bin:

$$C_{z,y} = \frac{E\{z\,y_{\mathrm{ref}}^*\}}{\sqrt{E\{|z|^2\}}\,\sqrt{E\{|y_{\mathrm{ref}}|^2\}}} = \frac{E\left\{|z_x||x_{\mathrm{ref}}|\,e^{j(\angle z_x - \angle x_{\mathrm{ref}})} + |z_u||u_{\mathrm{ref}}|\,e^{j(\angle z_u - \angle u_{\mathrm{ref}})}\right\}}{\sqrt{E\{|z_x|^2 + |z_u|^2\}}\,\sqrt{E\{|x_{\mathrm{ref}}|^2 + |u_{\mathrm{ref}}|^2\}}} \tag{21}$$

The last part of (21) assumes that components from the target signal $x_{\mathrm{ref}}$ and components from the "interferers plus diffuse noise" signal $u_{\mathrm{ref}}$ are uncorrelated (as stated in a previous section). Next, if a target distortionless response is assumed for the beamformer, i.e., $z_x = x_{\mathrm{ref}}$, (21) becomes:

$$C_{z,y} = \frac{E\left\{|x_{\mathrm{ref}}|^2 + |z_u||u_{\mathrm{ref}}|\,e^{j(\angle z_u - \angle u_{\mathrm{ref}})}\right\}}{\sqrt{E\{|x_{\mathrm{ref}}|^2 + |z_u|^2\}}\,\sqrt{E\{|x_{\mathrm{ref}}|^2 + |u_{\mathrm{ref}}|^2\}}} \tag{22}$$

At low frequencies, a larger phase change $|\angle z_u - \angle u_{\mathrm{ref}}|$ between the input and output interferers/noise components is more likely to lead to distortion of interferers/noise IPD binaural cues between the left and right binaural outputs, because such changes do not occur symmetrically in the beamformer on each side of a binaural system. Similarly, at high frequencies a larger magnitude change $\big||z_u| - |u_{\mathrm{ref}}|\big|$ between the input and output interferers/noise components (i.e., a larger interferers/noise reduction) is more likely to lead to distortion of interferers/noise ILD binaural cues between the left and right binaural outputs. Evaluating from (22) the impact on $C_{z,y}$ of different $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase changes and different interferers/noise reduction levels, we can then use $C_{z,y}$ as a classification criterion for the CCMBB binaural output phase at low frequencies, where IPD is important. Likewise, evaluating from (22) the impact on $C_{z,y}$ of different $\big||z_u| - |u_{\mathrm{ref}}|\big|$ magnitude changes and different interferers/noise reduction levels, we can then use $C_{z,y}$ as a classification criterion for the CCMBB binaural output magnitude at high frequencies, where ILD is important.

First, we consider the effect of the phase change $|\angle z_u - \angle u_{\mathrm{ref}}|$ for some important cases. The effect is more directly observed on the coherence phase value $|\angle C_{z,y}|$. From the numerator of (22), we see that a small coherence phase value $|\angle C_{z,y}|$ occurs if there is a small phase change $|\angle z_u - \angle u_{\mathrm{ref}}|$ (regardless of the interferers/noise reduction level, i.e., level of $|z_u|$ relative to $|u_{\mathrm{ref}}|$ and $|x_{\mathrm{ref}}|$). Another case where a small coherence phase value $|\angle C_{z,y}|$ occurs is when there is a large $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change with a strong interferers/noise reduction ($|z_u|$ small relative to $|u_{\mathrm{ref}}|$ and $|x_{\mathrm{ref}}|$). A case producing a large coherence phase value $|\angle C_{z,y}|$ is when a large $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change is combined with weak interferers/noise reduction ($|z_u|$ level similar to $|u_{\mathrm{ref}}|$ and $|x_{\mathrm{ref}}|$ levels).

Since the case with a large coherence phase value $|\angle C_{z,y}|$ mentioned above includes both weak interferers/noise reduction and increased risk of binaural IPD cues distortion (from the large $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change), the CCMBB does not use the beamformer output phase in such case. However, to avoid losing cases with good interferers/noise reduction levels, the CCMBB keeps the beamformer output phase for smaller values of $|\angle C_{z,y}|$ (which includes some cases with good or weak amount of interferers/noise reduction, as well as large or small $|\angle z_u - \angle u_{\mathrm{ref}}|$). The resulting set of equations for the CCMBB binaural output phase component at low frequencies is:

$$\angle\big(z_{m,l}(f,t)\big) = \begin{cases} \angle\big(y_l(f,t)\big), & |\angle(C_{zl,yl}(f,t))| > \mu\pi \\ \angle\big(z_l(f,t)\big), & |\angle(C_{zl,yl}(f,t))| \le \mu\pi \end{cases} \tag{23}$$

$$\angle\big(z_{m,r}(f,t)\big) = \begin{cases} \angle\big(y_r(f,t)\big), & |\angle(C_{zr,yr}(f,t))| > \mu\pi \\ \angle\big(z_r(f,t)\big), & |\angle(C_{zr,yr}(f,t))| \le \mu\pi \end{cases} \tag{24}$$

The threshold value is a tunable parameter $\mu\pi$ ($0 < \mu < 1$), where a lower $\mu$ leads to lower IPD binaural cues errors (and lower MSC errors), but also to lower interferers and diffuse noise reduction. A value of $\mu = 0.1$ has been found to provide satisfactory experimental results in our simulations.

Next, we consider the effect of the magnitude change

ILD cues distortion. To help the balance and keep the binaural ILD cues distortion at a reasonable level, for the alternate condition with $|C_{z,y}|$ higher than a threshold $T$ the CCMBB puts more weight on the preservation of the binaural ILD cues, i.e., more weight on the magnitude of the noisy reference input signal. Essentially this simply means using a value $\alpha > 0.5$ in (25), (26).
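The per-bin decision rules of the CCMBB, i.e., the phase selection of (23), (24) at low frequencies and the magnitude mixing of (25), (26) at high frequencies, can be sketched for one side as follows. This is a minimal sketch, not the authors' implementation: the function name, argument layout, and the treatment of the bin exactly at 1500 Hz (assigned to the high band here) are assumptions; the threshold array `thresh` stands for $T_l(f)$ or $T_r(f)$ and is taken as an input.

```python
import numpy as np

def ccmbb_output(z, y, coh, thresh, freqs, mu=0.1, alpha=0.7, f_split=1500.0):
    """CCMBB output for one side (illustrative sketch).

    z, y, coh : complex arrays (n_freq, n_time): beamformer output,
                noisy front-microphone signal, complex coherence.
    thresh    : (n_freq,) coherence-magnitude thresholds T(f).
    freqs     : (n_freq,) bin center frequencies in Hz.
    """
    mag_z, mag_y = np.abs(z), np.abs(y)
    ph_z, ph_y = np.angle(z), np.angle(y)

    # Low band, (23)/(24): use the noisy phase when |angle(C)| > mu*pi.
    phase_low = np.where(np.abs(np.angle(coh)) > mu * np.pi, ph_y, ph_z)

    # High band, (25)/(26): weight alpha on |z| when |C| < T, else on |y|.
    w_z = np.where(np.abs(coh) < thresh[:, None], alpha, 1.0 - alpha)
    mag_high = w_z * mag_z + (1.0 - w_z) * mag_y

    # Low band keeps the beamformer magnitude, high band its phase.
    low = (freqs < f_split)[:, None]
    mag = np.where(low, mag_z, mag_high)
    phase = np.where(low, phase_low, ph_z)
    return mag * np.exp(1j * phase)
```

With the reported values $\mu = 0.1$ and $\alpha = 0.7$, a low-frequency bin keeps the beamformer phase only when the coherence phase is within about 18 degrees of zero, and a high-frequency bin gives 70% of the weight to whichever magnitude the coherence test favors.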
$\big||z_u| - |u_{\mathrm{ref}}|\big|$ for some important cases. The effect is more directly observed on the coherence magnitude value $|C_{z,y}|$. From (22), we see that a case producing a smaller coherence magnitude value $|C_{z,y}|$ is when there is good interferers/noise reduction performance (small $|z_u|$ level relative to $|u_{\mathrm{ref}}|$ and $|x_{\mathrm{ref}}|$, and therefore large $\big||z_u| - |u_{\mathrm{ref}}|\big|$). On the other hand, if $\big||z_u| - |u_{\mathrm{ref}}|\big|$ is small (weak interferers/noise reduction, $|z_u|$ level similar to $|u_{\mathrm{ref}}|$ and $|x_{\mathrm{ref}}|$ levels), the value of $|C_{z,y}|$ depends on the $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change: if there is a large $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change it leads to a smaller coherence magnitude value $|C_{z,y}|$, and if there is a small $|\angle z_u - \angle u_{\mathrm{ref}}|$ phase change it leads to a larger coherence magnitude value $|C_{z,y}|$ (closer to 1.0).

We note that unlike the low frequency classification with coherence phase value $|\angle C_{z,y}|$ considered earlier, here there is no case which has both a weak interferers/noise reduction and an increased risk of binaural cues distortion (i.e., a higher risk of binaural ILD cues distortion from a large magnitude change $\big||z_u| - |u_{\mathrm{ref}}|\big|$). This is because by definition $\big||z_u| - |u_{\mathrm{ref}}|\big|$ is indicative at the same time of the interferers/noise reduction level (a larger value of $\big||z_u| - |u_{\mathrm{ref}}|\big|$ is better) and the risk of binaural ILD cues distortion (a smaller value of $\big||z_u| - |u_{\mathrm{ref}}|\big|$ is better). Therefore, the approach proposed for the CCMBB binaural output magnitude at high frequencies is less drastic or less binary than the previous approach for the CCMBB binaural output phase at low frequencies, and it involves mixing together the beamformer output magnitude and the noisy reference input magnitude. The resulting set of equations for the binaural output magnitude at high frequencies is (at each T-F bin):

$$|z_{m,l}(f,t)| = \begin{cases} \alpha\,|z_l(f,t)| + (1-\alpha)\,|y_l(f,t)|, & \text{if } |C_{zl,yl}(f,t)| < T_l(f) \\ (1-\alpha)\,|z_l(f,t)| + \alpha\,|y_l(f,t)|, & \text{if } |C_{zl,yl}(f,t)| \ge T_l(f) \end{cases} \tag{25}$$

$$|z_{m,r}(f,t)| = \begin{cases} \alpha\,|z_r(f,t)| + (1-\alpha)\,|y_r(f,t)|, & \text{if } |C_{zr,yr}(f,t)| < T_r(f) \\ (1-\alpha)\,|z_r(f,t)| + \alpha\,|y_r(f,t)|, & \text{if } |C_{zr,yr}(f,t)| \ge T_r(f) \end{cases} \tag{26}$$

The mixing parameter $\alpha$ ($0 \le \alpha \le 1$) affects the trade-off between the level of interferers/noise reduction and the preservation of the binaural ILD cues. As described in an earlier paragraph, the case with a good level of interferers/noise reduction occurs for a smaller value of $|C_{z,y}|$, and to preserve this case the CCMBB selects the condition with $|C_{z,y}|$ lower than a threshold $T$ as the condition which puts more weight on interferers/noise reduction, i.e., more weight on the magnitude of the beamformer output. This is at the expense of increasing the risk of binaural

This approach has been validated in our experiments using the objective metrics presented in the next section, where it was found that a value of $\alpha = 0.7$ provided satisfactory experimental results (good overall trade-off between interferers/noise reduction and ILD distortion).

The threshold values $T_l(f)$ and $T_r(f)$ in (25), (26) are computed by taking the magnitude of the complex coherences estimated at each frequency bin from 219 ms of signals (40 frames, with overlap). This is unlike the coherence functions in (19), (20), (23)–(26), which are estimated with a shorter total time of 59 ms (only 10 frames, with overlap). The total time close to 200 ms was selected so that the method with a threshold could be used in future work under dynamic conditions (e.g., with head movements and dynamic sources). Using the CCMBB algorithm as a post-processor for the proposed Robust TLCMV, we will refer to the resulting beamforming algorithm as the "Robust TLCMV with CCMBB".

V. PERFORMANCE MEASUREMENT

To evaluate the performance of the proposed algorithm and the state of the art BLCMV, BMVDR and BMVDR-n algorithms, several objective metrics are used in this work. First, to measure the ability of the binaural beamformers to preserve the binaural cues, the interaural information between the left and right side signals is required. Formally, the Interaural Transfer Function (ITF) is defined as the ratio of a directional source component from the left to the right ear [30]. For simplicity, the ITF, ILD and IPD metrics are developed below for the case of a single source, more specifically a single interferer source. In the case of several interferers, in this work we apply the same equations to an equivalent interferer signal which consists of the sum of all interferer signals. All the performance measurements in this section are frequency dependent metrics; however, the frequency index $f$ is omitted for simplicity. The input ITF for an interferer component can be computed as in (27), where $\Gamma_{(v_{\mathrm{ref},r}),(v_{\mathrm{ref},l})}$ is the cross-PSD between the interferer component at the front left and front right reference microphones, and $\Gamma_{(v_{\mathrm{ref},l}),(v_{\mathrm{ref},l})}$ is the auto-PSD of the interferer component at the front left reference microphone:

$$ITF_{\mathrm{in},v} = \frac{\Gamma_{(v_{\mathrm{ref},r}),(v_{\mathrm{ref},l})}}{\Gamma_{(v_{\mathrm{ref},l}),(v_{\mathrm{ref},l})}} \tag{27}$$

Similarly, the ITF between the left and right beamformer outputs can be described by (28):

$$ITF_{\mathrm{out},v} = \frac{\Gamma_{(z_{v,r}),(z_{v,l})}}{\Gamma_{(z_{v,l}),(z_{v,l})}} \tag{28}$$

where $z_v$ is the interferer component in the beamformer output signals. The errors (or losses) in the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD) binaural cues are defined as in (29) to (34):

$$ILD_{\mathrm{in},v} = 10 \log_{10} |ITF_{\mathrm{in},v}| \tag{29}$$

$$ILD_{\mathrm{out},v} = 10 \log_{10} |ITF_{\mathrm{out},v}| \tag{30}$$

$$\Delta ILD_v = ILD_{\mathrm{out},v} - ILD_{\mathrm{in},v} \tag{31}$$

$$IPD_{\mathrm{in},v} = \angle ITF_{\mathrm{in},v} \tag{32}$$

$$IPD_{\mathrm{out},v} = \angle ITF_{\mathrm{out},v} \tag{33}$$

$$\Delta IPD_v = IPD_{\mathrm{out},v} - IPD_{\mathrm{in},v} \tag{34}$$

In this work, the ILD error $\Delta ILD_v$ is only computed for the frequency components above 1500 Hz, and the IPD error $\Delta IPD_v$ is only computed for the frequency components below 1500 Hz.

In order to preserve the spatial impression of the diffuse-like noise, the MSC of the binaural diffuse-like noise components also has to be preserved. The MSC between the reference microphones can be computed as in (35):

$$MSC_{n,\mathrm{in}} = \frac{\left|\Gamma_{(n_{\mathrm{ref},r}),(n_{\mathrm{ref},l})}\right|^2}{\Gamma_{(n_{\mathrm{ref},l}),(n_{\mathrm{ref},l})}\,\Gamma_{(n_{\mathrm{ref},r}),(n_{\mathrm{ref},r})}} \tag{35}$$

where $\Gamma_{(n_{\mathrm{ref},r}),(n_{\mathrm{ref},l})}$, $\Gamma_{(n_{\mathrm{ref},l}),(n_{\mathrm{ref},l})}$ and $\Gamma_{(n_{\mathrm{ref},r}),(n_{\mathrm{ref},r})}$ are cross- and auto-PSDs from the diffuse noise component at the front microphones.

where in the above cross- and auto-PSDs $x_{\mathrm{ref}}$, $v_{\mathrm{ref}}$ and $n_{\mathrm{ref}}$ refer to the target, interferers and diffuse noise components at a reference microphone, while $z_x$, $z_v$ and $z_n$ refer to the corresponding components in the beamformer output signal.

Finally, to measure the target distortion on each side after processing, two measurements are used: a target Speech Distortion Ratio (SDR) and a Speech Distortion Magnitude-only distance (SDmag). For each side, we define a target distortion error signal $x_{\mathrm{dist}}$ as the time domain difference between the (aligned) target component in the beamformer output $z_x$ and the target component at the reference microphone signal $x_{\mathrm{ref}}$. The SDR is then computed with the auto-PSDs as in (41):

$$SDR = 10 \log \frac{\Gamma_{x_{\mathrm{ref}},x_{\mathrm{ref}}}}{\Gamma_{x_{\mathrm{dist}},x_{\mathrm{dist}}}}, \tag{41}$$

and the SDmag is computed with the same auto-PSDs but as in (42):

$$SDmag = \left|10 \log \Gamma_{x_{\mathrm{ref}},x_{\mathrm{ref}}} - 10 \log \Gamma_{z_x,z_x}\right|. \tag{42}$$

Since the computation of the performance metrics requires knowing the separate components in the beamformer output signals (target, interferers, diffuse noise), the so-called shadow-filtering method was used in the simulations, i.e., filtering/processing all the signal components individually with the same time-variant filter coefficients or post-filtering.
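Given PSD estimates for the separated (shadow-filtered) components, the cue errors of (29) to (34) and the distortion measures of (41), (42) reduce to a few array operations. The sketch below follows the formulas as printed; the function names are hypothetical, the logarithms are taken as base 10, and no phase unwrapping or wrapping is applied to the IPD difference since none is stated in the text.

```python
import numpy as np

def itf(cross_psd_rl, auto_psd_ll):
    """Interaural transfer function as in (27)/(28): cross-PSD over left auto-PSD."""
    return cross_psd_rl / auto_psd_ll

def ild_ipd_errors(itf_in, itf_out):
    """ILD error in dB, (29)-(31), and IPD error in radians, (32)-(34)."""
    d_ild = 10.0 * np.log10(np.abs(itf_out)) - 10.0 * np.log10(np.abs(itf_in))
    d_ipd = np.angle(itf_out) - np.angle(itf_in)
    return d_ild, d_ipd

def sdr_db(psd_xref, psd_xdist):
    """Speech distortion ratio (41): target PSD over distortion-error PSD."""
    return 10.0 * np.log10(psd_xref / psd_xdist)

def sdmag_db(psd_xref, psd_zx):
    """Magnitude-only distortion distance (42)."""
    return np.abs(10.0 * np.log10(psd_xref) - 10.0 * np.log10(psd_zx))
```

In line with the text, a caller would keep `d_ild` only for bins above 1500 Hz and `d_ipd` only for bins below 1500 Hz before averaging or plotting.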
Similarly, the MSC between the left and right binaural outputs can be computed as in (36):

$$MSC_{n,\mathrm{out}} = \frac{\left|\Gamma_{(z_{n,r}),(z_{n,l})}\right|^2}{\Gamma_{(z_{n,l}),(z_{n,l})}\,\Gamma_{(z_{n,r}),(z_{n,r})}} \tag{36}$$

where $\Gamma_{(z_{n,r}),(z_{n,l})}$, $\Gamma_{(z_{n,l}),(z_{n,l})}$ and $\Gamma_{(z_{n,r}),(z_{n,r})}$ are cross- and auto-PSDs from the diffuse noise component in the beamformer outputs. The MSC error is then computed as in (37):

$$\Delta MSC_n = MSC_{n,\mathrm{out}} - MSC_{n,\mathrm{in}}. \tag{37}$$

Next, to measure the reduction of the interferers and diffuse noise components with the beamforming process, a signal to noise ratio gain (SNR-gain, array gain), a signal to interferers ratio gain (SIR-gain), and a signal to diffuse noise ratio gain (SDNR-gain) are computed on each side, providing the difference in dB between the SNR, SIR, and SDNR at the beamformer output and at the input reference microphone:

$$SNR\,gain\,(\mathrm{dB}) = 10 \log \frac{\Gamma_{z_x,z_x}}{\Gamma_{(z_v+z_n),(z_v+z_n)}} - 10 \log \frac{\Gamma_{x_{\mathrm{ref}},x_{\mathrm{ref}}}}{\Gamma_{(v_{\mathrm{ref}}+n_{\mathrm{ref}}),(v_{\mathrm{ref}}+n_{\mathrm{ref}})}} \tag{38}$$

$$SIR\,gain\,(\mathrm{dB}) = 10 \log \frac{\Gamma_{z_x,z_x}}{\Gamma_{z_v,z_v}} - 10 \log \frac{\Gamma_{x_{\mathrm{ref}},x_{\mathrm{ref}}}}{\Gamma_{v_{\mathrm{ref}},v_{\mathrm{ref}}}} \tag{39}$$

$$SDNR\,gain\,(\mathrm{dB}) = 10 \log \frac{\Gamma_{z_x,z_x}}{\Gamma_{z_n,z_n}} - 10 \log \frac{\Gamma_{x_{\mathrm{ref}},x_{\mathrm{ref}}}}{\Gamma_{n_{\mathrm{ref}},n_{\mathrm{ref}}}} \tag{40}$$

In addition, since all the talker speech sources were always active in our simulations (except for normal short pauses between words), for each component all the computed frames were used to estimate the PSD statistics, and therefore no VAD was required under this setup.

VI. EXPERIMENTAL SETUP

Head Related Transfer Functions (HRTFs) measured from a KEMAR mannequin wearing two binaural Behind-The-Ear (BTE) hearing aids are used for the simulations. The HRTFs were provided by a hearing aid manufacturer. There were two sets of HRTFs: HRTFs from an anechoic environment, and HRTFs from a mildly reverberant environment (T60 of approximately 150 ms). For our simulations, the directional signals (target, interferers) for the reverberant conditions are generated using the reverberant HRTFs. Beamformer designs are always performed using the anechoic HRTFs, and these HRTFs are also used to generate the directional signals for the subset of simulations with anechoic conditions. The distance used for the reverberant and the anechoic HRTFs measurements, which is between a loudspeaker source and the center of the head, was 1 m. The diffuse-like background noise recordings were also provided by a hearing aid manufacturer, again recorded on a KEMAR mannequin wearing two binaural BTE hearing aids, with babble noise recordings played at eight loudspeakers on a circle with a radius of 1 m around the KEMAR mannequin. The audio signals are sampled at 24 kHz. A Short Time Fourier Transform (STFT) is used to decompose the signals in the time-frequency domain, with an FFT size of 256 (10.67 ms), using a Hann window with 50% overlap between consecutive windows. The generated noisy mixtures of signals have a total length of 10 sec.

TABLE I ACOUSTIC SCENARIOS

Fig. 5. The constraints directions for (a) proposed Robust TLCMV, (b) BMVDR and BMVDR-n (c) BLCMV.

VII. SYSTEM EVALUATION AND SIMULATION RESULTS

In this section, the performance of our proposed beamformer "Robust TLCMV" is first compared with the BMVDR, which has more degrees of freedom available for noise reduction (more adaptive "nulls"), in order to assess the effect of reducing the number of degrees of freedom for the Robust TLCMV. In addition, the performance of the Robust TLCMV is evaluated using both noisy correlation matrix $\mathbf{R}_y$ and diffuse noise correlation matrix $\mathbf{R}_n$. Binaural cues preservation is not considered in these first comparisons. The proposed CCMBB post-processor for binaural cues preservation is then combined with the BMVDR and compared with the BMVDR-n, to compare noise reduction, target distortion, and binaural cues preservation between these two approaches for cues preservation.

The proposed Robust TLCMV with CCMBB is then evaluated and compared with the BLCMV [13], [14]. For these algorithms, two types of propagation model mismatch are evaluated. The first type of mismatch is generated from the difference between the estimated and the true direction of arrivals for the directional sources, i.e., target and directional interferers. We will refer to this type of mismatch as DOA mismatch. The second type of mismatch is between the reverberant HRTFs used to generate the reverberant signals at the microphones and the anechoic HRTFs used in all the beamformer designs. We will refer to this second type of mismatch as HRTF mismatch.

For a frontal or near-frontal target case, the estimated target DOA is at 0 degree (for our proposed Robust TLCMV with and without CCMBB, and for the BMVDR, the BMVDR-n and the BLCMV) and the estimated interferers DOAs are at 225 degrees and 90 degrees (with such estimates required for the BLCMV only). As our proposed Robust TLCMV beamformer design assumes that the true target DOA is within ±10 degrees of the estimated target DOA, two unity constraints are positioned in the middle of the estimated target zone at ±5 degrees as Fig. 5(a) illustrates, unlike the BMVDR and BMVDR-n which only use one constraint at the estimated target direction as Fig. 5(b) shows. On the other hand, the BLCMV uses three constraints: at 0 degree with gain $\zeta = 1$, and at 225 and 90 degrees with a gain $\eta$ set to 0.2 (as recommended in [32] and shown in Fig. 5(c)). A non-frontal target case with a target speaker at 90 degrees is also considered, with two unity constraints positioned in the middle of the estimated target zone, i.e., at ±5 degrees deviation from the assumed target direction in the Robust TLCMV, while the BLCMV again uses a unity gain constraint in the estimated target direction, and two constraints of gain $\eta$ at the estimated interferer directions.

A. Robust TLCMV and BMVDR (Without Post-Processor)

In order to evaluate the performance of the proposed Robust TLCMV with the BMVDR (the first option in [24], which does not preserve the binaural cues of interfering sources), four different acoustic scenarios are used, each with a target at 0 or 10 degrees, as Table I illustrates. Due to space limitations, performance in this subsection is only shown in terms of SNR-gain and SDR. The noise reduction and target distortion measurements in this section and the other sections are only shown for the "better ear" (the side where the input SNR is higher). The resulting performance metrics in Fig. 6 illustrate the effect of the target DOA mismatch in the performance of BMVDR and the proposed Robust TLCMV. The Robust TLCMV outperforms the BMVDR in terms of SDR under the four acoustic scenarios (more significantly for cases with DOA mismatch, i.e., target at 10 degrees), and it outperforms the BMVDR in terms of SNR-gain under the acoustic scenarios in the presence of DOA mismatch (target at 10 degrees). For acoustic scenarios with a target at 0 degree and no DOA mismatch, the proposed Robust TLCMV also slightly outperforms the BMVDR in terms of SDR. While this may seem surprising, it is because of HRTF mismatch (mismatch between anechoic HRTFs used to design the beamformer and reverberant HRTFs used to generate directional sources). Although it was designed for robustness to DOA mismatch, the Robust TLCMV with a wider beampattern around the estimated target direction is found to also provide better robustness to HRTF mismatch (here and in other results). In terms of noise reduction, for these ideal cases with no DOA mismatch the BMVDR outperforms the Robust TLCMV, although typically only by a fraction of a dB. Overall, the results show that the performance of the proposed Robust TLCMV is competitive (and significantly better in cases of DOA mismatch) compared to the BMVDR, despite a reduced number of degrees of freedom available for noise reduction.

Fig. 6. Performance of BMVDR and Robust TLCMV in terms of SNR-gain and SDR, under acoustic scenarios from Table I (with and without target DOA mismatch).

The performance of the proposed Robust TLCMV is then evaluated using a noisy signals correlation matrix $\mathbf{R}_y$ as in (14) and (15), and using a background diffuse-like noise correlation matrix $\mathbf{R}_n$ instead of $\mathbf{R}_y$ in (14) and (15). The $\mathbf{R}_y$ and $\mathbf{R}_n$ correlation matrices were estimated using a moving average lowpass first order recursive filter with a forgetting factor of 0.985. An acoustic scenario is used with a target at 0 degree, interferers at 225, 90 and 180 degrees, and diffuse-like noise (14 dB lower than the directional sources level). The resulting performance in terms of SNR-gain in Fig. 7 shows that using $\mathbf{R}_y$ for coefficients computation in the Robust TLCMV outperforms using $\mathbf{R}_n$. More detailed results are shown in Fig. 8 in terms of SIR-gain and SDNR-gain. The results illustrate the better performance of the proposed Robust TLCMV in terms of SIR-gain when $\mathbf{R}_y$ is used for the coefficients computation. This result can be justified since using $\mathbf{R}_y$ enables the proposed Robust TLCMV to adaptively position the nulls in the direction of the active interferers sources at each T-F bin. On the other hand, using $\mathbf{R}_n$ for coefficients computation in the Robust TLCMV performs better than using $\mathbf{R}_y$ for the SDNR-gain (diffuse noise reduction), which is normal since $\mathbf{R}_n$ is specifically tuned for that. In the rest of this paper, the noisy signals correlation matrix $\mathbf{R}_y$ is used in all simulations.

Fig. 7. Performance of Robust TLCMV using noisy correlation matrix and diffuse noise correlation matrix in terms of SNR-gain.

Fig. 8. Performance of Robust TLCMV using noisy correlation matrix and diffuse noise correlation matrix in terms of SIR-gain and SDNR-gain.

B. CCMBB and a Method With Direct Mixing

In order to evaluate the performance of the proposed CCMBB post-processor for cues preservation separately from the proposed Robust TLCMV, the CCMBB is used as a post-processor to the BMVDR (BMVDR-CCMBB). The performance of the BMVDR-CCMBB is compared with the BMVDR (no binaural cues preservation for the interferers and noise components) and with a BMVDR-n which uses 0.7 of the beamformer output mixed with 0.3 of the noisy input signal. An acoustic scenario is used with a target at 0 degree (no DOA mismatch), an interferer at 165 degrees, and diffuse-like noise (5 dB below the directional sources level). The resulting performance metrics in Table II show that the proposed CCMBB cues preservation post-processing method combined with the BMVDR outperforms the BMVDR-n in terms of SNR-gain by around 2 dB, with a better SDmag distortion (2.3 dB) and similar scores for the other indicators. At the same time, the BMVDR-CCMBB has only a slightly lower SNR-gain than the BMVDR, while providing much better scores for the other metrics. This overall indicates the good performance of the CCMBB post-processor.

TABLE II PERFORMANCE OF BMVDR, BMVDR-N, AND BMVDR-CCMBB

C. Robust TLCMV With CCMBB and DOA Mismatch

In this section, the effect of the DOA mismatch for the target speaker as well as for the directional interferers is studied. We first evaluate the performance of the algorithms in an anechoic environment, using speech sources generated by anechoic HRTFs, in order to remove the other source of mismatch generated from the reverberation, i.e., HRTF mismatch.

Fig. 9. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, with no DOA mismatch and no HRTF mismatch.

Fig. 10. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, under acoustic scenario with 10 degrees DOA mismatch and no HRTF mismatch.

Fig. 11. Performance of BLCMV in terms of IPD-error and ILD-error under anechoic acoustic scenario, without and with 10 degrees DOA mismatch.

Fig. 12. Performance of Robust TLCMV with CCMBB post-processor under mildly reverberant acoustic scenario (HRTF mismatch), with and without DOA mismatch for the target.

Fig. 13. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, under acoustic scenario with 10 degrees DOA mismatch for the target (and for interferers in the BLCMV), and HRTF mismatch.

To begin, a case with a target at 0 degree and interferers at 90 and 225 degrees is considered. For the BLCMV, this is an ideal case with no DOA mismatch, while for the proposed Robust TLCMV with CCMBB, the constraints set at ±5 degrees do not match the true target DOA (less ideal case). The target and the interferers all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. In terms of SNR-gain, the resulting performance metric in the first plot of Fig. 9 illustrates the better SNR-gain performance of the BLCMV under this scenario ideal for it. In this scenario, the proposed Robust TLCMV with CCMBB does not have an exact unit constraint at 0 degree (true target DOA), unlike the BLCMV.
Nevertheless, both the BLCMV and our proposed Robust TLCMV with CCMBB algorithm generate an output with significant SNR-gain and very low target distortion, as shown by the SNR-gain, SDR and SDmag plots of Fig. 9. Fig. 9 also clearly illustrates the effect of adding the CCMBB to preserve the spatial impression (binaural cues) of the diffuse-like background noise, as measured with the MSC-error metric. The BLCMV does not preserve the diffuse noise binaural cues, causing the large MSC-error scores.

Assuming an exact knowledge of the true DOA of the target as well as true DOAs of the directional interferers is impractical. Therefore, a case with 10 degrees of DOA mismatch is then tested, using an acoustic scenario with a target at 10 degrees, interferers at 235 and 100 degrees, and diffuse-like noise, all with the same levels as earlier. The resulting performance in terms of SNR-gain, SDR and SDmag in Fig. 10 illustrates that the Robust TLCMV with CCMBB provides significantly better results in this case with DOA mismatch, especially for high frequencies, i.e., above 1000 Hz. The post-processing CCMBB method again provides significant improvements in terms of diffuse-noise MSC-error. These results indicate the robustness of the proposed algorithm in the presence of target DOA mismatch. Moreover, since our proposed algorithm does not assume a prior knowledge of the directional interferers DOAs, its binaural cues preservation performance is not affected by interferers DOA mismatch, unlike the BLCMV. Fig. 11 shows that with 10 degrees DOA mismatch in an anechoic environment with a target at 10 degrees, interferers at 235 and 100 degrees, and diffuse-like noise, the abilities of the BLCMV to preserve the binaural cues of the directional interferers significantly decrease (i.e., increase in the IPD-error and ILD-error metrics).

D. Robust TLCMV With CCMBB With DOA Mismatch and HRTF Mismatch

In this section, a more realistic evaluation is performed using speech signals generated in a mildly reverberant environment (T60 = approx. 150 ms). Three acoustic scenarios are generated with a target at 0, 5, or 10 degrees, interferers at 225 and 90 degrees, 230 and 95 degrees, or 235 and 100 degrees, as well as with diffuse noise. The target and the interferers again all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. The directional signals were generated using reverberant HRTFs. The beamformer algorithms assume the same DOAs as before: a target at 0 degree (for both algorithms), and interferers at 90 and 225 degrees (required for BLCMV only). Therefore, these cases include HRTF mismatch, with and without DOA mismatch. The resulting performance metrics in Fig. 12 show that the proposed Robust TLCMV with CCMBB remains robust to DOA mismatch in the reverberant environment (up to 10 degrees) since it does not rely on constraints in the exact directions of the directional sources (unlike the BLCMV). Moreover, Fig. 13 illustrates the overall improved performance of the Robust TLCMV with CCMBB over the BLCMV in terms of SNR-gain, SDR, SDmag and MSC with 10 degrees DOA mismatch in the mildly reverberant environment, i.e., with HRTF mismatch. It is also noticeable that the BLCMV does not have the ability to preserve the spatial impression of diffuse noise in terms of MSC, unlike the proposed Robust TLCMV with CCMBB.

Fig. 14. Performance of BLCMV in terms of IPD-error and ILD-error with and without HRTF mismatch, and without DOA mismatch.

Fig. 15. Performance in terms of IPD-error and ILD-error under a reverberant acoustic scenario (HRTF mismatch), for different levels of interferer DOA mismatch.

Fig. 16. For non-frontal target, performance in terms of SNR-gain, SDR, SDmag and MSC-error under a reverberant acoustic scenario (HRTF mismatch), without DOA mismatch.

To evaluate the effect of the HRTF mismatch separately, i.e., without the effect of DOA mismatch, on the ability of the BLCMV to preserve the binaural cues in terms of IPD and ILD, an acoustic scenario is generated with a target at 0 degree, interferers at 90 and 225 degrees, and diffuse noise (same levels as before). Fig.

increase in the interferer DOA mismatch combined with HRTF mismatch.

Further study of the HRTF mismatch effect is done under an acoustic scenario with a lateral target at 90 degrees, where the effect of the target HRTF mismatch can be more significant than for a frontal target case. Interferers at 225 and 315 degrees as well as diffuse noise are used. The target and the interferers all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. The directional signals are generated using reverberant HRTFs, creating HRTF mismatch. In this case the beamformer algorithms know the value of the exact target DOA at 90 degrees (for both algorithms, no target DOA mismatch), and the exact value of the interferers DOAs at 225 and 315 degrees (required for BLCMV only). Fig. 16 illustrates the improved performance of the proposed Robust TLCMV with CCMBB over the BLCMV in terms of noise reduction, target speech distortion, and preservation of the binaural spatial impression of the background diffuse noise for this scenario with HRTF mismatch.

VIII. CONCLUSION

This work introduced a binaural beamforming algorithm robust to target DOA mismatch and HRTF mismatch (Robust TLCMV), as well as its combination with a post-processor to
14 shows that in reverberant environments, i.e., achieve a good trade-off between noise reduction and binaural with HRTF mismatch, the ability of the BLCMV to preserve cues preservation of all acoustic components (Robust TLCMV the binaural cues for the directional interferers significantly de- with CCMBB). The proposed robust beamformer does not re- creases. In order to evaluate the combined effect of the HRTF quire prior knowledge of the propagation model (e.g., HRTFs mismatch and DOA mismatch in the preservation of the binaural or HRTF ratios) for the directional interferers, or second order cues in terms of ILD and IPD, five acoustic scenarios are then statistics estimation of the noise-only or interferers-only compo- used: acoustic scenarios without DOA mismatch, and with 5, nents. The Robust TLCMV was shown to produce better results 10, 15 and 20 degrees of interferers DOA mismatch. The result- than a BMVDR under the case of 10 degrees target DOA mis- ing performance metrics in terms of IPD-error and ILD-error in match, and comparable performance for the ideal case of no Fig. 15 show the performance improvement of our proposed Ro- target DOA mismatch. The CCMBB post-processor was shown bust TLCMV with CCMBB algorithm over the BLCMV for all to produce better results than a direct mixing of the beamformer the tested cases. The average IPD for the frequency components output with the noisy input signal. Finally, the Robust TLCMV lower than 1500 Hz and the average ILD for the frequency com- combined with the CCMBB was shown to produce better per- ponents higher than 1500 Hz are shown in Fig. 15. For the case formance than the BLCMV under 10 degrees sources DOA mis- without interferer DOA mismatch, our proposed algorithm still match and under HRTF mismatch with mild reverberation. Fu- outperforms the BLCMV in terms of IPD and ILD, because of ture work to further develop the proposed algorithm and validate the use of CCMBB post-processing. Fig. 
15 also shows that our its performance should include testing under environments with proposed Robust TLCMV with CCMBB is not affected by the higher levels of reverberation, as well as with dynamic sources. 1562 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 REFERENCES [23] E. Hadad et al., “Comparison of two binaural beamforming approaches for hearing aids,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., [1] W. M. Whitmer, K. F. Wright-Whyte, J. A. Holman, and M. A. Akeroyd, 2017, pp. 236–240. “Hearing aid validation,” in Hearing Aids, 1st ed., vol. 56. Basingstoke, [24] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, “Theoretical analysis U.K.: Springer Nature, 2016, pp. 291–321. of binaural transfer function MVDR beamformers with interference cue [2] G. R. Popelka and B. C. J. Moore, “Future directions for hearing aid de- preservation constraints,” IEEEACM Trans. Audio, Speech, Lang. Pro- velopment,” in Hearing Aids, 1st ed., vol. 56. Basingstoke, U.K.: Springer cess., vol. 23, no. 12, pp. 2449–2464, Dec. 2015. Nature, 2016, pp. 331–341. [25] D. Marquardt, V. Hohmann, and S. Doclo, “Coherence preservation in [3] B. Edwards, “Hearing aids and hearing impairment,” in Speech Processing multi-channel Wiener filtering based noise reduction for binaural hearing in the Auditory System. New York, NY, USA: Springer, 2004, pp. 339–421. aids,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, [4] M. Li, H. G. McAllister, N. D. Black, and T. A. D. Perez, “Perceptual pp. 8648–8652. time-frequency subtraction algorithm for noise reduction in hearing aids,” [26] D. Marquardt, V. Hohmann, and S. Doclo, “Perceptually motivated coher- IEEE Trans. Biomed. Eng., vol. 48, no. 9, pp. 979–988, Sep. 2001. ence preservation in multi-channel wiener filtering based noise reduction [5] T. Lotter and P. Vary, “Speech enhancement by MAP spectral amplitude for binaural hearing aids,” in Proc. IEEE Int. Conf. 
Acoust., Speech, Signal estimation using a super-Gaussian speech model,” EURASIP J. Appl. Sig- Process., 2014, pp. 3660–3664. nal Process., vol. 2005, pp. 1110–1126, 2005. [27] T. J. Klasen, S. Doclo, T. Van den Bogaert, M. Moonen, and J. Wouters, [6] Y. Wang, A. Narayanan, and D. Wang, “On training targets for supervised “Binaural multi-channel Wiener filtering for hearing aids: Preserving inter- speech separation,” IEEEACM Trans. Audio, Speech, Lang. Process.,vol. aural time and level differences,” in Proc. IEEE Int. Conf. Acoust., Speech, 22, no. 12, pp. 1849–1858, Dec. 2014. Signal Process., 2006, vol. 5, pp. 145–148. [7] T. Ricketts and S. Dhar, “Comparison of performance across three direc- [28] T. Klasen, T. den Bogaert, M. Moonen, and J. Wouters, “Binaural noise tional hearing aids,” J. Amer. Acad. Audiol., vol. 10, no. 4, pp. 180–189, reduction algorithms for hearing aids that preserve interaural time delay cues,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1579–1585, Apr. [8] V. Best, J. Mejia, K. Freeston, R. J. van Hoesel, and H. Dillon, “An eval- uation of the performance of two binaural beamformers in complex and [29] B. Cornelis, S. Doclo, T. Van dan Bogaert, M. Moonen, and J. Wouters, dynamic multitalker environments,” Int. J. Audiol., vol. 54, no. 10, pp. 727– “Theoretical analysis of binaural multimicrophone noise reduction tech- 735, Oct. 2015. niques,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 342– [9] T. Rohdenburg, V. Hohmann, and B. Kollmeier, “Robustness analysis of 355, Feb. 2010. binaural hearing aid beamformer algorithms by means of objective per- [30] S. Doclo, S. Gannot, M. Moonen, and A. Spriet, “Acoustic beamforming ceptual quality measures,” in Proc. IEEE Workshop Appl. Signal Process. for hearing aid applications,” in Handbook on Array Processing and Sensor Audio Acoust., 2007, pp. 315–318. Networks. Hoboken, NJ, USA: Wiley, 2010, pp. 269–302. [10] D. Marquardt and S. 
Doclo, “Performance comparison of bilateral and [31] E. Hadad, S. Gannot, and S. Doclo, “Binaural linearly constrained min- binaural MVDR-based noise reduction algorithms in the presence of imum variance beamformer for hearing aid applications,” in Proc. Int. DOA estimation errors,” in Proc. Speech Commun.; 12. ITG Symp., 2016, Workshop Acoust. Signal Enhancement, 2012, pp. 1–4. pp. 1–5. [32] E. Hadad, S. Doclo, and S. Gannot, “The binaural LCMV beamformer [11] O. Hoshuyama, A. Sugiyama, and A. Hirano, “A robust adaptive beam- and its performance analysis,” IEEEACM Trans. Audio, Speech, Lang. former for microphone arrays with a blocking matrix using constrained Process., vol. 24, no. 3, pp. 543–558, Mar. 2016. adaptive filters,” IEEE Trans. Signal Process., vol. 47, no. 10, pp. 2677– [33] D. Marquardt, E. Hadad, S. Gannot, and S. Doclo, “Optimal binau- 2684, Oct. 1999. ral LCMV beamformers for combined noise reduction and binaural cue [12] W. Herbordt and W. Kellermann, “Computationally efficient frequency- preservation,” in Proc. 14th Int. Workshop Acoust. Signal Enhancement, domain robust generalized sidelobe canceller,” in Proc. Int. Workshop 2014, pp. 288–292. Acoust. Echo Noise Control, 2001, pp. 51–55. [34] A. I. Koutrouvelis, R. C. Hendriks, J. Jensen, and R. Heusdens, “Im- [13] B.-J. Yoon, I. Tashev, and A. Acero, “Robust adaptive beamforming algo- proved multi-microphone noise reduction preserving binaural cues,” in rithm using instantaneous direction of arrival with enhanced noise suppres- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 460– sion capability,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, vol. 1, pp. I-133–I-136. [35] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, and J. Jensen, “Relaxed [14] L. Lepauloux, P. Scalart, and C. Marro, “Computationally efficient and binaural LCMV beamforming,” IEEEACM Trans. Audio, Speech, Lang. robust frequency-domain GSC,” in Proc. 12th IEEE Int. Workshop Acoust. 
Process., vol. 25, no. 1, pp. 137–152, Jan. 2017. Echo Noise Control, 2010, pp. 1–4. [36] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, J. Jensen, and M. Guo, [15] S. Doclo and M. Moonen, “Superdirective beamforming robust against “Binaural beamforming using pre-determined relative acoustic transfer microphone mismatch,” IEEE Trans. Audio, Speech, Lang. Process.,vol. functions,” in Proc. 25th Eur. Signal Process. Conf., 2017, pp. 1–5. 15, no. 2, pp. 617–631, Feb. 2007. [37] J. Thiemann, M. Muller, and S. Van De Par, “A binaural hearing aid speech [16] J. Ahrens, I. Tashev, and M. Thomas, “Beamformer design using measured enhancement method maintaining spatial awareness for the user,” in Proc. microphone directivity patterns: Robustness to modelling error,” in Proc. 22nd Eur. Signal Process. Conf., 2014, pp. 321–325. Asia Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2012, pp. 1–4. [38] J. Thiemann, M. Müller, D. Marquardt, S. Doclo, and S. van de Par, [17] E. Mabande, A. Schad, and W. Kellermann, “Design of robust superdirec- “Speech enhancement for multimicrophone binaural hearing aids aiming tive beamformers as a convex optimization problem,” in Proc. IEEE Int. to preserve the spatial auditory scene,” EURASIP J. Adv. Signal Process., Conf. Acoust., Speech, Signal Process., 2009, pp. 77–80. vol. 2016, no. 1, Dec. 2016, Art. no. 12. [18] H. Barfuss, C. Huemmer, G. Lamani, A. Schwarz, and W. Kellermann, [39] H. As’ad, M. Bouchard, and H. Kamkar-Parsi, “Perceptually motivated “HRTF-based robust least-squares frequency-invariant beamforming,” in binaural beamforming with cues preservation for hearing aids,” in Proc. Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2015, pp. 1–5. IEEE Can. Conf. Elect. Comput. Eng., 2016, pp. 1–5. [19] A. Spriet, M. Moonen, and J. Wouters, “Robustness analysis of multichan- [40] A. I. Koutrouvelis, J. Jensen, M. Guo, R. C. Hendriks, and R. 
Heusdens, nel Wiener filtering and generalized sidelobe cancellation for multimicro- “Binaural speech enhancement with spatial cue preservation utilising si- phone noise reduction in hearing aid applications,” IEEE Trans. Speech multaneous masking,” in Proc. 25th Eur. Signal Process. Conf., 2017, Audio Process., vol. 13, no. 4, pp. 487–503, Jul. 2005. pp. 598–602. [20] Y. Zhao and W. Liu, “Robust fixed frequency invariant beamformer design [41] H. As’ad, M. Bouchard, and H. Kamkar-Parsi, “Binaural beamforming subject to norm-bounded errors,” IEEE Signal Process. Lett., vol. 20, no. with spatial cues preservation for hearing aids in real-life complex acoustic 2, pp. 169–172, Feb. 2013. environments,” in Proc. Asia-Pacific Signal Inf. Process. Assoc. Annu. [21] R. C. Nongpiur, “Design of minimax broadband beamformers that are ro- Summit Conf., 2017, pp. 1390–1399. bust to microphone gain, phase and position errors,” IEEEACM Trans. [42] S. Gannot, D. Burshtein, and E. Weinstein, “Signal enhancement us- Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1013–1022, Jun. ing beamforming and nonstationarity with applications to speech,” IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614–1626, Aug. 2001. [22] W.-C. Liao, Z.-Q. Luo, I. Merks, and T. Zhang, “An effective low com- [43] T. Lotter and P. Vary, “Dual-channel speech enhancement by superdirec- plexity binaural beamforming algorithm for hearing aids,” in Proc. IEEE tive beamforming,” EURASIP J. Adv. Signal Process., vol. 2006, no. 1, Workshop Appl. Signal Process. Audio Acoust., 2015, pp. 1–5. Dec. 2006, Art. no. 063297. AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1563 [44] N. Göβling, D. Marquardt, and S. Doclo, “Performance analysis of the Martin Bouchard received the B.Ing., M.Sc.A., and extended binaural MVDR beamformer with partial noise estimation in a Ph.D. degrees in electrical engineering from the Uni- homogeneous noise field,” in Proc. 
Hands-Free Speech Commun. Micro- versité de Sherbrooke, Sherbrooke, QC, Canada, in phone Arrays, 2017, pp. 1–5. 1993, 1995, and 1997, respectively. In January 1998, [45] D. Marquardt and S. Doclo, “Interaural coherence preservation for binaural he joined the School of Electrical Engineering and noise reduction using partial noise estimation and spectral postfiltering,” Computer Science, Faculty of Engineering, Univer- IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 7, pp. 1261– sity of Ottawa, Ottawa, ON, Canada, where he is cur- 1274, Jul. 2018. rently a Professor. In 1996, he co-founded SoftdB [46] A. Tikhonov et al., Numerical Methods for the Solution of Ill-Posed Prob- Inc., Quebec City, QC, which is still active today. lems, vol. 328. New York, NY, USA: Springer, 2013. Over the years, he has conducted research activities [47] H. Cox, “Resolving power and sensitivity to mismatch of optimum array and consulting activities with more than 20 private processors,” J. Acoust. Soc. Amer., vol. 54, no. 3, pp. 771–785, Sep. 1973. sector and governmental partners, supervised more than 50 graduate students and [48] S. M. Golan, S. Gannot, and I. Cohen, “A reduced bandwidth binaural postdoctoral fellows, and authored or coauthored more than 40 journal papers MVDR beamformer,” in Proc. Int. Workshop Acoust. Echo Noise Control, and 85 conference papers. His current research interests include signal process- Tel-Aviv, Israel, 2010, pp. 1–4. ing methods in general and machine learning, with an emphasis on speech, audio, [49] S. Rickard, “Sparse sources are separated sources,” in Proc. 14th Eur. acoustics, hearing aids, and biomedical engineering applications. He served as a Signal Proc. Conf., 2006, pp. 1–5. member of the Speech and Language Technical Committee of the IEEE Signal [50] D. R. Begault and L. J. 
Trejo, “Overview of spatial hearing Part I: Azimuth Processing Society from 2009 to 2011, as an Associate Editor for the EURASIP and elevation perception,” in 3-D Sound for Virtual Reality and Multime- Journal on Audio, Speech and Music Processing from 2006 to 2011, and as dia, vol. 955. Boston, MA, USA: AP Professional, 2000, pp. 31–65. an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2008 to 2009. He is a member of the Ordre des Ingénieurs du Québec. Homayoun Kamkar-Parsi received the B.A.Sc., M.A.Sc., and Ph.D. degrees in electrical engineering from the School of Information Technology and Engi- neering, University of Ottawa, Ottawa, ON, Canada, Hala As’ad received the M.A.Sc. degree in electrical in 2001, 2004, and 2009, respectively. During his engineering with a specialization in audio and speech undergraduate studies, he has obtained the highest processing in 2015 from the University of Ottawa, standing in his graduating class in Electrical Engi- Ottawa, ON, Canada, where she is currently work- neering and the Silver medal for the second Highest ing toward the Ph.D. degree in electrical and com- standing in entire Faculty of Engineering. His gradu- puter engineering. Her doctoral research focuses on ate scholarships included the Natural Sciences and robust binaural beamforming, binaural cues preser- Engineering Research Council scholarship and the vation, and source direction of arrival detection in Ontario Graduate Scholarship. Since 2009, he has been with Siemens Audi- hearing aids. 
Her research interests include applied ologische Technik GmbH (renamed as Sivantos GmbH in 2015 and as WS signal processing and machine learning with an em- Audiology in 2019), Erlangen, Germany, where his main work and research in- phasis on audio and speech processing, array signal clude speech/audio signal processing with applications in speech enhancement, processing, beamforming, speech enhancement, acoustic source localization, advanced multi-microphone beamforming for binaural hearing aids including and hearing aids. She is the recipient of the Natural Sciences and Engineering remote external microphones (e.g., from smartphone), source localization and Research Council Scholarship, the University of Ottawa Excellence Scholar- tracking, advanced scene analysis and machine learning (neural networks). In ship, and the Ontario Graduate Scholarship. 2018, he was selected as one of the top inventors at Sivantos. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Electrical Engineering and Systems Science arXiv (Cornell University)
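As a rough illustration of the kind of constrained design discussed in the evaluation above, the following sketch computes textbook LCMV weights with two unit constraints placed a few degrees on either side of an estimated target direction. It is not the authors' Robust TLCMV: the free-field `steering_vector` helper, the 4-microphone linear array, the 5 cm spacing, the 2 kHz bin, the 5 degree offsets, the unit constraint responses `g`, and the toy correlation matrix `R` are all assumptions made for illustration, whereas the paper's design relies on HRTF-based propagation models.

```python
import numpy as np

def steering_vector(angle_deg, n_mics, spacing_m, freq_hz, c=343.0):
    """Free-field, far-field steering vector of a uniform linear array.
    Hypothetical stand-in for the HRTF-based directivity vectors of the paper."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def lcmv_weights(R, C, g):
    """Textbook LCMV solution: minimise w^H R w subject to C^H w = g,
    i.e. w = R^{-1} C (C^H R^{-1} C)^{-1} g."""
    Rinv_C = np.linalg.solve(R, C)  # R^{-1} C without forming the inverse
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

# Assumed toy setup (not from the paper).
n_mics, spacing_m, freq_hz = 4, 0.05, 2000.0
est_doa_deg = 0.0

# Two constraint directions around the estimated target DOA (assumed offsets).
C = np.column_stack([
    steering_vector(est_doa_deg + offset, n_mics, spacing_m, freq_hz)
    for offset in (-5.0, 5.0)
])
g = np.ones(2, dtype=complex)  # assumed unit response at both constraint directions

# Toy noisy-signal correlation matrix: one interferer at 90 degrees plus white noise.
a_int = steering_vector(90.0, n_mics, spacing_m, freq_hz)
R = np.outer(a_int, a_int.conj()) + 0.1 * np.eye(n_mics)

w = lcmv_weights(R, C, g)
constraint_response = C.conj().T @ w  # equals g up to rounding error
target_gain = abs(w.conj() @ steering_vector(est_doa_deg, n_mics, spacing_m, freq_hz))
interferer_gain = abs(w.conj() @ a_int)
print("constraint responses:", np.round(constraint_response, 6))
print("gain at estimated DOA:", target_gain, "gain at interferer:", interferer_gain)
```

Each constraint uses up one degree of freedom of the array, so with only a few microphones every added constraint leaves less freedom for suppressing the remaining noise.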

A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids


References (50)

ISSN
2329-9290
eISSN
ARCH-3348
DOI
10.1109/TASLP.2019.2924321

Abstract

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 1549

A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids

Hala As’ad, Martin Bouchard, and Homayoun Kamkar-Parsi

Abstract—In this paper, a binaural beamforming algorithm for hearing aid applications is introduced. The beamforming algorithm is designed to be robust to some error in the estimate of the target speaker direction. The algorithm has two main components: a robust target linearly constrained minimum variance (TLCMV) algorithm based on imposing two constraints around the estimated direction of the target signal, and a post-processor to help with the preservation of binaural cues. The robust TLCMV provides a good level of noise reduction and low level of target distortion under realistic conditions. The post-processor enhances the beamformer abilities to preserve the binaural cues for both diffuse-like background noise and directional interferers (competing speakers), while keeping a good level of noise reduction. The introduced algorithm does not require knowledge or estimation of the directional interferers’ directions nor the second-order statistics of noise-only components. The introduced algorithm requires an estimate of the target speaker direction, but it is designed to be robust to some deviation from the estimated direction. Compared with recently proposed state-of-the-art methods, comprehensive evaluations are performed under complex realistic acoustic scenarios generated in both anechoic and mildly reverberant environments, considering a mismatch between estimated and true sources direction of arrival. Mismatch between the anechoic propagation models used for the design of the beamformers and the mildly reverberant propagation models used to generate the simulated directional signals is also considered. The results illustrate the robustness of the proposed algorithm to such mismatches.

Index Terms—Robust LCMV, propagation model mismatch, steering vector mismatch, binaural cues preservations, noise reduction, binaural hearing aids.

Manuscript received December 19, 2018; revised April 24, 2019 and June 16, 2019; accepted June 17, 2019. Date of publication June 21, 2019; date of current version July 1, 2019. This work was supported in part by a Natural Sciences and Engineering Research Council Discovery grant. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Simon Doclo. (Corresponding author: Martin Bouchard.)
H. As’ad and M. Bouchard are with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: hasad056@uottawa.ca; martin.bouchard@uottawa.ca).
H. Kamkar-Parsi is with WS Audiology 91058, Erlangen, Germany (e-mail: homayoun.kamkarparsi@sivantos.com).
Digital Object Identifier 10.1109/TASLP.2019.2924321
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

I. INTRODUCTION

Hearing aid is a common and effective solution to sensorineural hearing loss. Despite enormous advances in hearing aid technology, the performance of hearing aids under noisy environments remains one of the most common complaints from hearing aid users [1], [2], and hearing-impaired people face challenges in understanding and separating speech in noisy environments [1]–[3].

For noise reduction, single channel processing algorithms, which rely on frequency and temporal information of the input signals, have been extensively researched such as in [4], [5]. However, single channel algorithms suffer from several limitations under low-SNR acoustic scenarios, especially for non-stationary noise and multi-talkers conditions. Single channel solutions typically also introduce distortion and do not provide true speech intelligibility improvement. A notable exception is the solution in [6] which has been found to improve speech intelligibility. The solution in [6] is based on deep neural networks and a binary masking of some speech components in the T-F domain. This solution, however, does not preserve naturalness of the target speaker speech (high distortion), which is a concern for its use in hearing aids. It has also not been developed for the case of one or two competing talkers.

As an alternative, microphone array processing (beamforming) has been widely used in modern hearing aids, leading to directionally sensitive hearing aids [7]. Binaural hearing aids have also recently been introduced in the market. Binaural hearing aids have a hearing aid device at each ear, each possibly equipped with multiple microphones, and the devices are capable to transmit signals or information from one side to the other through a “binaural wireless link”. Microphone arrays can provide good noise reduction with low distortion, and the use of additional microphones and different microphone geometry in binaural hearing aids can lead to further improvements in the directional response, compared to monaural single-sided beamforming. However, even binaural hearing aids have still not achieved the required robustness in case of real-life complex environments [8]. The performance of binaural beamformers can be significantly affected by a mismatch or an error between the target source propagation model assumed for the beamformer design and the actual physical target source propagation [9], [10]. This includes errors in the estimated target direction of arrival (DOA) used in the beamformer algorithms, i.e., target DOA mismatch. This kind of mismatch can be generated from imperfect target DOA estimation schemes, from small head movements of the hearing aid user, and from multipath propagation. To address this problem, several acoustic beamforming methods robust to the mismatch in target propagation models have been introduced in the literature [11]–[21], and some of these solutions are not specifically for binaural hearing aids. Unfortunately, most of the previous work rely on sophisticated Voice Activity Detection (VAD), speech presence probability estimation, and/or SNR estimation. These can become difficult to measure in complicated multi-talker reverberant environments, with speakers having variable activity patterns. An interesting solution for hearing aids based on inequality constrained optimization has been proposed in [22] and discussed in [23], to increase the robustness to target DOA mismatch. However, since this design uses extra constraints for directional sources to increase robustness to DOA mismatch, this can lead to low degrees of freedom available for residual noise reduction (e.g., low number of adaptive “nulls”) in case of limited number of available microphones signals. In addition, it requires an estimation of the DOA for the directional interferer sources.

All the beamforming designs in [11]–[21] were not designed to preserve the binaural cues of the residual directional interferers and diffuse-like noise in the binaural output signals. Several binaural beamforming solutions have been introduced to preserve some of the binaural cues of these components, while also preserving the target signal and achieving a good noise reduction level. Under some assumptions (e.g., accurate direction of arrival estimates), binaural beamforming processing such as the second and third methods in [24] can provide directional noise reduction and preserve the binaural cues of the target signal and the directional interferers, depending on the number of available microphones. However, this binaural beamforming is not designed to preserve the binaural cues of the diffuse-like background noise. The Multichannel Wiener Filter (MWF) is the basis of several proposed solutions that aim to preserve the binaural cues. Extensions of the MWF have been proposed in [25]–[29] as attempts to preserve the binaural cues for the different acoustic scene components. A potential challenge for the MWF and its extensions is the need for an accurate estimate of the second order statistics for the noise-only components, which can be difficult to achieve in complex acoustic environments, for example multiple talkers with time-varying activity patterns and statistics. Detailed information of the MWF and its extensions can be found in [30].

The Binaural Linearly Constrained Minimum Variance method (BLCMV) has been introduced in [31] and a comprehensive theoretical analysis has been provided in [32]. The BLCMV is capable to provide a good trade-off between noise reduction and cues preservations for a limited number of interferer sources. As an attempt to enhance the noise reduction abilities of the BLCMV, an optimal BLCMV has also been proposed in [33]. However, the optimal BLCMV is capable to preserve the binaural cues for just one directional interferer as well as the target source. As another variation of the BLCMV, joint BLCMV, which jointly estimates the left and right beamformers of two hearing aids, has been introduced in [34] in order to enhance the binaural cues preservations abilities of the BLCMV. The joint BLCMV needs one constraint per interferer to preserve the binaural cues of the interferers, unlike the BLCMV which uses two constraints to preserve each interferer. However, since a limited number of microphones are available in binaural hearing aids, the joint BLCMV can still face a degradation of performance when the number of sources increases. A relaxed version of the joint BLCMV has been proposed in [35]. In this relaxed BLCMV, tunable parameters have been used for each directional interferer, in order to separately control the trade-off between the binaural cues preservation and the noise reduction for each interferer. The BLCMV and all its extensions, i.e., [31]–[35], require knowledge of the propagation models for the directional interferers in addition to the target source (directivity vectors, steering vectors, Relative Acoustic Transfer Functions (RATF)). As will be shown in this paper, this can limit the performance of these approaches, as they suffer from errors in the estimated propagation models. In addition, the BLCMV and its variations do not have the ability to preserve the binaural cues of the diffuse-like background noise. As an attempt to design a BLCMV beamformer that does not depend on the propagation models (and directions of arrival) of the directional sources, a set of pre-determined RATFs distributed around the head have been used for beamforming design in [36]. Each RATF is responsible for preserving the binaural cues of the directional sources coming from certain directions. Increasing the number of pre-determined RATFs decreases the effect of the mismatch between the true and the pre-determined RATFs, but it also requires a larger number of microphones in order to achieve a good performance. However, in hearing aids applications, only a small number of microphones are normally available for the binaural beamformer.

In order to preserve the binaural cues for directional interferers and diffuse-like noise components without a knowledge of the propagation model of the directional interferers, a binary decision/classifier algorithm common to the left and right beamformer outputs for each time-frequency (T-F) bin was proposed in [37], [38]. A challenge for this classification algorithm is its applicability in low input SNR environments, as most T-F bins can be classified as noise-dominant, resulting in low SNR improvement and an attenuated target output, as illustrated in [39]. As an attempt to enhance the performance of this method, the classification mechanism was later modified to use the output SNR instead of the input SNR [40]. However, this method requires an estimation of the second order statistics of the noise and the target components, which, as previously described, can be challenging in some real-life time-varying multi-talker environments. In our recent work [41], an algorithm based on classification and mixing of binaural signals at each T-F bin was introduced. Three classification criteria were proposed, based on the power, power difference, and complex coherence computed from: 1) binaural beamformer output signals (with good level of noise reduction) and 2), original binaural noisy signals (or alternatively, other binaural signals with cues preserved but with an intermediate level of noise reduction [42], [43]). The complex coherence criterion provided better noise reduction over the other classification criteria.

In this work, we contribute in 1) designing a binaural beamformer which is robust to mismatch in target propagation models, 2) proposing a modified post-processor method preserving the binaural cues of all acoustic scene components (target, diffuse-like background, directional interferers), with a good tradeoff between noise reduction and cues preservations. For the first contribution, we introduce the Robust TLCMV which is robust to a mismatch (error) in the directivity vector assumed for the target signal. This is achieved by designing a binaural beamformer with a wider beam around the estimated target direction. For the second contribution, the binaural cues preservation are achieved by using a simplified and improved version of the coherence-based post-processor method in [41], for classification and mixing of binaural signals. Both the proposed Robust TLCMV and the post-processor do not rely on any assumption for the propagation model (or DOAs) of the interferers (competing speakers). The proposed TLCMV with post-processor is also found to be robust to both target DOA mismatch and mismatch between the anechoic propagation model used for the beamformer design and the mildly reverberant propagation models used to generate the directional signals in the simulations. The proposed solution does not rely on target VAD detection, speech probability presence estimation, or SNR estimation, which can be difficult to compute in complex real-life time-varying multi-talker environments.

the binaural wireless links:

$$y_m(f,t) = x_{\mathrm{in},m}(f,t) + v_{\mathrm{in},m}(f,t) + n_{\mathrm{in},m}(f,t) \qquad (1)$$

where $m$ is the microphone index, and

$$m = \underbrace{1, \ldots, M/2}_{\text{left side}},\ \underbrace{M/2+1, \ldots, M}_{\text{right side}}.$$

The front left (FL) microphone has index $m = 1$, and the front right (FR) microphone has index $m = M/2+1$. These microphones are the reference microphone for the left-side beamformer and the right-side beamformer, respectively. $x_{\mathrm{in},m}$, $v_{\mathrm{in},m}$, and $n_{\mathrm{in},m}$ at the $m$th microphone are the target speaker, the sum of directional interferer speakers, and the diffuse-like background noise components, respectively. $f$ is the frequency index and $t$ is the time (frame) index.

By stacking the input microphone signals in $M$ dimensional vectors, the input signals from the left and right microphones can be written as in (2):
This is achieved by designing a binaural beamformer with a wider beam around the estimated target direction. For the where m is the microphone index, and second contribution, the binaural cues preservation are achieved by using a simplified and improved version of the coherence- m =1, ..., M/2,M/2+1, ..., M . based post-processor method in [41], for classification and mix- left side right side ing of binaural signals. Both the proposed Robust TLCMV and The front left (FL) microphone has index m =1, and the the post-processor do not rely on any assumption for the prop- front right (FR) microphone has index m = M/2+1. These agation model (or DOAs) of the interferers (competing speak- microphones are the reference microphone for the left- ers). The proposed TLCMV with post-processor is also found side beamformer and the right-side beamformer, respectively. to be robust to both target DOA mismatch and mismatch be- x ,v , and n at the mth microphone are the tar- tween the anechoic propagation model used for the beamformer in,m in,m in,m get speaker, the sum of directional interferer speakers, and the design and the mildly reverberant propagation models used to diffuse-like background noise components, respectively. f is the generate the directional signals in the simulations. The proposed frequency index and t is the time (frame) index. solution does not rely on target VAD detection, speech proba- By stacking the input microphone signals in M dimensional bility presence estimation, or SNR estimation, which can be dif- vectors, the input signals from the left and right microphones ficult to compute in complex real-life time-varying multi-talker can be written as in (2): environments. 
In order to study the robustness of the proposed algorithm y(f, t)= x(f, t)+ v(f, t)+ n(f, t) (2) to different types of mismatches, comprehensive validations are conducted through simulations using acoustic scenarios gener- where, y(f, t)=[y (f, t),y (f, t), ..., y (f, t)] 1 2 M ated in mildly reverberant environments and anechoic environ- ments (i.e., with and without mismatch in the sets of directiv- x(f, t)=[x (f, t),x (f, t), ..., x (f, t)] in,1 in,2 in,M ity/steering vectors), and for scenarios with and without DOA v(f, t)=[v (f, t),v (f, t), ..., v (f, t)] in,1 in,2 in,M mismatch. Comparisons are performed with the recently pro- posed state-of-the-art Binaural Minimum Variance Distortion- T n(f, t)=[n (f, t),n (f, t), ..., n (f, t)] . in,1 in,2 in,M less Response (BMVDR) beamformer, the BMVDR with partial noise estimation (BMVDR-n) beamformer [29], [44], [45] and Assuming that s is a target source signal coming from angle with the BLCMV beamformers which uses constraints to atten- θ , and s is the ith interferer source signal (competing talker) x vi uate interferers. coming from angle θ , the target component and the sum of vi This paper is organized as the following. Section II provides the directional interferer components at the microphones can be a detailed description of the system notations and the beam- written in terms of the directivity vectors d(f, θ) as in (3) and forming microphone configurations that are used throughout (4), respectively: this paper. Section III provides a summary of the previously x(f, t)= d(f, θ )s (f, t) (3) x x proposed BLCMV, BMVDR and BMVDR with partial noise estimation algorithms. Section IV provides some detailed in- N formation about the new beamforming algorithm and the post- v(f, t)= d(f, θ )s (f, t). (4) vi vi processing algorithm proposed in this work. Section V explains i=1 the performance metrics used in this work. 
Section VI explains The vector d(f, θ )=[d (f, θ ), ..., d (f, θ )] is the tar- x 1 x M x the experimental setup. Finally, Section VII provides the simu- get directivity vector, which is the frequency response between lation results of the proposed algorithms, and performance com- the target source and each microphone. Likewise, the vector parisons with the state-of-the-art algorithms. d(f, θ )=[d (f, θ ), ..., d (f, θ )] is the interference di- vi 1 vi M vi rectivity vector for source s , which is the frequency response vi between the interference source s and each microphone. N is vi II. SYSTEM NOTATIONS AND REFERENCE the number of directional interferers in the acoustic scenario con- BEAMFORMING PROCESS sidered. In hearing aids, the directivity vectors include the head A. System Notations shadow effect and other head/ear related effects (e.g., pinnae filtering), therefore Head-Related Transfer Functions (HRTFs) Binaural hearing aid units with two microphone arrays of are used for the directivity vectors in the beamformer designs. M/2 microphones at each ear, i.e., M microphones in total, and The input target signal at the reference microphone x (f, t) ref ideal binaural wireless links between the units (no jitter, delay, can be defined as in (5): packet loss, etc.) are considered. A Short Time Fourier Trans- form (STFT) is used in order to represent the input signals in x (f, t)= d (f, θ )s (f, t). (5) ref ref x x the Time-Frequency (T-F) domain. The input noisy microphone signals in the T-F domain can be written as in (1), with the micro- If the reference microphone is the FL, then x (f, t)= ref phone signals transmitted from one side to the other side through x (f, t). If the reference microphone is the FR, then in,1 1552 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 (10) and (11), to generate the left and right beamformer outputs z (f, t) and z (f, t), respectively. 
The binaural beamformer on l r the left side aims to extract the target signal as received at the FL microphone (i.e., using the FL microphone as a reference microphone). The binaural beamformer on the right side aims to extract the target signal as received at the FR microphone (i.e., using the FR microphone as a reference microphone). z (f, t)= w (f, t)y(f, t) (10) l l Fig. 1. 2 + 2 microphone configuration (dotted lines represent signals trans- mitted through a wireless link). z (f, t)= w (f, t)y(f, t) (11) x (f, t)= x (f, t). Likewise, d (f, θ ) is the tar- ref in,M/2+1 ref x III. REVIEW OF PREVIOUS BINAURAL BEAMFORMING get directivity vector (or HRTF) at the reference micro- ALGORITHMS phone, with d (f, θ )= d (f, θ ) if FL, and d (f, θ )= ref x 1 x ref x In this section, we review the BLCMV, the BMVDR (which d (f, θ ) if FR. M/2+1 x is a special case of the BLCMV) and the BMVDR extension A correlation matrix for the target component can be defined with partial noise estimation (BMVDR-n). as in (6): R (f)= E{x(f, t)x (f, t)} A. Binaural LCMV (BLCMV) H ∗ = E{d(f, θ )s (f, t)d (f, θ )s (f, t)} (6) x x x The BLCMV [31], [32] is a general form of the BMVDR [24], H 2 = d(f, θ )d (f, θ )E{|s (f, t)| }. x x x where both of these beamformers are based on the constrained minimization of the beamformer output power. However, the The superscript H refers to “Hermitian” which is the complex BLCMV is derived under multiple linear constraints, including conjugate transpose, the superscript “∗” refers to the complex a unity gain constraint in the target signal direction, which is also conjugate, and E{.} refers to the expectation operator. Similarly, used in the BMVDR. In the BLCMV, having multiple constraints the correlation matrix of the sum of directional interferer com- means that small gains are specified in directions corresponding ponents can be defined as in (7) and (8). Directional interferers to interferer sources. 
The left and right beamformer coefficients are assumed to be uncorrelated with each other. can be derived by the following constrained minimizations in R (f) (12) and (13), respectively. For simplicity, the f and t index are omitted here. = E{v(f, t)v (f, t)} ⎧ ⎫ H H min w (R )w subject to C w = g (12) N N l ⎨ ⎬ y l l l = E d(f, θ )s (f, t) d(f, θ )s (f, t) vi vi vi vi H H ⎩ ⎭ min w (R )w subject to C w = g (13) i=1 i=1 y r r r The constraint matrix C includes the directivity vectors H 2 = d(f, θ )d (f, θ )E{|s (f, t)| } (7) vi vi vi (HRTFs) of each constraint direction, i.e., C =[d(f, θ ), d i=1 (f, θ ), ..., d(f, θ )]. The left gain vector is g =[ςd (f, θ ), v1 vk l 1,l x The correlation matrix of the diffuse-like background noise ηd (f, θ ), ..., ηd (f, θ )] and the right gain vector is 1,l v1 1,l vk component is defined as in (8): g =[ςd (f, θ ),ηd (f, θ ), ..., ηd (f, r M/2+1,r x M/2+1,r v1 M/2+1,r θ )] . The scalars ς and η should be in the range between 0 and vk R (f)= E{n(f, t)n (f, t)}. (8) 1. In order to guarantee the near distortionless response of the Assuming that the target component, the sum of directional target, ς should be close to 1. The value of η controls the noise interferer components, and the diffuse-like background noise reduction level. The number of constraints k available for the component are uncorrelated, the correlation matrix of the input interferers depends on the number of available microphones, noisy signals can be written as in (9): such that k ≤ M − 2. In this work, as the 2 + 2 microphone configuration is used, k ≤ 2. In other words, assuming no DOA R (f)= R (f)+ R (f)+ R (f). (9) y x v n mismatch, the BLCMV [13], [14] can preserve the binaural cues for only two directional interferers when the 2 + 2 microphone B. Beamformer Microphone Configuration configuration is used. 
In this work, a binaural hearing aid with two microphones Using the complex Lagrangian multiplier method to solve the on each side of the head is used, as illustrated in Fig. 1. We constrained optimization problems in (12) and (13), the left and take advantage of the availability of two bidirectional binaural right binaural beamformer coefficients w and w are as in (14) l r wireless links to transmit two microphone signals from each and (15), respectively: side to the other side. Thus, the beamformer on each side has −1 H −1 −1 direct access to four microphone signals. We will refer to this w = R C(C R C) g (14) l y y l design as the 2 + 2 microphone configuration. The binaural −1 H −1 −1 w = R C(C R C) g . (15) r y y r beamformers are used to process the input noisy signals as in AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1553 Note that some level of diagonal loading may be required IV. THE PROPOSED BEAMFORMING ALGORITHM in practice, to regularize the matrix inversions [46]. Different In this section, a binaural TLCMV robust to target DOA mis- options for the choice of correlation matrices have previously match is first introduced, which does not require estimates of been introduced for the BLCMV [32]. In (14) and (15), the sim- the interferers’ DOAs or propagation models. It should be noted plest option from [32] which uses the noisy microphone signals that in some previous work such as [31]–[36], the name BLCMV correlation matrix R is considered. By using the noisy mi- is used for beamformers that use multiple constraints in order crophone signals correlation matrix R , there is no need for a to attenuate directional interferers (in addition to the constraint sophisticated target voice activity detector (VAD) to estimate to preserve the target). 
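As an illustration of the closed-form solution in (14) and (15), the following minimal sketch computes LCMV coefficients for one frequency bin using only the Python standard library. It is not the authors' implementation: the helper names (`hermitian`, `matmul`, `solve`, `lcmv_weights`) and the `loading` parameter are hypothetical, and the diagonal loading is simply added to the diagonal of the correlation matrix as one possible form of the regularization mentioned in the text. The gain vector is used exactly as passed in, since the conjugation convention for its entries cannot be recovered from the text.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))]
            for c in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(complex, A[i])) + list(map(complex, B[i]))
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def lcmv_weights(R_y, C, g, loading=0.0):
    """w = R^{-1} C (C^H R^{-1} C)^{-1} g, as in (14)/(15).

    R_y: M x M correlation matrix, C: M x K constraint matrix (one column
    per constraint direction), g: length-K gain vector.
    `loading` is a hypothetical diagonal-loading amount.
    """
    M = len(R_y)
    R = [[R_y[i][j] + (loading if i == j else 0.0) for j in range(M)]
         for i in range(M)]
    X = solve(R, C)                    # R^{-1} C, size M x K
    S = matmul(hermitian(C), X)        # C^H R^{-1} C, size K x K
    a = solve(S, [[gi] for gi in g])   # (C^H R^{-1} C)^{-1} g
    return [row[0] for row in matmul(X, a)]
```

By construction the returned coefficients satisfy the constraint $\mathbf{C}^{H}\mathbf{w} = \mathbf{g}$ of (12) and (13) up to numerical precision, which gives a simple sanity check for any chosen correlation matrix.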
However, in this work, we use the name the noise components correlation matrices R and R .The two v n TLCMV (Target-LCMV) for beamformers that use more than other suggested options in [32] are using either the overall noise one (normally two) constraints for the target, and no constraint components correlation matrix (R + R ) or the background v n for the interferers. The proposed Robust TLCMV requires an es- diffuse-like noise correlation matrix R .Using R + R or n v n timate of the target DOA, but the true target DOA can be within R in the beamformer coefficients computation increases the +10 degrees of the estimated target DOA, as will be shown robustness to mismatch between the estimated target directivity through experiments. This is a realistic condition, however the vector and the actual target directivity vector [47], because us- actual estimation of the target DOA is not considered in this ing the noise components correlation matrix (either R + R v n paper. or R ) in the minimization criteria of (12) and (13) does not Since the Robust TLCMV distorts the binaural cues for the lead to target components minimization. At the opposite, dis- directional interferers and the background diffuse noise, a post- tortion/attenuation of the target component in the beamformer processor which does not require directivity vectors informa- output signal can occur in the presence of mismatch if R is tion is also proposed in this section, to provide a good level of includes the target component). used (since R binaural cues preservation while providing good overall noise However, estimating R is often a difficult task in non- reduction. stationary multiple talkers conditions. And even though R can be more easily estimated, for beamformers that do not rely on A. 
The Proposed Robust TLCMV Beamforming Algorithm constraints at interferer directions to reduce the interferers (such as the BMVDR or our proposed method, as we will explain later), Aiming to design a binaural beamformer that provides little using R leads to a solution that is not capable of significantly suppression for sources from angles within a small angular re- reducing the interferers. On the other hand, using R in a beam- gion around the estimated target direction, the Robust TLCMV is former such as the BLCMV [31], [32] can be sufficient as long introduced. Two constraints with unity gains are used in the mid- as there are constraints in the interferers directions, since the re- dle of each side of a target zone, which consists of +10 degrees duction of interferers is then determined by the value of a small around the estimated target DOA. For example, if the estimated constraint gain η. target direction is at 0 degree, the beamformer assumes that the target can be anywhere between −10 to 10 degrees, and two unity constraints are used at +5 degrees, in the middle of each B. Binaural MVDR (BMVDR) and Its BMVDR-n Extension side in the estimated target zone. The constraints of the Robust The BMVDR [48] is a special case of the BLCMV, with a TLCMV are as described in (16) and (17), with the beamformer single constraint in the estimated target direction. Therefore, the coefficients computed as in (14) and (15): constraint matrix C can be reduced to d(f, θ ), and the gain vec- C w = g C =[d(f, θ +Δ), d(f, θ − Δ)] l l x x tors g and g can be reduced to d (f, θ ) and d (f, θ ), l r 1,l x x M/2+1,r (16) respectively. The BMVDR preserves the binaural cues of the tar- g =[d (f, θ +Δ),d (f, θ − Δ)] l 1,l x 1,l x get in case of no target DOA mismatch; however, it distorts the C w = g C =[d(f, θ +Δ), d(f, θ − Δ)] r r x x binaural cues for the directional interferers and the background (17) noise. 
As an attempt to enhance the binaural cues preservation g =[d (f, θ +Δ),d (f, θ − Δ)] r M/2+1,r x M/2+1,r x ability of the BMVDR for the noise components, a small portion of the original noisy signal can be added to the BMVDR, lead- where θ ± Δ are the directions of unity constraints in the mid- ing to the BMVDR with partial noise estimation (BMVDR-n). dle of the assumed target zone. The gain values used in g and The idea of adding a small portion of the original noisy signal g ensure that the beamformer output for a source from DOAs to the processed output was introduced in [29]. More details θ +Δ and θ − Δ has the same level as the one found at the x x of the BMVDR and the BMVDR-n can be found in [44], [45]. input reference microphone for that same source, which we will Many extensions to the BMVDR beamformer were previously refer to as a “unit gain” (i.e., the gain is relative to the input ref- introduced in the literature, such as the work in [24]. However, erence microphone level). Using two unity constraints around in this work we will compare our proposed algorithm with the the estimated target direction forces the beamformer to have BMVDR-n, because of its ability to preserve the binaural cues a wider beam in the direction of the target. Figs. 2 and 3 illus- for both the directional interferers and the diffuse-like back- trate beampatterns of a fixed BMVDR beamformer with a single ground noise. For a fair comparison of our proposed algorithm constraint at 0 degree under 2-D (cylindrically isotropic) diffuse with the BMVDR and BMVDR-n algorithms, the noisy corre- noise conditions and the beampatterns of a fixed Robust TLCMV lation matrix R will be used as for the BLCMV. beamformer with constraints at +5 and –5 degrees under 2-D y 1554 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 Fig. 2. Beampatterns of BMVDR and Robust TLCMV at different frequen- cies, shown for left side. Fig. 4. 
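The two-constraint design in (16) and (17) can be sketched as a small helper that builds the constraint matrix and gain vector for one side. This is only an illustration under stated assumptions: the paper uses measured HRTFs as directivity vectors, whereas `toy_steering` below is a hypothetical free-field model for a 4-microphone line array, and the function names, microphone positions and default frequency are all invented for the example. Whether the gain entries should be conjugated depends on a convention that is not recoverable from the text, so the reference-microphone entries are used as-is.

```python
import cmath
import math


def toy_steering(theta_deg, freq_hz=2000.0,
                 positions_m=(-0.085, -0.075, 0.075, 0.085), c=343.0):
    """Free-field steering vector for a hypothetical 4-mic line array.

    This stands in for an HRTF-based directivity vector d(f, theta);
    it does not model head shadow or pinna effects.
    """
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * freq_hz * x * s / c)
            for x in positions_m]


def robust_tlcmv_constraints(theta_x_deg, delta_deg, ref_index,
                             steer=toy_steering):
    """Return (C, g) with two unity constraints at theta_x +/- delta.

    ref_index is the zero-based index of the reference microphone:
    0 for the front-left (m = 1), M/2 for the front-right (m = M/2 + 1).
    """
    d_plus = steer(theta_x_deg + delta_deg)
    d_minus = steer(theta_x_deg - delta_deg)
    C = [[p, m] for p, m in zip(d_plus, d_minus)]   # M x 2 matrix
    g = [d_plus[ref_index], d_minus[ref_index]]     # "unit gain" entries
    return C, g


# Estimated target at 0 degrees with constraints at +5 and -5 degrees.
C_left, g_left = robust_tlcmv_constraints(0.0, 5.0, ref_index=0)
C_right, g_right = robust_tlcmv_constraints(0.0, 5.0, ref_index=2)
```

The same constraint matrix is shared by both sides; only the gain vector changes with the reference microphone, which mirrors the structure of (16) and (17).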
The Robust TLCMV with CCMBB post-processor (dotted lines rep- resent signals transmitted through a wireless link). post-processor based on time-frequency (T-F) classification and mixing of binaural signals is proposed in this section, as Fig. 4 shows. It is an updated version of our recent work [41], to provide a simpler and improved classification and mixing algorithm. Fig. 3. Beampatterns of BMVDR and Robust TLCMV at different frequen- A complex coherence is computed for classification, as it cies, shown for right side. gives the ability to exploit two classification decisions: one for the magnitude and one for the phase. We will thus refer to the post-processing algorithm as the Coherence-based Classi- diffuse noise conditions at different frequencies for the left and fication and Mixing for Binaural Beamforming (CCMBB). The right side. The beampatterns are obtained with HRTFs measured complex coherence is computed on each side, between two sig- from behind-the-ear (BTE) hearing aid units on a mannequin in nals locally available on each side. The first signal is the binaural an anechoic environment, using four microphone signals, i.e., beamformer outputs (z (f, t) or z (f, t), depending on the side), l r 2 microphones at each ear. The same HRTFs are also used to with a good level of interferers and diffuse noise reduction. The produce the 2-D diffuse noise correlation matrix required to pro- second signal is the front microphone noisy signal (y (f, t) or duce Figs. 2 and 3. The beampattern BP (θ) is computed as i y (f, t)), which fully preserves the binaural cues for all acoustic in (18): scene components. 
Alternatively, at the cost of increased com- plexity, the second signal could be a signal with an intermediate H 2 BP (θ)= |w (f)d(f, θ)| (18) level of interferers and diffuse noise reduction but with binau- ral cues still preserved, such as the output from a common gain where w is the left binaural beamformer coefficients w or i l beamforming approach (e.g., [43], without the post-processing). the right binaural beamformer coefficients w . Figs. 2 and 3 The left complex coherence C (f, t) and right com- zl,yl show that for higher frequencies the BMVDR has a narrow plex coherence C (f, t) are computed as in (19) and (20), zr,yr beam around the target direction, i.e., 0 degree. This narrow respectively: beam around the target direction indicates that the BMVDR is not robust to small target DOA mismatch. However, the Robust Γ (f, t) zl,yl TLCMV has a wider beam around the target direction over all C (f, t)=  (19) zl,yl Γ (f, t)Γ (f, t) frequency components, therefore by design it is more robust to yl,yl zl,zl target DOA mismatch. There is a trade-off between robustness Γ (f, t) zr,yr C (f, t)= (20) to target DOA mismatch and noise reduction, because the use zr,yr Γ (f, t)Γ (f, t) yr,yr zr,zr of an additional constraint for the target in the TLCMV design leads to a reduction in the degrees of freedom available for noise 2 2 where, Γ = E{|y (f, t)| }, Γ = E{|y (f, t)| }, Γ yl,yl l yr,yr r zl,zl reduction (e.g., the positioning of adaptive “nulls”). 
However, 2 2 = E{|z (f, t)| }, Γ = E{|z (f, t)| } are, respectively, l zr,zr r due to the sparsity and the disjoint properties of speech signals in auto-power spectral densities (auto-PSDs) for the front micro- practice [49], there are often only one or sometimes two domi- phone noisy signals and the binaural beamformer outputs, and nant directional interferer sources active at each time-frequency ∗ ∗ Γ = E{|z (f, t)y (f, t)|}, Γ = E{|z (f, t)y (f, t)|} zl,yl l l zr,yr r bin, and having two degrees of freedom left for the beamformer are cross-PSDs between the binaural beamformer outputs and can be sufficient for good adaptive noise reduction, as will be the front microphone noisy signals. illustrated later in this paper. For binaural cues preservation of a directional source, at low frequency components with wavelengths longer than the diam- B. Post-Processor Using Modified Coherence-Based eter of the head the interaural phase difference (IPD, defined Classification and Mixing Binaural Beamforming (CCMBB) in the next section) is more important than the interaural level The proposed Robust TLCMV of the previous section can difference (ILD, also defined in the next section) [50]. On the distort the binaural cues for the directional interferers and the other hand the ILD is more important for high frequencies with diffuse-like background noise. In order to achieve better bin- wavelength components smaller than the head diameter, i.e., for aural cues preservations for these interferers and for diffuse frequencies higher than 1500 Hz. In the proposed CCMBB, on noise components, while at the same time achieving a good each side for low frequency components (<1500 Hz) the mag- level of overall reduction for interferers and diffuse noise, a nitude of the binaural output is simply the magnitude of the AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1555 beamformer output (no mixing, no classification). 
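The coherence in (19) and (20) can be estimated in a single frequency bin by replacing the expectations with averages over a short run of frames. The sketch below is a minimal illustration, assuming the cross-PSD is the complex average of $z\,y^{*}$ (so that its phase is available for classification); the function name `complex_coherence` is hypothetical.

```python
import cmath
import math


def complex_coherence(z_frames, y_frames):
    """Complex coherence between two signals in one frequency bin.

    z_frames: beamformer output STFT values over consecutive frames.
    y_frames: front-microphone noisy STFT values over the same frames.
    Expectations are approximated by plain averages over the frames.
    """
    n = len(z_frames)
    cross = sum(z * y.conjugate() for z, y in zip(z_frames, y_frames)) / n
    p_z = sum(abs(z) ** 2 for z in z_frames) / n
    p_y = sum(abs(y) ** 2 for y in y_frames) / n
    return cross / math.sqrt(p_z * p_y)


# If z is a scaled and phase-rotated copy of y, the coherence has unit
# magnitude and its phase equals the applied rotation.
y = [1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j, 2 - 1j]
z = [0.5 * cmath.exp(0.3j) * v for v in y]
c = complex_coherence(z, y)
```

In the paper the averaging spans about 10 overlapping frames for the coherence itself and about 40 frames for the thresholds; the frame count here is simply the length of the input lists.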
This is be- noise” signal u are uncorrelated (as stated in a previous sec- ref cause the output magnitude does not play a role in preserving tion). Next, if a target distortionless response is assumed for the the phase-based IPD binaural cues of the interferers (important at beamformer, i.e., z = x , (21) becomes: x ref low frequencies), and the magnitude-based ILD is not important j(z −u ) u ref E{|x | + |z ||u |e } at low frequencies. Therefore, the magnitude of the binaural out- ref u ref C = . (22) z,y put at low frequencies keeps the emphasis on interferers/noise 2 2 2 2 E{|x | + |z | } E{|x | + |u | } ref u ref ref reduction. Similarly, in the proposed CCMBB, on each side for high frequency components (>1500 Hz) the phase of the bin- At low frequencies, a larger phase change |z −u | u ref aural output is simply the phase of the beamformer output (no between the input and output interferers/noise components is mixing, no classification). This is because the output phase does more likely to lead to distortion of interferers/noise IPD bin- not play a role in preserving the magnitude-based ILD binaural aural cues between the left and right binaural outputs, because cues of the interferers (important at high frequencies), and the such changes do not occur symmetrically in the beamformer phase-based IPD is not important at high frequencies. Therefore, on each side of a binaural system. Similarly, at high frequen- the phase of the binaural output at high frequencies keeps the cies a larger magnitude change ||z |−|u || between the input u ref emphasis on interferers/noise reduction. 
and output interferers/noise components (i.e., a larger interfer- Another type of binaural cues will be considered in this work, ers/noise reduction) is more likely to lead to distortion of inter- for the preservation of the spatial impression of background dif- ferers/noise ILD binaural cues between the left and right binau- fuse noise: the Magnitude Squared Coherence (MSC, defined ral outputs. Evaluating from (22) the impact on C of different z,y in the next section). The above processing implies that in the |z −u | phase changes and different interferers/noise re- u ref proposed CCMBB the magnitude information of binaural out- duction levels, we can then use C as a classification criterion z,y put signals is considered to be less important for preservation for the CCMBB binaural output phase at low frequencies, where of MSC at low frequencies, and that the phase information of IPD is important. Likewise, evaluating from (22) the impact on binaural output signals is considered to be less important for C of different ||z |−|u || magnitude changes and differ- z,y u ref preservation of MSC at high frequencies. ent interferers/noise reduction levels, we can then use C as a z,y Therefore, two classification and mixing systems need to be classification criterion for the CCMBB binaural output magni- developed based on the complex coherence: one for the bin- tude at high frequencies, where ILD is important. aural output signal phase at low frequencies, and one for the First, we consider the effect of the phase change |z − binaural output signal magnitude at high frequencies. To better u | for some important cases. The effect is more directly ob- ref explain the rationale for the phase and magnitude classification served on the coherence phase value |C |. From the numer- z,y performed at each T-F, a few additional equations are provided ator of (22), we see that a small coherence phase value |C | z,y below. 
These equations are not required in the actual implemen- occurs if there is a small phase change |z −u | (regard- u ref tation of the CCMBB post-processor, unlike (19), (20). For sim- less of the interferers/noise reduction level, i.e., level of |z | plicity, the left l and right r indices are dropped in these equations relative to |u | and |x |). Another case where a small co- ref ref since the same equation applies to each side, and the time (frame) herence phase value |C | occurs is when there is a large z,y and frequency indices are also dropped. As before, x repre- ref |z −u | phase change with a strong interferers/noise re- u ref sents the target component at the reference microphone, and we duction (|z | small relative to |u | and |x |). A case pro- u ref ref define u = v + n as the sum of the directional inter- ref ref ref ducing a large coherence phase value |C | is when a large z,y ferers components v and the diffuse noise components n ref ref |z −u | phase change is combined with weak interfer- u ref at the reference microphone. The corresponding components in ers/noise reduction (|z | level similar to |u | and |x | levels). u ref ref the beamformer output signal are written as z and z . There- x u Since the case with a large coherence phase value |C | z,y fore, we have y = x + u as the noisy input signal at ref ref ref mentioned above includes both weak interferers/noise reduc- the reference microphone, and z = z + z as the beamformer x u tion and increased risk of binaural IPD cues distortion (from the output, on each side and for each time and frequency bin. large |z −u | phase change), the CCMBB does not use u ref Considering z and y as zero-mean random variables and ref the beamformer output phase in such case. 
However, to avoid using the polar notation for these variables, the complex coher- losing cases with good interferers/noise reduction levels, the ence becomes as follows, where E{.} refers to an averaging CCMBB keeps the beamformer output phase for smaller val- process over consecutive frames in each frequency bin: ues of |C | (which includes some cases with good or weak z,y E{zy } amount of interferers/noise reduction, as well as large or small ref C = z,y |z −u |). The resulting set of equations for the CCMBB u ref 2 2 E{|z| } E{|y | } ref binaural output phase component at low frequencies is: j(z −x ) j(z −u ) x ref u ref E{|z ||x |e +|z ||u |e } x ref u ref (y (f, t)), |(C (f, t))| >μπ l zl,yl =   . 2 2 2 2 (z (f, t)) = (23) m,l E{|z | +|z | E{|x | + |u | } x u ref ref (z (f, t)), |(C (f, t))|≤ μπ l zl,yl (21) (y (f, t)), |(C (f, t))| >μπ r zr,yr (z (f, t)) = (24) The last part of (21) assumes that components from the target m,r (z (f, t)), |(C (f, t))|≤ μπ r zr,yr signal x and components from the “interferers plus diffuse ref 1556 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 27, NO. 10, OCTOBER 2019 The threshold value is a tunable parameter μπ (0 <μ< ILD cues distortion. To help the balance and keep the binau- 1), where a lower μ leads to lower IPD binaural cues errors ral ILD cues distortion at a reasonable level, for the alternate (and lower MSC errors), but also to lower interferers and diffuse condition with |C | higher than a threshold T the CCMBB z,y noise reduction. A value of μ =0.1 has been found to provide puts more weight on the preservation of the binaural ILD cues, satisfactory experimental results in our simulations. i.e., more weight on the magnitude of the noisy reference input Next, we consider the effect of the magnitude change signal. Essentially this simply means using a value α> 0.5 in ||z |−|u || for some important cases. The effect is more di- (25), (26). 
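The low-frequency rule in (23) and (24) amounts to a per-bin phase selection, with the output magnitude always taken from the beamformer. The following sketch shows one way to express it for a single T-F bin on one side; the function name `ccmbb_low_freq_bin` is hypothetical, and the default `mu=0.1` is the value reported as satisfactory in the text.

```python
import cmath
import math


def ccmbb_low_freq_bin(z, y, coherence, mu=0.1):
    """CCMBB output for one low-frequency T-F bin, following (23)/(24).

    z: beamformer output, y: front-microphone noisy signal,
    coherence: complex coherence between z and y for this bin.
    The magnitude is always that of the beamformer output; the phase is
    taken from y when the coherence phase exceeds mu * pi, else from z.
    """
    use_noisy_phase = abs(cmath.phase(coherence)) > mu * math.pi
    phase = cmath.phase(y) if use_noisy_phase else cmath.phase(z)
    return cmath.rect(abs(z), phase)


# Small coherence phase: the beamformer output is kept unchanged.
z = cmath.rect(0.2, 1.0)
y = cmath.rect(1.0, 0.2)
kept = ccmbb_low_freq_bin(z, y, cmath.rect(0.9, 0.1))
# Large coherence phase: beamformer magnitude with the noisy-input phase.
swapped = ccmbb_low_freq_bin(z, y, cmath.rect(0.9, 1.0))
```

A lower `mu` sends more bins to the noisy-input phase, which matches the stated trade-off of lower IPD errors at the cost of less interferer and noise reduction.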
This approach has been validated in our experiments u ref rectly observed on the coherence magnitude value |C |.From using the objective metrics presented in the next section, where z,y (22), we see that a case producing a smaller coherence magnitude it was found that a value of α =0.7 provided satisfactory exper- value |C | is when there is good interferers/noise reduction per- imental results (good overall trade-off between interferers/noise z,y formance (small |z | level relative to |u | and |x |, and there- reduction and ILD distortion). u ref ref fore large ||z |−|u ||). On the other hand, if ||z |−|u || The threshold values T (f) and T (f) in (25), (26) are com- u ref u ref l r is small (weak interferers/noise reduction, |z | level similar to puted by taking the magnitude of the complex coherences esti- |u | and |x | levels), the value of |C | depends on the mated at each frequency bin from 219 ms of signals (40 frames, ref ref z,y |z −u | phase change: if there is a large |z −u | with overlap). This is unlike the coherence functions in (19), u ref u ref phase change it leads to a smaller coherence magnitude value (20), (23)–(26), which are estimated with a shorter total time |C |, and if there is a small |z −u | phase change it of 59 ms (only 10 frames, with overlap). The total time close to z,y u ref leads to a larger coherence magnitude value |C | (closer to 200 ms was selected so that the method with a threshold could be z,y 1.0). used in future work under dynamic conditions (e.g., with head We note that unlike the low frequency classification with co- movements and dynamic sources). Using the CCMBB algorithm | considered earlier, here there is herence phase value |C as a post-processor for the proposed Robust TLCMV, we will z,y no case which has both a weak interferers/noise reduction and refer to the resulting beamforming algorithm as the “Robust TL- an increased risk of binaural cues distortion (i.e., a higher risk CMV with CCMBB”. 
of binaural ILD cues distortion from a large magnitude change ||z |−|u ||). This is because by definition ||z |−|u || is u ref u ref V. PERFORMANCE MEASUREMENT indicative at the same time of the interferers/noise reduction To evaluate the performance of the proposed algorithm and the level (a larger value of ||z |−|u || is better) and the risk of u ref state of the art BLCMV, BMVDR and BMVDR-n algorithms, binaural ILD cues distortion (a smaller value of ||z |−|u || u ref several objective metrics are used in this work. First, to measure is better). Therefore, the approach proposed for the CCMBB the ability of the binaural beamformers to preserve the binaural binaural output magnitude at high frequencies is less drastic or cues, the interaural information between the left and right side less binary than the previous approach for the CCMBB binaural signals is required. Formally, the Interaural Transfer Function output phase at low frequencies, and it involves mixing together (ITF) is defined as the ratio of a directional source component the beamformer output magnitude and the noisy reference input from the left to the right ear [30]. For simplicity, the ITF, ILD magnitude. The resulting set of equations for the binaural output and IPD metrics are developed below for the case of a single magnitude at high frequencies is (at each T-F bin): source, more specifically a single interferer source. In the case of several interferers, in this work we apply the same equations if |C (f, t)| <T (f) zl,yl l ⎪ to an equivalent interferer signal which consists of the sum of α|z (f, t)| +(1 − α)|y (f, t)| l l all interferer signals. All the performance measurements in this |z (f, t)| = (25) m,l ⎪ section are frequency dependent metrics; however, the frequency if |C (f, t)|≥ T (f) zl,yl l ⎪ index f is omitted for simplicity. 
The input ITF for an interferer (1 − α)|z (f, t)| + α|y (f, t)| l l component can be computed as in (27), where Γ (vref,r),(vref,l) is the cross-PSD between the interferer component at the front if |C (f, t)| <T (f) zr,yr r left and front right reference microphones, and Γ (vref,l),(vref,l) α|z (f, t)| +(1 − α)|y (f, t)| r r is the auto-PSD of the interferer component at the front left |z (f, t)| = . (26) m,r reference microphone: if |C (f, t)|≥ T (f) zr,yr r (vref,r),(vref,l) (1 − α)|z (f, t)| + α|y (f, t)| r r ITF = . (27) in,v (vref,l),(vref,l) The mixing parameter α (0 ≤ α ≤ 1) affects the trade-off Similarly, the ITF between the left and right beamformer out- between the level of interferers/noise reduction and the preser- puts can be described by (28): vation of the binaural ILD cues. As described in an earlier para- graph, the case with a good level of interferers/noise reduction (zv,r),(zv,l) ITF = (28) occurs for a smaller value of |C |, and to preserve this case the out,v z,y (zv,l),(zv,l) CCMBB selects the condition with |C | lower than a threshold z,y T as the condition which puts more weight on interferers/noise where z is the interferer component in the beamformer output reduction, i.e., more weight on the magnitude of the beamformer signals. The errors (or losses) in the Interaural Level Difference output. This is at the expense of increasing the risk of binaural (ILD) and Interaural Phase Difference (IPD) binaural cues are AS’AD et al.: ROBUST TLCMV BEAMFORMER WITH SPATIAL CUES PRESERVATION FOR BINAURAL HEARING AIDS 1557 defined as in (29) to (34): where in the above cross- and auto-PSDs x , v and n ref ref ref refer to the target, interferers and diffuse noise components at ILD = 10 log 10|ITF | (29) in,v in,v a reference microphone, while z , z and z refer to the corre- x v n sponding components in the beamformer output signal. 
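As a reading aid, the per-bin decisions of (23)–(26) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the coherence value and the threshold are assumed to have been estimated elsewhere, and the split between "low" and "high" frequency bins is left to the caller. The last helper simply checks that the quoted 59 ms and 219 ms averaging spans are consistent with the STFT settings given in the experimental setup (24 kHz sampling, 256-sample frames, 50% overlap).

```python
import cmath
import math


def ccmbb_phase(z, y, coh, mu=0.1):
    """Output phase for one low-frequency T-F bin, following (23)/(24).

    z   : complex beamformer output for this bin
    y   : complex noisy reference input for this bin
    coh : complex coherence C_{z,y}, assumed to be estimated elsewhere
    """
    # Large coherence phase: fall back to the phase of the noisy input.
    if abs(cmath.phase(coh)) > mu * math.pi:
        return cmath.phase(y)
    # Otherwise keep the phase of the beamformer output.
    return cmath.phase(z)


def ccmbb_magnitude(z, y, coh, threshold, alpha=0.7):
    """Output magnitude for one high-frequency T-F bin, following (25)/(26)."""
    if abs(coh) < threshold:
        # Low coherence magnitude: weight the beamformer output more.
        return alpha * abs(z) + (1.0 - alpha) * abs(y)
    # High coherence magnitude: weight the noisy input more (ILD preservation).
    return (1.0 - alpha) * abs(z) + alpha * abs(y)


def averaging_span_ms(n_frames, fft_size=256, overlap=0.5, fs=24000):
    """Total duration covered by n_frames overlapping STFT frames, in ms."""
    hop = int(fft_size * (1.0 - overlap))
    samples = (n_frames - 1) * hop + fft_size
    return 1000.0 * samples / fs
```

With these settings, `averaging_span_ms(10)` and `averaging_span_ms(40)` give about 58.7 ms and 218.7 ms, which round to the 59 ms and 219 ms quoted above.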
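V. PERFORMANCE MEASUREMENT

To evaluate the performance of the proposed algorithm and the state of the art BLCMV, BMVDR and BMVDR-n algorithms, several objective metrics are used in this work. First, to measure the ability of the binaural beamformers to preserve the binaural cues, the interaural information between the left and right side signals is required. Formally, the Interaural Transfer Function (ITF) is defined as the ratio of a directional source component from the left to the right ear [30]. For simplicity, the ITF, ILD and IPD metrics are developed below for the case of a single source, more specifically a single interferer source. In the case of several interferers, in this work we apply the same equations to an equivalent interferer signal which consists of the sum of all interferer signals. All the performance measurements in this section are frequency dependent metrics; however, the frequency index $f$ is omitted for simplicity. The input ITF for an interferer component can be computed as in (27), where $\Gamma_{(v_{ref,r}),(v_{ref,l})}$ is the cross-PSD between the interferer component at the front left and front right reference microphones, and $\Gamma_{(v_{ref,l}),(v_{ref,l})}$ is the auto-PSD of the interferer component at the front left reference microphone:

$$ITF_{in,v} = \frac{\Gamma_{(v_{ref,r}),(v_{ref,l})}}{\Gamma_{(v_{ref,l}),(v_{ref,l})}}. \tag{27}$$

Similarly, the ITF between the left and right beamformer outputs can be described by (28):

$$ITF_{out,v} = \frac{\Gamma_{(z_{v,r}),(z_{v,l})}}{\Gamma_{(z_{v,l}),(z_{v,l})}} \tag{28}$$

where $z_v$ is the interferer component in the beamformer output signals. The errors (or losses) in the Interaural Level Difference (ILD) and Interaural Phase Difference (IPD) binaural cues are defined as in (29) to (34):

$$ILD_{in,v} = 10\log_{10}|ITF_{in,v}| \tag{29}$$

$$ILD_{out,v} = 10\log_{10}|ITF_{out,v}| \tag{30}$$

$$\Delta ILD_v = ILD_{out,v} - ILD_{in,v} \tag{31}$$

$$IPD_{in,v} = \angle ITF_{in,v} \tag{32}$$

$$IPD_{out,v} = \angle ITF_{out,v} \tag{33}$$

$$\Delta IPD_v = IPD_{out,v} - IPD_{in,v}. \tag{34}$$

In this work, the ILD error $\Delta ILD_v$ is only computed for the frequency components above 1500 Hz, and the IPD error $\Delta IPD_v$ is only computed for the frequency components below 1500 Hz.

In order to preserve the spatial impression of the diffuse-like noise, the MSC of the binaural diffuse-like noise components also has to be preserved. The MSC between the reference microphones can be computed as in (35):

$$MSC_{n,in} = \frac{\left|\Gamma_{(n_{ref,r}),(n_{ref,l})}\right|^{2}}{\Gamma_{(n_{ref,l}),(n_{ref,l})}\,\Gamma_{(n_{ref,r}),(n_{ref,r})}}. \tag{35}$$

where $\Gamma_{(n_{ref,r}),(n_{ref,l})}$, $\Gamma_{(n_{ref,l}),(n_{ref,l})}$ and $\Gamma_{(n_{ref,r}),(n_{ref,r})}$ are cross- and auto-PSDs from the diffuse noise component at the front microphones.

Similarly, the MSC between the left and right binaural outputs can be computed as in (36):

$$MSC_{n,out} = \frac{\left|\Gamma_{(z_{n,r}),(z_{n,l})}\right|^{2}}{\Gamma_{(z_{n,l}),(z_{n,l})}\,\Gamma_{(z_{n,r}),(z_{n,r})}} \tag{36}$$

where $\Gamma_{(z_{n,r}),(z_{n,l})}$, $\Gamma_{(z_{n,l}),(z_{n,l})}$ and $\Gamma_{(z_{n,r}),(z_{n,r})}$ are cross- and auto-PSDs from the diffuse noise component in the beamformer outputs. The MSC error is then computed as in (37):

$$\Delta MSC_n = MSC_{n,out} - MSC_{n,in}. \tag{37}$$

Next, to measure the reduction of the interferers and diffuse noise components with the beamforming process, a signal to noise ratio gain (SNR-gain, array gain), a signal to interferers ratio gain (SIR-gain), and a signal to diffuse noise ratio gain (SDNR-gain) are computed on each side, providing the difference in dB between the SNR, SIR, and SDNR at the beamformer output and at the input reference microphone:

$$SNRgain(dB) = 10\log_{10}\frac{\Gamma_{z_x,z_x}}{\Gamma_{(z_v+z_n),(z_v+z_n)}} - 10\log_{10}\frac{\Gamma_{x_{ref},x_{ref}}}{\Gamma_{(v_{ref}+n_{ref}),(v_{ref}+n_{ref})}} \tag{38}$$

$$SIRgain(dB) = 10\log_{10}\frac{\Gamma_{z_x,z_x}}{\Gamma_{z_v,z_v}} - 10\log_{10}\frac{\Gamma_{x_{ref},x_{ref}}}{\Gamma_{v_{ref},v_{ref}}} \tag{39}$$

$$SDNRgain(dB) = 10\log_{10}\frac{\Gamma_{z_x,z_x}}{\Gamma_{z_n,z_n}} - 10\log_{10}\frac{\Gamma_{x_{ref},x_{ref}}}{\Gamma_{n_{ref},n_{ref}}} \tag{40}$$

where in the above cross- and auto-PSDs $x_{ref}$, $v_{ref}$ and $n_{ref}$ refer to the target, interferers and diffuse noise components at a reference microphone, while $z_x$, $z_v$ and $z_n$ refer to the corresponding components in the beamformer output signal.

Finally, to measure the target distortion on each side after processing, two measurements are used: a target Speech Distortion Ratio (SDR) and a Speech Distortion Magnitude-only distance (SDmag). For each side, we define a target distortion error signal $x_{dist}$ as the time domain difference between the (aligned) target component in the beamformer output $z_x$ and the target component at the reference microphone signal $x_{ref}$. The SDR is then computed with the auto-PSDs as in (41):

$$SDR = 10\log_{10}\frac{\Gamma_{x_{ref},x_{ref}}}{\Gamma_{x_{dist},x_{dist}}}, \tag{41}$$

and the SDmag is computed with the same auto-PSDs but as in (42):

$$SDmag = \left|10\log_{10}\Gamma_{x_{ref},x_{ref}} - 10\log_{10}\Gamma_{z_x,z_x}\right|. \tag{42}$$

Since the computation of the performance metrics requires knowing the separate components in the beamformer output signals (target, interferers, diffuse noise), the so-called shadow-filtering method was used in the simulations, i.e., filtering/processing all the signal components individually with the same time-variant filter coefficients or post-filtering. In addition, since all the talker speech sources were always active in our simulations (except for normal short pauses between words), for each component all the computed frames were used to estimate the PSD statistics, and therefore no VAD was required under this setup.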
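The cue and gain metrics of this section reduce to a handful of ratios of PSD values, which the following sketch makes explicit for a single frequency bin. It is a minimal illustration under stated assumptions rather than the evaluation code used in the paper: the function names are hypothetical, the PSD values are assumed to be already estimated (for example via shadow filtering), the MSC uses the usual squared-magnitude normalisation, and no phase unwrapping is applied to the IPD difference.

```python
import cmath
import math


def itf(cross_psd_rl, auto_psd_ll):
    """Interaural transfer function as in (27)/(28): cross-PSD over left auto-PSD."""
    return cross_psd_rl / auto_psd_ll


def ild_error_db(itf_in, itf_out):
    """ILD error as in (29)-(31), in dB."""
    return 10.0 * math.log10(abs(itf_out)) - 10.0 * math.log10(abs(itf_in))


def ipd_error(itf_in, itf_out):
    """IPD error as in (32)-(34), in radians (no unwrapping in this sketch)."""
    return cmath.phase(itf_out) - cmath.phase(itf_in)


def msc(cross_psd, auto_psd_l, auto_psd_r):
    """Magnitude squared coherence (assumed standard normalisation)."""
    return abs(cross_psd) ** 2 / (auto_psd_l * auto_psd_r)


def ratio_gain_db(out_target, out_noise, in_target, in_noise):
    """Generic gain as in (38)-(40): output ratio minus input ratio, in dB.

    Pass interferer+noise PSDs for the SNR-gain, interferer PSDs for the
    SIR-gain, or diffuse-noise PSDs for the SDNR-gain.
    """
    return (10.0 * math.log10(out_target / out_noise)
            - 10.0 * math.log10(in_target / in_noise))
```

In a full evaluation these per-bin values would then be averaged over the relevant band, for instance only above 1500 Hz for the ILD error and only below 1500 Hz for the IPD error, as stated above.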
VI. EXPERIMENTAL SETUP

Head Related Transfer Functions (HRTFs) measured from a KEMAR mannequin wearing two binaural Behind-The-Ear (BTE) hearing aids are used for the simulations. The HRTFs were provided by a hearing aid manufacturer. There were two sets of HRTFs: HRTFs from an anechoic environment, and HRTFs from a mildly reverberant environment (T60 of approximately 150 ms). For our simulations, the directional signals (target, interferers) for the reverberant conditions are generated using the reverberant HRTFs. Beamformer designs are always performed using the anechoic HRTFs, and these HRTFs are also used to generate the directional signals for the subset of simulations with anechoic conditions. The distance used for the reverberant and the anechoic HRTFs measurements, which is between a loudspeaker source and the center of the head, was 1 m. The diffuse-like background noise recordings were also provided by a hearing aid manufacturer, again recorded on a KEMAR mannequin wearing two binaural BTE hearing aids, with babble noise recordings played at eight loudspeakers on a circle with a radius of 1 m around the KEMAR mannequin. The audio signals are sampled at 24 kHz. A Short Time Fourier Transform (STFT) is used to decompose the signals in the time-frequency domain, with an FFT size of 256 (10.67 ms), using a Hann window with 50% overlap between consecutive windows. The generated noisy mixtures of signals have a total length of 10 s.
VII. SYSTEM EVALUATION AND SIMULATION RESULTS

In this section, the performance of our proposed beamformer "Robust TLCMV" is first compared with the BMVDR, which has more degrees of freedom available for noise reduction (more adaptive "nulls"), in order to assess the effect of reducing the number of degrees of freedom for the Robust TLCMV. In addition, the performance of the Robust TLCMV is evaluated using both the noisy correlation matrix $R_y$ and the diffuse noise correlation matrix $R_n$. Binaural cues preservation is not considered in these first comparisons. The proposed CCMBB post-processor for binaural cues preservation is then combined with the BMVDR and compared with the BMVDR-n, to compare noise reduction, target distortion, and binaural cues preservation between these two approaches for cues preservation.

The proposed Robust TLCMV with CCMBB is then evaluated and compared with the BLCMV [13], [14]. For these algorithms, two types of propagation model mismatch are evaluated. The first type of mismatch is generated from the difference between the estimated and the true directions of arrival for the directional sources, i.e., target and directional interferers. We will refer to this type of mismatch as DOA mismatch. The second type of mismatch is between the reverberant HRTFs used to generate the reverberant signals at the microphones and the anechoic HRTFs used in all the beamformer designs. We will refer to this second type of mismatch as HRTF mismatch.

For a frontal or near-frontal target case, the estimated target DOA is at 0 degrees (for our proposed Robust TLCMV with and without CCMBB, and for the BMVDR, the BMVDR-n and the BLCMV) and the estimated interferers DOAs are at 225 degrees and 90 degrees (with such estimates required for the BLCMV only). As our proposed Robust TLCMV beamformer design assumes that the true target DOA is within ±10 degrees of the estimated target DOA, two unity constraints are positioned in the middle of the estimated target zone at ±5 degrees as Fig. 5(a) illustrates, unlike the BMVDR and BMVDR-n which only use one constraint at the estimated target direction as Fig. 5(b) shows. On the other hand, the BLCMV uses three constraints: at 0 degrees with gain $\zeta = 1$, and at 225 and 90 degrees with a gain $\eta$ set to 0.2 (as recommended in [32] and shown in Fig. 5(c)). A non-frontal target case with a target speaker at 90 degrees is also considered, with two unity constraints positioned in the middle of the estimated target zone, i.e., at ±5 degrees deviation from the assumed target direction in the Robust TLCMV, while the BLCMV again uses a unity gain constraint in the estimated target direction, and two constraints of gain $\eta$ at the estimated interferer directions.

Fig. 5. The constraints directions for (a) proposed Robust TLCMV, (b) BMVDR and BMVDR-n, (c) BLCMV.

A. Robust TLCMV and BMVDR (Without Post-Processor)

In order to compare the performance of the proposed Robust TLCMV with the BMVDR (the first option in [24], which does not preserve the binaural cues of interfering sources), four different acoustic scenarios are used, each with a target at 0 or 10 degrees, as Table I illustrates. Due to space limitations, performance in this subsection is only shown in terms of SNR-gain and SDR. The noise reduction and target distortion measurements in this section and the other sections are only shown for the "better ear" (the side where the input SNR is higher). The resulting performance metrics in Fig. 6 illustrate the effect of the target DOA mismatch on the performance of the BMVDR and the proposed Robust TLCMV. The Robust TLCMV outperforms the BMVDR in terms of SDR under the four acoustic scenarios (more significantly for cases with DOA mismatch, i.e., target at 10 degrees), and it outperforms the BMVDR in terms of SNR-gain under the acoustic scenarios in the presence of DOA mismatch (target at 10 degrees). For acoustic scenarios with a target at 0 degrees and no DOA mismatch, the proposed Robust TLCMV also slightly outperforms the BMVDR in terms of SDR. While this may seem surprising, it is because of HRTF mismatch (mismatch between anechoic HRTFs used to design the beamformer and reverberant HRTFs used to generate directional sources). Although it was designed for robustness to DOA mismatch, the Robust TLCMV with a wider beampattern around the estimated target direction is found to also provide better robustness to HRTF mismatch (here and in other results). In terms of noise reduction, for these ideal cases with no DOA mismatch the BMVDR outperforms the Robust TLCMV, although typically only by a fraction of a dB. Overall, the results show that the performance of the proposed Robust TLCMV is competitive (and significantly better in cases of DOA mismatch) compared to the BMVDR, despite a reduced number of degrees of freedom available for noise reduction.

TABLE I. ACOUSTIC SCENARIOS

Fig. 6. Performance of BMVDR and Robust TLCMV in terms of SNR-gain and SDR, under acoustic scenarios from Table I (with and without target DOA mismatch).

The performance of the proposed Robust TLCMV is then evaluated using a noisy signals correlation matrix $R_y$ as in (14) and (15), and using a background diffuse-like noise correlation matrix $R_n$ instead of $R_y$ in (14) and (15). The $R_y$ and $R_n$ correlation matrices were estimated using a moving average lowpass first order recursive filter with a forgetting factor of 0.985. An acoustic scenario is used with a target at 0 degrees, interferers at 225, 90 and 180 degrees, and diffuse-like noise (14 dB lower than the directional sources level). The resulting performance in terms of SNR-gain in Fig. 7 shows that using $R_y$ for coefficients computation in the Robust TLCMV outperforms using $R_n$. More detailed results are shown in Fig. 8 in terms of SIR-gain and SDNR-gain. The results illustrate the better performance of the proposed Robust TLCMV in terms of SIR-gain when $R_y$ is used for the coefficients computation. This result can be justified since using $R_y$ enables the proposed Robust TLCMV to adaptively position the nulls in the direction of the active interferers sources at each T-F bin. On the other hand, using $R_n$ for coefficients computation in the Robust TLCMV performs better than using $R_y$ for the SDNR-gain (diffuse noise reduction), which is normal since $R_n$ is specifically tuned for that. In the rest of this paper, the noisy signals correlation matrix $R_y$ is used in all simulations.

Fig. 7. Performance of Robust TLCMV using noisy correlation matrix and diffuse noise correlation matrix in terms of SNR-gain.

Fig. 8. Performance of Robust TLCMV using noisy correlation matrix and diffuse noise correlation matrix in terms of SIR-gain and SDNR-gain.
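The first-order recursive averaging used for the correlation matrices can be illustrated with a short sketch for one frequency bin. The text only states that a first-order recursive lowpass with a forgetting factor of 0.985 was used; the exact update below, including the (1 − λ) scaling of the new outer product, is an assumption, and `update_correlation` is a hypothetical name.

```python
def update_correlation(R, y, forgetting=0.985):
    """One recursive update R <- lambda * R + (1 - lambda) * y y^H.

    R : current M x M correlation estimate (list of lists of complex)
    y : length-M vector of microphone STFT coefficients for one T-F bin
    The (1 - lambda) scaling is an assumption of this sketch.
    """
    m = len(y)
    return [
        [forgetting * R[i][j] + (1.0 - forgetting) * y[i] * y[j].conjugate()
         for j in range(m)]
        for i in range(m)
    ]


# Usage example: start from a zero matrix and feed frames one by one.
R = [[0j, 0j], [0j, 0j]]
R = update_correlation(R, [1 + 0j, 1j])
```

With a forgetting factor of 0.985 the estimate has an effective memory of roughly 1/(1 − 0.985) ≈ 67 frames, so it tracks slowly varying statistics rather than individual frames.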
B. CCMBB and a Method With Direct Mixing

In order to evaluate the performance of the proposed CCMBB post-processor for cues preservation separately from the proposed Robust TLCMV, the CCMBB is used as a post-processor to the BMVDR (BMVDR-CCMBB). The performance of the BMVDR-CCMBB is compared with the BMVDR (no binaural cues preservation for the interferers and noise components) and with a BMVDR-n which uses 0.7 of the beamformer output mixed with 0.3 of the noisy input signal. An acoustic scenario is used with a target at 0 degrees (no DOA mismatch), an interferer at 165 degrees, and diffuse-like noise (5 dB below the directional sources level). The resulting performance metrics in Table II show that the proposed CCMBB cues preservation post-processing method combined with the BMVDR outperforms the BMVDR-n in terms of SNR-gain by around 2 dB, with a better SDmag distortion (2.3 dB) and similar scores for the other indicators. At the same time, the BMVDR-CCMBB has only a slightly lower SNR-gain than the BMVDR, while providing much better scores for the other metrics. This overall indicates the good performance of the CCMBB post-processor.

TABLE II. PERFORMANCE OF BMVDR, BMVDR-N, AND BMVDR-CCMBB
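For comparison with the CCMBB, the direct-mixing baseline (BMVDR-n) can be sketched as a fixed blend of the beamformer output and the noisy input in each T-F bin. This is an illustrative sketch only: it assumes the 0.7/0.3 mixing is applied to the complex STFT coefficients, and the function names are hypothetical.

```python
import math


def direct_mix(z, y, beamformer_weight=0.7):
    """Blend beamformer output z with noisy input y (assumed signal-domain mix)."""
    return beamformer_weight * z + (1.0 - beamformer_weight) * y


def attenuation_floor_db(beamformer_weight=0.7):
    """Residual level of a component the beamformer removes completely.

    With z = 0 for that component, the mixed output is (1 - weight) * y,
    so its attenuation cannot exceed -20 * log10(1 - weight) dB.
    """
    return 20.0 * math.log10(1.0 - beamformer_weight)
```

Under this assumption, a weight of 0.7 leaves any interferer or noise component at no less than about −10.5 dB relative to the input, regardless of how well the beamformer itself suppresses it. This fixed floor is one way to picture why a signal-dependent rule such as the CCMBB can retain more SNR-gain than direct mixing.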
C. Robust TLCMV With CCMBB and DOA Mismatch

In this section, the effect of the DOA mismatch for the target speaker as well as for the directional interferers is studied. We first evaluate the performance of the algorithms in an anechoic environment, using speech sources generated by anechoic HRTFs, in order to remove the other source of mismatch generated from the reverberation, i.e., HRTF mismatch.

To begin, a case with a target at 0 degrees and interferers at 90 and 225 degrees is considered. For the BLCMV, this is an ideal case with no DOA mismatch, while for the proposed Robust TLCMV with CCMBB, the constraints set at ±5 degrees do not match the true target DOA (less ideal case). The target and the interferers all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. In terms of SNR-gain, the resulting performance metric in the first plot of Fig. 9 illustrates the better SNR-gain performance of the BLCMV under this scenario ideal for it. In this scenario, the proposed Robust TLCMV with CCMBB does not have an exact unit constraint at 0 degrees (true target DOA), unlike the BLCMV. Nevertheless, both the BLCMV and our proposed Robust TLCMV with CCMBB algorithm generate an output with significant SNR-gain and very low target distortion, as shown by the SNR-gain, SDR and SDmag plots of Fig. 9. Fig. 9 also clearly illustrates the effect of adding the CCMBB to preserve the spatial impression (binaural cues) of the diffuse-like background noise, as measured with the MSC-error metric. The BLCMV does not preserve the diffuse noise binaural cues, causing the large MSC-error scores.

Fig. 9. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, with no DOA mismatch and no HRTF mismatch.

Assuming an exact knowledge of the true DOA of the target as well as true DOAs of the directional interferers is impractical. Therefore, a case with 10 degrees of DOA mismatch is then tested, using an acoustic scenario with a target at 10 degrees, interferers at 235 and 100 degrees, and diffuse-like noise, all with the same levels as earlier. The resulting performance in terms of SNR-gain, SDR and SDmag in Fig. 10 illustrates that the Robust TLCMV with CCMBB provides significantly better results in this case with DOA mismatch, especially for high frequencies, i.e., above 1000 Hz. The post-processing CCMBB method again provides significant improvements in terms of diffuse-noise MSC-error. These results indicate the robustness of the proposed algorithm in the presence of target DOA mismatch. Moreover, since our proposed algorithm does not assume a prior knowledge of the directional interferers DOAs, its binaural cues preservation performance is not affected by interferers DOA mismatch, unlike the BLCMV. Fig. 11 shows that with 10 degrees DOA mismatch in an anechoic environment with a target at 10 degrees, interferers at 235 and 100 degrees, and diffuse-like noise, the abilities of the BLCMV to preserve the binaural cues of the directional interferers significantly decrease (i.e., increase in the IPD-error and ILD-error metrics).

Fig. 10. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, under acoustic scenario with 10 degrees DOA mismatch and no HRTF mismatch.

Fig. 11. Performance of BLCMV in terms of IPD-error and ILD-error under anechoic acoustic scenario, without and with 10 degrees DOA mismatch.
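D. Robust TLCMV With CCMBB With DOA Mismatch and HRTF Mismatch

In this section, a more realistic evaluation is performed using speech signals generated in a mildly reverberant environment (T60 = approx. 150 ms). Three acoustic scenarios are generated with a target at 0, 5, or 10 degrees, interferers at 225 and 90 degrees, 230 and 95 degrees, or 235 and 100 degrees, as well as with diffuse noise. The target and the interferers again all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. The directional signals were generated using reverberant HRTFs. The beamformer algorithms assume the same DOAs as before: a target at 0 degrees (for both algorithms), and interferers at 90 and 225 degrees (required for BLCMV only). Therefore, these cases include HRTF mismatch, with and without DOA mismatch. The resulting performance metrics in Fig. 12 show that the proposed Robust TLCMV with CCMBB remains robust to DOA mismatch in the reverberant environment (up to 10 degrees) since it does not rely on constraints in the exact directions of the directional sources (unlike the BLCMV). Moreover, Fig. 13 illustrates the overall improved performance of the Robust TLCMV with CCMBB over the BLCMV in terms of SNR-gain, SDR, SDmag and MSC with 10 degrees DOA mismatch in the mildly reverberant environment, i.e., with HRTF mismatch. It is also noticeable that the BLCMV does not have the ability to preserve the spatial impression of diffuse noise in terms of MSC, unlike the proposed Robust TLCMV with CCMBB.

Fig. 12. Performance of Robust TLCMV with CCMBB post-processor under mildly reverberant acoustic scenario (HRTF mismatch), with and without DOA mismatch for the target.

Fig. 13. Performance in terms of SNR-gain, SDR, SDmag and MSC-error, under acoustic scenario with 10 degrees DOA mismatch for the target (and for interferers in the BLCMV), and HRTF mismatch.

To evaluate the effect of the HRTF mismatch separately, i.e., without the effect of DOA mismatch, on the ability of the BLCMV to preserve the binaural cues in terms of IPD and ILD, an acoustic scenario is generated with a target at 0 degrees, interferers at 90 and 225 degrees, and diffuse noise (same levels as before). Fig. 14 shows that in reverberant environments, i.e., with HRTF mismatch, the ability of the BLCMV to preserve the binaural cues for the directional interferers significantly decreases. In order to evaluate the combined effect of the HRTF mismatch and DOA mismatch on the preservation of the binaural cues in terms of ILD and IPD, five acoustic scenarios are then used: acoustic scenarios without DOA mismatch, and with 5, 10, 15 and 20 degrees of interferers DOA mismatch. The resulting performance metrics in terms of IPD-error and ILD-error in Fig. 15 show the performance improvement of our proposed Robust TLCMV with CCMBB algorithm over the BLCMV for all the tested cases. The average IPD for the frequency components lower than 1500 Hz and the average ILD for the frequency components higher than 1500 Hz are shown in Fig. 15. For the case without interferer DOA mismatch, our proposed algorithm still outperforms the BLCMV in terms of IPD and ILD, because of the use of CCMBB post-processing. Fig. 15 also shows that our proposed Robust TLCMV with CCMBB is not affected by the increase in the interferer DOA mismatch combined with HRTF mismatch.

Fig. 14. Performance of BLCMV in terms of IPD-error and ILD-error with and without HRTF mismatch, and without DOA mismatch.

Fig. 15. Performance in terms of IPD-error and ILD-error under a reverberant acoustic scenario (HRTF mismatch), for different levels of interferer DOA mismatch.

Further study of the HRTF mismatch effect is done under an acoustic scenario with a lateral target at 90 degrees, where the effect of the target HRTF mismatch can be more significant than for a frontal target case. Interferers at 225 and 315 degrees as well as diffuse noise are used. The target and the interferers all have the same level, and the diffuse noise level is set to 5 dB below each directional source level. The directional signals are generated using reverberant HRTFs, creating HRTF mismatch. In this case the beamformer algorithms know the value of the exact target DOA at 90 degrees (for both algorithms, no target DOA mismatch), and the exact value of the interferers DOAs at 225 and 315 degrees (required for BLCMV only). Fig. 16 illustrates the improved performance of the proposed Robust TLCMV with CCMBB over the BLCMV in terms of noise reduction, target speech distortion, and preservation of the binaural spatial impression of the background diffuse noise for this scenario with HRTF mismatch.

Fig. 16. For non-frontal target, performance in terms of SNR-gain, SDR, SDmag and MSC-error under a reverberant acoustic scenario (HRTF mismatch), without DOA mismatch.

VIII. CONCLUSION

This work introduced a binaural beamforming algorithm robust to target DOA mismatch and HRTF mismatch (Robust TLCMV), as well as its combination with a post-processor to achieve a good trade-off between noise reduction and binaural cues preservation of all acoustic components (Robust TLCMV with CCMBB). The proposed robust beamformer does not require prior knowledge of the propagation model (e.g., HRTFs or HRTF ratios) for the directional interferers, or second order statistics estimation of the noise-only or interferers-only components. The Robust TLCMV was shown to produce better results than a BMVDR under the case of 10 degrees target DOA mismatch, and comparable performance for the ideal case of no target DOA mismatch. The CCMBB post-processor was shown to produce better results than a direct mixing of the beamformer output with the noisy input signal. Finally, the Robust TLCMV combined with the CCMBB was shown to produce better performance than the BLCMV under 10 degrees sources DOA mismatch and under HRTF mismatch with mild reverberation. Future work to further develop the proposed algorithm and validate its performance should include testing under environments with higher levels of reverberation, as well as with dynamic sources.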
REFERENCES

[1] W. M. Whitmer, K. F. Wright-Whyte, J. A. Holman, and M. A. Akeroyd, "Hearing aid validation," in Hearing Aids, 1st ed., vol. 56. Basingstoke, U.K.: Springer Nature, 2016, pp. 291–321.
[2] G. R. Popelka and B. C. J. Moore, "Future directions for hearing aid development," in Hearing Aids, 1st ed., vol. 56. Basingstoke, U.K.: Springer Nature, 2016, pp. 331–341.
[3] B. Edwards, "Hearing aids and hearing impairment," in Speech Processing in the Auditory System. New York, NY, USA: Springer, 2004, pp. 339–421.
[4] M. Li, H. G. McAllister, N. D. Black, and T. A. D. Perez, "Perceptual time-frequency subtraction algorithm for noise reduction in hearing aids," IEEE Trans. Biomed. Eng., vol. 48, no. 9, pp. 979–988, Sep. 2001.
[5] T. Lotter and P. Vary, "Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model," EURASIP J. Appl. Signal Process., vol. 2005, pp. 1110–1126, 2005.
[6] Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1849–1858, Dec. 2014.
[7] T. Ricketts and S. Dhar, "Comparison of performance across three directional hearing aids," J. Amer. Acad. Audiol., vol. 10, no. 4, pp. 180–189,
[8] V. Best, J. Mejia, K. Freeston, R. J. van Hoesel, and H. Dillon, "An evaluation of the performance of two binaural beamformers in complex and dynamic multitalker environments," Int. J. Audiol., vol. 54, no. 10, pp. 727–735, Oct. 2015.
[9] T. Rohdenburg, V. Hohmann, and B. Kollmeier, "Robustness analysis of binaural hearing aid beamformer algorithms by means of objective perceptual quality measures," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2007, pp. 315–318.
[10] D. Marquardt and S. Doclo, "Performance comparison of bilateral and binaural MVDR-based noise reduction algorithms in the presence of DOA estimation errors," in Proc. Speech Commun.; 12. ITG Symp., 2016, pp. 1–5.
[11] O. Hoshuyama, A. Sugiyama, and A. Hirano, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters," IEEE Trans. Signal Process., vol. 47, no. 10, pp. 2677–2684, Oct. 1999.
[12] W. Herbordt and W. Kellermann, "Computationally efficient frequency-domain robust generalized sidelobe canceller," in Proc. Int. Workshop Acoust. Echo Noise Control, 2001, pp. 51–55.
[13] B.-J. Yoon, I. Tashev, and A. Acero, "Robust adaptive beamforming algorithm using instantaneous direction of arrival with enhanced noise suppression capability," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, vol. 1, pp. I-133–I-136.
[14] L. Lepauloux, P. Scalart, and C. Marro, "Computationally efficient and robust frequency-domain GSC," in Proc. 12th IEEE Int. Workshop Acoust. Echo Noise Control, 2010, pp. 1–4.
[15] S. Doclo and M. Moonen, "Superdirective beamforming robust against microphone mismatch," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 617–631, Feb. 2007.
[16] J. Ahrens, I. Tashev, and M. Thomas, "Beamformer design using measured microphone directivity patterns: Robustness to modelling error," in Proc. Asia Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2012, pp. 1–4.
[17] E. Mabande, A. Schad, and W. Kellermann, "Design of robust superdirective beamformers as a convex optimization problem," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2009, pp. 77–80.
[18] H. Barfuss, C. Huemmer, G. Lamani, A. Schwarz, and W. Kellermann, "HRTF-based robust least-squares frequency-invariant beamforming," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2015, pp. 1–5.
[19] A. Spriet, M. Moonen, and J. Wouters, "Robustness analysis of multichannel Wiener filtering and generalized sidelobe cancellation for multimicrophone noise reduction in hearing aid applications," IEEE Trans. Speech Audio Process., vol. 13, no. 4, pp. 487–503, Jul. 2005.
[20] Y. Zhao and W. Liu, "Robust fixed frequency invariant beamformer design subject to norm-bounded errors," IEEE Signal Process. Lett., vol. 20, no. 2, pp. 169–172, Feb. 2013.
[21] R. C. Nongpiur, "Design of minimax broadband beamformers that are robust to microphone gain, phase and position errors," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1013–1022, Jun.
[22] W.-C. Liao, Z.-Q. Luo, I. Merks, and T. Zhang, "An effective low complexity binaural beamforming algorithm for hearing aids," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2015, pp. 1–5.
[23] E. Hadad et al., "Comparison of two binaural beamforming approaches for hearing aids," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2017, pp. 236–240.
[24] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, "Theoretical analysis of binaural transfer function MVDR beamformers with interference cue preservation constraints," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2449–2464, Dec. 2015.
[25] D. Marquardt, V. Hohmann, and S. Doclo, "Coherence preservation in multi-channel Wiener filtering based noise reduction for binaural hearing aids," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 8648–8652.
[26] D. Marquardt, V. Hohmann, and S. Doclo, "Perceptually motivated coherence preservation in multi-channel Wiener filtering based noise reduction for binaural hearing aids," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 3660–3664.
[27] T. J. Klasen, S. Doclo, T. Van den Bogaert, M. Moonen, and J. Wouters, "Binaural multi-channel Wiener filtering for hearing aids: Preserving interaural time and level differences," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2006, vol. 5, pp. 145–148.
[28] T. Klasen, T. den Bogaert, M. Moonen, and J. Wouters, "Binaural noise reduction algorithms for hearing aids that preserve interaural time delay cues," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1579–1585, Apr.
[29] B. Cornelis, S. Doclo, T. Van dan Bogaert, M. Moonen, and J. Wouters, "Theoretical analysis of binaural multimicrophone noise reduction techniques," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 342–355, Feb. 2010.
[30] S. Doclo, S. Gannot, M. Moonen, and A. Spriet, "Acoustic beamforming for hearing aid applications," in Handbook on Array Processing and Sensor Networks. Hoboken, NJ, USA: Wiley, 2010, pp. 269–302.
[31] E. Hadad, S. Gannot, and S. Doclo, "Binaural linearly constrained minimum variance beamformer for hearing aid applications," in Proc. Int. Workshop Acoust. Signal Enhancement, 2012, pp. 1–4.
[32] E. Hadad, S. Doclo, and S. Gannot, "The binaural LCMV beamformer and its performance analysis," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 3, pp. 543–558, Mar. 2016.
[33] D. Marquardt, E. Hadad, S. Gannot, and S. Doclo, "Optimal binaural LCMV beamformers for combined noise reduction and binaural cue preservation," in Proc. 14th Int. Workshop Acoust. Signal Enhancement, 2014, pp. 288–292.
[34] A. I. Koutrouvelis, R. C. Hendriks, J. Jensen, and R. Heusdens, "Improved multi-microphone noise reduction preserving binaural cues," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 460–
[35] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, and J. Jensen, "Relaxed binaural LCMV beamforming," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 1, pp. 137–152, Jan. 2017.
[36] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, J. Jensen, and M. Guo, "Binaural beamforming using pre-determined relative acoustic transfer functions," in Proc. 25th Eur. Signal Process. Conf., 2017, pp. 1–5.
[37] J. Thiemann, M. Muller, and S. Van De Par, "A binaural hearing aid speech enhancement method maintaining spatial awareness for the user," in Proc. 22nd Eur. Signal Process. Conf., 2014, pp. 321–325.
[38] J. Thiemann, M. Müller, D. Marquardt, S. Doclo, and S. van de Par, "Speech enhancement for multimicrophone binaural hearing aids aiming to preserve the spatial auditory scene," EURASIP J. Adv. Signal Process., vol. 2016, no. 1, Dec. 2016, Art. no. 12.
[39] H. As'ad, M. Bouchard, and H. Kamkar-Parsi, "Perceptually motivated binaural beamforming with cues preservation for hearing aids," in Proc. IEEE Can. Conf. Elect. Comput. Eng., 2016, pp. 1–5.
[40] A. I. Koutrouvelis, J. Jensen, M. Guo, R. C. Hendriks, and R. Heusdens, "Binaural speech enhancement with spatial cue preservation utilising simultaneous masking," in Proc. 25th Eur. Signal Process. Conf., 2017, pp. 598–602.
[41] H. As'ad, M. Bouchard, and H. Kamkar-Parsi, "Binaural beamforming with spatial cues preservation for hearing aids in real-life complex acoustic environments," in Proc. Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2017, pp. 1390–1399.
[42] S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614–1626, Aug. 2001.
[43] T. Lotter and P. Vary, "Dual-channel speech enhancement by superdirective beamforming," EURASIP J. Adv. Signal Process., vol. 2006, no. 1, Dec. 2006, Art. no. 063297.
[44] N. Gößling, D. Marquardt, and S. Doclo, "Performance analysis of the extended binaural MVDR beamformer with partial noise estimation in a homogeneous noise field," in Proc. Hands-Free Speech Commun. Microphone Arrays, 2017, pp. 1–5.
[45] D. Marquardt and S. Doclo, "Interaural coherence preservation for binaural noise reduction using partial noise estimation and spectral postfiltering," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 7, pp. 1261–1274, Jul. 2018.
[46] A. Tikhonov et al., Numerical Methods for the Solution of Ill-Posed Problems, vol. 328. New York, NY, USA: Springer, 2013.
[47] H. Cox, "Resolving power and sensitivity to mismatch of optimum array processors," J. Acoust. Soc. Amer., vol. 54, no. 3, pp. 771–785, Sep. 1973.
[48] S. M. Golan, S. Gannot, and I. Cohen, "A reduced bandwidth binaural MVDR beamformer," in Proc. Int. Workshop Acoust. Echo Noise Control, Tel-Aviv, Israel, 2010, pp. 1–4.
[49] S. Rickard, "Sparse sources are separated sources," in Proc. 14th Eur. Signal Proc. Conf., 2006, pp. 1–5.
[50] D. R. Begault and L. J.

Martin Bouchard received the B.Ing., M.Sc.A., and Ph.D. degrees in electrical engineering from the Université de Sherbrooke, Sherbrooke, QC, Canada, in 1993, 1995, and 1997, respectively. In January 1998, he joined the School of Electrical Engineering and Computer Science, Faculty of Engineering, University of Ottawa, Ottawa, ON, Canada, where he is currently a Professor. In 1996, he co-founded SoftdB Inc., Quebec City, QC, which is still active today. Over the years, he has conducted research activities and consulting activities with more than 20 private sector and governmental partners, supervised more than 50 graduate students and postdoctoral fellows, and authored or coauthored more than 40 journal papers and 85 conference papers. His current research interests include signal processing methods in general and machine learning, with an emphasis on speech, audio, acoustics, hearing aids, and biomedical engineering applications. He served as a member of the Speech and Language Technical Committee of the IEEE Signal
Trejo, “Overview of spatial hearing Part I: Azimuth Processing Society from 2009 to 2011, as an Associate Editor for the EURASIP and elevation perception,” in 3-D Sound for Virtual Reality and Multime- Journal on Audio, Speech and Music Processing from 2006 to 2011, and as dia, vol. 955. Boston, MA, USA: AP Professional, 2000, pp. 31–65. an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS from 2008 to 2009. He is a member of the Ordre des Ingénieurs du Québec. Homayoun Kamkar-Parsi received the B.A.Sc., M.A.Sc., and Ph.D. degrees in electrical engineering from the School of Information Technology and Engi- neering, University of Ottawa, Ottawa, ON, Canada, Hala As’ad received the M.A.Sc. degree in electrical in 2001, 2004, and 2009, respectively. During his engineering with a specialization in audio and speech undergraduate studies, he has obtained the highest processing in 2015 from the University of Ottawa, standing in his graduating class in Electrical Engi- Ottawa, ON, Canada, where she is currently work- neering and the Silver medal for the second Highest ing toward the Ph.D. degree in electrical and com- standing in entire Faculty of Engineering. His gradu- puter engineering. Her doctoral research focuses on ate scholarships included the Natural Sciences and robust binaural beamforming, binaural cues preser- Engineering Research Council scholarship and the vation, and source direction of arrival detection in Ontario Graduate Scholarship. Since 2009, he has been with Siemens Audi- hearing aids. 
Her research interests include applied ologische Technik GmbH (renamed as Sivantos GmbH in 2015 and as WS signal processing and machine learning with an em- Audiology in 2019), Erlangen, Germany, where his main work and research in- phasis on audio and speech processing, array signal clude speech/audio signal processing with applications in speech enhancement, processing, beamforming, speech enhancement, acoustic source localization, advanced multi-microphone beamforming for binaural hearing aids including and hearing aids. She is the recipient of the Natural Sciences and Engineering remote external microphones (e.g., from smartphone), source localization and Research Council Scholarship, the University of Ottawa Excellence Scholar- tracking, advanced scene analysis and machine learning (neural networks). In ship, and the Ontario Graduate Scholarship. 2018, he was selected as one of the top inventors at Sivantos.

Journal

Electrical Engineering and Systems Science, arXiv (Cornell University)

Published: Nov 3, 2018
