(2000). Parametric and nonparametric statistical procedures.
A. Koutrouvelis, R. Hendriks, R. Heusdens, J. Jensen, Meng Guo (2018). Evaluation of Binaural Noise Reduction Methods in Terms of Intelligibility and Perceived Localization. 2018 26th European Signal Processing Conference (EUSIPCO).
Chunhua Shen, A. Hengel (2014). Semidefinite Programming.
(2015). Development and evaluation of psychoacoustically motivated binaural noise reduction and cue preservation techniques.
(2008). Cvx: Matlab software for disciplined convex programming.
T. Klasen, T. Bogaert, M. Moonen, J. Wouters (2007). Binaural Noise Reduction Algorithms for Hearing Aids That Preserve Interaural Time Delay Cues. IEEE Transactions on Signal Processing, 55.
H. Kayser, S. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, B. Kollmeier (2009). Database of Multichannel In-Ear and Behind-the-Ear Head-Related and Binaural Room Impulse Responses. EURASIP Journal on Advances in Signal Processing, 2009.
S. Gannot, E. Vincent, S. Golan, A. Ozerov (2017). A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25.
A. Koutrouvelis, R. Hendriks, J. Jensen, R. Heusdens (2016). Improved multi-microphone noise reduction preserving binaural cues. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
C. Taal, R. Hendriks, R. Heusdens, J. Jensen (2011). An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing, 19.
Daniel Welker, J. Greenberg, J. Desloge, P. Zurek (1997). Microphone-array hearing aids with binaural output. II. A two-microphone adaptive system. IEEE Trans. Speech Audio Process., 5.
A. Koutrouvelis, R. Hendriks, R. Heusdens, J. Jensen, Meng Guo (2017). Binaural beamforming using pre-determined relative acoustic transfer functions. 2017 25th European Signal Processing Conference (EUSIPCO).
A. Koutrouvelis, R. Hendriks, R. Heusdens, J. Jensen (2016). Relaxed Binaural LCMV Beamforming. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25.
E. Hadad, Daniel Marquardt, S. Doclo, S. Gannot (2015). Theoretical Analysis of Binaural Transfer Function MVDR Beamformers with Interference Cue Preservation Constraints. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23.
(1977). Short-term spectral analysis, and modification by discrete Fourier transform.
J. Kates (2008). Digital hearing aids. Harvard health letter, 26 7.
A. d'Aspremont, L. Ghaoui (2003). Static arbitrage bounds on basket option prices. Mathematical Programming, 106.
J. Desloge, W. Rabinowitz, P. Zurek (1997). Microphone-array hearing aids with binaural output. I. Fixed-processing systems. IEEE Trans. Speech Audio Process., 5.
K. Matthews (1998). Elementary Linear Algebra.
E. Hadad, S. Doclo, S. Gannot (2016). The Binaural LCMV Beamformer and its Performance Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24.
S. Doclo, Walter Kellermann, S. Makino, S. Nordholm (2015). Multichannel Signal Enhancement Algorithms for Assisted Listening Devices: Exploiting spatial diversity using multiple microphones. IEEE Signal Processing Magazine, 32.
W. Hartmann (1999). How we localize sound. Physics Today, 52.
B. Cornelis, S. Doclo, T. Bogaert, M. Moonen, J. Wouters (2010). Theoretical Analysis of Binaural Multimicrophone Noise Reduction Techniques. IEEE Transactions on Audio, Speech, and Language Processing, 18.
Daniel Marquardt, S. Doclo (2018). Interaural Coherence Preservation for Binaural Noise Reduction Using Partial Noise Estimation and Spectral Postfiltering. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26.
A Convex Approximation of the Relaxed Binaural Beamforming Optimization Problem

Andreas I. Koutrouvelis, Richard C. Hendriks, Richard Heusdens and Jesper Jensen

arXiv:1805.01692v1 [cs.SD] 4 May 2018

This work was supported by the Oticon Foundation and NWO, the Dutch Organisation for Scientific Research.

Abstract: The recently proposed relaxed binaural beamforming (RBB) optimization problem provides a flexible trade-off between noise suppression and binaural-cue preservation of the sound sources in the acoustic scene. It minimizes the output noise power under constraints which guarantee that the target remains unchanged after processing and that the binaural-cue distortions of the acoustic sources stay below a user-defined threshold. However, the RBB problem is a computationally demanding non-convex optimization problem. The only existing suboptimal method which approximately solves the RBB is a successive convex optimization (SCO) method which typically needs to solve multiple convex optimization problems per frequency bin in order to converge. Convergence is achieved when all constraints of the RBB optimization problem are satisfied. In this paper, we propose a semi-definite convex relaxation (SDCR) of the RBB optimization problem. The proposed suboptimal SDCR method solves a single convex optimization problem per frequency bin, resulting in a much lower computational complexity than the SCO method. Unlike the SCO method, the SDCR method does not guarantee user-controlled upper-bounded binaural-cue distortions. To tackle this problem we also propose a suboptimal hybrid method which combines the SDCR and SCO methods. Instrumental measures combined with a listening test show that the SDCR and hybrid methods achieve significantly lower computational complexity than the SCO method, and in most cases a better trade-off between predicted intelligibility and binaural-cue preservation than the SCO method.

Index Terms: Binaural beamforming, binaural cues, convex optimization, LCMV, noise reduction, semi-definite relaxation.

I. INTRODUCTION

Binaural beamforming (see e.g., [1] for an overview), also known as binaural spatial filtering, plays an important role in binaural hearing-aid (HA) systems [2]. Binaural beamforming is typically described as an optimization problem, where the objective is to i) minimize the output noise power, ii) preserve the target sound source at the left and right HA reference microphones, and iii) preserve the binaural cues of all sound sources after processing. The microphone array, which is typically mounted on the HA devices, has only a few microphones and, thus, there is only limited freedom (i.e., a small feasibility set) to search for a good compromise between the three aforementioned goals. Besides the challenge of finding a good trade-off among all these goals, the complexity should remain as low as possible, due to the limited computational power of the HA devices.

The binaural minimum variance distortionless response (BMVDR) beamformer (BF) [1] provides the maximum possible noise suppression among all binaural target-distortionless BFs [3]. Unfortunately, the BMVDR severely distorts the binaural cues of the residual noise at the output of the filter. Specifically, the residual noise inherits the interaural transfer function of the target and, hence, sounds as if it originates from the target's direction [1]. The lack of spatial separation between the target and the noise after processing may not only give an unnatural impression to the user, but may also negatively affect intelligibility [4]. In [5], [6], the BMVDR was compared with an oracle-based (i.e., not practically implementable) method in several noise fields. The oracle-based method has the same noise suppression as the BMVDR, but does not introduce any binaural-cue distortions at the output. The spatially correct oracle-based method achieved an improvement of about 3 dB in SRT-50 over the BMVDR, where the speech reception threshold (SRT)-50 is the SNR at which 50% of the words are recognized correctly. Therefore, there are several reasons to seek methods that simultaneously provide the maximum possible noise suppression and binaural-cue preservation of all sources in the acoustic scene.

Several modifications of the BMVDR BF have been proposed, which can be roughly categorized into two groups. The first group consists of BFs that add or maintain a portion of the unprocessed scene at the output of the filter (see e.g., [5], [7]–[9]). The second group consists of BFs whose optimization problems have the same objective function as the BMVDR, but introduce extra equality [3], [10], [11] or inequality [12] constraints in order to preserve the binaural cues of the interferers after processing. Such additional constraints in the optimization problem result in fewer degrees of freedom for noise reduction. With equality constraints, closed-form solutions may be derived, but the degrees of freedom can be easily exhausted when multiple interferers exist in the acoustic scene, resulting in poor noise reduction. On the other hand, inequality constraints provide more flexibility and can approximately preserve the binaural cues of, typically, many more acoustic sources, or for the same number of acoustic sources provide a larger amount of noise reduction [12]. Unfortunately, closed-form solutions do not exist for the inequality-constrained binaural BFs and, thus, iterative methods with a larger complexity are used instead.

Recently, the relaxed binaural beamforming (RBB) optimization problem was proposed, which uses inequality constraints to preserve the binaural cues of the interfering sources [12]. The inequality constraints in the RBB are not convex, resulting in a non-convex optimization problem. In [12], a suboptimal successive convex optimization (SCO) method was proposed to approximately solve the RBB problem.
In most cases, the SCO method needs to solve more than one convex optimization problem, per frequency bin, in order to converge. Convergence is achieved when all constraints of the RBB problem are satisfied. As a result, the SCO method guarantees an upper-bounded binaural-cue distortion of the interferers (as expressed by the interaural transfer function error), where the upper bound is controlled by the user. Unfortunately, the SCO method is computationally very demanding due to its need to solve multiple convex optimization problems, per frequency bin, in order to converge. In this paper, we propose a semi-definite convex relaxation (SDCR) of the RBB optimization problem, which is significantly faster than the SCO method. This is because the SDCR method needs to solve only one convex optimization problem per frequency bin. The main drawback of the SDCR method is that, unlike the SCO method, it does not guarantee user-controlled upper-bounded binaural-cue distortions. We solve this issue by combining the SDCR and SCO methods into a suboptimal hybrid method. The hybrid method guarantees user-controlled upper-bounded binaural-cue distortions, and still has a significantly lower computational complexity than the SCO method. Simulation experiments combined with listening tests show that both proposed methods, in most cases, provide a better trade-off between noise reduction and binaural-cue preservation than the SCO method.

II. SIGNAL MODEL AND NOTATION

We assume that there is one target point-source signal, $r$ point-source interferers, additive diffuse noise, and two HAs with $M$ microphones in total. The processing is accomplished per time-frequency bin independently. Neglecting time-frequency indices for brevity, the acquired $M$-element noisy vector in the DFT domain, for a single time-frequency bin, is given by

$$ \mathbf{y} = \underbrace{s\mathbf{a}}_{\mathbf{x}} + \underbrace{\sum_{i=1}^{r} v_i\mathbf{b}_i + \mathbf{u}}_{\mathbf{n}} \in \mathbb{C}^{M\times 1}, \tag{1} $$

where $s$ and $v_i$ are the target and $i$-th interferer signals at the original locations; $\mathbf{a}$ and $\mathbf{b}_i$ the acoustic transfer function (ATF) vectors of the target and $i$-th interferer, respectively; $\mathbf{u}$ the diffuse background noise; and $\mathbf{n}$ the total additive noise. Assuming statistical independence between all sources, the noisy cross-power spectral density matrix is given by

$$ \mathbf{P}_y = E[\mathbf{y}\mathbf{y}^H] = \mathbf{P}_x + \mathbf{P}_n \in \mathbb{C}^{M\times M}, \tag{2} $$

with $\mathbf{P}_x = E[\mathbf{x}\mathbf{x}^H] = p_s\mathbf{a}\mathbf{a}^H$ and $\mathbf{P}_n = E[\mathbf{n}\mathbf{n}^H]$ the target and noise cross-power spectral density matrices, respectively, and $p_s = E[|s|^2]$ the power spectral density of the target signal.

III. BINAURAL BEAMFORMING PRELIMINARIES

Binaural BFs consist of two spatial filters, $\mathbf{w}_L, \mathbf{w}_R \in \mathbb{C}^{M\times 1}$, which are both applied to the noisy measurements, producing two different outputs given by

$$ \begin{bmatrix} \hat{x}_L \\ \hat{x}_R \end{bmatrix} = \begin{bmatrix} \mathbf{w}_L^H\mathbf{y} \\ \mathbf{w}_R^H\mathbf{y} \end{bmatrix}, \tag{3} $$

where $\hat{x}_L$, $\hat{x}_R$ are played back by the loudspeakers of the left and right HAs, respectively. Note that the subscripts $L$ and $R$ are also used to refer to the two elements of the vectors in Eq. (1) associated with the left and right reference microphones of the binaural BF. Here, we select the first and the $M$-th microphones as reference microphones and, thus, $y_L = y_1$ and $y_R = y_M$. The same applies to all the other vectors in Eq. (1).

All BFs considered in this paper are target-distortionless. Their goal is not only noise suppression, but also preservation of the binaural cues of all sources in the acoustic scene. In this paper, we mainly focus on preserving, after processing, the perceived location of the point sources. A simple way of measuring the binaural cues of a point source is via the interaural transfer function (ITF), which is a function of the ATF vector of the source [13]. The ITF of the $i$-th interferer before and after applying the spatial filter is given by [13]

$$ \mathrm{ITF}_i^{\mathrm{in}} = \frac{b_{iL}}{b_{iR}}, \qquad \mathrm{ITF}_i^{\mathrm{out}} = \frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i}. \tag{4} $$

The input and output ITF of the target are expressed similarly. Ideally, to preserve all spatial cues of the point sources, a binaural BF will produce the same output ITF as the input ITF for all point sources. In practice, this is very difficult to achieve when the number of interferers, $r$, is large and the number of microphones, $M$, is small [12]. As a result, most BFs will introduce some distortion to the output ITF, resulting in a non-zero ITF error given by [12]

$$ \mathrm{ITF}_i^{e} = \left|\mathrm{ITF}_i^{\mathrm{out}} - \mathrm{ITF}_i^{\mathrm{in}}\right| = \left|\frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i} - \frac{b_{iL}}{b_{iR}}\right| \geq 0. \tag{5} $$

A. BMVDR Beamforming

The BMVDR BF [1] achieves the maximum possible noise suppression among all binaural target-distortionless BFs and is obtained from the following simple optimization problem [1], [3]:

$$ \hat{\mathbf{w}}_L, \hat{\mathbf{w}}_R = \arg\min_{\mathbf{w}_L,\mathbf{w}_R} \begin{bmatrix}\mathbf{w}_L^H & \mathbf{w}_R^H\end{bmatrix} \tilde{\mathbf{P}} \begin{bmatrix}\mathbf{w}_L \\ \mathbf{w}_R\end{bmatrix} \quad \text{s.t.} \quad \mathbf{w}_L^H\mathbf{a} = a_L, \;\; \mathbf{w}_R^H\mathbf{a} = a_R, \tag{6} $$

where

$$ \tilde{\mathbf{P}} = \begin{bmatrix}\mathbf{P}_n & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_n\end{bmatrix}. \tag{7} $$

The optimization problem in Eq. (6) provides closed-form solutions to the left and right spatial filters given by [1], [3]

$$ \hat{\mathbf{w}}_L = \frac{\mathbf{P}_n^{-1}\mathbf{a}\,a_L^*}{\mathbf{a}^H\mathbf{P}_n^{-1}\mathbf{a}}, \qquad \hat{\mathbf{w}}_R = \frac{\mathbf{P}_n^{-1}\mathbf{a}\,a_R^*}{\mathbf{a}^H\mathbf{P}_n^{-1}\mathbf{a}}. \tag{8} $$

It can easily be shown that the output ITF of the $i$-th interferer of the BMVDR spatial filter is given by [3], [12]

$$ \mathrm{ITF}_i^{\mathrm{out}} = \frac{a_L}{a_R}, \tag{9} $$

which is the input ITF of the target. Therefore, all interferers sound as if they come from the target direction after applying the BMVDR spatial filter. The BMVDR ITF error of the $i$-th interferer is given by [12]

$$ \mathrm{ITF}_i^{e,\mathrm{BMVDR}} = \left|\frac{a_L}{a_R} - \frac{b_{iL}}{b_{iR}}\right|. \tag{10} $$
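The collapse of the interferer cues under the BMVDR can be checked numerically. The following is a minimal sketch, assuming a synthetic single-bin scene with random ATF vectors and a noise matrix made of one interferer plus white noise; the function names `bmvdr_filters` and `itf_error` are illustrative and not taken from the paper. It evaluates the closed-form filters of Eq. (8) and the ITF error of Eq. (5).

```python
import numpy as np

def bmvdr_filters(P_n, a):
    """Closed-form BMVDR filters of Eq. (8) for one frequency bin.

    The reference microphones are the first and last array elements,
    matching y_L = y_1 and y_R = y_M in the text.
    """
    Pinv_a = np.linalg.solve(P_n, a)        # P_n^{-1} a
    denom = np.vdot(a, Pinv_a)              # a^H P_n^{-1} a (vdot conjugates its first argument)
    w_L = Pinv_a * np.conj(a[0]) / denom
    w_R = Pinv_a * np.conj(a[-1]) / denom
    return w_L, w_R

def itf_error(w_L, w_R, b):
    """ITF error of Eq. (5) for a point source with ATF vector b."""
    itf_out = np.vdot(w_L, b) / np.vdot(w_R, b)   # (w_L^H b) / (w_R^H b)
    itf_in = b[0] / b[-1]
    return abs(itf_out - itf_in)

# Synthetic single-bin scene (assumed values, not the paper's data).
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # target ATF vector
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # interferer ATF vector
P_n = np.outer(b, b.conj()) + 0.1 * np.eye(M)              # interferer plus white noise

w_L, w_R = bmvdr_filters(P_n, a)
bmvdr_error = abs(a[0] / a[-1] - b[0] / b[-1])              # Eq. (10)
```

Under these assumptions, `itf_error(w_L, w_R, b)` should match `bmvdr_error`, and `np.vdot(w_L, a)` should equal `a[0]`, which is the distortionless constraint of Eq. (6).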
B. Relaxed Binaural Beamforming

The relaxed binaural beamforming (RBB) optimization problem, introduced in [12], uses additional inequality constraints (compared to the BMVDR problem) to preserve the binaural cues of the interferers. The RBB problem is given by [12]

$$ \hat{\mathbf{w}}_L, \hat{\mathbf{w}}_R = \arg\min_{\mathbf{w}_L,\mathbf{w}_R} \begin{bmatrix}\mathbf{w}_L^H & \mathbf{w}_R^H\end{bmatrix} \tilde{\mathbf{P}} \begin{bmatrix}\mathbf{w}_L \\ \mathbf{w}_R\end{bmatrix} \quad \text{s.t.} \quad \mathbf{w}_L^H\mathbf{a} = a_L, \;\; \mathbf{w}_R^H\mathbf{a} = a_R, \quad \left|\frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i} - \frac{b_{iL}}{b_{iR}}\right| \leq E_i, \;\; i = 1,\cdots,m \leq r, \tag{11} $$

where

$$ E_i = c_i\,\mathrm{ITF}_i^{e,\mathrm{BMVDR}}, \qquad 0 \leq c_i \leq 1. $$

Note that $E_i$ is $c_i$ times the ITF error of the $i$-th interferer of the BMVDR BF [12]. Recall that the BMVDR causes full collapse of the binaural cues of the interferers towards the binaural cues of the target. Therefore, the inequality constraints in Eq. (11) control the percentage of collapse. A small $c_i$ implies good preservation of the binaural cues of the $i$-th interferer, but a smaller feasibility set and, thus, less noise reduction. On the other hand, a large $c_i$ implies worse binaural-cue preservation, but more noise reduction.

It is clear from the above that the additional inequality constraints of the RBB problem require knowledge of the (relative) ATF vectors of the interferers. In practice, the interferers' (R)ATF vectors are unknown and estimation is required. Several methods for estimating RATF vectors exist (see e.g., [14] for an overview). An alternative approach is to use pre-determined anechoic (R)ATF vectors of fixed azimuths around the head of the user, as proposed in [15]. These pre-determined (R)ATF vectors are independent of the acoustic scene and need to be obtained only once for each user. This is useful when the (R)ATF vectors of the interferers are difficult to estimate because, e.g., the locations of the interferers relative to the head of the user are non-static. It is worth noting that by using pre-determined (R)ATF vectors, a larger number of inequality constraints, $m > r$, is typically used in Eq. (11). This is because we do not know where the interferers are located and we would like to cover the entire space around the head of the user.

If $c_i > 0$, $i = 1,\cdots,m$, the inequality constraints of the optimization problem in Eq. (11) are non-convex. As a result, the optimization problem in Eq. (11) is non-convex. In [12], a suboptimal successive convex optimization (SCO) method, described in Section III-C, was proposed to approximately solve the RBB problem.

C. Successive Convex Optimization method

The successive convex optimization (SCO) method [12] approximately solves the RBB problem by solving multiple convex optimization problems per frequency bin. The SCO method converges when all constraints of the RBB problem in Eq. (11) are satisfied. It has been shown that the SCO method always converges to a solution satisfying the constraints of the RBB problem if $m \leq 2M - 3$. This means that if the (R)ATF vectors of the interferers have been estimated accurately enough, the SCO method will guarantee a user-controlled upper-bounded ITF error of the interferers [12]. For $m > 2M - 3$, no guarantees exist for convergence. In case the method does not converge, it stops after solving a pre-defined maximum number of convex optimization problems, $k_{\max}$. Nevertheless, for a reasonable number of inequality constraints, $m$, it has been experimentally shown that the SCO method always converges [12], [15]. A disadvantage of the SCO method, shown experimentally in [12], is that for larger $c_i$ values it converges to solutions further away from the boundary of the inequality constraints of the RBB problem. This results in better binaural-cue preservation and less noise reduction than the trade-off the user expects to set through the parameters $c_i$, $i = 1,\cdots,m$.

IV. PROPOSED CONVEX APPROXIMATION METHOD

The proposed method is a semi-definite convex relaxation (SDCR) of the optimization problem in Eq. (11). First, we review two important properties that will be useful for understanding the proposed optimization problem.

Property 1: Any quadratic expression can be expressed as [16]

$$ \mathbf{q}^H\mathbf{Z}\mathbf{q} = \mathrm{tr}\left(\mathbf{q}^H\mathbf{Z}\mathbf{q}\right) = \mathrm{tr}\left(\mathbf{q}\mathbf{q}^H\mathbf{Z}\right). \tag{12} $$

Property 2: We have the following equivalence relation [17]

$$ \mathbf{Z} = \begin{bmatrix}\mathbf{A} & \mathbf{B} \\ \mathbf{B}^H & \mathbf{C}\end{bmatrix} \succeq 0 \;\Leftrightarrow\; \mathbf{A} \succeq 0, \;\; \left(\mathbf{I} - \mathbf{A}\mathbf{A}^{\dagger}\right)\mathbf{B} = \mathbf{0}, \;\; \mathbf{S}_1 \succeq 0 \tag{13} $$

$$ \Leftrightarrow\; \mathbf{C} \succeq 0, \;\; \left(\mathbf{I} - \mathbf{C}\mathbf{C}^{\dagger}\right)\mathbf{B}^H = \mathbf{0}, \;\; \mathbf{S}_2 \succeq 0, \tag{14} $$

with $\mathbf{S}_1 = \mathbf{C} - \mathbf{B}^H\mathbf{A}^{\dagger}\mathbf{B}$ the generalized Schur complement of $\mathbf{A}$ in $\mathbf{Z}$, $\mathbf{S}_2 = \mathbf{A} - \mathbf{B}\mathbf{C}^{\dagger}\mathbf{B}^H$ the generalized Schur complement of $\mathbf{C}$ in $\mathbf{Z}$, and $\mathbf{A}^{\dagger}$ the pseudo-inverse of $\mathbf{A}$ [18].

Before we present the proposed convex optimization problem, we first introduce an optimization problem equivalent to the problem in Eq. (11). That is,

$$ \hat{\mathbf{w}}_L, \hat{\mathbf{w}}_R = \arg\min_{\mathbf{w}_L,\mathbf{w}_R} \begin{bmatrix}\mathbf{w}_L^H & \mathbf{w}_R^H\end{bmatrix} \tilde{\mathbf{P}} \begin{bmatrix}\mathbf{w}_L \\ \mathbf{w}_R\end{bmatrix} \quad \text{s.t.} \quad \mathbf{w}_L^H\mathbf{a} = a_L, \;\; \mathbf{w}_R^H\mathbf{a} = a_R, \quad \left|\frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i} - \frac{b_{iL}}{b_{iR}}\right|^2 \leq E_i^2, \;\; i = 1,\cdots,m \leq r. \tag{15} $$

By reformulating the inequality in Eq. (15), we obtain an equivalent quadratic constraint given by

$$ \left|\frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i} - \frac{b_{iL}}{b_{iR}}\right|^2 \leq E_i^2 \;\Rightarrow\; \underbrace{\begin{bmatrix}\mathbf{w}_L^H & \mathbf{w}_R^H\end{bmatrix}}_{\mathbf{w}^H} \underbrace{\begin{bmatrix}\mathbf{A}_i & \mathbf{B}_i \\ \mathbf{B}_i^H & \mathbf{C}_i\end{bmatrix}}_{\mathbf{M}_i} \underbrace{\begin{bmatrix}\mathbf{w}_L \\ \mathbf{w}_R\end{bmatrix}}_{\mathbf{w}} \leq 0, \tag{16} $$

where $\mathbf{A}_i = |b_{iR}|^2\mathbf{b}_i\mathbf{b}_i^H$, $\mathbf{B}_i = -b_{iR}b_{iL}^*\mathbf{b}_i\mathbf{b}_i^H$, and $\mathbf{C}_i = \left(|b_{iL}|^2 - |b_{iR}|^2E_i^2\right)\mathbf{b}_i\mathbf{b}_i^H$. Therefore, the optimization problem in Eq. (15) can be re-written as

$$ \hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \mathbf{w}^H\tilde{\mathbf{P}}\mathbf{w} \quad \text{s.t.} \quad \begin{bmatrix}\mathbf{a} & \mathbf{0} \\ \mathbf{0} & \mathbf{a}\end{bmatrix}^H\mathbf{w} = \begin{bmatrix}a_L^* \\ a_R^*\end{bmatrix}, \quad \mathbf{w}^H\mathbf{M}_i\mathbf{w} \leq 0, \;\; i = 1,\cdots,m. \tag{17} $$

The matrix $\mathbf{M}_i$ is not positive semi-definite and, therefore, the quadratic inequality constraint is not convex and, hence, the optimization problem in Eq. (17) is not convex. The proof that $\mathbf{M}_i$ is not positive semi-definite uses Property 2. Specifically, note that $\mathbf{A}_i \succeq 0$, but $\mathbf{S}_1 = -|b_{iR}|^2E_i^2\mathbf{b}_i\mathbf{b}_i^H \preceq 0$, because $\mathbf{b}_i\mathbf{b}_i^H \succeq 0$ and $-|b_{iR}|^2E_i^2 \leq 0$ and, therefore, $\mathbf{M}_i$ is not positive semi-definite.
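The optimization problem in Eq. (17) is a non-convex quadratically constrained quadratic program (QCQP) [17], [19]. Following the methodology described in [19], we use Property 1 to re-write the optimization problem in Eq. (17) into the following equivalent formulation:

$$ \hat{\mathbf{w}}, \hat{\mathbf{W}} = \arg\min_{\mathbf{w},\mathbf{W}} \mathrm{tr}\left(\mathbf{W}\tilde{\mathbf{P}}\right) \quad \text{s.t.} \quad \begin{bmatrix}\mathbf{a} & \mathbf{0} \\ \mathbf{0} & \mathbf{a}\end{bmatrix}^H\mathbf{w} = \begin{bmatrix}a_L^* \\ a_R^*\end{bmatrix}, \quad \mathrm{tr}\left(\mathbf{W}\mathbf{M}_i\right) \leq 0, \;\; i = 1,\cdots,m, \quad \mathbf{W} = \mathbf{w}\mathbf{w}^H. \tag{18} $$

The optimization problem in Eq. (18) is still not convex, but it differs from the problem in Eq. (17) in two ways: the trace inequality is convex, whereas the new equality constraint, $\mathbf{W} = \mathbf{w}\mathbf{w}^H$, is not convex. Following [19], we apply the SDCR to the non-convex equality constraint of the problem in Eq. (18) and obtain the convex optimization problem given by

$$ \hat{\mathbf{w}}, \hat{\mathbf{W}} = \arg\min_{\mathbf{w},\mathbf{W}} \mathrm{tr}\left(\mathbf{W}\tilde{\mathbf{P}}\right) \quad \text{s.t.} \quad \begin{bmatrix}\mathbf{a} & \mathbf{0} \\ \mathbf{0} & \mathbf{a}\end{bmatrix}^H\mathbf{w} = \begin{bmatrix}a_L^* \\ a_R^*\end{bmatrix}, \quad \mathrm{tr}\left(\mathbf{W}\mathbf{M}_i\right) \leq 0, \;\; i = 1,\cdots,m, \quad \mathbf{W} \succeq \mathbf{w}\mathbf{w}^H. \tag{19} $$

Using Property 2, the inequality constraint $\mathbf{W} \succeq \mathbf{w}\mathbf{w}^H$ can be re-written as a linear matrix inequality, and the optimization problem in Eq. (19) can be re-written into a standard-form semi-definite program [19]. That is,

$$ \hat{\mathbf{w}}, \hat{\mathbf{W}} = \arg\min_{\mathbf{w},\mathbf{W}} \mathrm{tr}\left(\mathbf{W}\tilde{\mathbf{P}}\right) \quad \text{s.t.} \quad \begin{bmatrix}\mathbf{a} & \mathbf{0} \\ \mathbf{0} & \mathbf{a}\end{bmatrix}^H\mathbf{w} = \begin{bmatrix}a_L^* \\ a_R^*\end{bmatrix}, \quad \mathrm{tr}\left(\mathbf{W}\mathbf{M}_i\right) \leq 0, \;\; i = 1,\cdots,m, \quad \begin{bmatrix}\mathbf{W} & \mathbf{w} \\ \mathbf{w}^H & 1\end{bmatrix} \succeq 0. \tag{20} $$

This is a convex optimization problem, which can be solved efficiently [19]. If the solutions are on the boundary, i.e., $\hat{\mathbf{W}} = \hat{\mathbf{w}}\hat{\mathbf{w}}^H$, the minimizer, $\hat{\mathbf{w}}$, of the problem in Eq. (20) is also the minimizer of the non-convex RBB problem. This means that, in the case of $\hat{\mathbf{W}} = \hat{\mathbf{w}}\hat{\mathbf{w}}^H$, the proposed problem in Eq. (20) is optimal. Moreover, in this case, the inequalities of the problems in Eqs. (17), (15) and (11) are satisfied. Otherwise, if $\hat{\mathbf{W}} \neq \hat{\mathbf{w}}\hat{\mathbf{w}}^H$, the solution of the problem in Eq. (20) may or may not satisfy the inequalities of the RBB, which means that we lose the guarantee of a user-controlled upper-bounded ITF error when the (R)ATF vectors of the interferers have been estimated accurately enough. In our experience, $\hat{\mathbf{W}} = \hat{\mathbf{w}}\hat{\mathbf{w}}^H$ almost never happens in practice. Nevertheless, we will experimentally show in Section V that the SDCR method always stays relatively close to the boundary of the inequality constraints of the RBB problem. Finally, the main advantage of the proposed SDCR method is that it significantly reduces the computational complexity, since a single convex optimization problem is solved instead of the multiple convex optimization problems that must be solved in the SCO method.

A. Proposed Hybrid Method

In this section, we propose a hybrid method, which combines the SDCR and the SCO methods into a single method. If the (R)ATF vectors of the interferers are estimated accurately enough, the hybrid method guarantees user-controlled upper-bounded binaural-cue distortions of the interferers as in the first version of the SCO method. Moreover, the proposed hybrid method is significantly faster than the SCO method and slightly slower than the SDCR method. We will experimentally show in Section V that the proposed hybrid method achieves solutions closer to the boundary of the inequality constraints of the RBB problem compared to the SCO method, while at the same time achieving more noise suppression.

For a particular frequency bin, the hybrid method first solves the SDCR problem and then checks whether the inequality constraints of Eq. (11) are satisfied. If all of them are satisfied, the SDCR solution is used as the approximate solution of the RBB problem. Otherwise, the SCO method is used to approximately solve the RBB problem in this particular frequency bin. In this way, there is a guarantee that we will always have an optimal solution which satisfies the constraints of the RBB problem, while at the same time reducing the overall computational complexity significantly. In order to avoid switching to the SCO method for ITF errors that are only negligibly larger than the user-controlled upper bounds $E_i$, we use the following switching criterion:

$$ \left|\frac{\mathbf{w}_L^H\mathbf{b}_i}{\mathbf{w}_R^H\mathbf{b}_i} - \frac{b_{iL}}{b_{iR}}\right| \leq \tilde{E}_i, \quad i = 1,\cdots,m, \tag{21} $$

where $\tilde{E}_i$ is a slightly increased upper bound and is given by

$$ \tilde{E}_i = (c_i + \epsilon)\left|\frac{a_L}{a_R} - \frac{b_{iL}}{b_{iR}}\right|, \quad i = 1,\cdots,m, \tag{22} $$

where $\epsilon$ is very small, e.g., $0 < \epsilon < 0.1$. This modification avoids possible switching to the SCO method for ITF errors that are negligibly larger than the $E_i$. The hybrid method is summarized in Algorithm 1.

Algorithm 1: Hybrid scheme
  ŵ ← SDCR problem in Eq. (20)
  if ŵ satisfies Eq. (21) then
    return ŵ
  else
    ŵ ← SCO method [12]
    return ŵ
  end if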
V. EXPERIMENTS

We conducted two different sets of experiments. The first examines the performance difference between the SCO method [12] (with $k_{\max} = 50$), the proposed SDCR method, and the proposed hybrid method (with $\epsilon = 0.05$), when the true RATF vectors of the interferers are used. The reason for that is to show the theoretical trade-off between noise reduction and binaural-cue preservation. The second experiment examines the performance of the same methods when the pre-determined RATF vectors are used for preserving the binaural cues of the interferers. Note that in both sets of experiments, we used the true RATF vector of the target source. We used the CVX toolbox [20] to solve the convex optimization problems associated with the SCO, SDCR and hybrid methods. The CVX toolbox uses an interior point method to solve the convex optimization problems [17]. In all methods that approximately solve the RBB problem, we used a common $c$ value for all interferers in the inequality constraints, i.e., $c_i = c, \forall i$. We also included the BMVDR BF as a reference method in the comparisons. The noise cross-power spectral density matrix was estimated using 5 seconds of a noise-only segment, where all interferers are active, but the target source is inactive. The spatial filters of all methods were estimated only once using the same estimated noise cross-power spectral density matrix and, thus, they are time invariant.

Note that for the pre-determined RATF vectors, we used the RATF vectors of 24 pre-determined anechoic head impulse responses from the database in [21]. The pre-determined RATF vectors are associated with azimuths uniformly spaced around the head with a resolution of 360/24 = 15 degrees, starting from −90 degrees. Please note that the pre-determined RATF vector at 0 degrees was omitted from the constraints, because it was in the same direction as the RATF vector of the target.

A. Acoustic Scene Setup

The acoustic scene that we used consists of one target female talker in the look direction (i.e., 0 degrees), and 4 interferers, each of which has the same average power at its original location as the target signal at its original location. The first interferer is a male talker on the right-hand side of the HA user with azimuth of 80 degrees; the second interferer is a music signal on the right-hand side of the HA user with azimuth of 50 degrees; the third interferer is a vacuum cleaner on the left-hand side of the HA user with azimuth −35 degrees; and the fourth interferer is a ringing mobile phone on the left-hand side with azimuth −70 degrees. Note that the RATF vectors of all interferers have an azimuth mismatch with the pre-determined RATF vectors' azimuths. The microphone self-noise is set to have a 40 dB SNR at the left reference microphone, and it has the same power in all microphones.

B. Hearing-Aid Setup and Processing

The total number of microphones is $M = 4$; two at each HA. The sampling frequency is 16 kHz. The microphone signals were constructed using the head impulse responses from the reverberant office environment from the database in [21]. We used the overlap-and-add processing method [22] for analyzing and synthesizing our signals. The analysis and synthesis windows are square-root Hanning windows and the overlap is 50%. The frame length is 10 ms, i.e., 160 samples, and the FFT size is 256.

C. Evaluation Methodology

We measure the noise-reduction performance in terms of the segmental signal-to-noise-ratio (SSNR) only in target-presence time regions. We used an ideal activity detector to find these time regions. We also predict intelligibility with the STOI measure [23].

We measure binaural-cue distortions with instrumental measures and a listening test. The instrumental measures are the average ITF error, interaural level difference (ILD) error and interaural phase difference (IPD) error per interferer. These averages are calculated only over frequency, since we have fixed BFs over time. Note that, for the IPD error, we averaged only the frequency bins in the range of 0 to 1.5 kHz, while for the ILD error, we averaged only the frequency bins in the range of 3 to 8 kHz. The reason for this choice is that the ILDs are perceptually more important for localization above 3 kHz, while the IPDs are perceptually more important for localization below 1.5 kHz [24]. Note that we used the expressions from [13] for computing the ILD and IPD errors for a single frequency bin. We do not measure the binaural-cue distortions of the target, because all methods achieve perfect preservation of the binaural cues of the target, since i) there are no estimation errors on the RATF vector of the target signal used in the associated optimization problems and ii) the response of the binaural spatial filter with respect to the target at the two reference microphones is distortionless.
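The band-limited averaging of the IPD and ILD errors can be illustrated as follows. This is a minimal sketch assuming the 16 kHz sampling rate and FFT size of 256 stated in the setup, inclusive band edges (an assumption; the text does not say how the edges are handled), and a placeholder per-bin error curve rather than measured errors. The name `band_average` is illustrative.

```python
import numpy as np

fs = 16000        # sampling frequency in Hz
n_fft = 256       # FFT size

# Centre frequencies of the non-negative FFT bins: 0, 62.5, ..., 8000 Hz.
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft

# Band edges are treated as inclusive here (assumption).
ipd_band = freqs <= 1500.0
ild_band = (freqs >= 3000.0) & (freqs <= 8000.0)

def band_average(per_bin_error, band):
    """Average a per-bin error over the bins selected by a boolean mask."""
    return float(np.mean(np.asarray(per_bin_error, dtype=float)[band]))

# Placeholder per-bin error curve, for illustration only.
example_error = np.linspace(0.0, 1.0, freqs.size)
avg_ipd_error = band_average(example_error, ipd_band)
avg_ild_error = band_average(example_error, ild_band)
```

With a bin spacing of 62.5 Hz, the masks select the low-frequency bins for the IPD average and the high-frequency bins for the ILD average, so each error is summarized only where the corresponding cue matters most for localization.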
The listening test is performed using the methodology described in [6], and examines the performance of the compared methods only in the case of the pre-determined RATF vectors. Ten normal-hearing subjects participated, excluding the authors. They were asked to determine the azimuths of all point sources in the acoustic scene when listening to signals processed by the compared methods as well as the unprocessed scene. The tested $c$ values were 0.3 and 0.7 for the SCO, SDCR and hybrid methods. In addition to listening to the noisy and processed signals, the subjects also listened to the clean unprocessed point sources in isolation, in order to determine the reference azimuths of the point sources. The localization errors were calculated with respect to the reference (and not the true) azimuths as in [6]. This is because we used only one set of head impulse responses from [21] to construct the binaural signals, which means that every subject will have a different reference azimuth. In this way, a significant estimation bias was removed. Two repetitions of the listening test were conducted. The reference azimuth of each source and every subject was computed as the average between the two repetitions, and the error was computed with respect to this averaged reference azimuth. The localization errors of the sources were averaged over subjects and repetitions. A t-test was used in order to determine whether the methods result in statistically significantly different perceived source locations.

We also measured the complexity of the compared methods in terms of the number of convex optimization problems that they needed to solve for all frequency bins in total. Note that the BFs are fixed over time and, therefore, we do not measure varying complexity over time.

D. Discussion of Results with True RATF Vectors

In this section, the compared methods use the true RATF vectors of the sources in the constraints. Fig. 1 depicts the noise reduction performance and intelligibility prediction of the unprocessed scene, and of the SCO, SDCR, hybrid and BMVDR methods at both reference microphones. As expected, the BMVDR achieves the best noise reduction performance and predicted intelligibility. It is clear that all other methods achieve similar performances for the left reference microphone, while for the right reference microphone the SCO method achieves the worst noise reduction performance among all. Moreover, as expected, as $c$ increases, the noise reduction and STOI value increase for all methods. Note that the SDCR method has almost identical performance to the hybrid method. This is because, in this example, the hybrid method switched to the SCO method only a few times.

Fig. 1: Noise reduction and intelligibility prediction performances when the true RATF vectors of the interferers are used in the SCO, SDCR and hybrid methods. (Panels: SSNR (dB) and STOI versus $c$; curves: SCO, SDCR, Hybrid, BMVDR, Unprocessed.)

Fig. 2 shows the binaural-cue distortions of the compared methods per interfering source. As expected, the largest binaural-cue distortions are obtained with the BMVDR BF, while all other methods achieve smaller binaural-cue distortions. As expected, as $c$ increases, the binaural-cue distortions increase. Note that for the ITF errors, we also display $c$ times the ITF error of the BMVDR (which is labeled as ITF upper bound) in order to visualize how close the estimated spatial filters are to the boundary of the inequality constraints of the RBB problem. It is clear that both the SDCR and hybrid methods are closer to the boundary of the inequality constraints than the SCO method. Moreover, the hybrid method is for all $c$ values (on average) below the boundary, even though we used the extended switching criterion in Eq. (21). On the other hand, the ITF error of the SDCR method is sometimes (see Interferers 1 and 2) slightly above the boundary. As explained in Section IV, this is because the SDCR method does not guarantee a user-controlled upper-bounded ITF error as the SCO or the hybrid methods do. Note also that, as expected, the SCO method for e.g., $c = 0.8, 0.9$ is not close to the boundary, while the SDCR and hybrid methods are closer to the boundary.

Fig. 2: Binaural-cue distortions (averaged over frequency) of interferers when the true RATF vectors of the interferers are used in the SCO, SDCR and hybrid methods. (Panels: Interferer 1 to Interferer 4, versus $c$; curves: SCO, SDCR, Hybrid, BMVDR, ITF upper bound.)

Fig. 3 shows the computational complexity of the compared methods in terms of the number of convex optimization problems that had to be solved for convergence. The SDCR method needs to solve far fewer convex problems than the SCO method (especially at larger $c$ values) and slightly fewer than the hybrid method. The hybrid method again needs to solve far fewer convex problems than the SCO method, especially at larger $c$ values.

We can conclude from the above that, in most cases, the theoretical performance (i.e., when the true RATF vectors are used) of both proposed methods is better than that of the SCO method. Specifically, both proposed methods provide solutions that are closer to the expected solutions of the RBB problem, since both proposed methods are closer to the boundary. This means that both methods provide a more user-controlled trade-off between noise reduction and binaural-cue preservation than the SCO method, especially at large $c$ values. Finally, both proposed methods are significantly less computationally demanding than the SCO method.

E. Discussion of Results with Pre-Determined RATF Vectors

In this section, the compared methods use the pre-determined RATF vectors. Fig. 4 shows the noise reduction performance and intelligibility prediction of the compared methods. Here the gap in performance between the proposed methods and the SCO method is bigger compared to the case where the true RATF vectors were used. The proposed methods (especially the SDCR method) significantly improved both noise reduction and predicted intelligibility at both reference microphones. The reason why the performance gap

Nevertheless, we will see later on in the t-test of the listening test that the compared methods are not significantly different for the same $c$ values.

In Fig. 6, we show the computational complexities of the compared methods. Again the SDCR method needs to solve fewer convex problems than the SCO method, but the hybrid method does not have a huge computational advantage over the SCO method in this case. However, the usage of the Fig.
3: Computational complexity measured as the number of hybrid method using pre-determined (R)ATF vectors is not solved convex optimization problems (in all frequency bins) critical, since anyway no method can guarantee user-controlled when the true RATF vectors of the interferers are used in the upper-bounded ITF error of the interferers, unless the number SCO, SDCR and hybrid methods. of pre-determined RATF vectors is huge. This of course is not practical since it may result in non-feasible solutions and/or the noise reduction will be negligible due to the large number of constraints. between the SDCR method and the hybrid method is increased Fig. 7 shows the results of the subjective localization test. compared to the case where the true RATF vectors were A similar behavior as with the instrumental binaural-cue dis- used is because the hybrid method switched many more times tortion measures is observed here. The only difference appears to the SCO method (see Algorithm 1) here. In conclusion, for the ringing mobile phone, where for c = 0.7 all methods both proposed methods achieve in most cases a better noise achieve slightly worse performance than the BMVDR. Several reduction and predicted intelligibility than the SCO method, users also reported difﬁculty in localizing the ringing phone especially for larger c values. after completing the test. We believe that this is because of the high frequency content of the ringing tone of the mobile phone Fig. 5 shows the binaural-cue distortions of the com- and only the ILDs might have been used for localization. pared methods per interfering source. As expected, when pre- determined RATF vectors are used, all methods do not guaran- Table I shows the results of the t-test, which was done by tee user-controlled upper-bounded ITF error of the interferers. gathering all localization errors of all sources. The signiﬁcance Therefore, all methods, in many occasions (see interferers 3 level was set to 5%. 
It is clear that the SCO, SDCR and and 4), result in a larger ITF error than the average ITF upper hybrid methods are all not signiﬁcantly different for the bound of the RBB problem when computed using the true same c value. This means that even though we observed less RATFs of the interferers. The SCO method has the lowest binaural-cue distortions in the SCO method in Figs 2 and 5, binaural-cue distortions compared to the compared methods. compared to the proposed methods for the same c value, these ILD error (dB) IPD error ITF error # solved problems 8 SCO SDCR Hybrid BMVDR Unprocessed -10 -10 0.5 0.5 0.45 0.45 -15 -15 0.4 0.4 -20 -20 0.35 0.35 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 c c c c Fig. 4: Noise reduction and intelligiblity prediction performances when the pre-determined RATF vectors of the interferers are used in the SCO, SDCR and hybrid methods. SCO SDCR Hybrid BMVDR ITF upper bound Interferer 1 Interferer 2 Interferer 3 Interferer 4 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.5 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0 0 -10 -10 -20 -20 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 c c c c Fig. 5: Binaural-cue distortions (averaged over frequency) of interferers when the pre-determined RATF vectors of the interferers are used in the SCO, SDCR and hybrid methods. predicted intelligibility compared to the SCO method. Thus, SCO SDCR Hybrid the proposed methods provide a better perceptual trade-off compared to the SCO method. Finally, note that the SCO, SDCR and hybrid methods are not statistically signiﬁcantly different from the unprocessed scene for c = 0.3. This means that in all three methods the subjects managed (on average) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 to localize as good as in the unprocessed scene. However, unlike the unprocessed scene, all three methods improved Fig. 
6: Computational complexity measured as the number of noise reduction and predicted intelligibility. solved convex optimization problems (in all frequency bins) when the pre-determined RATF vectors of the interferers are VI. CONCLUSION used in the SCO, SDCR and hybrid methods. We proposed two new suboptimal methods for approxi- mately solving the non-convex relaxed binaural beamforming (RBB) optimization problem. Both methods are signiﬁcantly differences are not perceptually important. However, recall that computationally less demanding compared to the existing the proposed methods achieve a better noise reduction and successive convex optimization (SCO) method. For each fre- ILD error (dB) IPD error ITF error SSNR (dB) # solved problems SSNR (dB) STOI STOI R 9 TABLE I: T-test: + denotes signiﬁcantly different (i.e., the female talker (target) null hypothesis is rejected at 5% signiﬁcance level), while ◦ mean median denotes not signiﬁcantly different. SCO SCO SDCR SDCR Hybrid Hybrid Method BMVDR c=0.3 c=0.7 c=0.3 c=0.7 c=0.3 c=0.7 male talker BMVDR ◦ + + + + + + SCO + ◦ + ◦ + ◦ + c=0.3 SCO + + ◦ + ◦ + ◦ ringing phone c=0.7 SDCR + ◦ + ◦ + ◦ + c=0.3 SDCR 0 + + ◦ + ◦ + ◦ c=0.7 music Hybrid + ◦ + ◦ + ◦ + c=0.3 50 Hybrid + + ◦ + ◦ + ◦ c=0.7 Unpro- + ◦ + ◦ + ◦ + cessed vacuum cleaner [3] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, “Theoretical analysis of binaural transfer function MVDR beamformers with interference cue preservation constraints,” IEEE Trans. Audio, Speech, Language all sources Process., vol. 23, no. 12, pp. 2449–2464, Dec. 2015. [4] A. W. Bronkhorst, “The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions,” Acta Acoustica, vol. 86, no. 1, pp. 117–128, 2000. [5] D. Marquardt, “Development and evaluation of psychoacoustically mo- BMVDR SCO SCO SDCR SDCR hybrid hybrid unpr. tivated binaural noise reduction and cue preservation techniques,” Ph.D. 
c=0.3 c=0.7 c=0.3 c=0.7 c=0.3 c=0.7 dissertation, Carl von Ossietzky Universitat ¨ Oldenburg, 2015. [6] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, S. van de Par, J. Jensen, Fig. 7: Localization test comparing the SCO, SDCR and hybrid and M. Guo, “Evaluation of binaural noise reduction methods in terms of intelligibility and perceived localization,” in submitted to EUSIPCO, methods with respect to the localization error in degrees. [7] J. G. Desloge, W. M. Rabinowitz, and P. M. Zurek, “Microphone-array hearing aids with binaural output .I. Fixed-processing systems,” IEEE Trans. Speech Audio Process., vol. 5, no. 6, pp. 529–542, Nov. 1997. [8] D. P. Welker, J. E. Greenberg, J. G. Desloge, and P. M. Zurek, quency bin, the SCO method requires to solve multiple con- “Microphone-array hearing aids with binaural output .II. A two- vex optimization problems in order to converge. In contrast, microphone adaptive system,” IEEE Trans. Speech Audio Process., vol. 5, no. 6, pp. 543–551, Nov. 1997. the ﬁrst proposed method, which is a semi-deﬁnite convex [9] T. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, “Binaural relaxation (SDCR) of the RBB problem, solves only one noise reduction algorithms for hearing aids that preserve interaural time convex optimization problem per frequency bin. Apart from delay cues,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1579–1585, Apr. 2007. the computational advantage, the SDCR method also achieves [10] A. I. Koutrouvelis, R. C. Hendriks, J. Jensen, and R. Heusdens, “Im- in most cases a better trade-off between intelligibility and proved multi-microphone noise reduction preserving binaural cues,” in binaural-cue preservation than the SCO method. However, IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2016. [11] E. Hadad, S. Doclo, and S. Gannot, “The binaural LCMV beamformer the SDCR method does not guarantee user-controlled upper and its performance analysis,” IEEE Trans. 
Audio, Speech, Language bounded ITF error when the RATF vectors of the interferers Process., vol. 24, no. 3, pp. 543–558, Jan. 2016. are estimated accurately enough. This problem is solved by [12] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, and J. Jensen, “Relaxed binaural LCMV beamforming,” IEEE Trans. Audio, Speech, Language the second proposed method, which is a hybrid combination Process., vol. 25, no. 1, pp. 137–152, Jan. 2017. of the SDCR and SCO methods. This method guarantees user- [13] B. Cornelis, S. Doclo, T. Van den Bogaert, M. Moonen, and J. Wouters, controlled upper-bounded ITF error, and at the same time is “Theoretical analysis of binaural multimicrophone noise reduction tech- niques,” IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 2, computationally much less demanding than the SCO method. pp. 342–355, Feb. 2010. Finally, listening tests showed that all three methods achieve [14] S. Gannot, E. Vincet, S. Markovich-Golan, and A. Ozerov, A “ consoli- the same localization errors for the same amount of relaxation. dated perspective on multi-microphone speech enhancement and source separation,” IEEE Trans. Audio, Speech, Language Process., vol. 25, no. 4, pp. 692–730, April 2017. [15] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, J. Jensen, and M. Guo, “Binaural beamforming using pre-determined relative acoustic transfer REFERENCES functions,” in EURASIP Europ. Signal Process. Conf. (EUSIPCO), Aug. [1] S. Doclo, W. Kellermann, S. Makino, and S. Nordholm, “Multichannel 2017. signal enhancement algorithms for assisted listening devices,” IEEE [16] H. Anton, Elementary linear algebra. John Wiley & Sons, 2010. Signal Process. Mag., vol. 32, no. 2, pp. 18–30, Mar. 2015. [17] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge [2] J. M. Kates, Digital hearing aids. Plural publishing, 2008. university press, 2004. error (deg.) error (deg.) error (deg.) error (deg.) error (deg.) error (deg.) 10 [18] G. Golub and C. V. 
Loan, Matrix Computations, 3rd ed. Oxford: North Oxford Academic, 1983. [19] L. Vandenberghe and S. Boyd, “Semideﬁnite programming,” SIAM review, vol. 38, no. 1, pp. 49–95, Mar. 1996. [20] “Cvx: Matlab software for disciplined convex programming.” 2008. [21] H. Kayser, S. Ewert, J. Annemuller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, “Database of multichannel in-ear and behind-the-ear head- related and binaural room impulse responses,” EURASIP J. Advances Signal Process., vol. 2009, pp. 1–10, Dec. 2009. [22] J. B. Allen, “Short-term spectral analysis, and modiﬁcation by dis- crete Fourier transform,” IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 3, pp. 235–238, June 1977. [23] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An “ algorithm for intelligibility prediction of time-frequency weighted noisy speech,” IEEE Trans. Audio, Speech, Language Process., vol. 19, no. 7, pp. 2125– 2136, Sep. 2011. [24] W. M. Hartmann, “How we localize sound,” Physics Today, vol. 52, no. 11, pp. 24–29, Nov. 1999.
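As an illustration of the localization-error computation used in the listening test (each subject's reference azimuth is the average over the two repetitions of the clean-source judgments, and errors are taken with respect to that reference rather than the true azimuth), the following is a minimal Python sketch. The function names, the data layout, and the use of the absolute deviation as the error measure are assumptions made here for illustration; the numbers in the usage example are made up and are not results from the test.

```python
from statistics import mean


def reference_azimuths(clean_judgments):
    """Per-source reference azimuth for ONE subject.

    clean_judgments maps a source name to the azimuths (degrees) the subject
    reported for the clean, isolated source, one value per repetition.
    The reference is the average over repetitions.
    """
    return {src: mean(vals) for src, vals in clean_judgments.items()}


def localization_errors(processed_judgments, references):
    """Errors of ONE subject with respect to that subject's own references.

    Assumption: the error is the absolute deviation in degrees.
    """
    return {
        src: [abs(azimuth - references[src]) for azimuth in vals]
        for src, vals in processed_judgments.items()
    }


def average_error(errors_per_subject):
    """Average a list of per-subject error lists over subjects and repetitions."""
    flat = [e for subject_errors in errors_per_subject for e in subject_errors]
    return mean(flat)


# Hypothetical judgments of one subject for the music source (illustrative only).
refs = reference_azimuths({"music": [48.0, 52.0]})
errs = localization_errors({"music": [55.0, 45.0]}, refs)
print(refs["music"], errs["music"])
```

Using each subject's own reference removes the bias that would otherwise come from the single, non-individualized set of head impulse responses; the per-method error lists produced this way are what a significance test would then compare.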
Electrical Engineering and Systems Science – arXiv (Cornell University)
Published: May 4, 2018