I. INTRODUCTION
Signal denoising algorithms are often developed with the objective of minimizing a chosen distortion function that quantifies the distance between an estimate and the ground truth. The most widely used distortion measure in the literature is the mean-squared error (MSE). The ground-truth parameter may be deterministic or stochastic with a known prior distribution. The latter formalism leads to Bayesian estimators. Within the deterministic signal estimation paradigm, which is also the formalism considered in this paper, one typically desires that the estimator be unbiased and have minimum variance (MVU) [Reference Poor1,Reference Kay and Eldar2]. An MVU estimator may not always exist, and when it does, it can be obtained using the theory of sufficient statistics. Kay and Eldar [Reference Kay and Eldar2] showed that, when it comes to minimizing the MSE, biased estimates can outperform the MVU estimate. For example, one could shrink the MVU estimate and optimally select the shrinkage parameter to minimize the MSE.
In this paper, we consider the problem of estimating a deterministic signal corrupted by additive white noise. The noise distribution is assumed to be known, but not restricted to be Gaussian. We propose a new distortion measure based on the probability of error and develop transform-domain shrinkage estimators. The proposed risk criterion leads to shrinkage estimators that are adapted to the noise type, meaning that the shrinkage operators corresponding to different noise distributions having the same variance turn out to be different. This behavior is in contrast with shrinkage estimators obtained using the MSE, wherein one obtains the same shrinkage profile corresponding to different noise distributions having the same variance. Before proceeding with the developments, we review important literature related to the problem at hand.
A) Prior art
The MSE is by far the most widely used measure for obtaining the optimum shrinkage parameter. Since the MSE is a function of the ground truth, directly minimizing it results in a practically unrealizable estimate, in the sense that the estimate depends on the unknown parameter and/or its statistics. However, in some cases, it is possible to find the optimum shrinkage parameter, for example, using a min-max approach [Reference Kay and Eldar2], where the parameter is constrained to a known set. An optimum shrinkage estimator for the case where the variance of the unbiased estimate is a scaled version of the square of the parameter, with a known scaling, is also proposed in [Reference Kay and Eldar2].
Optimum shrinkage estimators have also been computed based on risk estimation, where an unbiased estimate of the MSE, which depends only on the noisy observations, is obtained and subsequently minimized to compute the optimal shrinkage. Under the assumption of Gaussian noise, an unbiased estimate of the MSE, namely Stein's unbiased risk estimator (SURE), was developed based on Stein's lemma [Reference Stein3], and has been successfully employed in numerous denoising applications. The shrinkage estimator of the mean of a multivariate Gaussian distribution with diagonal covariance matrix, obtained by minimizing SURE, dominates the classical least-squares estimate when the observation dimension exceeds three [Reference Eldar4].
A risk minimization approach for denoising using a linear expansion of elementary thresholding functions has been developed in [Reference Luisier, Blu and Unser5–Reference Zheng, Li, Blu and Lee9], wherein the combining weights are chosen optimally to minimize the SURE objective. SURE-optimized wavelet-domain thresholding techniques have been proposed in [Reference Donoho and Johnstone10–Reference Zhang and Desai12]. Atto et al. [Reference Atto, Pastor and Mercier13,Reference Atto, Pastor and Mercier14] have investigated the problem of signal denoising based on optimally selecting the parameters of a wavelet-domain smooth sigmoidal shrinkage function by minimizing the SURE criterion. Ramani et al. [Reference Ramani, Blu and Unser15] developed a Monte Carlo technique to select the parameters of a generic denoising operator based on SURE. An image denoising algorithm based on non-local means (NLM) is proposed in [Reference De Ville and Kocher16], where the NLM parameters are optimized using SURE. Notable denoising algorithms that aim to optimize the SURE objective include wavelet-domain multivariate shrinkage [Reference Hsung, Lun and Ho17], local affine transform for image denoising [Reference Qiu, Wang, Yu and Song18], SURE-optimized blockwise shrinkage for image denoising [Reference Wu, Tracey, Natarajan and P.Noonan19], SURE-optimized Savitzky-Golay filter [Reference Krishnan and Seelamantula20], etc. The SURE approach has also found applications in image deconvolution [Reference Pesquet, Benazza-Benyahia and Chaux21] and compressive sensing [Reference Zheng, Liu, Wanga and Maleki22].
The original formulation of SURE, which assumes independent Gaussian noise, has been extended to certain distributions in the continuous and discrete exponential families in [Reference Hudson23] and [Reference Hwang24], respectively, with the independence assumption left unchanged. Eldar developed a generalized SURE (GSURE) for distributions belonging to the non-i.i.d. multivariate exponential family [Reference Eldar4]. Giryes et al. [Reference Giryes, Elad and Eldar25] used a projected version of GSURE for selecting parameters in the context of solving inverse problems. Luisier et al. [Reference Luisier, Vonesch, Blu and Unser26] proposed an unbiased estimate of the MSE under Poisson noise and minimized it to obtain the optimum wavelet-domain image denoising function. A detailed discussion of Gaussian parameter estimation using shrinkage estimators, together with a performance comparison of SURE with the maximum-likelihood (ML) and soft-thresholding-based estimators, can be found in [Reference Johnstone27] (Chapter 2). It is shown in [Reference Johnstone27] that the soft-thresholding-based estimator dominates the James-Stein shrinkage estimator in terms of the MSE if the parameter vector to be estimated is highly sparse, such as a spike. On the other hand, the shrinkage estimator dominates if the parameter vector has nearly equal entries.
B) This paper
We propose a signal estimation method based on the probability of error (PE) criterion, which we first introduced in [Reference Sadasivan, Mukherjee and Seelamantula28] and demonstrated in application to electrocardiogram (ECG) signal denoising. Recently, in [Reference Sadasivan, Mukherjee and Seelamantula29], we proposed a speech denoising framework based on [Reference Sadasivan, Mukherjee and Seelamantula28]. In this paper, we expand further on the idea and carry out a detailed investigation of various aspects of the PE criterion. In particular, we consider different noise distributions (Laplacian, Student's-t, Gaussian, and a Gaussian mixture model (GMM)), the expected ℓ1 distortion, analysis and comparison of the implicit shrinkage functions, and performance benchmarking against standard denoising techniques. The PE risk quantifies the probability of the estimate falling outside an ε-neighborhood of the true parameter value. Since the PE risk depends on the ground truth, we consider a surrogate, which may be biased, and minimize it to obtain the shrinkage parameter (Section 2). The optimization is carried out in the discrete cosine transform (DCT) domain, either in a pointwise fashion or on a subband basis. The resulting estimator is referred to as the shrinkage estimator based on the minimum probability of error (MPE). We derive the PE risk for Gaussian, Laplacian, and Student's-t noise distributions (Sections 2-2.1 and 2-2.5). In practical applications, where the noise distribution may be multimodal and not known explicitly, we propose to use a GMM approximation [Reference Plataniotis and Hatzinakos30,Reference Sorenson and Alspach31] (Section 2-2.4). We show the performance of the MPE-based denoising technique on the Piece-Regular signal taken from the Wavelab toolbox under Gaussian, Student's-t, and Laplacian noise contaminations (Section 3).
Proceeding further, we also consider the probability of error integrated over 0 < ε < ∞ (Section 4), which results in the expected ℓ1 distortion between the parameter and its estimate. The ℓ1 distance has been used in the Bayesian setting and in regression problems as a robust distortion metric [Reference Poor1,Reference Sovic and Sersic32]. The estimators for the expected ℓ1 distortion are also derived by invoking the GMM approximation (Section 4-4.1). The noise-adaptive shrinkage behavior resulting from the proposed PE and ℓ1 distortion-based risk functions is demonstrated by plotting the shrinkage profile as a function of the parameter value (Section 4-4.3). We also assess the denoising performance of the shrinkage estimator obtained by minimizing the ℓ1 distortion for different input SNRs and for different numbers of noisy realizations (Section 5).
To further boost the denoising performance of the ℓ1 distortion-based estimator, we develop an iterative algorithm to successively refine the cost function and the resulting estimate, starting with the noisy signal as the initialization (Section 5). The iterations lead to an improvement of 2–3 dB in the output signal-to-noise ratio (SNR) (Section 5-5.1).
For performance evaluation, we conduct experiments on the Piece-Regular and the HeaviSine signals from the Wavelab toolbox [33], and ECG signals from the PhysioBank database [34]. We consider three techniques for comparison: (i) wavelet-domain soft-thresholding [Reference Donoho35]; (ii) SURE-based orthonormal wavelet thresholding using a linear expansion of thresholds (SURE-LET) [Reference Luisier, Blu and Unser5]; and (iii) SURE-based smooth sigmoid shrinkage (SS) in wavelet domain [Reference Atto, Pastor and Mercier13], all assuming Gaussian noise contamination (Section 6) in order to facilitate a fair comparison.
II. THE PE RISK
Consider the observation model $\boldsymbol{x} = \boldsymbol{s} + \boldsymbol{w}$, where $\boldsymbol{x} \in \mathbb{R}^n$ and $\boldsymbol{s} \in \mathbb{R}^n$ denote the noisy and clean signals, respectively. The noise vector $\boldsymbol{w}$ is assumed to have i.i.d. entries with zero mean and variance $\sigma^2$. The goal is to estimate $\boldsymbol{s}$ from $\boldsymbol{x}$ by minimizing a suitable cost function. The signal model is considered in an appropriate transform domain, where the signal admits a parsimonious representation, but noise does not. We consider two types of shrinkage estimators: (i) pointwise, where a shrinkage factor $a_i \in [0, 1]$ is applied to $x_i$ in order to obtain the estimate $\widehat{s}_i = a_i x_i$, which means that $\widehat{s}_i$ does not depend on $\widehat{s}_j$ for $i \neq j$; and (ii) subband-based, wherein a single shrinkage factor $a_J$ is applied to a group of coefficients $\{x_i, i \in J\}$ in subband $J \subset \{1, 2, \ldots, n\}$. Shrinkage estimators may also be interpreted as premultiplication of $\boldsymbol{x}$ by a diagonal matrix.
A) PE risk for pointwise shrinkage
Since the estimate of $s_i$ does not depend on $x_j$ for $j \neq i$ in the pointwise shrinkage scenario, we drop the index $i$ for brevity of notation. The PE risk is defined as
$$\mathcal{R} = \mathbb{P}\left(\left\vert \widehat{s}-s \right\vert > \epsilon\right), \quad (1)$$
where ε > 0 is a predefined tolerance parameter. The risk $\mathcal{R}$ quantifies the estimation error using the probability measure and implicitly takes into account the noise distribution beyond the first- and second-order statistics. In contrast, the MSE relies only on the first- and second-order statistics of the noise for shrinkage estimators. Substituting $\widehat{s}=ax=a(s+w)$, the risk $\mathcal{R}$ evaluates to
$$\mathcal{R} = 1 - F\left(\frac{\epsilon-(a-1)s}{a}\right) + F\left(\frac{-\epsilon-(a-1)s}{a}\right),$$
where $F(\cdot)$ is the cumulative distribution function (c.d.f.) of the noise. Since $\mathcal{R}$ depends on s, which is the parameter to be estimated, it is impractical to optimize it directly over a. To circumvent the problem, we minimize an estimate of $\mathcal{R}$, which is obtained by replacing s with x (which is also the ML estimate of s). Such an estimate $\widehat{\mathcal{R}}=\mathcal{R}(x;a)$ takes the form
$$\widehat{\mathcal{R}} = 1 - F\left(\frac{\epsilon-(a-1)x}{a}\right) + F\left(\frac{-\epsilon-(a-1)x}{a}\right), \quad (2)$$
and correspondingly, the optimal shrinkage parameter is obtained as $a_{{\rm opt}} = {\rm arg}\,\underset {0 \leq a \leq 1}{\min }\, \widehat {\mathcal {R}}$. A grid search is performed to optimize $\widehat {\mathcal {R}}$ over a ∈ [0, 1], and the clean signal is obtained as $\widehat {s}=a_{{\rm opt}}x$. We next derive explicit formulae for the risk function for Gaussian, Laplacian, and Student's-t noise distributions.
(i) Gaussian distribution: In this case, the noisy observation x also follows a Gaussian distribution, and therefore, $\widehat{s}-s$ is distributed as $\mathcal{N}((a-1)s,a^2\sigma^2)$. The PE risk estimate is given as
$$\widehat{\mathcal{R}} = Q\left(\frac{\epsilon-(a-1)x}{a\sigma}\right) + Q\left(\frac{\epsilon+(a-1)x}{a\sigma}\right), \quad (3)$$
where $Q(u)=({1}/{\sqrt {2 \pi }})\int _{u}^{\infty }e^{-{t^2}/{2}}\,\mathrm {d}t$.
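As a concrete numerical sketch (ours, not the authors' implementation; the function names and the grid resolution are our choices), the Gaussian risk estimate can be minimized by the grid search over a ∈ [0, 1] described above:

```python
import numpy as np
from math import erfc, sqrt

def Q(u):
    """Gaussian tail probability Q(u) = P(N(0,1) > u)."""
    return 0.5 * erfc(u / sqrt(2.0))

def pe_risk_estimate(a, x, sigma, eps):
    """PE risk estimate for Gaussian noise, with the unknown s replaced by x."""
    if a == 0.0:
        # a = 0 shrinks the estimate to zero, so |s_hat - s| = |x| deterministically
        return 1.0 if abs(x) > eps else 0.0
    return (Q((eps - (a - 1.0) * x) / (a * sigma))
            + Q((eps + (a - 1.0) * x) / (a * sigma)))

def mpe_shrinkage(x, sigma, eps, npts=101):
    """Grid search for a_opt in [0, 1] minimizing the PE risk estimate."""
    grid = np.linspace(0.0, 1.0, npts)
    risks = [pe_risk_estimate(a, x, sigma, eps) for a in grid]
    return grid[int(np.argmin(risks))]
```

A coefficient well above ε is retained almost fully, whereas one inside the ε-neighborhood of zero is shrunk to zero, which already hints at the threshold-like behavior discussed later.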
(ii) Student's-t distribution: Consider the case where the noise follows a Student's-t distribution with parameter λ > 2; the probability density function (p.d.f.) of the noise is given by
$$f(w) = \frac{\Gamma\left(\frac{\lambda+1}{2}\right)}{\sqrt{\lambda\pi}\,\Gamma\left(\frac{\lambda}{2}\right)}\left(1+\frac{w^2}{\lambda}\right)^{-\frac{\lambda+1}{2}}.$$
The variance of $w$ is $\sigma^2 = \lambda/(\lambda-2)$. The expression for $\widehat{\mathcal{R}}$ is the one given in (2) with
$$F(w) = \frac{1}{2} + \frac{w\,\Gamma\left(\frac{\lambda+1}{2}\right)}{\sqrt{\lambda\pi}\,\Gamma\left(\frac{\lambda}{2}\right)}\, G_1\left(\frac{1}{2},\frac{\lambda+1}{2};\frac{3}{2};-\frac{w^2}{\lambda}\right),$$
where $G_1$ is the hypergeometric function defined as
$$G_1(p,q;r;z) = \sum_{k=0}^{\infty} \frac{(p)_k\,(q)_k}{(r)_k}\,\frac{z^k}{k!},$$
and $(q)_k$ denotes the Pochhammer symbol:
$$(q)_k = q(q+1)\cdots(q+k-1), \quad (q)_0 = 1.$$
(iii) Laplacian distribution: Considering the noise to be i.i.d. Laplacian with zero mean and parameter b (variance $\sigma^2 = 2b^2$), with the p.d.f. $f(w) = ({1}/{2b})\exp(-\vert w\vert/b)$, the PE risk can be obtained by using the following expression for $F(w)$ in (2):
$$F(w) = \frac{1}{2} + \frac{1}{2}\,\mathrm{sgn}(w)\left(1 - \exp\left(-\frac{\vert w\vert}{b}\right)\right).$$
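The noise-adaptivity of the risk estimate can be checked numerically by plugging c.d.f.s of equal variance into (2); the sketch below (our code and names, not the paper's) compares the Gaussian and Laplacian risk-estimate curves for the same noisy coefficient:

```python
import numpy as np
from math import erf, exp, sqrt, copysign

def F_gauss(w, sigma=1.0):
    """Zero-mean Gaussian c.d.f. with variance sigma^2."""
    return 0.5 * (1.0 + erf(w / (sigma * sqrt(2.0))))

def F_laplace(w, sigma=1.0):
    """Zero-mean Laplacian c.d.f. with variance sigma^2 (so b = sigma/sqrt(2))."""
    b = sigma / sqrt(2.0)
    return 0.5 + 0.5 * copysign(1.0, w) * (1.0 - exp(-abs(w) / b))

def risk_estimate(a, x, eps, F):
    """Risk estimate in the form 1 - F((eps-(a-1)x)/a) + F((-eps-(a-1)x)/a)."""
    return 1.0 - F((eps - (a - 1.0) * x) / a) + F((-eps - (a - 1.0) * x) / a)

grid = np.linspace(0.01, 1.0, 100)
x, eps = 4.0, 3.0
curve_gauss = np.array([risk_estimate(a, x, eps, F_gauss) for a in grid])
curve_laplace = np.array([risk_estimate(a, x, eps, F_laplace) for a in grid])
```

Although both noise models have unit variance, the two risk curves differ, so the minimizing shrinkage factors generally differ as well; this is the distribution-adaptive behavior that MSE-based shrinkage cannot provide.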
B) Closeness of $\widehat {\mathcal {R}}$ to $\mathcal {R}$
To measure the closeness of $\widehat{\mathcal{R}}$ to $\mathcal{R}$, consider the example of estimating a scalar s = 4 from a noisy observation x. The PE risk estimate $\widehat{\mathcal{R}}$ is obtained by setting s = x. In Figs 1(a), 1(b), and 1(c), we show the variation of the actual risk $\mathcal{R}$ and its estimate $\widehat{\mathcal{R}}$ with a, averaged over 100 independent trials, for Gaussian, Student's-t, and Laplacian noise distributions, respectively. The noise has zero mean, and the variance is taken as $\sigma^2 = 1$ for the Gaussian and Laplacian models, whereas for the Student's-t model, the variance is $\sigma^2 = 2$. The value of ε is set equal to σ while computing the PE risk. We observe that $\widehat{\mathcal{R}}$ is a good approximation to $\mathcal{R}$, particularly in the vicinity of the minima. The deviation of the shrinkage parameter $a_{\rm opt}(x)$, obtained by minimizing $\widehat{\mathcal{R}}$, from its true value $a_{\rm opt}(s)$ resulting from the minimization of $\mathcal{R}$ is shown in Fig. 1(d) for the three noise models under consideration. The central red lines in Fig. 1(d) indicate the medians, whereas the black lines on the top and bottom denote the 25th and 75th percentile points, respectively. We observe that $a_{\rm opt}(x)$ is concentrated around $a_{\rm opt}(s)$, especially for Gaussian and Laplacian noise, barring a few outliers.
C) Perturbation probability of the location of minimum
The location of the minimum of the PE risk determines the shrinkage parameter. Therefore, one must ensure that it does not deviate too much from its actual value, with high probability, when s is replaced by x in the original risk $\mathcal{R}$. Let $a_{{\rm opt}}(s)=\arg \underset {0\leq a \leq 1}{\min } \mathcal {R}(s;a)$ denote the argument that minimizes the true risk $\mathcal {R}$. Consider the probability of deviation given by
$$P_e^{{\rm MPE}} = \mathbb{P}\left(\left\vert a_{{\rm opt}}(x) - a_{{\rm opt}}(s)\right\vert \geq \delta\right), \quad (5)$$
for some δ > 0. Using a first-order Taylor-series approximation of $a_{\rm opt}(x)$ about s, and substituting x = s + w, we obtain $a_{\rm opt}(x) \approx a_{\rm opt}(s) + w\, a'_{\rm opt}(s)$, where ′ denotes the derivative. We would like to point out that the Taylor-series approximation is not rigorously justified here, but it serves as a reasonable guide for our analysis. The deviation probability $P_e^{{\rm MPE}}$ in (5) simplifies to $P_e^{{\rm MPE}}=\mathbb {P}( \left \vert w \right \vert \geq {\delta }/{\left \vert a'_{{\rm opt}}(s) \right \vert } )$. For additive Gaussian noise w with zero mean and variance $\sigma^2$, placing the Chernoff bound [Reference Mitzenmacher and Upfal36,Reference Chang, Cosmon and Milstein37] on $P_e^{{\rm MPE}}$ leads to
$$P_e^{{\rm MPE}} \leq 2\exp\left(-\frac{\delta^2}{2\sigma^2\left(a'_{{\rm opt}}(s)\right)^2}\right).$$
To ensure that $P_e^{{\rm MPE}}$ is less than α, for a given α ∈ (0, 1), it suffices to have
$$2\exp\left(-\frac{\delta^2}{2\sigma^2\left(a'_{{\rm opt}}(s)\right)^2}\right) \leq \alpha, \quad (6)$$
which translates to a lower bound on the input SNR. Since there is no closed-form expression available for $a'_{\rm opt}(s)$ in the context of the PE risk, we empirically obtain the range of input SNR values $s^2/\sigma^2$ for which (6) is satisfied.
Analogously, to satisfy an upper bound on the deviation probability $P_e^{{\rm SURE}}$ of the minimum in the case of SURE, for a given deviation δ > 0, one must ensure that
The proof of (7) is given in Appendix A.
The minimum input SNR required to ensure $P_e \leq \alpha$ for both SURE- and MPE-based shrinkage estimators is shown in Fig. 2, for different values of α and δ. The PE risk estimate is obtained by replacing s with x and setting ε = σ. We observe that reducing the amount of deviation δ for a given probability α, or vice versa, leads to a higher input SNR requirement for both SURE and MPE. We also observe from Fig. 2 that, for given δ and α, SURE requires a higher input SNR than MPE to keep the δ-deviation probability under α. Also, for a given input SNR, the δ-deviation probability of the estimated shrinkage parameter $a_{\rm opt}(x)$ from the optimum $a_{\rm opt}(s)$ is smaller for MPE than for SURE, indicating that the MPE-based shrinkage is comparatively more reliable and robust than the SURE-based one at low input SNRs.
D) Unknown noise distributions
In practical applications, the distribution of noise may not be known in a parametric form and may also be multimodal. At best, one would have access to realizations of the noise, from which the distribution has to be estimated. In such cases, approximation of the noise p.d.f. using a GMM is a viable alternative [Reference Plataniotis and Hatzinakos30,Reference Sorenson and Alspach31], wherein one can estimate the parameters of the GMM using the expectation-maximization algorithm [Reference Redner and Walker38]. Gaussian mixture modeling is attractive as it comes with the guarantee that, asymptotically, as the number of Gaussians increases, the GMM approximation to a p.d.f. that has a finite number of discontinuities converges uniformly except at the points of discontinuity [Reference Sorenson and Alspach31]. The GMM approximation can be used even for non-Gaussian, unimodal distributions. For the GMM-based noise p.d.f.
$$f(w) = \sum_{m=1}^{M} \alpha_m\,\frac{1}{\sqrt{2\pi}\,\sigma_m}\exp\left(-\frac{(w-\theta_m)^2}{2\sigma_m^2}\right), \quad \sum_{m=1}^{M}\alpha_m = 1, \quad (8)$$
the PE risk turns out to be
$$\mathcal{R} = \sum_{m=1}^{M} \alpha_m\left[Q\left(\frac{\epsilon-(a-1)s-a\theta_m}{a\sigma_m}\right) + Q\left(\frac{\epsilon+(a-1)s+a\theta_m}{a\sigma_m}\right)\right], \quad (9)$$
using (3). For illustration, consider the estimation of a scalar s = 4 in the transform domain from its noisy observation x. The additive noise is Laplacian distributed with zero mean and variance σ2 = 1. The noise distribution is modeled using a GMM with M = 4 components and the corresponding PE risk estimate is obtained using (9) by setting s = x. In Fig. 3(a), we show a Laplacian p.d.f. and its GMM approximation. Fig. 3(b) shows the GMM approximation to a multimodal distribution. Figure 4(a) shows the PE risk based on the original Laplacian distribution as well as the GMM approximation, as a function of the shrinkage parameter a. The close match between the two indicates that the GMM is a viable alternative when the noise distribution is unknown or follows a complicated model. In Fig. 4(b), we plot the GMM-based PE risk and its estimate averaged over 100 independent trials. We observe that the locations of the minima of the actual risk and its estimate are in good agreement, thereby justifying the minimization of $\widehat {\mathcal {R}}$. The PE risk and its estimate are shown in Fig. 4(c) for the multimodal p.d.f. of Fig. 3(b).
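The GMM fitting step referred to above, via expectation-maximization, can be sketched in a few lines; the implementation below is ours (a vanilla 1-D EM with randomly initialized means, fitted to unit-variance Laplacian noise samples), not the paper's:

```python
import numpy as np

def fit_gmm_1d(w, M=4, iters=100, seed=0):
    """Vanilla EM fit of an M-component 1-D GMM to the samples w (our sketch)."""
    rng = np.random.default_rng(seed)
    n = len(w)
    means = rng.choice(w, size=M, replace=False)  # initialize means at random samples
    variances = np.full(M, np.var(w))
    weights = np.full(M, 1.0 / M)
    for _ in range(iters):
        # E-step: responsibilities r[i, m] proportional to weight_m * N(w_i; mean_m, var_m)
        diff = w[:, None] - means[None, :]
        logp = (np.log(weights)[None, :]
                - 0.5 * np.log(2.0 * np.pi * variances)[None, :]
                - 0.5 * diff**2 / variances[None, :])
        logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nm = r.sum(axis=0)
        weights = nm / n
        means = (r * w[:, None]).sum(axis=0) / nm
        variances = (r * (w[:, None] - means[None, :])**2).sum(axis=0) / nm
        variances = np.maximum(variances, 1e-6)   # guard against component collapse
    return weights, means, variances

# fit to unit-variance Laplacian noise (scale b = 1/sqrt(2), so sigma^2 = 2 b^2 = 1)
rng = np.random.default_rng(1)
w = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=20000)
weights, means, variances = fit_gmm_1d(w, M=4)
mix_mean = np.sum(weights * means)
mix_var = np.sum(weights * (variances + means**2)) - mix_mean**2
```

The fitted mixture reproduces the first- and second-order moments of the samples, and the component parameters can then be substituted into the GMM-based PE risk.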
E) PE risk for subband shrinkage
Let $a_J$ be the shrinkage factor applied to the set of coefficients $\{x_i, i \in J\}$ in subband J. The estimate $\widehat{\boldsymbol{s}}_J$ of the clean signal is obtained as $\widehat{\boldsymbol{s}}_J = a_J \boldsymbol{x}_J$, where $\boldsymbol{x}_J \in \mathbb{R}^{|J|}$ and $a_J \in [0, 1]$. For notational brevity, we drop the subscript J, as we did for pointwise shrinkage, and express the estimator as $\widehat{\boldsymbol{s}} = a \boldsymbol{x}$, where boldface letters indicate vectors. Analogous to pointwise shrinkage, the PE risk for subband shrinkage is defined as $\mathcal{R} = \mathbb{P}( \left\Vert \widehat{\boldsymbol{s}}-\boldsymbol{s} \right\Vert_2 > \epsilon)$, which, for $\widehat{\boldsymbol{s}} = a \boldsymbol{x}$, becomes $\mathcal{R} = \mathbb{P}( \left\Vert a \boldsymbol{w} + (a-1)\boldsymbol{s} \right\Vert_2 > \epsilon)$. For $\boldsymbol{w} \sim \mathcal{N}(0,\sigma^2 I)$,
$$\mathcal{R} = 1 - F\left(\theta \,\vert\, k, \lambda\right), \quad (10)$$
where k = |J|, $\lambda = \sum_{j=1}^{k}{(1-a)^2 s_j^2}/({a^2\sigma^2})$, $\theta = (\epsilon/a\sigma)^2$, and $F(\theta \,\vert\, k, \lambda)$ is the c.d.f. of the noncentral $\chi^2$ distribution, given by
$$F(\theta \,\vert\, k, \lambda) = \sum_{j=0}^{\infty} e^{-\lambda/2}\,\frac{(\lambda/2)^j}{j!}\,\frac{\gamma\left(\frac{\theta}{2}, \frac{k}{2}+j\right)}{\Gamma\left(\frac{k}{2}+j\right)},$$
where $\Gamma(a) = \int_{0}^{\infty} \exp(-t)\, t^{a-1}\,\mathrm{d}t$ and $\gamma(x,a) = \int_{0}^{x} \exp(-t)\, t^{a-1}\,\mathrm{d}t$.
Similar to pointwise shrinkage, we propose to obtain an estimate $\widehat{\mathcal{R}}$ of $\mathcal{R}$ for subband shrinkage estimators by replacing $s_j$ with $x_j$. The optimum subband shrinkage factor is obtained by minimizing $\widehat{\mathcal{R}}$.
Fig. 5 shows the subband PE risk and its estimate versus a, where the underlying clean signal $\boldsymbol{s} \in \mathbb{R}^{|J|}$ is corrupted by Gaussian noise and the subband size is chosen to be |J| = k = 8. The clean signal $\boldsymbol{s}$ is generated by drawing samples from $\mathcal{N}(2\cdot\mathbf{1}_k, I_k)$, where $\mathbf{1}_k$ and $I_k$ denote a k-length vector of all ones and a k × k identity matrix, respectively. The observation $\boldsymbol{x}$ is obtained by adding zero-mean i.i.d. Gaussian noise to $\boldsymbol{s}$, with an input SNR of 5 dB, where the input SNR is defined as ${\rm SNR}_{{\rm in}}= 10 \log_{10}(({1}/{k\sigma^2}) \sum_{n=1}^{k}s_n^2)\,{\rm dB}$. The PE risk estimate is obtained by replacing $\boldsymbol{s}$ with $\boldsymbol{x}$ in (10), which does not drastically alter the minimum (cf. Fig. 5).
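The noncentral χ² form of the subband risk can be validated against a Monte Carlo estimate of P(‖aw + (a − 1)s‖₂ > ε). The sketch below is ours: the regularized incomplete gamma is computed by a simple series, and the specific values of a, ε, and s are chosen only to make the risk non-trivial, not taken from the paper.

```python
import numpy as np
from math import exp, log, lgamma

def reg_lower_gamma(a, x, tol=1e-12, nmax=10000):
    """Regularized lower incomplete gamma P(a, x) = gamma(x, a)/Gamma(a), series form."""
    if x <= 0.0:
        return 0.0
    total = 0.0
    for n in range(nmax):
        term = exp((a + n) * log(x) - x - lgamma(a + 1.0 + n))
        total += term
        if n > 0 and term < tol * total:
            break
    return min(total, 1.0)

def ncx2_cdf(theta, k, lam, jmax=1000):
    """Noncentral chi-square c.d.f. as a Poisson mixture of central chi-square terms."""
    if lam == 0.0:
        return reg_lower_gamma(k / 2.0, theta / 2.0)
    total = 0.0
    for j in range(jmax):
        w = exp(-lam / 2.0 + j * log(lam / 2.0) - lgamma(j + 1.0))  # Poisson weight
        total += w * reg_lower_gamma(k / 2.0 + j, theta / 2.0)
        if w < 1e-15 and j > lam / 2.0:
            break
    return total

def subband_pe_risk(a, s, sigma, eps):
    """PE risk P(||a w + (a-1) s||_2 > eps) for w ~ N(0, sigma^2 I)."""
    k = len(s)
    lam = np.sum((1.0 - a)**2 * s**2) / (a**2 * sigma**2)
    theta = (eps / (a * sigma))**2
    return 1.0 - ncx2_cdf(theta, k, lam)

# Monte Carlo check of the closed form
rng = np.random.default_rng(0)
k, sigma, a, eps = 8, 1.0, 0.95, 3.0
s = np.full(k, 2.0)
w = rng.normal(0.0, sigma, size=(200000, k))
emp = np.mean(np.linalg.norm(a * w + (a - 1.0) * s, axis=1) > eps)
closed = subband_pe_risk(a, s, sigma, eps)
```

The closed-form value agrees with the empirical exceedance frequency to within Monte Carlo error.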
III. EXPERIMENTAL RESULTS FOR MPE-BASED DENOISING
The performance of the MPE-based pointwise and subband shrinkage estimators is validated on a synthesized harmonic signal (of length N = 2048) in Gaussian noise and the Piece-Regular signal (of length N = 4096) in Gaussian, Student's-t, multimodal, and Laplacian noise. The Piece-Regular signal has both smooth and rapidly varying regions, making it a suitable candidate for the assessment of denoising performance.
A) Performance of pointwise-shrinkage estimator
1) Harmonic signal denoising
Consider the signal
in additive white Gaussian noise, with zero mean and variance $\sigma^2$. Since the denoising is carried out in the DCT [Reference Rao and Yip39] domain, the Gaussian noise statistics remain unaltered. For the purpose of illustration, we assume that $\sigma^2$ is known. In practice, $\sigma^2$ may not be known a priori and could be replaced by the robust median estimate [Reference Mallat40] or the trimmed estimate [Reference Pastor and Socheleau41]. The clean signal is estimated using the inverse DCT after applying the optimum shrinkage. The denoising performance of the MPE- and SURE-based approaches is compared in Table 1. In the case of the Wiener filter, the power spectrum of the clean signal is estimated using the standard spectral subtraction technique [Reference Boll42,Reference Loizou43]. We observe that MPE-based shrinkage with ε = 3.5σ is superior to SURE and the Wiener filter (WF) by 8–12 dB. The comparison also shows that the performance of the MPE depends critically on the choice of ε.
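An end-to-end sketch of the pointwise DCT-domain pipeline is given below. It is our code, not the paper's: the harmonic signal in (11) is not reproduced in this extract, so a two-tone surrogate of our own choosing is used, and the orthonormal DCT-II matrix is built explicitly.

```python
import numpy as np
from math import erfc, sqrt

def dct_matrix(N):
    """Orthonormal DCT-II matrix: analysis is C @ x, synthesis is C.T @ X."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def Q(u):
    return 0.5 * erfc(u / sqrt(2.0))

def pe_risk(a, x, sigma, eps):
    """PE risk estimate (Gaussian noise, s replaced by x)."""
    if a == 0.0:
        return 1.0 if abs(x) > eps else 0.0
    return Q((eps - (a - 1.0) * x) / (a * sigma)) + Q((eps + (a - 1.0) * x) / (a * sigma))

def mpe_denoise_dct(y, sigma, eps):
    """Pointwise MPE shrinkage of the DCT coefficients, followed by the inverse DCT."""
    C = dct_matrix(len(y))
    Y = C @ y
    grid = np.linspace(0.0, 1.0, 51)
    A = np.array([grid[int(np.argmin([pe_risk(a, c, sigma, eps) for a in grid]))] for c in Y])
    return C.T @ (A * Y)

def snr_db(ref, est):
    return 10.0 * np.log10(np.sum(ref**2) / np.sum((ref - est)**2))

# two-tone surrogate for the harmonic signal (the paper's (11) is not reproduced here)
N = 512
t = np.arange(N)
s = np.sin(2 * np.pi * 5 * t / N) + 0.5 * np.sin(2 * np.pi * 12 * t / N)
rng = np.random.default_rng(0)
sigma = 0.3
y = s + rng.normal(0.0, sigma, N)
s_hat = mpe_denoise_dct(y, sigma, eps=3.5 * sigma)
```

With ε = 3.5σ, coefficients below the threshold are zeroed while the few large harmonic coefficients are retained, which is where the SNR gain comes from.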
2) Piece-Regular signal denoising
We consider several noisy realizations of the Piece-Regular signal, taken from the Wavelab toolbox [33], under Gaussian, Student's-t, multimodal (Fig. 3(b)), and Laplacian noise contaminations. The noise variance is assumed to be known. Notably, the Gaussian and Student's-t distributions of noise are preserved by an orthonormal transform [Reference Kibria and Joarder44], unlike the Laplacian distribution. Therefore, the PE estimates for Laplacian noise and multimodal noise are computed based on a four-component and a three-component GMM approximation, respectively, in the DCT domain. The denoised signal corresponding to Laplacian noise is shown in Fig. 6 as an illustration. The MPE estimates are better than the SURE estimates. The SNR plots in Fig. 7 indicate that the MPE outperforms SURE for the noise types under consideration and that the gains are particularly high in the input SNR range of −5 to 20 dB and tend to diminish beyond 20 dB.
3) Effect of ε on the denoising performance of MPE
Obtaining a closed-form expression for the ε that maximizes the output SNR is not straightforward. We determine the optimum ε empirically by measuring the SNR gain as a function of ε (cf. Fig. 8) for i.i.d. Gaussian noise. We observe that the output SNR exhibits a peak approximately at β = ε/σ = 3.5 for the harmonic signal in (11) and at β = 3 for the Piece-Regular signal. As a rule of thumb, we recommend ε = 3σ for pointwise shrinkage estimators. Some insights into the role of ε and its choice will be presented in Section 4-4.3.
B) Performance of subband MPE shrinkage
To validate the performance of the MPE-based subband shrinkage estimator (cf. Section 2-2.5), we consider denoising of the Piece-Regular signal in additive Gaussian noise. The clean signal and its noisy measurement are shown in Fig. 9(a). Denoising is carried out by grouping k adjacent DCT coefficients to form a subband. The denoised signals obtained using SURE and MPE are shown in Figs 9(b) and 9(c), respectively. The subband size k is chosen to be 16 and the parameter ε is set equal to $1.75\sqrt {k}\sigma$, a value that was determined experimentally and found to be nearly optimal. We observe that the MPE gives about a 1 dB improvement in SNR over the SURE approach.
The variation of the output SNR is also studied as a function of k (cf. Fig. 10). We experimented with ε = 3σ, $\epsilon =1.75\sqrt {k}\sigma$, and $\epsilon =1.25\sqrt {k}\sigma$ corresponding to subband sizes k = 1, k ∈ [2, 16], and k > 16, respectively. For both SURE and MPE, as k increases, the output SNR also increases and eventually saturates for k ≥ 40. For input SNR below 15 dB, MPE gives a comparatively higher SNR than SURE, and the margin diminishes as the input SNR or the subband size k increases. The degradation in the performance of SURE at low SNRs is due to the large error in estimating the MSE at such SNRs. The SURE-based estimate of the MSE becomes increasingly reliable as k increases, thereby leading to superior performance.
IV. THE EXPECTED ℓ1 DISTORTION
To eliminate the dependence of MPE on ε, we consider the accumulated probability of error, namely $\int _{0}^{\infty }\mathbb {P}( \left \vert \widehat {s}-s \right \vert >\epsilon )\,\mathrm {d} \epsilon$, as the risk to be minimized. For a nonnegative random variable Y, we know that $\mathcal {E} \{Y\}=\int _{0}^{\infty }\mathbb {P}( Y>\epsilon )\,\mathrm {d} \epsilon$. Therefore, the accumulated probability of error is the expected ℓ1 distortion:
$$\mathcal{R}_{\ell_1} = \mathcal{E}\left\{\left\vert \widehat{s}-s \right\vert\right\} = \int_{0}^{\infty}\mathbb{P}\left(\left\vert \widehat{s}-s \right\vert > \epsilon\right)\mathrm{d}\epsilon.$$
For the Gaussian noise distribution,
$$\mathcal{R}_{\ell_1} = \int_{0}^{\infty} Q\left(\frac{\epsilon-(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon + \int_{0}^{\infty} Q\left(\frac{\epsilon+(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon. \quad (13)$$
Denoting $u = ({\epsilon - (a-1)s})/({a\sigma})$ and $\mu = -({(a-1)s})/({a\sigma})$, the first integral in (13) is evaluated as
$$\int_{0}^{\infty} Q\left(\frac{\epsilon-(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon = a\sigma \int_{\mu}^{\infty} Q(u)\,\mathrm{d}u = a\sigma\left(\frac{1}{\sqrt{2\pi}}\,e^{-\mu^2/2} - \mu\,Q(\mu)\right). \quad (17)$$
The second term in (13) can be evaluated by replacing μ with −μ in (17). Combining both integrals, we obtain the expression for the expected ℓ1 distortion:
$$\mathcal{R}_{\ell_1} = a\sigma\left[\sqrt{\frac{2}{\pi}}\,e^{-\mu^2/2} + \mu\left(1 - 2Q(\mu)\right)\right].$$
An estimate of the expected ℓ1 distortion is calculated by replacing s with x. In Fig. 11(a), we show the variation of the original ℓ1 distortion and its estimate obtained by setting s = x, as functions of a, averaged over 100 independent realizations of $\mathcal {N}(0,1)$ noise. The actual parameter value is s = 4. The figure shows that the minimum of the expected ℓ1 risk is close to that of its estimate.
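The closed-form Gaussian expression, which is the mean of a folded normal, is easy to validate by Monte Carlo; the check below is our sketch (function names and constants are our choices):

```python
import numpy as np
from math import erfc, sqrt, exp, pi

def Q(u):
    """Gaussian tail probability."""
    return 0.5 * erfc(u / sqrt(2.0))

def l1_risk(a, s, sigma):
    """Closed-form expected |a(s+w) - s| for w ~ N(0, sigma^2): folded-normal mean."""
    mu = -(a - 1.0) * s / (a * sigma)
    return a * sigma * (sqrt(2.0 / pi) * exp(-mu * mu / 2.0) + mu * (1.0 - 2.0 * Q(mu)))

# Monte Carlo check at a = 0.8, s = 4, sigma = 1
rng = np.random.default_rng(0)
a, s, sigma = 0.8, 4.0, 1.0
w = rng.normal(0.0, sigma, 500000)
mc = np.mean(np.abs(a * (s + w) - s))
```

At a = 1 the risk reduces to σ√(2/π), the mean absolute value of the noise, as expected.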
In principle, one could iteratively minimize the ℓ1 distortion by starting with $\widehat{s}=x$ and successively refining it. Such an approach is given in Algorithm 1. An illustration of the denoising performance of the iterative algorithm is deferred to Section 5.
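Algorithm 1 itself is not reproduced in this extract; the sketch below implements the iteration as described in the text (start from ŝ = x, minimize the ℓ1 risk estimate evaluated at the current estimate, and refine), with the grid, the iteration count, and all names being our choices:

```python
import numpy as np
from math import erfc, sqrt, exp, pi

def Q(u):
    return 0.5 * erfc(u / sqrt(2.0))

def l1_risk_est(a, s_ref, sigma):
    """Expected-l1 risk with the unknown s replaced by the current estimate s_ref."""
    mu = -(a - 1.0) * s_ref / (a * sigma)
    return a * sigma * (sqrt(2.0 / pi) * exp(-mu * mu / 2.0) + mu * (1.0 - 2.0 * Q(mu)))

def iterative_l1_shrink(x, sigma, iters=5):
    """Start from s_hat = x; at each pass re-estimate a_opt at the current s_hat."""
    grid = np.linspace(0.02, 1.0, 50)
    s_hat = x
    for _ in range(iters):
        a_opt = grid[int(np.argmin([l1_risk_est(a, s_hat, sigma) for a in grid]))]
        s_hat = a_opt * x          # shrink the noisy observation, not the estimate
    return s_hat
```

Small coefficients are driven toward zero over the iterations, while large ones are essentially left untouched.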
A) Expected ℓ1 risk using GMM approximation
For the GMM p.d.f. in (8), the expected ℓ1 distortion evaluates to (cf. Appendix B for the derivation)
$$\mathcal{R}_{\ell_1} = \sum_{m=1}^{M} \alpha_m\, a\sigma_m\left[\sqrt{\frac{2}{\pi}}\,e^{-\mu_m^2/2} + \mu_m\left(1 - 2Q(\mu_m)\right)\right],$$
where $\mu_m = -({(a-1)s + \theta_m})/({a\sigma_m})$. The expected ℓ1 risk and its estimate for the multimodal (cf. Fig. 3(b)) and Laplacian noise p.d.f.s are shown in Figs 11(b) and 11(c), respectively. We observe that, in both cases, the locations of the minima of the true risk and its estimate are in good agreement.
B) Optimum shrinkage versus posterior SNR
We next study the behavior of $a_{\rm opt}$ for different input SNRs to compare the denoising capabilities of the MPE and the expected ℓ1-distortion-based shrinkage estimators. The optimal pointwise shrinkage parameter $a_{\rm opt}$ for Gaussian noise statistics, obtained by minimizing SURE, the PE risk estimate, and the estimated ℓ1 risk, is plotted in Fig. 12(a) for different values of the a posteriori SNR $x^2/\sigma^2$. To illustrate the effect of ε, the variation of $a_{\rm opt}$ versus the a posteriori SNR for MPE corresponding to Gaussian noise is shown in Fig. 12(b), for different ε. The shrinkage profiles are characteristic of a reasonable denoising algorithm: Figs 12(a) and 12(b) show that the shrinkage parameter increases as the a posteriori SNR increases. Whereas in the case of MPE the choice of ε is crucial, the expected ℓ1 distortion does not require tuning such a parameter.
C) Shrinkage profiles of estimators based on MPE/ℓ1 vis-à-vis SURE
To gain insight into the denoising behavior of the MPE and the ℓ1-minimization-based shrinkage estimators, it is instructive to study the resulting thresholding functions shown in Fig. 13. Figures 13(a) and 13(b) correspond to Gaussian noise. The comparison of the shrinkage functions based on MPE, ℓ1 risk, and SURE (cf. Fig. 13(a)) reveals that MPE results in a hard-thresholding-type shrinkage, whereas the ℓ1 risk and SURE produce shrinkage profiles similar to a soft-thresholding function (the threshold operator is called “soft” as it is a continuous function of the input data [Reference Johnstone27]). The shrinkage functions resulting from MPE with different ε are shown in Fig. 13(b). We observe that increasing ε is tantamount to applying a larger attenuation on the noisy signal. In Fig. 13(c), we demonstrate the variation of the MPE-based shrinkage with respect to two different noise distributions, namely, Gaussian and Laplacian, having the same first- and second-order moments. We observe that the MPE-based estimator corresponding to Laplacian noise attenuates larger amplitudes more than its Gaussian counterpart. Such an attenuation profile helps suppress Laplacian noise more effectively, considering the heavier tail of the Laplacian distribution. A distribution-specific shrinkage profile also distinguishes MPE from the conventional MSE-based estimators. In contrast, the MSE-based shrinkage estimator, given by [Reference Sadasivan, Mukherjee and Seelamantula28]: $\hat {s}_{{\rm MSE}}=\max \{0,1-{\sigma ^2}/{x^2}\}x,$ relies only on moments up to second order.
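The contrast described above between the hard-threshold-like MPE profile and the smooth MSE profile can be reproduced numerically; the following sketch (our code, Gaussian noise assumed) evaluates both estimates for a given noisy coefficient:

```python
import numpy as np
from math import erfc, sqrt

def Q(u):
    return 0.5 * erfc(u / sqrt(2.0))

def mpe_estimate(x, sigma, eps):
    """Pointwise MPE shrinkage estimate a_opt * x via grid search (Gaussian noise)."""
    grid = np.linspace(0.0, 1.0, 201)
    def risk(a):
        if a == 0.0:
            return 1.0 if abs(x) > eps else 0.0
        return Q((eps - (a - 1.0) * x) / (a * sigma)) + Q((eps + (a - 1.0) * x) / (a * sigma))
    return grid[int(np.argmin([risk(a) for a in grid]))] * x

def mse_estimate(x, sigma):
    """MSE-based shrinkage max{0, 1 - sigma^2/x^2} x quoted in the text."""
    return max(0.0, 1.0 - sigma**2 / x**2) * x
```

For σ = 1 and ε = 3, the MPE rule maps x = 2 to zero while the MSE-based rule only shrinks it, mirroring the hard- versus soft-threshold distinction in Fig. 13(a).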
An inspection of the shrinkage functions corresponding to different ε values in Fig. 13(b) reveals a particularly interesting property: transform coefficients with magnitudes smaller than ε are set to zero by the MPE-based shrinkage function. Thus, the PE risk essentially results in a distribution-specific threshold operator parameterized by ε. In this context, we recall an observation made by Johnstone and Silverman in [Reference Johnstone and Silverman45]: a threshold value of 3σ to 4σ or higher works well for signals that are sparse in the transform domain, whereas a threshold value of 2σ or lower is more appropriate for a dense signal. The optimal ε values obtained empirically for the harmonic and the Piece-Regular signals (cf. Section 3-3.1.3, Fig. 8) also support this statement. Since the harmonic signal is sparser than the Piece-Regular signal in the DCT domain, the optimal ε tends to be higher for the harmonic signal than for the Piece-Regular signal. Thus, the shrinkage functions shown in Fig. 13(b) provide valuable insights on how to choose ε. If the underlying signal is sparse in the transform considered, one should fix ε in the range [3σ, 4σ], whereas for dense signals, one should set ε to about 2σ.
V. PERFORMANCE OF THE EXPECTED ℓ1 DISTORTION-BASED POINTWISE SHRINKAGE ESTIMATOR
In a practical denoising application, we have only one noisy realization from which the clean signal has to be estimated. However, it is instructive to consider the case of multiple realizations, as it sheds light on performance comparisons vis-à-vis other estimators such as the ML estimator. Consider the observation model $x^{(m)} = s + w^{(m)}$ in $\mathbb{R}^n$, 1 ≤ m ≤ M, where one has access to M noisy realizations of the signal s, and the noise vectors $w^{(m)}$ are drawn independently from the $\mathcal {N}(\bold 0,\sigma ^2 I_n)$ distribution. The ML estimator of the $i$th signal coefficient $s_i$ is given by $\hat {s}_{{\rm ML},i}=({1}/{M}) \sum _{m=1}^{M} x_i^{(m)}$, where $x_i^{(m)}$ is the $i$th component of $x^{(m)}$. Dropping the subscript $i$, as each coefficient is treated independently of the others, the shrinkage estimator takes the form $\widehat {s}=a_{{\rm opt}} \hat {s}_{{\rm ML}}$. To study the behavior of the estimate with respect to M, we consider two variants: (i) $a_{{\rm opt}}$ obtained by minimizing $\mathcal {R}_{\ell _1}(s,a)$, referred to as the oracle-ℓ1; and (ii) $a_{{\rm opt}}$ chosen to minimize $\mathcal {R}_{\ell _1}(\hat {s}_{{\rm ML}},a)$, referred to as ML-ℓ1. The output SNR as a function of M for the Piece-Regular signal, corresponding to an input SNR of 5 dB, is shown in Fig. 14(a). For all three estimators, namely, oracle-ℓ1, ML-ℓ1, and the ML estimate, the output SNR increases with M. However, for the oracle-ℓ1 and the ML-ℓ1 estimators, the output SNR stagnates as M increases beyond 40. For M ≤ 60, the oracle-ℓ1 and the ML-ℓ1 shrinkage estimators perform better than the ML estimator. As one would expect, the performance of the ML-ℓ1 estimator matches that of the oracle-ℓ1 as M becomes large, because the ML estimate converges in probability to the true parameter. For M = 1, which is often the case in practice, the ML-ℓ1 estimate significantly outperforms the ML estimator, as seen in Fig. 14(a).
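The multiple-realization setup can be simulated to observe the SNR trend with M. This is a simplified sketch: the toy signal, the single global shrinkage factor, and the grid-searched empirical ℓ1 oracle below are our illustrative assumptions (the paper optimizes a per coefficient against the expected ℓ1 risk).

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 512, 1.0
s = np.sin(2 * np.pi * np.arange(n) / 64)   # toy clean signal (stand-in for Piece-Regular)

def snr_db(est):
    """Output SNR in dB relative to the clean signal s."""
    return 10.0 * np.log10(np.sum(s**2) / np.sum((est - s)**2))

def ml_estimate(M):
    """ML estimate under i.i.d. Gaussian noise: average of M noisy realizations."""
    X = s + sigma * rng.normal(size=(M, n))
    return X.mean(axis=0)

def oracle_l1_shrink(s_ml):
    """Oracle: one global a minimizing the empirical l1 error |a*s_ml - s|."""
    grid = np.linspace(0.0, 1.0, 101)
    errs = [np.sum(np.abs(a * s_ml - s)) for a in grid]
    return grid[int(np.argmin(errs))] * s_ml
```

Averaging over larger M raises the output SNR of the ML estimate, while the oracle shrinkage never does worse (in ℓ1 error) than the unshrunk ML estimate, since a = 1 lies in the search grid.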
The SNR gain over the ML estimator could be further improved by using the iterative minimization algorithm introduced in Section 4 (cf. Algorithm 1). The performance of the ML-ℓ1 and the ML estimators, for different values of M and input SNR, is shown in Fig. 14(b). The figure shows that, for small values of SNR and M, the ML-ℓ1 estimate outperforms the ML estimator. This is particularly important in practice, where only one noisy realization is available (M = 1).
A) Iterative minimization of the expected ℓ1-risk
When M = 1, the ML-ℓ1 estimator is obtained by minimizing $\mathcal {R}_{\ell _1}(x,a)$, where x is the noisy version of s. We refer to this estimate as the non-iterative ℓ1-based shrinkage estimator. Following Algorithm 1, one could iteratively refine the estimate, starting from x. We compare the non-iterative ℓ1-based estimator with its iterative counterpart (ITER-ℓ1), and present the results in Figs 15, 16, and 17, corresponding to Gaussian, multimodal (Fig. 3(b)), and a GMM approximation to the Laplacian noise, respectively. The output SNR obtained using the oracle-ℓ1 estimator, calculated by minimizing $\mathcal {R}_{\ell _1}(s,a)$, is also shown for benchmarking the performance.
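For Gaussian noise, the expected ℓ1 risk of a scalar shrinkage has a closed form via the folded-normal mean, which makes the refinement easy to sketch. The fixed-point update below (re-evaluating the risk at the current pilot estimate and reapplying the shrinkage to x) is our schematic reading of the iterative scheme, not a verbatim transcription of Algorithm 1:

```python
import math

def expected_l1_risk(t, a, sigma):
    """E|a*(t+w) - t| for w ~ N(0, sigma^2): the mean of a folded normal
    with mu = (a-1)*t and tau = a*sigma."""
    mu, tau = (a - 1.0) * t, a * sigma
    if tau == 0.0:
        return abs(mu)
    return (mu * math.erf(mu / (tau * math.sqrt(2.0)))
            + tau * math.sqrt(2.0 / math.pi) * math.exp(-mu**2 / (2.0 * tau**2)))

def iterative_l1_shrink(x, sigma, n_iter=10):
    """Refine a scalar estimate: pick a minimizing the risk at the current
    pilot t, then reapply the shrinkage to the noisy observation x."""
    grid = [i / 200.0 for i in range(201)]   # a in [0, 1]
    t = x
    for _ in range(n_iter):
        a = min(grid, key=lambda a: expected_l1_risk(t, a, sigma))
        t = a * x
    return t
```

Large observations (high a posteriori SNR) are passed through nearly unattenuated, while small ones are driven toward zero, mirroring the shrinkage behavior reported above.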
We make the following observations from Figs 15, 16, and 17: (i) the output SNR increases with iterations, albeit marginally after about 10 iterations; (ii) the iterative method consistently dominates the non-iterative one, with an overall SNR improvement of about 2–3 dB, for input SNR in the range −5 dB to 20 dB; and (iii) the SNR gain of the iterative technique diminishes at higher input SNRs, as is typical of denoising algorithms.
VI. PERFORMANCE ASSESSMENT OF MPE AND ℓ1-RISK MINIMIZATION ALGORITHMS VERSUS BENCHMARK DENOISING ALGORITHMS
We compare the MPE and the ℓ1-based shrinkage estimators with three denoising algorithms: (i) wavelet soft-thresholding (ST) [Reference Donoho35]; (ii) the SURE-LET denoising algorithm [Reference Luisier, Blu and Unser5]; and (iii) smooth sigmoid shrinkage (SS) [Reference Atto, Pastor and Mercier13] in the wavelet domain. In [Reference Donoho35], a wavelet-based soft-thresholding scheme is used for denoising, with the threshold selected as $\tau =\sigma \sqrt {2 \ln N}$ for an N-length signal. The SURE-LET technique employs a linear expansion of thresholds (LET), which is a linear combination of elementary denoising functions, and optimizes the coefficients of the expansion by minimizing the SURE criterion. In [Reference Atto, Pastor and Mercier13], a smooth sigmoid shrinkage is applied to the wavelet coefficients to achieve denoising, and the parameters of the sigmoid, which control the degree of attenuation, are obtained by minimizing the SURE objective. For performance evaluation, we consider ECG signals taken from the PhysioBank database, and the HeaviSine and Piece-Regular signals taken from the Wavelab toolbox.
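The universal-threshold rule of [Reference Donoho35] is compact enough to state in code. This sketch applies the threshold directly to a coefficient vector; the wavelet transform itself is omitted for brevity:

```python
import numpy as np

def universal_soft_threshold(coeffs, sigma):
    """Soft-threshold a coefficient vector with Donoho's universal
    threshold tau = sigma * sqrt(2 ln N), N being the signal length."""
    tau = sigma * np.sqrt(2.0 * np.log(coeffs.size))
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - tau, 0.0)
```

In the full ST pipeline, this operation is applied to the wavelet detail coefficients and the signal is then reconstructed by the inverse transform.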
The noise is assumed to follow a Gaussian distribution, and the output SNR values are averaged over 100 independent realizations. The noise variance is estimated using a median-based estimator [Reference Mallat40], which was also employed by Luisier et al. and Donoho. In the SURE-LET, SS, and ST techniques, denoising is performed using Symmlet-4 wavelets with three levels of decomposition, as these settings were found to be the best for the ECG signal (following [Reference Krishnan and Seelamantula20]). In the case of the MPE and ℓ1-based shrinkage estimators, denoising is performed in the DCT domain. We use the shorthand notations MPE and MPE-SUB to denote the pointwise and subband shrinkage estimators, respectively. The corresponding SURE-based subband shrinkage estimator is denoted SURE-SUB. We set k = 16 and $\epsilon =1.75\sqrt {k}\sigma$ for computing the subband shrinkage parameters. These parameters were not specifically optimized, but were found experimentally to work well. The output SNR as a function of the input SNR, obtained using the various algorithms, is shown in Fig. 18.
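The median-based noise estimator referred to above is commonly the MAD rule applied to the finest-scale wavelet detail coefficients. The Haar-style pairwise differences used as detail coefficients below are an illustrative stand-in for the paper's wavelet choice:

```python
import numpy as np

def estimate_sigma(x):
    """MAD-based noise std estimate: median(|d|) / 0.6745, where d are
    finest-scale (Haar-like) detail coefficients. Pairwise differencing
    suppresses the smooth signal component, leaving mostly noise."""
    x = np.asarray(x, dtype=float)
    d = (x[1::2] - x[::2]) / np.sqrt(2.0)
    return np.median(np.abs(d)) / 0.6745
```

The constant 0.6745 is the median of |Z| for standard normal Z, which makes the estimator consistent for Gaussian noise and robust to large signal-bearing coefficients.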
From the ECG signal denoising performance shown in Fig. 18(a), we observe that the MPE estimate consistently dominates soft-thresholding-based denoising for input SNRs ranging from −5 dB to 20 dB. The iterative ℓ1-distortion-based shrinkage estimator (20 iterations) yields lower output SNR compared with the MPE-based estimate for input SNR values in the range −5 to 17.5 dB, but surpasses it for relatively higher values of input SNR (17.5–20 dB). The SURE-LET and SS algorithms dominate both MPE and ℓ1-based shrinkage estimators, because they use more sophisticated denoising functions in the transform domain, thereby offering greater flexibility. For input SNR in the range 0–20 dB, the expected ℓ1-distortion-based shrinkage estimator consistently outperforms the soft-thresholding-based technique. We observe from Fig. 18 that the MPE-subband estimator is better than the competing algorithms at low input SNR.
In the preceding discussion, we considered DCT-domain shrinkage for the proposed methods and wavelet-based denoising for the benchmark methods, because these are the best-case performance scenarios. In order to maintain uniformity in comparison, we have also considered the scenarios where all the techniques operate either in the DCT or the wavelet domain. Figure 19 shows a performance comparison in the case of the DCT. For input SNRs in the range −5 to 15 dB, MPE-SUB shows the best denoising performance. Next, we address wavelet-domain denoising using MPE. We consider the same denoising framework as that of Atto et al. [Reference Atto, Pastor and Mercier13], but replace their sigmoid-shrinkage function with MPE-based shrinkage in every subband. The parameter ε has been set to ε = 3σ for MPE and $\epsilon =1.25\sigma \sqrt {k}$ for MPE-SUB, where k is the subband signal length. Tables 2 and 4 show that, for input SNR below 10 dB, MPE-SUB gives better results than the competing algorithms. For the HeaviSine signal (cf. Table 3), the soft-thresholding technique performed better than the other techniques; however, MPE-SUB still outperforms SURE-LET, SURE-SUB, and SS. The wavelet-domain denoising results are shown in tabular form and the DCT-domain results in figures, since the relative margin of improvement is smaller in the wavelet case. When all the techniques operate in the DCT domain, the margin of improvement is largest for MPE.
VII. CONCLUSIONS
We have proposed a new framework for signal denoising based on a novel criterion, namely, the probability of error. Our framework is applicable to scenarios where the noise is i.i.d. and additive. Denoising is performed by means of optimum transform-domain shrinkage in the sense of minimum probability of error. We have considered both pointwise and subband shrinkage estimators within the MPE paradigm. The performance of the proposed MPE estimators depends on the choice of the error-tolerance parameter ε, which was determined empirically with the SNR gain as the figure of merit. The parameter ε also acts as the threshold of the shrinkage function. We also proposed an alternative obtained by integrating the probability of error over ε ∈ (0, ∞), which yields the expected ℓ1 distortion; this distortion was minimized iteratively. We demonstrated that iterations improve the denoising performance and that the resulting shrinkage estimator outperforms the classical ML estimator when the number of observations is small or the input SNR is low.
For performance comparisons with benchmark denoising algorithms, we considered real ECG signals and Wavelab signals in Gaussian noise. Experimentally, we found that increasing the subband size leads to a higher output SNR, which saturates beyond a point. When the subband size is small or the input SNR is low, MPE outperforms SURE.
We demonstrated that the optimum shrinkage parameter obtained by minimizing estimates of the PE/ℓ1 distortions increases monotonically with the a posteriori SNR. This behavior of the shrinkage parameter is essential for denoising. A theoretical characterization of this behavior remains open and could lead to a rigorous convergence proof for the proposed iterative expected ℓ1 distortion minimization technique. Another important observation is that, for lower input SNRs, the proposed denoising framework yields a higher output SNR than MSE-based techniques. The improvement may be attributed to the fact that the MPE framework incorporates knowledge of the distribution of the observations, which goes beyond the second-order statistics considered in conventional MSE-based optimization. To the best of our knowledge, this is the first attempt at demonstrating competitive denoising performance with the probability of error chosen as the distortion measure in a non-Bayesian setting.
Following our preliminary contribution [Reference Sadasivan, Mukherjee and Seelamantula28], Kudryavtsev and Shestakov recently considered the probability that the maximum error between the estimated and true wavelet transform coefficients exceeds a critical value, and analyzed the asymptotic behavior of the resulting optimal minimax threshold for hard- and soft-thresholding schemes [Reference Kudryavtsev and Shestakov46]. However, the advantages of such a strategy in the context of a denoising task are yet to be ascertained.
ACKNOWLEDGMENTS
This work was supported by Indian Space Research Organization – Indian Institute of Science (ISRO-IISc) Space Technology Cell, and an IMPRINT project (No. 6000) funded by the Ministry of Human Resource Development and the Ministry of Health and Family Welfare, Government of India.
APPENDIX
A) PERTURBATION OF SURE-BASED POINTWISE SHRINKAGE
To analyze the perturbation in the location of the minimum of the SURE cost, in comparison with the true MSE, one needs to evaluate the deviation probability

$$P_e^{{\rm SURE}} = \Pr\left\{ \left| a_{{\rm opt}}(x) - a_{{\rm opt}}(s) \right| > \delta \right\},$$

for a tolerance δ > 0, where $a_{{\rm opt}}(s) = {s^2}/({s^2+\sigma^2})$ and $a_{{\rm opt}}(x) = 1 - {\sigma^2}/{x^2}$. Let

$$h(x) = a_{{\rm opt}}(x) = 1 - \frac{\sigma^2}{x^2}.$$

The Taylor-series expansion of h(x) about s yields

$$h(x) = \sum_{n=0}^{\infty} \frac{h^{(n)}(s)}{n!}\, w^n,$$

where h(n) is the nth derivative of h. Using the first-order Taylor-series approximation h(x) ≈ h(s) + w h(1)(s), with $h^{(1)}(s) = 2\sigma^2/s^3$, and noting that h(s) agrees with $a_{{\rm opt}}(s)$ up to first order in $\sigma^2/s^2$, we obtain

$$a_{{\rm opt}}(x) - a_{{\rm opt}}(s) \approx \frac{2\sigma^2}{s^3}\, w,$$

which, in turn, leads to an approximation of the perturbation probability $P_e^{{\rm SURE}}$:

$$P_e^{{\rm SURE}} \approx \Pr\left\{ |w| > \frac{\delta s^3}{2\sigma^2} \right\}.$$

Invoking $w\sim \mathcal {N}(0,\sigma ^2)$, and using the Chernoff bound [Reference Mitzenmacher and Upfal36], we obtain

$$P_e^{{\rm SURE}} \leq 2 \exp\left( -\frac{\delta^2 s^6}{8\sigma^6} \right).$$

Consequently, to satisfy an upper bound on the deviation probability of the form $P_e^{{\rm SURE}} \leq \alpha$, for a given δ > 0, one must ensure that

$$\frac{s^2}{\sigma^2} \geq \left( \frac{8}{\delta^2} \ln \frac{2}{\alpha} \right)^{1/3}. \tag{A.1}$$
The condition in (A.1) translates to an equivalent condition on the minimum required SNR s 2/σ2 to achieve a certain $P_e^{{\rm SURE}}$.
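The deviation behavior analyzed in this appendix can be checked by simulation. In the Monte Carlo sketch below, the perturbation is measured as $|a_{\rm opt}(x) - a_{\rm opt}(s)|$, which is the interpretation we assume; the function name and the trial count are illustrative:

```python
import numpy as np

def perturbation_prob(s, sigma, delta, trials=200_000, seed=0):
    """Monte Carlo estimate of P(|a_opt(x) - a_opt(s)| > delta), where
    a_opt(s) = s^2/(s^2 + sigma^2) is the MSE-optimal shrinkage and
    a_opt(x) = 1 - sigma^2/x^2 is its SURE-based surrogate, x = s + w."""
    rng = np.random.default_rng(seed)
    x = s + sigma * rng.normal(size=trials)
    a_sure = 1.0 - sigma**2 / x**2
    a_mse = s**2 / (s**2 + sigma**2)
    return np.mean(np.abs(a_sure - a_mse) > delta)
```

As expected, the estimated probability drops sharply as the SNR s²/σ² increases, consistent with a minimum-SNR condition of the kind (A.1) describes.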
B) EXPECTED ℓ1 RISK FOR GMM
For additive noise with the p.d.f. given in (8), we have
using (9) and (12). Letting $\mu_m = -({(a-1)s+\theta_m})/{a\sigma_m}$ and $u_m = ({\epsilon-(a-1)s-\theta_m})/{a\sigma_m}$, we get
Substituting (A.3) in (A.2) yields
which is the expression for the expected ℓ1 distortion for noise following a GMM distribution.
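The closed-form expression relies on (9), (12), and (A.3), which are not reproduced in this excerpt. As a reference point, the expected ℓ1 distortion under GMM noise can instead be estimated by Monte Carlo. The sampler below follows the mixture notation above (component weights, means $\theta_m$, and standard deviations $\sigma_m$); the function name and interface are ours:

```python
import numpy as np

def gmm_l1_risk(s, a, weights, thetas, sigmas, trials=100_000, seed=0):
    """Monte Carlo estimate of E|a*(s+w) - s| with w drawn from a
    Gaussian mixture: component m is picked with probability weights[m],
    then w ~ N(thetas[m], sigmas[m]^2)."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=trials, p=weights)
    w = rng.normal(np.asarray(thetas)[comp], np.asarray(sigmas)[comp])
    return np.mean(np.abs(a * (s + w) - s))
```

Such a sampler is also a convenient way to validate any closed-form GMM risk expression: for a single zero-mean component with a = 1, the risk should approach $\sigma\sqrt{2/\pi}$, the mean of a folded normal.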