Sign In

Communications of the ACM

Research highlights

Self-Similarity-Based Image Denoising

blurry photo

Credit: Wikimedia

The search for efficient image denoising methods is still a valid challenge at the crossing of functional analysis and statistics. In spite of the sophistication of the recently proposed methods, most algorithms have not yet attained a desirable level of applicability. All show an outstanding performance when the image model corresponds to the algorithm assumptions but fail in general and create artifacts or remove image fine structures. The main focus of this paper is, first, to define a general mathematical and experimental methodology to compare and classify classical image denoising algorithms and, second, to describe the nonlocal means (NL-means) algorithm6 introduced in 2005 and its more recent extensions. The mathematical analysis is based on the analysis of the "method noise," defined as the difference between a digital image and its denoised version. NL-means, which uses image self-similarities, is proven to be asymptotically optimal under a generic statistical image model. The denoising performance of all considered methods are compared in four ways: mathematical, asymptotic order of magnitude of the method noise under regularity assumptions; perceptual-mathematical, the algorithms artifacts and their explanation as a violation of the image model; perceptual-mathematical, analysis of algorithms when applied to noise samples; quantitative experimental, by tables of L2 distances of the denoised version to the original image.

Back to Top


The goal of image denoising methods is to recover the original image from a noisy measurement:


where v(i) is the observed value, u(i) is the "true" value, and n(i) is the noise perturbation at a pixel i. The best simple way to model the effect of noise on a digital image is to add a Gaussian white noise. In that case, n(i) are i.i.d. Gaussian values with zero mean and variance σ2.

Several methods have been proposed to remove the noise and recover the true image u. Even though they may be very different in tools it must be emphasized that a wide class share the same basic remark: denoising is achieved by averaging. This averaging may be performed locally: the Gaussian smoothing model,27 the anisotropic filtering,2,35 and the neighborhood filtering;40,42,46 by the calculus of variations: the total variation minimization;39 or in the frequency domain: the empirical Wiener filters46 and wavelet thresholding methods.12,17

Formally we define a denoising method Dh as a decomposition


where v is the noisy image and h is a filtering parameter, which usually depends on the standard deviation of the noise. Ideally, Dhv is smoother than v and n(Dh, v) looks like the realization of a white noise.

The denoising methods should not alter the original image u. Now, most denoising methods degrade or remove the fine details and texture of u. In order to better understand this removal, we shall introduce and analyze the method noise. The method noise is defined as the difference between the original (always slightly noisy) image u and its denoised version.

The denoising methods should not introduce visual artifacts. The noise-to-noise principle requires that a denoising algorithm transforms a white noise into white noise. This paradoxical requirement seems to be the best way to characterize artifact-free algorithms.

We also propose and analyze the NL-means algorithm, which is defined by the simple formula


where cacm5405_a.gif dz is a normalizing constant, Ga is a Gaussian kernel, and h acts as a filtering parameter. This formula amounts to say that the denoised value at x is a mean of the values of all points whose Gaussian neighborhood looks like the neighborhood of x. The main difference of the NL-means algorithm with respect to local filters or frequency domain filters is the systematic use of all possible self-predictions the image can provide, in the spirit of Efros and Leung.19 Section 2 gives formulas for the method noise of several classic algorithms. Section 3 describes NL-means. Section 4 gives its consistency results. A substantial experimental Section 5 compares several classic algorithms by the new introduced criteria and Section 6 reviews many recent extensions, applications, and variants.

Back to Top


DEFINITION 1 (METHOD NOISE). Let u be an image and Dh a denoising operator depending on a filtering parameter h. Then, we define the method noise as the image difference u – Dhu.

The application of a denoising algorithm should not alter the non-noisy images. So the method noise should be very small when some kind of regularity for the image is assumed. If a denoising method performs well, the method noise must look like a noise even with non-noisy images and should contain as little structure as possible. Since even good quality images have some noise, it makes sense to evaluate any denoising method in that way, without the traditional "add noise and then remove it" trick. We shall list formulas permitting to compute and analyze the method noise for several classical local smoothing filters: the Gaussian filtering,27 the anisotropic filtering,2,35 the total variation minimization,39 and the neighborhood filtering.46 The formal analysis of the method noise for the frequency domain filters falls out of the scope of this paper. These method noises can also be computed but their interpretation depends on the particular choice of the wavelet or Fourier basis.

The Gaussian Filtering: The image isotropic linear filtering boils down to the convolution of the image by a linear symmetric kernel. The paradigm of such kernels is of course the Gaussian kernel cacm5405_b.gif. In that case, Gh has standard deviation h and it is easily seen that

THEOREM 1 (GABOR 1960). The image method noise of the Gaussian convolution satisfies u – Gh * u = −h2 Δ u + o(h2).

The Gaussian method noise is zero in harmonic parts of the image and very large near edges or texture, where the Laplacian cannot be small. As a consequence, the Gaussian convolution is optimal in flat parts of the image but edges and texture are blurred.

The Anisotropic Filtering: The anisotropic filter (AF) attempts to avoid the blurring effect of the Gaussian by convolving the image u at x only in the direction orthogonal to Du(x). The idea of this filter goes back to Perona and Malik35 and is interpreted in Alvarez et al.2 It is defined by


for x such that Du(x) ≠ 0 and where cacm5405_c.gif = (–y, x) and Gh is the one-dimensional Gauss function with variance h2. If one assumes that the original image u is twice continuously differentiable (C2) at x, it is easily shown by a second-order Taylor expansion that

THEOREM 2. The image method noise of AFh satisfies (for Du(x) ≠ 0)


By curv(u) (x), we denote the curvature, that is, the signed inverse of the radius of curvature of the level line passing by x. This method noise is zero wherever u behaves locally like a straight line and large in curved edges or texture (where the curvature and gradient operators take high values). As a consequence, the straight edges are well restored while flat and textured regions are degraded.

Total Variation Minimization: The total variation minimization was introduced by Rudin et al.39 Given a noisy image v(x), these authors proposed to recover the original image u(x) as the solution of the minimization problem:


where TV (u) denotes the total variation of u and λ is a given Lagrange multiplier. The minimum of the above minimization problem exists and is unique. The parameter λ is related to the noise statistics and controls the degree of filtering of the obtained solution.

THEOREM 3. The method noise of the total variation minimization is


As in the anisotropic case, straight edges are maintained because of their small curvature. However, details and texture can be over smoothed if λ is too small.

Neighborhood Filtering: The previous filters are based on a notion of spatial neighborhood or proximity. Neighborhood filters instead take into account gray-level values to define neighboring pixels. In the simplest and more extreme case, the denoised value at pixel i is an average of values at pixels that have a gray-level value close to u(i). The gray-level neighborhood is therefore


This is a fully nonlocal algorithm, since pixels belonging to the whole image are used for the estimation at pixel i. This algorithm can be written in a more continuous form:


where cacm5405_d.gif is an open and bounded set, and C(x) = cacm5405_e.gif dy is the normalization factor.

The Yaroslavsky neighborhood filters46 consider mixed neighborhoods B(i, h)࣮Bρ(i), where Bρ(i) is a ball of center i and radius ρ. So the method takes an average of the values of pixels that are both close in gray-level and spatial distance. This filter can be easily written in a continuous form as


where cacm5405_f.gif dy is the normalization factor. More recent versions, namely the SUSAN filter40 and the Bilateral filter42 weigh the distance to the reference pixel x instead of considering a fixed spatial neighborhood.

In the next theorem, we compute the asymptotic expansion of the Yaroslavky neighborhood filter when ρ, h → 0 have the same order of magnitude. The proof and generalizations of this result can be found in Buades et al.7

THEOREM 4. Suppose u isin.gif C2(Ω), and let ρ, h, α > 0 such that ρ, h → 0 and h = O(ρ). Let us consider the continuous function cacm5405_g.gif defined by cacm5405_h.gif for t ≠ 0, cacm5405_i.gif, where cacm5405_j.gif ds. Let cacm5405_k.gif be the continuous function defined by cacm5405_l.gif. Then, for x isin.gif Ω,


According to Theorem 4 the Yaroslavsky neighborhood filter acts as an evolution PDE with two terms. The first term is proportional to the second derivative of u in the direction ζ, which is tangent to the level line passing through x. The second term is proportional to the second derivative of u in the direction η, which is orthogonal to the level line passing through x.

The weighting coefficient of the tangent diffusion, uζζ, is given by cacm5405_m.gif. The function cacm5405_g.gif is positive and decreasing. Thus, there is always diffusion in that direction. The weight of the normal diffusion, uηη, is given by cacm5405_n.gif. As the function cacm5405_k.gif takes positive and negative values (see Figure 1), the filter behaves as a filtering/enhancing algorithm in the normal direction depending on |Du|. The intensity of the filtering in the tangent diffusion and the enhancing in the normal diffusion tend to zero when the gradient tends to infinity. Thus, points with a very large gradient are not altered.

The neighborhood filter asymptotically behaves as the Perona–Malik equation,35 also creating shocks inside smooth regions (see Buades et al.7 for more details on this comparison).

Back to Top


Given a discrete noisy image v = {v(i)|i isin.gif I}, the estimated value NL[v](i), for a pixel i, is computed as weighted average of all the pixels in the image:


where the family of weights {w(i, j)}j depend on the similarity between the pixels i and j and satisfy the usual conditions 0 ≤ w(i, j) ≤ 1 and cacm5405_p.gif w(i, j) = 1.

The similarity between two pixels i and j depends on the similarity of the intensity gray level vectors v(Ni) and v(Nj), where Nk denotes a square neighborhood of fixed size centered at a pixel k. This similarity is measured as a decreasing function of the weighted Euclidean distance, cacm5405_q.gif, where a > 0 is the standard deviation of the Gaussian kernel. The expectation of the Euclidean distance of the noisy neighborhoods is


This equality shows the robustness of the algorithm since in expectation the Euclidean distance preserves the order of similarity between pixels.

The pixels with a color neighborhood similar to v(Ni) have larger weights in the average, see Figure 2. These weights are defined by


where Z(i) is the normalizing constant and the parameter h controls the decay of the weights as a function of the Euclidean distances. NL-means not only compares the gray level at a single point but the geometrical configuration in a whole neighborhood. This fact allows a more robust comparison than neighborhood filters (see Figure 2). This figure illustrates how NL-means implicitly chooses a best averaging neighborhood depending on the image structure in a wide neighborhood.

In Section 2, we have computed explicitly the method noise of the local smoothing filters. These formulas are corroborated by the visual experiments of Figure 3. This figure displays the method noise for the standard image Boat, that is, the difference u–Dh(u), where the parameter h is been fixed in order to remove a noise with standard deviation 2.5. The method noise helps us in understanding the performance and limitations of the denoising algorithms, since removed details or texture correspond to a large method noise. We see in Figure 3 that the NL-means method noise does not present noticeable geometrical structures. Figure 4 explains this property since it shows how the NL-means algorithm chooses a weighting configuration adapted to the local and nonlocal geometry of the image.

Back to Top


Under stationarity assumptions, for a pixel i, the NL-means algorithm converges to the conditional expectation of i once observed a neighborhood of it. In this case, the stationarity conditions amount to say that as the size of the image grows, we can find many similar patches for all the details of the image.

Let V be a random field and suppose that the noisy image v is a realization of V. Let Z denote the sequence of random variables Zi = {Yi, Xi} where Yi = V(i) is real valued and Xi = V (Ni\{i}) is cacm5405_r.gif valued. The NL-means is an estimator of the conditional expectation r (i)=E[Yi|Xi = v(Ni\{i})].

THEOREM 5 (CONDITIONAL EXPECTATION THEOREM). Let Z = {V(i), V(Ni\{i})} for i = 1, 2, ... be a strictly stationary and mixing process. Let NLn denote the NL-means algorithm applied to the sequence cacm5405_s.gif. Then for j isin.gif {1, ..., n},


The full statement of the hypothesis of the theorem and its proof can be found in a more general framework in Roussas.38 This theorem tells us that the NL-means algorithm corrects the noisy image rather than trying to separate the noise (oscillatory) from the true image (smooth).

In the case that an additive white noise model is assumed, the next result shows that the conditional expectation is the function of V (Ni\{i}) that minimizes the mean square error with the true image u.

THEOREM 6. Let V, U, N be random fields on I such that V = U + N, where N is a signal-independent white noise. Then, the following statements hold good.

  1. E[V(i) |Xi=x] = E[U(i)| Xi=x] for all i isin.gif cacm5405_r.gif.
  2. The expected random variable E[U(i) | V(Ni\{i})] is the function of V (Ni\{i}) that minimizes the mean square error


Similar optimality theoretical results have been obtained in Ordentlich et al.34 and presented for the denoising of binary images.

Back to Top


In this section, we compare the local smoothing filters, the wavelet thresholding algorithms,17 sliding DCT Wiener filter,46 and the NL-means algorithm. The wavelet thresholding and the sliding DCT algorithms yield state-of-the-art results among frequency domain filters.

For computational purposes of the NL-means algorithm, the search of similar windows was restricted to a larger "search window" with size S × S pixels. In all experiments, the search window has 21 × 21 pixels and the similarity square neighborhood 3 × 3 pixels for color images and 5 × 5 pixels for gray images. When denoising a color image, the whole color patch containing the red, green, and blue pixels is compared. Thus, a single weight is obtained for any pair of pixels and used for the denoising of the three channels at this pixel.

The filtering parameter h has been fixed to 0.8σ when a noise of standard deviation σ is added. Due to the fast decay of the exponential kernel, large Euclidean distances lead to nearly zero weights acting as an automatic threshold (Figure 4).

Visual Comparison: Visual criteria remains essential to decide if the quality of the image has been improved by the denoising method. We display some denoising experiences comparing the NL-means algorithm with local smoothing and frequency domain filters. All experiments have been simulated by adding a Gaussian white noise of standard deviation σ to the true image. The objective is to compare the visual quality of the restored images, the nonpresence of artifacts, and the correct reconstruction of edges, texture, and details.

Due to the nature of the algorithm, the most favorable case for NL-means is the textured or periodic case. In this situation, for every pixel i, we can find a large set of samples with a very similar configuration. See Figure 4f for an example of the weight distribution of the NL-means algorithm for a periodic image. Figure 5 illustrates the performance of the NL-means for a natural texture and Figure 6 for an artificial periodic pattern.

Natural images also have enough redundancy to be restored by NL-means, see Figure 7. Flat zones present a huge number of similar configurations lying inside the same object, see Figure 4a. Straight or curved edges have a complete line of pixels with similar configurations, see Figure 4b, c. In addition, natural images allow us to find many similar configurations in far away pixels, as Figure 4f shows. Figure 7 shows an experiment on a natural image. This experience must be compared with Figure 3, where we display the method noise of the original image. The blurred or degraded structures of the restored images coincide with the noticeable structures of its method noise. Figure 8 shows that the frequency domain filters are well adapted to the recovery of oscillatory patterns. Although some artifacts are noticeable in both solutions, the stripes are well reconstructed. The DCT transform seems to be more adapted to this type of texture, and stripes are little better reconstructed. NL-means also performs well on this type of texture, due to its high degree of redundancy.

Noise-to-Noise Criterion: The noise-to-noise principle requires that a denoising algorithm transforms a white noise into white noise. This paradoxical requirement seems to be the best way to characterize artifact-free algorithms. The transformation of a white noise into any correlated signal creates structure and artifacts. Only white noise is perceptually devoid of visual structure, a principle first stated by Attneave.4 Figure 9 shows how denoising methods transform a white noise.

The convolution with a Gauss kernel is equivalent to the product in the Fourier domain with a Gauss kernel of inverse standard deviation. Therefore, convolving the noise with a kernel reinforces the low frequencies and cancels the high ones. Thus, the filtered noise actually shows big grains due to its prominent low frequencies.

Noise filtered by a wavelet thresholding is no more a white noise. The few coefficients with a magnitude larger than the threshold are spread all over the image. The pixels that do not belong to the support of one of these coefficients are set to zero. The visual result is a constant image with superposed wavelets as displayed in Figure 9. It is easy to prove that the denoised noise is spatially highly correlated.

Given a noise realization, the filtered value by the neighborhood filter at a pixel i only depends on its value n(i) and the parameters h and ρ. The neighborhood filter averages noise values at a distance from n(i) less or equal than h. Thus, when the size ρ of the neighborhood increases, by the law of large numbers the filtered value tends to the expectation of the Gauss distribution restricted to the interval (n(i) – h, n(i) + h). This filtered value is therefore a deterministic function of n(i) and h. Independent random variables are mapped by a deterministic function on independent variables. Thus the noise-to-noise requirement is asymptotically satisfied by the neighborhood filter. NL-means satisfies the noise-to-noise principle in a similar manner. However, a mathematical statement and proof of this property are more intricate and we shall skip them.

Numerical Comparison: Table 1 displays the mean square error for the denoising experiments given in the paper. This numerical measurement is the most objective one, since it does not rely on any visual interpretation. However, this error is not computable in a real problem and a small mean square error does not assure a high visual quality. So all above-discussed criteria seem necessary to compare the performance of denoising algorithms.

Back to Top


Our conclusions on the better denoising performance of nonlocal methods with respect to former state-of-the-art algorithms such as the total variation or the wavelet thresholding have been widely accepted. The "method noise" methodology to compare the denoising performance has been adopted ever since.

The NL-means algorithm is easily extended to the denoising of image sequences and video. The denoising algorithm involves indiscriminately pixels not belonging only to same frame but also to all frames in the image. The algorithm favors pixels with a similar local configuration, as the similar configurations move, so do the weights. Thus, the algorithm is able to follow the similar configurations when they move without any explicit motion computation (see Figure 10). This is not the case of classical movie denoising algorithms, which are motion compensated (see Buades et al.9 for more details on this discussion). The very same idea on movie denoising can be applied for super-resolution, an image zooming method by which several frames from a video, or several low resolution photographs, can be fused into a larger image.20,37

Improvements or adaptations of the NL-means algorithm have been proposed for the denoising of several types of data: in fluorescence microscopy,5 cryon microscopy,15 magnetic resonance imaging (MRI),31 and 3D data set points.47

Most successful improvement of NL-means combine the nonlocal principle with former classic algorithms and have indeed shown an improved denoising performance. Probably the best performing methods so far are the hybrid method BM3D proposed in Dabov et al.14 and the NL-PCA proposed in Zhang et al.48 Both algorithms combine not less than block-matching, a linear transform thresholding, and Wiener filtering.

The NL-means algorithm has also expanded to most image processing tasks: Demosaicking, which is the operation that transforms the "R or G or B" raw image in each camera into an "R and G and B" image;10,30 movie colorization21,26;image inpainting by proposing a nonlocal image inpainting variational framework with a unified treatment of geometry and texture3 (see also Wong and Orchard44); Zooming by a fractal like technique where examples are taken from the image itself at different scales18; movie flicker stabilization16 that compensates spurious oscillations in the colors of successive frames.

NL-means is a computationally demanding algorithm. Several papers have proposed fast and extremely fast (linear) implementations, by block preselection,29 by Gaussian KD-trees to classify image blocks,1 by SVD,33 by using the FFT to compute correlation between blocks43, and by statistical arguments.13 The statistical validity of the NL-means algorithm is wide open. See Ebrahimi and Vrscay,18 Kervrann et al.,24 and Thacker et al.41 (where a Bayesian interpretation is proposed) or Xu et al.45 (where a bias of NL-means is corrected).

The relationship of neighborhood filters to classic local PDE's has been discussed in Buades et al.,7,8 leading to an adaptation of NL-means that avoids the stair casing effect. Yet, the main interest has shifted to defining nonlocal PDEs. The extension of the NL-means method to define nonlocal image-adapted differential operators and nonlocal variational methods starts with Kindermann et al.,25 who propose to perform denoising and deblurring by nonlocal functionals. Several articles on deblurring have followed this variational line22,23,32 (for image segmentation) and Lou et al.28 for deconvolution and tomographic reconstruction.

A particular notion of nonlocal PDE has emerged, whose coefficients are actually image dependent. For instance, in Elmoataz et al.,21 the image colorization is viewed as the minimization of a discrete partial differential functional on the weighted block graph. Thus, it can be seen either as a nonlocal heat equation on the image or as a local heat equation on the space of image patches.

The exploration of image redundancy and its application to image restoration has led to new attempts at sparse image representations by block dictionaries.24 Algorithms and application have been developed by Chatterjee and Milanfar,11 Mairal et al.,30 and Protter and Elad.36

Back to Top


This work was partially financed by MCYIT grant number TIN2008–04752.

Back to Top


1. Adams, A., Gelfand, N., Dolson, J., Levoy, M. Gaussian kd-trees for fast high-dimensional filtering. ACM Trans. Graphics 28, 3 (2009), 1–12.

2. Alvarez, L., Lions, P., Morel, J. Image selective smoothing and edge detection by nonlinear diffusion. II. SIAM J. Numer. Anal. 29, 3 (1992), 845–866.

3. Arias, P., Caselles, V., Sapiro, G. A variational framework for non-local image inpainting. In Proceedings of the 7th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR '09) (Bonn, Germany, August 24–27, 2009), Springer, Heidelberg.

4. Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 3 (1954), 183–193.

5. Boulanger, J., Sibarita, J., Kervrann, C., Bouthemy, P. Non-parametric regression for patch-based fluorescence microscopy image sequence denoising. In 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008 (Paris, France, May 14–17, 2008), ISBI 2008, 748–751.

6. Buades, A., Coll, B., Morel, J. A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4, 2 (2005), 490–530.

7. Buades, A., Coll, B., Morel, J. Neighborhood filters and PDE's. Numer. Math. 105, 1 (2006), 1–34.

8. Buades, A., Coll, B., Morel, J. The staircasing effect in neighborhood filters and its solution. IEEE Trans. I.P. 15, 6 (2006), 1499–1505.

9. Buades, A., Coll, B., Morel, J. Nonlocal image and movie denoising. Int. J. Comput. Vision 76, 2 (2008), 123–139.

10. Buades, A., Coll, B., Morel, J., Sbert, C. Self-similarity driven color demosaicking. IEEE Trans. I.P. 18, 6 (2009), 1192–1202.

11. Chatterjee, P., Milanfar, P. Image denoising using locally learned dictionaries. Comput. Imag. VII, SPIE7246 (2009), 72460V–72460V.

12. Coifman, R., Donoho, D. Translation-invariant de-noising. In Wavelets and Statistics. Lecture Notes in Statistics (Springer Verlag, New York, 1995), 125–150.

13. Coupé, P., Yger, P., Barillot, C. Fast non local means denoising for 3D MR images. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2006. Lecture Notes in Computer Science (Springer, New York, 2006), 33–40.

14. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. I.P. 16, 8 (2007), 2080–2095.

15. Darbon, J., Cunha, A., Chan, T., Osher, S., Jensen, G. Fast nonlocal filtering applied to electron cryomicroscopy. In 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (Paris, France, May 14–17, 2008), 1331–1334.

16. Delon, J., Desolneux, A. Stabilization of flicker-like effects in image sequences through local contrast correction. SIAM J. Imag. Sci. 3, 4 (2010), 703–734.

17. Donoho, D. De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41 (1995), 613–627.

18. Ebrahimi, M., Vrscay, E. Solving the inverse problem of image zooming using "Self-Examples". In Volume 4633 of Lecture Notes in Computer Science (2007), 117.

19. Efros, A., Leung, T. Texture synthesis by non parametric sampling. In Volume 2 of Proceedings of the 7th IEEE International Conference on Computer Vision (Corfu, Greece, 1999), 1033–1038.

20. Elad, M., Datsenko, D. Example-based regularization deployed to superresolution reconstruction of a single image. Comput. J. 50, 4 (Apr. 2007), 1–16.

21. Elmoataz, A., Lezoray, O., Bougleux, S., Ta, V. Unifying local and nonlocal processing with partial difference operators on weighted graphs. In International Workshop on Local and Non-Local Approximation in Image Processing (Lausanne, Switzerland, August 23–24, 2008).

22. Gilboa, G., Osher, S. Nonlocal linear image regularization and supervised segmentation. Multiscale Model. Simul. 6, 2 (2008), 595–630.

23. Jung, M., Vese, L. Nonlocal variational image deblurring models in the presence of Gaussian or impulse noise. Scale Space and Variational Methods in Computer Vision (2009), 401–412.

24. Kervrann, C., Boulanger, J., Coupe, P. Bayesian non-local means filter image, redundancy and adaptive dictionaries for noise removal. In Volume 4485 of Lecture Notes in Computer Science (Ischia, Italy, May 30–June 2, 2007), 520.

25. Kindermann, S., Osher, S., Jones, P. Deblurring and denoising of images by nonlocal functionals. SIAM MMS 4, 4 (2006), 1091–1115.

26. Lezoray, O., Ta, V., Elmoataz, A. Nonlocal graph regularization for image colorization. In International Conference on Pattern Recognition (Tampa, FL, December 8–11, 2008).

27. Lindenbaum, M., Fischer, M., Bruckstein, A. On Gabor's contribution to image enhancement. Pattern Recognit. 27, 1 (1994), 1–8.

28. Lou, Y., Zhang, X., Osher, S., Bertozzi, A. Image recovery via nonlocal operators. J. Sci. Comput. 42, 2 (2010), 185–197.

29. Mahmoudi, M., Sapiro, G. Fast image and video denoising via nonlocal means of similar neighborhoods. IEEE Signal Process. Lett. 12, 12 (2005), 839–842.

30. Mairal, J., Elad, M., Sapiro, G. Sparse representation for color image restoration. IEEE Trans. I.P. 17, 1 (2007), 53–69.

31. Manjón, J., Carbonell-Caballero, J., Lull, J., García-Martí, G., Martí-Bonmatí, L., Robles, M. MRI denoising using Non-Local Means. Med. Image Anal. 12, 4 (2008), 514–523.

32. Mignotte, M. A non-local regularization strategy for image deconvolution. Pattern Recognit. Lett. 29, 16 (2008), 2206–2212.

33. Orchard, J., Ebrahimi, M., Wong, A. Efficient non-local-means denoising using the SVD. In Proceedings of IEEE International Conference on Image Processing (San Diego, CA, October 12–15, 2008).

34. Ordentlich, E., Seroussi, G., Verdu, M.W.S., Weissman, T. A discrete universal denoiser and its application to binary images. In Volume 1 of Proceedings of IEEE International Conference on Image Processing (Barcelona, Spain, 2003), 117–120.

35. Perona, P., Malik, J. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. PAMI 12, 7 (1990), 629–639.

36. Protter, M., Elad, M. Image sequence denoising via sparse and redundant representations. IEEE Trans. Image Process. 18, 1 (2009), 27–35.

37. Protter, M., Elad, M., Takeda, H., Milanfar, P. Generalizing the non-local-means to super-resolution reconstruction. IEEE Trans. Image Process. 18, 1 (January 2009), 36–51.

38. Roussas, G. Nonparametric regression estimation under mixing conditions. Stochastic Process. Appl. 36 (1990), 107–116.

39. Rudin, L., Osher, S., Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D 60 (1992), 259–268.

40. Smith, S., Brady, J. SUSAN : A new approach to low level image processing. Int. J. Comput. Vision 23, 1 (1997), 45–78.

41. Thacker, N., Manjon, J., Bromiley, P. A statistical interpretation of non-local means. In 5th International Conference on Visual Information Engineering (Xian, China, 2008), 250–255.

42. Tomasi, C., Manduchi, R. Bilateral filtering for gray and color images. In International Conference on Computer Vision (Bombay, India, 1998), 839–846.

43. Wang, J., Guo, Y., Ying, Y., Liu, Y., Peng, Q. Fast non-local algorithm for image denoising. In IEEE International Conference on Image Processing (Atlanta, GA, October 8–11, 2006), 1429–1432.

44. Wong, A., Orchard, J. A nonlocal-means approach to exemplar-based inpainting. In IEEE International Conference on Image Processing, 2600–2603, 2008.

45. Xu, H., Xu, J., Wu, F. On the biased estimation of Nonlocal Means filter. In IEEE International Conference on Multimedia and Expo (San Diego, CA, October 12–15, 2008), 1149–1152.

46. Yaroslavsky, L.P. Digital Picture Processing. Springer, New York, 1985.

47. Yoshizawa, S., Belyaev, A., Seidel, H. Smoothing by example: Mesh denoising by averaging with similarity-based weights. In IEEE International Conference on Shape Modeling and Applications (Matsushima, Japan, June 14–16, 2006), 9–9.

48. Zhang, L., Dong, W., Zhang, D., Shi, G. Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recognit. 43, 4 (2010), 1531–1549.

Back to Top


Antoni Buades (, Université Paris Descartes, 45, rue Saints Pères, Paris, France.

Bartomeu Coll (, Dpt Matematiques Informatica, Universitat Illes Balears, Ctra Valldemossa km 7.5, Palma de Mallorca, Spain.

Jean-Michel Morel (, CMLA , ENS Cachan, 61 av. Président Wilson, Cachan 94235, France.

Back to Top


The original version of this paper is entitled "A Review of Image Denoising Algorithms, with a New One." It was published in Multiscale Modeling and Simulation, 2005.


Back to Top


F1Figure 1. Magnitude of the tangent diffusion (continuous line) and normal diffusion (dashed line - -) of Theorem 4.

F2Figure 2. q1 and q2 have a large weight in NL-means because their similarity windows are similar to that of p, while the weight w(p, q3) is much smaller.

F3Figure 3. Image method noise. From left to right and from top to bottom: original image, Gaussian convolution, anisotropic filtering, total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and the NL-means algorithm. The parameters have been set for each method to remove a method noise with variance σ2 = 2.52.

F4Figure 4. On the right-hand side of each pair, we display the weight distribution used to estimate the central pixel of the left image by the NL-means algorithm. (a) In flat zones, the weights are distributed as a convolution filter (as a Gaussian convolution). (b) On straight edges, the weights are distributed in the direction of the edge (like for AF). (c) On curved edges, the weights favor pixels belonging to the same contour or level line, which is a strong improvement with respect to AF. (d) In a flat neighborhood, the weights are distributed in a gray-level neighborhood (like for a neighborhood filter). In the cases of (e) and (f), the weights are distributed across the more similar configurations, even though they are far away from the observed pixel. This shows a behavior similar to a nonlocal neighborhood filter or to an ideal Wiener filter.

F5Figure 5. NL-means denoising experiment with a Brodatz texture image. Left: Noisy image with standard deviation 30. Right: NL-means restored image. The Fourier transform of the noisy and restored images show how main features are preserved even at high frequencies.

F6Figure 6. Denoising experience on a periodic image. From left to right and from top to bottom: noisy image (standard deviation 35), total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filtering, and NL-means.

F7Figure 7. Denoising experience on a natural image. From left to right and from top to bottom: noisy image (standard deviation σ = 20), Gaussian convolution (h = 1.8), anisotropic filter (h = 2.4), total variation (λ = 0.04), the Yaroslavsky neighborhood filter (ρ = 7, h = 28), and the NL-means algorithm. Parameters have been set for each algorithm so that the removed quadratic energy is equal to the energy of the added noise.

F8Figure 8. Denoising experience on a natural image. From left to right and from top to bottom: noisy image (standard deviation 35), total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and NL-means.

F9Figure 9. The noise-to-noise criterion. From left to right and from top to bottom: original noise image of standard deviation 20, Gaussian convolution, anisotropic filtering, total variation, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and NL-means. Parameters have been fixed for each method so that the noise standard deviation is reduced by a factor 4.

F10Figure 10. Weight distribution of NL-means applied to a movie. In (a), (b), and (c) the first row shows a five frames image sequence. In the second row, the weight distribution used to estimate the central pixel (in white) of the middle frame is shown. The weights are equally distributed over the successive frames, including the current one. They actually involve all the candidates for the motion estimation instead of picking just one per frame. The aperture problem can be taken advantage of for a better denoising performance by involving more pixels in the average.

Back to Top


T1Table 1. Mean square error table.

Back to top

©2011 ACM  0001-0782/11/0500  $10.00

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.