Dictionary learning with low mutual coherence constraint
Introduction
Sparsity has been a key concept in a wide range of signal processing and machine learning problems over the last decade [1]. In particular, sparse signal approximation has been extensively utilized in a variety of applications, including image enhancement [2], [1]. To be more precise, let $\mathbf{y}\in\mathbb{R}^{n}$ denote the target signal, and let $\{\mathbf{d}_i\}_{i=1}^{N}$ be a set of $N$ atoms collected as the columns of a so-called dictionary $\mathbf{D}\in\mathbb{R}^{n\times N}$. For more flexibility, the dictionary is usually chosen to be overcomplete, i.e., $N>n$. Then, the approximation of $\mathbf{y}$ over $\mathbf{D}$ is written as $\mathbf{y}\approx\mathbf{D}\mathbf{x}$, where $\mathbf{x}\in\mathbb{R}^{N}$ is called the sparse representation vector, with most of its entries being zero. The sparse approximation problem is to find the sparsest $\mathbf{x}$. To this end, the following problem has to be solved
$$\min_{\mathbf{x}}\ \|\mathbf{x}\|_{0}\quad\text{s.t.}\quad\|\mathbf{y}-\mathbf{D}\mathbf{x}\|_{2}\le\epsilon,\tag{1}$$
where $\|\cdot\|_{0}$, the so-called $\ell_{0}$ (pseudo) norm, returns the number of non-zero entries, and $\epsilon$ is an error tolerance. Various sparse recovery algorithms have been proposed, which are summarized in [3].
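Problem (1) is NP-hard in general, but greedy methods approximate it well in practice. As an illustrative sketch (not one of the specific algorithms surveyed in [3]), a minimal orthogonal matching pursuit in numpy:

```python
import numpy as np

def omp(y, D, eps=1e-6, max_atoms=None):
    """Greedy orthogonal matching pursuit: a standard approximate solver for
    min ||x||_0  s.t.  ||y - D x||_2 <= eps  (problem (1))."""
    n, N = D.shape
    if max_atoms is None:
        max_atoms = n
    support = []
    coeffs = np.zeros(0)
    r = y.astype(float).copy()
    x = np.zeros(N)
    while np.linalg.norm(r) > eps and len(support) < max_atoms:
        support.append(int(np.argmax(np.abs(D.T @ r))))  # atom most correlated with residual
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coeffs                   # refit residual on the support
    x[support] = coeffs
    return x
```

On a random unit-norm dictionary and a 3-sparse synthetic signal, this recovers the representation up to numerical precision.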
The dictionary plays an important role in sparse approximation problems. There exist some predefined choices for the dictionary, including the discrete cosine transform (DCT) for natural images and Gabor dictionaries for speech signals [4]. Nevertheless, it has been shown that learned dictionaries, optimized over a set of training signals, outperform predefined ones in many applications [1], [4], [5], [6], [7]. This process is known as dictionary learning (DL) [4]. In DL, given a training data matrix $\mathbf{Y}=[\mathbf{y}_1,\dots,\mathbf{y}_L]$, a dictionary $\mathbf{D}$ is learned from $\mathbf{Y}$ in such a way that it provides sparse enough representations for the $\mathbf{y}_i$'s. This task can be formulated as the following problem
$$\min_{\mathbf{D}\in\mathcal{D},\,\mathbf{X}\in\mathcal{X}}\ \|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}^{2},\tag{2}$$
where $\|\cdot\|_{F}$ is the Frobenius norm, and $\mathcal{D}$ and $\mathcal{X}$ are admissible sets of $\mathbf{D}$ and $\mathbf{X}$, respectively. $\mathcal{D}$ is usually defined as $\mathcal{D}=\{\mathbf{D}:\ \|\mathbf{d}_i\|_{2}=1,\ i=1,\dots,N\}$, which will also be considered in this paper. Furthermore, the set $\mathcal{X}$ constrains $\mathbf{X}$ to have sparse columns.
Numerous DL algorithms have been proposed [8], most of which follow an alternating minimization approach to solve the DL problem in (2). That is, starting with an initial estimate of the dictionary, they alternate between the following two stages:
1. Sparse approximation (SA): The training signals are sparsely approximated over the current estimate of $\mathbf{D}$.
2. Dictionary update (DU): The dictionary is updated by minimizing the approximation error of the SA stage.
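The two-stage loop above can be sketched in a few lines of numpy. This is an illustrative skeleton only: the SA stage here is a simple correlate-then-refit variant and the DU stage is a MOD-style least-squares update, not the specific solvers surveyed in [8].

```python
import numpy as np

def sparse_approx(Y, D, s):
    """SA stage (illustrative variant): for each training signal, select the s
    atoms most correlated with it and least-squares fit on that support."""
    X = np.zeros((D.shape[1], Y.shape[1]))
    for j in range(Y.shape[1]):
        supp = np.argsort(-np.abs(D.T @ Y[:, j]))[:s]
        coeffs, *_ = np.linalg.lstsq(D[:, supp], Y[:, j], rcond=None)
        X[supp, j] = coeffs
    return X

def dictionary_update(Y, X, rng):
    """DU stage (MOD-style least-squares update, used here for illustration):
    D = Y X^+ followed by column normalization; unused atoms are re-seeded."""
    D = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D, axis=0)
    dead = norms < 1e-10
    D[:, dead] = rng.standard_normal((D.shape[0], int(dead.sum())))
    return D / np.linalg.norm(D, axis=0)

def learn_dictionary(Y, N, s, iters=5, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], N))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        X = sparse_approx(Y, D, s)        # stage 1: SA
        D = dictionary_update(Y, X, rng)  # stage 2: DU
    return D, X
```

Re-seeding unused atoms is a common practical fix: if an atom is never selected in the SA stage, its least-squares update degenerates to the zero column.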
To ensure computational tractability and successful performance of sparse approximation algorithms, the dictionary must satisfy certain properties. In fact, substantial efforts in investigating the theoretical aspects of the sparse approximation problem have revealed that the uniqueness and stability of sparse approximation are directly related to the dictionary [9], [10]. Mutual coherence (MC) [11] and the restricted isometry property (RIP) [9] are two fundamental tools for evaluating the goodness of a dictionary. The MC of a dictionary $\mathbf{D}$ with normalized columns is defined as
$$\mu(\mathbf{D})=\max_{i\neq j}\ |\mathbf{d}_i^{T}\mathbf{d}_j|.\tag{3}$$
MC simply measures the similarity between distinct atoms of a dictionary. For $n\times N$ dictionaries, the MC is lower bounded via
$$\mu(\mathbf{D})\ge\sqrt{\frac{N-n}{n(N-1)}},\tag{4}$$
which is known as the Welch bound [12]. In addition, $\mathbf{D}$ is said to satisfy the RIP of order $s$ with constant $\delta_s$ if for any $s$-sparse signal $\mathbf{x}$ we have [13]
$$(1-\delta_s)\|\mathbf{x}\|_{2}^{2}\le\|\mathbf{D}\mathbf{x}\|_{2}^{2}\le(1+\delta_s)\|\mathbf{x}\|_{2}^{2}.\tag{5}$$
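Definitions (3) and (4) are direct to compute. The following sketch evaluates the MC of a random unit-norm dictionary and the corresponding Welch bound; the function names are illustrative:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| for column-normalized D (Eq. (3))."""
    G = D.T @ D
    np.fill_diagonal(G, 0.0)  # drop the unit self-correlations
    return float(np.max(np.abs(G)))

def welch_bound(n, N):
    """Lower bound (4) on the MC of an n x N dictionary."""
    return float(np.sqrt((N - n) / (n * (N - 1))))

rng = np.random.default_rng(0)
n, N = 16, 64
D = rng.standard_normal((n, N))
D /= np.linalg.norm(D, axis=0)  # normalize atoms
mu = mutual_coherence(D)
```

A random Gaussian dictionary sits well above the Welch bound, which is one motivation for explicitly optimizing the coherence.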
The existing RIP-based performance guarantees for sparse approximation algorithms show that smaller values of $\delta_s$ are favorable. Intuitively, according to (5), when $\delta_s\to 0$ the dictionary behaves as an orthonormal basis, and so the stability and uniqueness of the approximation are better satisfied. Evaluating the RIP for a dictionary, however, is an NP-hard problem [14]. Nevertheless, it can be roughly verified using MC. In fact, it has been shown that the RIP and MC of a dictionary are linked via $\delta_s\le(s-1)\mu(\mathbf{D})$ [13]. Furthermore, it can be shown that the sparsest representation is unique whenever $s<\frac{1}{2}\left(1+1/\mu(\mathbf{D})\right)$ [1]. Consequently, MC has received much attention as a practical measure of performance [15], [16], [17].
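The link $\delta_s\le(s-1)\mu$ follows from Gershgorin's disc theorem applied to the Gram matrix of any $s$-atom subdictionary: its diagonal entries are 1 and each off-diagonal row sum is at most $(s-1)\mu$. A quick numerical sanity check (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, s = 32, 96, 4
D = rng.standard_normal((n, N))
D /= np.linalg.norm(D, axis=0)
G = D.T @ D
np.fill_diagonal(G, 0.0)
mu = float(np.max(np.abs(G)))

# By Gershgorin, the eigenvalues of every s-atom Gram matrix lie in
# [1 - (s-1)mu, 1 + (s-1)mu] -- exactly the RIP bounds with delta_s <= (s-1)mu.
for _ in range(200):
    S = rng.choice(N, size=s, replace=False)
    eig = np.linalg.eigvalsh(D[:, S].T @ D[:, S])
    assert 1 - (s - 1) * mu - 1e-10 <= eig.min()
    assert eig.max() <= 1 + (s - 1) * mu + 1e-10
```

The check samples 200 random supports; the bound itself holds deterministically for all supports.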
Besides the sparse approximation problem on its own, recent work indicates that MC and RIP play effective roles in theoretical investigations of the DL problem as a whole. For instance, it is argued in [18], [19] that a ground-truth dictionary is a local minimum of the DL cost function with an $\ell_1$ norm as the sparsity measure, provided that it is sufficiently incoherent, along with some other assumptions on the problem parameters. In this case, the dictionary is said to be locally identifiable [18]. Moreover, Wu and Yu [19] showed recently that, under some conditions, local identifiability is possible for a complete dictionary ($N=n$) with a sparsity level $s$ whose order is governed by the MC of the dictionary. Also, in [20], the RIP has been used as a determining assumption in providing local linear convergence for the alternating minimization approach used to solve the DL problem.
The importance of MC, explained by the above discussions, has inspired some work to propose algorithms for learning low-MC dictionaries. These works can be classified into two main groups: regularized and constrained approaches. The first group [21], [22], [23] targets the following problem to update the dictionary
$$\min_{\mathbf{D}}\ \|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}^{2}+\lambda\, g(\mathbf{D}),\tag{6}$$
where $\lambda$ is a regularization parameter, and
$$g(\mathbf{D})=\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2}.\tag{7}$$
The use of $\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2}$ as an incoherence-promoting term is motivated by the following relation
$$\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2}=\sum_{i\neq j}\left(\mathbf{d}_i^{T}\mathbf{d}_j\right)^{2}+\sum_{i}\left(\|\mathbf{d}_i\|_{2}^{2}-1\right)^{2},\tag{8}$$
where the first term is responsible for minimizing the average squared inner products between distinct atoms, and the last term encourages the atoms to have unit norms.
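The decomposition in (8) is a simple algebraic identity for the Gram matrix, and can be verified numerically on an arbitrary (not even normalized) matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 12))  # deliberately not column-normalized

G = D.T @ D
lhs = np.linalg.norm(G - np.eye(12), 'fro') ** 2
cross = np.sum(G ** 2) - np.sum(np.diag(G) ** 2)          # sum_{i!=j} (d_i^T d_j)^2
unit = np.sum((np.linalg.norm(D, axis=0) ** 2 - 1) ** 2)  # sum_i (||d_i||^2 - 1)^2
```

The two right-hand terms sum exactly to the squared Frobenius norm on the left.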
The second group of methods aims at solving the following constrained problem
$$\min_{\mathbf{D}\in\mathcal{D},\,\mathbf{X}\in\mathcal{X}}\ \|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}^{2}\quad\text{s.t.}\quad\mu(\mathbf{D})\le\mu_0,\tag{9}$$
in which $\mu_0$ is a fixed target MC level. Incoherent K-SVD (INK-SVD) [24] and iterative projection-rotation DL (IPR-DL) [25] are sample algorithms of this group. INK-SVD solves (9) by first finding the minimizer (denoted by $\mathbf{D}^{\star}$) while ignoring the MC constraint, and then solving the following matrix nearness problem to reach the desired mutual coherence:
$$\min_{\mathbf{D}}\ \|\mathbf{D}-\mathbf{D}^{\star}\|_{F}^{2}\quad\text{s.t.}\quad\mu(\mathbf{D})\le\mu_0.\tag{10}$$
To solve (10), the authors of [24] proposed a decorrelation step in which sub-dictionaries of highly correlated atoms are iteratively identified and pairs of atoms are decorrelated until the desired MC, namely $\mu_0$, is reached. A disadvantage of this approach, as pointed out in [25], is that the approximation error, i.e., $\|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}$, is not explicitly taken into account during the decorrelation process. To overcome this problem, Barchiesi et al. [25] proposed a novel decorrelation scheme consisting of two steps. In the first step, called atom decorrelation, an iterative projection algorithm is performed that ensures that the mutual coherence constraint is satisfied. This step is then followed by a dictionary rotation, in which the dictionary is rotated to minimize the approximation error without affecting its mutual coherence. This algorithm shows state-of-the-art performance, as confirmed by the simulations of [25].
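The pairwise decorrelation idea can be sketched as follows: two unit-norm atoms are rotated symmetrically within their two-dimensional span until their inner product reaches the target $\mu_0$. This is only an illustration of the geometric principle, not the exact implementation of [24]:

```python
import numpy as np

def decorrelate_pair(d1, d2, mu0):
    """Symmetrically rotate two unit-norm atoms within their 2-D span so that
    |<d1, d2>| drops exactly to mu0 (assumes d1 != +/- d2)."""
    sign = 1.0 if d1 @ d2 >= 0 else -1.0
    d2s = sign * d2                               # work with a positive inner product
    u = d1 + d2s; u /= np.linalg.norm(u)          # bisector of the pair
    v = d1 - d2s; v /= np.linalg.norm(v)          # orthogonal direction in the span
    a = np.arccos(np.clip(mu0, 0.0, 1.0)) / 2.0   # half-angle: cos(2a) = mu0
    e1 = np.cos(a) * u + np.sin(a) * v
    e2 = np.cos(a) * u - np.sin(a) * v
    return e1, sign * e2
```

Since $u$ and $v$ are orthonormal, the new atoms stay unit-norm and satisfy $\mathbf{e}_1^T\mathbf{e}_2 = \cos^2 a - \sin^2 a = \mu_0$ by construction.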
In this paper, we propose new algorithms for learning low-coherence dictionaries. Our motivation is that the previous work cannot efficiently compromise between the representation ability of the learned dictionary and its MC. More precisely, as will be seen later in Section 2, the term $\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2}$ used in (6) is a very rough approximation of MC. In fact, as its expansion in (8) implies, this term only penalizes the average squared inner products between distinct atoms. Therefore, it is not particularly effective at minimizing the maximum absolute inner product between any two atoms, i.e., the MC. Furthermore, when $\lambda\to\infty$, problem (6) reduces to
$$\min_{\mathbf{D}}\ \|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2},\tag{11}$$
whose solutions are unit-norm tight frames (UNTFs) [26], [27], which are not guaranteed to have MCs as small as one would expect in this extreme scenario. In other words, a UNTF may have a large MC, so one would need to further optimize it to achieve a small enough MC [28].
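The claim that a UNTF may have a large MC is easy to witness with a toy construction: stacking two copies of an orthonormal basis yields unit-norm atoms and a tight frame ($\mathbf{D}\mathbf{D}^{T}=2\mathbf{I}$), yet the duplicated atoms give the maximal coherence value of 1:

```python
import numpy as np

n = 4
D = np.hstack([np.eye(n), np.eye(n)])  # unit-norm columns, N = 2n atoms
frame_operator = D @ D.T               # equals (N/n) I = 2I, so D is a UNTF
G = D.T @ D
np.fill_diagonal(G, 0.0)
mu = float(np.max(np.abs(G)))          # duplicated atoms give mu = 1
```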
To address this issue, we introduce a new algorithm based on proximal methods [29] to solve the regularized (unconstrained) low-coherence DL problem, one that directly uses the MC definition in (3) instead of the approximate term $\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_{F}^{2}$. Our simulations reveal that the new algorithm is able to learn dictionaries with MCs far smaller than those achieved by algorithms targeting problem (6). Additionally, it makes a much better compromise between adapting the dictionary to the training set and minimizing the MC.
On the other hand, IPR-DL, the state-of-the-art algorithm, does not solve (9) directly. Moreover, singular value decomposition (SVD) and eigenvalue decomposition (EVD) are used in its structure, which are expensive operations in terms of computational load and memory usage, especially in high-dimensional data settings. Our next algorithm solves the constrained low-coherence DL problem (9) directly, without resorting to a suboptimal approach as in INK-SVD and IPR-DL. In addition, the proposed algorithm does not use SVD or EVD. As will be seen in Section 5, compared with IPR-DL, our proposed constrained algorithm is able to efficiently minimize the approximation error while satisfying the target MC level.
Our proposed regularized low-coherence DL algorithm was already introduced in a conference paper [30]. However, the algorithm proposed in the current work differs from the one presented in [30]. More precisely, in [30] an atom-by-atom approach is taken that updates the atoms sequentially, whereas in this work we update all the atoms at once. Moreover, here we also consider the constrained low-coherence DL problem, and in addition to presenting a new solver for the constrained case, we provide convergence guarantees for the dictionary update steps of both the regularized and constrained problems based on previous studies.
The rest of the paper is organized as follows. In Section 2, our new regularized low-coherence DL algorithm is introduced. The proposed constrained low-coherence DL algorithm is introduced in Section 3. Discussions concerning the implementation of our new algorithms, together with a computational complexity analysis, are given in Section 4. Simulation results are reported in Section 5.
Throughout the paper, small and capital boldface characters are used for vector- and matrix-valued quantities, respectively. The $(i,j)$-th entry of a matrix $\mathbf{A}$ is denoted by $a_{ij}$, while $\mathbf{a}_i$ designates its $i$-th column. The superscript $T$ stands for matrix transposition. The identity matrix is denoted by $\mathbf{I}$. For a vector $\mathbf{x}$, its $\ell_p$ norm ($p\ge 1$) is defined via $\|\mathbf{x}\|_p=\left(\sum_i |x_i|^p\right)^{1/p}$. For a matrix $\mathbf{A}$, we denote its $\ell_\infty$ norm by $\|\mathbf{A}\|_\infty$ and define it as $\|\mathbf{A}\|_\infty=\max_{i,j}|a_{ij}|$. An $\ell_p$ norm-ball of radius $r$ in $\mathbb{R}^n$ centered around the origin is defined as $\{\mathbf{x}:\|\mathbf{x}\|_p\le r\}$. The same notation is used for the matrix norm-ball. For two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, their inner product is represented as $\langle\mathbf{x},\mathbf{y}\rangle=\mathbf{x}^{T}\mathbf{y}$. The indicator function of a set $\mathcal{C}$ is denoted by $\iota_{\mathcal{C}}$, which takes the value 0 when $\mathbf{x}\in\mathcal{C}$, and $+\infty$ otherwise. The vectorization operator is denoted by $\operatorname{vec}(\cdot)$, which converts its matrix argument to an equivalent column vector by stacking its columns on top of each other. The inverse vectorization operator is denoted by $\operatorname{vec}^{-1}(\cdot)$. Moreover,
Definition 1 [29]: The Euclidean projection of a point $\mathbf{x}$ onto a non-empty set $\mathcal{C}$ is defined as $\mathcal{P}_{\mathcal{C}}(\mathbf{x})=\operatorname{argmin}_{\mathbf{u}\in\mathcal{C}}\ \|\mathbf{u}-\mathbf{x}\|_2$.
Definition 2 [29]: The proximal mapping of a convex function $h$ is defined as $\operatorname{prox}_h(\mathbf{x})=\operatorname{argmin}_{\mathbf{u}}\ h(\mathbf{u})+\frac{1}{2}\|\mathbf{u}-\mathbf{x}\|_2^2$.
For $h=\iota_{\mathcal{C}}$, the proximal mapping is simply the projection onto $\mathcal{C}$ [29]. That is, $\operatorname{prox}_{\iota_{\mathcal{C}}}(\mathbf{x})=\mathcal{P}_{\mathcal{C}}(\mathbf{x})$.
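This prox-equals-projection fact is concrete for simple sets. For the $\ell_2$ ball, for instance, both Definitions 1 and 2 reduce to radial shrinkage (an illustrative sketch; the function name is ours):

```python
import numpy as np

def prox_indicator_l2_ball(x, r):
    """prox of the indicator of {u : ||u||_2 <= r} equals the Euclidean
    projection onto that ball: scale x back to the boundary if it lies outside."""
    nx = np.linalg.norm(x)
    return x.copy() if nx <= r else (r / nx) * x
```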
Proposed regularized algorithm
In this section, a new regularized low-coherence DL algorithm, called RINC-DL (for regularized incoherent DL), is introduced, which tries to minimize the general DL cost function augmented with the MC term. To start, notice the following equivalent definition of MC for unit-norm dictionaries:
$$\mu(\mathbf{D})=\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_\infty.\tag{13}$$
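For unit-norm atoms the diagonal of $\mathbf{D}^{T}\mathbf{D}-\mathbf{I}$ vanishes, so the entrywise $\ell_\infty$ norm picks out exactly the largest off-diagonal inner product. A quick numerical check of this equivalence (illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 40))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms

G = D.T @ D
mu_inf = float(np.max(np.abs(G - np.eye(40))))  # ||D^T D - I||_inf, Eq. (13)
Goff = G.copy()
np.fill_diagonal(Goff, 0.0)
mu_def = float(np.max(np.abs(Goff)))            # max_{i != j} |<d_i, d_j>|, Eq. (3)
```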
Using this formula, the regularized problem that we target is expressed as
$$\min_{\mathbf{D}\in\mathcal{D},\,\mathbf{X}\in\mathcal{X}}\ \|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}^{2}+\lambda\,\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_\infty.$$
Note that the regularization term does not depend on the coefficient matrix $\mathbf{X}$.
Proposed constrained algorithm
Our next algorithm, called CINC-DL (for constrained incoherent DL), aims at directly solving (9). As already discussed, this is in contrast to the approach of IPR-DL, which first updates the dictionary neglecting the MC constraint and then iteratively optimizes the result to satisfy it. By replacing $\mu(\mathbf{D})$ with its equivalent definition in (13), problem (9) becomes
$$\min_{\mathbf{D}\in\mathcal{D},\,\mathbf{X}\in\mathcal{X}}\ \|\mathbf{Y}-\mathbf{D}\mathbf{X}\|_{F}^{2}\quad\text{s.t.}\quad\|\mathbf{D}^{T}\mathbf{D}-\mathbf{I}\|_\infty\le\mu_0.$$
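Written this way, the MC constraint is an entrywise box constraint on the Gram matrix, and projecting a Gram matrix onto it is simple clipping. The sketch below shows only this building block (the function name is ours; a full dictionary update must additionally map the clipped Gram matrix back to a feasible dictionary):

```python
import numpy as np

def project_gram(G, mu0):
    """Project a Gram matrix onto {H : |h_ij| <= mu0 for i != j, h_ii = 1}:
    clip the off-diagonal entries and restore the unit diagonal."""
    H = np.clip(G, -mu0, mu0)
    np.fill_diagonal(H, 1.0)
    return H
```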
Our strategy for solving the above problem is the same as the one used in RINC-DL. To
Discussion
Similar to IPR-DL, CINC-DL is designed for situations where a fixed upper bound on $\mu(\mathbf{D})$ is desired. On the other hand, RINC-DL targets applications in which the goal is to make a certain trade-off between minimizing the approximation error and minimizing $\mu(\mathbf{D})$. Compared with CINC-DL and IPR-DL, RINC-DL can be reduced to plain DL, i.e., without the MC constraint, by simply setting $\lambda=0$. An advantage of CINC-DL (and also RINC-DL) over IPR-DL is that, by directly solving the constrained
Simulation results
The promising performance of low-coherence DL algorithms has already been verified in many applications, including face recognition, object classification, image denoising [23], functional magnetic resonance imaging (fMRI) [22], and sparse approximation of musical signals [24], [25]. In this section, we compare the performance of our proposed algorithms with that of existing low-coherence DL algorithms, to see how well they can balance reducing the representation error
Conclusion and future work
In this paper, we addressed dictionary learning (DL) with mutual coherence constraint. In this regard, we proposed an unconstrained algorithm which targets the regularized DL problem in which the mutual coherence function is used as the regularizer. A new constrained algorithm was also proposed which solves the DL problem under a mutual coherence constraint. Both algorithms are based on penalty methods and proximal approaches. Our computational analysis revealed that, compared with the
CRediT authorship contribution statement
Mostafa Sadeghi: Conceptualization, Methodology, Software, Writing - original draft. Massoud Babaie-Zadeh: Supervision, Validation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This paper has been supported in part by the Iran National Science Foundation (INSF) under contract No. 96000780, and by the Research Deputy of Sharif University of Technology.
References (45)
- et al., "Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0<q\le 1$," Appl. Comput. Harmonic Anal., 2009.
- et al., "Grassmannian frames with applications to coding and communication," Appl. Comput. Harmonic Anal., 2003.
- "On the conditioning of random subdictionaries," Appl. Comput. Harmonic Anal., 2008.
- et al., "On the use of unit-norm tight frames to improve the average MSE performance in compressive sensing applications," IEEE Signal Process. Lett., 2012.
- Sparse and Redundant Representations, 2010.
- et al., "Atomic decomposition by basis pursuit," SIAM Rev., 2001.
- et al., "Computational methods for sparse solution of linear inverse problems," Proc. IEEE, 2010.
- et al., "Dictionaries for sparse representation modeling," Proc. IEEE, 2010.
- et al., "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Trans. Med. Imaging, 2011.
- S. Ravishankar, J.C. Ye, J.A. Fessler, "Image reconstruction: from sparsity to data-adaptive methods and machine...
- "Efficient sum of outer products dictionary learning (SOUP-DIL) and its application to inverse problems," IEEE Trans. Comput. Imaging.
- Dictionary Learning Algorithms and Applications.
- "Near-optimal signal recovery from random projections: universal encoding strategies," IEEE Trans. Inf. Theory.
- "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inf. Theory.
- A Mathematical Introduction to Compressive Sensing, Applied and Numerical Harmonic Analysis.
- "The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing," IEEE Trans. Inf. Theory.
- "Greed is good: algorithmic results for sparse approximation," IEEE Trans. Inf. Theory.
- "On the exponential convergence of matching pursuit in quasi-incoherent dictionaries," IEEE Trans. Inf. Theory.
- "Dictionary identification - sparse matrix-factorization via $\ell_1$-minimization," IEEE Trans. Inf. Theory.
- "Learning dictionaries with bounded self-coherence," IEEE Signal Process. Lett.
Mostafa Sadeghi received the B.Sc. degree from Ferdowsi University of Mashhad, Mashhad, Iran, in 2010, the M.Sc. degree from Sharif University of Technology, Tehran, Iran, in 2012, and the Ph.D. degree from the same university in 2018, all in electrical engineering. He is currently working as a postdoctoral researcher in the Perception team at Inria, Grenoble, France. His main research interests are latent variable generative models, unsupervised audio-visual speech enhancement, Bayesian inference and probabilistic machine learning, and local/global optimization.
Massoud Babaie-Zadeh received the B.S. degree in electrical engineering from Isfahan University of Technology, Isfahan, Iran, in 1994, the M.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1996, and the Ph.D. degree in signal processing from the Institut National Polytechnique de Grenoble (INPG), Grenoble, France, in 2002. He received the best Ph.D. thesis award of INPG for his Ph.D. dissertation. Since 2003, he has been a faculty member of the Electrical Engineering Department of Sharif University of Technology, Tehran, Iran, where he is currently a full professor. His main research areas are blind source separation (BSS) and sparsity-aware signal processing.