
Neurocomputing

Volume 407, 24 September 2020, Pages 163-174

Dictionary learning with low mutual coherence constraint

https://doi.org/10.1016/j.neucom.2020.04.135

Abstract

This paper presents efficient algorithms for learning low-coherence dictionaries. First, a new algorithm based on proximal methods is proposed to solve the dictionary learning (DL) problem regularized with the mutual coherence of the dictionary. This is unlike previous approaches, which solve a regularized problem in which an approximate incoherence-promoting term, instead of the mutual coherence itself, is used to encourage low coherence. Then, a new solver is proposed for the constrained low-coherence DL problem, i.e., a DL problem with an explicit constraint on the mutual coherence of the dictionary. As opposed to current methods, which follow a suboptimal two-step approach, the new algorithm directly solves the associated DL problem. Building on previous studies, convergence of the new schemes to critical points of the associated cost functions is also established. Furthermore, it is shown that the proposed algorithms have lower iteration complexity than existing algorithms. Our simulation results on learning low-coherence dictionaries for natural image patches, as well as on image classification based on discriminative over-complete dictionary learning, demonstrate the superiority of the proposed algorithms over the state-of-the-art method.

Introduction

Sparsity has been a key concept in a wide range of signal processing and machine learning problems over the last decade [1]. In particular, sparse signal approximation has been extensively utilized in a variety of applications, including image enhancement [2], [1]. To be more precise, let $\mathbf{y} \in \mathbb{R}^n$ denote the target signal, and let $\{\mathbf{d}_i\}_{i=1}^{N}$ be a set of $N$ atoms collected as the columns of a so-called dictionary $\mathbf{D} = [\mathbf{d}_1, \ldots, \mathbf{d}_N]$. For more flexibility, the dictionary is usually chosen to be overcomplete, i.e., $N > n$. Then, the approximation of $\mathbf{y}$ over $\mathbf{D}$ is written as $\mathbf{y} \approx \sum_{i=1}^{N} x_i \mathbf{d}_i = \mathbf{D}\mathbf{x}$, where $\mathbf{x} \in \mathbb{R}^N$ is called the sparse representation vector, with most of its entries being zero. The sparse approximation problem is to find the sparsest $\mathbf{x}$. To this end, the following problem has to be solved:
$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2 \leq \epsilon, \tag{1}$$
where $\|\cdot\|_0$, the so-called $\ell_0$ (pseudo) norm, returns the number of non-zero entries, and $\epsilon \geq 0$ is an error tolerance. Various sparse recovery algorithms have been proposed, which are summarized in [3].
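Problem (1) is combinatorial in general, but greedy methods such as orthogonal matching pursuit approximate its solution efficiently [3]. The following minimal numpy sketch (our own illustration; the function name and interface are not from the paper) greedily grows the support until the error tolerance $\epsilon$ is met:

```python
import numpy as np

def omp(D, y, eps):
    """Greedy sparse approximation of y over a dictionary D with unit-norm
    columns: add the atom most correlated with the residual, refit the
    coefficients by least squares, and stop once ||y - D x||_2 <= eps."""
    residual, support = y.copy(), []
    coeffs = np.zeros(0)
    while np.linalg.norm(residual) > eps and len(support) < D.shape[1]:
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x
```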

The dictionary plays an important role in sparse approximation problems. There exist some predefined choices for the dictionary, including the discrete cosine transform (DCT) for natural images and Gabor dictionaries for speech signals [4]. Nevertheless, it has been shown that learned dictionaries, optimized over a set of training signals, outperform predefined ones in many applications [1], [4], [5], [6], [7]. This process is known as dictionary learning (DL) [4]. In DL, given a training data matrix $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_M]$, a dictionary $\mathbf{D}$ is learned from $\mathbf{Y}$ in such a way that it provides sufficiently sparse representations for the $\mathbf{y}_i$'s. This task can be formulated as the following problem:
$$\min_{\mathbf{D} \in \mathcal{D},\, \mathbf{X} \in \mathcal{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2, \tag{2}$$
where $\|\cdot\|_F$ is the Frobenius norm, and $\mathcal{D}$ and $\mathcal{X}$ are admissible sets of $\mathbf{D}$ and $\mathbf{X}$, respectively. $\mathcal{D}$ is usually defined as $\mathcal{D} \triangleq \{\mathbf{D} \in \mathbb{R}^{n \times N} \,|\, \forall i,\ \|\mathbf{d}_i\|_2 = 1\}$, which will also be considered in this paper. Furthermore, the set $\mathcal{X}$ constrains $\mathbf{X}$ to have sparse columns.

Numerous DL algorithms have been proposed [8], most of which follow an alternating minimization approach to solve the DL problem in (2). That is, starting with an initial estimate of the dictionary, most DL solvers alternate between the following two stages (a schematic sketch of this generic scheme is given after the list):

1. Sparse approximation (SA): The training signals are sparsely approximated over the current estimate of $\mathbf{D}$.

2. Dictionary update (DU): The dictionary is updated by minimizing the approximation error of the SA stage.
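The following numpy sketch makes the two stages concrete, using the `omp` routine from above for the SA stage and a least-squares update for the DU stage. It is a plain alternating-minimization baseline without any coherence control, not one of the algorithms proposed in this paper, and all names are our own:

```python
import numpy as np

def dictionary_learning(Y, N, eps, n_iter=20, seed=0):
    """Generic alternating-minimization DL for problem (2):
    alternate sparse approximation (SA) and dictionary update (DU)."""
    n, M = Y.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n, N))
    D /= np.linalg.norm(D, axis=0)            # start with unit-norm atoms
    for _ in range(n_iter):
        # SA stage: sparse-code every training signal over the current D
        X = np.column_stack([omp(D, Y[:, j], eps) for j in range(M)])
        # DU stage: least-squares fit, then project columns back to unit norm
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, X
```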

To ensure computational tractability and successful performance of sparse approximation algorithms, the dictionary must satisfy certain properties. In fact, substantial efforts in investigating the theoretical aspects of the sparse approximation problem have revealed that the uniqueness and stability of sparse approximation are directly related to the dictionary [9], [10]. Mutual coherence (MC) [11] and the restricted isometry property (RIP) [9] are two fundamental tools for evaluating the goodness of a dictionary. The MC of a dictionary $\mathbf{D}$ with normalized columns is defined as
$$\mu(\mathbf{D}) \triangleq \max_{i \neq j} |\langle \mathbf{d}_i, \mathbf{d}_j \rangle|. \tag{3}$$

MC simply measures the similarity between distinct atoms of a dictionary. For $n \times N$ dictionaries, the MC is lower bounded via
$$\mu \geq \sqrt{\frac{N - n}{n(N - 1)}}, \tag{4}$$
which is known as the Welch bound [12]. In addition, $\mathbf{D}$ is said to satisfy the RIP of order $s$ with constant $\delta_s$ if for any $s$-sparse signal $\mathbf{x}$ we have [13]
$$(1 - \delta_s)\|\mathbf{x}\|_2^2 \leq \|\mathbf{D}\mathbf{x}\|_2^2 \leq (1 + \delta_s)\|\mathbf{x}\|_2^2. \tag{5}$$
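As a quick numerical illustration (the helper functions below are our own, not from the paper), the MC in (3) and the Welch bound in (4) are straightforward to compute; a random overcomplete dictionary typically sits well above the bound:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| for a dictionary with unit-norm columns."""
    G = D.T @ D               # Gram matrix; off-diagonal holds pairwise inner products
    np.fill_diagonal(G, 0.0)
    return np.max(np.abs(G))

def welch_bound(n, N):
    """Lower bound sqrt((N - n) / (n (N - 1))) on mu(D) for any n x N dictionary."""
    return np.sqrt((N - n) / (n * (N - 1)))

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)
print(mutual_coherence(D))   # typically around 0.4-0.5 for this size
print(welch_bound(64, 128))  # ~0.089
```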

The existing RIP-based performance guarantees for sparse approximation algorithms show that smaller values of $\delta_s$ are favorable. Intuitively, according to (5), when $\delta_s \approx 0$ the dictionary behaves as an orthonormal basis and, thus, the stability and uniqueness of the approximation are better satisfied. Evaluating the RIP of a dictionary, however, is an NP-hard problem [14]. Nevertheless, it can be roughly verified using the MC. In fact, it has been shown that the RIP and MC of a dictionary are linked via $\delta_s \leq \mu(s - 1)$ [13]. Furthermore, it can be shown that $\delta_2 = \mu$ [1]. Consequently, MC has received much attention as a practical measure of performance [15], [16], [17].

Besides the sole sparse approximation problem, recent work indicates that MC and RIP play effective roles in theoretical investigations of the DL problem as a whole. For instance, it is argued in [18], [19] that a ground-truth dictionary is a local minimum of the DL cost function with an $\ell_1$ norm as the sparsity measure, provided that it is sufficiently incoherent, along with some other assumptions on the problem parameters. In this case, the dictionary is said to be locally identifiable [18]. Moreover, Wu and Yu [19] recently showed that, under some conditions, local identifiability is possible with sparsity level $s$ of the order $O(\mu^{-2})$ for a complete dictionary ($n = N$) with MC value $\mu$. Also, in [20], the RIP has been used as a determining assumption in establishing local linear convergence of the alternating minimization approach used to solve the DL problem.

The importance of MC, explained by the above discussions, has inspired several works proposing algorithms for learning low-MC dictionaries. These works can be classified into two main groups: regularized and constrained approaches. The first group [21], [22], [23] targets the following problem to update the dictionary:
$$\min_{\mathbf{D} \in \mathcal{D}} \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 + \lambda R(\mathbf{D}), \tag{6}$$
where $\lambda \geq 0$ is a regularization parameter, and
$$R(\mathbf{D}) \triangleq \|\mathbf{D}^T\mathbf{D} - \mathbf{I}\|_F^2. \tag{7}$$

The use of $R$ as an incoherence-promoting term is motivated by the following relation:
$$R(\mathbf{D}) = \sum_{i \neq j} |\langle \mathbf{d}_i, \mathbf{d}_j \rangle|^2 + \sum_i \left(\langle \mathbf{d}_i, \mathbf{d}_i \rangle - 1\right)^2, \tag{8}$$
where the first term is responsible for minimizing the average squared inner-products between distinct atoms, and the second term encourages the atoms to have unit norms.
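The identity in (8) is simply the entrywise expansion of the squared Frobenius norm in (7); a short numpy check (our own snippet) confirms that the two expressions agree:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

G = D.T @ D
R_frob = np.linalg.norm(G - np.eye(128), 'fro')**2            # definition (7)

off_diag = G - np.diag(np.diag(G))
R_sum = np.sum(off_diag**2) + np.sum((np.diag(G) - 1.0)**2)   # expansion (8)

assert np.isclose(R_frob, R_sum)
```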

The second group of methods aims at solving the following constrained problem:
$$\min_{\mathbf{D} \in \mathcal{D}} \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{s.t.} \quad \mu(\mathbf{D}) \leq \mu_0, \tag{9}$$
in which $\mu_0 > 0$ is a fixed target MC level. Incoherent K-SVD (INK-SVD) [24] and iterative projection-rotation DL (IPR-DL) [25] are representative algorithms of this group. INK-SVD solves (9) by first finding the minimizer (denoted by $\bar{\mathbf{D}}$) ignoring the MC constraint and then solving the following matrix nearness problem to reach the desired mutual coherence:
$$\min_{\mathbf{D}} \frac{1}{2}\|\mathbf{D} - \bar{\mathbf{D}}\|_F^2 \quad \text{s.t.} \quad \mu(\mathbf{D}) \leq \mu_0. \tag{10}$$

To solve (10), the authors of [24] proposed a decorrelation step in which sub-dictionaries of highly correlated atoms are iteratively identified and pairs of atoms are decorrelated until the desired MC, namely $\mu_0$, is reached. A disadvantage of this approach, as pointed out in [25], is that the approximation error, i.e., $\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F$, is not explicitly taken into account during the decorrelation process. To overcome this problem, Barchiesi et al. [25] proposed a novel decorrelation scheme consisting of two steps. In the first step, called atom decorrelation, an iterative projection algorithm is performed to ensure that the mutual coherence constraint is satisfied. This step is then followed by a dictionary rotation, in which the dictionary is rotated to minimize the approximation error without affecting its mutual coherence. This algorithm shows state-of-the-art performance, as confirmed by the simulations of [25].
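To give a flavor of the pairwise decorrelation idea, the following sketch (our own illustration of one standard symmetric rotation, not the exact INK-SVD procedure of [24]) sets the absolute inner-product of two unit-norm atoms exactly to $\mu_0$ by rotating both atoms symmetrically within their two-dimensional span:

```python
import numpy as np

def decorrelate_pair(d1, d2, mu0):
    """Symmetric decorrelation of two unit-norm atoms (assumed neither identical
    nor antipodal): rotate both within span{d1, d2} so that |<d1', d2'>| = mu0
    while keeping unit norms."""
    c = float(d1 @ d2)
    s = 1.0 if c >= 0 else -1.0               # reduce to the non-negative case
    d2 = s * d2
    u = (d1 + d2) / np.linalg.norm(d1 + d2)   # bisector of the pair
    v = (d1 - d2) / np.linalg.norm(d1 - d2)   # orthogonal companion
    theta = 0.5 * np.arccos(mu0)              # <d1', d2'> = cos(2 theta) = mu0
    d1_new = np.cos(theta) * u + np.sin(theta) * v
    d2_new = np.cos(theta) * u - np.sin(theta) * v
    return d1_new, s * d2_new
```

Since $\mathbf{u}$ and $\mathbf{v}$ are orthonormal, $\langle \mathbf{d}_1', \mathbf{d}_2' \rangle = \cos^2\theta - \sin^2\theta = \cos 2\theta$, so choosing $2\theta = \arccos \mu_0$ hits the target coherence exactly. Note, however, that decorrelating one pair can raise the coherence of other pairs, which is why such steps are iterated in [24].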

In this paper, we propose new algorithms for learning low-coherence dictionaries. Our motivation is that previous works cannot efficiently trade off between the representation ability of the learned dictionary and its MC. More precisely, as will be seen in Section 2, the term $R$ used in (6) is a very rough approximation of the MC. In fact, as its definition in (8) implies, this term only penalizes the average squared inner-products between distinct atoms. It is therefore not particularly effective at minimizing the maximum absolute inner-product between any two atoms, i.e., the MC. Furthermore, as $\lambda \to \infty$, problem (6) reduces to
$$\min_{\mathbf{D} \in \mathcal{D}} R(\mathbf{D}), \tag{11}$$
whose solutions are unit-norm tight frames (UNTFs) [26], [27], which are not guaranteed to have MCs as small as one would expect in this extreme scenario. In other words, a UNTF may have a large MC, so one would need to further optimize it to achieve a sufficiently small MC [28].

To address this issue, we introduce a new algorithm based on proximal methods [29] to solve the regularized (unconstrained) low-coherence DL problem, one that directly uses the MC definition in (3) instead of the approximate term $R$. Our simulations reveal that the new algorithm is able to learn dictionaries with MCs far smaller than those achieved by algorithms targeting problem (6). Additionally, it strikes a much better compromise between adapting the dictionary to the training set and minimizing the MC.

On the other hand, IPR-DL, the state-of-the-art algorithm, does not solve (9) directly. Moreover, singular value decomposition (SVD) and eigenvalue decomposition (EVD) are used in its structure, which are expensive operations in terms of computational load and memory usage, especially in high-dimensional data settings. Our next algorithm solves the constrained low-coherence DL problem (9) directly, without resorting to a suboptimal approach as in INK-SVD and IPR-DL. In addition, the proposed algorithm uses neither SVD nor EVD. As will be seen in Section 5, compared with IPR-DL, our proposed constrained algorithm is able to efficiently minimize the approximation error while satisfying the target level of MC.

Our proposed regularized low-coherence DL algorithm was previously introduced in a conference paper [30]. However, the algorithm proposed in the current work is different from the one presented in [30]. More precisely, in [30] an atom-by-atom approach is taken that updates the atoms sequentially, whereas in this work we update all the atoms at once. Moreover, here we also consider the constrained low-coherence DL problem, and in addition to presenting a new solver for the constrained case, we provide convergence guarantees, based on previous studies, for the dictionary update steps of both the regularized and constrained problems.

The rest of the paper is organized as follows. In Section 2, our new regularized low-coherence DL algorithm is introduced. The proposed constrained low-coherence DL algorithm is presented in Section 3. Some discussions concerning the implementation of our new algorithms, along with a computational complexity analysis, are given in Section 4. Simulation results are reported in Section 5.

Throughout the paper, small and capital boldface characters are used for vector- and matrix-valued quantities, respectively. The $(i,j)$-th entry of a matrix $\mathbf{X}$ is denoted by $x_{ij}$, while $\mathbf{x}_i$ designates its $i$-th column. The superscript $T$ stands for matrix transposition. The identity matrix is denoted by $\mathbf{I}$. For a vector $\mathbf{x}$, its $\ell_p$ norm ($p \geq 1$) is defined via $\|\mathbf{x}\|_p \triangleq (\sum_i |x_i|^p)^{1/p}$. For a matrix $\mathbf{X}$, we denote its $\infty$-norm (the maximum absolute value of its entries, not the induced operator norm) by $\|\mathbf{X}\|_\infty$, defined as $\|\mathbf{X}\|_\infty \triangleq \max_{i,j} |x_{ij}|$. An $\ell_p$ norm-ball of radius $r$ in $\mathbb{R}^n$ centered at the origin is defined as $\mathcal{B}_p^r \triangleq \{\mathbf{x} \in \mathbb{R}^n \,|\, \|\mathbf{x}\|_p \leq r\}$; the same notation is used for the $\ell_p$ matrix norm-ball. For two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, their inner-product is $\langle \mathbf{u}, \mathbf{v} \rangle \triangleq \sum_i u_i v_i$. The indicator function of a set $S$ is denoted by $\mathcal{I}_S(\mathbf{x})$, which takes the value $0$ when $\mathbf{x} \in S$, and $+\infty$ otherwise. The vectorization operator is denoted by $\mathrm{vec}(\cdot)$, which converts its matrix argument to an equivalent column vector by stacking its columns on top of each other; the inverse vectorization operator is denoted by $\mathrm{vec}^{-1}(\cdot)$. Moreover, the following definitions will be used.

Definition 1 [29]

The Euclidean projection of a point $\mathbf{x} \in \mathbb{R}^n$ onto a non-empty set $S \subseteq \mathbb{R}^n$ is defined as
$$P_S(\mathbf{x}) \triangleq \operatorname*{argmin}_{\mathbf{u} \in S} \frac{1}{2}\|\mathbf{x} - \mathbf{u}\|_2^2.$$

Definition 2 [29]

The proximal mapping of a convex function $g: \operatorname{dom} g \to \mathbb{R}$ is defined as
$$\operatorname{prox}_g(\mathbf{x}) \triangleq \operatorname*{argmin}_{\mathbf{u} \in \operatorname{dom} g} \frac{1}{2}\|\mathbf{x} - \mathbf{u}\|_2^2 + g(\mathbf{u}).$$

For $g(\mathbf{x}) = \mathcal{I}_S(\mathbf{x})$, the proximal mapping is simply the projection onto $S$ [29]. That is,
$$\operatorname{prox}_{\mathcal{I}_S}(\mathbf{x}) = P_S(\mathbf{x}).$$
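As a concrete instance (our own example), take $S = \mathcal{B}_\infty^r$: the Euclidean projection onto this $\ell_\infty$ norm-ball, and hence the prox of its indicator function, is simple entrywise clipping. This is precisely the kind of projection that the $\infty$-norm constraint appearing in Section 3 involves:

```python
import numpy as np

def project_linf_ball(x, r):
    """Euclidean projection onto B_inf^r, i.e. the prox of its indicator
    function: each entry is independently clipped into [-r, r]."""
    return np.clip(x, -r, r)

x = np.array([1.5, -0.2, 0.8])
print(project_linf_ball(x, 1.0))   # [ 1.  -0.2  0.8]
```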


Proposed regularized algorithm

In this section, a new regularized low-coherence DL algorithm, called RINC-DL (for regularized incoherent DL), is introduced, which minimizes the general DL cost function augmented with the MC term. To start, notice the following equivalent definition of MC for unit-norm dictionaries:
$$\mu(\mathbf{D}) = \|\mathbf{D}^T\mathbf{D} - \mathbf{I}\|_\infty. \tag{13}$$
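The equivalence (13) holds because, for unit-norm atoms, the diagonal of $\mathbf{D}^T\mathbf{D} - \mathbf{I}$ vanishes while its off-diagonal entries are exactly the pairwise inner-products; a short numpy check (our own snippet) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms

G = D.T @ D
mu_pairs = np.max(np.abs(G - np.diag(np.diag(G))))   # definition (3)
mu_inf = np.max(np.abs(G - np.eye(64)))              # formula (13)
assert np.isclose(mu_pairs, mu_inf)
```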

Using this formula, the regularized problem that we target is expressed as
$$\min_{\mathbf{D} \in \mathcal{D},\, \mathbf{X} \in \mathcal{X}} \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 + \lambda \|\mathbf{D}^T\mathbf{D} - \mathbf{I}\|_\infty.$$

Note that the regularization term does not depend on the coefficient matrix X.

Proposed constrained algorithm

Our next algorithm, called CINC-DL (for constrained incoherent DL), aims at directly solving (9). As already discussed, this is in contrast to the approach of IPR-DL, which first updates the dictionary neglecting the MC constraint and then iteratively optimizes the result to satisfy it. By replacing $\mu(\mathbf{D})$ with its equivalent definition in (13), problem (9) becomes
$$\min_{\mathbf{D} \in \mathcal{D}} \frac{1}{2}\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 \quad \text{s.t.} \quad \|\mathbf{D}^T\mathbf{D} - \mathbf{I}\|_\infty \leq \mu_0.$$

Our strategy for solving the above problem is the same as the one used in RINC-DL. […]
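The full CINC-DL update is not included in this preview. Purely as an illustration of the ingredients involved (a heuristic sketch under our own assumptions, not the authors' algorithm), one could combine a gradient step on the quadratic term with column re-normalization and the pairwise `decorrelate_pair` routine from the Introduction to push violating pairs back to the constraint boundary:

```python
import numpy as np

def constrained_du_step(D, X, Y, mu0, step=1e-2):
    """One heuristic dictionary-update step for problem (9) (illustrative only):
    gradient descent on 0.5 * ||Y - D X||_F^2, re-normalization onto unit-norm
    atoms, then symmetric decorrelation of atom pairs whose coherence exceeds
    mu0. One sweep may need repeating, since fixing one pair can affect others."""
    D = D - step * (D @ X - Y) @ X.T           # gradient of the quadratic term
    D /= np.linalg.norm(D, axis=0) + 1e-12     # project columns back to unit norm
    G = D.T @ D
    for i, j in zip(*np.where(np.triu(np.abs(G), 1) > mu0)):
        D[:, i], D[:, j] = decorrelate_pair(D[:, i], D[:, j], mu0)
    return D
```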

Discussion

Similar to IPR-DL, CINC-DL is designed for situations where a fixed upper bound on $\mu(\mathbf{D})$ is desired. On the other hand, RINC-DL targets applications in which the goal is to make a certain trade-off between minimizing the approximation error and $\mu(\mathbf{D})$. In contrast to CINC-DL and IPR-DL, RINC-DL reduces to plain DL, i.e., without the MC constraint, by simply setting $\lambda = 0$. An advantage of CINC-DL (and also RINC-DL) over IPR-DL is that, by directly solving the constrained […]

Simulation results

The promising performance of low-coherence DL algorithms has already been verified in many applications, including face recognition, object classification, image denoising [23], functional magnetic resonance imaging (fMRI) [22], and sparse approximation of musical signals [24], [25]. In this section, we compare the performance of our proposed algorithms with those of existing low-coherence DL algorithms, to see how well they balance reducing the representation error […]

Conclusion and future work

In this paper, we addressed dictionary learning (DL) with a mutual coherence constraint. In this regard, we proposed an unconstrained algorithm targeting the regularized DL problem, in which the mutual coherence function itself is used as the regularizer. A new constrained algorithm was also proposed, which solves the DL problem under an explicit mutual coherence constraint. Both algorithms are based on penalty methods and proximal approaches. Our computational analysis revealed that, compared with the […]

CRediT authorship contribution statement

Mostafa Sadeghi: Conceptualization, Methodology, Software, Writing - original draft. Massoud Babaie-Zadeh: Supervision, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the Iran National Science Foundation (INSF) under contract No. 96000780, and by the Research Deputy of Sharif University of Technology.


References (45)

  • S. Ravishankar et al., Efficient sum of outer products dictionary learning (SOUP-DIL) and its application to inverse problems, IEEE Trans. Comput. Imaging (2017).
  • B. Dumitrescu et al., Dictionary Learning Algorithms and Applications (2018).
  • E.J. Candès et al., Near-optimal signal recovery from random projections: universal encoding strategies, IEEE Trans. Inf. Theory (2006).
  • D.L. Donoho et al., Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inf. Theory (2001).
  • S. Foucart et al., A Mathematical Introduction to Compressive Sensing, Applied and Numerical Harmonic Analysis (2013).
  • A.M. Tillmann et al., The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Trans. Inf. Theory (2014).
  • J.A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inf. Theory (2004).
  • R. Gribonval et al., On the exponential convergence of matching pursuit in quasi-incoherent dictionaries, IEEE Trans. Inf. Theory (2006).
  • R. Gribonval et al., Dictionary identification – sparse matrix-factorization via $\ell_1$-minimization, IEEE Trans. Inf. Theory (2010).
  • S. Wu, B. Yu, Local identifiability of $\ell_1$-minimization dictionary learning: a sufficient and almost necessary...
  • A. Agarwal, A. Anandkumar, P. Jain, P. Netrapalli, R. Tandon, Learning sparsely used overcomplete dictionaries via...
  • C.D. Sigg et al., Learning dictionaries with bounded self-coherence, IEEE Signal Process. Lett. (2012).

    Mostafa Sadeghi received the B.Sc. degree from Ferdowsi University of Mashhad, Mashhad, Iran, in 2010, the M.Sc. degree from Sharif University of Technology, Tehran, Iran, in 2012, and the Ph.D. degree from the same university in 2018, all in electrical engineering. He is currently working as a postdoctoral researcher in the Perception team at Inria, Grenoble, France. His main research interests are latent variable generative models, unsupervised audio-visual speech enhancement, Bayesian inference and probabilistic machine learning, and local/global optimization.

    Massoud Babaie-Zadeh received the B.S. degree in electrical engineering from Isfahan University of Technology, Isfahan, Iran in 1994, and the M.S degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1996, and the Ph.D degree in Signal Processing from Institute National Polytechnique of Grenoble (INPG), Grenoble, France, in 2002. He received the best Ph.D. thesis award of INPG for his Ph.D. dissertation. Since 2003, he has been a faculty member of the Electrical Engineering Department of Sharif University of Technology, Tehran, IRAN, in which, he is currently a full professor. His main research areas are Blind Source Separation (BSS) and Sparsity-aware Signal Processing.
