Proximity Operator of $\lambda\|Ax\|_2$

Given a closed, convex and proper function $f\colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, its proximity operator is defined as

$$\operatorname{prox}_f(y) = \operatorname*{argmin}_{x \in \mathbb{R}^n}\; f(x) + \tfrac{1}{2}\|x - y\|_2^2.$$

The scaled Euclidean norm $f(x) = \lambda\|x\|_2$ with $\lambda > 0$ has a closed-form proximity operator given by

$$\operatorname{prox}_{\lambda\|\cdot\|_2}(y) = \begin{cases} \left(1 - \dfrac{\lambda}{\|y\|_2}\right) y, & \text{if } \|y\|_2 > \lambda, \\[6pt] 0, & \text{otherwise.} \end{cases}$$

This can be derived using the Moreau identity or by applying optimality conditions directly.
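For readers who want to compute with this formula, here is a minimal NumPy sketch of the block soft-thresholding rule above; the function name `prox_scaled_norm` is an illustrative choice, not a standard API.

```python
import numpy as np

def prox_scaled_norm(y, lam):
    """Proximity operator of lam * ||.||_2 (block soft-thresholding)."""
    norm_y = np.linalg.norm(y)
    if norm_y <= lam:
        # Below the threshold, the prox collapses y to zero.
        return np.zeros_like(y)
    # Otherwise shrink y radially by lam.
    return (1.0 - lam / norm_y) * y
```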

We now consider a generalization of this result. Suppose $A \in \mathbb{R}^{m \times n}$ is any real-valued matrix. There is a closed-form expression for the proximity operator of $\lambda\|Ax\|_2$, provided we interpret “closed-form” liberally to allow the solution of a one-dimensional secular equation. Stack Exchange user River Li posted a similar formula for diagonal $A$ in 2020, but the result extends to arbitrary matrices $A$.

Theorem: Proximity Operator of $\lambda\|Ax\|_2$

Let $A \in \mathbb{R}^{m \times n}$ and $\lambda > 0$. Denote the positive singular values of $A$ by $\sigma_1 \ge \cdots \ge \sigma_r > 0$ and the corresponding right-singular vectors by $v_1, \dots, v_r$. Then, for every $y \in \mathbb{R}^n$,

$$\operatorname{prox}_{\lambda\|A(\cdot)\|_2}(y) = \begin{cases} P_{\ker A}\, y, & \text{if } \|(A^\dagger)^T y\|_2 \le \lambda, \\[6pt] \left(I + \dfrac{\lambda}{\mu} A^T A\right)^{-1} y, & \text{otherwise,} \end{cases} \tag{1}$$

where $P_{\ker A}$ is the projection onto the kernel (also known as the null space) of $A$, $A^\dagger$ denotes the Moore–Penrose pseudoinverse of $A$, and $\mu > 0$ is the unique positive solution to the equation

$$\sum_{i=1}^{r} \frac{\sigma_i^2\,(v_i^T y)^2}{(\mu + \lambda\sigma_i^2)^2} = 1. \tag{2}$$

Before demonstrating this result, we make several observations about its structure. The norm threshold condition $\|(A^\dagger)^T y\|_2 \le \lambda$ can be equivalently written as $\|\Sigma_1^{-1} V_1^T y\|_2 \le \lambda$, where $\Sigma_1 = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$ and $V_1 = [\,v_1 \ \cdots \ v_r\,]$, using the fact that the Euclidean norm is invariant under orthogonal transformations. Equation (2) generally yields a polynomial equation in $\mu$ of degree $2r$, which means it does not admit a closed-form solution for $r \ge 3$. When $\|(A^\dagger)^T y\|_2 > \lambda$, however, equation (2) has a unique positive solution. The expression on the left-hand side is strictly decreasing for $\mu \ge 0$, taking the value $\|(A^\dagger)^T y\|_2^2/\lambda^2 > 1$ at $\mu = 0$ and decreasing to zero as $\mu \to \infty$. These monotonicity properties guarantee the existence and uniqueness of the solution.
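These observations already suggest a numerical recipe: compute an SVD of $A$ once, test the threshold condition, and otherwise solve the one-dimensional secular equation (2) with a bracketed root-finder. The sketch below follows that recipe under the assumption that NumPy and SciPy are available; the function name `prox_norm_Ax`, the rank tolerance, and the bracketing upper bound for the root are my own illustrative choices, not part of the theorem.

```python
import numpy as np
from scipy.optimize import brentq

def prox_norm_Ax(y, A, lam):
    """Prox of x -> lam * ||A x||_2 at y, following equations (1)-(2)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    tol = max(A.shape) * np.finfo(float).eps * (s[0] if s.size else 0.0)
    r = int(np.sum(s > tol))                    # numerical rank
    if r == 0:
        return y.copy()                         # A = 0: the prox is the identity
    sr = s[:r]                                  # positive singular values sigma_1..sigma_r
    V1 = Vt[:r].T                               # right-singular vectors v_1..v_r as columns
    z = V1.T @ y                                # coordinates of y in range(A^T)

    # Threshold test: ||(A^+)^T y||_2 <= lam  <=>  ||Sigma_1^{-1} z||_2 <= lam.
    if np.linalg.norm(z / sr) <= lam:
        return y - V1 @ z                       # projection of y onto ker(A)

    # Secular equation (2): sum_i sigma_i^2 z_i^2 / (mu + lam sigma_i^2)^2 = 1.
    phi = lambda mu: np.sum((sr * z) ** 2 / (mu + lam * sr ** 2) ** 2) - 1.0
    hi = np.linalg.norm(sr * z)                 # phi(hi) < 0, so the root lies in (0, hi)
    mu = brentq(phi, 0.0, hi)

    # Second branch of (1): (I + (lam/mu) A^T A)^{-1} y, evaluated in the SVD basis.
    w = (mu * z) / (mu + lam * sr ** 2)
    return (y - V1 @ z) + V1 @ w
```

Working in the SVD basis is equivalent to forming $(I + \tfrac{\lambda}{\mu} A^T A)^{-1} y$ directly, but it avoids building and factoring $A^T A$.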

Proof

We provide a proof using the SVD. Our goal is to compute $x^\star = \operatorname{prox}_{\lambda\|A(\cdot)\|_2}(y)$, the optimal solution to the problem

$$\text{minimize} \quad \lambda\|Ax\|_2 + \tfrac{1}{2}\|x - y\|_2^2$$

with variable $x \in \mathbb{R}^n$.

Subspace decomposition. The key insight is that this problem decomposes naturally with respect to the fundamental subspaces of $A$. Let $A = U\Sigma V^T$ be the SVD of $A$, with rank $r = \operatorname{rank}(A)$. We write

$$U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \qquad V = \begin{bmatrix} V_1 & V_2 \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_r & v_{r+1} & \cdots & v_n \end{bmatrix}, \qquad \Sigma_1 = \operatorname{diag}(\sigma_1, \dots, \sigma_r),$$

where $V_1$ spans the image of $A^T$, $V_2$ spans the kernel of $A$, and $U_1$ holds the first $r$ columns of $U$. We decompose $\mathbb{R}^n = \operatorname{range}(A^T) \oplus \ker(A)$, and we correspondingly decompose $x$ and $y$ as $x = x_1 + x_2$ and $y = y_1 + y_2$, where the subscripts $1$ and $2$ denote the components in $\operatorname{range}(A^T)$ and $\ker(A)$, respectively. Note that $Ax = Ax_1$.

The objective function decouples because $Ax = Ax_1$ and the components $x_1 - y_1 \in \operatorname{range}(A^T)$ and $x_2 - y_2 \in \ker(A)$ are orthogonal, giving

$$\lambda\|Ax_1\|_2 + \tfrac{1}{2}\|x_1 - y_1\|_2^2 + \tfrac{1}{2}\|x_2 - y_2\|_2^2.$$

Minimizing over $x_2$ immediately yields $x_2^\star = y_2 = P_{\ker A}\, y$.

We now focus on minimizing over $x_1$. We parameterize $x_1 = V_1 w$ and $y_1 = V_1 z$, where $z = V_1^T y \in \mathbb{R}^r$. Note that $V_1$ has orthonormal columns, so $\|x_1 - y_1\|_2 = \|w - z\|_2$. We have $Ax_1 = U\Sigma V^T V_1 w = U_1 \Sigma_1 w$ and $\|Ax_1\|_2 = \|\Sigma_1 w\|_2$. The problem reduces to

$$\text{minimize} \quad \lambda\|\Sigma_1 w\|_2 + \tfrac{1}{2}\|w - z\|_2^2$$

with variable $w \in \mathbb{R}^r$.
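As a quick numerical sanity check of this reduction (not needed for the proof), the following throwaway NumPy snippet, with randomly generated data and illustrative variable names, confirms that $\|Ax_1\|_2 = \|\Sigma_1 w\|_2$ and $\|x_1 - y_1\|_2 = \|w - z\|_2$ when $x_1 = V_1 w$ and $y_1 = V_1 z$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 7))
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))
V1 = Vt[:r].T                      # columns are the right-singular vectors v_1..v_r

w = rng.standard_normal(r)
z = rng.standard_normal(r)
x1, y1 = V1 @ w, V1 @ z

assert np.isclose(np.linalg.norm(A @ x1), np.linalg.norm(s[:r] * w))   # ||A x_1|| = ||Sigma_1 w||
assert np.isclose(np.linalg.norm(x1 - y1), np.linalg.norm(w - z))      # ||x_1 - y_1|| = ||w - z||
```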

Optimality condition. The optimality condition for the minimizer $w^\star$ can be written as $0 \in \partial h(w^\star)$, where $h(w) = \lambda\|\Sigma_1 w\|_2 + \tfrac{1}{2}\|w - z\|_2^2$. Since $\Sigma_1$ is invertible, we can write the subdifferential as $\partial h(w) = \lambda\, \Sigma_1\, \partial\|\cdot\|_2(\Sigma_1 w) + (w - z)$, where $\partial\|\cdot\|_2(u)$ is the unit ball $\{g : \|g\|_2 \le 1\}$ if $u = 0$ and the singleton $\{u/\|u\|_2\}$ otherwise. Thus, we require some $g \in \partial\|\cdot\|_2(\Sigma_1 w^\star)$ such that

$$z - w^\star = \lambda\, \Sigma_1 g.$$

The structure of the solution depends on whether the optimal point is zero or nonzero.

The case $w^\star = 0$. Suppose that $w^\star = 0$. Then the optimality condition requires $z = \lambda \Sigma_1 g$ with $\|g\|_2 \le 1$, which means $g = \tfrac{1}{\lambda}\Sigma_1^{-1} z$. This implies $\|\Sigma_1^{-1} z\|_2 \le \lambda$, so the condition becomes $\|\Sigma_1^{-1} V_1^T y\|_2 \le \lambda$. To get the condition provided in equation (1), we note the identity

$$\|(A^\dagger)^T y\|_2 = \|U_1 \Sigma_1^{-1} V_1^T y\|_2 = \|\Sigma_1^{-1} V_1^T y\|_2,$$

which follows from $A^\dagger = V_1 \Sigma_1^{-1} U_1^T$ and the fact that $U_1$ has orthonormal columns.

Thus, $w^\star = 0$ if and only if $\|(A^\dagger)^T y\|_2 \le \lambda$. In this case, $x_1^\star = 0$, and the solution is $x^\star = P_{\ker A}\, y$.
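The identity above is easy to spot-check numerically. The snippet below, assuming NumPy (whose `np.linalg.pinv` computes the Moore–Penrose pseudoinverse), compares the two sides on a random rank-deficient instance:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))   # rank-3 matrix in R^{5x8}
y = rng.standard_normal(8)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))                                # numerical rank
lhs = np.linalg.norm(np.linalg.pinv(A, rcond=1e-10).T @ y)       # ||(A^+)^T y||_2
rhs = np.linalg.norm((Vt[:r] @ y) / s[:r])                       # ||Sigma_1^{-1} V_1^T y||_2
assert np.isclose(lhs, rhs)
```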

The case $w^\star \ne 0$. Suppose now that $w^\star \ne 0$. Since $\Sigma_1$ is positive definite, we have $\Sigma_1 w^\star \ne 0$. The subdifferential of $\|\cdot\|_2$ at a nonzero point reduces to a singleton, so $g$ is unique and given by $g = \Sigma_1 w^\star / \|\Sigma_1 w^\star\|_2$. Let $\mu = \|\Sigma_1 w^\star\|_2 > 0$. Substituting and rearranging, the optimality condition becomes

$$\left(I + \frac{\lambda}{\mu}\,\Sigma_1^2\right) w^\star = z.$$

Solving for $w^\star$ yields

$$w^\star = \left(I + \frac{\lambda}{\mu}\,\Sigma_1^2\right)^{-1} z, \qquad \text{i.e.,} \quad w_i^\star = \frac{\mu\, z_i}{\mu + \lambda\sigma_i^2}, \quad i = 1, \dots, r.$$

We determine $\mu$ using its definition,

$$\mu^2 = \|\Sigma_1 w^\star\|_2^2 = \sum_{i=1}^{r} \frac{\sigma_i^2\, \mu^2\, z_i^2}{(\mu + \lambda\sigma_i^2)^2}.$$

Dividing by $\mu^2$ (since $\mu > 0$) gives

$$\sum_{i=1}^{r} \frac{\sigma_i^2\, z_i^2}{(\mu + \lambda\sigma_i^2)^2} = 1.$$

The function $\phi(\mu) = \sum_{i=1}^{r} \sigma_i^2 z_i^2 / (\mu + \lambda\sigma_i^2)^2$ is strictly decreasing for $\mu \ge 0$, with $\phi(\mu) \to 0$ as $\mu \to \infty$. The value at $\mu = 0$ is

$$\phi(0) = \sum_{i=1}^{r} \frac{z_i^2}{\lambda^2\sigma_i^2} = \frac{\|\Sigma_1^{-1} z\|_2^2}{\lambda^2}.$$

A unique positive solution exists if and only if $\phi(0) > 1$, which is equivalent to $\|(A^\dagger)^T y\|_2 > \lambda$. In this case, we take $\mu$ to be this unique positive solution. Since $z = V_1^T y$, we have $z_i = v_i^T y$ for $i = 1, \dots, r$. Substituting into the equation yields

$$\sum_{i=1}^{r} \frac{\sigma_i^2\, (v_i^T y)^2}{(\mu + \lambda\sigma_i^2)^2} = 1.$$

This confirms equation (2), and $\mu$ is the unique positive solution.

We verify the form of $x^\star$ given in equation (1). Let $\hat{x} = \left(I + \frac{\lambda}{\mu} A^T A\right)^{-1} y$. We want to show $\hat{x} = x^\star$. Using the SVD representation $A^T A = V \Sigma^T \Sigma V^T$, we have

$$\hat{x} = V \left(I + \frac{\lambda}{\mu}\,\Sigma^T\Sigma\right)^{-1} V^T y.$$

The matrix $\left(I + \frac{\lambda}{\mu}\Sigma^T\Sigma\right)^{-1}$ is diagonal, with diagonal entries $\frac{\mu}{\mu + \lambda\sigma_i^2}$ for $i \le r$ and $1$ for $i > r$, giving

$$\hat{x} = V \operatorname{diag}\!\left(\frac{\mu}{\mu + \lambda\sigma_1^2}, \dots, \frac{\mu}{\mu + \lambda\sigma_r^2}, 1, \dots, 1\right) V^T y.$$

Decomposing $V^T y$ into its first $r$ components $V_1^T y$ and the remaining $n - r$ components $V_2^T y$, we obtain

$$\hat{x} = V_1\left(I + \frac{\lambda}{\mu}\,\Sigma_1^2\right)^{-1} V_1^T y + V_2 V_2^T y.$$

The first term is $V_1 w^\star = x_1^\star$, since $w^\star = \left(I + \frac{\lambda}{\mu}\Sigma_1^2\right)^{-1} z$ with $z = V_1^T y$. The second term is $V_2 V_2^T y = P_{\ker A}\, y = x_2^\star$. Thus, $\hat{x} = x_1^\star + x_2^\star = x^\star$.

This completes the proof of the theorem.
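As an end-to-end check, the closed form can be compared against a generic convex solver applied directly to the defining minimization problem. The sketch below assumes the `prox_norm_Ax` helper sketched after the theorem statement and that CVXPY is installed; it is a numerical spot check, not part of the proof.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
y = rng.standard_normal(6)
lam = 0.5

# Closed form from the theorem (prox_norm_Ax as sketched earlier).
x_closed = prox_norm_Ax(y, A, lam)

# Direct minimization of lam*||A x||_2 + 0.5*||x - y||_2^2 with CVXPY.
x = cp.Variable(6)
objective = cp.Minimize(lam * cp.norm(A @ x, 2) + 0.5 * cp.sum_squares(x - y))
cp.Problem(objective).solve()

print(np.linalg.norm(x_closed - x.value))   # should be close to zero
```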