## SPGL1 and TV minimization ?

Recently, I was using the SPGL1 toolbox to recover some “compressed sensed” images. As a reminder, SPGL1 implements the method described in “Probing the Pareto Frontier for basis pursuit solutions” of Michael P. Friedlander and Ewout van den Berg. It solves the Basis Pursuit DeNoise (or $BP_\sigma$) problem with a error power $\sigma >0$

$\min_{u\in\mathbb{R}^N} \|u\|_1\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,$

where $A\,\in\,\mathbb{R}^{m\times N}$ is the usual measurement matrix for a measurement vector $b\in \mathbb{R}^m$, and $\|u\|_{1}$ and $\|u\|_{1}$ are the $\ell_1$ and the $\ell_2$ norm of the vector $u\in\mathbb{R}^N$ respectively. In short, as shown by E. Candès, J. Romberg and T. Tao, if $A$ is well behaved, i.e. if it satisfies the so-called Restricted Isometry Property for any $2K$ sparse signals, then the solution of $BP_\sigma$ approximates (with a controlled error) a $K$ sparse (or compressible) signal $v$ such that $b = Av + n$, where $n$ is an additive noise vector with power $\|n\|_2 \leq \sigma$.

The reason of this post is the following : I’m wondering if SPGL1 could be “easily” transformed into a solver of the Basis Pursuit with the Total Variation (TV) norm. That is, the minimization problem $TV_{\sigma}$

$\min_{u\in\mathbb{R}^N} \|u\|_{TV}\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,$

where $\|u\|_{TV} = \|D u\|_1$ with $(D u)_j = (D_1 u)_j + i\,(D_2 u)_j$ is the $j$th component of the complex finite difference operator applied on the vectorized image $u$ of $N$ pixels (in a set of coordinates $x_1$ and $x_2$).  I have used here a “complexification” trick putting the finite differences $D_1$ and $D_2$ according to the directions $x_1$ and $x_2$ in the real part and the imaginary part respectively of the complex operator $D$. The TV norm of $u$ is then really the $\ell_1$ norm of $Du$.

This problem is particularly well designed for the reconstruction of compressed sensed images since most of them are very sparse in the “gradient basis” (see for instance some references about Compressed Sensing for MRI). Minimizing the TV norm, since performed in the spatial domain, is also sometimes more efficient than minimizing the $\ell_1$ norm is a particular sparsity basis (e.g. 2-D wavelets, curvelets, …).

Therefore, I would say that, as for the initial SPGL1 theoretical framework, it could be interesting to study the Pareto frontier related to $TV_\sigma$, even if the TV norm is now a quasi-norm, i.e.  $\|u\|_{TV} =0$ does not imply $u =0$ but $u_j = c$ for a certain constant $c \in \mathbb{R}$.

To explain better that point, let me first summarize the paper of Friedlander and van den Berg quoted above. They proposed to solve $BP_\sigma$ by solving a LASSO problem (or $LS_\tau$) regulated by a parameter $\tau>0$,

$\min_{u\in\mathbb{R}^N} \|Au - b\|_2\ \textrm{such that}\ \|u\|_1 \leq \tau.$

If I’m right, the key idea is that there exists a $\tau_\sigma$ such that $LS_{\tau_\sigma}$ is equivalent to $BP_\sigma$. The problem is thus to assess this point. SPGL1 finds $\tau_\sigma$ iteratively using the fact that all the $LS_\tau$ problems define a smooth and decreasing curve of $\tau$ (the Pareto curve) from the $\ell_2$ norm of the residual $r_\tau = b - Au_\tau$, where $u_\tau$ is the solution of $LS_{\tau}$. More precisely, the function

$\phi(\tau) \triangleq \|r_\tau\|_2$

is decreasing from $\phi(0) = \|b\|_2$ to a value $\tau_{\sigma}>0$ such that

$\phi(\tau_\sigma) = \sigma.$

Interestingly, the derivative $\phi'(\tau)$ exists on ${[}0, \tau{]}$ and it is simply equal to $-\|A^Ty_\tau\|_{\infty}$ with $y_\tau = r_\tau / \|r_\tau\|_2$.

As explained, on the point $\tau=\tau_\sigma$, the problem $LS_\tau$ provides the solution to $BP_\sigma$. But since both $\phi$ and $\phi'$ are known, a Newton method on this Pareto curve can then iteratively estimate $\tau_\sigma$ from the implicit equation $\phi(\tau_\sigma) = \sigma$. Practically, this is done by solving of an approximate $LS_\tau$ at each $\tau$ (and the convergence of the Newton method is still linear).

At the end, the whole approach is very efficient for solving high dimensional BPDN problems (such that BPDN for images) and the final computation cost is mainly due to the cost of the forward and transposed multiplication of the matrix/operator $A$ with vectors.

So what happens now if the $\ell_1$ norm is replaced by the TV norm in this process ? If we switch from $BP_\sigma$ to $TV_\sigma$ ? Is there a “SPGL1 way” to solve that ?

The function $\phi$ resulting from such a context would have now the initial point $\phi^2(0) = \|b\|^2_2 - (b^TA\bf 1)^2/\|A\bf 1\|^2_2$ (with $\bf 1$ the constant vector) since a zero TV norm means a constant $u = c \bf 1$ (the value of $\phi(0)$ arises just from the minimization on $c$). Notice that if $A$ is for instance a Gaussian measurement matrix, $\phi(0)$ will be very close to $\|b\|_2$ since the expectation value of the average of any row is zero.

For the rest, I’m unfortunately not sufficiently familiar with convex optimization theory to deduce what is $\phi'$ for the TV framework (hum. I should definitely study that).

However, for the $\ell_1$ case, $\phi(\tau)$ (i.e. $LS_\tau$) is computed approximately for each $\tau$. This approximation, which is also iterative, uses a special projection operator to guarantee that the current candidate solution in the iteration remains feasible, i.e. remains in the $\ell_1$ ball $\{u : \|u\|_1 \leq \tau\}$. As usual, this projection is accomplished through a Soft Thresholding procedure, i.e. as a solution of the problem

$\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{1}$,

where $w$ is the point to project, and where $\lambda>0$ is set so that the projection is inside the $\ell_1$ ball above.

For the TV minimization case, the TV ball $\{u : \|u\|_{TV} \leq \tau\}$ defining the feasible set of the approximate LASSO procedure would possibly generate a projection operator equivalent to the one solving the problem

$\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{TV}$.

This is somehow related to one of the lessons provided in the TwIST paper (“A new twIst: two-step iterative shrinkage/thresholding algorithms for image restoration”) of J. Bioucas-Dias and M. Figueiredo about the so-called Moreau function : There is a deep link between some iterative resolutions of a regularized BP problem using a given sparsity metric, e.g. the $\ell_1$ or the TV norm, and the canonic denoising method of this metric, i.e. when the measurement is the identity operator, giving Soft Thresholding or TV denoising respectively.

Thanks to the implementation of Antonin Chambolle (used also by TwIST), this last canonic TV minimization can be computed very quickly. Therefore, if needed, the required projection on the TV ball above can be also inserted in a potential “SPGL1 for TV sparsity problem”.

OK… I agree that all that is just a very rough intuition. There is a lot of points to clarify and to develop. However, if you know something about all that (or if you detect that I’m totally wrong), or if you just want to comment this idea, feel free to use the comment box below …

This entry was posted in Compressed Sensing. Bookmark the permalink.

### 23 Responses to SPGL1 and TV minimization ?

1. Dear Laurent,

As far as the Lagrangian formulation is concerned, you can replace the iterative thresholding algorithm proposed by so many researcher (including Figueiredo Nowak and Daubechies et al) by proximal iterations, where the soft thresholding is replaced by some inner iterations of Chambolle ROF algorithm (published in JMIV). For more information about proximal iterations (and convergence theorem), you should have a look at recent preprints of Patrick Combettes at Paris 6.

Intuitively, you just follow the surrogate functional derivation made by Daubechies in her CPAM paper with Defrise and Demol, but replace L1 norm of coefficient in orthobasis by TV norm, which leads to the resolution of the lagrangian ROF.

Another options is to make the change of variagle v=grad(u) so that the TV-L2 problem becomes a L1-L2 (with complex coefficients as you noticed), and solve with iterative thresholding for v. The issue is that during iterations, you need to impose the constraint rot(v)=0, which requires to solve the poisson equation at each iteration. Works great in practice (its 2 FFTs if you use periodic boundary conditions) !

Also, in the paper of Chambolle, there is a nice trick to solve for deblurring when the operator is a projection, so it will also work for CS if the matrix is a random projector !
It is a 1 line of code modification of Chambolle algorithm, very nice. At it’s like the fastest algorithm on earth to solve CS-TV 😉

2. jackdurden says:

Thanks a lot Gabriel for these important comments.

I see more clearly now the links with other works.
I’ll have a look to the references you propose.

3. BTW, I think Chambolle’s paper is difficult to find online, but you can get a preprint here

It is a must read for any one interested by TV minimization. There are two very cool tricks at the end, one to solve the L2 constraint optimization (the lambda of the lagrangian penalization is updated by a simple rule at each iteration) and the other to solve inverse problems with projectors.

The paper takes the example of deblurring, but it also works great for TV-inpainting or TV-CS with projection on random fourier subset ! Its is like a 5 lines of matlab code algorithm (once you have grad and div operator implemented somewhere), its converges very fast (its is a fixed point iteration with a speed of convergence).

4. Igor Carron says:

Gabriel,

Has this CS-TV algorithm been implemented and released somewhere ?

Igor.

5. I will update my sparsity toolbox asap. I will send you an email when the code is released.

6. jackdurden says:

Gabriel,

by CS-TV, you mean the regularized problem or the constrained $TV_{\sigma}$ above ?

Since initial Chambolle’s code is related to the regularized TV reconstruction. No ?

7. Igor Carron says:

Gabriel,

Thanks.

Igor.

8. Dear Jackdurden,

Chambolle’s algorithm solves the regularized TV, but at the end of the paper (section 4), he explains how to update at each step the regularization parameter so that the solution converges to the solution of TV_sigma. You just multiply lambda by sigma/error where error is the L2 error at current step.

Chambolle’s algorithm solve TV denoising (ROF), meaning A=Identity, but at the end of the paper (section 5), he explains how to extend it when A is an orthogonal projector, meaning A A*=Identity (the sensing vectors are orthogonal), which is the case for partial Fourier matrices or orthogonalised gaussian matrices or sub-sampled noiselets or …

The idea is to use Chambolle algorithm to denoise an initial image u (you can start by u=0), and at each step of the algorithm, project this image to be denoised on the measurement constraint A* u=y:
u <- u + A* ( y – A u ) (*)
and then you continue using sereral iteration of Chambolle’s algorithm, equation (9) of the paper.

So you’ve got two embbeded for loops (one for the update step (*) and then several iteration of Chambolle’s algorithm) but if you solve with sigma=0, you can just do 1 iteration of Chambolle’s algorithm at each step.

9. jackdurden says:

Thank you again for the nice explanations,

Best,
Laurent

10. I have put here some details about my own implementation of Chambolle’s algorithm

11. jackdurden says:

Sorry, some of your comments were referenced as spam by WordPress. I have untagged them and they appear in the list now. Thank you very much for this code. I’ll play with ASAP.

is very interesting !

Best
Laurent

12. Chambolle’s algo for CS is a neat idea. I have used it with an ortho projector as proposed by Chambolle to solve an inpainting problem and it’s indeed a single-line of code to modify and worked like a treat ! So it should work very well for CS too provided your measurement matrix is a subsampling of an ortho system. What if it is not exactly an ortho system but “close to” ?

Pierre

13. jackdurden says:

Pierre,

answer to your question “may” then come from a TV-SPGL1 using Chambolle’s ROF code as a part of the LASSO solver involved, as the soft-thresholding (i.e. the $\ell_1$ denoising technique) is currently used for the $\ell_1$ sparsity metric.

//Mais bon…. avec des “may” (ou des “si” ) on met Paris en bouteille ;-)//

14. jackdurden says:

Or another one is to generalize the update of the regularization parameter of A. Chambolle (that allows the convergence to the constraint problem), to the Iterative Soft Thresholding procedure of TwIST, where the measurement matrix can be a RIP matrix (not necessarily an orthoprojector).

Laurent

15. Dear Pierre and Laurent,

Actually yes (if I am not wrong), the algorithm generalizes to the case where A is not an orthogonal projector. You need to replace the orthogonal projection on the constraints Au=b

u <- u + A^+ ( b – A u )

(where A^+=A* is the pseudo inverse, equal to A* because we are dealing with an orthogonal projector), by a gradient descent step

u than the operator norm of A (same as in the paper of Daubechies et al, and Combettes proved that it can be extended to twice the operator norm).

Then you apply Chambolle’s algorithm with a regularization parameter lambda/tau instead of lambda.

This corresponds to proximal iterations, since Chambolle algorithm is actually computing the proximal operator associated to the TV norm.

Intuitively, as emphasized by Laurent, this corresponds to replacing the soft thresholding by the ROF resolution, which are prox operator of respectively the L1 norm in an ortho-basis and the TV norm.

I am not sure wether the update of lambda to match the L2 constraint is still garantied to work in this case.

16. Apparently I have some problems with my old school ASCII equations … let’s try in LaTeX …

$u \leftarrow A^+ ( b-Au )$

and the modified one

$u \leftarrow \frac{1}{\tau} A^* ( b-Au )$

where $\tau$ should be larger than the operator norm of A for the iterations to converge.

17. Ouch sorry, the iterations should be in fact

$u \leftarrow u + \frac{1}{\tau} A^* ( b-Au ).$

18. I see, that’s interesting since it generalizes the class of measurement matrices for which you can now apply the algo. Some testing is needed 🙂

19. Igor Carron says:

Laurent,

Surely you have noted that there is new version of SPGL1 (v. 1.6). It’ll be featured on Nuit Blanche on Monday.

Cheers,

Igor.

20. jackdurden says:

No. I didn’t notice that new version. Thank you. BTW, I sent a mail to the authors of spgl1 about our recent TV discussion above.

Laurent

21. I have put here