SPGL1 and TV minimization ?

Recently, I was using the SPGL1 toolbox to recover some “compressed sensed” images. As a reminder, SPGL1 implements the method described in “Probing the Pareto Frontier for basis pursuit solutions” by Michael P. Friedlander and Ewout van den Berg. It solves the Basis Pursuit DeNoise (or BP_\sigma) problem with an error power \sigma > 0,

\min_{u\in\mathbb{R}^N} \|u\|_1\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,

where A\,\in\,\mathbb{R}^{m\times N} is the usual measurement matrix for a measurement vector b\in \mathbb{R}^m, and \|u\|_{1} and \|u\|_{2} are the \ell_1 and the \ell_2 norms of the vector u\in\mathbb{R}^N respectively. In short, as shown by E. Candès, J. Romberg and T. Tao, if A is well behaved, i.e. if it satisfies the so-called Restricted Isometry Property for 2K-sparse signals, then the solution of BP_\sigma approximates (with a controlled error) a K-sparse (or compressible) signal v such that b = Av + n, where n is an additive noise vector with power \|n\|_2 \leq \sigma.

The reason for this post is the following: I’m wondering if SPGL1 could be “easily” transformed into a solver of Basis Pursuit with the Total Variation (TV) norm, that is, the minimization problem TV_{\sigma}

\min_{u\in\mathbb{R}^N} \|u\|_{TV}\ \textrm{such that}\ \|Au - b\|_2 \leq \sigma,

where \|u\|_{TV} = \|D u\|_1, with (D u)_j = (D_1 u)_j + i\,(D_2 u)_j the jth component of the complex finite-difference operator applied to the vectorized image u of N pixels (in a set of coordinates x_1 and x_2). I have used here a “complexification” trick, putting the finite differences D_1 and D_2 along the directions x_1 and x_2 in the real part and the imaginary part of the complex operator D respectively. The TV norm of u is then really the \ell_1 norm of Du.
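
To fix ideas, here is a minimal NumPy sketch of this complexified operator, with forward differences and replicate boundary conditions (one possible convention among others), taking the image directly as a 2-D array rather than a vectorized one:

    import numpy as np

    def D(u):
        # Complex finite-difference operator: D_1 u in the real part (along x1),
        # D_2 u in the imaginary part (along x2); forward differences, replicate boundary.
        d1 = np.zeros(u.shape)
        d2 = np.zeros(u.shape)
        d1[:-1, :] = u[1:, :] - u[:-1, :]
        d2[:, :-1] = u[:, 1:] - u[:, :-1]
        return d1 + 1j * d2

    def tv_norm(u):
        # The (isotropic) TV norm of u is the l1 norm of the complex vector D u.
        return np.sum(np.abs(D(u)))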

This problem is particularly well suited to the reconstruction of compressed sensed images, since most of them are very sparse in the “gradient basis” (see for instance some references about Compressed Sensing for MRI). Minimizing the TV norm, since it is performed in the spatial domain, is also sometimes more efficient than minimizing the \ell_1 norm in a particular sparsity basis (e.g. 2-D wavelets, curvelets, …).

Therefore, I would say that, as for the initial SPGL1 theoretical framework, it could be interesting to study the Pareto frontier related to TV_\sigma, even if the TV norm is only a semi-norm, i.e. \|u\|_{TV} = 0 does not imply u = 0 but only u_j = c for some constant c \in \mathbb{R}.

To explain that point better, let me first summarize the paper of Friedlander and van den Berg quoted above. They propose to solve BP_\sigma through a LASSO problem (or LS_\tau) parameterized by \tau > 0,

\min_{u\in\mathbb{R}^N} \|Au - b\|_2\ \textrm{such that}\ \|u\|_1 \leq \tau.

If I’m right, the key idea is that there exists a \tau_\sigma such that LS_{\tau_\sigma} is equivalent to BP_\sigma, and the problem is thus to find this value. SPGL1 locates \tau_\sigma iteratively, using the fact that the LS_\tau problems define a smooth and decreasing curve of \tau (the Pareto curve) through the \ell_2 norm of the residual r_\tau = b - Au_\tau, where u_\tau is the solution of LS_{\tau}. More precisely, the function

\phi(\tau) \triangleq \|r_\tau\|_2

is decreasing from \phi(0) = \|b\|_2 and reaches the value \sigma at a point \tau_{\sigma}>0 such that

\phi(\tau_\sigma) = \sigma.

Interestingly, the derivative \phi'(\tau) exists on [0, \tau_\sigma] and is simply equal to -\|A^Ty_\tau\|_{\infty}, with y_\tau = r_\tau / \|r_\tau\|_2.

As explained, at the point \tau=\tau_\sigma the problem LS_\tau provides the solution to BP_\sigma. Since both \phi and \phi' are known, a Newton method on this Pareto curve can then iteratively estimate \tau_\sigma from the implicit equation \phi(\tau_\sigma) = \sigma. In practice, this is done by solving an approximate LS_\tau at each Newton iterate \tau (and the convergence of the Newton method remains linear).
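
In a NumPy-flavoured sketch, this root finding could look as follows; solve_lasso is only a placeholder for an approximate LS_\tau solver (SPGL1 performs this inner step with a spectral projected-gradient method):

    import numpy as np

    def newton_on_pareto(A, b, sigma, solve_lasso, tau0=0.0, tol=1e-6, n_iter=20):
        # Find tau_sigma such that phi(tau_sigma) = sigma by Newton's method on
        # the Pareto curve phi(tau) = ||b - A u_tau||_2.
        tau = tau0
        for _ in range(n_iter):
            u_tau = solve_lasso(A, b, tau)                    # approximate solution of LS_tau
            r = b - A @ u_tau                                 # residual r_tau
            phi = np.linalg.norm(r)                           # phi(tau)
            if abs(phi - sigma) <= tol * max(1.0, phi):
                break
            dphi = -np.linalg.norm(A.T @ (r / phi), np.inf)   # phi'(tau) = -||A^T y_tau||_inf
            tau = tau - (phi - sigma) / dphi                  # Newton update
        return tau, u_tau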

In the end, the whole approach is very efficient for solving high-dimensional BPDN problems (such as BPDN for images), and the final computational cost is mainly due to the forward and adjoint (transposed) applications of the matrix/operator A to vectors.

So what happens now if the \ell_1 norm is replaced by the TV norm in this process? If we switch from BP_\sigma to TV_\sigma? Is there an “SPGL1 way” to solve that?

The function \phi resulting from such a context would now have the initial value \phi^2(0) = \|b\|^2_2 - (b^TA\mathbf{1})^2/\|A\mathbf{1}\|^2_2 (with \mathbf{1} the constant vector), since a zero TV norm means a constant image u = c\,\mathbf{1} (the value of \phi(0) arises simply from the minimization over c). Notice that if A is, for instance, a Gaussian measurement matrix, \phi(0) will be very close to \|b\|_2, since the expected value of the average of any row is zero.
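
In other words, \phi(0) is simply the residual of the best least-squares fit of b by A applied to a constant image; as a quick sanity check:

    import numpy as np

    def phi0_tv(A, b):
        # phi(0) for the TV case: best l2 fit of b by A applied to a constant image.
        a1 = A @ np.ones(A.shape[1])            # A applied to the constant vector 1
        c = (b @ a1) / (a1 @ a1)                # optimal constant c
        return np.linalg.norm(b - c * a1)       # = sqrt(||b||^2 - (b^T A 1)^2 / ||A 1||^2)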

For the rest, I’m unfortunately not sufficiently familiar with convex optimization theory to deduce what \phi' would be in the TV framework (hum, I should definitely study that).

However, for the \ell_1 case, \phi(\tau) (i.e. LS_\tau) is computed approximately for each \tau. This approximation, which is also iterative, uses a projection operator to guarantee that the current candidate solution remains feasible, i.e. stays in the \ell_1 ball \{u : \|u\|_1 \leq \tau\}. As usual, this projection can be accomplished through a Soft Thresholding procedure, i.e. as the solution of the problem

\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{1},

where w is the point to project, and where \lambda>0 is set so that the projection is inside the \ell_1 ball above.
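
As a rough sketch (SPGL1 ships its own, more direct projection routine, so this is only to fix ideas), the threshold \lambda can for instance be found by bisection so that the soft-thresholded point lands on the \ell_1 ball:

    import numpy as np

    def soft(w, lam):
        # Soft thresholding, the proximal operator of the l1 norm.
        return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

    def project_l1_ball(w, tau, n_bisect=50):
        # Euclidean projection of w onto {u : ||u||_1 <= tau}, obtained by tuning
        # the threshold lam so that ||soft(w, lam)||_1 matches tau.
        if np.sum(np.abs(w)) <= tau:
            return w.copy()                     # already feasible
        lo, hi = 0.0, np.max(np.abs(w))         # soft(w, hi) = 0, hence feasible
        for _ in range(n_bisect):
            lam = 0.5 * (lo + hi)
            if np.sum(np.abs(soft(w, lam))) > tau:
                lo = lam                        # still outside the ball: threshold more
            else:
                hi = lam                        # inside the ball: threshold less
        return soft(w, hi)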

For the TV minimization case, the TV ball \{u : \|u\|_{TV} \leq \tau\} defining the feasible set of the approximate LASSO procedure would presumably call for a projection operator equivalent to the one solving the problem

\min_{u} \|w -u\|^2_2 + \lambda \|u\|_{TV}.

This is somehow related to one of the lessons of the TwIST paper (“A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration”) of J. Bioucas-Dias and M. Figueiredo about the so-called Moreau function: there is a deep link between some iterative resolutions of a regularized BP problem using a given sparsity metric, e.g. the \ell_1 or the TV norm, and the canonical denoising method of that metric, i.e. the case where the measurement operator is the identity, giving Soft Thresholding or TV denoising respectively.

Thanks to the implementation of Antonin Chambolle (also used by TwIST), this last canonical TV minimization can be computed very quickly. Therefore, if needed, the required projection onto the TV ball above could also be inserted in a potential “SPGL1 for TV sparsity” solver.
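
For reference, here is a rough transcription of the dual iteration of Chambolle’s JMIV paper for the proximal problem \min_u \frac{1}{2}\|u - w\|^2_2 + \lambda \|u\|_{TV} (the factor 1/2 only rescales \lambda); the difference operators and the step size 1/8 follow the paper, but this is an illustrative sketch, not his code. As in the \ell_1 case above, tuning \lambda (e.g. by bisection) until \|u\|_{TV} = \tau would then give the projection onto the TV ball.

    import numpy as np

    def grad(u):
        # Forward differences of the image u along x1 and x2 (replicate boundary).
        gx = np.zeros(u.shape)
        gy = np.zeros(u.shape)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        return gx, gy

    def div(px, py):
        # Discrete divergence, defined as the negative adjoint of grad.
        dx = np.zeros(px.shape)
        dy = np.zeros(py.shape)
        dx[0, :] = px[0, :]
        dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
        dx[-1, :] = -px[-2, :]
        dy[:, 0] = py[:, 0]
        dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
        dy[:, -1] = -py[:, -2]
        return dx + dy

    def tv_prox(w, lam, n_iter=100, step=0.125):
        # Dual (Chambolle-type) iteration for min_u 0.5 * ||u - w||_2^2 + lam * ||u||_TV.
        px = np.zeros(w.shape)
        py = np.zeros(w.shape)
        for _ in range(n_iter):
            gx, gy = grad(div(px, py) - w / lam)        # gradient of the dual objective (up to scale)
            weight = 1.0 + step * np.sqrt(gx ** 2 + gy ** 2)
            px = (px + step * gx) / weight              # semi-implicit update keeping |p| <= 1
            py = (py + step * gy) / weight
        return w - lam * div(px, py)                    # primal solution u = w - lam * div(p)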

OK… I agree that all this is just a very rough intuition. There are a lot of points to clarify and to develop. However, if you know something about all that (or if you detect that I’m totally wrong), or if you just want to comment on this idea, feel free to use the comment box below…


23 Responses to SPGL1 and TV minimization ?

  1. Dear Laurent,

    As far as the Lagrangian formulation is concerned, you can replace the iterative thresholding algorithm proposed by so many researchers (including Figueiredo and Nowak, and Daubechies et al.) by proximal iterations, where the soft thresholding is replaced by some inner iterations of Chambolle’s ROF algorithm (published in JMIV). For more information about proximal iterations (and convergence theorems), you should have a look at recent preprints of Patrick Combettes at Paris 6.

    Intuitively, you just follow the surrogate-functional derivation made by Daubechies in her CPAM paper with Defrise and De Mol, but replace the L1 norm of the coefficients in an orthobasis by the TV norm, which leads to solving the Lagrangian ROF problem.

    Another option is to make the change of variable v = grad(u), so that the TV-L2 problem becomes an L1-L2 problem (with complex coefficients, as you noticed), and to solve it with iterative thresholding on v. The issue is that during the iterations you need to impose the constraint rot(v) = 0, which requires solving a Poisson equation at each iteration (a small sketch is given at the end of this comment). Works great in practice (it’s 2 FFTs if you use periodic boundary conditions)!

    Also, in the paper of Chambolle there is a nice trick to solve deblurring when the operator is a projection, so it will also work for CS if the matrix is a random projector!
    It is a one-line-of-code modification of Chambolle’s algorithm, very nice. And it’s like the fastest algorithm on earth to solve CS-TV ;)
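
    Coming back to the change-of-variable idea above, one possible NumPy sketch of the rot(v) = 0 correction with periodic boundary conditions (illustrative conventions and names, one fft2 and one ifft2 per call) is:

        import numpy as np

        def grad_per(u):
            # Forward differences with periodic boundary conditions.
            return np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u

        def div_per(vx, vy):
            # Discrete divergence, the negative adjoint of grad_per.
            return (vx - np.roll(vx, 1, axis=0)) + (vy - np.roll(vy, 1, axis=1))

        def curl_free_part(vx, vy):
            # Keep only the curl-free part of (vx, vy): v <- grad(Laplacian^{-1} div v),
            # with the Poisson equation solved in the Fourier domain.
            n1, n2 = vx.shape
            k1 = np.arange(n1).reshape(-1, 1)
            k2 = np.arange(n2).reshape(1, -1)
            symbol = (2 * np.cos(2 * np.pi * k1 / n1) - 2) + (2 * np.cos(2 * np.pi * k2 / n2) - 2)
            symbol[0, 0] = 1.0                      # avoid dividing by zero at the DC frequency
            u_hat = np.fft.fft2(div_per(vx, vy)) / symbol
            u_hat[0, 0] = 0.0                       # the mean of the potential is arbitrary
            u = np.real(np.fft.ifft2(u_hat))
            return grad_per(u)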

  2. jackdurden says:

    Thanks a lot Gabriel for these important comments.

    I see more clearly now the links with other works.
    I’ll have a look at the references you propose.

  3. BTW, I think Chambolle’s paper is difficult to find online, but you can get a preprint here
    http://www.ceremade.dauphine.fr/preprints/CMD/2002-40.ps.gz

    It is a must-read for anyone interested in TV minimization. There are two very cool tricks at the end: one to solve the L2-constrained optimization (the lambda of the Lagrangian penalization is updated by a simple rule at each iteration) and the other to solve inverse problems with projectors.

    The paper takes the example of deblurring, but it also works great for TV inpainting or TV-CS with projection on a random Fourier subset! It’s like a 5-line Matlab algorithm (once you have the grad and div operators implemented somewhere), and it converges very fast (it is a fixed-point iteration with a proven speed of convergence).

  4. Igor Carron says:

    Gabriel,

    Has this CS-TV algorithm been implemented and released somewhere ?

    Igor.

  5. I will update my sparsity toolbox asap. I will send you an email when the code is released.

  6. jackdurden says:

    Gabriel,

    By CS-TV, do you mean the regularized problem or the constrained TV_{\sigma} above?

    Since Chambolle’s initial code is related to the regularized TV reconstruction, no?

  7. Igor Carron says:

    Gabriel,

    Thanks.

    Igor.

  8. Dear Jackdurden,

    Chambolle’s algorithm solves the regularized TV problem, but at the end of the paper (section 4) he explains how to update the regularization parameter at each step so that the solution converges to the solution of TV_\sigma. You just multiply lambda by sigma/error, where error is the L2 error at the current step.

    Chambolle’s algorithm solves TV denoising (ROF), meaning A = Identity, but at the end of the paper (section 5) he explains how to extend it when A is an orthogonal projector, meaning A A* = Identity (the sensing vectors are orthogonal), which is the case for partial Fourier matrices, orthogonalized Gaussian matrices, sub-sampled noiselets, …

    The idea is to use Chambolle’s algorithm to denoise an initial image u (you can start with u = 0), and at each step of the algorithm, to project this image onto the measurement constraint A u = y:
    u <- u + A* ( y – A u ) (*)
    and then you continue with several iterations of Chambolle’s algorithm, equation (9) of the paper.

    So you’ve got two nested for loops (one for the update step (*) and then several iterations of Chambolle’s algorithm), but if you solve with sigma = 0, you can just do 1 iteration of Chambolle’s algorithm at each step.
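
    In NumPy-like pseudo-code, one possible transcription of this scheme (not an official implementation; A and At stand for the forward and adjoint operators, tv_denoise for a few Chambolle/ROF iterations such as the tv_prox sketch earlier in the post, and details like warm-starting the inner solver are left out) could read:

        import numpy as np

        def cs_tv_projector(A, At, y, sigma, tv_denoise, lam=1.0, n_outer=200):
            # TV recovery when A is an orthogonal projector (A A* = Id):
            # alternate the data projection u <- u + A*(y - A u) with TV denoising,
            # and rescale lam so that the residual power tends to sigma.
            u = At(y)                               # initial image (u = 0 also works)
            for _ in range(n_outer):
                u = u + At(y - A(u))                # projection onto the constraint A u = y
                u = tv_denoise(u, lam)              # a few Chambolle (ROF) iterations
                err = np.linalg.norm(y - A(u))
                if sigma > 0 and err > 0:
                    lam *= sigma / err              # the update rule mentioned above
            return u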

  9. jackdurden says:

    Thank you again for the nice explanations,

    Best,
    Laurent

  10. I have put here some details about my own implementation of Chambolle’s algorithm
    http://www.ceremade.dauphine.fr/~peyre/upload/cs-tv/html/content.html

  11. jackdurden says:

    Sorry, some of your comments were flagged as spam by WordPress. I have untagged them and they now appear in the list. Thank you very much for this code. I’ll play with it ASAP.

    … and your page
    http://www.ceremade.dauphine.fr/~peyre/upload/cs-tv/html/content.html
    is very interesting !

    Best
    Laurent

    Chambolle’s algo for CS is a neat idea. I have used it with an ortho projector, as proposed by Chambolle, to solve an inpainting problem, and it’s indeed a single line of code to modify; it worked a treat! So it should work very well for CS too, provided your measurement matrix is a subsampling of an ortho system. What if it is not exactly an ortho system but “close to” one?

    Pierre

  13. jackdurden says:

    Pierre,

    The answer to your question “may” then come from a TV-SPGL1 using Chambolle’s ROF code as part of the LASSO solver involved, just as soft thresholding (i.e. the \ell_1 denoising technique) is currently used for the \ell_1 sparsity metric.

    //But then… with “may”s (or “if”s), one could put Paris in a bottle ;-)//

  14. jackdurden says:

    Another option is to generalize A. Chambolle’s update of the regularization parameter (the one that allows convergence to the constrained problem) to the Iterative Soft Thresholding procedure of TwIST, where the measurement matrix can be an RIP matrix (not necessarily an orthoprojector).

    Laurent

  15. Dear Pierre and Laurent,

    Actually yes (if I am not wrong), the algorithm generalizes to the case where A is not an orthogonal projector. You need to replace the orthogonal projection onto the constraint Au = b

    u <- u + A^+ ( b – A u )

    (where A^+ is the pseudo-inverse, equal to A* here because we are dealing with an orthogonal projector), by a gradient descent step

    u <- u + (1/\tau) A^* ( b – A u ), where \tau is larger than the operator norm of A (same as in the paper of Daubechies et al., and Combettes proved that it can be extended to twice the operator norm).

    Then you apply Chambolle’s algorithm with a regularization parameter lambda/tau instead of lambda.

    This corresponds to proximal iterations, since Chambolle’s algorithm is actually computing the proximal operator associated with the TV norm.

    Intuitively, as emphasized by Laurent, this corresponds to replacing the soft thresholding by the ROF resolution, which are the prox operators of, respectively, the L1 norm in an ortho-basis and the TV norm.

    I am not sure whether the update of lambda to match the L2 constraint is still guaranteed to work in this case.

  16. Apparently I have some problems with my old school ASCII equations … let’s try in LaTeX …

    The initial iteration reads

    u \leftarrow A^+ ( b-Au )

    and the modified one

    u \leftarrow \frac{1}{\tau} A^* ( b-Au )

    where \tau should be larger than the operator norm of A for the iterations to converge.

  17. Ouch sorry, the iterations should be in fact

    u \leftarrow u + \frac{1}{\tau} A^* ( b-Au ).
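
    For concreteness, these proximal iterations might be sketched in NumPy as follows (tv_prox is a black box for Chambolle’s ROF solver, e.g. the sketch earlier in the post; in the standard convergence analysis the step 1/\tau should satisfy \tau \geq \|A\|^2_2, the squared operator norm of A):

        import numpy as np

        def tv_proximal_iterations(A, At, b, lam, tau, tv_prox, n_iter=200):
            # Forward-backward (proximal) iterations for
            #   min_u 0.5 * ||A u - b||_2^2 + lam * ||u||_TV :
            # a gradient step on the data term, then the TV proximal operator.
            u = np.zeros_like(At(b))                # start from the zero image
            for _ in range(n_iter):
                u = u + (1.0 / tau) * At(b - A(u))  # gradient step with step size 1/tau
                u = tv_prox(u, lam / tau)           # e.g. Chambolle's algorithm with parameter lam/tau
            return u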

    I see, that’s interesting, since it enlarges the class of measurement matrices for which the algo can be applied. Some testing is needed :)

  19. Igor Carron says:

    Laurent,

    Surely you have noticed that there is a new version of SPGL1 (v. 1.6). It’ll be featured on Nuit Blanche on Monday.

    Cheers,

    Igor.

  20. jackdurden says:

    No, I didn’t notice that new version. Thank you. BTW, I sent a mail to the authors of SPGL1 about our recent TV discussion above.

    Laurent

  21. I have put here
    http://www.ceremade.dauphine.fr/~peyre/upload/cs-tv/OptimReview.pdf
    more details about iterative thresholding and Chambolle’s algorithm !

  22. jackdurden says:

    Thank you Gabriel. Very good review. I noticed however that the correct link is :

    http://www.ceremade.dauphine.fr/~peyre/cs-tv/OptimReview.pdf

    Best,
    Laurent

  23. Pingback: SPGL1 and TV: Answers from SPGL1 Authors « Le Petit Chercheur Illustré
