I have decided to avoid the all-too-conventional Matlab environment. Rather, I took this exercise as an opportunity to learn IPython notebooks and the wonderful tools provided by the SciPy Python ecosystem.

In short, for those of you who don’t know them, IPython notebooks allow you to generate actual scientific HTML reports with (LaTeX-rendered) explanations and graphics.

The result cannot be properly presented on this blog (hosted on WordPress), so I decided to share the report through the IPython Notebook Viewer website.

Here it is:

“Testing a Quasi-Isometric Embedding”

*(update 21/11/2013)* … and a variant of it estimating a “curve of failure” (rather than playing with standard deviation analysis):

“Testing a Quasi-Isometric Embedding with Percentile Analysis”

Moreover, from these two links you can also download the corresponding scripts to run them on your own IPython notebook system.

If you have any comments or corrections, don’t hesitate to add them below in the “comment” section. Enjoy!


Last July, I read the biography of Paul Erdős written by Paul Hoffman and entitled “*The Man Who Loved Only Numbers*”. This is really a wonderful book, sprinkled with many anecdotes about the singular life of this great mathematician and his appealing mathematical obsessions (including prime numbers).

At one point in this book, my attention was caught by the mention of what is called “Buffon’s needle problem”. It is a very old and well-known problem in the field of “geometrical probability”, and I later discovered that Emmanuel Kowalski (Math dep., ETH Zürich, Switzerland) explained it in one of his blog posts.

In short, this problem, posed in France by Georges-Louis Leclerc, Comte de Buffon, in one of the numerous volumes of his impressive work “L’Histoire Naturelle”, reads as follows:

“I suppose that in a room where the floor is simply divided by parallel joints one throws a stick (N/A: later called “needle”) in the air, and that one of the players bets that the stick will not cross any of the parallels on the floor, and that the other in contrast bets that the stick will cross some of these parallels; one asks for the chances of these two players.”

The English translation is due to [1]. The solution (published by Leclerc in 1777) is astonishingly simple: for a needle that is short compared to the separation $\delta$ between two consecutive parallels, the probability of having one intersection between the needle and the parallels is equal to the needle length times $2/(\pi\delta)$! If the needle is longer, this probability is less easy to express, but the expectation of the number of intersections (which can now be bigger than one) remains equal to this value. Surprisingly, this result still holds if the needle is replaced by a finite smooth curve, in which case some authors speak of the “noodle” problem (e.g., in [5]).
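For the SciPy-minded reader, this classical result is easy to check numerically; here is a small Monte Carlo sketch (the needle length, strip spacing and sample size are my arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
ell, delta, n = 1.0, 2.0, 200_000   # short needle: ell <= delta

x = rng.uniform(0.0, delta, n)       # needle center, modulo the spacing
theta = rng.uniform(0.0, np.pi, n)   # needle orientation
half = 0.5 * ell * np.abs(np.cos(theta))  # half-extent along the axis normal to the joints

# the needle crosses a joint iff its two endpoints fall in different strips
crossing = np.floor((x + half) / delta) != np.floor((x - half) / delta)
p_mc = crossing.mean()
p_th = 2 * ell / (np.pi * delta)     # Buffon: needle length times 2/(pi*delta)
print(p_mc, p_th)
```

With these values the empirical frequency matches $1/\pi \approx 0.318$ up to the Monte Carlo error.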

The reason why this problem rang a bell is related to its similarity with a quantization process!

Indeed, think for a while of the needle as the segment formed by two points in the plane, and assume all the parallel joints are normal to the first canonical axis of $\mathbb{R}^2$. Let us also think of the area defined by two consecutive joints as an infinite strip of width $\delta$. Then, the number of intersections that this “needle” makes with the grid of parallel joints is related to the distance between the two strips occupied by the two points, i.e., to the distance between the uniform quantizations (or roundings off) of the first coordinates of the two points!

From this observation, I realized that if we randomly turn these two points by a random rotation and add a random translation along the first axis to their coordinates, the context of the initial Buffon problem is exactly recovered!

Interestingly enough, after this randomized transformation, the first coordinate of one of the two points (defining the needle extremities), say $q \in \mathbb{R}^2$, reads

$\langle q, \theta \rangle + u,$

where $\theta$ is a uniform random variable on the circle $\mathbb{S}^1 = \{v \in \mathbb{R}^2 : \|v\| = 1\}$ and $u$ is the random shift. What you observe here is nothing but a (shifted) random projection of the point $q$ on the direction $\theta$.

This was really amazing to discover: after these very simple developments, I had in front of me a kind of triple equivalence between Buffon’s needle problem, quantization process in the plane and a well-known linear random projection procedure. This boded well for a possible extension of this context to high-dimensional (random) projection procedures, e.g., those used in common linear dimensionality reduction methods and in the compressed sensing theory.

Actually, this gave me a new point of view for solving these two connected questions: How to combine the well-known Johnson-Lindenstrauss Lemma with a quantization of the embedding it proposes? What (new) distortion of the embedding can we expect from this non-linear operation?

Let me recall the tenet of the JL Lemma: for a set $\mathcal{S} \subset \mathbb{R}^N$ of $S$ points, if you fix $\epsilon \in (0,1)$ and $M = O(\epsilon^{-2} \log S)$, there exists a mapping $f : \mathbb{R}^N \to \mathbb{R}^M$ such that, for all pairs $u, v \in \mathcal{S}$,

$(1-\epsilon)\,\|u-v\|^2 \;\leq\; \|f(u) - f(v)\|^2 \;\leq\; (1+\epsilon)\,\|u-v\|^2,$

with some possible variants on the selected norms: e.g., from some measure concentration results in Banach spaces [6], the result is still true with the same condition on $M$ if we take the $\ell_1$ norm of $f(u) - f(v)$ (suitably normalized). It is this variant that matters in the rest of this post.
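For intuition, the Gaussian instance of this lemma is easy to test numerically; a small sketch (the dimensions and the point set are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, S = 200, 2000, 15            # ambient dim, embedding dim, nb of points
X = rng.standard_normal((S, N))    # the point set (one point per row)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # f(x) = Phi x
Y = X @ Phi.T

# relative distortion of every pairwise Euclidean distance
dists = []
for i in range(S):
    for j in range(i + 1, S):
        d0 = np.linalg.norm(X[i] - X[j])
        dists.append(abs(np.linalg.norm(Y[i] - Y[j]) - d0) / d0)
eps = max(dists)
print(eps)  # small: the embedding is a near-isometry
```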

It took me a while, but after having generalized Buffon’s needle problem to an $N$-dimensional space, where the needle is still a 1-D segment “thrown” randomly in a grid of $(N-1)$-dimensional parallel hyperplanes that are $\delta$ apart (a generalization that provided a few interesting asymptotic relations concerning this probabilistic problem), I was also able to generalize the previous equivalence as follows: *uniformly quantizing the random projections in $\mathbb{R}^M$ of two points of $\mathbb{R}^N$ and measuring the difference between their quantized values is fully equivalent to studying the number of intersections made by the segment determined by those two points (seen as a Buffon needle) with a parallel grid of $(N-1)$-dimensional hyperplanes.*

This equivalence was the starting point to discover the following proposition (the main result of the paper referenced above) which can be seen as a quantized form of the Johnson-Lindenstrauss Lemma:

Let $\mathcal{S} \subset \mathbb{R}^N$ be a set of $S$ points. Fix $\epsilon \in (0,1)$ and $\delta > 0$. For $M = O(\epsilon^{-2} \log S)$, there exists a non-linear mapping $\psi : \mathbb{R}^N \to \delta\,\mathbb{Z}^M$ and two constants $c, c' > 0$ such that, for all pairs $u, v \in \mathcal{S}$,

$(1-\epsilon)\,\|u-v\| - c\,\epsilon\delta \;\leq\; \tfrac{c'}{M}\,\|\psi(u) - \psi(v)\|_1 \;\leq\; (1+\epsilon)\,\|u-v\| + c\,\epsilon\delta.$

Moreover, this mapping can be randomly constructed as

$\psi(u) = \mathcal{Q}_\delta(\Phi u + \xi),$

where $\mathcal{Q}_\delta(\cdot) = \delta \lfloor \cdot/\delta \rfloor$ is a uniform quantization of bin width $\delta$ (applied componentwise), $\Phi$ is an $M \times N$ Gaussian random matrix and $\xi$ is a uniform random vector over $[0, \delta]^M$. Except for the quantization, this construction is similar to the one introduced in [7] (for non-regular quantizers).
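A literal sketch of this construction in NumPy (assuming the quantizer $\mathcal{Q}_\delta(\cdot) = \delta\lfloor\cdot/\delta\rfloor$ and, for the $\ell_1$-to-$\ell_2$ rescaling, the Gaussian identity $E|g| = \sigma\sqrt{2/\pi}$; the dimensions are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, delta = 50, 5000, 0.5

Phi = rng.standard_normal((M, N))   # Gaussian random matrix
xi = rng.uniform(0.0, delta, M)     # uniform dither over [0, delta]^M

def psi(u):
    # componentwise uniform quantizer of bin width delta, after dithering
    return delta * np.floor((Phi @ u + xi) / delta)

u, v = rng.standard_normal(N), rng.standard_normal(N)
d_quant = np.sqrt(np.pi / 2) / M * np.abs(psi(u) - psi(v)).sum()
d_true = np.linalg.norm(u - v)
print(d_quant, d_true)  # close for large M
```

The dither makes the expected quantized $\ell_1$ distance exactly proportional to the Euclidean distance, which is precisely the Buffon mechanism at work.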

Without entering into the details, the explanation of this result comes from the fact that the random projection $\Phi u$ can be seen as a random rotation of $u$ followed by a random scaling of its amplitude. Therefore, conditionally on this amplitude, the equivalence with Buffon’s problem is recovered for a (scaled) needle determined by the vectors $u$ and $v$ above, the *dithering* $\xi$ playing the role of the random needle shift.

Interestingly, compared to the common JL Lemma, the mapping $\psi$ is now “quasi-isometric”: we observe both an **additive** and a **multiplicative distortion** on the embedded distances of $\mathcal{S}$. These two distortions, however, decay as $O(\sqrt{\log S / M})$ when $M$ increases!

This kind of additive distortion decay was already observed for “binary” (or one-bit) quantization procedures [2, 3, 4] applied to random projections of points (e.g., for 1-bit compressed sensing). Above, we still observe such a distortion for the (multi-bit) quantization $\mathcal{Q}_\delta$; moreover, it is combined with a multiplicative one, while **both decay** when $M$ increases. This fact is new, to the best of my knowledge.

Moreover, for coarse quantization, i.e., for $\delta$ high compared to the typical size of $\mathcal{S}$, the distortion is mainly additive, while for small $\delta$ we tend to a classical Lipschitz isometric embedding, as provided by the JL Lemma.

Interested blog readers can have a look at my paper for a clearer (I hope) presentation of this informal summary. Its abstract is as follows:

“In 1733, Georges-Louis Leclerc, Comte de Buffon in France, set the ground of geometric probability theory by defining an enlightening problem: What is the probability that a needle thrown randomly on a ground made of equispaced parallel strips lies on two of them? In this work, we show that the solution to this problem, and its generalization to $N$ dimensions, allows us to discover a quantized form of the Johnson-Lindenstrauss (JL) Lemma, i.e., one that combines a linear dimensionality reduction procedure with a uniform quantization of precision $\delta > 0$. In particular, given a finite set $\mathcal{S} \subset \mathbb{R}^N$ of $S$ points and a distortion level $\epsilon > 0$, as soon as $M > M_0 = O(\epsilon^{-2} \log S)$, we can (randomly) construct a mapping from $(\mathcal{S}, \ell_2)$ to $(\delta\mathbb{Z}^M, \ell_1)$ that approximately preserves the pairwise distances between the points of $\mathcal{S}$. Interestingly, compared to the common JL Lemma, the mapping is quasi-isometric and we observe both an additive and a multiplicative distortion on the embedded distances. These two distortions, however, decay as $O(\sqrt{\log S/M})$ when $M$ increases. Moreover, for coarse quantization, i.e., for high $\delta$ compared to the set radius, the distortion is mainly additive, while for small $\delta$ we tend to a Lipschitz isometric embedding. Finally, we show that there exists “almost” a quasi-isometric embedding of $(\mathcal{S}, \ell_2)$ in $(\delta\mathbb{Z}^M, \ell_2)$. This one involves a non-linear distortion of the $\ell_2$-distance in $\mathcal{S}$ that vanishes for distant points in this set. Noticeably, the additive distortion in this case is slower, decaying as $O((\log S/M)^{1/4})$.”

Hoping there is no killer bug in my developments; any comments are of course welcome.

**References:**

[1] J. D. Hey, T. M. Neugebauer, and C. M. Pasca, “Georges-Louis Leclerc de Buffon’s Essays on Moral Arithmetic,” in The Selten School of Behavioral Economics, pp. 245–282, Springer, 2010.

[2] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, “Robust 1-Bit Compressive Sensing via Binary Stable Embeddings of Sparse Vectors,” IEEE Transactions on Information Theory, Vol. 59(4), pp. 2082-2102, 2013.

[3] Y. Plan and R. Vershynin, “One-bit compressed sensing by linear programming,” Communications on Pure and Applied Mathematics, to appear. arXiv:1109.4299, 2011.

[4] M. Goemans and D. Williamson, “Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming,” Journal of the ACM, vol. 42, no. 6, pp. 1115–1145, 1995.

[5] J. F. Ramaley, “Buffon’s noodle problem,” The American Mathematical Monthly, vol. 76, no. 8, pp. 916–918, 1969.

[6] M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, Springer, 1991.

[7] P. T. Boufounos, “Universal rate-efficient scalar quantization,” IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1861–1872, 2012.

[8] G.-L. Leclerc, Comte de Buffon, “Essai d’arithmétique morale,” Supplément à l’Histoire Naturelle, vol. 4, 1777. See also: http://www.buffon.cnrs.fr


I was wondering if these could help in showing that a simple variant of *basis pursuit denoising* using an $\ell_1$-fidelity constraint, *i.e.*, an $\ell_1/\ell_1$ solver, is optimal in recovering sparse signals from sparsely corrupted compressed measurements. After all, one of the key ingredients in 1-bit CS is the sign operator $\mathrm{sign}(\cdot)$, which is, interestingly, the (sub)gradient of the $\ell_1$-norm, and for which many random embedding properties have been recently proved [1,2,4].

The answer seems to be positive when you merge these results with the simplified BPDN optimality proof of E. Candès [3]. I have gathered these developments in a very short technical report on arXiv:

Laurent Jacques, “On the optimality of a L1/L1 solver for sparse signal recovery from sparsely corrupted compressive measurements” (Submitted on 20 Mar 2013)

Abstract: This short note proves the $\ell_2$-$\ell_1$ instance optimality of an $\ell_1/\ell_1$ solver, i.e., a variant of basis pursuit denoising with an $\ell_1$-fidelity constraint, when applied to the estimation of sparse (or compressible) signals observed by sparsely corrupted compressive measurements. The approach simply combines two known results due to Y. Plan, R. Vershynin and E. Candès.

Briefly, in the context where a sparse or compressible signal $x \in \mathbb{R}^N$ is observed by a random Gaussian matrix $\Phi \in \mathbb{R}^{M \times N}$, *i.e.*, with $\Phi_{ij} \sim \mathcal{N}(0,1)$, according to the noisy sensing model

$y = \Phi x + n,$

where $n$ is a “sparse” noise with bounded $\ell_1$-norm ($\|n\|_1 \leq \epsilon$), the main point of this note is to show that the program

$\hat{x} = \arg\min_u \|u\|_1 \ \ \mathrm{s.t.}\ \ \|y - \Phi u\|_1 \leq \epsilon$

provides, under certain conditions, a reconstruction error bounded by the best $K$-term approximation error of $x$ and by the noise level $\epsilon$ (aka $\ell_2$-$\ell_1$ instance optimality).
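Since the $\ell_1$-fidelity constraint is polyhedral, this program is just a linear program; here is a minimal sketch with `scipy.optimize.linprog` (the function name, dimensions and LP recast are mine, not the code of the note):

```python
import numpy as np
from scipy.optimize import linprog

def l1_l1_solve(Phi, y, eps):
    """min ||u||_1  s.t.  ||y - Phi u||_1 <= eps,
    cast as an LP over z = [u; t; s] with |u| <= t and |y - Phi u| <= s."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N), np.zeros(M)])
    I = np.eye(N)
    A_ub = np.block([
        [ I, -I, np.zeros((N, M))],               #  u - t <= 0
        [-I, -I, np.zeros((N, M))],               # -u - t <= 0
        [ Phi, np.zeros((M, N)), -np.eye(M)],     #  Phi u - s <= y
        [-Phi, np.zeros((M, N)), -np.eye(M)],     # -Phi u - s <= -y
        [np.zeros((1, 2 * N)), np.ones((1, M))],  #  sum(s) <= eps
    ])
    b_ub = np.concatenate([np.zeros(2 * N), y, -y, [eps]])
    bounds = [(None, None)] * N + [(0, None)] * (N + M)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:N]

# toy check: sparse signal, one grossly corrupted measurement
rng = np.random.default_rng(3)
N, M, K = 20, 15, 2
x = np.zeros(N); x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N))
n = np.zeros(M); n[0] = 0.1
y = Phi @ x + n
u = l1_l1_solve(Phi, y, eps=0.1)
print(np.abs(y - Phi @ u).sum())  # feasible: l1 residual <= eps
```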

Noticeably, the two conditions (2) and (3) are not unrealistic; I mean, they are not worse than assuming the common *restricted isometry property* ;-). Indeed, thanks to [1,2], we can show that they hold for random Gaussian matrices as soon as $M$ grows like $K \log(N/K)$ (up to constants):

As explained in the note, it seems also that this dependency can be improved for having (5). The question of proving the same improvement for (6) is open. You’ll find more details (and proofs) in the note.

Comments are of course welcome.

*References*:

[1] Y. Plan and R. Vershynin, “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach,” IEEE Transactions on Information Theory, to appear, 2012.

[2] Y. Plan and R. Vershynin, “Dimension reduction by random hyperplane tessellations,” arXiv preprint arXiv:1111.4452, 2011.

[3] E. Candès, “The restricted isometry property and its implications for compressed sensing,” Compte Rendus de l’Academie des Sciences, Paris, Serie I, vol. 346, pp. 589–592, 2008.

[4] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, “Robust 1-Bit Compressive Sensing via Binary Stable Embeddings of Sparse Vectors,” IEEE Transactions on Information Theory, in press.



This is maybe obvious and it probably serves no purpose, but here is the argument anyway.

Take a $K$-sparse vector $x$ in $\mathbb{R}^N$ and randomly generate two Gaussian matrices $\Phi_1$ and $\Phi_2$ in $\mathbb{R}^{M \times N}$ with i.i.d. entries drawn from $\mathcal{N}(0,1)$. From the vectors $y_1 = \Phi_1 x$ and $y_2 = \Phi_2 x$, you can form the two diagonal matrices $D_1 = \mathrm{diag}(y_1)$ and $D_2 = \mathrm{diag}(y_2)$.

Then, it is easy to show that the matrix

$A = D_2\,\Phi_1 - D_1\,\Phi_2 \qquad (1)$

is actually Gaussian except in the direction of $x$ (where it vanishes).

This can be seen more clearly in the case where $x = e_1$, the first vector of the canonical basis. Then the first column of $A$ is zero and the rest of the matrix is independent of $y_1$ and $y_2$. Conditionally on the value of the two diagonal matrices, this part of $A$ is therefore Gaussian, with each entry $A_{ij}$ ($j \geq 2$) of variance $(y_1)_i^2 + (y_2)_i^2$. Then, the conditioning can be removed by the expectation rule to lead to the cdf of the entries, and then to the pdf by differentiation, recovering the Gaussian distribution of $A$ in the space orthogonal to $x$.

However, $A$ cannot be RIP. First, obviously, since $Ax = 0$, by construction at least one $K$-sparse vector (namely $x$ itself) lies in the null space of $A$. Second, by taking vectors of the form $u = x + h$ with $h$ sparse, we clearly have $Au = Ah$ for any such $u$. Therefore, we can always take the norm of $h$ sufficiently small so that $\|Au\|/\|u\|$ is far from 1.
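Under my reading of the construction (1) above, i.e., $A = D_2\Phi_1 - D_1\Phi_2$ with $D_i = \mathrm{diag}(\Phi_i x)$, the vanishing direction can be checked in a few lines:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, K = 30, 20, 3

x = np.zeros(N); x[:K] = rng.standard_normal(K)   # a K-sparse vector
Phi1 = rng.standard_normal((M, N))
Phi2 = rng.standard_normal((M, N))

# combined matrix: Gaussian-like, but x is in its null space by construction
A = np.diag(Phi2 @ x) @ Phi1 - np.diag(Phi1 @ x) @ Phi2
print(np.linalg.norm(A @ x))  # ~0 (rounding): A cannot be RIP of order K
```

Indeed, row by row, $A x = (\Phi_2 x) \circ (\Phi_1 x) - (\Phi_1 x) \circ (\Phi_2 x) = 0$ identically.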

Of course, on the space of $K$-sparse vectors orthogonal to $x$, the matrix $A$ can still be RIP: it is easy to follow the argument above to prove that $A$ is Gaussian on this space and then to use the classical RIP proof [2].

All this is also very close to the “Cancel-then-Recover” strategy developed in [3]. The only purpose of this post is to prove the (useless) result that combining two Gaussian matrices as in (1) leads to a non-RIP matrix.

*References*:

[1] Candes, Emmanuel J., and Terence Tao. “Decoding by linear programming.” *Information Theory, IEEE Transactions on* 51.12 (2005): 4203-4215.

[2] Baraniuk, Richard, et al. “A simple proof of the restricted isometry property for random matrices.” *Constructive Approximation* 28.3 (2008): 253-263.

[3] Davenport, Mark A., et al. “Signal processing with compressive measurements.” *Selected Topics in Signal Processing, IEEE Journal of* 4.2 (2010): 445-460.


“New all-sky map shows the magnetic fields of the Milky Way with the highest precision“

by Niels Oppermann et al. (arxiv work available here)

Selected excerpt:

*“… One way to measure cosmic magnetic fields, which has been known for over 150 years, makes use of an effect known as Faraday rotation. When polarized light passes through a magnetized medium, the plane of polarization rotates. The amount of rotation depends, among other things, on the strength and direction of the magnetic field. Therefore, observing such rotation allows one to investigate the properties of the intervening magnetic fields.”*

Mmmm… very interesting, at least for my personal knowledge of the wonderful tomographic problem zoo (among gravitational lensing, interferometry, MRI, deflectometry).

P.S. Wow… 16 months without any post here. I’m really bad.


I wrote in 2008 a tiny Matlab toolbox (see here) to convince myself that the noiselet transform admits a Cooley-Tukey (butterfly) implementation, as already available for the Walsh-Hadamard transform. It should have been optimized in C, but I lacked the time to write this. Since this first code, I realized that Justin Romberg, with Peter Stobbe, had already written in 2006 a fast code (also O(N log N), but much faster than mine) available here:

People could be interested in using Justin’s code since, as will be clarified by my answers below, it is already adapted to real-valued signals, i.e., it produces real-valued noiselet coefficients.

As for *Random Fourier Ensemble* sensing, what I personally do when I use noiselet sensing is to pick uniformly at random $M/2$ complex values in half the noiselet-frequency domain, and concatenate their real and imaginary parts into a real vector of length $M$. The adjoint (transposed) operation — often needed in most compressed sensing solvers — must of course recombine the previously split real and imaginary parts into complex values before padding the complementary measured domain with zeros and running the inverse noiselet transform.
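Using the FFT as a stand-in for the complex noiselet transform, the half-spectrum measurement operator and its adjoint described above can be sketched as follows (the index selection and sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 64
idx = rng.choice(np.arange(1, N // 2), size=8, replace=False)  # half spectrum

def forward(x):
    """16 real measurements built from 8 complex coefficients."""
    X = np.fft.fft(x)[idx]
    return np.concatenate([X.real, X.imag])

def adjoint(y):
    m = len(y) // 2
    c = np.zeros(N, dtype=complex)
    c[idx] = y[:m] - 1j * y[m:]
    # the DFT matrix F is symmetric, so F^T c = fft(c); keep the real part
    return np.real(np.fft.fft(c))

x, y = rng.standard_normal(N), rng.standard_normal(16)
print(np.dot(forward(x), y), np.dot(x, adjoint(y)))  # equal: adjoint test
```

The same split/recombine pattern applies verbatim once `np.fft.fft` is replaced by a fast noiselet transform.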

To understand this special treatment of the real and imaginary parts (and why it is not simply similar to what is done for the Random Fourier Ensemble), let us go back to the origin, that is, to the Noiselets paper of Coifman et al.

Recall that in this paper, two kinds of noiselets are defined. The first basis, the common noiselet basis on the interval $[0,1)$, is defined thanks to the recursive formulas

$f_1(x) = \chi_{[0,1)}(x), \qquad (1)$

$f_{2n}(x) = (1-i)\,f_n(2x) + (1+i)\,f_n(2x-1), \qquad f_{2n+1}(x) = (1+i)\,f_n(2x) + (1-i)\,f_n(2x-1). \qquad (2)$

The second basis, the *Dragon Noiselets*, is slightly different: its elements are symmetric under the change of coordinates $x \to 1-x$. Their recursive definition, Eqs (3)-(4) in the paper, follows the same butterfly structure.

To be more precise, the two sets

$\{2^{-k/2} f_n : 2^k \leq n < 2^{k+1}\},$

$\{2^{-k/2} g_n : 2^k \leq n < 2^{k+1}\}$ (with $g_n$ the Dragon Noiselets),

are orthonormal bases for the piecewise constant functions at resolution $2^{-k}$, that is, for the functions that are constant on each dyadic interval $[2^{-k} j,\, 2^{-k}(j+1))$, $0 \leq j < 2^k$.

In the Coifman et al. paper, the recursive definition of Eq. (2) (and also Eq. (4) for Dragon Noiselets), which connects the noiselet function of index $n$ to those of indices $2n$ and $2n+1$, is simply a common butterfly diagram, and this diagram sustains a Cooley-Tukey implementation of the noiselet transform.

The coefficients involved in Eqs (2) and (4) are simply $1 \pm i$, which are of course complex conjugates of each other.

Therefore, in the noiselet transform of a real vector of length $2^k$ (in one-to-one correspondence with the piecewise constant functions above) involving the noiselets of indices $2^k \leq n < 2^{k+1}$, the resulting decomposition diagram is fully symmetric (up to a complex conjugation) under a flip of the indices pairing each noiselet frequency with its mirror in $\{2^k, \dots, 2^{k+1}-1\}$.

This shows that the noiselet coefficients of a real signal come in complex conjugate pairs under this index flip, and it allows us to define a “Real Random Noiselet Ensemble” by picking uniformly at random $M/2$ complex values in the half domain, that is, $M$ independent real values in total, as obtained by concatenating the real and imaginary parts (see above).

Consequently, for real-valued signals, as for Fourier, the two halves of the noiselet spectrum are not independent, and only one half is necessary to perform useful CS measurements.

Justin’s code is close to this interpretation: it uses a real-valued version of the symmetric Dragon Noiselets described in the initial Coifman et al. paper.

**Q2. Are noiselets always binary? or do they take +1, -1, 0 values like Haar wavelets?**

~~Actually, a noiselet of index $n$ takes the complex values $\pm 1 \pm i$ (suitably rescaled across scales), never $0$. This can be easily seen from the recursive formula of Eq. (2).~~

They also fill the whole interval $[0,1)$.

**Update — 26/8/2013:** I was obviously wrong above about the values that noiselets can take (thank you to Kamlesh Pawar for the detection).

A noiselet amplitude can never be zero; however, either the real part or the imaginary part (not both) can vanish at certain locations.

So, to be correct, and from a few computations, a noiselet of index $n$ with $2^k \leq n < 2^{k+1}$ takes, over the interval $[0,1)$, complex values of the form $2^{(k-1)/2}(\pm 1 \pm i)$ if $k$ is odd, and of the form $\pm 2^{k/2}$ or $\pm 2^{k/2}\, i$ if $k$ is even.

In particular, we see that the amplitude of these noiselets is always $2^{k/2}$ for the considered indices.

**Q3. Walsh functions have the property that they are binary and zero mean, so that one half of the values are 1 and the other half are -1. Is it the same case with the real and/or imag parts of the noiselet transform?**

To be correct, Walsh-Hadamard functions have a mean equal to 1 if their index is a power of 2 and 0 otherwise, starting with the indicator function of [0,1] at index 1.

Noiselets, in contrast, are all of unit average, meaning that their imaginary part has zero average. This can be proved easily (by induction) from their recursive definition in the Coifman et al. paper (Eqs (2) and (4)). Interestingly, their unit average, that is, their projection on the unit constant function, shows directly that a constant function is not sparse at all in the noiselet basis, since its “noiselet spectrum” is just flat.

In fact, it is explained in the Coifman paper that all Haar-Walsh wavelet packets, that is, the elementary functions

$2^{k/2}\, W_m(2^k x - j)$

with $W_m$ the Walsh functions (including the Haar functions), have a flat noiselet spectrum (all coefficients of unit amplitude), leading to the well-known good incoherence results (that is, low coherence). To recall, the coherence is $\sqrt{2}$ for the Haar wavelet basis, and it corresponds to the slightly higher values of about 2.2 and 2.9 for the Daubechies wavelets D4 and D8 respectively (see, e.g., E. J. Candès and M. B. Wakin, “An introduction to compressive sampling”, IEEE Sig. Proc. Mag., 25(2):21–30, 2008).

**Q4. How come noiselets require O(N logN) computations rather than O(N) like the haar transform?**

This is a verrry common confusion. The difference comes from the locality of the Haar basis elements.

For the Haar transform, you can use the well-known pyramidal algorithm running in $O(N)$ computations. You start from the approximation coefficients computed at the finest scale, then use the wavelet scaling relations to compute the detail and approximation coefficients at the second scale, and so on. Because of the sub-sampling occurring at each scale, the complexity is proportional to the total number of coefficients, that is, $N + N/2 + N/4 + \cdots \leq 2N = O(N)$.
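For reference, the pyramidal algorithm in question, here for the orthonormal Haar transform (a sketch; the halving of the work at each scale is what caps the total at $2N$ operations):

```python
import numpy as np

def haar(x):
    """Orthonormal Haar transform of a length-2^m vector, in O(N) operations."""
    approx = np.asarray(x, dtype=float).copy()
    out = []
    while len(approx) > 1:
        a, b = approx[0::2], approx[1::2]
        out.append((a - b) / np.sqrt(2))   # detail coefficients at this scale
        approx = (a + b) / np.sqrt(2)      # approximation, half the length
    out.append(approx)
    return np.concatenate(out[::-1])

x = np.random.default_rng(6).standard_normal(64)
print(np.linalg.norm(haar(x)), np.linalg.norm(x))  # orthonormal: norms match
```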

For the three bases Walsh-Hadamard, noiselets and Fourier, because of their non-locality (i.e., their support is the whole segment [0,1]), you cannot run a similar algorithm. However, you can use the Cooley-Tukey algorithm arising from the butterfly diagrams linked to the corresponding recursive definitions (Eqs (2) and (4) above).

This one is in $O(N \log N)$, since the final diagram has $\log_2 N$ levels, each involving $O(N)$ multiplication-additions.
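For illustration, here is a minimal Cooley-Tukey-style butterfly transform in this spirit; note that the coefficients $(1 \pm i)/2$ and the index ordering are my normalization choices (they make each stage, hence the whole transform, unitary) and may differ from Coifman et al. or from Justin’s code:

```python
import numpy as np

def noiselet(x):
    """O(N log N) butterfly transform; N must be a power of two."""
    x = np.asarray(x, dtype=complex).copy()
    n, h = len(x), 1
    while h < n:                          # log2(N) levels...
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):     # ...of N/2 butterflies each
                a, b = x[j], x[j + h]
                x[j] = ((1 - 1j) * a + (1 + 1j) * b) / 2
                x[j + h] = ((1 + 1j) * a + (1 - 1j) * b) / 2
        h *= 2
    return x

x = np.random.default_rng(7).standard_normal(32)
y = noiselet(x)
print(np.linalg.norm(y), np.linalg.norm(x))  # unitary: norms match
print(np.abs(noiselet(np.ones(32))))         # constant input -> flat spectrum
```

The second print illustrates the flat-spectrum property of the constant function discussed in Q3.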

—

Feel free to comment on this post and ask other questions. Perhaps it will eventually grow into a general noiselet FAQ/HOWTO.


I just found on arXiv this interesting paper about concentration properties of submodular functions (very common, for instance, in “Graph Cut” methods):

## A note on concentration of submodular functions. (arXiv:1005.2791v1 [cs.DM])

Jan Vondrak, May 18, 2010

We survey a few concentration inequalities for submodular and fractionally subadditive functions of independent random variables, implied by the entropy method for self-bounding functions. The power of these concentration bounds is that they are dimension-free, in particular implying standard deviation $O(\sqrt{E[f]})$ rather than $O(\sqrt{n})$ which can be obtained for any 1-Lipschitz function of $n$ variables.

In particular, the author shows some interesting concentration results in his corollary 3.2.

Without having performed any developments, I’m wondering if this result could serve to define a new class of matrices (or non-linear operators) satisfying either the Johnson-Lindenstrauss Lemma or the Restricted Isometry Property.

For instance, starting from Bernoulli vectors (i.e., the rows of a sensing matrix) and defining some specific submodular (or self-bounding) functions of them (e.g., involving a sparse vector and some “kind” function), I wonder if the concentration results above are better than those coming from the classical concentration inequalities (based on Lipschitz properties; see, e.g., the books of Ledoux and Talagrand)?

OK, all this is perhaps just due to too-early thoughts… before my mug of black coffee.


Yesterday I found some very funny (math) jokes on Bjørn’s maths blog about **“How to catch a lion in the Sahara desert”** with some … mathematical tools.

Bjørn collected there many ways to realize this task, from many places on the web. There are really tons of examples. To give you an idea, here is the Schrödinger method:

*“At any given moment there is a positive probability that there is a lion in the cage. Sit down and wait.”*

or this one:

*“The method of inverse geometry: We place a spherical cage in the desert and enter it. We then perform an inverse operation with respect to the cage. The lion is then inside the cage and we are outside.”*

So, let’s try something about Compressed Sensing. (Note: if you have something better than my infamous suggestion, I would be very happy to read it as a comment to this post.)

*“How to catch a lion in the Sahara desert” *

*The compressed sensing way:* First, you consider that only one lion in a big desert is definitely a very sparse situation, comparing the lion’s size to the desert area. No need for a cage: just project the whole desert randomly into a dune of just 5 times the lion’s weight! Since the lion obviously died in this shrinking operation, you use the RIP (!)… and *relax*.



Nicely, both authors of SPGL1, Michael Friedlander and Ewout van den Berg, sent me interesting answers (many thanks to them). Here they are (using the notations of the previous post):

Michael’s answer is about the need for a TV-Lasso solver:

“It’s an intriguing project that you describe. I suppose in principle the theory behind spgl1 should readily extend to TV (though I haven’t thought how a semi-norm might change things). But I’m not sure how easy it’ll be to solve the “TV-Lasso” subproblems. Would be great if you can see a way to do it efficiently. “

Ewout, on his side, explained this:

“The idea you suggest may very well be feasible, as the approach taken in SPGL1 can be extended to other norms (i.e., not just the one-norm), as long as the dual norm is known and there is a way to orthogonally project onto the ball induced by the primal norm. In fact, the newly released version of SPGL1 takes advantage of this and now supports two new formulations.

I heard (I haven’t had time to read the paper) that Chambolle has described the dual to the TV-norm. Since the derivative of the Pareto curve on the appropriate interval is given by the dual norm, that part should be fine (for the one-norm this gives the infinity norm).

In SPGL1 we solve the Lasso problem using a spectrally projected gradient method, which means we need to have an orthogonal projector for the one-norm ball of radius $\tau$. It is not immediately obvious how to (efficiently) solve the related projection problem (for a given $\tau$):

minimize $\|x - b\|_2$ subject to $\|x\|_{TV} \leq \tau$.

However, the general approach taken in SPGL1 does not really care about how the Lasso subproblem is solved, so if there is any efficient way to solve

minimize $\|Ax - b\|_2$ subject to $\|x\|_{TV} \leq \tau$,

then that would be equally good. Unfortunately, it seems the complexification trick (see the previous post) works only from the image to the differences; when working with the differences themselves, additional constraints would be needed to ensure consistency in the image, i.e., that summing up the differences going right first and then down be equal to the sum going down first and then right.”

In a second mail, Ewout added an explanation on this last remark:

“I was thinking that perhaps, instead of minimizing over the signal it would be possible to minimize over the differences (expressed in complex numbers in the two-dimensional setting). The problem with that is that most complex vectors do not represent difference vectors (i.e., the differences would not add up properly). For such an approach to work, this consistency would have to be enforced by adding some constraints.”

Actually, I saw similar considerations in A. Chambolle‘s paper “*An Algorithm for Total Variation Minimization and Applications*”. It is even clearer in the paper he wrote with J.-F. Aujol, “Dual Norms and Image Decomposition Models”. They develop there the notion of TV (semi-)norms for different exponents (i.e., in the norm applied to the gradient components) and, in particular, they address the problem of finding and computing the corresponding dual norms. For the usual TV norm, this leads to the *G-norm*

$\|v\|_G = \inf\,\{\|p\|_\infty : v = \mathrm{div}\, p\},$

where, as in the continuous setting, $\mathrm{div}$ is the discrete divergence operator defined as the adjoint of the finite-difference gradient operator $\nabla$ used to define the TV norm. In other words, $\mathrm{div} = -\nabla^*$, i.e., $\langle \nabla u, p \rangle = -\langle u, \mathrm{div}\, p \rangle$ for all $u$ and $p$.

Unfortunately, computing the G-norm seems less obvious than computing its dual counterpart, and an optimization method must be used. I don’t know if this could lead to an efficient implementation of a TV-SPGL1.


This post is about the Basis Pursuit DeNoising (BPDN) program

$\mathrm{BP}_\sigma:\quad \min_u \|u\|_1 \ \ \mathrm{s.t.}\ \ \|y - \Phi u\|_2 \leq \sigma,$

where $\Phi$ is the usual measurement matrix for a measurement vector $y$, and $\|u\|_1$ and $\|y - \Phi u\|_2$ are respectively the $\ell_1$ norm of the candidate solution and the $\ell_2$ norm of the residual. In short, as shown by E. Candès, J. Romberg and T. Tao, if $\Phi$ is well behaved, i.e., if it satisfies the so-called *Restricted Isometry Property* for sparse signals, then the solution of $\mathrm{BP}_\sigma$ approximates (with a controlled error) a sparse (or compressible) signal $x$ such that $y = \Phi x + n$, where $n$ is an additive noise vector with power $\|n\|_2 \leq \sigma$.

The reason for this post is the following: **I’m wondering if SPGL1 could be “easily” transformed into a solver of Basis Pursuit with the Total Variation (TV) norm**, that is, of the minimization problem

$\min_u \|u\|_{TV} \ \ \mathrm{s.t.}\ \ \|y - \Phi u\|_2 \leq \sigma,$

where $\|u\|_{TV} = \sum_j |(\mathcal{D}u)_j|$, with $(\mathcal{D}u)_j$ the $j$th component of the complex finite-difference operator applied to the vectorized image $u$ of $N$ pixels (in a set of coordinates $x_1$ and $x_2$). I have used here a “complexification” trick, putting the finite differences according to the directions $x_1$ and $x_2$ in the real part and the imaginary part respectively of the complex operator $\mathcal{D}$. The TV norm of $u$ is then really the $\ell_1$ norm of $\mathcal{D}u$.
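The complexification trick in a few lines of NumPy (the zero-padding of the last row/column differences is a boundary choice of mine):

```python
import numpy as np

def tv_complex(img):
    """TV norm of a 2-D image as the l1 norm of a complex gradient field."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal finite differences
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical finite differences
    d = gx + 1j * gy                        # the "complexified" operator D
    return np.abs(d).sum()                  # TV(img) = || D img ||_1

img = np.array([[0., 1.], [0., 1.]])
print(tv_complex(img))  # -> 2.0 (two unit horizontal jumps, no vertical ones)
```

Note that $|g_x + i\,g_y| = \sqrt{g_x^2 + g_y^2}$, so this is the isotropic TV norm.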

This problem is particularly well suited to the reconstruction of compressed-sensed images, since most of them are very sparse in the “gradient basis” (see for instance the references on compressed sensing for MRI). Minimizing the TV norm, since it is performed in the spatial domain, is also sometimes more efficient than minimizing the $\ell_1$ norm in a particular sparsity basis (e.g., 2-D wavelets, curvelets, …).

Therefore, I would say that, as for the initial SPGL1 theoretical framework, it could be interesting to study the *Pareto frontier* related to this TV-constrained problem, even if the TV norm is actually a semi-norm, i.e., $\|u\|_{TV} = 0$ does not imply $u = 0$, but only $u = c\,\mathbf{1}$ for some constant $c$.

To explain that point better, let me first summarize the paper of Friedlander and van den Berg quoted above. They proposed to solve the BPDN problem, denoted $\mathrm{BP}_\sigma$, through a sequence of *LASSO* problems regulated by a parameter $\tau$:

$\mathrm{LS}_\tau:\quad \min_u \|y - \Phi u\|_2 \ \ \mathrm{s.t.}\ \ \|u\|_1 \leq \tau.$

If I’m right, the key idea is that there exists a $\tau_\sigma$ such that $\mathrm{LS}_{\tau_\sigma}$ is equivalent to $\mathrm{BP}_\sigma$. The problem is thus to assess this point. SPGL1 finds $\tau_\sigma$ iteratively, using the fact that the problems $\mathrm{LS}_\tau$ define a smooth and decreasing curve (the *Pareto curve*) through the norm of the residual $r_\tau = y - \Phi u_\tau$, where $u_\tau$ is the solution of $\mathrm{LS}_\tau$. More precisely, the function

$\phi(\tau) = \|r_\tau\|_2$

is decreasing from $\phi(0) = \|y\|_2$ to zero at a value $\tau_{\mathrm{BP}}$, i.e., $\phi(\tau_{\mathrm{BP}}) = 0$.

Interestingly, the derivative $\phi'(\tau)$ exists on $[0, \tau_{\mathrm{BP}}]$ and is simply equal to $-\|\Phi^T r_\tau\|_\infty / \|r_\tau\|_2$.

As explained, at the point $\tau_\sigma$, the problem $\mathrm{LS}_{\tau_\sigma}$ provides the solution to $\mathrm{BP}_\sigma$. But since both $\phi$ and $\phi'$ are computable, a Newton method on this Pareto curve can iteratively estimate $\tau_\sigma$ from the implicit equation $\phi(\tau_\sigma) = \sigma$. Practically, this is done by solving an approximate $\mathrm{LS}_\tau$ at each iteration (and the convergence of the Newton method remains linear).

In the end, the whole approach is very efficient for solving high-dimensional BPDN problems (such as BPDN for images), and the final computational cost is mainly due to the forward and transposed multiplications of the matrix/operator $\Phi$ with vectors.
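The overall scheme (inexact LASSO solves plus Newton root-finding on the Pareto curve) can be caricatured in a few lines; this naive projected-gradient sketch is only for intuition, not a substitute for SPGL1 (the $\ell_1$ projection is the classical sort-based one):

```python
import numpy as np

def proj_l1(v, tau):
    """Euclidean projection onto the l1 ball of radius tau (sort-based)."""
    if tau <= 0:
        return np.zeros_like(v)
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u > (css - tau) / k)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso(A, y, tau, n_iter=300):
    """Approximate LS_tau: min ||y - A x||_2  s.t.  ||x||_1 <= tau."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L gradient step
    for _ in range(n_iter):
        x = proj_l1(x - step * A.T @ (A @ x - y), tau)
    return x

def bpdn(A, y, sigma, n_newton=12):
    """Root-find phi(tau) = sigma on the Pareto curve, a la SPGL1."""
    tau = 0.0
    for _ in range(n_newton):
        x = lasso(A, y, tau)
        r = y - A @ x
        phi = np.linalg.norm(r)
        if phi <= sigma:
            break
        dphi = -np.linalg.norm(A.T @ r, np.inf) / phi   # Pareto-curve slope
        tau += (sigma - phi) / dphi                     # Newton update
    return x

rng = np.random.default_rng(9)
A = rng.standard_normal((30, 10))
y = A @ rng.standard_normal(10)
sigma = 0.5 * np.linalg.norm(y)
x = bpdn(A, y, sigma)
print(np.linalg.norm(y - A @ x), sigma)  # residual close to sigma at the root
```

Swapping `proj_l1` for a TV-ball projection is exactly the modification discussed in this post.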

**So what happens now if the $\ell_1$ norm is replaced by the TV norm in this process? If we switch from the $\ell_1$ LASSO to a TV-LASSO? Is there an “SPGL1 way” to solve that?**

The function $\phi$ resulting from such a context would now have the initial point $\phi(0) = \min_c \|y - c\,\Phi\mathbf{1}\|_2$ (with $\mathbf{1}$ the constant vector), since a zero TV norm means a constant image (the value of the constant $c$ arises just from this minimization). Notice that if $\Phi$ is, for instance, a Gaussian measurement matrix, $\phi(0)$ will be very close to $\|y\|_2$, since the expected average of any row of $\Phi$ is zero.

For the rest, I’m unfortunately not sufficiently familiar with convex optimization theory to deduce what $\phi'$ becomes in the TV framework (hmm, I should definitely study that).

However, for the $\ell_1$ case, the solution $u_\tau$ of $\mathrm{LS}_\tau$ is computed approximately for each $\tau$. This approximation, which is also iterative, uses a special projection operator to guarantee that the current candidate solution remains feasible, i.e., remains in the $\ell_1$ ball of radius $\tau$. As usual, this projection is accomplished through a *soft thresholding* procedure, i.e., as the solution of the problem

$\min_u\ \tfrac{1}{2}\|u - z\|_2^2 + \lambda \|u\|_1,$

where $z$ is the point to project and $\lambda$ is set so that the projection lies inside the ball above.
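Indeed, soft thresholding is the closed-form solution of this problem; a quick numerical check against a brute-force scalar minimization:

```python
import numpy as np

def soft(z, lam):
    """Closed-form solution of min_u 0.5*(u - z)^2 + lam*|u| (componentwise)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# brute-force check on a fine grid, for one scalar component
z, lam = 1.7, 0.6
grid = np.linspace(-3, 3, 600001)
u_star = grid[np.argmin(0.5 * (grid - z) ** 2 + lam * np.abs(grid))]
print(soft(z, lam), u_star)
```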

For the TV minimization case, the TV ball defining the feasible set of the approximate LASSO procedure would similarly call for the projection operator solving the problem

$\min_u\ \tfrac{1}{2}\|u - z\|_2^2 + \lambda \|u\|_{TV}.$

This is somehow related to one of the lessons provided in the TwIST paper (*“A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration”*) of J. Bioucas-Dias and M. Figueiredo about the so-called *Moreau* function: **there is a deep link between some iterative resolutions of a regularized BP problem using a given sparsity metric, e.g., the $\ell_1$ or the TV norm, and the canonical denoising method for this metric, i.e., when the measurement operator is the identity, giving soft thresholding or TV denoising respectively.**

Thanks to the implementation of Antonin Chambolle (used also by TwIST), this last canonical TV minimization can be computed very quickly. Therefore, if needed, the required projection onto the TV ball above could also be inserted in a potential “SPGL1 for TV sparsity” solver.

OK… I agree that all this is just a very rough intuition. There are a lot of points to clarify and to develop. However, if you know something about all this (or if you detect that I’m totally wrong), or if you just want to comment on this idea, feel free to use the comment box below…
