Thursday, September 16, 2010

Learning Functions in High Dimensions

Ever since Bernard Beauzamy asked me about the sampling needed to determine a function, the question has been stuck in the back of my mind. Bernard has devised the Experimental Probabilistic Hypersurface (EPH), but other, more theoretical developments have popped up in the past two years. Here is a list of papers and links to Nuit Blanche (a blog mostly focused on Compressed Sensing) that try to provide an answer to this question. Without further ado, here is the list:

Learning Functions of Few Arbitrary Linear Parameters in High Dimensions by Massimo Fornasier, Karin Schnass, Jan Vybíral. The abstract reads:
Let us assume that $f$ is a continuous function defined on the unit ball of $\mathbb R^d$, of the form $f(x) = g (A x)$, where $A$ is a $k \times d$ matrix and $g$ is a function of $k$ variables for $k \ll d$. We are given a budget $m \in \mathbb N$ of possible point evaluations $f(x_i)$, $i=1,...,m$, of $f$, which we are allowed to query in order to construct a uniform approximating function. Under certain smoothness and variation assumptions on the function $g$, and an {\it arbitrary} choice of the matrix $A$, we present in this paper 1. a sampling choice of the points $\{x_i\}$ drawn at random for each function approximation; 2. an algorithm for computing the approximating function, whose complexity is at most polynomial in the dimension $d$ and in the number $m$ of points. Due to the arbitrariness of $A$, the choice of the sampling points will be according to suitable random distributions and our results hold with overwhelming probability. Our approach uses tools taken from the {\it compressed sensing} framework, recent Chernoff bounds for sums of positive-semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.
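To make the setting concrete, here is a minimal numerical sketch (in Python/NumPy) of the idea behind recovering $f(x) = g(Ax)$: since $\nabla f(x) = A^T \nabla g(Ax)$, gradients of $f$ estimated at a few random points approximately span the row space of $A$, which an SVD then extracts. This is only an illustration of the underlying principle, not the authors' algorithm (which uses far fewer queries via random directional differences and compressed sensing); all names and parameter values below are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 2                       # ambient and intrinsic dimensions
A = rng.standard_normal((k, d))    # the "unknown" k x d matrix (used here only to build a test f)
A /= np.linalg.norm(A, axis=1, keepdims=True)

def f(x):
    """Test function of the form f(x) = g(Ax) with g a smooth function of k variables."""
    y = A @ x
    return np.sin(y[0]) + 0.5 * y[1] ** 2

# Estimate gradients of f at m random points inside the unit ball
# by coordinate-wise finite differences (d + 1 queries per point).
m, eps = 200, 1e-5
G = np.zeros((m, d))
for i in range(m):
    x = rng.standard_normal(d)
    x /= 2.0 * np.linalg.norm(x)          # a point inside the unit ball
    fx = f(x)
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        G[i, j] = (f(x + e) - fx) / eps   # ~ j-th partial derivative of f at x

# The top-k right singular vectors of G approximate the row space of A.
_, s, Vt = np.linalg.svd(G, full_matrices=False)
A_hat = Vt[:k]

# Sanity check: projecting the rows of A onto the estimated subspace
# should leave almost no residual.
P = A_hat.T @ A_hat
print("residual of A after projection:", np.linalg.norm(A - A @ P))
```

Once the subspace is (approximately) known, the problem reduces to approximating the $k$-variable function $g$ on the projected samples, which no longer suffers from the ambient dimension $d$.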

From this entry:

Following up on some elements shown in the third Paris Lecture of Ron DeVore, here is the more substantive paper: Approximation of functions of few variables in high dimensions by Ron DeVore, Guergana Petrova, Przemyslaw Wojtaszczyk. The abstract reads:

Let $f$ be a continuous function defined on $\Omega := [0,1]^N$ which depends on only $\ell$ coordinate variables, $f(x_1, \dots, x_N) = g(x_{i_1}, \dots, x_{i_\ell})$. We assume that we are given $m$ and are allowed to ask for the values of $f$ at $m$ points in $\Omega$. If $g$ is in $\mathrm{Lip}\,1$ and the coordinates $i_1, \dots, i_\ell$ are known to us, then by asking for the values of $f$ at $m = L^\ell$ uniformly spaced points, we could recover $f$ to the accuracy $|g|_{\mathrm{Lip}\,1} L^{-1}$ in the norm of $C(\Omega)$. This paper studies whether we can obtain similar results when the coordinates $i_1, \dots, i_\ell$ are not known to us. A prototypical result of this paper is that by asking for $C(\ell) L^\ell (\log_2 N)$ adaptively chosen point values of $f$, we can recover $f$ in the uniform norm to accuracy $|g|_{\mathrm{Lip}\,1} L^{-1}$ when $g \in \mathrm{Lip}\,1$. Similar results are proven for more general smoothness conditions on $g$. Results are also proven under the assumption that $f$ can be approximated to some tolerance $\epsilon$ (which is not known) by functions of $\ell$ variables.
I note that the authors make a connection to the junta problem as discussed by Dick Lipton recently and mentioned here.
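As a toy illustration of this setting (not the paper's algorithm, which achieves the $\log_2 N$ dependence with cleverly chosen adaptive queries), one can naively probe each coordinate of a random base point to find the active variables and then tabulate $g$ on a uniform grid in those variables. The function, dimensions, and tolerances below are made up for the example.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N = 10                                     # ambient dimension

def f(x):
    """Test function on [0,1]^N that depends only on coordinates 2 and 7."""
    return abs(x[2] - 0.3) + 0.5 * x[7]

# 1. Naive detection of the active coordinates: re-sample each coordinate of a
#    random base point and see whether the value of f changes. This costs
#    N + 1 queries and can be fooled; the paper does something sharper.
base = rng.random(N)
f0 = f(base)
active = []
for j in range(N):
    x = base.copy()
    x[j] = rng.random()                    # perturb coordinate j only
    if abs(f(x) - f0) > 1e-12:
        active.append(j)
print("detected active coordinates:", active)

# 2. Tabulate f on a uniform grid of spacing 1/L in the active coordinates;
#    for g in Lip 1 the nearest-grid-point approximant is accurate to
#    about |g|_Lip / L in the sup norm, using L^l point values.
L = 8
grid = np.linspace(0.0, 1.0, L + 1)
table = {}
for idxs in product(range(L + 1), repeat=len(active)):
    x = base.copy()
    for j, idx in zip(active, idxs):
        x[j] = grid[idx]
    table[idxs] = f(x)

def f_approx(x):
    """Approximate f(x) using only the tabulated point values."""
    key = tuple(int(np.argmin(np.abs(grid - x[j]))) for j in active)
    return table[key]

x_test = rng.random(N)
print("f(x) =", f(x_test), " approximation =", f_approx(x_test))
```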

From this entry:

Albert Cohen just released the course notes of Ron DeVore's fourth lecture in Paris, entitled Capturing Functions in Infinite Dimensions (the third lecture is here, and the first and second are here). The abstract reads:
The following are notes on stochastic and parametric PDEs of the short course in Paris. Lecture 4: Capturing Functions in Infinite Dimensions. Finally, we want to give an example where the problem is to recover a function of infinitely many variables. We will first show how such problems occur in the context of stochastic partial differential equations.
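For readers who have not seen how functions of infinitely many variables arise, the standard example (sketched here from memory, not quoted from the notes) is an elliptic PDE whose diffusion coefficient is expanded in a basis with an infinite sequence of parameters:

```latex
% The parameter vector y = (y_1, y_2, ...) has infinitely many coordinates,
% and the object to be captured is the solution map y -> u(., y).
\[
  -\operatorname{div}\big(a(x,y)\,\nabla u(x,y)\big) = F(x) \ \text{on } D, \qquad
  a(x,y) = \bar a(x) + \sum_{j \ge 1} y_j\,\psi_j(x), \quad y_j \in [-1,1].
\]
```

Each "point query" of this function then amounts to one PDE solve at a chosen parameter value $y$.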
From here:

The course notes of the third lecture given by Ron DeVore in Paris have been released on Albert Cohen's page. The lecture is entitled "Capturing Functions in High Dimensions"; it seems to aim at giving bounds for nonlinear compressed sensing and should have an impact on manifold signal processing. Interesting. The first two lectures were mentioned here. The lecture begins with:

1.1 Classifying High Dimensional Functions:
Our last two lectures will study the problem of approximating (or capturing through queries) a function $f$ defined on $\Omega \subset \mathbb{R}^N$ with $N$ very large. The usual way of classifying functions is by smoothness. The more derivatives a function has, the nicer it is and the more efficiently it can be numerically approximated. However, as we move into high space dimension, this type of classification will suffer from the so-called curse of dimensionality, which we shall now quantify.
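To put a number on that curse (my own back-of-the-envelope arithmetic, consistent with the Lipschitz estimate quoted earlier in this post): recovering a merely Lipschitz function of all $N$ variables on $[0,1]^N$ to uniform accuracy $1/L$ takes a full grid of point values,

```latex
\[
  \#\{\text{point values}\} \;\sim\; L^{N}, \qquad
  \text{e.g. } L = 10,\; N = 100 \ \Rightarrow\ 10^{100} \text{ queries,}
\]
```

hence the need for structural assumptions such as dependence on a few variables, or a few linear combinations of variables, as in the papers above.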
From here:
Albert Cohen just released the lecture notes Ron DeVore is presenting in Paris, entitled: Foundations of Compressed Sensing.
From here:

* Ronald DeVore: Recovering sparsity in high dimensions

The abstract reads:
We assume that we are in $\mathbb{R}^N$ with $N$ large. The first problem we consider is that there is a function $f$ defined on $\Omega:=[0,1]^N$ which is a function of just $k$ of the coordinate variables: $f(x_1,\dots,x_N)=g(x_{j_1},\dots,x_{j_k})$ where $j_1,\dots,j_k$ are not known to us. We want to approximate $f$ from some of its point values. We first assume that we are allowed to choose a collection of points in $\Omega$ and ask for the values of $f$ at these points. We are interested in what error we can achieve in terms of the number of points when we assume some smoothness of $g$ in the form of Lipschitz or higher smoothness conditions.
We shall consider two settings: adaptive and non-adaptive. In the adaptive setting, we are allowed to ask for a value of $f$ and then, on the basis of the answer we get, we can ask for another value. In the non-adaptive setting, we must prescribe the $m$ points in advance. A second problem we shall consider is when $f$ is not necessarily only a function of $k$ variables but can be approximated to some tolerance $\epsilon$ by such a function. We seek again sets of points where the knowledge of the values of $f$ at such points will allow us to approximate $f$ well. Our main consideration is to derive results which are not severely affected by the size of $N$, i.e. which do not fall victim to the curse of dimensionality. We shall see that this is possible.
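As a rough illustration of "not severely affected by the size of $N$" (my arithmetic, based on the adaptive bound $C(\ell) L^{\ell} \log_2 N$ quoted earlier in this post), the ambient dimension enters only through a logarithm:

```latex
\[
  \ell = 2,\; L = 10,\; N = 10^{6} \ \Rightarrow\
  L^{\ell} \log_2 N \;\approx\; 100 \times 20 \;=\; 2000 \text{ point values (up to the constant } C(\ell)\text{),}
\]
```

instead of the $L^{N}$-type counts a generic Lipschitz function of all $N$ variables would require.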