Some duality bounds for nonconvex problems, II

Introduction

We continue the discussion of the previous note on [1]. Consider the following problem,

$$z^\ast = \inf ~ f(x) + g(Ax), ~\text{s.t.} ~ x \in \mathcal{X}, Ax \in \mathcal{Y}. \tag{P}$$

Here $f$ is a real-valued lower semi-continuous function on a convex set $\mathcal{X} \subset \mathbb{E}$, and $g$ is a convex function on $\mathbb{R}^m$. Now suppose the primal variable has some decomposition structure,

$$\mathcal{X} = \prod_{i \in [I]} \mathcal{X}_i, \quad f(x) = \tfrac{1}{I}\textstyle\sum_{i \in [I]} f_i(x_i), \quad A = \tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i,$$

where the number of blocks $I$ is large relative to the dimension of $\mathcal{Y}$:

$$I \gg m.$$

Each $\mathcal{X}_i \subseteq \mathbb{E}_i$ is a convex subset of a corresponding lower-dimensional linear space $\mathbb{E}_i$, and each $A_i \in \mathcal{L}(\mathbb{E}_i, \mathbb{R}^m)$ is a linear mapping into Euclidean space. The problem then reads,

$$\begin{align*} z^\ast = \inf ~&\left\{\tfrac{1}{I}\textstyle\sum_{i \in [I]} f_i(x_i) + g\left(\tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i x_i\right)\right\}, \\ \text{s.t.}~ &\tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i x_i \in \mathcal{Y}, ~ x_i \in \mathcal{X}_i, \forall i \in [I]. \end{align*}$$

With some algebra, the dual function is,

$$d(y) = - f_\ast(-A^\ast y) - g_\ast(y) = - \tfrac{1}{I} \textstyle\sum_{i \in [I]} f_{i,\ast}(-A_i^\ast y) - g_\ast(y).$$
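For completeness, here is the separable conjugate computation behind the second equality, using $\langle A^\ast y, x \rangle = \langle y, Ax \rangle = \tfrac{1}{I}\sum_{i \in [I]} \langle A_i^\ast y, x_i \rangle$:

$$\begin{align*} f_\ast(-A^\ast y) &= \sup_{x \in \mathcal{X}} \left\{ \langle -A^\ast y, x \rangle - f(x) \right\} = \sup_{x \in \mathcal{X}} \left\{ \tfrac{1}{I}\textstyle\sum_{i \in [I]} \langle -A_i^\ast y, x_i \rangle - \tfrac{1}{I}\textstyle\sum_{i \in [I]} f_i(x_i) \right\} \\ &= \tfrac{1}{I}\textstyle\sum_{i \in [I]} \sup_{x_i \in \mathcal{X}_i} \left\{ \langle -A_i^\ast y, x_i \rangle - f_i(x_i) \right\} = \tfrac{1}{I}\textstyle\sum_{i \in [I]} f_{i,\ast}(-A_i^\ast y), \end{align*}$$

where the sup decouples across blocks because both the pairing and $f$ are separable in $x_i$.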

Main result. In the previous note, we obtained the bound,

$$z^\ast - d^\ast \le \tfrac{1}{I} \textstyle\sum_{i \in [I]} \rho(f_i) \le \max_{i \in [I]} \rho(f_i).$$

The point of this note is that we can obtain a much sharper bound.

Suppose that $\mathcal{X}_i, \mathcal{Y}$ are convex sets in the corresponding ambient spaces $\mathbb{E}_i, \mathbb{R}^m$, the $f_i$ are lower semi-continuous functions on $\mathcal{X}_i$, and $g$ is a convex function on $\mathbb{R}^m$. Then, if Slater's condition holds, i.e.,

$$0 \in \textrm{int}(\mathcal{Y} - A(\mathcal{X})),$$

then,

$$\boxed{ z^\ast - d^\ast \le {\color{#BE5845}{(m+1)I^{-1}}} \max_{i \in [I]} \rho(f_i). }$$

The bound depends on the dimension $m$ of the image of the linear mapping $A$ and diminishes as the number of blocks $I$ increases.

Shapley-Folkman lemma. We introduce the following lemma, which may be intuitively understood as saying that if the number of summed sets exceeds the dimension of the ambient vector space, then their Minkowski sum is approximately convex.

Let $K_1, ..., K_N$ be a finite family of compact sets in $\mathbb{R}^m$. For every $\xi \in \mathbb{R}^m$ such that

$$\xi \in \mathrm{conv}\left(\textstyle\sum_{n \in [N]} K_n\right) = \textstyle\sum_{n \in [N]} \mathrm{conv}(K_n),$$

there exists a subset of indices $S = S(\xi) \subseteq [N]$ with $|S| \le m$ such that,

$$\xi \in \textstyle\sum_{n \in S} \mathrm{conv}(K_n) + \textstyle\sum_{n \notin S} K_n.$$

This means, for any $\xi \in \mathrm{conv}(\textstyle\sum_{n \in [N]} K_n)$, we can decompose $\xi$ into a sum taking an element of $\mathrm{conv}(K_n)$ for each of at most $m$ indices, and an element of $K_n$ itself for each remaining index:

$$\xi = \textstyle\sum_{n \in S} x_n + \textstyle\sum_{n \notin S} y_n, \quad |S| \le m.$$

where $x_n \in \mathrm{conv}(K_n)$ and $y_n \in K_n$. The Shapley-Folkman-Starr theorem uses this lemma to show that the Hausdorff distance between a Minkowski sum and its convex hull is bounded.
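As a toy numerical sanity check (an illustration of my own, not from the references): take $m = 1$ and $K_n = \lbrace 0, 1 \rbrace$ for $n \in [N]$, so $\sum_{n} K_n = \lbrace 0, 1, \dots, N \rbrace$ while $\mathrm{conv}(\sum_n K_n) = [0, N]$. Any $\xi \in [0, N)$ decomposes with at most one set replaced by its convex hull, matching $|S| \le m = 1$:

```python
import math

# Shapley-Folkman decomposition in R^1 with K_n = {0, 1}:
# write xi in [0, N) as a sum where at most m = 1 summand is taken
# from conv(K_n) = [0, 1] and the rest are genuine elements of {0, 1}.
def decompose(xi, N):
    k = math.floor(xi)    # integer part: k sets contribute the element 1
    frac = xi - k         # at most one fractional (convex-hull) contribution
    return [1.0] * k + [frac] + [0.0] * (N - k - 1)

pts = decompose(2.3, 5)
fractional = [p for p in pts if p not in (0.0, 1.0)]
assert abs(sum(pts) - 2.3) < 1e-9   # the summands recover xi
assert len(fractional) <= 1          # |S| <= m = 1
```

The same bookkeeping in $\mathbb{R}^m$ requires a Carathéodory-type argument; this one-dimensional case only illustrates the counting.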

Proof of main result

Similar to our last post, there exists $y \in \mathbb{R}^m$ attaining the dual optimum $w := d^\ast$,

$$d(y) = w = - f_\ast(-A^\ast y) - g_\ast(y).$$

For every $\epsilon > 0$, we can find a probability measure $\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})$ such that,

$$w + \epsilon \ge \int_{\mathcal{X}\times\mathcal{Y}} f(x) + g(y) \, \mathrm{d} \mu, ~\text{s.t.} \int_{\mathcal{X}\times\mathcal{Y}} Ax \, \mathrm{d}\mu = \int_{\mathcal{X}\times\mathcal{Y}} y \, \mathrm{d}\mu.$$

Construct,

$$\begin{align*} \Phi: &\mathcal{X}\times\mathcal{Y}\to \mathbb{R} \times \mathbb{R}^m, \\ \Phi(x, y) &= (f(x) + g(y), Ax - y) \\ &= (I^{-1}\textstyle\sum_{i \in [I]} f_i(x_i) + g(y), I^{-1}\textstyle\sum_{i \in [I]} A_i x_i - y) \\ &= \left(I^{-1}\textstyle\sum_{i \in [I]} (f_i(x_i), A_i x_i)\right) + (g(y), -y) \end{align*}$$

The image is,

$$\begin{align*} \Phi(\mathcal{X}\times\mathcal{Y}) = I^{-1}\textstyle\sum_{i \in [I]} F_i + G, \\ F_i = \lbrace(f_i(x_i), A_i x_i) \mid x_i \in \mathcal{X}_i\rbrace, \quad G = \lbrace(g(y), -y) \mid y \in \mathcal{Y}\rbrace. \end{align*}$$

Using this notation, we see,

$$\begin{align*} (w + \epsilon, 0_m) &\in \mathrm{conv}\left(\Phi(\mathcal{X}\times\mathcal{Y})\right) + (\mathbb{R}_{\ge 0} \times \lbrace 0_m \rbrace)\\ &\subseteq G + (\mathbb{R}_{\ge 0} \times \lbrace 0_m \rbrace) + \mathrm{conv}(I^{-1}\textstyle\sum_{i \in [I]} F_i), \end{align*}$$

where the inclusion uses $\mathrm{conv}(G) \subseteq G + (\mathbb{R}_{\ge 0} \times \lbrace 0_m \rbrace)$, valid since $g$ is convex and $\mathcal{Y}$ is a convex set.

Reorganizing, there exist $c \ge 0$ and $y \in \mathcal{Y}$ such that,

$$(w + \epsilon - g(y) - c, y) \in \mathrm{conv}(I^{-1}\textstyle\sum_{i \in [I]} F_i).$$

Now apply the Shapley-Folkman lemma in $\mathbb{R}^{m+1}$: there is an index set $S$ with $|S| \le m+1$ such that,

$$\begin{align*} w + \epsilon - g(y) - c &= I^{-1}\textstyle\sum_{i \in S} \varphi_i + I^{-1}\textstyle\sum_{i \notin S} f_i(x_i), \\ y &= I^{-1}\textstyle\sum_{i \in S} A_ix_i + I^{-1}\textstyle\sum_{i \notin S} A_ix_i, \\ \end{align*}$$

where $(\varphi_i, A_i x_i) \in \mathrm{conv}(F_i)$ for $i \in S$, and $x_i \in \mathcal{X}_i$ for $i \notin S$. For each $i \in S$, a point of $\mathrm{conv}(F_i)$ is a finite convex combination of points of $F_i$, so there is a discrete probability measure $\mu_i$ on $\mathcal{X}_i$ with,

$$\varphi_i = \int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i, \quad x_i = \int_{\mathcal{X}_i} x \, \mathrm{d} \mu_i, \quad \forall i \in S,$$

and $x_i \in \mathcal{X}_i$ by convexity of $\mathcal{X}_i$.

Thus,

$$\begin{align*} w + \epsilon - g(y) - c &= I^{-1}\textstyle\sum_{i \in S} \int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i + I^{-1}\textstyle\sum_{i \notin S} f_i(x_i) \\ &= I^{-1}\textstyle\sum_{i \in S} \left(\int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i - f_i\left(x_i\right)\right) + I^{-1}\textstyle\sum_{i \in S} f_i(x_i) + I^{-1}\textstyle\sum_{i \notin S} f_i(x_i) \\ &= I^{-1}\textstyle\sum_{i \in S} \left(\int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i - f_i\left(x_i\right)\right) + I^{-1}\textstyle\sum_{i \in [I]} f_i(x_i). \end{align*}$$

This means,

$$\begin{align*} w + \epsilon &\ge I^{-1}\textstyle\sum_{i \in S} \left(\int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i - f_i\left(x_i\right)\right) + I^{-1}\textstyle\sum_{i \in [I]} f_i(x_i) + g(y) \\ &= I^{-1}\textstyle\sum_{i \in S} \left(\int_{\mathcal{X}_i} f_i(x) \, \mathrm{d} \mu_i - f_i\left(\int_{\mathcal{X}_i} x \, \mathrm{d} \mu_i\right)\right) + g(y) + f(x). \end{align*}$$

Since $y = Ax \in \mathcal{Y}$ and $x \in \mathcal{X}$, the pair $(x, y)$ is feasible for (P), so $f(x) + g(y) \ge z^\ast$; and each summand $\int_{\mathcal{X}_i} f_i \, \mathrm{d} \mu_i - f_i(\int_{\mathcal{X}_i} x \, \mathrm{d} \mu_i)$ is at least $-\rho(f_i)$ by the definition of the nonconvexity measure $\rho$. Letting $\epsilon \to 0$,

$$d(y) = w = \inf_{\epsilon > 0} \left\{w + \epsilon\right\} \ge -I^{-1}\textstyle\sum_{i \in S} \rho(f_i) + z^\ast.$$

Hence, since $|S| \le m + 1$ and $d^\ast \ge d(y) = w$, we conclude,

$$z^\ast - d^\ast \le {\color{#BE5845}{(m+1)I^{-1}}} \max_{i \in [I]} \rho(f_i).$$

Application

Consider the following problem in the Euclidean space $\mathbb{R}^n$,

$$\begin{align*} z^\ast = \inf ~& f(x) = \textstyle\sum_{i \in [I]} f_i(x_i) \\ \text{s.t.}~ & \textstyle\sum_{i \in [I]} A_i x_i \le b, \\ & x_i \in \mathcal{X}_i, \forall i \in [I]. \end{align*}$$

Each $f_i$ is a lower semi-continuous function on $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$, each $A_i \in \mathbb{R}^{m \times n_i}$ is a linear mapping to $\mathbb{R}^m$, and $b \in \mathbb{R}^m$ is a "resource" vector. The problem fits the form (P) by setting

$$Ax = \textstyle\sum_{i \in [I]} A_i x_i, \quad \mathcal{Y} = \lbrace u \in \mathbb{R}^m : u \le b \rbrace, \quad g \equiv 0.$$

On the other hand, consider the Lagrangian dual,

$$\begin{align*} d(y) &= \inf_{x_i \in \mathcal{X}_i} \left\{\textstyle\sum_{i \in [I]} f_i(x_i) + \langle y, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle\right\}, \quad y \in \mathbb{R}_+^m \\ &= - \textstyle\sum_{i \in [I]} f_{i,\ast}(-A_i^\ast y) - \langle y, b \rangle \end{align*}$$

which matches the general dual $d(y) = -f_\ast(-A^\ast y) - \sigma_{\mathcal{Y}}(y)$, where $\sigma_{\mathcal{Y}}$ is the support function of $\mathcal{Y}$, since now

$$\sigma_{\mathcal{Y}}(y) = \sup_{u \le b} \langle y, u \rangle = \langle y, b \rangle, \quad \forall y \ge 0.$$

Assuming Slater's condition holds, the main theorem gives some $y^\ast \in \mathbb{R}_+^m$ such that,

$$d(y^\ast) = d^\ast, \quad z^\ast - d^\ast \le {\color{#BE5845}{(m+1)}} \max_{i \in [I]} \rho(f_i).$$

Note the factor $I^{-1}$ is absent here because $f$ is a sum over blocks rather than an average; dividing the objective and constraint by $I$ to match the boxed bound and then scaling back multiplies the gap by $I$. Since $z^\ast$ typically grows linearly in $I$ while the additive gap $(m+1)\max_{i \in [I]} \rho(f_i)$ stays bounded, the Lagrangian dual bound becomes relatively tighter as the number of blocks $I$ increases.
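To make the bound concrete, here is a small numerical sketch. All the specific choices, $f_i(x) = -x^2$ on $\mathcal{X}_i = [0, 1]$, $I = 4$, $m = 1$, $b = 2.5$, are illustrative assumptions of mine, not taken from [1]. We brute-force $z^\ast$ on a grid, scan the dual for $d^\ast$, and compare the gap against $(m+1)\max_i \rho(f_i)$:

```python
import itertools

# Illustrative instance:
#   minimize sum_i f_i(x_i)  s.t.  sum_i x_i <= b,  x_i in [0, 1],
# with I = 4 blocks, m = 1 coupling constraint, and f_i(x) = -x^2
# (concave, hence a nonconvex minimization).
I, b = 4, 2.5

def f(x):
    return -x * x

# Primal optimum by brute force; a concave objective over a polytope
# attains its minimum at a vertex, and this grid contains all vertices.
grid = [k / 4 for k in range(5)]
z_star = min(sum(f(x) for x in xs)
             for xs in itertools.product(grid, repeat=I)
             if sum(xs) <= b + 1e-12)

# Dual function d(y) = sum_i min_x {f(x) + y x} - y b; the inner
# minimand is concave in x, so its minimum sits at an endpoint of [0, 1].
def d(y):
    return I * min(f(0.0), f(1.0) + y) - y * b

d_star = max(d(k / 100) for k in range(301))  # scan y in [0, 3]

# rho(f_i): for concave f on [0, 1] the convex envelope is the chord
# through (0, f(0)) and (1, f(1)), so rho = max_x { f(x) - chord(x) }.
def chord(x):
    return f(0.0) + (f(1.0) - f(0.0)) * x

rho = max(f(k / 1000) - chord(k / 1000) for k in range(1001))

gap, bound = z_star - d_star, (1 + 1) * rho   # (m + 1) * max_i rho(f_i)
print(z_star, d_star, gap, bound)             # -2.25 -2.5 0.25 0.5
```

Here the actual gap ($0.25$) is half the predicted bound ($0.5$), and both are independent of how many further blocks we would add, while $z^\ast$ itself scales with $I$.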

References
  1. Aubin, J.P., Ekeland, I.: Estimates of the duality gap in nonconvex optimization. Mathematics of Operations Research, 1(3), 225–245 (1976).