Some duality bounds for nonconvex problems, I

Introduction

We reproduce results from the classic paper [1]. Consider a real-valued lower semi-continuous function $f$ on a convex set $\mathcal{X} \subset \mathbb{E}$, where $\mathbb{E}$ is a Hausdorff linear space. We measure the following "convexity error" of $f$,

$$\rho(f) = \sup_{a \in \Delta^K,\; x_k \in \mathcal{X}} \left\{f\left(\textstyle\sum_{k \in [K]} a_k x_k\right) - \textstyle\sum_{k \in [K]} a_k f(x_k)\right\},$$

where $x_1, \dots, x_K$, $K < \infty$, is a finite family of points in $\mathcal{X}$, and $\Delta^{K}$ is the probability simplex of $\mathbb{R}_+^K$. If we use Dirac measures $\delta$, let

$$\mathcal{M}(\mathcal{X}) = \left\{ \textstyle\sum_{k \in [K]} a_k \delta(x_k) \mid a_k \in [0, 1], \textstyle\sum_{k \in [K]} a_k = 1, \; x_k \in \mathcal{X} \right\},$$

the set of discrete probability distributions on $\mathcal{X}$. Then, $\rho(f)$ can be written as,

$$\rho(f) = \sup_{\mu \in \mathcal{M}(\mathcal{X})} f\left(\int x \, \mathrm{d} \mu\right) - \int f(x) \, \mathrm{d} \mu.$$
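As a quick numeric sanity check (a toy illustration of mine, not from [1]): in one dimension, $\rho(f)$ can be estimated by grid search over two-point convex combinations ($K = 2$). The test functions and grid resolution below are arbitrary choices.

```python
import numpy as np

def convexity_error_2pt(f, lo, hi, n=201):
    """Grid estimate of rho(f) on [lo, hi] using two-point combinations (K = 2):
    sup over x1, x2, a of f(a*x1 + (1-a)*x2) - (a*f(x1) + (1-a)*f(x2))."""
    xs = np.linspace(lo, hi, n)
    a = np.linspace(0.0, 1.0, n)
    best = 0.0
    for x1 in xs:
        for x2 in xs:
            mix = a * x1 + (1.0 - a) * x2                    # convex combinations
            gap = f(mix) - (a * f(x1) + (1.0 - a) * f(x2))   # convexity violation
            best = max(best, gap.max())
    return best

rho_concave = convexity_error_2pt(lambda x: np.minimum(x, 1.0), 0.0, 2.0)  # expect 0.5
rho_convex = convexity_error_2pt(lambda x: x ** 2, 0.0, 2.0)               # expect 0
```

For the concave $f(x) = \min(x, 1)$ on $[0, 2]$ the estimate recovers $\rho(f) = 1/2$ (the largest bow above the chord from $(0,0)$ to $(2,1)$), while the convex $x^2$ gives $0$, consistent with the property $\rho(f) = 0 \iff f$ convex stated below.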

Minkowski sum. The Minkowski sum of two sets $A, B \subseteq \mathbb{E}$ is defined as

$$A + B = \lbrace a + b \mid a \in A, b \in B \rbrace.$$

Fenchel conjugate. The Fenchel conjugate of $f$ (assumed lower semi-continuous) is defined as,

$$f_\ast(y) = \sup_{x \in \mathcal{X}} \left\{\langle y, x \rangle - f(x)\right\}, \quad y \in \mathbb{E}_\ast.$$
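A standard sanity check (an illustration of mine): $f(x) = x^2/2$ on $\mathbb{R}$ is self-conjugate, $f_\ast(y) = y^2/2$. A bounded grid standing in for $\mathcal{X} = \mathbb{R}$ confirms this for moderate $y$:

```python
import numpy as np

xs = np.linspace(-10.0, 10.0, 20001)                   # grid standing in for X = R
conj = lambda y: np.max(y * xs - 0.5 * xs ** 2)        # f_*(y) = sup_x {<y, x> - f(x)}

for y in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(conj(y) - 0.5 * y ** 2) < 1e-4          # matches y^2/2 away from grid edges
```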

Consider the indicator function of a set $\mathcal{Y}$,

$$\sigma_{\mathcal{Y}}(y) = \begin{cases} 0, & y \in \mathcal{Y}, \\ \infty, & y \notin \mathcal{Y}. \end{cases}$$

Then, its Fenchel conjugate is called the support function of $\mathcal{Y}$,

$$\sigma_{\ast,\mathcal{Y}}(p) = \sup_{y} \left\{\langle p, y \rangle - \sigma_{\mathcal{Y}}(y)\right\} = \sup_{y \in \mathcal{Y}} \langle p, y \rangle.$$
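For example (again my own illustration), with $\mathcal{Y} = [-1, 1] \subset \mathbb{R}$ the support function is $\sup_{y \in \mathcal{Y}} p\,y = |p|$, which a grid computation confirms:

```python
import numpy as np

ys = np.linspace(-1.0, 1.0, 2001)        # grid over Y = [-1, 1] (endpoints included)
support = lambda p: np.max(p * ys)       # support function: sup_{y in Y} <p, y>

vals = {p: support(p) for p in (-2.0, -0.5, 0.0, 1.0, 3.0)}
# each value equals |p|: the sup is attained at y = sign(p)
```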

Basic properties of $\rho$. The authors of [1] proved that,

$$\begin{align*} &0 \le \rho(f) \le \infty \\ &\rho(f) = 0 \iff f \text{ is convex}. \end{align*}$$

Main result. Consider, for lsc $f, g$, the primal problem

$$z^\ast = \inf ~ f(x) + g(Ax), ~\text{s.t.} ~ x \in \mathcal{X}, Ax \in \mathcal{Y}. \tag{P}$$

Here,

  • We assume $g$ is convex; only $f$ may be nonconvex.
  • $\mathcal{X} \subseteq \mathbb{E}$ is a convex subset of a Hausdorff linear space $\mathbb{E}$.
  • $\mathcal{Y} \subseteq \mathbb{R}^m$ and $A \in \mathcal{L}(\mathbb{E}, \mathbb{R}^m)$ is a linear mapping to Euclidean space.

Equivalently, one can read (P) as,

$$\text{(P)} \Leftrightarrow \inf_{x \in \mathcal{X}, y \in \mathcal{Y}} \left\{f(x) + g(y) \right\}, ~\text{s.t.} ~ Ax = y$$

So the dual function is,

$$\begin{align*} d(p) &= \inf_{x \in \mathcal{X}, y \in \mathcal{Y}} \left\{\langle p, Ax - y \rangle + f(x) + g(y) \right\} \\ &= - \sup_{x \in \mathcal{X}, y \in \mathcal{Y}} \left\{\langle -A^\ast p, x \rangle - f(x) + \langle p, y \rangle - g(y) \right\} \\ &= - f_\ast(-A^\ast p) - g_\ast(p). \end{align*}$$

and

$$d^\ast= \sup_{p \in \mathbb{R}^m} d(p). \tag{D}$$

If $g \equiv 0$, then the problem is,

$$z^\ast = \inf_{x \in \mathcal{X}, Ax \in \mathcal{Y}} ~ f(x) = \inf_{x \in \mathcal{X}} ~ f(x) + \sigma_{\mathcal{Y}}(Ax).$$

It is easy to see the dual problem is,

$$d^\ast= \sup_{p \in \mathbb{R}^m} d(p) = \sup_{p \in \mathbb{R}^m} \left\{ - f_\ast(-A^\ast p) - \sigma_{\ast,\mathcal{Y}}(p) \right\}.$$

Here $A^\ast$ is the adjoint of $A$,

$$A^\ast: \mathbb{R}^m \to \mathbb{E}_\ast,$$

(recall $\mathbb{E}_\ast$ is the dual space of $\mathbb{E}$). The goal here is to prove that,

If $0 \in \textrm{int}(A(\mathcal{X}) - \mathcal{Y})$ (a Slater-type condition: the constraint set has nonempty interior), and if $g$ is convex, then,

$$0 \le z^\ast - d^\ast \le \rho(f).$$
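Before the proof, here is a toy numerical check of the bound (my own instance, not from [1]): take $m = 1$, $A = \mathrm{id}$, $g \equiv 0$, $f(x) = \min(x, 1)$ on $\mathcal{X} = [0, 2]$, and $\mathcal{Y} = \{1.5\}$. Then $\rho(f) = 1/2$, and a grid search over the dual variable exhibits a gap $z^\ast - d^\ast = 1/4 \in [0, \rho(f)]$.

```python
import numpy as np

f = lambda x: np.minimum(x, 1.0)         # nonconvex (concave) objective, rho(f) = 0.5
xs = np.linspace(0.0, 2.0, 2001)         # grid for X = [0, 2]
target = 1.5                             # Y = {1.5}, A = identity, g = 0

z_star = float(f(target))                # primal: inf f(x) s.t. x = 1.5, so z* = 1

# dual: d(p) = inf_x {f(x) + p*x} - p*target, maximized over a grid of p
ps = np.linspace(-2.0, 2.0, 4001)
d_star = max(np.min(f(xs) + p * xs) - p * target for p in ps)   # maximum at p = -0.5

gap = z_star - d_star                    # duality gap, 0 <= gap <= rho(f)
```

The gap here is exactly $1/4$: the biconjugate of $f$ is the chord $x/2$ on $[0, 2]$, so $d^\ast = f_{\ast\ast}(1.5) = 3/4$ while $z^\ast = f(1.5) = 1$.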

Weak duality. By definition,

$$\begin{align*} \forall p \in \mathbb{R}^m, \quad d(p) &= \inf_{x \in \mathcal{X}, y \in \mathcal{Y}} \left\{\langle p, Ax - y \rangle + f(x) + g(y) \right\} \\ &\le \langle p, Ax - y \rangle + f(x) + g(y), ~ \forall x \in \mathcal{X}, y \in \mathcal{Y} \\ &\le f(x) + g(y), \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}, Ax = y. \end{align*}$$

Hence,

$$d^\ast = \sup_{p \in \mathbb{R}^m} d(p) \le \inf_{x \in \mathcal{X}, Ax \in \mathcal{Y}} \left\{f(x) + g(Ax) \right\} = z^\ast.$$

Proof

Consider a mapping, $\Phi: \mathcal{X}\times\mathcal{Y} \to \mathbb{R}\times\mathbb{R}^m$,

$$\Phi(x, y) = (f(x) + g(y), Ax - y).$$

and, using discrete probability distributions over $\mathcal{X}\times\mathcal{Y}$, i.e. $\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})$, consider

$$\begin{align*} \mathbb{R} \ni w &= \inf_{\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})} \int f(x) + g(y) \, \mathrm{d} \mu, ~\text{s.t.}~ \int A x \, \mathrm{d}\mu = \int y \, \mathrm{d}\mu \\ &= \inf \textstyle\sum_{i} a_i f(x_i) + a_i g(y_i), ~\text{s.t.}~ \textstyle\sum_{i} a_i A x_i = \textstyle\sum_{i} a_i y_i, ~ a \in \Delta^I. \end{align*}$$

The last equality simply uses an explicit probability vector $a$. Setting $\mu = \delta(x, y)$, we have,

$$w \le \inf_{x\in \mathcal{X}, y\in\mathcal{Y}} \left\{ f(x) + g(y), ~\text{s.t.}~ Ax = y\right\} = z^\ast.$$

Step I. Show that $(w, 0) \in \mathbb{R} \times \mathbb{R}^m$ is not in the convex hull of

$$\Phi(\mathcal{X}\times\mathcal{Y}) + \mathbb{R}_{>0} \times \{0\}^m,$$

where the sum is in the Minkowski sense. The extra cone enlarges the image by allowing the objective to strictly increase ($\mathbb{R}_{>0}$) while leaving the constraint residual unchanged:

$$(w, 0) \notin \mathrm{conv}\left(\Phi(\mathcal{X}\times\mathcal{Y}) + \mathbb{R}_{>0} \times \{0\}^m\right).$$

Otherwise, there would exist $\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})$ such that,

$$w > \int_{\mathcal{X}\times\mathcal{Y}} f(x) + g(y) \, \mathrm{d} \mu, \quad \int A x \, \mathrm{d}\mu = \int y \, \mathrm{d}\mu,$$

which contradicts the definition of $w$.

Step II. Because $(w, 0) \notin \mathrm{conv}\left(\Phi(\mathcal{X}\times\mathcal{Y}) + \mathbb{R}_{>0} \times \{0\}^m\right)$, and the image space is finite-dimensional, there is a separating hyperplane: there exists $(a, p) \in \mathbb{R} \times \mathbb{R}^m$, $(a, p) \neq 0$, such that,

$$a w = a w + \langle p, 0 \rangle \le a (f(x) + g(y) + c) + \langle p, Ax - y \rangle, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}, c > 0.$$

First, $a$ cannot be zero. If it were, then $p \neq 0$ and

$$\langle p, Ax - y \rangle \ge 0, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}.$$

However, since $0 \in \textrm{int}(A(\mathcal{X}) - \mathcal{Y})$, an alternative lemma states that for every $p$,

$$\inf_{x\in \mathcal{X}, y\in\mathcal{Y}} \langle p, Ax - y \rangle < 0 \quad\text{or}\quad p = 0.$$

(This mimics the proof of strong duality in convex optimization under Slater's condition.) Either branch contradicts the previous display, recalling $(a, p) \neq 0$. Moreover, $a \ge 0$: if $a < 0$, letting $c \to \infty$ drives the right-hand side of the separation inequality to $-\infty$. So we must have $a > 0$. Scaling both sides by $1/a$, we have,

$$\begin{align*} w &\le f(x) + g(y) + \langle p, Ax - y \rangle + c, \quad \forall x \in \mathcal{X}, y \in \mathcal{Y}, \quad p \leftarrow \tfrac{1}{a}p \\ &\le \inf_{x\in \mathcal{X}, y\in\mathcal{Y}} \left\{f(x) + g(y) + \langle p, Ax - y \rangle\right\} + \inf_{c\in\mathbb{R}_{>0}} c \\ &= \inf_{x\in \mathcal{X}, y\in\mathcal{Y}} \left\{f(x) + g(y) + \langle p, Ax - y \rangle\right\} \\ &= - f_\ast(-A^\ast p) - g_\ast(p). \end{align*}$$

Step III. For any $\epsilon > 0$, there exists $\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})$ such that,

$$\begin{align*} \int f(x) + g(y) \, \mathrm{d} \mu &\le w + \epsilon \\ \int A(x) \, \mathrm{d} \mu &= \int y \, \mathrm{d} \mu. \end{align*}$$

Since $A$ is linear, writing $\bar x = \int x \, \mathrm{d} \mu$ for the mean point,

$$A \bar x = A \int x \, \mathrm{d} \mu = \int A(x) \, \mathrm{d} \mu$$

and analogously $A \bar x = \bar y$ with $\bar y = \int y \, \mathrm{d} \mu$, so we have,

$$\begin{align*} w + \epsilon & \ge f(\bar x) + g(\bar y) + \int f(x) + g(y) \, \mathrm{d} \mu - (f(\bar x) + g(\bar y)) \\ & \ge f(\bar x) + g(\bar y) - \sup_{\mu \in \mathcal{M}(\mathcal{X}\times\mathcal{Y})} \left(f\left(\int x \, \mathrm{d} \mu\right) + g\left(\int y \, \mathrm{d} \mu\right) - \int f(x) + g(y) \, \mathrm{d} \mu\right) \\ & \ge f(\bar x) + g(\bar y) - \rho(f) - \rho(g). \end{align*}$$

Hence, combining with Step II ($w \le - f_\ast(-A^\ast p) - g_\ast(p)$), we have,

$$\forall \epsilon > 0, \quad - f_\ast(-A^\ast p) - g_\ast(p) + \epsilon \ge f(\bar x) + g(\bar y) - \rho(f) - \rho(g).$$

Letting $\epsilon \to 0$ on the left and bounding the right-hand side below by its infimum over feasible pairs, we have,

$$\begin{align*} d(p) = \inf_{\epsilon > 0} \left\{ - f_\ast(-A^\ast p) - g_\ast(p) + \epsilon \right\} &\ge \inf_{\bar x \in \mathcal{X}, \bar y \in \mathcal{Y}, A\bar x = \bar y} \left\{f(\bar x) + g(\bar y) - \rho(f) - \rho(g)\right\} \\ &= z^\ast - \rho(f) - \rho(g). \end{align*}$$

Hence, since $g$ is convex (so $\rho(g) = 0$), we conclude,

$$\boxed{z^\ast - d^\ast \le \rho(f) + \rho(g) = \rho(f).}$$

This completes the proof.

Application

An interesting application arises when the primal variable has a decomposition structure,

$$\mathcal{X} = \prod_{i \in [I]} \mathcal{X}_i, \quad f(x) = \tfrac{1}{I}\textstyle\sum_{i \in [I]} f_i(x_i), \quad A = \tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i.$$

and each $\mathcal{X}_i \subseteq \mathbb{E}_i$ is a convex subset of a corresponding lower-dimensional linear space $\mathbb{E}_i$, with $A_i \in \mathcal{L}(\mathbb{E}_i, \mathbb{R}^m)$. Then, the problem reads,

$$\begin{align*} z^\ast = \inf ~&\left\{\tfrac{1}{I}\textstyle\sum_{i \in [I]} f_i(x_i) + g\left(\tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i x_i\right)\right\}, \\ \text{s.t.}~ &\tfrac{1}{I}\textstyle\sum_{i \in [I]} A_i x_i \in \mathcal{Y}, ~ x_i \in \mathcal{X}_i, \forall i \in [I]. \end{align*}$$

With some algebra, the dual function is,

$$d(p) = - \tfrac{1}{I} \textstyle\sum_{i \in [I]} f_{i,\ast}(-A_i^\ast p) - g_\ast(p).$$

Applying the general theorem above, we have,

$$z^\ast - d^\ast \le \tfrac{1}{I} \textstyle\sum_{i \in [I]} \rho(f_i) \le \max_{i \in [I]} \rho(f_i).$$

The whole point is that one can prove a much better bound,

$$z^\ast - d^\ast \le {\color{#BE5845}{(m+1)I^{-1}}} \max_{i \in [I]} \rho(f_i).$$

which depends on the dimension of the image of the linear mapping $A$ and diminishes as the number of blocks $I$ increases. We leave this as a question for the next note.
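To see the $1/I$ effect concretely, here is a brute-force experiment on a toy instance of my own (not from [1]): $I$ identical blocks $f_i(x) = \min(x, 1)$ on $\mathcal{X}_i = [0, 2]$, $A_i = \mathrm{id}$ (so $m = 1$), $g \equiv 0$, $\mathcal{Y} = \{1.5\}$, with coupling constraint $\tfrac{1}{I}\sum_i x_i = 1.5$. The primal is solved by dynamic programming over a discretized running sum; the blockwise dual reduces to the single-block one, whose maximum over $p$ is $d^\ast = 3/4$ (attained at $p = -1/2$) for every $I$, so the gap is directly observable.

```python
STEP = 0.1                                    # discretize each X_i = [0, 2] in 0.1 units
UNITS = range(21)                             # x_i in {0, 0.1, ..., 2.0} as integer units
COST = [min(u * STEP, 1.0) for u in UNITS]    # f_i(x_i) = min(x_i, 1)
D_STAR = 0.75                                 # blockwise dual optimum, same for every I

def z_star(I):
    """inf (1/I) sum_i f_i(x_i)  s.t.  sum_i x_i = 1.5 * I, by DP over the sum."""
    target = 15 * I                           # 1.5 * I in 0.1 units
    INF = float("inf")
    best = [0.0] + [INF] * target             # best[s] = min cost to reach sum s
    for _ in range(I):                        # add one block at a time
        nxt = [INF] * (target + 1)
        for s, b in enumerate(best):
            if b < INF:
                for u in UNITS:
                    if s + u <= target:
                        nxt[s + u] = min(nxt[s + u], b + COST[u])
        best = nxt
    return best[target] / I

gaps = [z_star(I) - D_STAR for I in (1, 2, 3, 4)]
# each gap respects the bound (m+1)/I * max_i rho(f_i) = 2/I * 0.5 = 1/I
```

On this instance the gaps come out as $1/4,\ 1/4,\ 1/12,\ 0$: once $I$ is large enough, a few blocks sit at the extremes $\{0, 2\}$ and the average constraint is met with no convexification loss, in the spirit of the Shapley–Folkman phenomenon behind the improved bound.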

References
  1. Aubin, J.-P., Ekeland, I.: Estimates of the duality gap in nonconvex optimization. Mathematics of Operations Research 1(3), 225–245 (1976).