Some duality bounds for nonconvex problems, III

Introduction

In this note we discuss a typical class of problems where convexity is lost. As in the two previous notes (part I, part II), we consider the case where the problem is block-structured and the number of blocks is large. But now we focus on how to provide solutions, not just bounds.

Consider the block-structured mixed-integer programming (MIP) problem,

$$\begin{align*} z^\ast = \min_{x \in \mathbb{R}^n} ~ f(x) =& \textstyle\sum_{i \in [I]} f_i(x_i) \\ \text{s.t.}~ & \textstyle\sum_{i \in [I]} A_i x_i \le b, \\ & x_i \in \mathcal{X}_i \subseteq \mathbb{R}^{n_i}, \forall i \in [I], \end{align*}$$

where $\textstyle\sum_{i \in [I]} n_i = n$, $x = (x_1, ..., x_I)$ with $x_i \in \mathbb{R}^{n_i}$ (some coordinates may be integer), $b \in \mathbb{R}^{m}$, and each $A_i \in \mathbb{R}^{m \times n_i}$. Each $f_i$ is a linear function. We make the following assumptions.

For any $i \in [I]$, the following problem is easy to solve (e.g., in polynomial time) for any $c \in \mathbb{R}^{n_i}$:

$$z_i(c) = \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle.$$

We assume access to an oracle returning a minimizer,

$$x_i(c) \in \arg \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle.$$

Because $f_i$ plus the linear term $\langle c, \cdot \rangle$ is linear, its minimum over $\mathcal{X}_i$ coincides with its minimum over $\mathrm{conv}(\mathcal{X}_i)$, even when $\mathcal{X}_i$ is discrete. So we may equally say we have access to

$$z_i(c) = \min_{x_i \in \mathrm{conv}(\mathcal{X}_i)} ~ f_i(x_i) + \langle c, x_i \rangle.$$
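When each $\mathcal{X}_i$ is a small finite set, the block oracle can be sketched as brute-force enumeration (a toy illustration with hypothetical names and data; real blocks would use a specialized polynomial-time routine):

```python
import numpy as np

def block_oracle(f_i, X_i, c):
    """Return (x_i(c), z_i(c)): a minimizer and the minimum of
    f_i(x) + <c, x> over a finite feasible set X_i.

    Minimizing over X_i and over conv(X_i) coincide here because
    the objective is linear.
    """
    vals = [f_i(x) + c @ x for x in X_i]
    k = int(np.argmin(vals))
    return X_i[k], vals[k]

# e.g. a 2-D block with three feasible points
X = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
x_best, z = block_oracle(lambda x: -float(x.sum()), X, np.zeros(2))
```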

Lagrangian dual. The Lagrangian and the dual function take the form,

$$\begin{align*} \mathcal{L}(x, \lambda) &= \textstyle\sum_{i \in [I]} f_i(x_i) + \langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} \left( f_i(x_i) + \langle A_i^\top\lambda, x_i \rangle \right) \\ d(\lambda) &= \min_{x \in \mathbb{R}^n} \mathcal{L}(x, \lambda) \qquad \text{denote } x(\lambda) \in \arg \min_{x \in \mathbb{R}^n} \mathcal{L}(x, \lambda) \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} z_i(A_i^\top\lambda) \\ d^\ast &= \max_{\lambda \in \mathbb{R}_+^m} d(\lambda). \end{align*}$$

Standard results in integer programming tell us that, in general, we only have weak duality,

$$d(\lambda) \le d^\ast \le z^\ast.$$

The Lagrangian dual bound $d^\ast$ is equal to the bound provided by a Dantzig-Wolfe procedure (i.e., from a column generation method), because it comes from the same convexification of the original problem. If $\mathcal{X}_i$ is discrete, there is not much we can say about the gap $z^\ast - d^\ast$. Even if we obtain a good dual solution $\lambda^\ast$, the minimizer $x(\lambda^\ast)$ may not be feasible for the original problem.
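In practice, $d^\ast$ is typically approached by projected subgradient ascent on the concave function $d$, calling the block oracles at each step. Below is a minimal sketch (the function and parameter names are mine, not from any library; `blocks[i](c)` is assumed to return a pair $(x_i(c), z_i(c))$):

```python
import numpy as np

def dual_subgradient(blocks, A, b, steps=200, t0=1.0):
    """Projected subgradient ascent on d(lam) = -<lam, b> + sum_i z_i(A_i^T lam).

    blocks: list of oracles; blocks[i](c) returns (x_i(c), z_i(c)).
    A: list of matrices A_i.  A (super)gradient of d at lam is
    sum_i A_i x_i(lam) - b; lam is projected onto the nonnegative orthant.
    """
    lam = np.zeros(len(b))
    best = -np.inf
    for k in range(1, steps + 1):
        xs = []
        d_val = -lam @ b
        for Ai, oracle in zip(A, blocks):
            xi, zi = oracle(Ai.T @ lam)
            xs.append(xi)
            d_val += zi
        best = max(best, d_val)                           # best lower bound so far
        g = sum(Ai @ xi for Ai, xi in zip(A, xs)) - b     # subgradient direction
        lam = np.maximum(lam + (t0 / np.sqrt(k)) * g, 0.0)  # ascent + projection
    return best, lam
```

With a diminishing step size $t_0/\sqrt{k}$, the best recorded value converges to $d^\ast$ under standard conditions; every iterate's $d(\lambda)$ is already a valid lower bound on $z^\ast$.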

Augmented Lagrangian dual. The augmented Lagrangian and its dual function take the form,

$$\begin{align*} \mathcal{L}_\rho(x, \lambda) &= \textstyle\sum_{i \in [I]} f_i(x_i) + \langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle + \tfrac{\rho}{2}\left\|\left(\textstyle\sum_{i \in [I]} A_i x_i - b\right)_+\right\|^2, \quad \rho > 0, \\ d_\rho(\lambda) &= \min_{x \in \mathbb{R}^n} \mathcal{L}_\rho(x, \lambda) \\ d_\rho^\ast &= \max_{\lambda \in \mathbb{R}_+^m} d_\rho(\lambda). \end{align*}$$

This means the augmented Lagrangian (AL) adds a penalty to the plain Lagrangian; here the penalty is $\psi(\cdot) = \|(\cdot)_+\|_2^2$. In general, AL still provides a lower bound,

$$\begin{align*} \forall \lambda \in \mathbb{R}_+^m, \quad d_\rho(\lambda) &\le \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}_\rho(x, \lambda), ~\text{s.t.}~ \textstyle\sum_{i \in [I]} A_i x_i \le b \\ &\le \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} f(x), ~\text{s.t.}~ \textstyle\sum_{i \in [I]} A_i x_i \le b \\ &= z^\ast. \end{align*}$$

The second inequality holds because, at any feasible solution to the original problem,

$$\mathcal{L}_\rho(x, \lambda) = f(x) + \underbrace{\langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle}_{\le 0} + \underbrace{\tfrac{\rho}{2}\left\|\left(\textstyle\sum_{i \in [I]} A_i x_i - b\right)_+\right\|^2}_{= 0} \le f(x).$$

Compared to the linear programming (LP) relaxation $z_{\textrm{LP}}$ (obtained by relaxing each $\mathcal{X}_i$ to its continuous counterpart), it is well known that

$$z_{\textrm{LP}} \le d^\ast \le d_\rho^\ast \le z^\ast.$$

Compared to the Lagrangian dual, AL improves the lower bound but loses separability. The benefit of introducing the penalty is that, under mild conditions [3], the "exact penalty" property holds, i.e., $d_\rho^\ast = z^\ast$ for some finite $\rho < \infty$. Generally, the required size of $\rho$ depends on:

  • Error bound conditions, non-degeneracy, or some constraint qualification that keeps the dual variable bounded. We found a special observation in [5] when dealing with the high-speed railway scheduling problem.
  • This is also interesting because, for pure penalty methods (without the dual variable), it is rare for "exact penalty" to hold with a differentiable $\psi$. For nonlinear continuous problems, $\|\cdot\|_1$ sometimes induces the exact penalty property while $\|\cdot\|_2$ does not; see also [2].

For a practitioner, if the penalty is (almost) exact, any minimizer $x(\lambda^\ast)$ attaining $d_\rho(\lambda^\ast)$ at the optimal $\lambda^\ast$ is a feasible solution of the original problem (primal recovery).
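The exact-penalty effect can be seen on a hypothetical one-block toy: $\min\, -x$ s.t. $2x \le 1$, $x \in \{0,1\}$, where $z^\ast = 0$ but the plain dual stalls at $d^\ast = -1/2$. The sketch below evaluates $d_\rho(\lambda)$ by enumeration (all names and data are made up for illustration; $\rho = 0$ recovers the plain Lagrangian dual):

```python
import numpy as np

def al_dual(rho, lam, f, A, b, X):
    """d_rho(lam) for a tiny block, by enumerating the finite set X."""
    vals = []
    for x in X:
        r = A @ x - b                                   # constraint residual
        vals.append(f(x) + lam @ r
                    + 0.5 * rho * np.square(np.maximum(r, 0.0)).sum())
    return min(vals)

# toy instance: min -x  s.t. 2x <= 1, x in {0,1};  z* = 0
A = np.array([[2.0]]); b = np.array([1.0])
X = [np.array([0.0]), np.array([1.0])]
f = lambda x: -float(x[0])

grid = np.linspace(0.0, 2.0, 401)
d_star = max(al_dual(0.0, np.array([l]), f, A, b, X) for l in grid)      # plain dual
d_rho_star = max(al_dual(2.0, np.array([l]), f, A, b, X) for l in grid)  # rho = 2
# d_star ~ -0.5 < d_rho_star = 0.0 = z*: the penalty closes the gap
```

On this instance $\rho = 2$ already makes the AL dual exact, while the plain dual keeps a gap of $1/2$.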

Problem with many blocks

In multi-agent systems, $A_i x_i$ is the resource consumption of the $i$-th agent. In this case, the original problem (for $z^\ast$) is a "centralized" resource allocation problem. The dual method here can be seen as finding price vectors $p_i = A_i^\top\lambda$ so that the process is "decentralized", i.e., each agent decides how much resource to consume solely through price-taking behavior.

$$\begin{align*} d(\lambda) &= \min_{x \in \mathbb{R}^n} \mathcal{L}(x, \lambda) \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} z_i(p_i), \quad p_i = A_i^\top\lambda. \end{align*}$$

Instead of centralized allocation (which is expensive), we only update $\lambda$ and collect the "best response" of each agent.

If $\mathcal{X}_i$ is bounded (say, finite or polyhedral), then $\mathrm{conv}(\mathcal{X}_i)$ is a polytope generated by finitely many extreme points. The "expenditure-minimization" problem becomes,

$$\begin{align*} \forall c, \quad z_i(c) &= \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle \\ &= \min_{x_i \in \mathrm{conv}(\mathcal{X}_i)} ~ f_i(x_i) + \langle c, x_i \rangle \\ &= \min_{\mu_i \in \mathbb{R}_+^{|E_i|}} ~ f_i(x_i) + \langle c, x_i \rangle, ~ x_i = \textstyle\sum_{e \in E_i} \mu_i^e x_i^e, ~ \textstyle\sum_{e \in E_i} \mu_i^e = 1, \end{align*}$$
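A quick numerical sanity check of the second equality (toy vertices and objective, all hypothetical): since the objective is linear, the value at any convex combination is the same convex combination of the vertex values, so no point of the convex hull beats the best vertex.

```python
import numpy as np

# toy extreme points of conv(X_i) and a linear block objective
verts = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 3.0])]
c = np.array([1.0, -1.0])
f = lambda x: 0.5 * x[0]                        # a linear f_i

vertex_min = min(f(v) + c @ v for v in verts)   # z_i(c) from the extreme points

rng = np.random.default_rng(1)
combo_vals = []
for mu in rng.dirichlet(np.ones(3), size=500):  # random simplex weights
    x = mu @ np.stack(verts)                    # a point of conv(X_i)
    combo_vals.append(f(x) + c @ x)
# min(combo_vals) can approach but never undercut vertex_min
```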

Here we enumerate the extreme points of $\mathrm{conv}(\mathcal{X}_i)$ as $x_i^1, ..., x_i^{|E_i|}$; this suffices to represent $x_i$ since $\mathcal{X}_i$ is bounded, otherwise we would also need extreme rays. The following result is stated in [4], but it essentially follows from [1], with a specialization to produce the constants $\rho_i$ (see the previous two notes).

For any $i$, define

$$\rho_i = \max_{x_i \in \mathcal{X}_i} ~ f_i(x_i) - \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) < \infty.$$

Assume further that for any $x_i \in \mathrm{conv}(\mathcal{X}_i)$, there exists $x_i' \in \mathcal{X}_i$ such that,

$$A_i x_i' \le A_i x_i.$$

Then, the Lagrangian dual bound satisfies,

$$z^\ast - d^\ast \le {\color{#BE5845}{m}} \max_{i \in [I]} \rho_i.$$

Hence, if $z^\ast = \Omega(I)$ and all $\rho_i = \mathcal{O}(1)$, we have,

$$\tfrac{z^\ast - d^\ast}{z^\ast} \le \mathcal{O}({\color{#BE5845}{m}} \, I^{-1}).$$

This means that as we increase the number of blocks, the relative gap between $z^\ast$ and $d^\ast$ shrinks. The problem is that this result is nonconstructive: it does not provide a procedure to compute an $x_i'$ that is feasible for the original problem.

There is an interesting way to fix this in [4]. First, lower the resource vector: pick some $\gamma \in \mathbb{R}_+^m$ and set

$$\bar b = b - \gamma.$$

Then, one can solve the Lagrangian dual problem with $\bar b$ and collect the block minimizers $x(\lambda^\ast(\bar b))$, which are shown to be feasible for the original problem. The strength of this approach is guaranteed by a bound of the following form.

$$\tfrac{z^\ast(\bar b) - z^\ast}{z^\ast} \le (m + {\color{#BE5845}{\mathcal{O}(\|\gamma\|_\infty)}}) \cdot {\color{#BE5845}{I^{-1}}} \cdot \max_{i \in [I]} \rho_i.$$

The gap of such a heuristic diminishes as the number of blocks increases.
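The recipe can be sketched on a toy allocation problem (all data hypothetical; each agent is a one-variable block $x_i \in \{0,1\}$ with value $v_i$ and consumption $w_i$, and the tightened dual is maximized by a crude grid search rather than a full dual solve):

```python
import numpy as np

I = 10
v = 0.5 + 0.1 * np.arange(I)          # agent values (f_i(x_i) = -v_i x_i)
w = np.ones(I)                        # resource use if agent i is "on"
b = np.array([5.0])                   # original shared budget
gamma = np.array([w.max()])           # tightening margin gamma
b_bar = b - gamma                     # tightened budget b_bar = b - gamma

# tightened dual d(lam) = -lam*b_bar + sum_i min(0, lam*w_i - v_i), on a grid
lams = np.linspace(0.0, 2.0, 2001)
d_vals = [-l * b_bar[0] + np.minimum(0.0, l * w - v).sum() for l in lams]
lam_bar = lams[int(np.argmax(d_vals))]

x = (v > lam_bar * w).astype(float)   # price-taking best response of each agent
feasible = w @ x <= b[0]              # feasibility w.r.t. the ORIGINAL budget
```

Here $\gamma = \max_i w_i$ absorbs the largest single-agent jump in the aggregate best response around the tightened dual optimum, which is what makes the collected responses feasible for the original budget $b = \bar b + \gamma$.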

References
  [1] Aubin, J.P., Ekeland, I.: Estimates of the duality gap in nonconvex optimization. Mathematics of Operations Research, 1(3), 225–245 (1976).
  [2] Bertsekas, D.P.: Necessary and sufficient conditions for a penalty method to be exact. Mathematical Programming, 9(1), 87–99 (1975).
  [3] Feizollahi, M.J., Ahmed, S., Sun, A.: Exact augmented Lagrangian duality for mixed integer linear programming. Mathematical Programming, 161(1–2), 365–387 (2017).
  [4] Vujanic, R., Mohajerin Esfahani, P., Goulart, P.J., Mariéthoz, S., Morari, M.: A decomposition method for large scale MILPs, with performance guarantees and a power system application. Automatica, 67, 144–156 (2016).
  [5] Wang, R., Zhang, C., Pu, S., Gao, J., Wen, Z.: A customized augmented Lagrangian method for block-structured integer programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12), 9439–9455 (2024).