Some duality bounds for nonconvex problems, III
Introduction
In this note we discuss a typical class of problems where convexity is lost. As in the two previous notes (Part I, Part II), we consider problems that are block-structured with a large number of blocks. But now we focus on how to provide solutions, not just bounds.
Consider the block-structured mixed-integer programming (MIP) problem,
$$\begin{align*} z^\ast = \min_{x \in \mathbb{R}^n} ~ f(x) =& \textstyle\sum_{i \in [I]} f_i(x_i) \\ \text{s.t.}~ & \textstyle\sum_{i \in [I]} A_i x_i \le b, \\ & x_i \in \mathcal{X}_i \subseteq \mathbb{R}^{n_i}, \forall i \in [I], \end{align*}$$
where $\textstyle\sum_{i \in [I]} n_i = n$, $x = (x_1, ..., x_I)$ with $x_i \in \mathbb{R}^{n_i}$ (some components may be required to be integer), $b \in \mathbb{R}^{m}$, and each $A_i \in \mathbb{R}^{m \times n_i}$. Each $f_i$ is a linear function. We make the following assumptions. For each block, define the subproblem value,
$$z_i(c) = \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle.$$
We assume access to the corresponding minimization oracle,$$x_i \in x_i(c) = \arg \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle.$$
Even though $\mathcal{X}_i$ could be discrete, $f_i$ is linear, and a linear objective attains the same minimum over $\mathcal{X}_i$ as over its convex hull; so the same oracle also gives us access to$$z_i(c) = \min_{x_i \in \mathrm{conv}(\mathcal{X}_i)} ~ f_i(x_i) + \langle c, x_i \rangle.$$
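As a concrete illustration, for small blocks the oracle can be realized by brute-force enumeration. The sketch below assumes a hypothetical block $\mathcal{X}_i = \{0,1\}^{n_i}$ with linear $f_i$; the function name `block_oracle` and all data are illustrative, not from any reference.

```python
from itertools import product

def block_oracle(f_i, c):
    """Return (z_i(c), x_i(c)) for min over {0,1}^n of <f_i + c, x>.

    Brute-force enumeration; only sensible for small n_i.
    """
    n = len(f_i)
    best_val, best_x = float("inf"), None
    for x in product((0, 1), repeat=n):
        val = sum((f_i[j] + c[j]) * x[j] for j in range(n))
        if val < best_val:
            best_val, best_x = val, list(x)
    return best_val, best_x

# Hypothetical data: f_i = (1, -2), price c = (0.5, 0.5).
# A coordinate enters iff f_i[j] + c[j] < 0, so only the second is selected.
val, x = block_oracle([1.0, -2.0], [0.5, 0.5])
# val = -1.5, x = [0, 1]
```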
Lagrangian dual. The Lagrangian and the dual function take the form,
$$\begin{align*} \mathcal{L}(x, \lambda) &= \textstyle\sum_{i \in [I]} f_i(x_i) + \langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} \left( f_i(x_i) + \langle A_i^\top\lambda, x_i \rangle \right) \\ d(\lambda) &= \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}(x, \lambda) \qquad \text{denote } x(\lambda) \in \arg \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}(x, \lambda) \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} z_i(A_i^\top\lambda) \\ d^\ast &= \max_{\lambda \in \mathbb{R}_+^m} d(\lambda). \end{align*}$$
Standard results in integer programming tell us we only have weak duality,
$$d(\lambda) \le d^\ast \le z^\ast.$$
The Lagrangian dual bound $d^\ast$ is equal to the bound provided by a Dantzig-Wolfe procedure (i.e., from a column generation method), because both come from the same convexification of the original problem. If $\mathcal{X}_i$ is discrete there is not much we can say about the gap $z^\ast - d^\ast$. Even if you obtain an optimal dual solution $\lambda^\ast$, the minimizer $x(\lambda^\ast)$ may not be feasible for the original problem.
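To make the dual function concrete: $d(\lambda)$ is concave and can be maximized by projected subgradient ascent, since $\sum_{i} A_i x_i(\lambda) - b$ is a subgradient of $d$ at $\lambda$. The sketch below runs this on a hypothetical instance (two 0/1 blocks, one coupling constraint, $m = 1$); the data and step sizes are illustrative only.

```python
from itertools import product

def oracle(f_i, a_i, lam):
    """Block oracle: min over {0,1}^2 of <f_i, x> + lam * <a_i, x> (m = 1)."""
    return min(product((0, 1), repeat=2),
               key=lambda x: sum((f_i[j] + lam * a_i[j]) * x[j] for j in range(2)))

# Hypothetical instance: two blocks sharing one resource with capacity b.
f = [[-3.0, -1.0], [-2.0, -2.0]]   # linear block objectives f_i
A = [[2.0, 1.0], [1.0, 2.0]]       # coupling rows A_i
b = 3.0

lam = 0.0
for t in range(100):
    xs = [oracle(f[i], A[i], lam) for i in range(2)]
    g = sum(A[i][j] * xs[i][j] for i in range(2) for j in range(2)) - b
    lam = max(0.0, lam + 0.5 / (t + 1) * g)   # ascent step, project onto R_+

xs = [oracle(f[i], A[i], lam) for i in range(2)]   # x(lambda) at the final lambda
d_lam = -lam * b + sum((f[i][j] + lam * A[i][j]) * xs[i][j]
                       for i in range(2) for j in range(2))
# By weak duality d_lam <= z*; for this toy instance z* = -5 (by enumeration).
```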
Augmented Lagrangian dual. The augmented Lagrangian and the dual function take the form,
$$\begin{align*} \mathcal{L}_\rho(x, \lambda) &= \textstyle\sum_{i \in [I]} f_i(x_i) + \langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle + \tfrac{\rho}{2}\left\|\left(\textstyle\sum_{i \in [I]} A_i x_i - b\right)_+\right\|^2, \quad \rho > 0, \\ d_\rho(\lambda) &= \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}_\rho(x, \lambda) \\ d_\rho^\ast &= \max_{\lambda \in \mathbb{R}_+^m} d_\rho(\lambda). \end{align*}$$
That is, the augmented Lagrangian (AL) adds a penalty to the plain Lagrangian; here the penalty is $\psi(\cdot) = \|(\cdot)_+\|^2_2$. In general, the AL still provides a lower bound,
$$\begin{align*} \forall \lambda \in \mathbb{R}_+^m, \quad d_\rho(\lambda) &\le \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}_\rho(x, \lambda), ~\text{s.t.}~ \textstyle\sum_{i \in [I]} A_i x_i \le b \\ &\le \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} f(x), ~\text{s.t.}~ \textstyle\sum_{i \in [I]} A_i x_i \le b \\ &= z^\ast. \end{align*}$$
The second inequality is true because at any feasible solution to the original problem,
$$\mathcal{L}_\rho(x, \lambda) = f(x) + \underbrace{\langle \lambda, \textstyle\sum_{i \in [I]} A_i x_i - b \rangle}_{\le 0} + \underbrace{\tfrac{\rho}{2}\left\|\left(\textstyle\sum_{i \in [I]} A_i x_i - b\right)_+\right\|^2}_{= 0} \le f(x).$$
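On a tiny instance this lower-bound property can be checked by brute force. Note that the augmented Lagrangian is no longer separable, so the minimization below runs over all blocks jointly; the instance, $\rho$, and $\lambda$ are hypothetical.

```python
from itertools import product

# Hypothetical instance: two 0/1 blocks, one coupling constraint (m = 1).
f = [[-3.0, -1.0], [-2.0, -2.0]]
A = [[2.0, 1.0], [1.0, 2.0]]
b, rho, lam = 3.0, 2.0, 0.5

def obj(x):
    return sum(f[i][j] * x[2 * i + j] for i in range(2) for j in range(2))

def usage(x):
    return sum(A[i][j] * x[2 * i + j] for i in range(2) for j in range(2))

def L_rho(x):
    # Augmented Lagrangian with the one-sided penalty ||(.)_+||^2
    s = usage(x) - b
    return obj(x) + lam * s + 0.5 * rho * max(s, 0.0) ** 2

points = list(product((0, 1), repeat=4))   # joint enumeration: not separable
d_rho_lam = min(L_rho(x) for x in points)
z_star = min(obj(x) for x in points if usage(x) <= b)
# d_rho_lam <= z_star, as guaranteed for any lam >= 0 and rho > 0
```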
Compared to the linear relaxation bound $z_{\textrm{LP}}$ (obtained by relaxing each $\mathcal{X}_i$ to its continuous counterpart), it is well known that
$$z_{\textrm{LP}} \le d^\ast \le d_\rho^\ast \le z^\ast.$$
Compared to the Lagrangian dual, the AL improves the lower bound, but loses separability. The benefit of introducing the penalty is that, under mild conditions [3], we obtain the "exact penalty" property, i.e., $d_\rho^\ast = z^\ast$ for some finite $\rho < \infty$. Generally, the size of the required $\rho$ depends on:
- Error bound conditions, non-degeneracy, or some constraint qualification that keeps the dual variable bounded. We made a related observation in [5] when dealing with the high-speed railway scheduling problem.
- This is also interesting because for pure penalty methods (without the dual variable), the "exact penalty" property rarely holds with a differentiable $\psi$. For nonlinear continuous problems, $\|\cdot\|_1$ sometimes induces the exact penalty property while $\|\cdot\|_2$ does not; see also [2].
For a practitioner, if the penalty is (almost) exact, any minimizer $x(\lambda^\ast)$ attaining $d_\rho(\lambda^\ast) = d_\rho^\ast$ is a feasible solution of the original problem (primal recovery).
Problem with many blocks
In multi-agent systems, $A_i x_i$ is the resource consumption of the $i$-th agent. In this case, the original problem (for $z^\ast$) is a "centralized" resource allocation problem. The dual method here can be seen as finding price vectors $p_i = A_i^\top\lambda$ so that the process is "decentralized", i.e., each agent can decide how much resource to consume solely through price-taking behavior.
$$\begin{align*} d(\lambda) &= \min_{x_i \in \mathcal{X}_i, \forall i \in [I]} \mathcal{L}(x, \lambda) \\ &= -\langle \lambda, b \rangle + \textstyle\sum_{i \in [I]} z_i(p_i), \quad p_i = A_i^\top\lambda. \end{align*}$$
Instead of centralized allocation (which is expensive), we only update $\lambda$ and collect the "best response" of each agent.
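One decentralized round can be sketched as follows: broadcast $\lambda$, and each agent returns its best response to its own price $p_i = A_i^\top \lambda$. All data below are hypothetical, with $m = 1$ and 0/1 agent sets.

```python
from itertools import product

lam = [1.0]                            # hypothetical dual price, m = 1
f = [[-3.0, -1.0], [-2.0, -2.0]]       # agents' linear objectives
A = [[2.0, 1.0], [1.0, 2.0]]           # agents' resource consumption rows

def best_response(f_i, p_i):
    # Agent i solves min <f_i + p_i, x> over its own set {0,1}^2, locally.
    return min(product((0, 1), repeat=2),
               key=lambda x: sum((f_i[j] + p_i[j]) * x[j] for j in range(2)))

prices = [[lam[0] * A[i][j] for j in range(2)] for i in range(2)]  # p_i = A_i^T lam
responses = [best_response(f[i], prices[i]) for i in range(2)]
# Each agent decides independently, from its own price alone.
```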
If each $\mathcal{X}_i$ is bounded, then $\mathrm{conv}(\mathcal{X}_i)$ is a polytope generated by a finite set of extreme points. The "expenditure-minimization" problem is,
$$\begin{align*} \forall c, \quad z_i(c) &= \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) + \langle c, x_i \rangle \\ &= \min_{x_i \in \mathrm{conv}(\mathcal{X}_i)} ~ f_i(x_i) + \langle c, x_i \rangle \\ &= \min_{w_i \in \mathbb{R}_+^{|E_i|}} ~ f_i(x_i) + \langle c, x_i \rangle, ~ x_i = \textstyle\sum_{e \in E_i} w_i^e x_i^e, \textstyle\sum_{e \in E_i} w_i^e = 1. \end{align*}$$
Here we enumerate the extreme points of $\mathrm{conv}(\mathcal{X}_i)$ as $x_i^1, ..., x_i^{|E_i|}$; since $\mathcal{X}_i$ is bounded, these suffice to represent any $x_i$, otherwise we would also need extreme rays. The following result is attributed to [4] but actually goes back to [1], with a specialization that produces the constants $\rho_i$ (see the previous two notes):
$$\rho_i = \max_{x_i \in \mathcal{X}_i} ~ f_i(x_i) - \min_{x_i \in \mathcal{X}_i} ~ f_i(x_i) < \infty.$$
Assume that for any $x_i \in \mathrm{conv}(\mathcal{X}_i)$, there exists $x_i' \in \mathcal{X}_i$ such that,$$A_i x_i' \le A_i x_i.$$
Then, the Lagrangian dual bound satisfies,$$z^\ast - d^\ast \le {\color{#BE5845}{m}} \max_{i \in [I]} \rho_i.$$
Hence, if $z^\ast = \Omega(I)$ and all $\rho_i = \mathcal{O}(1)$, we have,
$$\tfrac{z^\ast - d^\ast}{z^\ast} \le \mathcal{O}({\color{#BE5845}{m}} I^{-1}).$$
This means that as we increase the number of blocks, the relative gap between $z^\ast$ and $d^\ast$ shrinks. The problem is that this result is nonconstructive: it does not provide a procedure to compute the $x_i'$ that are feasible for the original problem.
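For small blocks, the constant $\rho_i$ in the bound above is just the spread of $f_i$ over $\mathcal{X}_i$ and can be computed by enumeration. The block below ($\mathcal{X}_i = \{0,1\}^2$ with a linear $f_i$) is hypothetical.

```python
from itertools import product

def rho(f_i):
    """Spread max f_i - min f_i over the hypothetical block X_i = {0,1}^n."""
    vals = [sum(f_i[j] * x[j] for j in range(len(f_i)))
            for x in product((0, 1), repeat=len(f_i))]
    return max(vals) - min(vals)

# f_i = (-3, -1): max is 0 (all zeros), min is -4 (all ones), so rho_i = 4.
rho_i = rho([-3.0, -1.0])
```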
There is an interesting way to fix this in [4]. First, tighten the resource vector,
$$\bar b = b - \gamma \quad \text{for some } \gamma \in \mathbb{R}_+^m.$$
Then, one can solve the Lagrangian dual problem with $\bar b$ in place of $b$ and collect the minimizers $x_{\bar b}(\lambda(\bar b))$, which can be shown to be feasible for the original problem. The quality of this approach is guaranteed by a bound of the following form,
$$\tfrac{z^\ast(\bar b) - z^\ast}{z^\ast} \le (m + {\color{#BE5845}{\mathcal{O}(\|\gamma\|_\infty)}}) \cdot {\color{#BE5845}{I^{-1}}} \cdot \max_{i \in [I]} \rho_i.$$
The gap of such a heuristic is diminishing as the number of blocks increases.
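The heuristic can be sketched end-to-end on a toy instance: run dual subgradient ascent with the tightened resource $\bar b = b - \gamma$, then check that the collected block responses are feasible for the original capacity $b$. The instance and the choice of $\gamma$ below are hypothetical.

```python
from itertools import product

# Hypothetical instance: two 0/1 blocks, one coupling constraint (m = 1).
f = [[-3.0, -1.0], [-2.0, -2.0]]
A = [[2.0, 1.0], [1.0, 2.0]]
b, gamma = 3.0, 1.0
b_bar = b - gamma                      # tightened resource

def oracle(f_i, a_i, lam):
    return min(product((0, 1), repeat=2),
               key=lambda x: sum((f_i[j] + lam * a_i[j]) * x[j] for j in range(2)))

lam = 0.0
for t in range(200):                   # subgradient ascent on the dual with b_bar
    xs = [oracle(f[i], A[i], lam) for i in range(2)]
    g = sum(A[i][j] * xs[i][j] for i in range(2) for j in range(2)) - b_bar
    lam = max(0.0, lam + 0.5 / (t + 1) * g)

xs = [oracle(f[i], A[i], lam) for i in range(2)]
used = sum(A[i][j] * xs[i][j] for i in range(2) for j in range(2))
feasible = used <= b                   # feasibility w.r.t. the ORIGINAL b
```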
- [1] Estimates of the duality gap in nonconvex optimization. Mathematics of Operations Research, 1(3), 225–245 (1976).
- [2] Necessary and sufficient conditions for a penalty method to be exact. Mathematical Programming, 9(1), 87–99 (1975).
- [3] Exact augmented Lagrangian duality for mixed integer linear programming. Mathematical Programming, 161(1-2), 365–387 (2017).
- [4] A decomposition method for large scale MILPs, with performance guarantees and a power system application. Automatica, 67, 144–156 (2016).
- [5] A customized augmented Lagrangian method for block-structured integer programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12), 9439–9455 (2024).