Duality in choice and private consumption, I

We use Legendre-Fenchel duality to connect some interesting objects in discrete choice theory, convex analysis, and consumer theory. An interesting observation that motivates this note is the connection between a Multinomial Logit (MNL) model and a Walrasian consumer with CES (Constant Elasticity of Substitution) utility.

Setup and notation. Throughout, $n$ denotes the number of alternatives or goods. Write $[n] := \{1,\ldots,n\}$ for the index set, $\Delta_n := \{\boldsymbol{\rho} \in \mathbb{R}^n_+ : \textstyle\sum_{j\in[n]} \rho_j = 1\}$ for the unit simplex over $[n]$, and $\mathrm{int}(\Delta_n)$ for its relative interior. Write $\langle a, b\rangle := \textstyle\sum_{j\in[n]} a_j b_j$ for the Euclidean inner product on $\mathbb{R}^n$ and $\mathrm{diag}(\boldsymbol{\omega}) \in \mathbb{R}^{n\times n}$ for the diagonal matrix with diagonal $\boldsymbol{\omega} \in \mathbb{R}^n$. The componentwise softmax of $\boldsymbol{\omega} \in \mathbb{R}^n$ is $[\mathrm{softmax}(\boldsymbol{\omega})]_j := \tfrac{e^{\omega_j}}{\textstyle\sum_{k\in[n]} e^{\omega_k}}$.

Primal-dual interpretation of a MNL model

Let $\boldsymbol{\omega} \in \mathbb{R}^n$ be a deterministic-utility vector and $\boldsymbol{\rho} \in \Delta_n$ a distribution over $[n]$. Define a pair of convex functions $(W, W_\ast)$ by

$$\begin{aligned} W(\boldsymbol{\omega}) &:= \log \textstyle\sum_{j\in[n]} e^{\omega_j}, \\ W_\ast(\boldsymbol{\rho}) &:= \textstyle\sum_{j\in[n]} \rho_j \log \rho_j ~~\text{ on }\Delta_n,~~+\infty\text{ otherwise}. \end{aligned}$$

The functions $W$ and $W_\ast$ are Legendre-Fenchel conjugates of each other, i.e.,

$$W(\boldsymbol{\omega}) = \max_{\boldsymbol{\rho} \in \Delta_n} \left\{ \langle \boldsymbol{\omega}, \boldsymbol{\rho}\rangle - W_\ast(\boldsymbol{\rho}) \right\},$$

with unique maximizer given by

$$\boldsymbol{\rho}(\boldsymbol{\omega}) = \nabla W(\boldsymbol{\omega}) = \mathrm{softmax}(\boldsymbol{\omega}), \qquad \rho_j(\boldsymbol{\omega}) = \tfrac{e^{\omega_j}}{\textstyle\sum_{k\in[n]} e^{\omega_k}}.$$
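The duality above is easy to check numerically. The following is a minimal sketch in plain Python (standard library only). The function names `logsumexp`, `softmax`, `neg_entropy` and `dual_objective`, as well as the test vector, are illustrative choices and not notation from this note.

```python
import math

def logsumexp(omega):
    """W(omega) = log sum_j exp(omega_j), computed stably."""
    m = max(omega)
    return m + math.log(sum(math.exp(w - m) for w in omega))

def softmax(omega):
    """Candidate gradient of W: rho_j = exp(omega_j) / sum_k exp(omega_k)."""
    m = max(omega)
    e = [math.exp(w - m) for w in omega]
    s = sum(e)
    return [x / s for x in e]

def neg_entropy(rho):
    """W_*(rho) = sum_j rho_j log rho_j on the simplex (with 0 log 0 := 0)."""
    return sum(r * math.log(r) for r in rho if r > 0.0)

def dual_objective(omega, rho):
    """<omega, rho> - W_*(rho), the quantity maximized over the simplex."""
    return sum(w * r for w, r in zip(omega, rho)) - neg_entropy(rho)

omega = [0.5, -1.0, 2.0]  # arbitrary illustrative utilities
rho = softmax(omega)

# At rho = softmax(omega) the dual objective attains W(omega).
gap_at_softmax = logsumexp(omega) - dual_objective(omega, rho)

# Any other point of the simplex gives a strictly smaller value,
# for example the uniform distribution.
uniform = [1.0 / len(omega)] * len(omega)
gap_at_uniform = logsumexp(omega) - dual_objective(omega, uniform)

# Central finite differences of W reproduce the softmax.
h = 1e-6
fd_grad = []
for j in range(len(omega)):
    up = list(omega)
    dn = list(omega)
    up[j] += h
    dn[j] -= h
    fd_grad.append((logsumexp(up) - logsumexp(dn)) / (2 * h))
```

The gap $W(\boldsymbol{\omega}) - \left(\langle \boldsymbol{\omega}, \boldsymbol{\rho}\rangle - W_\ast(\boldsymbol{\rho})\right)$ computed here is the Kullback-Leibler divergence from $\boldsymbol{\rho}$ to $\mathrm{softmax}(\boldsymbol{\omega})$, which is why it vanishes only at the softmax.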

McFadden's Surplus and RUM. Following McFadden [1], we introduce the Random Utility Maximization (RUM) model. Let $\omega_j = \omega_j(\mathbf{z};\boldsymbol{\beta}) \in \mathbb{R}$ be the deterministic utility of alternative $j \in [n]$, which may depend on observables $\mathbf{z} \in \mathcal{Z}$ (such as prices) and parameters $\boldsymbol{\beta} \in \mathcal{B}$. The total utility is $\xi_j = \omega_j(\mathbf{z};\boldsymbol{\beta}) + \varepsilon_j$, where the shocks $\varepsilon_j$ are i.i.d. Gumbel with unit scale, recentered to have mean zero. (For standard Gumbel shocks, the identity below holds up to the additive Euler-Mascheroni constant, which affects neither gradients nor choice probabilities.) The log-sum-exp function above is then the expected maximal utility:

$$W(\boldsymbol{\omega}) = \mathbb{E}_{\boldsymbol{\varepsilon}}\left[\textstyle\max_{j\in[n]} (\omega_j + \varepsilon_j)\right] = \log \textstyle\sum_{j\in[n]} e^{\omega_j}.$$

This function is known as McFadden's surplus function. $W(\cdot)$ is convex, being an expectation of a pointwise maximum of affine functions of $\boldsymbol{\omega}$, and it is smooth, as the closed form shows. Its gradient has a closed form, known as the choice probability [2],

$$\boldsymbol{\rho} = \mathrm{softmax}(\boldsymbol{\omega}) = \nabla W(\boldsymbol{\omega}).$$

The second equality is known as the Daly-Zachary-Williams theorem; it holds whenever the surplus function $W(\cdot)$ is differentiable with respect to $\boldsymbol{\omega}$. The argument was established in the 1970s [2]. The result can also be proved with duality arguments from convex analysis.
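The surplus identity and the choice-probability formula can be illustrated by simulation. The sketch below assumes standard Gumbel shocks (location 0, scale 1), drawn by inverting the CDF $F(x) = e^{-e^{-x}}$. Under that assumption the simulated mean of the maximum should exceed the log-sum-exp by the Euler-Mascheroni constant (the mean of a standard Gumbel variable), and the constant disappears if the shocks are recentered to mean zero. The function `simulate_rum`, the seed and the sample size are illustrative choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel variable

def logsumexp(omega):
    m = max(omega)
    return m + math.log(sum(math.exp(w - m) for w in omega))

def softmax(omega):
    m = max(omega)
    e = [math.exp(w - m) for w in omega]
    s = sum(e)
    return [x / s for x in e]

def simulate_rum(omega, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[max_j (omega_j + eps_j)] and of the
    frequency with which each alternative attains the maximum."""
    rng = random.Random(seed)
    total = 0.0
    counts = [0] * len(omega)
    for _ in range(n_draws):
        best_j, best_val = 0, -math.inf
        for j, w in enumerate(omega):
            u = rng.random()
            while u == 0.0:  # guard against log(0); essentially never triggers
                u = rng.random()
            xi = w - math.log(-math.log(u))  # omega_j + standard Gumbel draw
            if xi > best_val:
                best_j, best_val = j, xi
        total += best_val
        counts[best_j] += 1
    return total / n_draws, [c / n_draws for c in counts]

omega = [0.5, -1.0, 2.0]  # arbitrary illustrative utilities
mean_max, freq = simulate_rum(omega)
surplus = logsumexp(omega)  # W(omega)
probs = softmax(omega)      # gradient of W at omega
```

Up to Monte Carlo error, `mean_max - EULER_GAMMA` should agree with `surplus`, and `freq` with `probs`.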

The Perturbation Function and Fictitious Play. The dual function $W_\ast$, the negative Shannon entropy on the simplex, acts as a perturbation in a utility maximization problem over the simplex whose solution is the choice probability, viz.

$$\boldsymbol{\rho}(\boldsymbol{\omega}) = \arg\max_{\boldsymbol{\rho} \in \Delta_n} \left\{ \langle \boldsymbol{\omega}, \boldsymbol{\rho}\rangle - W_\ast(\boldsymbol{\rho}) \right\}.$$

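Taking the gradient of the perturbation function on $\mathrm{int}(\Delta_n)$ and evaluating it at the maximizer $\boldsymbol{\rho} = \boldsymbol{\rho}(\boldsymbol{\omega})$, where $\log\boldsymbol{\rho} = \boldsymbol{\omega} - W(\boldsymbol{\omega})\mathbf{1}$, we get

$$\nabla W_\ast(\boldsymbol{\rho}) = \nabla(\langle \boldsymbol{\rho}, \log \boldsymbol{\rho}\rangle) = \log\boldsymbol{\rho} + \mathbf{1} = \boldsymbol{\omega} - \left(W(\boldsymbol{\omega}) - 1\right)\mathbf{1}, \qquad W(\boldsymbol{\omega}) = \log(\langle e^{\boldsymbol{\omega}}, \mathbf{1}\rangle).$$

Hence $\boldsymbol{\omega} - \nabla W_\ast(\boldsymbol{\rho})$ is a multiple of $\mathbf{1}$, which is exactly the first-order condition for the maximization over the simplex.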
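The first-order condition behind this gradient can be checked with a small sketch. At $\boldsymbol{\rho} = \mathrm{softmax}(\boldsymbol{\omega})$, the vector $\boldsymbol{\omega} - (\log\boldsymbol{\rho} + \mathbf{1})$ should have identical entries, that is, it should be a multiple of $\mathbf{1}$, the Lagrange-multiplier direction of the constraint $\sum_j \rho_j = 1$. The variable names and the test vector are illustrative.

```python
import math

def softmax(omega):
    m = max(omega)
    e = [math.exp(w - m) for w in omega]
    s = sum(e)
    return [x / s for x in e]

omega = [0.3, 1.7, -0.4, 0.0]  # arbitrary illustrative utilities
rho = softmax(omega)

# Gradient of the negative entropy sum_j rho_j log rho_j at rho.
grad_dual = [math.log(r) + 1.0 for r in rho]

# Stationarity residual omega - grad W_*(rho); it should be constant in j.
residual = [w - g for w, g in zip(omega, grad_dual)]
spread = max(residual) - min(residual)

# The common value of the residual is W(omega) - 1.
W = math.log(sum(math.exp(w) for w in omega))
```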

The name perturbation function comes from the nomenclature in Fudenberg and Levine [3]. Under very mild conditions on $\varepsilon_j$, as long as the choice probability $\boldsymbol{\rho}(\boldsymbol{\omega})$ is continuously differentiable with respect to $\boldsymbol{\omega}$, there always exists a deterministic perturbation $W_\ast$, so that the choice probability is the best response of the perturbed utility; see, e.g., Hofbauer and Sandholm [4]. It is because of this perturbation interpretation that one could establish the convergence of stochastic fictitious play using Lyapunov-type arguments [4].
The differentiability requirement on $\boldsymbol{\rho}(\cdot)$ is a convenience: it is used to show that the Jacobian matrix of the choice probability is symmetric. It is a sufficient condition rather than a necessary one. A simple sufficient condition is that $\boldsymbol{\omega}$ is independent of $\boldsymbol{\varepsilon}$ and the distribution of $\boldsymbol{\varepsilon}$ is absolutely continuous with respect to the Lebesgue measure [2] (a standalone proof can also be found in Shi, Shum and Song [5]).

The CES consumer is a special MNL model

In the Walrasian setting, consider a CES consumer at prices $\mathbf{p} \in \mathcal{P} \subseteq \mathbb{R}^n_{++}$ (strictly positive, since we take logarithms of prices below) with budget $w > 0$, intercepts $\mathbf{c} \in \mathbb{R}^n_{++}$, and substitution exponent $r \in (-\infty, 1)$, $r \neq 0$. The CES utility and its negative logarithm are

$$u(\mathbf{x}) := \left(\textstyle\sum_{j\in[n]} c_j x_j^r\right)^{\tfrac{1}{r}}, \qquad v(\mathbf{x}) := -\log u(\mathbf{x}).$$

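Because the CES utility is homogeneous of degree one, $v(\cdot)$ is logarithmically homogeneous, i.e., $v(t\mathbf{x}) = v(\mathbf{x}) - \log t$ for all $t > 0$; Zhang, He, Jiang and Ye [6] connect this to logarithmically homogeneous barrier functions. The associated conjugate function of prices is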

$$v_\ast(\mathbf{p}) := \max_{\mathbf{x} \in \mathbb{R}^n_+} \left\{ -\langle \mathbf{p}, \mathbf{x}\rangle - v(\mathbf{x}) \right\},$$

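and the envelope theorem gives $\nabla v_\ast(\mathbf{p}) = -\tfrac{\mathbf{x}(\mathbf{p})}{w}$, where $\mathbf{x}(\mathbf{p}) \in \mathbb{R}^n_+$ is the demand at budget $w$. Indeed, the first-order condition $\nabla u(\mathbf{x}) = u(\mathbf{x})\,\mathbf{p}$ together with Euler's identity $\langle \nabla u(\mathbf{x}), \mathbf{x}\rangle = u(\mathbf{x})$ shows that the maximizer spends exactly one unit of budget, so it is the demand at unit budget, $\tfrac{\mathbf{x}(\mathbf{p})}{w}$. Walras's law $\langle \mathbf{p}, \mathbf{x}(\mathbf{p})\rangle = w$ then yields the spending-share map as a rescaled gradient (writing $\mathbf{P} := \mathrm{diag}(\mathbf{p})$),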

$$\mathrm{int}(\Delta_n) \ni \boldsymbol{\gamma}(\mathbf{p}) := \tfrac{\mathbf{P}\mathbf{x}(\mathbf{p})}{w} = -\mathbf{P}\nabla v_\ast(\mathbf{p}).$$

Reparameterize via

$$\sigma := \tfrac{r}{1-r} \in (-1, \infty), \quad y_j := \tfrac{1}{1-r}\log c_j, \quad \mathbf{z} := \log\mathbf{p}.$$

Define the deterministic utility, with the price as the only observable, by

$$\omega_j(\mathbf{p}; \mathbf{y}, \sigma) := y_j - \sigma\log p_j.$$

Then the spending share is the softmax of an affine function of log-prices,

$$\boxed{\gamma_j(\mathbf{p}) = \rho_j(\boldsymbol{\omega}).}$$
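For completeness, here is a short derivation sketch of the boxed identity. It uses the standard closed-form CES demand, which is quoted here as a known result rather than derived in this note. Equating marginal utility per unit of price across goods and imposing the budget constraint gives

$$x_j(\mathbf{p}) = w\,\frac{c_j^{\frac{1}{1-r}}\, p_j^{-\frac{1}{1-r}}}{\textstyle\sum_{k\in[n]} c_k^{\frac{1}{1-r}}\, p_k^{-\frac{r}{1-r}}},$$

so that, with $y_j = \tfrac{1}{1-r}\log c_j$ and $\sigma = \tfrac{r}{1-r}$,

$$\gamma_j(\mathbf{p}) = \frac{p_j x_j(\mathbf{p})}{w} = \frac{c_j^{\frac{1}{1-r}}\, p_j^{-\frac{r}{1-r}}}{\textstyle\sum_{k\in[n]} c_k^{\frac{1}{1-r}}\, p_k^{-\frac{r}{1-r}}} = \frac{e^{y_j - \sigma\log p_j}}{\textstyle\sum_{k\in[n]} e^{y_k - \sigma\log p_k}} = [\mathrm{softmax}(\boldsymbol{\omega})]_j.$$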

The interpretation is straightforward: for a consumer with CES utility and a Walrasian budget, the choice probability of alternative $j$ is the fraction of the budget spent on good $j$. The correspondence between the CES and the MNL can also be found in Anderson, de Palma and Thisse [7] (Chapter 3.7).
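The CES-to-MNL correspondence can also be verified numerically. The sketch below assumes the standard closed-form CES demand (not derived in this note) and checks that the resulting spending shares coincide with the softmax of $y_j - \sigma\log p_j$. The function `ces_demand` and all numerical values are illustrative assumptions.

```python
import math

def softmax(omega):
    m = max(omega)
    e = [math.exp(w - m) for w in omega]
    s = sum(e)
    return [x / s for x in e]

def ces_demand(p, c, r, w):
    """Closed-form Walrasian demand for u(x) = (sum_j c_j x_j^r)^(1/r),
    with r < 1 and r != 0 (standard textbook formula, assumed here)."""
    s = 1.0 / (1.0 - r)
    denom = sum(ck ** s * pk ** (-r * s) for ck, pk in zip(c, p))
    return [w * cj ** s * pj ** (-s) / denom for cj, pj in zip(c, p)]

# Illustrative prices, intercepts, exponent and budget.
p = [1.0, 2.5, 0.7]
c = [0.2, 0.5, 0.3]
r, w = 0.5, 10.0

x = ces_demand(p, c, r, w)
spend = sum(pj * xj for pj, xj in zip(p, x))      # Walras's law: equals w
shares = [pj * xj / w for pj, xj in zip(p, x)]    # gamma(p)

# Reparameterization from the text: sigma, y, and omega_j = y_j - sigma log p_j.
sigma = r / (1.0 - r)
y = [math.log(cj) / (1.0 - r) for cj in c]
omega = [yj - sigma * math.log(pj) for yj, pj in zip(y, p)]
rho = softmax(omega)

# Optimality check independent of the softmax identity: the ratio
# c_j x_j^(r-1) / p_j (marginal utility per unit of price, up to a common
# factor) must be equal across goods.
mu_per_price = [cj * xj ** (r - 1.0) / pj for cj, xj, pj in zip(c, x, p)]
```

Changing `r` to a negative value (complements) flips the sign of `sigma`, so that a higher price raises the spending share of that good.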

General, but non-testable case

What about the general case where $u$ is not CES? The CES case is special because both the observable (log-prices) and the parametric form of the deterministic utility $\boldsymbol{\omega}(\cdot)$ can be written down explicitly. More generally, any utility function yielding a single-valued demand, and hence a single-valued spending-share distribution in the interior of the unit simplex, admits a softmax representation by direct inversion.

Lemma. Suppose the utility function $u: \mathbb{R}^n_+ \to \mathbb{R}$ is continuous, strictly concave, and locally non-satiated, such that the demand $\mathbf{x}(\mathbf{p}) \in \mathbb{R}^n_+$ is unique and every good is demanded in a strictly positive quantity. Then, for any $\mathbf{p} \in \mathbb{R}^n_{++}$ and budget $w > 0$, the spending share is a single-valued function taking values in the interior of the unit simplex:

$$\boldsymbol{\gamma}(\mathbf{p}) := \tfrac{\mathbf{P}\mathbf{x}(\mathbf{p})}{w} \in \mathrm{int}(\Delta_n).$$

Moreover, $\boldsymbol{\gamma}(\cdot)$ is Lipschitz continuous in $\mathbf{p}$, and $\boldsymbol{\gamma}$ is the choice probability of a MNL model with deterministic utility $\omega_j(\mathbf{p}, w) := \log\gamma_j(\mathbf{p}, w)$, where the argument $w$ makes explicit the dependence of the shares on the budget.

The lemma above is just an algebraic identity: the choice $\omega_j := \log\gamma_j(\mathbf{p}, w)$ is tautological, with $\boldsymbol{\omega}$ already encoding the whole share map. The regularity claim rests on the fact that $\boldsymbol{\gamma}(\cdot)$ is always bounded, since it takes values in the simplex, whereas the demand $\mathbf{x}(\cdot)$ is in general unbounded as prices approach zero and hence cannot be globally Lipschitz in $\mathbf{p}$; boundedness alone does not give Lipschitz continuity, so some regularity of the demand is implicitly required. For the share $\boldsymbol{\gamma}$ to be a choice probability, we do not even have to restrict attention to MNL models; it suffices that it is the gradient of some surplus function. In essence, for any homothetic demand function, one can show that $\boldsymbol{\gamma}(\cdot)$ is a symmetric monotone operator with respect to the log-prices $\log(\mathbf{p})$; thus, a surplus function must exist.
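The tautological nature of this representation is easy to see in a small sketch: for any interior share vector, taking logarithms and applying the softmax returns the same shares, and adding a constant to every $\omega_j$ changes nothing, so $\boldsymbol{\omega}$ is pinned down only up to an additive constant. The share vector and the shift used below are arbitrary illustrative values.

```python
import math

def softmax(omega):
    m = max(omega)
    e = [math.exp(w - m) for w in omega]
    s = sum(e)
    return [x / s for x in e]

gamma = [0.1, 0.6, 0.3]               # any interior point of the simplex
omega = [math.log(g) for g in gamma]  # tautological deterministic utility

recovered = softmax(omega)                   # reproduces gamma
shifted = softmax([w + 3.0 for w in omega])  # same shares after a common shift
```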

Identifiability and information dependency. In the Walrasian setting, there are very few ways to make a RUM informative and testable. If we respect the private information of the consumer, we only observe $(\mathbf{p}, \mathbf{x}, w)$. So, for a RUM, $\mathbf{z}$ can only contain $(\mathbf{p}, w)$ (if we assume non-satiation, then $w$ is trivial and $\mathbf{z} = \varphi(\mathbf{p})$ for some function $\varphi$); a natural choice is $\mathbf{z} = \log(\mathbf{p})$. Even with this choice, we still have to pick a specification, namely the parametric form of $\boldsymbol{\omega}$ as a function of $(\mathbf{z}, w)$. The CES utility corresponds to assuming that $\boldsymbol{\omega}$ is affine in $\mathbf{z}$; it is also a very special affine case, since each $\omega_j$ depends only on its own log-price $z_j$, with a common slope $-\sigma$.

References
[1] McFadden, D.: Conditional logit analysis of qualitative choice behavior. In: Frontiers in Econometrics. Academic Press (1974).
[2] McFadden, D.: Econometric models of probabilistic choice. In: Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA (1981).
[3] Fudenberg, D., Levine, D. K.: The Theory of Learning in Games. MIT Press, Cambridge, MA (1998).
[4] Hofbauer, J., Sandholm, W. H.: On the global convergence of stochastic fictitious play. Econometrica 70(6), 2265-2294 (2002).
[5] Shi, X., Shum, M., Song, W.: Estimating semi-parametric panel multinomial choice models using cyclic monotonicity. Econometrica 86(2), 737-761 (2018).
[6] Zhang, C., He, C., Jiang, B., Ye, Y.: The second-order tâtonnement: decentralized interior-point methods for market equilibrium. arXiv:2508.04822 (2025).
[7] Anderson, S. P., de Palma, A., Thisse, J.-F.: The representative consumer approach. In: Discrete Choice Theory of Product Differentiation. MIT Press (1992).