Learning revealed preferences
A replicated note on PAC learning of demand functions
Introduction
This note is a replication of [1]. Consider a market with $n$ divisible goods. A buyer reveals a demand $\mathbf{x}(\mathbf{p}) \in \mathcal{X} = [0, 1]^n$ at a price vector $\mathbf{p} \in \mathbb{R}_+^n$. The demand is assumed to be Walrasian,
$$\mathbf{x}(\mathbf{p}) = \arg\max_{\mathbf{x} \in \mathcal{X}} u(\mathbf{x}) \quad \text{s.t.}\ \langle \mathbf{p}, \mathbf{x} \rangle \le w,$$
for some concave utility function $u$ and budget $w$. Let $\mathcal{U}$ be a class of utility functions, and let $\mathcal{H}$ be the corresponding class of demand functions. Important classes include:
- Linear utility:
$$\mathcal{U}_{\mathrm{L}} = \left\{u(\mathbf{x}) = \langle \mathbf{c}, \mathbf{x} \rangle: \mathbf{c} \in \mathbb{R}_+^n\right\}.$$
- CES utility:
$$\mathcal{U}_{\mathrm{CES}} = \left\{u(\mathbf{x}) = \left[\sum_{j\in[n]} c_j (x_j)^r\right]^{\frac{1}{r}}: \mathbf{c} \in \mathbb{R}_+^n,\ r \in (-\infty, 1) \setminus \{0\}\right\}.$$
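As a concrete reference point, both families are easy to evaluate numerically. A minimal sketch (the function names are ours, not from [1]); $r = 0$ is excluded from the CES formula since the exponent $1/r$ is undefined there:

```python
def linear_utility(c, x):
    # Linear utility u(x) = <c, x>.
    return sum(cj * xj for cj, xj in zip(c, x))

def ces_utility(c, x, r):
    # CES utility u(x) = (sum_j c_j x_j^r)^(1/r) for r < 1, r != 0
    # (r = 0 is the Cobb-Douglas limit and needs a separate formula).
    # Assumes x_j > 0 whenever r <= 0, so that x_j^r is defined.
    assert r < 1 and r != 0
    return sum(cj * xj ** r for cj, xj in zip(c, x)) ** (1.0 / r)
```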
Typically, the full preferences of the buyer are not known, and we only observe the demand $\mathbf{x}(\mathbf{p}_k)$ at prices $\mathbf{p}_k$. We are interested in learning the parameters of the utility function from the observed demand. There are two access models:
- Query access: we are powerful enough to freely set the price $\mathbf{p}$ and observe the demand $\mathbf{x}(\mathbf{p})$.
- Passive observation: we can only observe the demand $\mathbf{x}(\mathbf{p}_k)$ at externally given prices $\mathbf{p}_k$.

If one is able to set prices freely, the problem is not a statistical learning problem. In fact, one can even recover $m$ players' preferences from the aggregate demand $\mathbf{g}(\mathbf{p}) = \sum_{i\in[m]} \mathbf{x}_i(\mathbf{p})$, without any access to the individual preferences [3].
Let us focus on the statistical approach, assuming we have access to the demand data
$$\mathbf{\Xi} = \{\mathbf{v}_k\}_{k=1}^K, \quad \mathbf{v}_k = (\mathbf{p}_k, \mathbf{x}_k, w_k) \in \mathcal{V}.$$
The budget $w_k$ may not always be needed. We assume the samples $k = 1, \dots, K$ are drawn i.i.d. from some unknown distribution $\nu$. Note that the i.i.d. assumption is not very realistic: in practice, prices are set according to past market information, so we should expect some endogeneity between the prices we observe now and those we will see in the future. Here, however, we focus on this simple setting.
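To make the sampling model concrete, the following sketch draws i.i.d. prices and budgets and records the buyer's demand. For the demand we use the standard closed form for CES utility without the cap $\mathbf{x} \le \mathbf{1}$, which is exact whenever the budget is small enough that no coordinate hits the cap; all names here are ours, not from [1].

```python
import random

def ces_demand(c, p, w, r):
    # Closed-form Marshallian demand for CES utility, ignoring the cap
    # x_j <= 1 (valid when w is small enough that every x_j stays below 1):
    #   x_j = w * (c_j / p_j)^s / sum_i p_i * (c_i / p_i)^s,  s = 1 / (1 - r).
    s = 1.0 / (1.0 - r)
    ratios = [(cj / pj) ** s for cj, pj in zip(c, p)]
    denom = sum(pj * rj for pj, rj in zip(p, ratios))
    return [w * rj / denom for rj in ratios]

def sample_dataset(c, r, K, seed=0):
    # Draw K i.i.d. observations v_k = (p_k, x_k, w_k).
    rng = random.Random(seed)
    data = []
    for _ in range(K):
        p = [rng.uniform(1.0, 2.0) for _ in c]
        w = rng.uniform(0.1, 0.5)  # small budget keeps every x_j below 1
        data.append((p, ces_demand(c, p, w, r), w))
    return data
```

With $p_j \ge 1$ and $w \le 0.5$ we get $p_j x_j \le w$, hence $x_j \le 0.5$, so the box constraint is indeed slack and the closed form agrees with the Walrasian demand.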
Learning Problem
Consider each revealed preference as a sample over the domain $\mathcal{V}$, and suppose we hope to learn a hypothesis $h: \mathcal{V} \to \mathcal{X}$ mapping into the output domain $\mathcal{X}$. Assume $h \in \mathcal{H}$, the hypothesis class. The data are generated by a distribution $\nu$ on $\mathcal{V}$. The quality of a hypothesis is measured by its error:
$$E_{\nu}(h) = \mathbb{P}_{\mathbf{v} \sim \nu}\left[\mathbf{x} \neq h(\mathbf{v}) \right], \quad \mathbf{v} = (\mathbf{p}, \mathbf{x}, w),$$
i.e., the probability that the hypothesis fails to reproduce the observed demand.
Recall the multiclass classification problem in PAC style: an algorithm $\mathcal{A}$ is said to learn a hypothesis class $\mathcal{H}$ from revealed preferences if for all $\epsilon, \delta > 0$ there exists a sample size $K(\epsilon, \delta)$ such that for any $\nu$, given $K \ge K(\epsilon, \delta)$ samples, $\mathcal{A}$ outputs a hypothesis $h_{\mathcal{A}} = \mathcal{A}(\mathbf{\Xi}) \in \mathcal{H}$ such that, with probability at least $1-\delta$,
$$E_{\nu}(h_{\mathcal{A}}) - E_{\nu}(h^\ast) \le \epsilon,$$
where $h^\ast$ is the optimal hypothesis in $\mathcal{H}$. Writing $E_\nu(h^\ast) = \eta \in [0, 1)$, the best hypothesis in the class predicts the demand correctly with probability $1-\eta$ (see the definition of the error). If $\eta = 0$, the problem is called realizable; otherwise it is agnostic (also called the misspecified case).
The key message is that we can cast learning the demand, a smooth function of prices, as a multiclass classification problem. To this end, suppose there exists a feature map
$$\Phi: \mathcal{V} \times \mathcal{X} \to \mathbb{R}^D$$
such that for any $h \in \mathcal{H}$, there exists a vector $\mathbf{b} \in \mathbb{R}^D$ such that
$$h(\mathbf{v}) \in \arg\max_{\mathbf{x}\in\mathcal{X}} \langle \mathbf{b}, \Phi(\mathbf{v}, \mathbf{x}) \rangle, \quad \forall \mathbf{v} \in \mathcal{V}.$$
We denote the resulting class by $\mathcal{H}_\Phi$ and its members by $h_\mathbf{b}$. With $\Phi$ in hand, we can instantiate $\mathcal{A}$ as a multiclass SVM.
Multiclass Classification by SVM
- Inputs: $\mathbf{\Xi} = \{\mathbf{v}_k\}_{k=1}^K$, $\quad \mathbf{v}_k = (\mathbf{p}_k, \mathbf{x}_k, w_k) \in \mathcal{V}$.
- Outputs: $\mathbf{b} \in \mathbb{R}^D$.
- Solve:
$$\begin{align*} \min_{\mathbf{b} \in \mathbb{R}^D} & \quad \frac{1}{2}\|\mathbf{b}\|^2 \\ \text{s.t.} & \quad \langle \mathbf{b}, \Phi(\mathbf{v}_k, \mathbf{x}_k) - \Phi(\mathbf{v}_k, \mathbf{x}) \rangle \ge 1, \quad \forall k \in [K],\ \mathbf{x} \neq \mathbf{x}_k. \end{align*}$$
The constraint is equivalent to requiring
$$\min_{\mathbf{x} \in \mathcal{X} \setminus \{\mathbf{x}_k\}}\ \langle \mathbf{b}, \Phi(\mathbf{v}_k, \mathbf{x}_k) - \Phi(\mathbf{v}_k, \mathbf{x}) \rangle \ge 1,$$
where the minimum is attained at $\mathbf{x} = \tilde{\mathbf{x}}_k$, the second-best bundle. It is also shown in [1] that the oracle complexity of this step is $\mathcal{O}(n)$. For the linear utility class, such a feature map exists with $D = n$.
Proof: The optimal solution to the UMP with linear utility is given by sorting the ratios $\frac{c_j}{p_j}$ in descending order and spending the budget $w$ greedily along this order. It is a priority rule, and there exists an optimal solution in which at most one index is fractional: for some $j^\ast$, $0 < x_{j^\ast} < 1$, while all other coordinates are zero or one. This is similar to the $\frac{c\mu}{\theta}$ rule in queueing theory. Call a bundle $\mathbf{x}$ admissible if the budget is tight and there is at most one fractional index:
$$|\{j: 0 < x_j < 1\}| \le 1, \quad \langle \mathbf{p}, \mathbf{x} \rangle = w.$$
So $\Phi((\mathbf{p}, w), \mathbf{x})$ can be defined as follows:
$$\Phi((\mathbf{p}, w), \mathbf{x}) = \begin{cases} \mathbf{x}, & \text{if } \mathbf{x} \text{ is admissible,} \\ \mathbf{0}, & \text{otherwise.} \end{cases}$$
One can see $\mathcal{H}_{\Phi} = \mathcal{H}_{\mathrm{L}}$. □
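The priority rule and the feature map from the proof can be implemented directly. As a runnable illustration of the reduction, the sketch below also fits $\mathbf{b}$ with a simple multiclass-perceptron loop that repeatedly calls the greedy $\arg\max$ oracle; this update rule is our substitution for the hard-margin QP above, not the algorithm of [1].

```python
def greedy_demand(c, p, w):
    # Priority rule from the proof: sort goods by c_j / p_j (descending,
    # ties broken by index) and spend the budget greedily; at most one
    # coordinate ends up fractional.
    n = len(c)
    order = sorted(range(n), key=lambda j: -c[j] / p[j])
    x, budget = [0.0] * n, w
    for j in order:
        if budget <= 1e-12:
            break
        x[j] = min(1.0, budget / p[j])
        budget -= x[j] * p[j]
    return x

def is_admissible(p, w, x, tol=1e-9):
    # Budget tight and at most one fractional coordinate.
    tight = abs(sum(pj * xj for pj, xj in zip(p, x)) - w) <= tol
    frac = sum(1 for xj in x if tol < xj < 1.0 - tol)
    return tight and frac <= 1

def phi(p, w, x):
    # Feature map: Phi((p, w), x) = x if x is admissible, else 0.
    return x if is_admissible(p, w, x) else [0.0] * len(x)

def perceptron_learn(data, n, max_rounds=100):
    # Multiclass perceptron: while some sample is misclassified, move b
    # toward Phi(v_k, x_k) and away from Phi(v_k, x_hat), where x_hat is
    # the greedy argmax under the current b.
    b = [0.0] * n
    for _ in range(max_rounds):
        mistakes = 0
        for p, x, w in data:
            x_hat = greedy_demand(b, p, w)
            if x_hat != x:
                mistakes += 1
                b = [bj + fj - gj
                     for bj, fj, gj in zip(b, phi(p, w, x), phi(p, w, x_hat))]
        if mistakes == 0:
            return b
    return b
```

On a small synthetic training set generated by a true $\mathbf{c}$, the loop terminates with a $\mathbf{b}$ whose greedy demand reproduces every observed bundle; note that $\mathbf{b}$ need not equal $\mathbf{c}$, since only the induced demand function is identified.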
One should understand that $\Phi$ exactly encodes the optimal bundle $\mathbf{x}(\mathbf{p})$ of a linear utility function for every price $\mathbf{p}$ and budget $w$: representability under $\Phi$ is both necessary and sufficient. Using the multiclass SVM, we can learn the demand function $\mathbf{x}(\mathbf{p})$ with the following sample complexity bound.
$$O\!\left(\frac{D \log (1 / \epsilon)+\log (1 / \delta)}{\epsilon}\right).$$
This implies that the sample complexity of learning the demand function $\mathbf{x}(\mathbf{p})$ is
$$O\!\left(\frac{n \log (1 / \epsilon)+\log (1 / \delta)}{\epsilon}\right)$$
if $\mathcal{H}_{\mathrm{L}}$ is the class of demand functions. In fact, this bound is tight up to logarithmic factors; we only illustrate the worst-case upper bound here. Of course, the SVM is not the only way to achieve it.

References
- [1] Balcan, M.-F., Daniely, A., Mehta, R., Urner, R., Vazirani, V.V.: Learning economic parameters from revealed preferences, http://arxiv.org/abs/1407.7937, (2014)
- [2] Daniely, A., Sabato, S., Ben-David, S., Shalev-Shwartz, S.: Multiclass learnability and the ERM principle, http://arxiv.org/abs/1308.2893, (2014)
- [3] Bei, X., Chen, W., Garg, J., Hoefer, M., Sun, X.: Learning market parameters using aggregate demand queries. Proceedings of the AAAI Conference on Artificial Intelligence. 30, (2016). https://doi.org/10.1609/aaai.v30i1.10027