Deep sets as universal approximators
Published:
Deep sets are a family of neural networks which parameterize functions on sets. The problem is as follows. Let $X$ be a compact Hausdorff space. We might be able to parameterize arbitrary continuous functions $X \to \mathbb{R}$ with universal approximators (e.g. MLPs). What if we instead need to parameterize functions $X^n \to \mathbb{R}$ which are permutation invariant? That is, we want functions $f$ such that
\[\begin{aligned} f(x_1, \cdots, x_n) = f(x_{\sigma(1)}, \dots, x_{\sigma(n)}) \end{aligned}\]whenever $x_1, \cdots, x_n \in X$ and $\sigma$ is a permutation of ${1, \cdots, n}$. Deep sets are functions of the form
\[\begin{align} f(x_1, \cdots, x_n) = \rho\left(\sum_{i=1}^n \phi(x_i)\right), \end{align}\]where $\phi: X \to \mathbb{R}^d$ and $\rho: \mathbb{R}^d \to \mathbb{R}$ are continuous functions. We parameterize $f$ by the triple $(d, \rho, \phi)$. We can also think of $f$ as a map $X^n/S_n \to \mathbb{R}$, where $X^n/S_n$ denotes the quotient of $X^n$ by the action of the symmetric group $S_n$. In this post, I will outline a proof that such functions are universal approximators in the space of continuous functions on $X^n/S_n$, so arbitrary continuous set functions can be parameterized by deep sets.
Let $\mathcal{F} \subset C(X^n/S_n)$ denote the set of functions of the form (1). We will show that $\mathcal{F}$ is dense in $C(X^n/S_n)$ using Stone-Weierstrass.
Step 1: $\mathcal{F}$ contains all constant functions. We can see this by taking $\rho(x) = C$ for all $x \in \mathbb{R}^d$.
Step 2: $\mathcal{F}$ is closed under addition. Let $f_1, f_2 \in \mathcal{F}$ be functions parameterized by $(d_1, \rho_1, \phi_1)$ and $(d_2, \rho_2, \phi_2)$ respectively. We define $\phi: X \to \mathbb{R}^{d_1 + d_2}$ by $\phi(x) = (\phi_1(x), \phi_2(x))$ and $\rho: \mathbb{R}^{d_1 + d_2} \to \mathbb{R}$ by $\rho(x) = \rho_1(x^{(1)}) + \rho_2(x^{(2)})$. Let $f$ be the function parameterized by $(d_1 + d_2, \rho, \phi)$. Then
\[\begin{aligned} f(x_1, \cdots, x_n) &= \rho\left(\sum_{i=1}^n \phi(x_i)\right)\\ &= \rho\left(\sum_{i=1}^n \phi_1(x_i), \sum_{i=1}^n \phi_2(x_i)\right)\\ &= \rho_1\left(\sum_{i=1}^n \phi_1(x_i)\right) + \rho_2\left(\sum_{i=1}^n \phi_2(x_i)\right)\\ &= f_1(x_1, \cdots, x_n) + f_2(x_1, \cdots, x_n). \end{aligned}\]So $f_1 + f_2 \in \mathcal{F}$.
Step 3: $\mathcal{F}$ is closed under multiplication. Let $f_1, f_2 \in \mathcal{F}$ be functions parameterized by $(d_1, \rho_1, \phi_1)$ and $(d_2, \rho_2, \phi_2)$ respectively. We define $\phi: X \to \mathbb{R}^{d_1 + d_2}$ by $\phi(x) = (\phi_1(x), \phi_2(x))$ and $\rho: \mathbb{R}^{d_1 + d_2} \to \mathbb{R}$ by $\rho(x) = \rho_1(x^{(1)}) \cdot \rho_2(x^{(2)})$. Let $f$ be the function parameterized by $(d_1 + d_2, \rho, \phi)$. Then
\[\begin{aligned} f(x_1, \cdots, x_n) &= \rho\left(\sum_{i=1}^n \phi(x_i)\right)\\ &= \rho\left(\sum_{i=1}^n \phi_1(x_i), \sum_{i=1}^n \phi_2(x_i)\right)\\ &= \rho_1\left(\sum_{i=1}^n \phi_1(x_i)\right) \cdot \rho_2\left(\sum_{i=1}^n \phi_2(x_i)\right)\\ &= f_1(x_1, \cdots, x_n) \cdot f_2(x_1, \cdots, x_n). \end{aligned}\]So $f_1 f_2 \in \mathcal{F}$.
Step 4: $\mathcal{F}$ separates points. Let $\{x_1, \cdots, x_n\}, \{y_1, \cdots, y_n\} \in X^n/S_n$ be distinct. Then there exists $z \in X$ such that
\[|\{i \in [n]: x_i = z\}| \neq |\{i \in [n]: y_i = z\}|.\]Let $E = \{x_1, \cdots, x_n, y_1, \cdots, y_n\} \setminus \{z\}$. Since $E$ and $\{z\}$ are disjoint finite sets, by Urysohn’s lemma there exists $\phi: X \to \mathbb{R}$ such that $\phi(z) = 1$ and $\phi(p) = 0$ for all $p \in E$. Let $\rho: \mathbb{R} \to \mathbb{R}$ be the identity function. Then
\[\begin{aligned} \rho\left(\sum_{i=1}^n \phi(x_i)\right) &= |\{i \in [n]: x_i = z\}|\\ &\neq |\{i \in [n]: y_i = z\}|\\ &= \rho\left(\sum_{i=1}^n \phi(y_i)\right). \end{aligned}\]So we have found a function in $\mathcal{F}$ which separates $x_1, \cdots, x_n$ from $y_1, \cdots, y_n$.
Thus, by Stone-Weierstrass, $\mathcal{F}$ is dense in $C(X^n/S_n)$, so every permutation-invariant function $X^n \to \mathbb{R}$, there exists a deep set arbitrarily close to it.