Numerical submersions
Let $m \geq n$, and let $f: \mathbb{R}^m \to \mathbb{R}^n$ be a submersion. Then $f$ is not necessarily surjective. Consider, for example, $m = n = 1$ and $f(x) = \arctan(x)$. But this function fails to be surjective for a boring reason: its derivative vanishes as $|x| \to \infty$.
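Concretely, the image of $\arctan$ is bounded while its derivative decays:
$f'(x) = \frac{1}{1+x^2} \to 0 \text{ as } |x| \to \infty, \qquad f(\mathbb{R}) = \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right) \neq \mathbb{R}.$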
Let’s refine our question. Suppose there is a constant $C > 0$ such that $\sigma_n(Df(x)) \geq C$ for all $x \in \mathbb{R}^m$. That is, the smallest singular value of the derivative of $f$ is bounded below by $C$, uniformly in $x$. Does it now follow that $f$ is surjective? Let $y_1 \in \mathbb{R}^n$. We would like to find $x_1 \in \mathbb{R}^m$ that maps to $y_1$. We can think of this as training $f$ to compute $y_1$. We can do this by minimizing the loss
$L(x)=\frac{1}{2}\|f(x)-y_1\|^2$
with the gradient flow
$\dot{x}(t) = -\nabla L(x(t))$
and an arbitrary initialization $x(0) = x_0 \in \mathbb{R}^m$. Then, since $\nabla L(x) = Df(x)^\top (f(x) - y_1)$,
$\frac{d}{dt} L(x(t)) = -\|\nabla L(x(t))\|^2 \leq -\sigma_n(Df(x(t)))^2 \, \|f(x(t)) - y_1\|^2 \leq -2 C^2 L(x(t)).$
So the loss decays exponentially:
$L(x(t)) \leq e^{-2 C^2 t} L(x_0).$
In particular, $L(x(t)) \to 0$ as $t \to \infty$, so $f(x(t)) \to y_1$. With a little bit more work, we can find a limit point $x_1$ of this path and show that $f(x_1) = y_1$. Hence, $f$ is surjective.
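To see the argument in action numerically, here is a minimal sketch that discretizes the gradient flow with plain Euler steps. The specific map, target, step size, and iteration count are my own illustrative choices, not anything from above: $f(u, v) = u + \sin(v)$ is a submersion from $\mathbb{R}^2$ to $\mathbb{R}$ whose derivative $Df = [\,1 \;\; \cos(v)\,]$ has its single singular value $\sqrt{1 + \cos^2 v} \geq 1$, so the hypothesis holds with $C = 1$.

```python
# Sketch: drive f(x) to a target y1 by gradient descent on
# L(x) = 0.5 * ||f(x) - y1||^2, i.e. Euler steps on the gradient flow.
# f(x) = x[0] + sin(x[1]) is an illustrative submersion R^2 -> R
# whose derivative has smallest singular value at least 1.
import numpy as np

def f(x):
    return np.array([x[0] + np.sin(x[1])])

def grad_L(x, y1):
    # grad L(x) = Df(x)^T (f(x) - y1), with Df(x) = [1, cos(x[1])]
    Df = np.array([[1.0, np.cos(x[1])]])
    return Df.T @ (f(x) - y1)

y1 = np.array([10.0])        # target value we want f to hit
x = np.array([0.0, 0.0])     # arbitrary initialization x0
eta = 0.1                    # step size of the Euler discretization

for _ in range(2000):
    x = x - eta * grad_L(x, y1)

print(f(x), y1)              # f(x) should be essentially equal to y1
```

Running this, $f(x)$ lands essentially on top of $y_1$, mirroring the exponential decay of the loss above; a smaller step size tracks the continuous-time flow more closely at the cost of more iterations.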
I came across this question while studying the optimization dynamics of a model. I wondered what would happen if there were no global minimizer to converge to. It turns out that good local optimization is strictly stronger than good expressivity.