6
$\begingroup$

Suppose I have a real symmetric $m \times m$ matrix $\Lambda$. This matrix is large ($m \gg 1$) and, for simplicity, we'll assume it's diagonal. I then construct a random $n \times n$ projection

$$ A = \frac{m}{n} \Phi^T \Lambda \Phi $$

where $\Phi$ is a random $m \times n$ matrix satisfying $\Phi^T \Phi = I_n$ and $m \gg n \gg 1$.

I am interested in understanding the typical eigenspectrum of the projected matrix $A$ in terms of the eigenvalues of $\Lambda$. This is for work in the spirit of statistical mechanics, and approximations or nonrigorous answers are perfectly fine. As an example of the sort of phenomenon I'd like to understand, I find numerically that when the eigenvalues of $\Lambda$ decay with index, the spectrum of $A$ approximately matches the top $n$ original eigenvalues, but with a systematic deviation that's roughly the same every trial (see plot below). How could I reach this conclusion analytically, and what is the form of the deviation? This seems likely to be tractable with tools from random matrix theory, but I wasn't able to find this problem in the literature.

enter image description here


Motivation

It turns out that this question is important in understanding the generalization behavior of infinitely-wide neural networks on e.g. image datasets. The original linear operator in question is actually the network's "neural tangent kernel." I do think the question of why one recovers the top $n$ eigenvalues is interesting in its own right, however.

$\endgroup$
8
  • 2
    $\begingroup$ you might want to say a bit more on how generic your finding is, that the original and projected eigenvalues are so close; for example, if all eigenvalues of $\Lambda$ are equal to 1, then the eigenvalues of $A$ are all equal to $m/n\gg 1$, so at least in that case your finding does not apply. $\endgroup$ Dec 20, 2021 at 18:29
  • $\begingroup$ @CarloBeenakker Sure! I tend to see this behavior whenever the eigenvalues decay fairly quickly with index. The systematic gap is smaller the faster the decay (which agrees with it being quite large when the decay rate is zero). In general, I find that large, standout eigenvalues are faithfully recovered by this random projection procedure. $\endgroup$ Dec 20, 2021 at 19:41
  • $\begingroup$ Let $B:=\Phi^\top \Lambda \Phi$. Does it help you to know that $\mbox{trace}(B)$ concentrates around $(n/m)\mbox{trace}(\Lambda)$ while $c_1 m \le \|B\|_{op}/ \max(\mbox{trace}(A),n\|\Lambda\|_{op}) \le c_2m$ w.h.p ? See this question mathoverflow.net/q/412241/78539 and these answers (1) mathoverflow.net/a/412436/78539, and (2) mathoverflow.net/a/412481/78539 (in that order). $\endgroup$
    – dohmatob
    Dec 25, 2021 at 19:39
  • $\begingroup$ @dohmatob Hmm, maybe indirectly! That provides one constraint on the spectrum of $A$, but there are many ways to satisfy that constraint without giving a spectrum of the type I describe. However, using an informal physicist's analysis, I find that there are actually a family of related conditions! I find that $\mathbb{E}\left[\text{trace}(A^k)\right] = \text{trace}(\Lambda^k) + \mathcal{O}(\text{trace}(\Lambda)^k/n)$ when $k \ll n$. These seem like strong enough conditions to say something interesting about the spectrum of $A$, though I'm not sure how best to do so. $\endgroup$ Dec 29, 2021 at 4:33
  • $\begingroup$ In particular, these conditions seem like they might imply that the tops of the spectra of $A$ and $\Lambda$ are approximately the same, which is the sort of result I'm looking for $\endgroup$ Dec 29, 2021 at 4:41

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge that you have read and understand our privacy policy and code of conduct.

Browse other questions tagged or ask your own question.