Oct 29, 2024 | Our arXiv preprint proposes the Fourier head, a new neural network layer that learns a continuous probability density function using a Fourier series and returns a discrete approximation of it. When should you use it? Large language models are often adapted to model non-linguistic tokens. If these tokens have an underlying continuous structure, replacing the linear classification head with the Fourier head can boost downstream performance. Project page |
---|---|
Oct 28, 2024 | I will be attending AGNES at Dartmouth! |