Abstract
In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.
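To make the setup concrete, here is a minimal sketch of such a toy model. It rests on assumptions the abstract does not state: the model is taken to be ReLU(WᵀWx + b), active features are drawn uniformly from [0, 1), and all names (`sample_sparse_features`, `toy_forward`, `W`) are hypothetical. The weights are set by hand rather than trained, placing three feature directions 120 degrees apart in two dimensions, so the interference and its filtering can be read off directly.

```python
import numpy as np

def sample_sparse_features(n_samples, n_features, sparsity, rng):
    """Synthetic inputs: each feature is 0 with probability `sparsity`,
    otherwise uniform in [0, 1). The distribution is an assumption."""
    values = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
    active = rng.uniform(size=(n_samples, n_features)) >= sparsity
    return values * active

def toy_forward(x, W, b):
    """Assumed model form: compress n features into m dimensions with W
    (shape m x n), expand back with W^T, then apply a ReLU."""
    hidden = x @ W.T                        # (n_samples, m)
    return np.maximum(hidden @ W + b, 0.0)  # (n_samples, n)

# Hand-picked weights (not trained): 3 features in 2 dimensions,
# with feature directions spaced 120 degrees apart on the unit circle.
angles = np.array([0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0])
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3)
b = np.zeros(3)

one_hot = np.eye(3)                      # one active feature per sample
linear = one_hot @ W.T @ W               # linear readout, no ReLU
filtered = toy_forward(one_hot, W, b)    # readout with ReLU filtering

# Sparse synthetic batch of the kind such a model would be trained on.
rng = np.random.default_rng(0)
batch = sample_sparse_features(1000, 3, sparsity=0.9, rng=rng)
```

In this configuration the linear readout of a single active feature shows negative interference on the other two features, and the ReLU clips it away, so one-hot inputs are reconstructed. When several features are active at once the reconstruction is no longer exact, which is why the scheme depends on sparsity.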