Researchers at Databricks have demonstrated a method to extend the memory capacity of AI agents beyond short-term processing. The approach, called memory scaling, allows large language models to retain and retrieve information over longer interactions. This addresses a key limitation of current AI systems, which struggle to maintain context beyond a few thousand tokens.
The team tested the method on tasks requiring agents to follow multi-step instructions over extended conversations. Results showed agents could remember details from earlier in a session without losing performance. This is a step forward from traditional inference scaling, which focuses on improving reasoning during single interactions.
Databricks’ research highlights how memory scaling differs from existing techniques. Unlike methods that compress past interactions into summaries, this approach stores raw data and retrieves it selectively. The system uses a retrieval mechanism to pull relevant information when needed, rather than forcing agents to rely on limited context windows.
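The article does not describe how the retrieval mechanism works, so the following is only a minimal sketch of the general idea: keep every past message verbatim and pull back the few entries most relevant to the current query. The class name `RawMemoryStore`, its methods, and the word-overlap (cosine) scoring are all assumptions made for illustration, not Databricks' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class RawMemoryStore:
    """Hypothetical store: keeps every entry verbatim, with no summarization."""

    def __init__(self):
        self.entries = []  # list of (raw_text, word_counts) pairs

    def add(self, text):
        """Store the raw text alongside its word counts for later scoring."""
        self.entries.append((text, Counter(tokenize(text))))

    def retrieve(self, query, k=2):
        """Return up to k raw entries that share words with the query, best first."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated entries
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Illustrative session: facts arrive early and are looked up later.
memory = RawMemoryStore()
memory.add("User's order number is 48213 and it shipped on Monday.")
memory.add("User prefers email over phone calls.")
memory.add("The router firmware was updated to version 2.1.")

hits = memory.retrieve("What is the order number?", k=1)
```

Because the stored text is never rewritten, the exact order number survives intact, which a lossy summary might drop. A production system would likely swap the word-overlap score for learned embeddings, but that choice is outside what the article reports.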
Lead researcher Matei Zaharia said the method could make AI agents more practical for real-world use. "Agents need to remember what happened minutes or hours ago," he said. "This isn’t just about bigger models—it’s about smarter memory management." The team plans to release the research under an open-source license later this year.
The work builds on recent advances in AI memory systems, which have gained attention as models grow more complex. While inference scaling improves immediate reasoning, memory scaling targets long-term dependencies, a critical gap for applications such as customer support or technical troubleshooting.
Source: databricks.com