Databricks has evaluated whether large language model (LLM) agents can improve join order optimization for database queries. The company’s internal tests show mixed results: compared with traditional optimization methods, AI-driven optimization made some queries faster and others slower.
The research focused on the Databricks SQL engine, which supports complex queries involving multiple tables. Engineers compared execution times between standard query plans and those adjusted by an LLM agent. For queries with simple joins, the agent matched or slightly exceeded human-optimized plans. For queries joining more than four tables, its results varied significantly.
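As general background on why larger queries are harder to optimize, the number of possible join orders grows very quickly with the number of tables. The sketch below counts candidate orders for n tables under two standard textbook assumptions: left-deep trees only (n! orders) and arbitrary binary ("bushy") trees ((2n-2)!/(n-1)! orders), in both cases ignoring whether each join has a predicate (cross products allowed). It illustrates the search-space problem in general; the article does not say this is why the agent's results varied beyond four tables.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Left-deep join trees over n tables: one per permutation of the tables."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Binary join trees of any shape over n labeled tables.

    Equals Catalan(n - 1) * n!, which simplifies to (2n - 2)! / (n - 1)!.
    Cross products are counted, so this is an upper bound on real plans.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


for n in range(2, 9):
    print(f"{n} tables: {left_deep_orders(n):>8} left-deep, {bushy_orders(n):>12} bushy")
```

At four tables there are already 120 bushy trees; at ten tables there are more than 17 billion, which is why optimizers prune the space rather than enumerate it.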
A key challenge was cost estimation error. The LLM agent sometimes misjudged the cost of certain join strategies, which led to suboptimal execution plans. The problem was more pronounced in workloads with skewed data distributions, where traditional cost models already struggle.
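To show how skew misleads cost estimates, here is a minimal sketch using a common textbook estimator that assumes join keys are uniformly distributed: |L| × |R| / max(ndv(L), ndv(R)), where ndv is the number of distinct values. This is not Databricks' cost model or the LLM agent's reasoning, which the article does not describe, and the key columns are hypothetical. The estimator is exact on the uniform column but underestimates the skewed join by roughly 80x, so an optimizer that runs the join with the smallest estimated output first picks the wrong one.

```python
from collections import Counter


def actual_join_size(left, right):
    """Exact number of rows produced by an equi-join on the given key columns."""
    lc, rc = Counter(left), Counter(right)
    return sum(n * rc[k] for k, n in lc.items())


def estimated_join_size(left, right):
    """Textbook estimate assuming uniformly distributed keys:
    |L| * |R| / max(ndv(L), ndv(R))."""
    ndv = max(len(set(left)), len(set(right)))
    return len(left) * len(right) // ndv


# Hypothetical key columns, 1,000 rows each.
uniform = [i % 50 for i in range(1000)]   # 50 keys, 20 rows per key
skewed = [0] * 901 + list(range(1, 100))  # 100 keys, one "hot" key with 901 rows

candidates = {
    "join on uniform key": (uniform, uniform),
    "join on skewed key": (skewed, skewed),
}

for name, (l, r) in candidates.items():
    print(f"{name}: estimated {estimated_join_size(l, r):>7}, actual {actual_join_size(l, r):>7}")

# Choosing the join with the smallest *estimated* output picks the skewed join,
# which in reality produces about 40x more rows than the uniform one.
pick = min(candidates, key=lambda k: estimated_join_size(*candidates[k]))
best = min(candidates, key=lambda k: actual_join_size(*candidates[k]))
print("picked by estimate:", pick)
print("best in reality:   ", best)
```

The uniform join is estimated and produced at 20,000 rows; the skewed join is estimated at 10,000 but actually yields 811,900, because almost all of its output comes from the single hot key.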
Databricks has not yet integrated the AI agent into its production systems. The company plans further testing to address the cost estimation flaws and improve reliability before considering deployment. The findings highlight the need for hybrid approaches that combine AI suggestions with human oversight.
Source: databricks.com