nanban-harvest

Multilingual Text-to-SQL: Benchmarking the Limits of Language Models with Collaborative Language Agents

Journal: ArXiv.org
DOI: 10.48550/arxiv.2509.24405
OpenAlex: W4415336831
Language: en
ISSN: 2331-8422
OA?: yes
Status: pending

Abstract

Text-to-SQL enables natural access to databases, yet most benchmarks are English-only, limiting multilingual progress. We introduce MultiSpider 2.0, extending Spider 2.0 to eight languages (English, German, French, Spanish, Portuguese, Japanese, Chinese, Vietnamese). It preserves Spider 2.0's structural difficulty while adding linguistic and dialectal variability, demanding deeper reasoning for complex SQL. On this benchmark, state-of-the-art LLMs (such as DeepSeek-R1 and OpenAI o1) reach only 4% execution accuracy when relying on intrinsic reasoning, versus 60% on MultiSpider 1.0. We therefore provide a collaboration-driven language-agent baseline that iteratively refines queries, improving accuracy to 15%. These results reveal a substantial multilingual gap and motivate methods that are robust across languages and ready for real-world enterprise deployment. Our benchmark is available at https://github.com/phkhanhtrinh23/Multilingual_Text_to_SQL.

Matched Nanban terms

  • anchor Portuguese-Japanese

Provenance

  • openalex (W4415336831)
    2026-04-30T19:37:55.302153+00:00

Candidate PDF URLs

P   Source    URL                               Last attempt  Last error
30  openalex  https://arxiv.org/pdf/2509.24405

Extras

openalex_concepts: Computer science; Benchmarking; Benchmark (surveying); Limiting; Natural language processing; Artificial intelligence; Baseline (sea); Natural language; Natural language understanding; Cover (algebra)
openalex_topics: Semantic Web and Ontologies; Multi-Agent Systems and Negotiation