Which Institutional Frameworks Do Chatbots Assume? Auditing Jurisdictional Defaults in Multilingual LLMs
Quick Take
A study of seven LLMs finds that, when a prompt omits any country or region, input language acts as a default jurisdictional signal. Under prompts requiring a single answer, pooled across models, 74.5% of English-input responses adopt a U.S. framework and 53.3% of Chinese-input responses adopt a China framework. The authors call this institutional-framework misselection risk and recommend that LLM interfaces request the user's location, or state the jurisdictional scope of the answer, when location is absent.
Key Points
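- With single-answer prompts, 53.3% of Chinese-input responses adopt a China framework (pooled across models).
- With single-answer prompts, 74.5% of English-input responses adopt a U.S. framework (pooled across models).
- The study evaluated 60 underspecified legal-administrative prompts, in English and Mandarin Chinese, across seven LLMs developed in the United States or China.
- Misselection risk arises when a user's preferred language differs from the jurisdiction whose rules apply.
- When a prompt gives no location, LLM interfaces should request it or state the jurisdictional scope of the answer.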
Article Content
arXiv:2606.00333v1. Abstract: LLMs increasingly answer questions about taxes, labor protections, healthcare, education, pensions, and administrative procedures, where usefulness often depends on the applicable jurisdiction. Multilingual users may write in their most comfortable language rather than one associated with the country or region whose rules apply. We ask whether deployed LLMs use input language as a default jurisdictional signal when prompts omit any country or region.
Prior multilingual audits show that prompt language can shift cultural, political, or normative outputs; we examine which legal-administrative framework models supply when jurisdiction is underspecified. We evaluate seven LLMs developed in the United States or China on 60 underspecified legal-administrative prompts in English and Mandarin Chinese under three system-prompt conditions, yielding 2,520 manually annotated responses.
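The response count follows from the design described above if each combination of model, prompt, language, and system-prompt condition yields exactly one response. That one-response-per-cell reading is an assumption, though it matches the stated total. A minimal sketch of the grid, with placeholder identifiers because the abstract does not name the models, prompts, or conditions:

```python
from itertools import product

# Counts come from the abstract; every identifier below is a placeholder.
models = [f"model_{i}" for i in range(1, 8)]           # 7 LLMs
prompts = [f"prompt_{i}" for i in range(1, 61)]        # 60 underspecified prompts
languages = ["en", "zh"]                               # English, Mandarin Chinese
conditions = [f"condition_{i}" for i in range(1, 4)]   # 3 system-prompt conditions

# Assumption: one response is collected per (model, prompt, language, condition).
grid = list(product(models, prompts, languages, conditions))

print(len(grid))  # 7 * 60 * 2 * 3 = 2520
```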
Across models and conditions, Chinese input more often produces China-specific answers, while English input more often produces U.S.-specific, comparative, or generic answers. Prompts requiring a single answer further increase jurisdiction selection: pooled across models, 74.5% of English-input responses adopt a U.S. framework, while 53.3% of Chinese-input responses adopt a China framework. This directional pattern appears in all seven models.
We describe this deployment-level pattern as institutional-framework misselection risk: a fluent answer may rely on a legal-administrative context the user did not intend, especially when their preferred language differs from the relevant jurisdiction. LLM interfaces should not route institutional advice by input language alone; when location is absent, they should request it or state the jurisdictional scope of the answer.
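One way to read this recommendation is as a small pre-answer check in the interface layer. The sketch below is illustrative only and is not the authors' implementation: the `Request` type, the `plan_response` function, and the field names are all hypothetical. It assumes the interface knows the jurisdiction only when the user or their profile supplies it explicitly, and it deliberately never reads the input language when choosing a framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    text: str
    input_language: str                 # e.g. "en" or "zh"; never used for routing
    jurisdiction: Optional[str] = None  # explicit location, if the user gave one


def plan_response(req: Request) -> dict:
    """Decide how to handle an institutional question (hypothetical policy)."""
    if req.jurisdiction:
        # Location is known: answer, and state the scope of the answer.
        return {
            "action": "answer",
            "jurisdiction": req.jurisdiction,
            "scope_note": f"This answer applies to {req.jurisdiction}.",
        }
    # Location is absent: ask for it instead of guessing from the language.
    return {
        "action": "ask_jurisdiction",
        "jurisdiction": None,
        "scope_note": "Which country or region's rules apply?",
    }


# A Chinese-language prompt with no location triggers a clarifying question.
print(plan_response(Request("How do I file my taxes?", "zh"))["action"])
```

Keeping `input_language` on the request but out of the decision makes the design choice explicit: language may still shape how the reply is written, but not which legal-administrative framework it assumes.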