BODHI: Precise OS Kernel Specification Inference

arXiv cs.AI·Zhiming Chang, Ziyang Li

5/26/2026


Quick Answer

BODHI enhances OS kernel specification inference by integrating domain knowledge prompting, achieving up to 96.73% Pass@1 on Claude Opus 4.6.

Quick Take

BODHI adds a structured domain knowledge guide to the standard few-shot prompt for OS kernel specification generation. Across nine models from six providers, it raises Pass@1 by +11% to +32% and reduces both syntax and semantic errors, reaching 96.73% Pass@1 with Claude Opus 4.6.

Key Points

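BODHI augments the few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns.
Pass@1 gains range from +11% to +32% across all nine tested models.
The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1, against a best previously reported Pass@1 of 55.10% on OSV-Bench.
BODHI reduces both syntax and semantic errors, most strongly on models with enough instruction-following capability to use structured reference material.
Domain knowledge injection is model-agnostic: it helped dense, mixture-of-experts, and reasoning architectures alike.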


Article Content

From source RSS / original summary

arXiv:2605.23931v1 Announce Type: new

Abstract: The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%.

We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories.
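The prompt structure described above can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation: the `GuideEntry` type, the `build_prompt` function, the section headings, the ordering of guide before examples, and the two sample patterns are all hypothetical stand-ins, since the summary does not list the actual 15 categories or the prompt layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideEntry:
    """One translation pattern in a (hypothetical) C-to-Python guide."""
    phase: str           # "pre-condition" or "post-condition"
    category: str        # hypothetical category name
    c_pattern: str       # what the pattern looks like in the C source
    python_pattern: str  # how it might be written in the Python specification


def render_guide(entries):
    """Group patterns by phase so the two concerns stay in separate sections."""
    lines = ["# C-to-Python translation guide"]
    for phase in ("pre-condition", "post-condition"):
        lines.append(f"## {phase.capitalize()} patterns")
        for entry in entries:
            if entry.phase == phase:
                lines.append(
                    f"- {entry.category}: `{entry.c_pattern}` -> `{entry.python_pattern}`"
                )
    return "\n".join(lines)


def build_prompt(few_shot_examples, guide_entries, task_c_source):
    """Assemble guide + few-shot examples + task (this ordering is an assumption)."""
    parts = [render_guide(guide_entries)]
    for i, (c_src, spec) in enumerate(few_shot_examples, start=1):
        parts.append(
            f"### Example {i}\nC implementation:\n{c_src}\nSpecification:\n{spec}"
        )
    parts.append(f"### Task\nC implementation:\n{task_c_source}\nSpecification:")
    return "\n\n".join(parts)


# Hypothetical guide entries, for illustration only.
guide = [
    GuideEntry("pre-condition", "early-return check",
               "if (x >= MAX) return -1;", "cond = x < MAX"),
    GuideEntry("post-condition", "field update",
               "obj->field = v;", "new.field = v"),
]
examples = [("int noop(void) { return 0; }", "def noop_spec(old): return old")]
print(build_prompt(examples, guide, "int sys_example(int x) { /* ... */ }"))
```

Keeping pre-condition and post-condition patterns in separate sections mirrors the separation of concerns the abstract attributes to the guide; everything else about the layout is a guess.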

Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material.
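To put the reported percentages in terms of tasks, the sketch below assumes Pass@1 is simply the fraction of the 245 OSV-Bench tasks solved on a single attempt. Under that assumption, 55.10% and 96.73% correspond to 135 and 237 tasks respectively. These counts are inferred from the percentages and are not stated in the abstract; the function names are illustrative.

```python
TOTAL_TASKS = 245  # OSV-Bench size, as stated in the abstract


def pass_at_1(passed, total=TOTAL_TASKS):
    """Pass@1 as a percentage, assuming one attempt per task."""
    return 100.0 * passed / total


def tasks_passed(rate_percent, total=TOTAL_TASKS):
    """Invert a reported Pass@1 percentage to the nearest whole task count."""
    return round(rate_percent / 100.0 * total)


baseline = tasks_passed(55.10)  # best previously reported result
best = tasks_passed(96.73)      # Claude Opus 4.6 + BODHI

print(baseline, f"{pass_at_1(baseline):.2f}%")
print(best, f"{pass_at_1(best):.2f}%")
print("tasks left unsolved by the best configuration:", TOTAL_TASKS - best)
```

Read this way, the best configuration would leave 8 of the 245 tasks unsolved, compared with 110 for the earlier best result.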

These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.


