PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

arXiv cs.CL·Shaiv Patel, Kartik Narayan, Vishal Patel


6/8/2026

Quick Answer

PromptPrint is a systematic study of whether natural language prompts to LLMs act as a behavioral biometric. It finds that surface-level word choice identifies a user better than semantic intent does.

Quick Take

Analyzing 20,680 prompts from 1,034 users, the PromptPrint study reports strong identification performance. The identity signal is robust to minor lexical changes but degrades substantially under semantic paraphrasing, which has implications for security and privacy in LLM interactions.

Key Points

Lexical representations outperform semantic encoders at identifying users.
Stylometric features show a "uniqueness-consistency paradox": users are highly distinctive across the population, yet behaviorally inconsistent across contexts.
Identity signals are robust against minor lexical changes but vulnerable to semantic paraphrasing.
The study analyzed 20,680 prompts from 1,034 users.
Findings have significant implications for security and privacy in LLM interactions.

Article Content

From source RSS / original summary

arXiv:2606.06755v1

Abstract: Authorship attribution research has traditionally focused on long-form, expressive texts; however, interactions with large language models (LLMs) are typically brief and task-driven prompts. This raises a fundamental question: do such prompts contain a stable, author-identifiable, and distinctive signal?

We introduce PromptPrint, a systematic study of prompt-based identity, the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns form a learnable behavioral biometric. Using 20,680 real prompts from 1,034 users, we establish three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis": identity is primarily encoded in surface-level word choice rather than abstract intent.
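
To make the idea of a lexical representation concrete, here is a minimal sketch of a surface-level attributor. It is not the authors' pipeline: the abstract does not specify features or classifiers, so this assumes character trigram counts as the lexical representation and cosine similarity against per-user profiles as the decision rule. The function names and the toy prompts are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = " ".join(text.lower().split())  # normalize whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(prompts_by_user, n=3):
    """Sum n-gram counts over each user's known prompts."""
    profiles = {}
    for user, prompts in prompts_by_user.items():
        profile = Counter()
        for prompt in prompts:
            profile.update(char_ngrams(prompt, n))
        profiles[user] = profile
    return profiles


def attribute(prompt, profiles, n=3):
    """Return the user whose profile is most similar to the prompt."""
    query = char_ngrams(prompt, n)
    return max(profiles, key=lambda user: cosine(query, profiles[user]))


# Toy data, invented for illustration only (not from the paper's dataset).
known = {
    "user_a": ["pls fix this python func, thx",
               "pls explain this regex, thx"],
    "user_b": ["Could you kindly review the following function?",
               "Could you kindly summarise the following paragraph?"],
}
profiles = build_profiles(known)
predicted = attribute("pls refactor this js func, thx", profiles)
```

The new prompt asks for something neither user asked before, yet habitual tokens such as "pls" and "thx" dominate the trigram overlap. A semantic encoder, by contrast, would be designed to map prompts with the same intent close together regardless of who wrote them.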

Second, stylometric features exhibit a "uniqueness-consistency paradox": users are highly distinctive across the population, yet behaviorally inconsistent across contexts. Third, adversarial analysis reveals a clear vulnerability spectrum: identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. Overall, our results demonstrate strong identification performance at scale, establishing prompt-based identity as a viable behavioral biometric.
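
The vulnerability spectrum described above can be illustrated with a small sketch. It does not reproduce the paper's adversarial analysis. It only shows, under the assumption of a character-trigram representation, why a surface-level signal would survive a small typo but not a paraphrase. The example strings and helper names are invented for illustration.

```python
from collections import Counter
import math


def trigram_counts(text):
    """Character trigram counts of lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * \
        math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical prompt and two altered versions of it.
original = "pls fix this python func, thx"
perturbed = "pls fix this pyhton func, thx"  # minor lexical change (typo)
paraphrased = "Could you repair the Python function below? Thank you."

base = trigram_counts(original)
sim_perturbed = cosine_similarity(base, trigram_counts(perturbed))
sim_paraphrased = cosine_similarity(base, trigram_counts(paraphrased))

print(f"perturbed:   {sim_perturbed:.2f}")
print(f"paraphrased: {sim_paraphrased:.2f}")
```

The typo alters only the few trigrams that overlap the swapped letters, so the similarity stays high. The paraphrase keeps the intent but replaces most of the surface form, which is exactly the part a lexical representation relies on.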

This work introduces a new perspective on user modeling in LLM interactions, with important implications for security and privacy. Data and code will be released upon the acceptance of our work.


