Self-supervised User Profile Generation for Personalization

arXiv cs.CL·Clark Mingxuan Ju, Yuwei Qiu, Tong Zhao, Neil Shah

6/5/2026

Quick Answer

This paper introduces BUMP, a self-supervised framework that trains a user profile generator for personalizing large language models. It matches or outperforms existing methods on the LaMP benchmark without requiring labeled data.

Quick Take

BUMP (Bidirectional User Modeling via Profiles) trains an LLM to summarize a user's interaction history into a natural-language profile, which is then prepended to the prompt to personalize the model's responses. Training uses only raw interaction logs, yet on the LaMP benchmark BUMP matches or outperforms closed-source APIs and prior methods that rely on labeled rewards.

Key Points

BUMP uses a self-supervised approach to generate user profiles without labeled data.
It trains with a bidirectional in-batch ranking objective, scored by a small LLM judge, to measure how well a profile identifies its own user.
BUMP matches or outperforms closed-source APIs on the LaMP benchmark.
The framework relies solely on raw interaction logs for training supervision.
This method addresses the high cost and sparsity of rewards derived from labeled downstream tasks.

Article Content

arXiv:2606.05336v1

Abstract: Personalizing large language models (LLMs) has become a central challenge as LLMs are deployed across recommendation, search, dialogue, and content generation -- settings where the same query should yield different answers given different users. A promising route is to summarize each user's interaction history into a natural-language memory or profile and prepend it to the prompt to facilitate personalization.

Existing methods learn such profile generators with explicit rewards derived from labeled downstream tasks, which are expensive and sparse as they require annotated supervision for every target task. In light of this challenge, we introduce Bidirectional User Modeling via Profiles (BUMP), a self-supervised framework that trains a profile generator without any downstream labels.

Specifically, given a user's interaction history, we use GRPO to train an LLM to emit a free-form textual profile under a bidirectional in-batch ranking objective: a small LLM judge measures (i) how well the generated profile, used as a query, ranks the user's own held-out interactions above interactions from other users in the batch, and (ii) how well a held-out interaction, used as a query, ranks the user's own profile above profiles of other users.

Both directions are scored with multi-positive NDCG and combined into a dense reward per rollout; other users in the batch supply free negatives, so every training example yields supervision from raw interaction logs alone. Evaluated on the LaMP benchmark, BUMP matches or outperforms closed-source APIs and prior methods relying on labeled rewards, while requiring no task label at training.
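The reward described above can be illustrated with a minimal sketch. It assumes binary relevance (an item is either the user's own or another user's), a standard log2 discount, and an equal-weight average of the two directions. The abstract does not specify the gain, the discount, or how the directions are combined, so these choices and the function names `ndcg_multi_positive` and `bidirectional_reward` are hypothetical. The judge scores are taken as given inputs; the LLM judge itself is not modeled.

```python
import math
from typing import Sequence


def ndcg_multi_positive(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """NDCG with binary relevance, allowing several positives per query.

    scores[i] is the judge's relevance score for candidate i;
    is_positive[i] marks candidates that belong to the query's own user.
    """
    n_pos = sum(is_positive)
    if n_pos == 0:
        return 0.0
    # Rank candidates by descending judge score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Discounted gain of the positives at their achieved ranks.
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, i in enumerate(order) if is_positive[i])
    # Ideal ordering places all positives at the top.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(n_pos))
    return dcg / ideal


def bidirectional_reward(
    p2i_scores: Sequence[float], p2i_positive: Sequence[bool],
    i2p_scores: Sequence[float], i2p_positive: Sequence[bool],
) -> float:
    """Combine both directions into one scalar reward for a rollout.

    Direction (i): the profile is the query, candidates are interactions.
    Direction (ii): a held-out interaction is the query, candidates are profiles.
    Equal weighting is an assumption made for this sketch.
    """
    return 0.5 * (ndcg_multi_positive(p2i_scores, p2i_positive)
                  + ndcg_multi_positive(i2p_scores, i2p_positive))
```

In this sketch the negatives cost nothing to obtain: any candidate in the batch that belongs to a different user is simply marked as not positive, which is how in-batch ranking yields supervision from raw logs without task labels.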
