ProfileFoundry: A Synthetic Person-Object Substrate for Privacy, Memory, and Tool-Use Evaluation in LLM Agent
Quick Answer
PROFILEFOUNDRY is a deterministic generator with a fixed reference release of 100,000 adult synthetic Person Objects, built to support responsible evaluations in LLM research.
Quick Take
The release contains 709,228 events, 40,338 households, 52,491 employers, and 518,564 directed relationship edges. Because every identity is synthetic, it avoids redistributing real user data while keeping each synthetic person inspectable, though the authors state that it is not a formal privacy mechanism.
Key Points
- Generates 100,000 synthetic Person Objects across eight locales for LLM evaluations.
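- Includes 709,228 events, 40,338 households, 52,491 employers, and 518,564 directed relationship edges.
- Reports per-object invariant checks and release-wide referential and temporal closure, so linked fields stay consistent for controlled evaluations.
- Not a population-fidelity model, a rendered-text corpus, or a formal privacy mechanism, but a responsible synthetic source layer.
- Supports evaluations of memory, privacy, document understanding, record linkage, and agent state.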
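The consistency and closure checks listed above can be pictured as a single validation pass over the release. The sketch below is a hypothetical illustration, not the authors' checker: the dictionary layout, the field names (`household_id`, `employer_id`, `birth_date`), the adult-at-snapshot invariant, and the set of symmetric relation types are all assumptions.

```python
from datetime import date

# Hypothetical: relation types whose directed edges must appear in both directions.
SYMMETRIC_RELATIONS = {"spouse", "sibling"}


def age_on(birth: date, day: date) -> int:
    """Completed years between birth and day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def check_release(people, households, employers, edges, events, snapshot_date):
    """Return a list of violations; an empty list means every check passed.

    people: {person_id: {"household_id", "employer_id" (or None), "birth_date"}}
    edges:  list of (src_person_id, dst_person_id, relation) tuples
    events: list of {"person_id", "on"} dicts
    """
    violations = []
    edge_set = set(edges)  # fast lookup for reverse-edge checks

    # Per-object invariants plus referential closure for household/employer links.
    for pid, p in people.items():
        if age_on(p["birth_date"], snapshot_date) < 18:
            violations.append(f"{pid}: not an adult on {snapshot_date}")
        if p["household_id"] not in households:
            violations.append(f"{pid}: dangling household {p['household_id']}")
        emp = p.get("employer_id")
        if emp is not None and emp not in employers:
            violations.append(f"{pid}: dangling employer {emp}")

    # Referential closure for the directed relationship graph.
    for src, dst, rel in edges:
        if src not in people or dst not in people:
            violations.append(f"edge {src}->{dst} ({rel}): unknown endpoint")
        elif rel in SYMMETRIC_RELATIONS and (dst, src, rel) not in edge_set:
            violations.append(f"edge {src}->{dst} ({rel}): missing reverse edge")

    # Temporal closure: each event has a known owner and falls between birth and snapshot.
    for ev in events:
        owner = people.get(ev["person_id"])
        if owner is None:
            violations.append(f"event for unknown person {ev['person_id']}")
        elif not (owner["birth_date"] <= ev["on"] <= snapshot_date):
            violations.append(f"{ev['person_id']}: event on {ev['on']} outside lifetime/snapshot")
    return violations
```

Collecting every violation instead of raising on the first one lets a release-wide report count all failures in one pass.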
Article Content
arXiv:2606.26403v1. Abstract: Foundation-model research increasingly needs data about people: user state, personal histories, relationships, contact-like fields, documents, and longitudinal updates. Real user data is difficult to share, perturb, audit, or redistribute responsibly, while independently generated fake fields rarely preserve the cross-field and temporal consistency needed for controlled evaluation.
We present PROFILEFOUNDRY, a deterministic generator and fixed reference release of 100,000 adult synthetic Person Objects across eight locales. Each object combines a typed current snapshot, household, family, and employer links, snapshot-aligned events, normalized relational views, and generation provenance. The release contains 709,228 events, 40,338 households, 52,491 employers, and 518,564 directed relationship edges.
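To make the Person Object description more concrete, here is a hypothetical Python sketch of one object and a deterministic per-object generator. Every name in it (`PersonObject`, `Event`, `person_rng`, `generate_person`), the field names, the locale codes, the snapshot date, and the ID formats are assumptions for illustration; the paper's actual schema and generator are not reproduced here. The idea being illustrated is that seeding each object from `(release_seed, index)` makes any single object reproducible without regenerating the others.

```python
import hashlib
import random
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical placeholders: the release covers eight locales, but not necessarily these.
LOCALES = ["en_US", "en_GB", "de_DE", "fr_FR", "es_ES", "it_IT", "ja_JP", "pt_BR"]
SNAPSHOT_DATE = date(2025, 1, 1)  # hypothetical "as of" date of the current snapshot


@dataclass(frozen=True)
class Event:
    event_id: str
    person_id: str
    kind: str
    on: date


@dataclass
class PersonObject:
    person_id: str
    locale: str
    snapshot: dict                 # current-state snapshot (kept minimal here)
    household_id: str              # link into a household table
    employer_id: Optional[str]     # link into an employer table, or None
    events: list = field(default_factory=list)
    provenance: dict = field(default_factory=dict)


def person_rng(release_seed: int, index: int) -> random.Random:
    """Derive an independent RNG for one object from (release_seed, index)."""
    digest = hashlib.sha256(f"{release_seed}:{index}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def generate_person(release_seed: int, index: int) -> PersonObject:
    rng = person_rng(release_seed, index)
    pid = f"P{index:06d}"
    birth = date(rng.randint(1940, 2000), rng.randint(1, 12), rng.randint(1, 28))
    # Event days are sampled without replacement between the 18th birthday and the
    # day before the snapshot, then sorted, so the history is strictly ordered.
    first = date(birth.year + 18, birth.month, birth.day).toordinal()
    last = SNAPSHOT_DATE.toordinal() - 1
    days = sorted(rng.sample(range(first, last + 1), k=rng.randint(1, 6)))
    events = [
        Event(f"{pid}-E{i}", pid, rng.choice(["moved", "changed_job", "married"]), date.fromordinal(d))
        for i, d in enumerate(days)
    ]
    return PersonObject(
        person_id=pid,
        locale=rng.choice(LOCALES),
        snapshot={"birth_date": birth, "as_of": SNAPSHOT_DATE},
        household_id=f"H{rng.randrange(40_000):05d}",
        employer_id=f"E{rng.randrange(50_000):05d}" if rng.random() < 0.7 else None,
        events=events,
        provenance={"release_seed": release_seed, "index": index},
    )
```

Because each object's randomness depends only on its own index, objects can be generated in parallel or regenerated one at a time for inspection, and the stored provenance is enough to reproduce them.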
We report evidence in separate categories: selected population-marginal comparisons, per-object invariant checks, release-wide referential and temporal closure, and coincidence/provenance screens. PROFILEFOUNDRY is not a population-fidelity model, a rendered-text corpus, or a formal privacy mechanism.
Instead, it is a responsible synthetic source layer for constructing downstream foundation-model evaluations involving memory, privacy, document understanding, record linkage, and agent state, while keeping the synthetic person behind each artifact inspectable.
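As an illustration of building a downstream evaluation while keeping the synthetic person inspectable, the hypothetical sketch below turns one person's move history into a memory probe and records which Person Object it came from. The dictionary layout, the `city` and `name` fields, `EvalItem`, and `build_memory_probe` are assumptions, not part of the release's documented interface. The builder also refuses to emit an item whose gold answer disagrees with the current snapshot, which is the kind of cross-field consistency the abstract says independently generated fake fields rarely preserve.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class EvalItem:
    context: List[str]        # chronological statements shown to the model
    question: str
    answer: str
    source_person_id: str     # provenance: the synthetic person behind this item


def build_memory_probe(person: dict) -> Optional[EvalItem]:
    """Build a 'where do they live now?' probe from a person's move events."""
    moves = sorted((e for e in person["events"] if e["kind"] == "moved"), key=lambda e: e["on"])
    if not moves:
        return None  # nothing to ask about
    latest_city = moves[-1]["city"]
    # The latest move must agree with the current snapshot, or the gold answer is ambiguous.
    if latest_city != person["snapshot"]["city"]:
        raise ValueError(f"{person['person_id']}: latest move {latest_city!r} disagrees with snapshot")
    context = [f"On {e['on'].isoformat()}, {person['name']} moved to {e['city']}." for e in moves]
    return EvalItem(
        context=context,
        question=f"Where does {person['name']} live now?",
        answer=latest_city,
        source_person_id=person["person_id"],
    )
```

Keeping `source_person_id` on every item means a model failure can be traced back to the exact synthetic record that produced it.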