Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs
Quick Take
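Large language models (LLMs) can estimate individual domain knowledge from long-term Slack logs, though with substantial error. In a zero-shot comparison of seven models against self-reported skill ratings, Gemini 2.5 Flash achieved the lowest mean absolute error (MAE) at 21.13%, while GPT models showed significantly larger discrepancies. Accuracy depended only weakly on message volume, so more text alone does not guarantee better inference. The research underscores both the potential and the current limits of automated expertise mapping in organizations.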
Key Points
- Gemini 2.5 Flash achieved the lowest mean absolute error (MAE) at 21.13%.
- GPT models exhibited significantly larger discrepancies in knowledge estimation.
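- Estimation accuracy depended only weakly on the volume of messages analyzed.
- The study analyzed 27,188 Slack messages from 43 users and compared seven models' zero-shot estimates with self-reported skill ratings from 27 participants.
- Findings highlight the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge.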
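The MAE figure above can be illustrated with a minimal sketch. The excerpt does not say which rating scale or normalization the authors used, so the 0-100 scale, the function name `mean_absolute_error_pct`, and the sample ratings below are all hypothetical assumptions, not the paper's method or data.

```python
from statistics import mean


def mean_absolute_error_pct(estimates, self_reports, scale_max=100.0):
    """Return the MAE between model estimates and self-ratings,
    expressed as a percentage of the rating scale (assumed, not from the paper)."""
    if len(estimates) != len(self_reports) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute gap between each LLM estimate and the matching self-report.
    errors = [abs(e - s) for e, s in zip(estimates, self_reports)]
    return 100.0 * mean(errors) / scale_max


# Hypothetical ratings for four (user, domain) pairs on a 0-100 scale.
llm_estimates = [70, 40, 90, 55]
self_ratings = [50, 60, 80, 45]
print(mean_absolute_error_pct(llm_estimates, self_ratings))  # 15.0
```

Under this reading, an MAE of 21.13% would mean that a model's estimate differs from the self-reported rating by about a fifth of the scale on average.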
Article Excerpt
From source RSS / original summary
arXiv:2605.22971v1 Announce Type: new
Abstract: Employees often struggle to identify "who knows what," leading to organizational productivity losses. We investigate whether Large Language Models (LLMs) can infer individual domain knowledge directly from long-term Slack logs. Analyzing 27,188 messages from 43 users, we evaluated seven models (including Gemini, Claude, and GPT families) by comparing their zero-shot estimates against self-reported skill ratings from 27 participants. Gemini 2.5 Flash achieved the lowest error (MAE 21.13%), while GPT models showed significantly larger discrepancies. Notably, estimation accuracy depended only weakly on message volume, indicating that more text alone does not guarantee better inference. These findings demonstrate the feasibility and current limits of automated expertise mapping, highlighting the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge.
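One way to probe the claim that accuracy depends only weakly on message volume is to correlate each user's message count with that user's absolute estimation error. The abstract does not state which statistic the authors used, so the Pearson coefficient, the helper name `pearson_r`, and the sample numbers below are assumptions chosen for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user data: messages posted and absolute error (in points).
message_counts = [120, 950, 300, 2100, 640]
abs_errors = [25.0, 18.0, 30.0, 22.0, 15.0]
r = pearson_r(message_counts, abs_errors)
print(f"correlation between volume and error: {r:.2f}")
```

A coefficient near zero would be consistent with weak dependence on volume, while a strongly negative value would suggest that more messages do reduce error. With only a few dozen participants, any such coefficient carries wide uncertainty.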