Computational conceptual history of scientific concepts: From early digital methods to LLMs
Quick Take
This article situates large language models (LLMs) within the longer history of computational methods for concept analysis in the history, philosophy, and sociology of science (HPSS). It examines what LLMs add to earlier approaches and which long-standing problems they inherit, particularly in corpus construction, modelling choices, and evaluation, and it reviews case studies that apply LLMs to lexical semantic change detection.
Key Points
- LLMs inherit challenges from earlier computational methods in history and philosophy of science.
- The article reviews early digital methods and distributional approaches before LLMs.
- Key issues include corpus construction, model choice, and evaluation in LLM workflows.
- Case studies demonstrate LLMs' application in lexical semantic change detection.
Article Content
arXiv:2606.04118v1 (announce type: new)
Abstract: This article situates large language models (LLMs) within the longer history of computational approaches to concept analysis in the history, philosophy, and sociology of science (HPSS). We examine what LLMs add to existing methods, how they inherit longstanding problems, and review recent case studies that employ them.
In the first part, we reconstruct computational conceptual history before LLMs by bringing together three strands of work: early digital methods in HPSS, distributional approaches from digital history and related research, and lexical semantic change detection. We provide an overview of the main challenges and opportunities, focusing on corpus construction, operationalization and modelling choices, and evaluation and interpretation.
In the second part, we turn to the era of LLMs, starting with a short introduction to LLMs before reviewing LLM-based work on lexical semantic change detection and relevant case studies in HPSS. We then revisit the earlier methodological questions, showing how issues of corpus construction, model choice and training data, operationalization trade-offs, and evaluation and interpretation play out in LLM-based workflows.
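As a rough illustration of what an operationalization choice in lexical semantic change detection can look like, the sketch below scores a term's change as the cosine distance between the mean usage vectors of two time periods. This is a minimal sketch, not the method of the article or of any case study it reviews: the function names are hypothetical, and the three-dimensional vectors are invented placeholders standing in for contextual embeddings that a real workflow would extract from a model.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def prototype_distance(early_usages, late_usages):
    """Change score: distance between the two periods' mean usage vectors."""
    return cosine_distance(mean_vector(early_usages), mean_vector(late_usages))


# Invented toy vectors (assumption): one vector per usage of a term.
early_usages = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
late_usages = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
stable_usages = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

change_score = prototype_distance(early_usages, late_usages)
stable_score = prototype_distance(early_usages, stable_usages)
print(f"shifted term: {change_score:.3f}, stable term: {stable_score:.3f}")
```

Averaging usages into a single prototype per period hides polysemy within a period, which is one instance of the operationalization trade-offs mentioned above. A score of this kind also says nothing on its own about whether a shift is conceptually meaningful, so interpretation and evaluation against the corpus remain separate steps.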