Curation and Extraction of Drug-Related Entities from Reddit Platform
Quick Take
The ReDose dataset comprises 6,435 Reddit posts on substance use, annotated with DRUG, DOSE, and EFFECT entities so that clinicians can learn from first-hand accounts of drug use. BiomedBERT achieved an F1-score of 0.843 for DRUG extraction, and Llama-3 70B outperformed GPT-4 (F1 = 0.79 vs. 0.72). EFFECT extraction remains difficult, with GPT-4 achieving a recall of only 0.41.
Key Points
- ReDose dataset includes 6,435 Reddit posts on substance use.
- BiomedBERT achieved an F1-score of 0.843 for drug entity extraction.
- Llama-3 70B outperformed GPT-4 (F1 = 0.79 vs. 0.72).
- EFFECT extraction remains challenging with GPT-4's recall at 0.41.
- The dataset aims to bridge the gap between clinical knowledge and user experiences.
Article Excerpt
From source RSS / original summary: arXiv:2605.26445v1. Abstract: Physicians learn primarily about illicit drugs from clinical overdose cases, limiting their understanding of real-world usage. Meanwhile, drug users share first-hand experiences online, offering insights into dosage and effects of drugs. To bridge this gap, we introduce ReDose (REddit Drug DOSe and Effect), a dataset of 6,435 Reddit posts on substance use.
A board-certified toxicologist primarily annotated both the training and test sets, while two medical science students contributed to the test set, labeling DRUG, DOSE, and EFFECT entities. We benchmarked 6,267 annotations using BERT-based, large language model (LLM)-based, and Retrieval-Augmented Generation (RAG) models. BiomedBERT achieved an F1-score of 0.843 for DRUG, while Llama-3 70B outperformed GPT-4 (F1 = 0.79 vs. 0.72). EFFECT extraction remains challenging, with GPT-4 achieving a recall of 0.41.
ReDose captures patient-curated narratives to advance medical data extraction from social media.
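For readers less familiar with the metrics quoted above, here is a minimal sketch of how precision, recall, and F1 are commonly computed for entity extraction. It assumes exact matching on (label, start, end) spans; the excerpt does not state which matching criterion the ReDose authors used, and the function name `entity_prf` and the sample spans are hypothetical.

```python
from typing import Iterable, Tuple

# An entity is (label, start_char, end_char). This representation is an
# assumption for illustration, not the ReDose annotation format.
Entity = Tuple[str, int, int]


def entity_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: share of annotated entities that the model found.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations for a single post (made-up offsets).
gold = [("DRUG", 7, 11), ("DOSE", 12, 17), ("EFFECT", 30, 38)]
pred = [("DRUG", 7, 11), ("DOSE", 12, 17), ("EFFECT", 40, 45)]

precision, recall, f1 = entity_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A low recall, such as the 0.41 reported for GPT-4 on EFFECT, means that more than half of the annotated entities were missed, regardless of how accurate the returned ones were.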