ACAT: A Collaborative Platform for Efficient Aspect-Based Sentiment Dataset Annotation
Quick Take
ACAT is a web-based platform for annotating aspect-based sentiment analysis (ABSA) datasets. It supports four annotation workflows and automates the ETL step that aligns multi-annotator data and computes agreement at export. In a preliminary validation on 1,002 restaurant reviews with two annotators, it achieved a median annotation time of 31.58 seconds and a raw Inter-Annotator Agreement (IAA) of 0.78 to 0.86 across tasks.
Key Points
- ACAT supports four ABSA workflows including Aspect-Category Sentiment Analysis.
- The platform automates ETL processes for collaborative annotations.
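- Median annotation time in a preliminary validation on 1,002 restaurant reviews was 31.58 seconds.
- Raw Inter-Annotator Agreement (IAA) ranged from 0.78 to 0.86 across tasks, with two annotators of differing expertise.
- IAA is computed at export, so the output is a training-ready dataset rather than flat files to consolidate by hand.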
Article Excerpt
From source RSS / original summary (arXiv:2606.04189v1)
Abstract: Aspect-Based Sentiment Analysis (ABSA) requires high-quality datasets to train reliable models. However, existing annotation tools treat output as flat files, leaving researchers to manually consolidate multi-annotator data, reconstruct relational structures, and compute reliability metrics through custom scripts.
This paper introduces ACAT (Aspect-based sentiment analysis Collaborative Annotation Tool), a web-based platform natively supporting four ABSA workflows: (1) Aspect-Category Sentiment Analysis, (2) Clause-Level Segmentation, (3) Aspect-Term Sentiment Analysis with character-level position tracking, and (4) Aspect Sentiment Triplet Extraction with dual span offset preservation.
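To make "character-level position tracking" concrete, here is a minimal sketch of what an aspect-term record with span offsets might look like. The `AspectTerm` class, its field names, and the `offsets_match` helper are hypothetical illustrations; the abstract does not describe ACAT's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectTerm:
    """One aspect-term annotation with character offsets (hypothetical schema)."""
    term: str      # the annotated surface string
    start: int     # inclusive character offset into the review text
    end: int       # exclusive character offset into the review text
    polarity: str  # e.g. "positive", "negative", "neutral"


def offsets_match(review: str, ann: AspectTerm) -> bool:
    """Return True if the stored offsets recover exactly the annotated term."""
    return review[ann.start:ann.end] == ann.term


review = "The pasta was great but the service was slow."
begin = review.index("pasta")
pasta = AspectTerm(term="pasta", start=begin, end=begin + len("pasta"), polarity="positive")

# An annotation whose offsets do not point at its term fails the check.
misaligned = AspectTerm(term="service", start=0, end=7, polarity="negative")
```

Storing offsets alongside the term lets a consumer verify that a span still points at the right characters, which is one plausible reason a tool would preserve them at export.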
Its core contribution is an automated Extract, Transform, Load (ETL) pipeline that aligns collaborative annotations and computes Inter-Annotator Agreement (IAA) metrics directly at export, yielding training-ready datasets. In a preliminary validation on 1,002 restaurant reviews with two annotators of differing expertise, ACAT achieves a median annotation time of 31.58 seconds and a raw IAA ranging from 0.78 to 0.86 across all tasks.
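As an illustration of the align-then-score step, the sketch below computes raw agreement between two annotators. It assumes "raw IAA" means simple percent agreement over items both annotators labeled; the abstract does not define the metric, and the `raw_agreement` function and the review IDs are hypothetical.

```python
def raw_agreement(labels_a: dict[str, str], labels_b: dict[str, str]) -> float:
    """Fraction of shared items on which two annotators chose the same label.

    Assumption: "raw" agreement is plain percent agreement, with no chance correction.
    """
    # Align the two annotators on the items both of them labeled.
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("annotators have no items in common")
    matches = sum(labels_a[item] == labels_b[item] for item in shared)
    return matches / len(shared)


annotator_a = {"r1": "positive", "r2": "negative", "r3": "neutral", "r4": "positive"}
annotator_b = {"r1": "positive", "r2": "negative", "r3": "positive", "r4": "positive", "r5": "neutral"}

score = raw_agreement(annotator_a, annotator_b)  # 3 matches out of 4 shared items
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa would generally come out lower on the same data.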