From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference
Quick Answer
SemantiClean is a modular framework for extracting structured signals from e-commerce data, prioritizing auditability and reproducibility over mere accuracy.
Quick Take
SemantiClean organizes twenty-four behavioral elements into a four-layer architecture (Functional, Interaction, Systemic, Contextual) and enforces signal quality through three anti-inflation mechanisms. All reported quantitative results come from its fully implemented, two-phase LLM-Integrated Semantic Inference Engine.
Key Points
- SemantiClean uses a four-layer architecture for behavioral element organization.
- The framework includes anti-inflation mechanisms to maintain signal quality.
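- The LLM-Integrated Semantic Inference Engine yields fully reproducible (sigma=0) deterministic outputs; LLM-dependent results (E8, E10) show controlled variability under fixed provider, model, and temperature settings.
- The gender inference target is non-functional in the current implementation and is excluded from all quantitative results.
- The framework is built on the Online Shoppers Purchasing Intention (OSPI) dataset.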
Article Content
From source RSS / original summary
arXiv:2606.11207v1 Announce Type: new
Abstract: We present SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data and driving pluggable inference targets including purchase intent, customer segmentation, and product affinity through a shared element library.
Unlike conventional end-to-end predictors that optimise solely for accuracy, SemantiClean prioritises auditability, structural governance, and sigma=0 reproducibility, explicitly trading marginal predictive gains for element-level transparency and defensible decision trails.
Built upon the Online Shoppers Purchasing Intention (OSPI) dataset, the framework organises twenty-four behavioural elements into a four-layer architecture (Functional, Interaction, Systemic, Contextual) and enforces signal quality through three anti-inflation mechanisms: RedundancyGroup contribution caps, TieredPenaltyCalculator bias penalties, and AdaptiveConstraintMode cold-start protection.
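To make the idea of a redundancy-group contribution cap concrete, here is a minimal Python sketch. The abstract does not describe how SemantiClean implements its caps, so everything below is an assumption for illustration: the `Element` fields, the element and group names, the `capped_score` helper, and the rule that each group's summed contribution is clipped at a fixed ceiling.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    # The four layer names come from the abstract.
    FUNCTIONAL = "Functional"
    INTERACTION = "Interaction"
    SYSTEMIC = "Systemic"
    CONTEXTUAL = "Contextual"


@dataclass(frozen=True)
class Element:
    name: str              # hypothetical element name
    layer: Layer
    group: Optional[str]   # hypothetical redundancy group id; None if ungrouped
    contribution: float    # hypothetical non-negative contribution to a score


def capped_score(elements, group_cap):
    """Sum contributions, limiting each redundancy group's total to group_cap."""
    group_totals = defaultdict(float)
    ungrouped = 0.0
    for e in elements:
        if e.group is None:
            ungrouped += e.contribution
        else:
            group_totals[e.group] += e.contribution
    # Correlated elements in one group cannot jointly exceed the cap.
    return ungrouped + sum(min(t, group_cap) for t in group_totals.values())


# Illustrative values only; these are not figures from the paper.
elements = [
    Element("product_page_views", Layer.INTERACTION, "browsing_depth", 0.5),
    Element("product_page_duration", Layer.INTERACTION, "browsing_depth", 0.25),
    Element("is_weekend", Layer.CONTEXTUAL, None, 0.125),
]
score = capped_score(elements, group_cap=0.5)
print(score)
```

In this sketch the two browsing-depth elements sum to 0.75 but are clipped to 0.5, so two near-duplicate signals cannot inflate the final score beyond what the group is allowed to contribute.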
This report introduces the LLM-Integrated Semantic Inference Engine, a fully implemented two-phase LLM-driven inference architecture that leverages complete element metadata at inference time. All quantitative results reported herein are produced by this engine. Deterministic engine outputs remain fully reproducible (sigma=0); LLM-dependent results (E8, E10) are subject to controlled output variability under fixed provider/model/temperature settings.
The gender inference target remains non-functional in the current implementation and is excluded from all quantitative results.
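One way to read the sigma=0 claim for deterministic outputs is that repeated runs on identical input have zero standard deviation. The sketch below checks that property for a toy scorer; the scorer, its weights, and the `run_sigma` helper are hypothetical stand-ins and not part of SemantiClean.

```python
import statistics


def deterministic_score(session):
    """Toy scorer: fixed weights, fixed iteration order, no randomness."""
    weights = {"page_views": 0.5, "duration_min": 0.25}  # hypothetical weights
    return sum(weights[k] * session[k] for k in sorted(weights))


def run_sigma(fn, session, runs=5):
    """Population standard deviation of fn's output over repeated runs."""
    outputs = [fn(session) for _ in range(runs)]
    return statistics.pstdev(outputs)


session = {"page_views": 4, "duration_min": 2.0}  # illustrative input
sigma = run_sigma(deterministic_score, session)
print(sigma)
```

The same harness could wrap an LLM-backed step, where a nonzero value would quantify the residual output variability that the abstract attributes to the LLM-dependent results.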