GENIE: A Fine-Grained Measure for Novelty

arXiv cs.CL·Ramya Namuduri, Manya Wadhwa, Anshun Asher Zheng, Greg Durrett, Junyi Jessy Li

6/12/2026 · ~1 min · en

Quick Answer

The paper introduces GENIE, a fine-grained metric for assessing the novelty of model-generated content, addressing the shortcomings of holistic metrics.

Quick Take

The paper shows that GENIE captures task-specific features of novelty, offering insight into model creativity and into where mitigation methods improve it.

Key Points

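GENIE measures the novelty of a model response along task-specific features, relative to a population of responses.
Holistic metrics struggle to capture the high dimensionality of novelty and do not reveal which properties they target.
The study uses GENIE to measure how well creativity-focused mitigation methods improve novelty.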
GENIE provides insights into which properties contribute to novelty.


Article Excerpt

From source RSS / original summary

arXiv:2606.12790v1. Abstract: Large Language Models have consistently demonstrated a lack of creativity and diversity across tasks. Prior work has focused on addressing whether models are capable of generating creative outputs. Here, we aim to consider novelty and investigate what makes model-generated content novel or not novel in a task-specific manner.

We propose a fine-grained evaluation metric GENIE to measure the novelty of responses along task-specific features with respect to a population of responses. We show that unlike GENIE, holistic metrics struggle to capture the high-dimensionality of novelty and do not provide insight on which properties they target. Finally, we use GENIE to measure the effectiveness of mitigation methods that address creativity to better understand where these methods can improve novelty.
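The excerpt does not give GENIE's actual formula, so the following is only a toy illustration of what "novelty along task-specific features with respect to a population of responses" could look like. It assumes each feature is a categorical value extracted from a response, and it scores novelty as one minus the share of the population with the same value. The function `feature_novelty`, the two features, and the sample responses are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def feature_novelty(response, population, features):
    """Toy per-feature novelty score (an assumption, not the paper's metric).

    For each feature, returns 1 minus the fraction of the population whose
    extracted feature value matches the response's value.
    """
    if not population:
        raise ValueError("population must contain at least one response")
    scores = {}
    for name, extract in features.items():
        counts = Counter(extract(r) for r in population)
        value = extract(response)
        scores[name] = 1.0 - counts[value] / len(population)
    return scores


# Hypothetical task: story openings, described by two toy features.
features = {
    "first_word": lambda text: text.split()[0].lower(),
    "length_bucket": lambda text: "short" if len(text.split()) < 6 else "long",
}

population = [
    "Once upon a time there was a king",
    "Once there was a small village",
    "Once upon a midnight dreary",
    "Rain fell on the empty station",
]

scores = feature_novelty("Rain fell on the empty station", population, features)
print(scores)
```

Reporting one score per feature, rather than a single number, is what lets a reader see which property makes a response unusual. Here the opening word is rare in the population while the length is common.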

