#LLM

Articles tagged LLM.

Tom's Hardware·Luke James

1h ago

Featured · Original

OpenClaw creator burned through $1.3 million in OpenAI API tokens in a single month — bill covered 603 billion tokens across 7.6 million requests and 100 coding agents

AI Summary

OpenClaw's creator spent $1.3 million on 603 billion OpenAI tokens in one month.

Why Featured

The massive $1.3 million spending on OpenAI API tokens signals the growing demand for AI-driven coding solutions, highlighting potential market opportunities for developers and investors in AI technology.

#LLM #AI Coding #Funding

MIT Technology Review·Thomas Macaulay

2d ago

Featured · Original

The Download: China’s AI drama factory and the WHO’s missing health targets

AI Summary

China's short drama industry leverages AI to produce engaging, bite-sized content for mobile viewers.

Why Featured

China's AI-driven short drama production signals a shift in content creation, highlighting opportunities for developers and investors in mobile entertainment and innovative storytelling.

#LLM #AI Video #AI Startup

arXiv cs.AI·Jinxian Qu, Qingqing Gu, Teng Chen, Luo Ji

2d ago

Featured · Original

From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

AI Summary

A novel framework enhances LLM agents' alignment with human values using GraphRAG for improved decision-making.

Why Featured

This framework enables developers and PMs to create LLM agents that better align with user values, enhancing user trust and satisfaction, which is crucial for market adoption.

#LLM #Agent #AI Assistant

arXiv cs.AI·Yize Cheng, Chenrui Fan, Mahdi JafariRaviz, Keivan Rezaei, Soheil Feiz

2d ago

Featured · Original

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

AI Summary

Study reveals a knowing-doing gap in LLM tool use, necessitating model-adaptive definitions of tool necessity.

Why Featured

This study highlights the importance of model-adaptive definitions of tool necessity for LLMs, signaling that developers and PMs should address the gap between knowledge and practical application, which could influence investment in AI tool development.

#LLM #AI Assistant

arXiv cs.AI·Yusuke Ozaki, Dhaval Patel

2d ago

Featured · Original

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

AI Summary

SPIN enhances LLM planning by ensuring valid workflows and reducing execution tasks significantly.

Why Featured

SPIN's ability to create valid workflows with reduced execution tasks is crucial for developers and PMs aiming to streamline industrial applications, while investors can identify opportunities in efficient LLM solutions.

#LLM #AI Assistant #Enterprise AI

arXiv cs.CL·Yumeng Zhang, Zhengbang Yang, Yevin Nikhel Goonatilake, Zhuangdi Zhu

2d ago

Featured · Original

Distribution Corrected Offline Data Distillation for Large Language Models

AI Summary

Proposed a framework to correct distribution drift in offline data distillation for large language models.

Why Featured

This framework addresses distribution drift, enabling developers and PMs to enhance model performance and investors to recognize potential improvements in AI product reliability and effectiveness.

#LLM #Open Source

arXiv cs.CL·Xun Fang, Yunchen Li, Hang Yuan, Zhou Yu

2d ago

Featured · Original

Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

AI Summary

FeF-DLLM enhances discrete diffusion language models by eliminating factorization errors and improving inference speed.

Why Featured

The FeF-DLLM's elimination of factorization errors and improved inference speed signal a significant advancement in language model efficiency, crucial for developers, PMs, and investors focusing on AI applications.

#LLM #Inference

arXiv cs.CL·Injin Kong, Hyoungjoon Lee, Yohan Jo

2d ago

Featured · Original

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

AI Summary

DiHAL introduces geometry-guided diffusion for improved integration in pretrained language models.

Why Featured

The introduction of geometry-guided diffusion in language models enhances their integration, signaling a potential breakthrough for developers and PMs in optimizing AI performance and efficiency.

#LLM #AI Coding

arXiv cs.CL·Ignacio Sastre, Guillermo Moncecchi, Aiala Rosá

2d ago

Featured · Original

Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation

AI Summary

Derivation Prompting enhances Retrieval-Augmented Generation by using logic-based methods to reduce errors.

Why Featured

Derivation Prompting improves Retrieval-Augmented Generation accuracy, prompting developers and PMs to refine AI models and investors to consider its potential for enhanced user experience.

#LLM #AI Coding #AI Search

arXiv cs.AI·Mingda Zhang, Tiesunlong Shen, Haoran Luo, Wenjin Liu, Zikai Xiao, Erik Cambria, Xiaoying Tang

2d ago

Featured · Original

SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration

AI Summary

SkillFlow introduces a flow-driven framework for improved task orchestration in LLM-based systems.

Why Featured

SkillFlow's framework enhances task orchestration in LLM systems, signaling a shift towards more efficient AI workflows that developers and PMs can leverage for better performance and scalability.

#LLM #Agent #AI Assistant

arXiv cs.CL·Kunil Lee, Ki-Young Shin, Jong-Hyeok Lee, Young-Joo Suh

2d ago

Featured · Original

Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

AI Summary

The paper evaluates vector merging methods for multilingual knowledge editing in large language models.

Why Featured

This research highlights effective techniques for multilingual knowledge editing in large language models, crucial for developers and PMs aiming to enhance model performance across diverse languages.

#LLM #Open Source

arXiv cs.CV·Zhuojin Li, Hsin-Pai Cheng, Hong Cai, Shizhong Han, Fatih Porikli

2d ago

Featured · Original

CoReDiT: Spatial Coherence-Guided Token Pruning and Reconstruction for Efficient Diffusion Transformers

AI Summary

CoReDiT enhances Diffusion Transformers by optimizing token pruning for efficiency and quality.

Why Featured

CoReDiT's optimization of token pruning in Diffusion Transformers signals improved efficiency and quality, crucial for developers and PMs focusing on resource management and performance in AI applications.

#LLM #AI Coding #Inference

arXiv cs.CV·Michael Karnes, Alper Yilmaz

2d ago

Featured · Original

Rethinking the Good Enough Embedding for Easy Few-Shot Learning

AI Summary

This paper shows off-the-shelf embeddings are sufficient for few-shot learning without extensive fine-tuning.

Why Featured

This research indicates that developers can leverage existing embeddings for efficient few-shot learning, reducing the need for extensive fine-tuning, which is crucial for faster deployment and cost-effectiveness.

#LLM #Open Source #AI Assistant

arXiv cs.CL·Anjir Ahmed Chowdhury, Syed Zawad, Xiaolong Ma, Xu Dong, Feng Yan

2d ago

Featured · Original

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

AI Summary

PEML optimizes continuous prompts and model weights for efficient multi-task learning in LLMs.

Why Featured

PEML enhances multi-task learning efficiency in LLMs, encouraging developers and PMs to adopt optimized prompting strategies for improved performance and resource management.

#LLM #AI Coding

arXiv cs.CL·Zeli Su, Ziyin Zhang, Zhou Liu, Xuexian Song, Zhankai Xu, Longfei Zheng, Xiaolu Zhang, Rong Fu, Guixian Xu, Wentao Zhang

2d ago

Featured · Original

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

AI Summary

Semantic rewards in reinforcement learning enhance low-resource language models without alignment tax.

Why Featured

This advancement in reinforcement learning allows developers to create efficient low-resource language models, offering PMs new market opportunities and signaling to investors the potential for scalable AI solutions in diverse languages.

#LLM #AI Coding

arXiv cs.CL·Sinclair Schneider, Florian Steuber, Gabi Dreo Rodosek

2d ago

Featured · Original

LLM-based Detection of Manipulative Political Narratives

AI Summary

A framework detects manipulative political narratives in social media using unsupervised clustering and prompt-based filtering.

Why Featured

This framework enables developers and PMs to create tools for identifying misinformation, while investors can recognize opportunities in AI-driven content moderation solutions.

#LLM #AI Search #Policy

arXiv cs.CL·Zhanhao Hu, Xiao Huang, Patrick Mendoza, Emad A. Alghamdi, Basel Alomair, Raluca Ada Popa, David Wagner

2d ago

Featured · Original

GradShield: Alignment Preserving Finetuning

AI Summary

GradShield is a method that filters harmful data during LLM finetuning to maintain alignment and safety.

Why Featured

GradShield enhances LLM safety by filtering harmful data during finetuning, crucial for developers and PMs focused on responsible AI deployment and for investors assessing risk management in AI projects.

#LLM #Security #AI Assistant

arXiv cs.CL·Luis Lara, Aristides Milios, Zhi Hao Luo, Aditya Sharma, Ge Ya Luo, Christopher Beckham, Florian Golemo, Christopher Pal

2d ago

Featured · Original

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards

AI Summary

A new LLM-based approach generates floor plans while adhering to numerical and topological constraints using reinforcement learning.

Why Featured

This innovation enables developers and PMs to automate architectural design, enhancing efficiency and creativity while providing investors with insights into scalable AI applications in real estate.

#LLM #AI Coding #Robotics

arXiv cs.CL·Pablo J. Diego-Simón, Pierre Orhan, Yair Lakretz, Jean-Rémi King

2d ago

Featured · Original

Polar probe linearly decodes semantic structures from LLMs

AI Summary

A neural code using distance and direction of embeddings decodes semantic structures in LLMs.

Why Featured

This breakthrough in decoding semantic structures from LLMs can enhance developers' model interpretability, improve PMs' decision-making, and attract investors by showcasing advanced AI capabilities.

#LLM #AI Coding

arXiv cs.AI·Saharsh Koganti, Priyadarsi Mishra, Pierfrancesco Beneventano, Tomer Galanti

2d ago

Featured · Original

Distribution-Aware Algorithm Design with LLM Agents

AI Summary

The study presents a distribution-aware algorithm leveraging LLM agents for optimized solver code generation.

Why Featured

This research highlights a novel approach to algorithm design that can enhance code generation efficiency, signaling potential improvements in AI-driven development tools for developers, PMs, and investors.

#LLM #Agent #AI Coding

arXiv cs.CL·Mokshit Surana, Archit Rathod, Akshaj Satishkumar

2d ago

Featured · Original

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

AI Summary

This study evaluates DExperts for mitigating toxicity in LLMs, revealing strengths and weaknesses in safety and latency.

Why Featured

This study's findings on DExperts provide developers and PMs insights into improving LLM safety, while investors can gauge the technology's market viability and potential for responsible AI deployment.

#LLM #Open Source #Security

arXiv cs.AI·Leslie G. Valiant

2d ago

Featured · Original

Enhanced and Efficient Reasoning in Large Learning Models

AI Summary

The paper proposes an efficient reasoning method for large language models, enhancing trust in generated content.

Why Featured

This advancement in reasoning methods boosts the reliability of large language models, crucial for developers and PMs focusing on trust in AI applications, while investors can gauge potential market competitiveness.

#LLM #Inference #Open Source

arXiv cs.AI·Hiroki Fukui

2d ago

Featured · Original

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

AI Summary

Invisible orchestrators in multi-agent LLM systems pose significant safety risks and affect behavior dynamics.

Why Featured

The emergence of invisible orchestrators in multi-agent LLM systems highlights critical safety risks, urging developers and PMs to prioritize robust safety protocols and investors to assess potential liabilities.

#LLM #Agent #Security

arXiv cs.AI·Erica Stutz, Giacomo Marino, Daniella Meeker, Qiao Liu, Andrew J. Loza

2d ago

Featured · Original

Conditional Attribute Estimation with Autoregressive Sequence Models

AI Summary

Conditional Attribute Transformers enhance autoregressive models by estimating next-token probabilities and attribute values simultaneously.

Why Featured

This advancement in Conditional Attribute Transformers signals a shift towards more efficient AI models, enabling developers and PMs to create smarter applications while attracting investors interested in innovative technology solutions.

#LLM #AI Coding

arXiv cs.CV·Haun Leung, ZiNan Wang

2d ago

Featured · Original

Unified Pix Token And Word Token Generative Language Model

AI Summary

A new model unifies pix and word tokens for improved generative language and visual understanding.

Why Featured

This model's integration of visual and textual tokens enhances multi-modal applications, signaling potential for developers to create richer AI experiences and for investors to capitalize on emerging technologies.

#LLM #AI Image

arXiv cs.AI·Anjir Ahmed Chowdhury, Syed Zawad, Feng Yan

2d ago

Featured · Original

Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection

AI Summary

MSIFR improves the efficiency of LLM synthetic data generation by rejecting low-quality outputs early.

Why Featured

This advancement in synthetic data generation allows developers and PMs to optimize resource usage, while investors can identify promising AI technologies that enhance model efficiency and reduce operational costs.

#LLM #AI Coding

arXiv cs.CL·Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek

2d ago

Featured · Original

Ideology Prediction of German Political Texts

AI Summary

A transformer model predicts political orientation in German texts on a continuous left-right spectrum.

Why Featured

This model enables developers and PMs to enhance text analysis tools, while investors can identify opportunities in AI-driven political analytics and sentiment analysis markets.

#LLM #AI Assistant

arXiv cs.CL·Juan S. Santillana

2d ago

Featured · Original

VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

AI Summary

VectraYX-Nano is a 42M-parameter Spanish cybersecurity language model utilizing curriculum learning and native tool integration.

Why Featured

VectraYX-Nano's innovative curriculum learning and native tool use signal advancements in specialized AI models, offering developers and PMs new capabilities for cybersecurity applications while attracting investor interest in niche markets.

#LLM #Security #AI Startup

arXiv cs.AI·Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen, Fangzhen Lin

2d ago

Featured · Original

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

AI Summary

The paper proposes a reinforcement learning framework to enhance perception-reasoning synergy in Vision-Language Models.

Why Featured

This framework improves Vision-Language Models, encouraging developers and PMs to enhance AI applications and investors to recognize potential advancements in multimodal AI technology.

#LLM #AI Assistant

arXiv cs.CL·Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang

2d ago

Featured · Original

Auditing Agent Harness Safety

AI Summary

HarnessAudit framework evaluates safety in LLM agent execution, revealing risks in multi-agent systems.

Why Featured

The HarnessAudit framework's evaluation of LLM agent safety highlights critical risks in multi-agent systems, guiding developers, PMs, and investors in building safer AI applications.

#LLM #Agent #Security

OpenAI Blog

2d ago

#LLM

A new personal finance experience in ChatGPT

Databricks brings GPT-5.5 to enterprise agent workflows

What happens when AI starts building itself?

How AI Hallucinations Are Creating Real Security Risks

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

Learning Transferable Latent User Preferences for Human-Aligned Decision Making

Useful Memories Become Faulty When Continuously Updated by LLMs

Position: Agentic AI System Is a Foreseeable Pathway to AGI

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs

Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education

State-Centric Decision Process

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

An Agentic LLM-Based Framework for Population-Scale Mental Health Screening

Inline Critic Steers Image Editing

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

Just Ask for a Table: A Thirty-Token User Prompt Defeats Sponsored Recommendations in Twelve LLMs

Is Video Anomaly Detection Misframed? Evidence from LLM-Based and Multi-Scene Models

Unlocking asynchronicity in continuous batching

Helping ChatGPT better recognize context in sensitive conversations

Anthropic’s Cat Wu says that, in the future, AI will anticipate your needs before you know what they are

OpenAI launches Codex Cloud Agent for autonomous engineering tasks

It is wild that we still ask LLMs to think in plain text — the next 10x is in the latent stream.

Show HN: Tiny 1B param model that beats GPT-3.5 on JSON extraction

Instructions shape Production of Language, not Processing

Belief or Circuitry? Causal Evidence for In-Context Graph Learning

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models

SOMA: Efficient Multi-turn LLM Serving via Small Language Model

Embeddings for Preferences, Not Semantics

How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation

Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence

PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting

RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care

Generative AI for Visualizing Highway Construction Hazards Through Synthetic Images and Temporal Sequences

A Study on Hidden Layer Distillation for Large Language Model Pre-Training

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models

Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction