Show HN: Tiny 1B param model that beats GPT-3.5 on JSON extraction · DeepSignal
Indie 1B Llama-3 derivative trained on synthetic data beats GPT-3.5 on JSON extraction at 80 tok/s on a single 4090.
Key Points
- 200K synthetic JSON-extraction training examples.
- Beats GPT-3.5 on a 10-task held-out benchmark.
- Runs at 80 tok/s on a 4090; open-sourced.
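A benchmark like the one described typically scores an extraction by parsing the model's output and comparing it to a gold-standard object. A minimal sketch of that scoring rule, assuming strict equality after parsing (the project's actual metric is not specified in the post):

```python
import json

def score_extraction(model_output: str, gold: dict) -> bool:
    """Return True if the model's output parses as JSON and exactly
    matches the gold object; malformed JSON counts as a failure."""
    try:
        return json.loads(model_output) == gold
    except json.JSONDecodeError:
        return False
```

Stricter or looser variants (key-level F1, type-coerced comparison) are common; exact match is the simplest baseline.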
Cursor reaches $500M ARR run-rate
AI Summary
Cursor has hit a $500M ARR run-rate, doubling in five months with 40% from enterprise.
Show HN: Pico — open-source on-device LLM router for AI coding agents
AI Summary
Pico routes coding-agent requests between local and remote LLMs, cutting cost 62% with a marginal accuracy drop.
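The core idea of a local/remote router can be sketched in a few lines. Everything below is hypothetical: the cost figures, the threshold, and the complexity heuristic are illustrative assumptions, not Pico's actual policy.

```python
# Illustrative costs only; not Pico's numbers.
LOCAL_COST_PER_MTOK = 0.05   # assumed on-device serving cost
REMOTE_COST_PER_MTOK = 3.00  # assumed hosted-API cost

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts with more code fences are 'harder'."""
    return min(1.0, len(prompt) / 4000 + prompt.count("```") * 0.1)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send easy requests to the local model, hard ones to the remote one."""
    return "local" if estimate_complexity(prompt) < threshold else "remote"
```

The reported 62% cost cut falls out of this shape of policy whenever most traffic lands below the threshold, since local tokens are far cheaper than remote ones.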
What's the actual cost of running a 70B Llama on AWS?
AI Summary
70B Llama 3.1 on AWS g5.48xlarge with vLLM costs $0.31/M tokens at 50% utilisation, $0.18 at 80%.
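The quoted figures follow the standard fixed-price-instance arithmetic: the instance bills by the hour regardless of load, so cost per token scales inversely with utilisation (note 0.31 × 0.5 ≈ 0.18 × 0.8). A sketch of that calculation; the hourly price and throughput below are illustrative placeholders, not the article's measured values:

```python
def cost_per_million_tokens(hourly_price_usd: float,
                            peak_tokens_per_sec: float,
                            utilisation: float) -> float:
    """Serving cost per 1M generated tokens on a fixed-price instance.

    The instance costs the same per hour whether busy or idle, so
    effective throughput is peak throughput scaled by utilisation."""
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilisation
    return hourly_price_usd / tokens_per_hour * 1e6

# Hypothetical inputs to show the shape of the result:
print(cost_per_million_tokens(16.0, 10_000, 0.5))  # halving utilisation doubles $/Mtok
```

This is why the article's 50%-vs-80% utilisation gap alone moves the price from $0.31 to $0.18 per million tokens.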
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
AI Summary
Invisible orchestrators in multi-agent LLM systems suppress protective behaviour and dissociate power-holders from their decisions, creating safety risks.
arXiv cs.CL · Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang · 2d ago
Auditing Agent Harness Safety
AI Summary
The HarnessAudit framework evaluates the safety of LLM agent execution harnesses, revealing risks in multi-agent systems.
arXiv cs.CL · Mokshit Surana, Archit Rathod, Akshaj Satishkumar · 2d ago
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
AI Summary
This replication study evaluates DExperts for mitigating toxicity in LLMs, assessing both its safety gains and its latency costs.
Score: 67 (medium; ≥75 high · 50–74 medium · <50 low)
Why Featured
Small specialised models continue to eat the boring-but-high-volume LLM workloads — a recurring signal worth watching.