What's the actual cost of running a 70B Llama on AWS? · DeepSignal
Running Llama 3.1 70B on an AWS g5.48xlarge with vLLM costs $0.31 per million tokens at 50% utilisation and $0.18 per million at 80%.
Key Points
- Setup: Llama 3.1 70B on an AWS g5.48xlarge, served with vLLM.
- Cost: $0.31 per million tokens at 50% utilisation, $0.18 per million at 80%.
- Includes batching trade-off analysis.
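The per-token figures above can be related to an instance's hourly price with some simple arithmetic. The sketch below is a minimal linear model, not the article's own method: the function names, the hourly price of 16.0 USD, and the idea of a fixed peak throughput are all assumptions made for illustration. Check current AWS pricing for the real g5.48xlarge rate.

```python
def cost_per_million_tokens(hourly_price_usd, peak_tokens_per_sec, utilisation):
    """Cost in USD per one million generated tokens under a linear model.

    Assumes throughput scales linearly with utilisation (a simplification).
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * utilisation * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def implied_peak_tokens_per_sec(hourly_price_usd, cost_per_million_usd, utilisation):
    """Invert the model: peak throughput implied by a reported cost figure."""
    tokens_per_hour = hourly_price_usd / cost_per_million_usd * 1_000_000
    return tokens_per_hour / 3600 / utilisation


if __name__ == "__main__":
    HOURLY_PRICE_USD = 16.0  # hypothetical price, NOT taken from the article

    # Back out the peak throughput implied by the reported $0.31/M at 50%.
    peak = implied_peak_tokens_per_sec(HOURLY_PRICE_USD, 0.31, 0.5)

    # Project the cost at 80% utilisation under the same linear model.
    projected = cost_per_million_tokens(HOURLY_PRICE_USD, peak, 0.8)
    print(f"implied peak throughput: {peak:.0f} tokens/s")
    print(f"projected cost at 80% utilisation: ${projected:.3f}/M tokens")
```

Under this linear model the cost at 80% utilisation is the 50% figure scaled by 0.5/0.8, which comes to about $0.19 per million tokens whatever hourly price is assumed. The reported $0.18 is slightly lower. One possible reason is that larger batches at higher load raise throughput more than linearly, which this sketch does not model.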
Score: 67 (≥75 high · 50–74 medium · <50 low)
Why Featured
Concrete cost data is rare, and it is immediately actionable for any team weighing self-hosted against managed inference.