What's the actual cost of running a 70B Llama on AWS?

5/12/2026


Quick Answer

Running a 70B Llama 3.1 model on AWS using vLLM costs $0.31 per million tokens at 50% utilization, decreasing to $0.18 at 80% utilization.

Quick Take

The figures come from a breakdown of 70B Llama 3.1 served with vLLM on a single g5.48xlarge instance. Raising utilization from 50% to 80% cuts the cost per million tokens from $0.31 to $0.18, roughly 42%. The analysis also covers batching tradeoffs, which matter to anyone trying to reduce cloud spend.

Key Points

Cost per million tokens is $0.31 at 50% utilization.
Cost drops to $0.18 per million tokens at 80% utilization.
Batching tradeoffs are included in the cost analysis.
The setup is 70B Llama 3.1 served with vLLM on an AWS g5.48xlarge.
The analysis is aimed at users optimizing cloud expenses.

Article Excerpt

From source RSS / original summary

Detailed breakdown: 70B Llama 3.1 with vLLM on a g5.48xlarge runs at $0.31/M tokens at 50% utilisation, dropping to $0.18/M at 80%. Includes batching tradeoffs.
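The arithmetic behind a per-token cost figure can be sketched as follows. This is a minimal illustration, not the source's method: it assumes a flat hourly instance price and a fixed peak throughput, with cost scaling inversely with utilization. The function names and the sample price and throughput are hypothetical placeholders, not AWS pricing or measured vLLM numbers.

```python
def cost_per_million_tokens(hourly_price_usd: float,
                            peak_tokens_per_sec: float,
                            utilization: float) -> float:
    """Cost in USD per one million tokens under a simple linear model.

    Assumption: the instance is billed for the full hour, but only
    `utilization` of its peak throughput produces useful tokens.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if hourly_price_usd < 0 or peak_tokens_per_sec <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilization
    return hourly_price_usd / tokens_per_hour * 1_000_000


def rescale_cost(known_cost: float, known_util: float, target_util: float) -> float:
    """Project a known cost to another utilization, assuming pure inverse scaling."""
    if known_util <= 0 or target_util <= 0:
        raise ValueError("utilization values must be positive")
    return known_cost * known_util / target_util


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    example = cost_per_million_tokens(hourly_price_usd=10.0,
                                      peak_tokens_per_sec=5000.0,
                                      utilization=0.5)
    print(f"example cost: ${example:.2f}/M tokens")

    # Project the reported $0.31/M at 50% utilization to 80%.
    projected = rescale_cost(0.31, 0.50, 0.80)
    print(f"projected at 80%: ${projected:.3f}/M tokens")
```

Under this model, the reported $0.31/M at 50% utilization projects to about $0.194/M at 80%, slightly above the reported $0.18/M. The sketch does not model batching, which the source says its breakdown includes, so a gap of this kind is not surprising.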

Read on news.ycombinator.com


