What's the actual cost of running a 70B Llama on AWS?
Quick Answer
Running a 70B Llama 3.1 model on AWS using vLLM costs $0.31 per million tokens at 50% utilization, decreasing to $0.18 at 80% utilization.
Quick Take
Running a 70B Llama 3.1 model on AWS using vLLM costs $0.31 per million tokens at 50% utilization, decreasing to $0.18 at 80% utilization. This cost analysis includes considerations for batching tradeoffs, impacting users looking to optimize their cloud expenses.
Key Points
- Cost per million tokens is $0.31 at 50% utilization.
- Cost drops to $0.18 per million tokens at 80% utilization.
- Batching tradeoffs are included in the cost analysis.
- Model used is 70B Llama 3.1 with vLLM on AWS.
- Analysis aids users in optimizing cloud expenses.
Article Excerpt
From source RSS / original summaryDetailed breakdown: 70B Llama 3. 1 with vLLM on a g5. 48xlarge runs at $0. 31/M tokens at 50% utilisation, dropping to $0. 18/M at 80%. Includes batching tradeoffs.
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from Hacker News
See more →Show HN: RLM-based local debugger for AI agent traces
HALO (Hierarchal Agent Loop Optimizer) is an open-source tool designed for debugging AI agents by analyzing OTEL compliant execution traces. It utilizes a Recursive Language Model (RLM) to efficiently identify patterns and systemic issues, enabling developers to optimize their agents iteratively without complex setups.


