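> **Abstract**
> In commercial LLM APIs, vendors usually split each call into `input tokens` and `output tokens` and quote a separate input price and output price. This two-part pricing reflects the different hardware cost of the two inference phases: `prefill` feeds the whole input sequence to the GPU/TPU in parallel in one pass, whereas `decoding` generates autoregressively, one token at a time, and every additional token requires another pass through the model, so compute and latency per token are higher. As a result, most vendors set the output price at roughly 2–6 times the input price. This article compares several public price lists, breaks down the underlying compute, and provides a runnable Python example, to show what the two concepts mean for product design, cost control, and deployment.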

* * *

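## 1 Concepts at a glance

### 1.1 What is a token

An LLM splits text into discrete word pieces called tokens. On average one English word is about 1.3 tokens and one Chinese character is roughly 1 token, although the exact count depends on the tokenizer. OpenAI's rule of thumb is "100 tokens ≈ 75 English words" ([OpenAI Help Center](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them?utm_source=chatgpt.com "What are tokens and how to count them? - OpenAI Help Center")).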
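The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes the quoted ratio of 100 tokens per 75 English words; the function name `tokens_from_words` is made up for illustration, and real counts depend on the tokenizer and the language.

```python
import math

def tokens_from_words(word_count: int) -> int:
    """Rough token estimate for English text using 100 tokens per 75 words."""
    # Round up so the estimate errs on the expensive side.
    return math.ceil(word_count * 100 / 75)

if __name__ == "__main__":
    for words in (75, 300, 1500):
        print(f"{words} words -> about {tokens_from_words(words)} tokens")
```

For budgeting, rounding up is usually the safer choice; for exact billing figures, rely on the counts the API reports.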

### 1.2 input tokens and output tokens

*   `input tokens`: everything the caller submits in the request body, including the prompt, system messages, and function definitions.

*   `output tokens`: the reply generated by the model.
    After each call, the `response.usage` object reports how many tokens of each kind were consumed ([OpenAI Community](https://community.openai.com/t/how-to-calculate-the-cost-of-a-specific-request-made-to-the-web-api-and-its-reply-in-tokens/270878?utm_source=chatgpt.com "How to calculate the cost of a specific request made to the web API ...")).
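As an illustration of reading those counters, the sketch below parses a hard-coded dictionary shaped like the `usage` object of an OpenAI Chat Completions response (`prompt_tokens`, `completion_tokens`, `total_tokens`). The sample values and the helper `split_usage` are invented for this example; other vendors name these fields differently, so check the provider's response schema.

```python
from typing import Dict, Tuple

# Hypothetical payload that mimics the `usage` part of a chat-completion response.
sample_response: Dict = {
    "usage": {"prompt_tokens": 52, "completion_tokens": 610, "total_tokens": 662}
}

def split_usage(response: Dict) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from a response-like dict."""
    usage = response.get("usage", {})
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)

in_tokens, out_tokens = split_usage(sample_response)
print(f"input={in_tokens}, output={out_tokens}")
```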

### 1.3 input price and output price

Vendors usually quote prices per 1K or per 1M tokens:

*   **input price**: charged on `input tokens`.

*   **output price**: charged on `output tokens`, and usually higher because the generation phase is more expensive ([minusx.ai](https://minusx.ai/blog/input-vs-output-tokens/?utm_source=chatgpt.com "Understanding Input/Output tokens vs Latency Tradeoff - Minusx")).
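Because quotes come in both per-1K and per-1M units, it helps to normalize them before comparing. The sketch below is one way to do that; the helper names and the example prices are illustrative assumptions, not any vendor's API.

```python
def per_1k_to_per_1m(price_per_1k: float) -> float:
    """Convert a $/1K-token quote to $/1M tokens."""
    return price_per_1k * 1000

def call_cost(in_tokens: int, out_tokens: int,
              in_price_per_1m: float, out_price_per_1m: float) -> float:
    """Cost in dollars of one call, with both prices quoted per 1M tokens."""
    return (in_tokens * in_price_per_1m + out_tokens * out_price_per_1m) / 1_000_000

# Illustrative figures only: $0.005 / 1K input and $0.015 / 1K output.
in_price = per_1k_to_per_1m(0.005)
out_price = per_1k_to_per_1m(0.015)
print(f"${call_cost(1_000, 500, in_price, out_price):.4f}")
```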

* * *

## 2 Price comparison across vendors

| Vendor / model | input price ($ / 1M tokens) | output price ($ / 1M tokens) | output ÷ input |
| --- | --- | --- | --- |
| OpenAI GPT‑4o | 5 | 15 | ×3 ([OpenAI Platform](https://platform.openai.com/pricing?utm_source=chatgpt.com "Pricing - OpenAI API")) |
| Anthropic Claude‑Opus‑4 | 15 | 75 | ×5 ([Anthropic](https://docs.anthropic.com/en/docs/about-claude/pricing?utm_source=chatgpt.com "Pricing - Anthropic API")) |
| Google Gemini‑1.5 (Vertex AI) | 10 | 25 | ×2.5 *(regional pricing example)* ([Google Cloud](https://cloud.google.com/vertex-ai/generative-ai/pricing?utm_source=chatgpt.com "Vertex AI Pricing | Generative AI on Vertex AI - Google Cloud")) |
| Azure OpenAI GPT‑4‑Turbo | 5.2 | 15.6 | ×3 *(East Asia region)* ([Microsoft Azure](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/?utm_source=chatgpt.com "Azure OpenAI Service - Pricing")) |
| Cohere Command‑R+ | 3 | 15 | ×5 ([Cohere](https://cohere.com/pricing?utm_source=chatgpt.com "Pricing | Secure and Scalable Enterprise AI - Cohere")) |
| DeepSeek‑Chat‑67B | 0.27 | 1.10 | ×4.1 ([DeepSeek API Docs](https://api-docs.deepseek.com/quick_start/pricing?utm_source=chatgpt.com "Models & Pricing - DeepSeek API Docs")) |
| Together AI Llama‑4‑Scout | 0.27 | 0.85 | ×3.1 ([Together AI](https://www.together.ai/pricing?utm_source=chatgpt.com "Pricing: The Most Powerful Tools at the Best Value | Together AI")) |

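> **Observations**
>
> *   Higher-end models (more parameters, longer context windows) often carry a higher output-to-input price ratio.
>
> *   Hosted services for open-source models (such as DeepSeek and Together AI) still sit at roughly 3–4×, since they own only a limited share of their hardware.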
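The ratio column can be recomputed from the two price columns. The sketch below copies the figures from the table as they appear in this article; prices change often, so treat the dictionary as sample data rather than a current price list.

```python
# (input $/1M, output $/1M) copied from the table above.
PRICES = {
    "OpenAI GPT-4o": (5, 15),
    "Anthropic Claude-Opus-4": (15, 75),
    "Google Gemini-1.5 (Vertex AI)": (10, 25),
    "Azure OpenAI GPT-4-Turbo": (5.2, 15.6),
    "Cohere Command-R+": (3, 15),
    "DeepSeek-Chat-67B": (0.27, 1.10),
    "Together AI Llama-4-Scout": (0.27, 0.85),
}

def output_input_ratio(in_price: float, out_price: float) -> float:
    """How many times more expensive an output token is than an input token."""
    return out_price / in_price

# Print the models from the highest ratio to the lowest.
ranked = sorted(PRICES.items(), key=lambda kv: -output_input_ratio(*kv[1]))
for name, (p_in, p_out) in ranked:
    print(f"{name:32s} x{output_input_ratio(p_in, p_out):.1f}")
```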

* * *

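## 3 Compute cost breakdown

### 3.1 Prefill vs. Decoding

Inference in an autoregressive Transformer has two phases:

1.  **Prefill**: the hidden states of all input tokens are computed in one parallel pass, so compute grows roughly linearly with the number of `input tokens` (attention adds a quadratic term that matters for very long contexts).

2.  **Decoding**: new tokens are generated in a loop. Each step reads the KV‑Cache and attends over the entire sequence so far, so the time and energy to generate N tokens is about N × the per-step cost. This is why output costs more ([GreaterWrong](https://www.greaterwrong.com/posts/PLf7tvzujaJ2A2r7N/what-the-cost-difference-in-processing-input-vs-output?utm_source=chatgpt.com "What the cost difference in processing input vs. output tokens with ...")).

NVIDIA's official blog likewise notes that long outputs significantly increase latency and electricity cost, which pushes vendors to price output higher ([NVIDIA Blog](https://blogs.nvidia.com/blog/ai-tokens-explained/?utm_source=chatgpt.com "Explaining Tokens — the Language and Currency of AI - NVIDIA Blog")).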
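To make the asymmetry between the two phases concrete, here is a toy latency model. It is a sketch under stated assumptions: the per-token prefill cost and per-step decode cost are invented numbers, and real values depend on the model, the hardware, and how requests are batched.

```python
def estimate_latency_ms(in_tokens: int, out_tokens: int,
                        prefill_ms_per_token: float = 0.25,
                        decode_ms_per_step: float = 20.0) -> float:
    """Toy model: one parallel prefill pass plus one sequential step per output token."""
    prefill = in_tokens * prefill_ms_per_token  # input is processed in parallel
    decode = out_tokens * decode_ms_per_step    # N sequential steps for N tokens
    return prefill + decode

# The same total of 1,100 tokens, split two different ways.
print(estimate_latency_ms(in_tokens=1000, out_tokens=100))  # input-heavy call
print(estimate_latency_ms(in_tokens=100, out_tokens=1000))  # output-heavy call
```

With these hypothetical constants the output-heavy call is far slower even though both calls handle the same number of tokens, which mirrors the pricing gap.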

### 3.2 Hardware and energy factors

*   **GPU/TPU occupancy**: decoding is sequential and harder to batch than prefill, so each output token ties up the accelerator while leaving much of its compute idle.

*   **Data-center energy use**: Reuters reported in 2025 that energy use for AI inference is surging and that hardware and power have become the core bottlenecks ([Reuters](https://www.reuters.com/commentary/breakingviews/ai-boom-is-infrastructure-masquerading-software-2025-07-23/?utm_source=chatgpt.com "AI boom is infrastructure masquerading as software")).

* * *

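## 4 Billing strategies

### 4.1 Why not charge for output only?

Developers in the community have asked why input is billed at all ([OpenAI Community](https://community.openai.com/t/why-does-pricing-vary-by-input-tokens-instead-of-only-output-tokens/21833?utm_source=chatgpt.com "Why does pricing vary by input tokens (instead of only output tokens)?")). The reasons:

*   If **extremely long prompts** were free, they would be abused as a "free vector database".

*   The **prefill phase** still consumes GPU memory and bandwidth. In enterprise scenarios with a 128 K context in particular, input processing can take up more than half of the inference time ([Engineering.com](https://www.engineering.com/what-are-input-and-output-tokens-in-ai/?utm_source=chatgpt.com "What are input and output tokens in AI? - Engineering.com")).

### 4.2 Token‑Bundle or Quota mode

Some vendors offer a "shared quota": a single pool of tokens that covers both input and output. The contract still lists the two unit prices used for internal settlement, so that the customer can split costs ([NVIDIA Blog](https://blogs.nvidia.com/blog/ai-tokens-explained/?utm_source=chatgpt.com "Explaining Tokens — the Language and Currency of AI - NVIDIA Blog")).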
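A customer drawing on a shared quota might split costs across teams along the following lines. This is only a sketch: the settlement rates, team names, and usage numbers are hypothetical and do not come from any vendor contract.

```python
from typing import Dict, Tuple

# Hypothetical internal settlement rates in $/1M tokens.
IN_RATE, OUT_RATE = 5.0, 15.0

def allocate(usage_by_team: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map team -> dollars, given (input_tokens, output_tokens) drawn from one quota."""
    return {
        team: (tin * IN_RATE + tout * OUT_RATE) / 1_000_000
        for team, (tin, tout) in usage_by_team.items()
    }

usage = {"search": (8_000_000, 200_000), "writer": (500_000, 3_000_000)}
for team, dollars in allocate(usage).items():
    print(f"{team}: ${dollars:.2f}")
```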

* * *

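## 5 Runnable code: estimate call cost in one minute

The script below needs no external libraries and gives an approximate cost for any vendor. For exact token counts, install `tiktoken` or a comparable tokenizer. The script makes no API calls, so no API key is needed to run it.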

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# author: Jerry_demo

from typing import Tuple

# -------- Manual configuration --------
# Unit prices in $ per 1K tokens; update from each vendor's current price list.
MODEL_PRICING = {
    'gpt-4o': {'in': 0.005, 'out': 0.015},
    'claude-opus-4': {'in': 0.015, 'out': 0.075},
    'deepseek-chat-67b': {'in': 0.00027, 'out': 0.00110},
}

def rough_token_count(text: str) -> int:
    """Crude estimate: about 4 UTF-8 bytes per token (at least 1)."""
    return max(1, len(text.encode('utf-8')) // 4)

def estimate_cost(model: str, prompt: str, completion: str) -> Tuple[int, int, float]:
    """Return (input tokens, output tokens, cost in $) for one call."""
    in_tokens = rough_token_count(prompt)
    out_tokens = rough_token_count(completion)
    price = MODEL_PRICING[model]
    # Prices are per 1K tokens, hence the division by 1000.
    cost = (in_tokens * price['in'] + out_tokens * price['out']) / 1000
    return in_tokens, out_tokens, cost

if __name__ == '__main__':
    prompt_text = 'Please summarize this 5000-word market report as a 10-point bullet list.'
    # Placeholder: only this short string is counted, not a real 400-token reply.
    completion_text = '(Assume the model returns about 300 words, roughly 400 tokens.)'

    it, ot, total = estimate_cost('gpt-4o', prompt_text, completion_text)
    print(f'in_tokens={it}, out_tokens={ot}, total_cost=${total:.4f}')
```

Sample output (at the GPT‑4o unit prices). The script counts the short placeholder strings themselves, so the token counts are small:

```
in_tokens=18, out_tokens=15, total_cost=$0.0003
```

The `MODEL_PRICING` dictionary in the script can be updated to match each vendor's latest price list.

* * *

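## 6 Real-world scenarios

### 6.1 Short input, long output: article generation

A 50-token prompt asks GPT‑4o to write an 800-character news release (≈ 600 tokens):

*   input cost: 50 × $0.005 / 1K = $0.00025

*   output cost: 600 × $0.015 / 1K = $0.009
    Total $0.00925; about 97 % of the cost comes from the output.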

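### 6.2 Long input, short output: document summarization

A 20 K-token technical specification is uploaded and the model condenses it into 10 key sentences (≈ 80 tokens):

*   input cost: 20 000 × $0.005 / 1K = $0.10

*   output cost: 80 × $0.015 / 1K = $0.0012
    Total $0.1012; in absolute terms the input dominates, at about 99 % of the cost.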

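> **Takeaway**
> In "high input, low output" workflows (compliance audits, long-document summaries), shrinking the prompt saves the most money; in "low input, high output" workflows (fiction continuation, code generation), limiting the reply length is what matters.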
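The arithmetic in scenarios 6.1 and 6.2 can be reproduced with a few lines of code. The sketch below uses the same GPT‑4o unit prices as the text; the helper name `cost_breakdown` is an assumption made for this example.

```python
from typing import Tuple

IN_PRICE_PER_1K = 0.005   # $/1K input tokens, the figure used in the scenarios
OUT_PRICE_PER_1K = 0.015  # $/1K output tokens

def cost_breakdown(in_tokens: int, out_tokens: int) -> Tuple[float, float, float]:
    """Return (input_cost, output_cost, output_share_of_total) for one call."""
    in_cost = in_tokens * IN_PRICE_PER_1K / 1000
    out_cost = out_tokens * OUT_PRICE_PER_1K / 1000
    return in_cost, out_cost, out_cost / (in_cost + out_cost)

for label, tin, tout in [("6.1 article", 50, 600), ("6.2 summary", 20_000, 80)]:
    in_cost, out_cost, share = cost_breakdown(tin, tout)
    print(f"{label}: input=${in_cost:.5f} output=${out_cost:.5f} "
          f"total=${in_cost + out_cost:.5f} output share={share:.1%}")
```

Plugging in your own token counts shows quickly which side of the bill is worth optimizing.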

* * *

## 7 How to control cost in a product

1.  **Cap the output tokens**: a hard limit set through the `max_tokens` parameter keeps the bill from running away ([binstellar.com](https://www.binstellar.com/blog/what-is-a-token-and-how-is-chatgpt-api-pricing-calculated-explained-in-simple-words/?utm_source=chatgpt.com "ChatGPT API Pricing & Token Explained Simply (2025 Guide)")).

2.  **Compress the input**: run embedding or retrieval first and feed only the most relevant chunks to the LLM; both Google and Azure recommend this in their enterprise examples ([Google Cloud](https://cloud.google.com/vertex-ai/generative-ai/pricing?utm_source=chatgpt.com "Vertex AI Pricing | Generative AI on Vertex AI - Google Cloud"), [Microsoft Azure](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/?utm_source=chatgpt.com "Azure OpenAI Service - Pricing")).

3.  **Caching and tiered models**: repeated questions can be served from a cache (KV‑Cache), and short, simple prompts can be routed to a smaller model; the Anthropic and DeepSeek docs list lower "Cache‑Hit" pricing ([Anthropic](https://docs.anthropic.com/en/docs/about-claude/pricing?utm_source=chatgpt.com "Pricing - Anthropic API"), [DeepSeek API Docs](https://api-docs.deepseek.com/quick_start/pricing?utm_source=chatgpt.com "Models & Pricing - DeepSeek API Docs")).

4.  **Watch the `usage` logs**: OpenAI and Cohere both return token counts directly in the API response, which makes precise accounting easy ([Cohere Documentation](https://docs.cohere.com/docs/how-does-cohere-pricing-work?utm_source=chatgpt.com "How Does Cohere's Pricing Work?")).
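The first point in the list, capping output with `max_tokens`, lends itself to a simple budget calculation. The sketch below is an illustration under assumed prices ($5 and $15 per 1M tokens); both helper functions are hypothetical and not part of any SDK.

```python
def worst_case_cost(in_tokens: int, max_tokens: int,
                    in_price_per_1m: float, out_price_per_1m: float) -> float:
    """Upper bound on one call's cost if the reply uses the full max_tokens cap."""
    return (in_tokens * in_price_per_1m + max_tokens * out_price_per_1m) / 1_000_000

def max_tokens_for_budget(budget: float, in_tokens: int,
                          in_price_per_1m: float, out_price_per_1m: float) -> int:
    """Largest max_tokens value that keeps a single call within `budget` dollars."""
    remaining = budget - in_tokens * in_price_per_1m / 1_000_000
    return max(0, int(remaining * 1_000_000 / out_price_per_1m))

# Illustrative prices: $5 / 1M input tokens, $15 / 1M output tokens.
cap = max_tokens_for_budget(0.02, 2_000, 5.0, 15.0)
print(f"max_tokens cap: {cap}")
print(f"worst-case cost at that cap: ${worst_case_cost(2_000, cap, 5.0, 15.0):.5f}")
```

Computing the cap from a per-call budget, rather than picking a round number, keeps the worst-case bill predictable as prompt sizes change.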

* * *

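## 8 Conclusion

input price and output price are not simply "charging twice"; they reflect the real resource usage of different stages in the cloud inference pipeline. Understanding the two labels helps developers write cheaper prompts and gives product managers a sound basis for choosing models, designing features, and estimating gross margin. Advances in hardware and parallel algorithms may narrow the gap between the two prices, but for the foreseeable next 2–3 years, "cheap input, expensive output" is likely to remain the baseline for LLM commercialization.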

* * *

Understanding input price and output price in AI model API billing


紫薇堂堂主
