Skip to content

指南

基于 Gemini 2.5 + @google/genai 新 SDK 编写。重点接口与能力

速查

  • 新 SDK:google-genai(Python)/ @google/genai(Node)——旧 google-generativeai 已废弃
  • 模型 ID:gemini-2.5-pro / gemini-2.5-flash / gemini-2.5-flash-lite / gemini-2.5-flash-image
  • 上下文:Pro 2M / Flash 1M(业界最大)
  • 多模态:image / video / audio / pdf 原生(无需预处理)
  • Function Calling:tools: [{function_declarations: [...]}]
  • Grounding:tools: [{google_search: {}}] / {url_context: {}} / {google_maps: {}}
  • Code Execution:tools: [{code_execution: {}}] 一类内置
  • Implicit Cache:自动(5min)/ Explicit Cache:caches.create(...)(1h+ TTL)
  • Live API:client.aio.live.connect(...)(双向实时)
  • Thinking:thinking_config: {thinking_budget: 8192} (2.5 Pro 起)

接口对比一览

APIGeminiClaudeGPT
端点generateContent / streamGenerateContentmessageschat.completions / responses
历史结构contents 数组(role + parts)messages 数组(role + content)messages 数组(role + content)
系统提示system_instruction 字段顶层 system 字段messages: [{role: "system"}]instructions
文本块parts: [{text: "..."}]content: [{type: "text", text: "..."}]content: [{type: "text", text: "..."}]
图像块parts: [{inline_data: {mime_type, data}}]content: [{type: "image", source: {...}}]content: [{type: "image_url", image_url: {url}}]
文件复用parts: [{file_data: {file_uri, mime_type}}]content: [{type: "document", source: {type: "file", file_id}}]messages: [{type: "image_url", image_url: {url}}](仅图)

generateContent 完整字段

ts
interface GenerateContentParams {
  model: string;
  contents: Content | Content[];
  config?: {
    system_instruction?: string | Content;
    temperature?: number;
    top_p?: number;
    top_k?: number;
    candidate_count?: number;
    max_output_tokens?: number;
    stop_sequences?: string[];
    response_mime_type?: "text/plain" | "application/json";
    response_schema?: JSONSchema | PydanticModel;
    seed?: number;
    presence_penalty?: number;
    frequency_penalty?: number;
    response_logprobs?: boolean;
    logprobs?: number;
    safety_settings?: SafetySetting[];
    tools?: Tool[];
    tool_config?: ToolConfig;
    thinking_config?: { thinking_budget: number };  // 2.5 Pro 起
    cached_content?: string;                          // Explicit cache 引用
  };
}

Implicit Cache(自动)

默认开启——无需 cache_control。模型自动检测请求前缀是否与最近请求相同,命中则缓存。

python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        {"role": "user", "parts": [{"text": LONG_CONTEXT}]},   # > 32K
        {"role": "user", "parts": [{"text": "今天的问题"}]},
    ],
)

print(response.usage_metadata.cached_content_token_count)   # 命中数

条件

  • 请求前缀(system + 早期 messages)总 ≥ 32K(Flash)或 4K(Pro)
  • 5 分钟内重复

价格:cached_content_token_count 按 25% 价(75% 折扣,业界最高)。

Explicit Cache(手动 1h+)

需要更长 TTL(如 RAG 应用每天用同样上下文):

python
# 1. 创建 cache
cache = client.caches.create(
    model="gemini-2.5-flash",
    config={
        "contents": [
            {"role": "user", "parts": [{"text": LONG_DOC}]},
        ],
        "system_instruction": "你是文档助手...",
        "ttl": "3600s",   # 1 小时(可更长)
    },
)

# 2. 调用时引用
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="文档里关于退款的部分?",
    config={"cached_content": cache.name},
)

TTL:默认 1h,可设最长 days。

价格

  • 创建:standard 价
  • 调用时引用的 token:75% 折扣
  • TTL 期间存储费:每小时 $1/M tokens(Pro)/ $0.1875/M(Flash)

Function Calling

python
# 定义函数
weather_function = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["c", "f"]},
        },
        "required": ["city"],
    },
}

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="上海现在多少度?",
    config={
        "tools": [{"function_declarations": [weather_function]}],
        "tool_config": {
            "function_calling_config": {
                "mode": "AUTO",   # AUTO / ANY / NONE
                "allowed_function_names": ["get_weather"],   # 可选
            },
        },
    },
)

# 处理 function calls
for part in response.candidates[0].content.parts:
    if part.function_call:
        result = my_get_weather(**part.function_call.args)
        # 把结果继续发回

对比

  • GPT: tools: [{type: "function", function: {...}}]
  • Claude: tools: [{name, input_schema}](无包装层)
  • Gemini: tools: [{function_declarations: [{name, parameters: {...}}]}](多层嵌套)

内置工具

google_search(Grounding)

python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="2026 NBA 总冠军是?",
    config={"tools": [{"google_search": {}}]},
)

# 引用源
metadata = response.candidates[0].grounding_metadata
for chunk in metadata.grounding_chunks:
    print(chunk.web.uri, chunk.web.title)
for support in metadata.grounding_supports:
    print(f"段落 {support.segment.text} 引用 {support.grounding_chunk_indices}")

url_context

python
config={
    "tools": [{"url_context": {}}],
}

contents = "总结这篇文章:https://example.com/article"
# Gemini 自动 fetch URL 内容

google_maps

python
config={
    "tools": [{"google_maps": {}}],
}

contents = "上海陆家嘴附近的咖啡馆推荐"
# Gemini 调 Google Maps API

code_execution

python
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="计算 ln(2) 的连分数前 10 项",
    config={"tools": [{"code_execution": {}}]},
)

# 输出含 code + 执行结果
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print(f"[Code]\n{part.executable_code.code}")
    elif part.code_execution_result:
        print(f"[Output]\n{part.code_execution_result.output}")

Claude 对比:完全无内置工具,需要 MCP。

GPT 对比web_search / code_interpreter / image_generation / file_search 4 个内置;Gemini 4 个不一样的(google_search / url_context / google_maps / code_execution)。

Thinking(推理模式)

python
response = client.models.generate_content(
    model="gemini-2.5-pro",   # Flash 也支持
    contents="证明:对任意正整数 n,n^2 - n 是偶数",
    config={
        "thinking_config": {
            "thinking_budget": 8192,   # 思考 tokens 预算
        },
    },
)

# 思考内容(通常不可见)
for part in response.candidates[0].content.parts:
    if part.thought:
        print(f"[Thinking]\n{part.thought}")
    elif part.text:
        print(f"[Answer]\n{part.text}")

print(f"思考 tokens: {response.usage_metadata.thoughts_token_count}")

对比

  • Claude: thinking: {type: "enabled", budget_tokens: N},response 含 type: "thinking" block
  • GPT o-series: reasoning_effort: "high"(间接控制 budget),response 不暴露 thinking
  • Gemini: thinking_config: {thinking_budget: N},response 含 part.thought

Files API

python
# 上传
my_file = client.files.upload(
    file="paper.pdf",
    config={"mime_type": "application/pdf"},
)

# 等处理完(视频 / 大 PDF 需要时间)
while my_file.state.name == "PROCESSING":
    time.sleep(5)
    my_file = client.files.get(name=my_file.name)

# 用
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[my_file, "总结这份文档"],
)

支持格式

类型mime types大小上限
png / jpeg / gif / webp / heic / heif7 MB(inline)/ 50 MB(File)
视频mp4 / mpeg / mov / avi / wmv / flv / webm / 3gpp / 3gpp22 GB / 1 小时
音频mp3 / wav / aiff / aac / ogg / flac9.5 小时
PDFapplication/pdf50 MB / 1000 页
文本plain / html / json / md / css / xml / rtf / csv100 MB

保留:48 小时(免费)/ Vertex 可配置更长。

Live API 完整

python
import asyncio
from google import genai
from google.genai import types

async def live_session():
    client = genai.Client()

    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name="Aoede"   # Puck / Charon / Kore / Fenrir / Aoede
                )
            )
        ),
        system_instruction="你是友好的客服助手",
        tools=[{"function_declarations": [...]}],
    )

    async with client.aio.live.connect(
        model="gemini-live-2.5-flash-preview",
        config=config,
    ) as session:
        # 文 / 音 / 视频帧 都可发
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "你好"}]},
        )

        # 实时接
        async for chunk in session.receive():
            if chunk.data:   # 音频块
                play_audio(chunk.data)
            elif chunk.text:
                print(chunk.text, end="")

asyncio.run(live_session())

支持模态

  • TEXT / AUDIO 单选
  • 输入:文本 / 音频流 / 视频帧
  • 输出:文本 / 音频流(不能同时两者)

对比 GPT Realtime

  • Gemini Live:含视频帧输入 / 5 种 voice 选择 / TEXT 或 AUDIO 单选
  • GPT Realtime:含 TEXT + AUDIO 同时输出 / 多种 voice / 不支持视频帧

Safety Settings

python
config={
    "safety_settings": [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",   # BLOCK_NONE / BLOCK_LOW_AND_ABOVE / BLOCK_MEDIUM_AND_ABOVE / BLOCK_ONLY_HIGH
        },
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    ],
}

默认 BLOCK_MEDIUM_AND_ABOVE——Gemini 内容审核比 GPT / Claude 更严。某些技术内容(如说明黑客技巧的中性教学)会被 SAFETY block。需要时调低阈值。

Block 处理

python
if response.candidates[0].finish_reason == types.FinishReason.SAFETY:
    print("被 SAFETY block")
    for rating in response.candidates[0].safety_ratings:
        print(rating)

Streaming

python
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="...",
):
    print(chunk.text, end="", flush=True)

或 async:

python
async for chunk in client.aio.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="...",
):
    print(chunk.text, end="")

Batches(Vertex AI)

python
# 仅 Vertex 支持
from google.cloud import aiplatform

batch_job = aiplatform.BatchPredictionJob.create(
    model_name="gemini-2.5-flash",
    job_display_name="my-batch",
    gcs_source="gs://my-bucket/inputs.jsonl",
    gcs_destination_prefix="gs://my-bucket/outputs/",
)

价格:50% 标准价。

Vertex AI 与 AI Studio 差异

维度AI Studio APIVertex AI
鉴权API keyGoogle Cloud ADC(OAuth / service account)
Endpointgenerativelanguage.googleapis.com<region>-aiplatform.googleapis.com
SLA无承诺Enterprise SLA
Region全球(自动路由)按 region
价格标准同标准
Batches-
私有 endpoint-
IAM 集成-
日志审计-
python
# AI Studio
client = genai.Client(api_key="AIzaSy...")

# Vertex
client = genai.Client(
    vertexai=True,
    project="my-gcp",
    location="us-central1",
)

API 调用代码完全相同——仅 client 初始化不同。

错误处理

python
from google.genai import errors

try:
    response = client.models.generate_content(...)
except errors.APIError as e:
    if e.status_code == 429:
        time.sleep(60)
        response = client.models.generate_content(...)
    elif e.status_code == 400:
        # 参数错(model ID / safety / size)
        ...

大陆访问

不可直连。方案:

方案难度
自备代理
Vertex AI + 某些 region
OpenRouter

故障排查

现象排查
401 unauthorizedAPI key 错 / 过期
429 quota_exceeded超免费配额 / RPM
400 invalid_argument参数错(model ID / safety / mime type)
503 unavailableGoogle 内部错(重试)
输出 SAFETY block调低 safety_settings 阈值
视频/PDF 超时用 Files API 上传等处理完
Function 不调tool_config.mode: "ANY" 强制
大陆延迟高用 Vertex / 代理
中文回复不自然system_instruction 显式「用流畅中文回答」

版本里程碑

模型时间主要变化
Gemini 1.02023 末首发(Bard 改名)
Gemini 1.52024 中1M 上下文(业界首)
Gemini 2.02024 末全多模态 / Live API
Gemini 2.52025Thinking / Pro 2M / Implicit Cache
Gemini 2.5 Flash-Image2025 末Nano Banana(图像生成)
Gemini Live 2.52026视频帧 + 多 voice