8.8K Stars! A truly local, real-time open-source Whisper toolkit that takes on cloud STT!

开源星探

Published 2026-03-16 20:25:51


Included in the column: 翩翩白衣少年

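If you want a genuinely real-time speech-to-text system in your project, there are usually only two routes.

The first is a cloud API: low latency and good sentence segmentation, but it is expensive, and it is off the table for privacy-sensitive scenarios.

The second is running Whisper locally: it is free, but latency is high and the segmentation feels unnatural.

So for the past two years, almost everyone has been asking the same question:

Is it possible to get a cloud-grade real-time ASR experience entirely on a local machine?

I recently dug up a gem on GitHub: WhisperLiveKit.

It is not a thin wrapper around Whisper. It is a full-stack solution built specifically for local streaming speech recognition, and it goes straight at Whisper's biggest streaming pain point: high latency.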

GitHub: https://github.com/QuentinFuxa/WhisperLiveKit

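Key features

• Fast streaming: optimized VAD (voice activity detection) and audio slicing logic cut latency substantially.
• Speaker diarization: built-in speaker tracking, speaker-change detection, and automatic labeling, which is especially useful for team meetings, recorded interviews, and user research sessions.
• Multiple engines: supports faster-whisper on NVIDIA GPUs as well as mlx-whisper on Apple M-series chips.
• Real-time translation: integrates the NLLB model, with real-time translation across 200 languages.
• Works out of the box: ships complete server (Python) and client (Web/React) examples.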
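To make the idea of VAD-driven audio slicing concrete, here is a minimal sketch of an energy-based splitter. It is an illustration only, not WhisperLiveKit's actual VAD: the 16 kHz mono 16-bit PCM format, the 30 ms frame size, the RMS threshold, and the function names `frame_rms` and `split_on_silence` are all assumptions made for this example.

```python
import math
import struct

SAMPLE_RATE = 16_000                             # assumed: 16 kHz mono, 16-bit PCM
FRAME_MS = 30                                    # assumed frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000   # 480 samples per frame


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def split_on_silence(pcm: bytes, threshold: float = 500.0, max_silent_frames: int = 10):
    """Yield speech segments, closing one after a long enough run of quiet frames."""
    frame_bytes = FRAME_SAMPLES * 2
    current = bytearray()
    silent_run = 0
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if frame_rms(frame) >= threshold:
            # Loud frame: it belongs to the current segment
            current.extend(frame)
            silent_run = 0
        elif current:
            # Quiet frame inside a segment: keep short pauses, cut on long ones
            current.extend(frame)
            silent_run += 1
            if silent_run >= max_silent_frames:
                yield bytes(current)
                current = bytearray()
                silent_run = 0
    if current:
        yield bytes(current)
```

Each yielded segment could then be handed to a recognizer on its own, which is why cutting at pauses rather than at fixed intervals tends to help both latency and segmentation. A production VAD would normally use a trained model rather than a fixed energy threshold.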

Quick start

WhisperLiveKit is an open-source Python toolkit that installs with a single pip command.

pip install whisperlivekit

Once installation succeeds, the wlk command is available on the command line.

Start the transcription server

wlk --model base --language en

Then open your browser and go to http://localhost:8000. Start speaking and watch your words appear in real time!
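If you would rather launch the server from a script, the command above can be wrapped in a small helper. This is a sketch under stated assumptions: it uses only the two flags shown in this post (`--model` and `--language`), and the helper names `build_wlk_command` and `start_server` are hypothetical, not part of WhisperLiveKit.

```python
import shutil
import subprocess


def build_wlk_command(model: str = "base", language: str = "en") -> list[str]:
    """Build the argument list for the wlk launch command shown above."""
    return ["wlk", "--model", model, "--language", language]


def start_server(model: str = "base", language: str = "en") -> subprocess.Popen:
    """Start the server as a child process; fail early if wlk is not installed."""
    if shutil.which("wlk") is None:
        raise FileNotFoundError(
            "wlk not found on PATH; run `pip install whisperlivekit` first"
        )
    return subprocess.Popen(build_wlk_command(model, language))
```

Calling `start_server()` returns a `Popen` handle, so the calling script can later stop the server with `terminate()`.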

wlk accepts a variety of startup parameters for deployment; the project README lists them in full.

Python API integration

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

from whisperlivekit import AudioProcessor, TranscriptionEngine

# Shared engine, loaded once at startup and reused by every connection
transcription_engine = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global transcription_engine
    transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
    yield


app = FastAPI(lifespan=lifespan)


async def handle_websocket_results(websocket: WebSocket, results_generator):
    # Forward each result to the client, then signal that the stream is finished
    async for response in results_generator:
        await websocket.send_json(response)
    await websocket.send_json({"type": "ready_to_stop"})


@app.websocket("/asr")
async def websocket_endpoint(websocket: WebSocket):
    # Accept the connection before starting any task that may send on it
    await websocket.accept()
    # Create a new AudioProcessor for each connection, passing the shared engine
    audio_processor = AudioProcessor(transcription_engine=transcription_engine)
    results_generator = await audio_processor.create_tasks()
    results_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
    try:
        while True:
            message = await websocket.receive_bytes()
            await audio_processor.process_audio(message)
    except WebSocketDisconnect:
        # Client went away: stop forwarding results for this connection
        results_task.cancel()
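On the receiving side, a client reads JSON messages from the /asr socket until the server sends the `{"type": "ready_to_stop"}` sentinel seen in the handler above. The sketch below shows only that stop condition. The function name `iter_transcription_updates` is hypothetical, and the shape of the other messages is not specified in this post, so they are passed through as opaque dictionaries.

```python
import json
from typing import Iterable, Iterator


def iter_transcription_updates(raw_messages: Iterable[str]) -> Iterator[dict]:
    """Yield decoded server updates until the ready_to_stop sentinel arrives."""
    for raw in raw_messages:
        message = json.loads(raw)
        if isinstance(message, dict) and message.get("type") == "ready_to_stop":
            # The server has flushed all results; nothing more to read
            return
        yield message
```

In a real client, `raw_messages` would be the text frames coming off the WebSocket connection. Keeping the parsing logic separate from the transport makes it easy to test without a running server.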