Using aws-sdk-transcribestreaming, the Rust AWS real-time speech-to-text library, for streaming audio transcription and live text conversion
Amazon Transcribe streaming offers four main types of real-time transcription:
- Standard transcription is the most common choice
- Medical transcription is designed for medical professionals and includes medical terminology
- Call analytics transcription is designed for two-channel call-center audio
- HealthScribe transcription uses generative AI to automatically create clinical notes from patient-clinician conversations
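On the Rust side, each of these maps to its own operation on the client. A quick sketch of the fluent-builder entry points (method names follow the SDK's generated naming convention; verify them against the generated docs for your SDK version):

use aws_sdk_transcribestreaming::Client;

// Each transcription type has its own operation on the client; the
// builders are left unsent here just to show the entry points.
async fn entry_points(client: &Client) {
    let _standard = client.start_stream_transcription();
    let _medical = client.start_medical_stream_transcription();
    let _call_analytics = client.start_call_analytics_stream_transcription();
    let _health_scribe = client.start_medical_scribe_stream();
}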
Getting started
The SDK provides one crate per AWS service. You must add Tokio as a dependency in your Rust project to run the asynchronous code. To add aws-sdk-transcribestreaming to your project, add the following to your Cargo.toml file:
[dependencies]
aws-config = { version = "1.1.7", features = ["behavior-version-latest"] }
aws-sdk-transcribestreaming = "1.81.0"
tokio = { version = "1", features = ["full"] }
async-stream = "0.3" # used by the streaming example below
Then, in your code, you can create a client as follows:
use aws_sdk_transcribestreaming as transcribestreaming;
#[::tokio::main]
async fn main() -> Result<(), transcribestreaming::Error> {
let config = aws_config::load_from_env().await;
let client = aws_sdk_transcribestreaming::Client::new(&config);
// ... make calls with the client
Ok(())
}
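Credentials and region come from the default provider chain (environment variables, the shared config file, or instance metadata). If you need to pin the region in code, a small variation on the above (us-east-1 is just a placeholder):

use aws_config::{BehaviorVersion, Region};
use aws_sdk_transcribestreaming as transcribestreaming;

#[::tokio::main]
async fn main() -> Result<(), transcribestreaming::Error> {
    // Like load_from_env, but with an explicit region override
    let config = aws_config::defaults(BehaviorVersion::latest())
        .region(Region::new("us-east-1")) // placeholder region
        .load()
        .await;
    let _client = transcribestreaming::Client::new(&config);
    Ok(())
}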
Complete example
The following is a complete example that uses aws-sdk-transcribestreaming to transcribe a local audio file (raw 16-bit PCM) as a stream:
use async_stream::stream;
use aws_sdk_transcribestreaming::primitives::Blob;
use aws_sdk_transcribestreaming::types::{
    AudioEvent, AudioStream, LanguageCode, MediaEncoding, TranscriptResultStream,
};
use aws_sdk_transcribestreaming::Client;
use std::path::Path;
use tokio::fs::File;
use tokio::io::AsyncReadExt;

// Size of each audio chunk sent to the service (raw PCM bytes)
const CHUNK_SIZE: usize = 8192;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the AWS configuration
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Read the audio file into memory (headerless 16-bit little-endian PCM)
    let audio_path = Path::new("audio.pcm");
    let mut audio_file = File::open(audio_path).await?;
    let mut audio_buffer = Vec::new();
    audio_file.read_to_end(&mut audio_buffer).await?;

    // Create the input event stream: one AudioEvent per chunk of audio
    let input_stream = stream! {
        for chunk in audio_buffer.chunks(CHUNK_SIZE) {
            yield Ok(AudioStream::AudioEvent(
                AudioEvent::builder().audio_chunk(Blob::new(chunk)).build(),
            ));
        }
    };

    // Start the transcription session
    let mut output = client
        .start_stream_transcription()
        .language_code(LanguageCode::EnUs)
        .media_encoding(MediaEncoding::Pcm)
        .media_sample_rate_hertz(44_100)
        .audio_stream(input_stream.into())
        .send()
        .await?;

    // Process transcription results as they arrive
    while let Some(event) = output.transcript_result_stream.recv().await? {
        match event {
            TranscriptResultStream::TranscriptEvent(transcript_event) => {
                let transcript = transcript_event.transcript.expect("transcript in event");
                for result in transcript.results.unwrap_or_default() {
                    if result.is_partial {
                        println!("Partial result: {:?}", result);
                    } else {
                        println!("Final result: {:?}", result);
                    }
                }
            }
            otherwise => eprintln!("Unexpected event: {:?}", otherwise),
        }
    }
    Ok(())
}
This example demonstrates how to:
- Configure the AWS client
- Read an audio file
- Build the audio event stream
- Start a streaming transcription session
- Process transcription results (both partial and final)
Getting help
- Discussions - for ideas, RFCs and general questions
- Issue tracker - for bug reports and feature requests
- Generated docs (latest version)
- Usage examples
License
This project is licensed under the Apache-2.0 License.
1 Reply
A usage guide to aws-sdk-transcribestreaming, the Rust AWS real-time speech-to-text library
Introduction
aws-sdk-transcribestreaming is the official AWS SDK crate for Rust for real-time speech-to-text (speech recognition). It supports streaming audio transcription, converting an audio stream into text in real time, and suits scenarios such as live captioning, voice assistants, and customer-service systems.
Key features:
- Low-latency real-time speech recognition
- Streaming transmission of audio data
- Event-stream handling of audio chunks and transcripts
- Support for multiple languages and dialects
- Seamless integration with the AWS ecosystem
Usage
1. Add dependencies
First, add the dependencies to Cargo.toml (the examples below target the 1.x SDK, the same major version as above):
[dependencies]
aws-config = { version = "1", features = ["behavior-version-latest"] }
aws-sdk-transcribestreaming = "1"
tokio = { version = "1", features = ["full"] }
tokio-util = { version = "0.7", features = ["codec"] }
futures = "0.3"
bytes = "1"
cpal = "0.15" # only needed for the microphone examples
2. Basic usage example
use aws_sdk_transcribestreaming::error::SdkError;
use aws_sdk_transcribestreaming::operation::start_stream_transcription::{
    StartStreamTranscriptionError, StartStreamTranscriptionOutput,
};
use aws_sdk_transcribestreaming::primitives::Blob;
use aws_sdk_transcribestreaming::types::{
    AudioEvent, AudioStream, LanguageCode, MediaEncoding, TranscriptResultStream,
};
use aws_sdk_transcribestreaming::Client;
use bytes::Bytes;
use futures::{Stream, StreamExt};
use std::error::Error;

async fn transcribe_audio_stream(
    client: &Client,
    audio_stream: impl Stream<Item = Bytes> + Send + 'static,
) -> Result<StartStreamTranscriptionOutput, SdkError<StartStreamTranscriptionError>> {
    // Wrap each chunk of bytes in an AudioEvent of the input event stream
    let input_stream = audio_stream.map(|chunk| {
        Ok(AudioStream::AudioEvent(
            AudioEvent::builder()
                .audio_chunk(Blob::new(chunk.to_vec()))
                .build(),
        ))
    });

    // Build the request with the fluent builder and send it
    client
        .start_stream_transcription()
        .language_code(LanguageCode::EnUs)
        .media_encoding(MediaEncoding::Pcm)
        .media_sample_rate_hertz(16_000)
        .audio_stream(input_stream.into())
        .send()
        .await
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Create the AWS configuration and client
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Simulated audio stream - in a real application this comes from
    // a microphone or an audio file (raw 16-bit little-endian PCM)
    let audio_data = vec![Bytes::from_static(b"raw audio data here")];
    let audio_stream = futures::stream::iter(audio_data);

    // Start transcription
    let mut response = transcribe_audio_stream(&client, audio_stream).await?;

    // Process transcription results
    while let Some(event) = response.transcript_result_stream.recv().await? {
        if let TranscriptResultStream::TranscriptEvent(transcript_event) = event {
            let transcript = transcript_event.transcript.expect("transcript in event");
            for result in transcript.results.unwrap_or_default() {
                let is_partial = result.is_partial;
                for alt in result.alternatives.unwrap_or_default() {
                    let text = alt.transcript.unwrap_or_default();
                    if is_partial {
                        println!("[Partial] {}", text);
                    } else {
                        println!("[Final] {}", text);
                    }
                }
            }
        }
    }
    Ok(())
}
3. Reading audio from a file and transcribing it
use std::path::Path;
use tokio::fs::File;
use tokio_util::codec::{BytesCodec, FramedRead};

async fn transcribe_audio_file(
    client: &Client,
    file_path: impl AsRef<Path>,
) -> Result<(), Box<dyn Error>> {
    // Frame the file into Bytes chunks; read errors are skipped for brevity
    let file = File::open(file_path).await?;
    let audio_stream = FramedRead::new(file, BytesCodec::new())
        .filter_map(|chunk| async move { chunk.ok().map(|bytes| bytes.freeze()) });

    let mut response = transcribe_audio_stream(client, audio_stream).await?;

    while let Some(event) = response.transcript_result_stream.recv().await? {
        if let TranscriptResultStream::TranscriptEvent(transcript_event) = event {
            let transcript = transcript_event.transcript.expect("transcript in event");
            for result in transcript.results.unwrap_or_default() {
                for alt in result.alternatives.unwrap_or_default() {
                    println!("Transcript: {}", alt.transcript.unwrap_or_default());
                }
            }
        }
    }
    Ok(())
}
4. Handling live microphone input
use cpal::{
    traits::{DeviceTrait, HostTrait, StreamTrait},
    StreamConfig,
};
use futures::channel::mpsc;

async fn transcribe_microphone(client: &Client) -> Result<(), Box<dyn Error>> {
    // Bounded channel bridging the cpal audio thread and the async stream
    let (mut tx, rx) = mpsc::channel::<Bytes>(1024);

    // Set up the audio input device
    let host = cpal::default_host();
    let input_device = host
        .default_input_device()
        .expect("No input device available");
    let config = StreamConfig {
        channels: 1,
        sample_rate: cpal::SampleRate(16_000),
        buffer_size: cpal::BufferSize::Default,
    };

    let input_stream = input_device.build_input_stream(
        &config,
        move |data: &[f32], _: &_| {
            // Convert f32 samples to 16-bit little-endian PCM, the format
            // expected with MediaEncoding::Pcm
            let bytes: Vec<u8> = data
                .iter()
                .flat_map(|&sample| ((sample * i16::MAX as f32) as i16).to_le_bytes())
                .collect();
            // try_send never blocks, so it is safe on the audio thread;
            // the chunk is dropped if the channel is full
            let _ = tx.try_send(Bytes::from(bytes));
        },
        |err| eprintln!("Audio stream error: {:?}", err),
        None,
    )?;
    input_stream.play()?;

    // Start transcription over the receiving end of the channel
    let mut response = transcribe_audio_stream(client, rx).await?;

    while let Some(event) = response.transcript_result_stream.recv().await? {
        if let TranscriptResultStream::TranscriptEvent(transcript_event) = event {
            let transcript = transcript_event.transcript.expect("transcript in event");
            for result in transcript.results.unwrap_or_default() {
                for alt in result.alternatives.unwrap_or_default() {
                    println!("{}", alt.transcript.unwrap_or_default());
                }
            }
        }
    }
    Ok(())
}
Advanced configuration
Custom language and audio parameters
let response = client
    .start_stream_transcription()
    .language_code(LanguageCode::ZhCn) // Mandarin Chinese
    .media_encoding(MediaEncoding::Pcm)
    .media_sample_rate_hertz(44_100) // 44.1 kHz sample rate
    .vocabulary_name("my-custom-vocabulary") // use a custom vocabulary
    .show_speaker_label(true) // label individual speakers
    .audio_stream(audio_stream)
    .send()
    .await?;
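The streaming API also supports partial-results stabilization, which makes partial hypotheses stop changing sooner at a small cost in accuracy. A minimal sketch (the enum variants follow the service's high/medium/low values):

let response = client
    .start_stream_transcription()
    .language_code(LanguageCode::EnUs)
    .media_encoding(MediaEncoding::Pcm)
    .media_sample_rate_hertz(16_000)
    // Stabilize partial results so earlier words settle quickly
    .enable_partial_results_stabilization(true)
    .partial_results_stability(
        aws_sdk_transcribestreaming::types::PartialResultsStability::High,
    )
    .audio_stream(audio_stream)
    .send()
    .await?;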
Error handling
match transcribe_audio_stream(&client, audio_stream).await {
    Ok(mut response) => {
        // Consume response.transcript_result_stream as in the examples above
    }
    Err(SdkError::ServiceError(context)) => {
        // The service rejected the request (bad parameters, limits, ...)
        eprintln!("Service error: {:?}", context.err());
    }
    Err(err) => {
        // Construction, dispatch, timeout, or response errors
        eprintln!("Unexpected error: {:?}", err);
    }
}
Notes
- The Amazon Transcribe streaming service is billed per use; review the pricing before adopting it
- The audio must be in a supported format (streaming accepts PCM, FLAC, and Ogg/Opus; raw PCM must be headerless 16-bit little-endian, as in the sketch after this list)
- Real-time transcription adds latency of roughly 2-3 seconds for final results; partial results arrive sooner
- Valid AWS credentials and a region configuration are required
- Long-running transcription sessions may need special handling (reconnection, session limits)
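For example, a WAV file starts with a RIFF header that must not be sent as PCM. A short sketch using the third-party hound crate (an assumption - any WAV parser works) to extract the raw 16-bit samples:

use bytes::Bytes;

// Read a 16-bit WAV file and return its raw little-endian PCM payload
// plus the sample rate (hound is a third-party WAV crate)
fn wav_to_pcm(path: &str) -> Result<(Bytes, u32), Box<dyn std::error::Error>> {
    let mut reader = hound::WavReader::open(path)?;
    let sample_rate = reader.spec().sample_rate;
    let mut pcm = Vec::new();
    for sample in reader.samples::<i16>() {
        pcm.extend_from_slice(&sample?.to_le_bytes());
    }
    Ok((Bytes::from(pcm), sample_rate))
}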
Performance tips
- Use a suitable audio chunk size (typically 100-200 ms of audio per chunk; see the sketch after this list)
- Consider a compressed encoding such as Ogg/Opus to reduce bandwidth
- Implement reconnection logic to handle network interruptions
- Buffer partial results to avoid redundant downstream updates
- Adjust audio quality dynamically based on network conditions
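A chunk duration converts to a byte count from the sample rate and sample width; a quick calculation for 16-bit mono PCM:

/// Bytes per chunk for 16-bit mono PCM at a given sample rate and duration.
fn pcm_chunk_size(sample_rate_hz: usize, chunk_ms: usize) -> usize {
    let bytes_per_sample = 2; // 16-bit samples
    sample_rate_hz * bytes_per_sample * chunk_ms / 1000
}

fn main() {
    // e.g. 16 kHz at 100 ms per chunk => 3200 bytes
    assert_eq!(pcm_chunk_size(16_000, 100), 3200);
}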
Complete example
The following is a complete real-time microphone transcription example:
use aws_sdk_transcribestreaming::error::SdkError;
use aws_sdk_transcribestreaming::operation::start_stream_transcription::{
    StartStreamTranscriptionError, StartStreamTranscriptionOutput,
};
use aws_sdk_transcribestreaming::primitives::Blob;
use aws_sdk_transcribestreaming::types::{
    AudioEvent, AudioStream, LanguageCode, MediaEncoding, TranscriptResultStream,
};
use aws_sdk_transcribestreaming::Client;
use bytes::Bytes;
use cpal::{
    traits::{DeviceTrait, HostTrait, StreamTrait},
    StreamConfig,
};
use futures::{channel::mpsc, Stream, StreamExt};
use std::error::Error;

async fn transcribe_audio_stream(
    client: &Client,
    audio_stream: impl Stream<Item = Bytes> + Send + 'static,
) -> Result<StartStreamTranscriptionOutput, SdkError<StartStreamTranscriptionError>> {
    // Wrap each chunk of bytes in an AudioEvent of the input event stream
    let input_stream = audio_stream.map(|chunk| {
        Ok(AudioStream::AudioEvent(
            AudioEvent::builder()
                .audio_chunk(Blob::new(chunk.to_vec()))
                .build(),
        ))
    });

    client
        .start_stream_transcription()
        .language_code(LanguageCode::EnUs)
        .media_encoding(MediaEncoding::Pcm)
        .media_sample_rate_hertz(16_000)
        .audio_stream(input_stream.into())
        .send()
        .await
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Create the AWS client
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Channel bridging the cpal audio thread and the async stream
    let (mut tx, rx) = mpsc::channel::<Bytes>(1024);

    // Set up the microphone input
    let host = cpal::default_host();
    let input_device = host.default_input_device().expect("No input device");
    let config = StreamConfig {
        channels: 1,
        sample_rate: cpal::SampleRate(16_000),
        buffer_size: cpal::BufferSize::Default,
    };

    let input_stream = input_device.build_input_stream(
        &config,
        move |data: &[f32], _: &_| {
            // Convert f32 samples to 16-bit little-endian PCM
            let bytes: Vec<u8> = data
                .iter()
                .flat_map(|&sample| ((sample * i16::MAX as f32) as i16).to_le_bytes())
                .collect();
            // Non-blocking send; the chunk is dropped if the channel is full
            let _ = tx.try_send(Bytes::from(bytes));
        },
        |err| eprintln!("Audio stream error: {:?}", err),
        None,
    )?;
    input_stream.play()?;

    // Start transcription
    let mut response = transcribe_audio_stream(&client, rx).await?;

    // Print only final results
    while let Some(event) = response.transcript_result_stream.recv().await? {
        if let TranscriptResultStream::TranscriptEvent(transcript_event) = event {
            let transcript = transcript_event.transcript.expect("transcript in event");
            for result in transcript.results.unwrap_or_default() {
                if !result.is_partial {
                    for alt in result.alternatives.unwrap_or_default() {
                        println!("Transcript: {}", alt.transcript.unwrap_or_default());
                    }
                }
            }
        }
    }
    Ok(())
}