Using the Flutter Speech Recognition Plugin deepgram_speech_to_text

Published 1 week ago by yuanlaile, in Flutter

Features

Speech-to-Text
  • From File: listen.file(), listen.path()
  • From URL: listen.url()
  • From Bytes: listen.bytes()
  • From Audio Stream: listen.live(), listen.liveListener()

Text-to-Speech
  • From Text: speak.text()
  • From Text Stream: speak.live(), speak.liveSpeaker()

Agent Interaction
  • Agent Interaction 🚧: agent.live()

PRs are welcome for all work-in-progress 🚧 features

Getting Started

All you need is a Deepgram API key. You can get a free one by signing up on Deepgram.
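
The snippets below hardcode the key for brevity. In a real app you may prefer to inject it at build time; the following is a common Flutter pattern, not something specific to this plugin:

// Run with: flutter run --dart-define=DEEPGRAM_API_KEY=your_api_key
const apiKey = String.fromEnvironment('DEEPGRAM_API_KEY');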

Usage

Initialize Client

First, create the client with optional parameters:

String apiKey = 'your_api_key';

// Optional parameters can be passed in client's baseQueryParams or in every method's queryParams
Deepgram deepgram = Deepgram(apiKey, baseQueryParams: {
  'model': 'nova-2-general',
  'detect_language': true,
  'filler_words': false,
  'punctuation': true,
  // More options here : https://developers.deepgram.com/reference/listen-file
});
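
You can also override these options for a single request. A minimal sketch, assuming each method accepts a queryParams map as the comment above indicates:

// Per-call options override the client's baseQueryParams (signature assumed
// from the comment above; check your installed version)
DeepgramListenResult res = await deepgram.listen.file(
  File('audio.wav'),
  queryParams: {'model': 'nova-2-meeting'},
);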

Speech to Text

Call the methods on the client's listen member:

// From file
DeepgramListenResult resFile = await deepgram.listen.file(File('audio.wav'));

// From URL
DeepgramListenResult resUrl = await deepgram.listen.url('https://somewhere/audio.wav');

// From bytes (dummy data for illustration)
DeepgramListenResult resBytes = await deepgram.listen.bytes([1, 2, 3, 4, 5]);
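// In a real app the bytes would come from an actual audio source, e.g.:
//   final audioBytes = await File('audio.wav').readAsBytes();
//   DeepgramListenResult res = await deepgram.listen.bytes(audioBytes);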

// Streaming from audio stream (e.g., microphone)
Stream<List<int>> micStream = await AudioRecorder().startStream(RecordConfig(
  encoder: AudioEncoder.pcm16bits,
  sampleRate: 16000,
  numChannels: 1,
));

final streamParams = {
  'language': 'en',
  // encoding and sample_rate must match the audio stream above
  'encoding': 'linear16',
  'sample_rate': 16000,
};

Deepgram deepgramStreaming = Deepgram(apiKey, baseQueryParams: streamParams);

// Automatically managed stream
Stream<DeepgramListenResult> streamAuto = deepgramStreaming.listen.live(micStream);

// Manually managed stream
DeepgramLiveListener listener = deepgramStreaming.listen.liveListener(micStream);
listener.stream.listen((res) {
    print(res.transcript);
});
listener.start();
listener.pause();  // temporarily stop sending audio
listener.resume(); // resume sending audio
listener.close();  // end the session
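
For completeness, here is a minimal sketch of consuming the automatically managed stream returned by listen.live() above; as in the manual listener example, the transcript can be null for metadata-only messages:

streamAuto.listen((res) {
  if (res.transcript != null) {
    print(res.transcript); // null transcripts are metadata-only messages
  }
});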

Text to Speech

Call the methods on the client's speak member:

// From text
DeepgramSpeakResult resText = await deepgram.speak.text('Hello world');

// Streaming from text stream
Stream<String> textStream = Stream.fromIterable(['Hello', 'World']);
Stream<DeepgramSpeakResult> streamTTS = deepgram.speak.live(textStream);

// Manually managed stream
DeepgramLiveSpeaker speaker = deepgram.speak.liveSpeaker(textStream);
speaker.stream.listen((res) {
    print(res);
    // Use the audio data if needed
});

speaker.start();
speaker.flush(); // force synthesis of any buffered text
speaker.clear(); // discard buffered input
speaker.close();
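
To persist or play the synthesized audio, something like the sketch below should work (requires dart:io). Note that the data field holding the raw audio bytes is an assumption about DeepgramSpeakResult; check the result type in your installed version:

// Assumption: DeepgramSpeakResult exposes the raw audio bytes as `data`
final hello = await deepgram.speak.text('Hello world');
if (hello.data != null) {
  await File('hello.wav').writeAsBytes(hello.data!);
}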

Debugging Common Errors

  • Ensure your API key is valid and has enough credits (see the sketch after this list).
  • Check your parameters; some models do not support certain options (e.g., the whisper model with live streaming).
  • If you get empty transcripts or metadata-only responses, verify that the encoding and sample rate match the audio stream.
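
For the first bullet, the package also documents a small helper to validate the key up front; verify that isApiKeyValid() exists in your installed version before relying on it:

// Quick sanity check for the API key (method name taken from the package docs)
bool isValid = await deepgram.isApiKeyValid();
print('API key valid: $isValid');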

Additional Information & Support

This package was created to fill a gap, since no Dart SDK was available for Deepgram at the time. Contributions and feature requests are welcome on GitHub. If you find this project useful, consider supporting it.

Complete Demo Example

Below is a complete demo example showcasing how to use the deepgram_speech_to_text plugin within a Flutter application:

import 'package:flutter/material.dart';
import 'package:deepgram_speech_to_text/deepgram_speech_to_text.dart';
import 'package:record/record.dart';

void main() => runApp(MyApp());

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(title: Text('Deepgram Speech To Text Demo')),
        body: Center(child: SpeechToTextDemo()),
      ),
    );
  }
}

class SpeechToTextDemo extends StatefulWidget {
  @override
  _SpeechToTextDemoState createState() => _SpeechToTextDemoState();
}

class _SpeechToTextDemoState extends State<SpeechToTextDemo> {
  final String apiKey = "YOUR_API_KEY";
  bool _isRecording = false;
  bool _isTranscribing = false;
  String _transcription = "";

  late Deepgram deepgram;

  @override
  void initState() {
    super.initState();
    deepgram = Deepgram(apiKey, baseQueryParams: {
      'model': 'nova-2-general',
      'language': 'en',
      'filler_words': false,
      'punctuation': true,
      // encoding and sample_rate must match the microphone stream below
      'encoding': 'linear16',
      'sample_rate': 16000,
    });
  }

  Future<void> startRecording() async {
    final recorder = AudioRecorder();
    if (await recorder.hasPermission()) {
      // Stream 16 kHz mono PCM16 audio from the microphone (record v5 API)
      final micStream = await recorder.startStream(const RecordConfig(
        encoder: AudioEncoder.pcm16bits,
        sampleRate: 16000,
        numChannels: 1,
      ));
      setState(() {
        _isRecording = true;
      });

      // Forward the audio to Deepgram and collect transcripts
      final listener = deepgram.listen.liveListener(micStream);
      listener.stream.listen((res) {
        setState(() {
          _transcription = res.transcript ?? '';
        });
      });

      listener.start();
      setState(() {
        _isTranscribing = true;
      });

      await Future.delayed(Duration(seconds: 5)); // record for 5 seconds
      await recorder.stop();
      await recorder.dispose();
      listener.close();
      setState(() {
        _isTranscribing = false;
        _isRecording = false;
      });
    }
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      mainAxisAlignment: MainAxisAlignment.center,
      children: <Widget>[
        ElevatedButton(
          onPressed: !_isRecording && !_isTranscribing ? startRecording : null,
          child: Text(_isRecording ? 'Recording...' : 'Start Recording'),
        ),
        SizedBox(height: 20),
        Text(_transcription),
      ],
    );
  }
}

In this demo, we’ve integrated the deepgram_speech_to_text plugin into a simple Flutter app that records audio from the device’s microphone and transcribes it using Deepgram’s speech-to-text service. The transcription result is then displayed on the screen. Adjust the API key and any other configurations as necessary for your environment.


More hands-on tutorials in this series about using the Flutter speech recognition plugin deepgram_speech_to_text are available at https://www.itying.com/category-92-b0.html

1 Reply

Sure, here is an example of using the deepgram_speech_to_text plugin for speech recognition in Flutter. The plugin lets you send audio data to Deepgram's API and get the transcript back.

First, make sure you have added the deepgram_speech_to_text dependency (plus record for microphone capture) to your pubspec.yaml file:

dependencies:
  flutter:
    sdk: flutter
  deepgram_speech_to_text: ^latest_version  # replace with the latest version
  record: ^latest_version                   # microphone capture, used below

Then run flutter pub get to install the dependencies.

Next, create a Flutter app and wire up the deepgram_speech_to_text plugin. The following complete example shows how to use it for speech recognition:

import 'package:flutter/material.dart';
import 'package:deepgram_speech_to_text/deepgram_speech_to_text.dart';
import 'package:record/record.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(
          title: Text('Deepgram Speech to Text Example'),
        ),
        body: Center(
          child: SpeechRecognitionButton(),
        ),
      ),
    );
  }
}

class SpeechRecognitionButton extends StatefulWidget {
  @override
  _SpeechRecognitionButtonState createState() => _SpeechRecognitionButtonState();
}

class _SpeechRecognitionButtonState extends State<SpeechRecognitionButton> {
  // Configure the Deepgram client (replace with your actual API key).
  // encoding and sample_rate must match the microphone stream below.
  final Deepgram _deepgram = Deepgram('YOUR_DEEPGRAM_API_KEY', baseQueryParams: {
    'language': 'en',
    'encoding': 'linear16',
    'sample_rate': 16000,
  });
  final AudioRecorder _recorder = AudioRecorder();
  String _transcription = '';

  @override
  Widget build(BuildContext context) {
    return ElevatedButton(
      onPressed: () async {
        // Start speech recognition
        await _startSpeechRecognition();
      },
      child: Text('Start Speech Recognition'),
    );
  }

  Future<void> _startSpeechRecognition() async {
    try {
      if (!await _recorder.hasPermission()) return;

      // Stream 16 kHz mono PCM16 audio from the microphone (record v5 API)
      final micStream = await _recorder.startStream(const RecordConfig(
        encoder: AudioEncoder.pcm16bits,
        sampleRate: 16000,
        numChannels: 1,
      ));

      // Forward the audio to Deepgram and collect transcripts
      final listener = _deepgram.listen.liveListener(micStream);
      listener.stream.listen((res) {
        setState(() {
          _transcription = res.transcript ?? '';
        });
      });
      listener.start();

      await Future.delayed(Duration(seconds: 5)); // record for 5 seconds
      await _recorder.stop();
      listener.close();

      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Transcription: $_transcription')),
      );
    } catch (e) {
      print('Error: $e');
      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Error during speech recognition')),
      );
    }
  }
}

Notes:

  1. API key: make sure you have obtained an API key from Deepgram and replaced 'YOUR_DEEPGRAM_API_KEY' in the code.
  2. Permissions: on Android and iOS you need microphone permission. On Android this is usually just the manifest entry shown below; on iOS you must add a usage description to Info.plist.
  3. Error handling: the example includes basic error handling, but you will likely need something more thorough for a real app.

iOS Microphone Permission (Info.plist)

Add the following to the ios/Runner/Info.plist file:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to your microphone to perform speech recognition.</string>

Android Microphone Permission (AndroidManifest.xml)

The record plugin normally declares the recording permission for you, but if you run into permission problems, check that your AndroidManifest.xml contains the following:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

This example shows basic speech recognition with the deepgram_speech_to_text plugin. Depending on your needs, you may want to customize and extend it further.
