Python Speech-to-Text: Converting Chinese Speech to Text

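To transcribe Chinese speech from a video or audio file in Python, you can use a speech recognition service such as the Google Cloud Speech-to-Text API. The following sample code shows how to use that API to convert an audio file into Chinese text: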

import argparse

from google.cloud import speech_v1p1beta1 as speech


def transcribe_speech(input_file):
    """Transcribe a Chinese audio file and return the recognized text."""
    client = speech.SpeechClient()

    # Read the audio file as raw bytes
    with open(input_file, 'rb') as audio_file:
        content = audio_file.read()

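    audio = speech.RecognitionAudio(content=content)
    # The encoding and sample rate must match the audio file:
    # LINEAR16 is uncompressed 16-bit PCM, here sampled at 16 kHz.
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='zh-CN',
    )

    # Send a synchronous recognition request (intended for short audio clips)
    response = client.recognize(config=config, audio=audio)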

    # Join the top-ranked alternative of each result into one transcript
    transcript = ''.join(
        result.alternatives[0].transcript for result in response.results
    )

    return transcript


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file', help='Path to the input audio file')
    args = parser.parse_args()

    transcript = transcribe_speech(args.input_file)
    print(transcript)
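
The recognition config in the script expects 16-bit PCM audio at 16 kHz. As a minimal sketch, the standard-library wave module can check a WAV file against those values before it is sent. The helper name check_wav_format and the single-channel default are assumptions made for illustration; they are not part of the original script.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    Hypothetical helper: the defaults mirror the RecognitionConfig in the
    script (LINEAR16 at 16 kHz); requiring mono audio is an assumption.
    """
    problems = []
    with wave.open(path, 'rb') as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f'sample rate is {wav.getframerate()} Hz, '
                f'expected {expected_rate} Hz')
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit PCM
            problems.append(
                f'sample width is {wav.getsampwidth() * 8} bits, '
                f'expected 16 bits')
        if wav.getnchannels() != expected_channels:
            problems.append(
                f'{wav.getnchannels()} channels, '
                f'expected {expected_channels}')
    return problems
```

An empty list means the file matches the assumed format. Otherwise, either convert the audio or adjust the config values to match the file.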

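To use this code, install the google-cloud-speech library (for example, with pip install google-cloud-speech). You also need a Google Cloud Platform account with the Cloud Speech-to-Text API enabled, and a downloaded Google Cloud credentials file. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of that credentials file.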
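
A missing or mistyped credentials path usually only surfaces when the client is created. The sketch below checks the environment variable up front so the error message is easier to read. The function name credentials_path_from_env is hypothetical, and the check covers only the environment-variable setup described here; the client library may also find credentials in other ways.

```python
import os


def credentials_path_from_env(env=None):
    """Return the credentials file path from the environment, or raise.

    Hypothetical helper: `env` defaults to os.environ and is a parameter
    only so the check can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        raise RuntimeError('GOOGLE_APPLICATION_CREDENTIALS is not set')
    if not os.path.isfile(path):
        raise RuntimeError(f'credentials file not found: {path}')
    return path
```

Calling credentials_path_from_env() at the start of the script, before speech.SpeechClient() is created, would make a setup problem fail early with a clear message.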

Save the code as transcribe.py and run it from the command line, for example:

python transcribe.py input_audio.wav

Here input_audio.wav is the path to the audio file you want to transcribe. When the script finishes, it prints the Chinese text recognized from the audio.
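
The script takes an audio file, so speech in a video has to be extracted first. One possible approach, sketched below, is to call the ffmpeg command-line tool from Python and write mono 16-bit PCM audio at 16 kHz, matching the config in the script. This assumes ffmpeg is installed and on the PATH; the function names and the file names in the usage note are hypothetical.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono 16-bit PCM WAV audio."""
    return [
        'ffmpeg',
        '-y',                     # overwrite the output file if it exists
        '-i', video_path,         # input video (or audio) file
        '-vn',                    # drop the video stream
        '-ac', '1',               # downmix to one channel
        '-ar', str(sample_rate),  # resample to the target rate
        '-acodec', 'pcm_s16le',   # 16-bit little-endian PCM (LINEAR16)
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    command = build_ffmpeg_command(video_path, wav_path, sample_rate)
    subprocess.run(command, check=True)
    return wav_path
```

For example, extract_audio('input_video.mp4', 'input_audio.wav') would produce a WAV file that can then be passed to transcribe.py as shown above.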