时长比较短的音频:https://huggingface.co/datasets/PolyAI/minds14/viewer/en-US
时长比较长的音频:https://huggingface.co/datasets/librispeech_asr?row=8
此次测试过程暂时只使用比较短的音频
使用fast_whisper测试
下载安装,参考官方网站即可
报错提示:
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!
解决办法:
找到有libcudnn_ops_infer.so.8 的路径,在我的电脑中,改文件所在的路径为
在终端导入 export LD_LIBRARY_PATH=/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
test_fast_whisper.py
import subprocess import os import time import unittest import openpyxl from pydub import AudioSegment from datasets import load_dataset from faster_whisper import WhisperModel class TestFastWhisper(unittest.TestCase): def setUp(self): pass def test_fastwhisper(self): # 替换为您的脚本路径 # 设置HTTP代理 os.environ["http_proxy"] = "http://10.10.10.178:7890" os.environ["HTTP_PROXY"] = "http://10.10.10.178:7890" # 不知道此处为什么不能生效,必须要在终端中手动导入 os.environ["LD_LIBRARY_PATH"] = "/opt/audio/venv/lib/python3.10/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH" # 设置HTTPS代理 os.environ["https_proxy"] = "http://10.10.10.178:7890" os.environ["HTTPS_PROXY"] = "http://10.10.10.178:7890" print("load whisper") # 使用fast_whisper model_size = "large-v2" # Run on GPU with FP16 fast_whisper_model = WhisperModel(model_size, device="cuda", compute_type="float16") minds_14 = load_dataset("PolyAI/minds14", "en-US", split="train") # for en-US workbook = openpyxl.Workbook() # 创建一个工作表 worksheet = workbook.active # 设置表头 worksheet["A1"] = "Audio Path" worksheet["B1"] = "Audio Duration (seconds)" worksheet["C1"] = "Audio Size (MB)" worksheet["D1"] = "Correct Text" worksheet["E1"] = "Transcribed Text" worksheet["F1"] = "Cost Time (seconds)" for index, each in enumerate(minds_14, start=2): audioPath = each["path"] print(audioPath) # audioArray = each["audio"] audioDuration = len(AudioSegment.from_file(audioPath))/1000 audioSize = os.path.getsize(audioPath)/ (1024 * 1024) CorrectText = each["transcription"] tran_start_time = time.time() segments, info = fast_whisper_model.transcribe(audioPath, beam_size=5) segments = list(segments) # The transcription will actually run here. print("Detected language '%s' with probability %f" % (info.language, info.language_probability)) text = "" for segment in segments: text += segment.text cost_time = time.time() - tran_start_time print("Audio Path:", audioPath) print("Audio Duration (seconds):", audioDuration) print("Audio Size (MB):", audioSize) print("Correct Text:", CorrectText) print("Transcription Time (seconds):", cost_time) print("Transcribed Text:", text) worksheet[f"A{index}"] = audioPath worksheet[f"B{index}"] = audioDuration worksheet[f"C{index}"] = audioSize worksheet[f"D{index}"] = CorrectText worksheet[f"E{index}"] = text worksheet[f"F{index}"] = cost_time # break workbook.save("fast_whisper_output_data.xlsx") print("数据已保存到 fast_whisper_output_data.xlsx 文件") if __name__ == '__main__': unittest.main()
使用whisper测试
下载安装,参考官方网站即可,代码与上面代码类似
测试结果可视化
不太熟悉用numbers,凑合着看一下就行
很明显,fast_whisper速度要更快一些
whisperiosproxyunitscripttpupythonhuggingfacexlsxcodeurlasrshare可视化工作表gpucto