Example of using OpenAI's open-source Whisper on Hugging Face (Chinese speech-to-text)


An example of using OpenAI's open-source multilingual speech-to-text model through Hugging Face.
At the moment the multilingual large-v2 model tends to output Traditional Chinese, so the result needs a Traditional-to-Simplified conversion (see the short sketch after the GitHub link below).
A fine-tuning (training) example will be written in a follow-up post.

GitHub repository:
https://github.com/openai/whisper
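
As a minimal sketch of the Traditional-to-Simplified step on its own, zhconv's convert function takes a string and a target locale; the sample string below is made up for illustration:

from zhconv import convert

# the 'zh-cn' locale converts Traditional Chinese characters to Simplified Chinese
print(convert('啟動開始錄音', 'zh-cn'))   # prints: 启动开始录音

The full example below applies the same call to the text returned by Whisper.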

!pip install zhconv
!pip install whisper
!pip install tqdm
!pip install ffmpeg-python
!pip install transformers
!pip install librosa

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa
import torch
from zhconv import convert
import warnings
warnings.filterwarnings("ignore")

audio_file = "test.wav"

# load the audio file, resampled to the 16 kHz that Whisper expects
audio, sampling_rate = librosa.load(audio_file, sr=16_000)

# # audio
# display.Audio(audio_file, autoplay=True)

# load model and processor from the Hugging Face Hub
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
tokenizer = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# save a local copy, then reload everything from the local path
processor.save_pretrained("openai/model/whisper-large-v2")
model.save_pretrained("openai/model/whisper-large-v2")
tokenizer.save_pretrained("openai/model/whisper-large-v2")

processor = WhisperProcessor.from_pretrained("openai/model/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained("openai/model/whisper-large-v2")
tokenizer = WhisperProcessor.from_pretrained("openai/model/whisper-large-v2")

# load dummy dataset and read soundfiles
# ds = load_dataset("common_voice", "fr", split="test", streaming=True)
# ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16_000))
# input_speech = next(iter(ds))["audio"]["array"]

# force Chinese transcription
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")

# passing sampling_rate=sampling_rate here would silence the warning shown in the output below
input_features = processor(audio, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

# transcription = processor.batch_decode(predicted_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
# batch_decode returns a list of strings; convert the first one from Traditional to Simplified Chinese
print('转化为简体结果:', convert(transcription[0], 'zh-cn'))
Output:
It is highly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
['启动开始录音']
转化为简体结果: 启动开始录音
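
The code above imports torch but runs inference on the CPU. Continuing from that snippet, a minimal sketch, assuming a CUDA-capable GPU is available, for moving the model and input features onto it (and passing sampling_rate to avoid the warning above):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# passing sampling_rate explicitly avoids the "sampling_rate" warning shown above
input_features = processor(audio, sampling_rate=16_000, return_tensors="pt").input_features.to(device)
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(convert(transcription[0], 'zh-cn'))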
# Long-form (long audio) transcription follows
# Reference: https://huggingface.co/openai/whisper-large-v2
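
The long-form example itself is not included in the post. As a minimal sketch based on the chunked transcription approach described on the model card linked above (the file name long_test.wav is hypothetical):

from transformers import pipeline
from zhconv import convert

# chunk_length_s splits audio longer than 30 seconds into chunks for long-form transcription
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
)
# force Chinese transcription, as in the short example above
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="zh", task="transcribe")

result = pipe("long_test.wav")
print(convert(result["text"], 'zh-cn'))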

