---
license: mit
language:
- en
pipeline_tag: automatic-speech-recognition
base_model:
- FunAudioLLM/SenseVoiceSmall
---

# SenseVoice

FunASR SenseVoice on Axera, official repo: https://github.com/FunAudioLLM/SenseVoice

## TODO

- [x] Support AX630C
- [x] Support C++
- [x] Support FastAPI

## Features

- Speech recognition
- Automatic language identification (Chinese, English, Cantonese, Japanese, Korean)
- Emotion recognition
- Automatic punctuation
- Streaming recognition

## Supported platforms

- [x] AX650N
- [x] AX630C

## Table of contents

- [Installation](#installation)
- [Usage](#usage)
- [Accuracy](#accuracy)
- [Technical discussion](#technical-discussion)

## Installation

Python==3.12

```
sudo apt-get install libsndfile-dev
pip install -r requirements.txt
```

#### Install pyaxengine

Follow https://github.com/AXERA-TECH/pyaxengine to install the NPU Python API.

This project was tested with 0.1.3rc2, which can be installed with

```
pip install https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc2/axengine-0.1.3-py3-none-any.whl
```

or change the version number to the release you want to use.

## Usage

### Python

```
cd python
python3 main.py --input ../example/en.mp3
[INFO] Available providers: ['AxEngineExecutionProvider']
{'input': '../example/en.mp3', 'language': 'auto', 'streaming': False}
......
RTF: 0.036785734138361184 Latency: 0.2639744281768799s Total length: 7.176s
ASR result: the tribal chieftain called for the boy and presented him with fifty pieces of gold
```

Command-line arguments:

| Argument | Description | Default |
| --- | --- | --- |
| --input/-i | Input audio file | |
| --language/-l | Recognition language; supports auto, zh, en, yue, ja, ko | auto |
| --streaming | Enable streaming recognition | |

### CPP

- AX650

```
./cpp/ax650/test_sensevoice -a example/zh.mp3 -p sensevoice_ax650/
Init asr success, take 0.2130seconds
Result: 开饭时间早上九点至下午五点
RTF(0.21 / 5.62) = 0.0372
```

- AX630C

```
./cpp/ax630c/test_sensevoice -a example/zh.mp3 -p sensevoice_ax630c/
```

The corresponding source code is on [GitHub](https://github.com/AXERA-TECH/ax_asr_api).

### Examples

Test audio files are provided in the `example` directory.

For example, to test Chinese recognition:

```
cd python
python main.py -i ../example/zh.mp3
```

Output:

```
RTF: 0.04386647134764582 Latency: 0.2463541030883789s Total length: 5.616s
ASR result: 开饭时间早上九点至下午五点
```

Streaming recognition:

```
python main.py -i ../example/zh.mp3 --streaming
```

Output:

```
{'timestamps': [540], 'text': '开'}
{'timestamps': [540, 780, 1080], 'text': '开放时'}
{'timestamps': [540, 780, 1080, 1260, 1740], 'text': '开放时间早'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340], 'text': '开放时间早上9'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640], 'text': '开放时间早上9点'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060], 'text': '开放时间早上9点至'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020], 'text': '开放时间早上9点至下午'}
{'timestamps': [540, 780, 1080, 1260, 1740, 1920, 2340, 2640, 3060, 3780, 4020, 4440, 4620], 'text': '开放时间早上9点至下午五点'}
RTF: 0.03678379235444246
```

### Gradio DEMO

```
cd python
python3 gradio_demo.py
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.0 76f70fdc
* Running on local URL: https://xxx.xxx.xxx.xxx:7861
* Running on local URL: https://172.18.0.1:7861
* Running on local URL: https://172.17.0.1:7861
* Running on local URL: https://0.0.0.0:7861
* To create a public link, set `share=True` in `launch()`.
```

![DEMO_Gradio](https://cdn-uploads.huggingface.co/production/uploads/660d19e9955bacf5a22a9b7b/ClusyqVUEm_gXTfvST7dg.png)

## Accuracy

WER (Word Error Rate) is used as the evaluation metric.

**WER = 2.0%**

### Reproducing the test results

```
./download_datasets.sh
python test_wer.py -d aishell -g datasets/ground_truth.txt --language zh
```

## Technical discussion

- Github issues
- QQ group: 139953715
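
A note on the RTF values printed in the usage examples: the numbers shown are consistent with the real-time factor being the processing latency divided by the audio duration, so values below 1 mean faster than real time. The sketch below only reproduces that arithmetic. The function name `real_time_factor` is hypothetical and is not taken from `main.py`.

```python
def real_time_factor(latency_s: float, audio_length_s: float) -> float:
    """Return processing time divided by audio duration (hypothetical helper)."""
    if audio_length_s <= 0:
        raise ValueError("audio length must be positive")
    return latency_s / audio_length_s


# Latency and total length copied from the English example output.
rtf_en = real_time_factor(0.2639744281768799, 7.176)
# Latency and total length copied from the Chinese example output.
rtf_zh = real_time_factor(0.2463541030883789, 5.616)

print(f"en RTF: {rtf_en:.4f}")
print(f"zh RTF: {rtf_zh:.4f}")
```

Both results agree with the RTF values logged next to those latencies, which is the only basis for the formula assumed here.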
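
A note on the accuracy metric: an error rate of this kind is commonly computed as the edit distance between the recognized text and the reference, divided by the reference length. The sketch below is a minimal illustration under two assumptions: it compares character by character (a common choice for Chinese), and it does no text normalization. It is not the implementation in `test_wer.py`, which is not shown here, and the names `edit_distance` and `error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over reference length, using characters as tokens (assumption)."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Two transcripts taken from the example outputs, used only to exercise the
# function; neither is claimed to be the ground truth.
offline = "开饭时间早上九点至下午五点"
streaming = "开放时间早上9点至下午五点"
distance = edit_distance(list(offline), list(streaming))
rate = error_rate(offline, streaming)
print(distance, f"{rate:.2%}")
```

In practice, evaluation scripts usually normalize both texts first (for example, mapping digits and numerals to one form and stripping punctuation), which this sketch deliberately leaves out.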