Microsoft Speech-to-Text Python SDK: SPXERR_INVALID_HEADER error

When I run the Microsoft Python Speech-to-Text Quickstart ("Quickstart: Recognize speech from an audio file") with the azure-cognitiveservices-speech v1.8.0 SDK, I get the following error.

RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)

The script takes only 3 inputs:

  • Azure subscription key
  • Azure service region
  • File name

I am using the following test MP3 file:

Here is the full output:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 761, in __init__
    self._impl = self._get_impl(impl.SpeechRecognizer, speech_config, audio_config)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 547, in _get_impl
    _impl = reco_type._from_config(speech_config._impl, audio_config._impl)
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
[CALL STACK BEGIN]

3   libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad88d2 CreateModuleObject + 1136482
4   libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad7f4f CreateModuleObject + 1134047
5   libmicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1803 CreateModuleObject + 59027
6   libmicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1503 CreateModuleObject + 58259
7   libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a11c64 CreateModuleObject + 322292
8   libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a10be5 CreateModuleObject + 318069
9   libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e5a2 CreateModuleObject + 308274
10  libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e7c3 CreateModuleObject + 308819
11  libmicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106960bc7 recognizer_create_speech_recognizer_from_config + 3863
12  libmicrosoft.CognitiveServices.Speech.core.dylib 0x000000010695fd74 recognizer_create_speech_recognizer_from_config + 196
13  _speech_py_impl.so                  0x00000001067ff35b PyInit__speech_py_impl + 814939
14  _speech_py_impl.so                  0x000000010679b530 PyInit__speech_py_impl + 405808
15  Python                              0x00000001060f65dc _PyMethodDef_RawFastCallKeywords + 668
16  Python                              0x00000001060f5a5a _PyCFunction_FastCallKeywords + 42
17  Python                              0x00000001061b45a4 call_function + 724
18  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
19  Python                              0x00000001060f5e90 function_code_fastcall + 128
20  Python                              0x00000001061b45b2 call_function + 738
21  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
22  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
23  Python                              0x00000001060f55fb _PyFunction_FastCallDict + 523
24  Python                              0x00000001060f68cf _PyObject_Call_Prepend + 143
25  Python                              0x0000000106144d51 slot_tp_init + 145
26  Python                              0x00000001061406a9 type_call + 297
27  Python                              0x00000001060f5871 _PyObject_FastCallKeywords + 433
28  Python                              0x00000001061b4474 call_function + 420
29  Python                              0x00000001061b16bd _PyEval_EvalFrameDefault + 25517
30  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
31  Python                              0x00000001061ab234 PyEval_EvalCode + 100
32  Python                              0x00000001061e88f1 PyRun_FileExFlags + 209
33  Python                              0x00000001061e816a PyRun_SimpleFileExFlags + 890
34  Python                              0x00000001062079db pymain_main + 6875
35  Python                              0x0000000106207f2a _Py_UnixMain + 58
36  libdyld.dylib                       0x00007fff5d8aaed9 start + 1
37  ???                                 0x0000000000000002 0x0 + 2

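Can anyone give me some pointers on what this header refers to and how to fix the problem?

xianggujidan answered: Microsoft Speech-to-Text Python SDK: SPXERR_INVALID_HEADER error

MP3-encoded audio is not supported as an input format. Please use a WAV (PCM) file with 16-bit samples, a 16 kHz sample rate, and a single channel (mono).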
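To see whether a given file already matches that format before handing it to the SDK, a small check with Python's standard `wave` module can help. This is only a sketch: the function name `check_wav_for_speech_sdk` and its `allowed_rates` parameter are made up for illustration and are not part of the Speech SDK.

```python
import wave


def check_wav_for_speech_sdk(path, allowed_rates=(16000,)):
    """Return a list of format problems; an empty list means the file
    is mono, 16-bit PCM WAV at one of the allowed sample rates.

    Hypothetical helper, not part of azure-cognitiveservices-speech.
    """
    problems = []
    try:
        with wave.open(path, "rb") as wf:
            if wf.getnchannels() != 1:
                problems.append(f"expected 1 channel, found {wf.getnchannels()}")
            if wf.getsampwidth() != 2:  # sample width is in bytes: 2 bytes = 16 bits
                problems.append(f"expected 16-bit samples, found {wf.getsampwidth() * 8}-bit")
            if wf.getframerate() not in allowed_rates:
                problems.append(f"unexpected sample rate {wf.getframerate()} Hz")
    except (wave.Error, EOFError) as exc:
        # An MP3 file ends up here because it has no RIFF/WAVE header.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

Calling it on an MP3 file should report a single "not a readable PCM WAV file" problem, which is the same kind of header mismatch the SDK appears to be complaining about.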

,

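The default audio stream format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Besides WAV/PCM, the compressed input formats listed below are also supported.

However, if you are using C#/Java/C++/Objective-C and want to use a compressed audio format such as .mp3, you can use GStreamer to handle it.

For more information, see this Microsoft documentation: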

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams

,

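I suppose there is no official way to use a different format (mp3 or a different frame rate) with the SDK. I would like an Azure method that accepts any kind of audio file as input.

Until now I have handled this with a workaround of my own: first convert the file to a suitable format, then delete the converted copy once my work is done. The original file is kept:

For Python: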

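fname_buf = fname  # remember the original file name
fname = self.AudioFileAdjust(fname, 'test-it')  # returns a converted copy if one was needed

# Do something with fname

# Delete only the converted copy; the original file is left untouched
if fname_buf != fname:
    self.AudioFileAdjust(fname, 'remove')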

The helper function AudioFileAdjust (I am using pydub and pyaudio):

import os

def AudioFileAdjust(self, fname, states=''):
    '''
    Check the audio file format; if it is not suitable for Azure,
    write a converted copy and return its name.
    With states == 'remove', delete fname instead.
    '''
    if states == 'remove':
        os.remove(fname)
    else:
        # Azure needs a 16000 Hz frame rate, so resample the file if necessary
        # (au is not defined in this post; presumably the author's own audio helper module)
        audio_file = au.ReadAudioFile(fname)
        if audio_file.frame_rate != 16000:
            audio_file_e = au.SetFramerate(audio_file, 16000)
            # Build the new name from the path without its extension
            fname2 = os.path.splitext(fname)[0] + "_Conv_2" + ".wav"
            au.ExportAudioFile(audio_file_e, fname2)
            fname = fname2
    return fname
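
For readers who do not have the `au` helper, the same convert-then-delete idea can be sketched with pydub directly. This is a minimal sketch, not an official Azure method: the names `convert_to_azure_wav` and `azure_ready_wav` and the `convert` parameter are hypothetical, and it assumes pydub plus an ffmpeg install that can decode the input file.

```python
import os
import tempfile
from contextlib import contextmanager


def convert_to_azure_wav(src_path, dst_path):
    """Re-encode an audio file as 16 kHz, 16-bit, mono PCM WAV using pydub."""
    from pydub import AudioSegment  # third-party; needs ffmpeg for MP3 input

    audio = AudioSegment.from_file(src_path)
    audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    audio.export(dst_path, format="wav")


@contextmanager
def azure_ready_wav(src_path, convert=convert_to_azure_wav):
    """Yield the path of a temporary converted WAV and delete it afterwards.

    The original file at src_path is never modified or removed.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # only the path is needed; the converter reopens the file
    try:
        convert(src_path, tmp_path)
        yield tmp_path
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Inside a `with azure_ready_wav("speech.mp3") as wav_path:` block, `wav_path` can be passed as the file name of the SDK's audio config. Unlike the frame-rate check above, this sketch always converts, so MP3 input that already happens to be 16 kHz is also rewritten as WAV.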