[Getting Started with Speech Recognition] My-Voice-Analysis
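Introduction
My-Voice-Analysis is a Python library for analysing voice (simultaneous speech, high entropy). It breaks utterances apart and detects syllable boundaries, fundamental frequency contours, and formants. Its built-in functions cover:
- Gender recognition
- Speech mood analysis
- Pronunciation score
- Articulation rate
- Speech rate
- Filler words
- f0 statistics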
I. Installation
my-voice-analysis can be installed like any other Python library, using the Python package manager pip (latest version):
pip install my-voice-analysis
If that does not work, try the following instead:
pip install myprosody
II. Example usage
Audio files must be in *.wav format, recorded at a 44 kHz sampling rate with 16-bit resolution.
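Because the library expects a specific recording format, it can help to check a file before analysing it. The following is a minimal sketch that uses only Python's standard `wave` module. The helper name `check_wav` is hypothetical and not part of My-Voice-Analysis, and the sketch assumes that "44 kHz" means the common 44100 Hz rate.

```python
import wave

EXPECTED_RATE_HZ = 44100   # assumption: "44 kHz" refers to 44100 Hz
EXPECTED_SAMPLE_WIDTH = 2  # 16 bits = 2 bytes per sample


def check_wav(path):
    """Return (ok, details) for an uncompressed PCM .wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    # ok is True only if both the sampling rate and the bit depth match.
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH
    details = {"rate_hz": rate, "bits": width * 8, "channels": channels}
    return ok, details
```

Calling `check_wav("Walkers.wav")` returns a flag plus the measured properties, so a mismatching file can be resampled before it is passed to the functions below.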
1. Gender recognition and speech mood: function myspgend(p,c)
# __init__.py
import numpy as np
from parselmouth.praat import run_file
from scipy.stats import ks_2samp, ttest_ind

def myspgend(m,p):
    sound=p+"/"+m+".wav"  # full path to the audio file
    sourcerun=p+"/myspsolution.praat"  # Praat script, expected in the same directory
    path=p+"/"
    try:
        objects= run_file(sourcerun, -20, 2, 0.3, "yes",sound,path, 80, 400, 0.01, capture_output=True)
        print (objects[0]) # objects[0]: the Praat objects returned by the script
        z1=str( objects[1]) # objects[1]: the text output captured from the script (capture_output=True)
        z2=z1.strip().split()
        z3=float(z2[8]) # field 8 of the script output; passed to teset as a spread parameter
        z4=float(z2[7]) # field 7 of the script output; the value compared with the thresholds below
        # Pick the reference values g and j for the band that z4 falls into.
        if z4<=114:
            g=101
            j=3.4
        elif z4>114 and z4<=135:
            g=128
            j=4.35
        elif z4>135 and z4<=163:
            g=142
            j=4.85
        elif z4>163 and z4<=197:
            g=182
            j=2.7
        elif z4>197 and z4<=226:
            g=213
            j=4.5
        elif z4>226:
            g=239
            j=5.3
        else:
            # Only reachable if z4 is NaN. Return instead of calling exit(),
            # which the except clause below would otherwise intercept.
            print("Voice not recognized")
            return
        def teset(a,b,c,d):
            # Compare two pairs of random samples: a KS test on Wald samples
            # and a t-test on normal samples.
            d1=np.random.wald(a, 1, 1000)
            d2=np.random.wald(b,1,1000)
            d3=ks_2samp(d1, d2)
            c1=np.random.normal(a,c,1000)
            c2=np.random.normal(b,d,1000)
            c3=ttest_ind(c1,c2)
            y=([d3[0],d3[1],abs(c3[0]),c3[1]])
            return y
        nn=0
        mm=teset(g,j,z4,z3)
        # Repeat the test at least five times, and until the stopping condition is met.
        while ((mm[3]>0.05 and mm[0]>0.04) or nn<5):
            mm=teset(g,j,z4,z3)
            nn=nn+1
        nnn=nn
        if mm[3]<=0.09:
            mmm=mm[3]
        else:
            mmm=0.35
        # Report gender and mood for the band that z4 falls into.
        if z4>97 and z4<=114:
            print("a Male, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif z4>114 and z4<=135:
            print("a Male, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif z4>135 and z4<=163:
            print("a Male, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif z4>163 and z4<=197:
            print("a female, mood of speech: Showing no emotion, normal, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif z4>197 and z4<=226:
            print("a female, mood of speech: Reading, p-value/sample size= :%.2f" % (mmm), (nnn))
        elif z4>226 and z4<=245:
            print("a female, mood of speech: speaking passionately, p-value/sample size= :%.2f" % (mmm), (nnn))
        else:
            print("Voice not recognized")
    except Exception:
        print ("Try again the sound of the audio was not clear")
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspgend(p,c)
[out] a female, mood of speech: Reading, p-value/sample size= :0.00 5
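The final if/elif chain in myspgend maps the measured value z4 to a gender and mood label. The same mapping can be written as a small lookup table, which makes the band edges easier to read. This is only an illustrative sketch: `classify_f0` and `F0_BANDS` are hypothetical names, the band edges are copied from the listing above, and treating z4 as a fundamental frequency in Hz is an assumption.

```python
# Band edges copied from the print chain of the myspgend listing:
# (lower bound, exclusive; upper bound, inclusive; gender; mood).
F0_BANDS = [
    (97, 114, "male", "showing no emotion, normal"),
    (114, 135, "male", "reading"),
    (135, 163, "male", "speaking passionately"),
    (163, 197, "female", "showing no emotion, normal"),
    (197, 226, "female", "reading"),
    (226, 245, "female", "speaking passionately"),
]


def classify_f0(f0):
    """Return (gender, mood) for an f0 value, or None if it is outside all bands."""
    for low, high, gender, mood in F0_BANDS:
        if low < f0 <= high:
            return gender, mood
    return None
```

For the sample file, the f0 mean of 212.45 Hz reported in section 10 falls in the 197 to 226 band, which is consistent with the "a female, mood of speech: Reading" output above.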
2. Pronunciation posteriori probability score percentage: function mysppron(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.mysppron(p,c)
[out] Pronunciation_posteriori_probability_score_percentage= :85.00
3. Detect and count the number of syllables: function myspsyl(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspsyl(p,c)
[out] number_ of_syllables= 154
4. Detect and count the number of fillers and pauses: function mysppaus(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.mysppaus(p,c)
[out] number_of_pauses= 22
5. Measure the rate of speech (speed): function myspsr(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspsr(p,c)
[out] rate_of_speech= 3 # syllables/sec original duration
6. Measure the articulation rate (speed): function myspatc(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspatc(p,c)
[out] articulation_rate= 5 # syllables/sec speaking duration
7. Measure the speaking duration (excluding fillers and pauses): function myspst(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspst(p,c)
[out] speaking_duration= 31.6 # sec only speaking duration without pauses
8. Measure the total speaking duration (including fillers and pauses): function myspod(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspod(p,c)
[out] original_duration= 49.2 # sec total speaking duration with pauses
9. Measure the ratio of speaking duration to total speaking duration: function myspbala(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspbala(p,c)
[out] balance= 0.6 # ratio (speaking duration)/(original duration)
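The sample outputs in sections 3 and 5 to 9 are consistent with one another if the rates are simple ratios, as the unit comments suggest (syllables per second of original duration, syllables per second of speaking duration). The sketch below redoes that arithmetic with the values printed above. The rounding used to match the printed figures is an assumption about how the library formats its results.

```python
# Values taken from the sample outputs above.
syllables = 154           # number_ of_syllables
speaking_duration = 31.6  # seconds, without pauses
original_duration = 49.2  # seconds, with pauses

rate_of_speech = syllables / original_duration     # syllables/sec, original duration
articulation_rate = syllables / speaking_duration  # syllables/sec, speaking duration
balance = speaking_duration / original_duration    # speaking / original

print(round(rate_of_speech))     # 3
print(round(articulation_rate))  # 5
print(round(balance, 1))         # 0.6
```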
10. Measure the mean of the fundamental frequency distribution: function myspf0mean(p,c)
[in] import myspsolution as mysp
p="Walkers" # Audio File title
c=r"C:\Users\Shahab\Desktop\Mysp" # Path to the Audio_File directory (Python 3.7)
mysp.myspf0mean(p,c)
[out] f0_mean= 212.45 # Hz global mean of fundamental frequency distribution
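The myspgend listing prints its result instead of returning it, and the [out] lines for the other functions suggest that they do the same. If the values are needed inside a program, one option is to capture standard output and parse it. The sketch below is a helper built on that assumption and is not part of the library: `capture_value` is a hypothetical name, `fake_f0mean` is a stand-in used only for demonstration, and the regular expression assumes output of the form `name= number`, as in the samples above.

```python
import contextlib
import io
import re


def capture_value(func, *args):
    """Call func(*args), capture what it prints, and return (name, value).

    Returns None if no 'name= number' pattern is found in the output.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    match = re.search(r"(\S+)=\s*:?\s*(-?\d+(?:\.\d+)?)", buffer.getvalue())
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def fake_f0mean(p, c):
    # Stand-in that mimics the sample output of myspf0mean, for demonstration only.
    print("f0_mean= 212.45 # Hz global mean of fundamental frequency distribution")


print(capture_value(fake_f0mean, "Walkers", "some/dir"))  # ('f0_mean', 212.45)
```

With the library installed, the stand-in would be replaced by a real call such as `capture_value(mysp.myspf0mean, p, c)`. Parsing printed text is fragile, so any change in the output format would require adjusting the pattern.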
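III. Development
My-Voice-Analysis was developed by Sab-AI Lab in Japan (formerly known as Mysolution). It is part of a Sab-AI Lab project to develop acoustic models of linguistics. The plan is to enrich My-Voice-Analysis by adding more advanced functions as well as a language model.
See also Myprosody (https://github.com/Shahabks/myprosody) and Speech-Rater (https://shahabks.github.io/Speech-Rater/).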