Mel-spectrogram extraction and the Griffin-Lim vocoder [Python code analysis]

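In speech analysis, synthesis, and conversion, the first step is usually to extract acoustic feature parameters from the speech signal.
When these tasks are tackled with machine learning, the mel spectrogram is one of the most commonly used features.
This post shows how to extract a mel spectrogram from an audio file and how to turn a mel spectrogram back into an audio waveform.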

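Extracting a mel spectrogram from an audio waveform:

1. Apply pre-emphasis, framing, and windowing to the audio signal.
2. Apply the short-time Fourier transform (STFT) to each frame to obtain the short-time magnitude spectrum.
3. Pass the short-time magnitude spectrum through a mel filter bank to obtain the mel spectrogram.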
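
To make step 3 concrete, here is a minimal NumPy-only sketch of a mel filter bank. It is an illustration under assumptions, not the filter bank the code in this post uses: `toy_mel_filterbank` is a hypothetical helper, it uses the HTK-style formula mel = 2595 * log10(1 + f / 700), and it picks 80 bands arbitrarily. `librosa.filters.mel` defaults to the Slaney variant with area normalization, so its values will differ.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Hz -> mel conversion (assumed here for illustration)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def toy_mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are equally spaced on the mel axis."""
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

bank = toy_mel_filterbank(sr=24000, n_fft=2048, n_mels=80)
# Stand-in magnitude spectrogram: 1025 frequency bins, 10 frames.
mag = np.abs(np.random.default_rng(0).standard_normal((1025, 10)))
mel = bank @ mag  # each mel band is a weighted sum of linear-frequency bins
print(bank.shape, mel.shape)  # (80, 1025) (80, 10)
```

The matrix product is the whole "filter bank" step: it compresses 1025 linear-frequency bins into a smaller number of perceptually spaced bands.
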
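Reconstructing the audio waveform from a mel spectrogram:

1. Convert the mel spectrogram back to a magnitude spectrogram.
2. Reconstruct the waveform with the Griffin-Lim vocoder algorithm.
3. Apply de-emphasis.

There are many vocoders, such as WORLD and STRAIGHT, but Griffin-Lim is unusual: it can reconstruct a waveform from a spectrogram without any phase information. In practice it estimates the phase from the relationship between overlapping frames. The synthesized audio quality is reasonably good, and the code is fairly simple.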
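
The phase-estimation idea can be sketched without librosa. The following is a toy NumPy version, not the implementation used in this post: `stft`, `istft`, `griffin_lim_sketch`, and `spectral_error` are hypothetical helpers, the frame sizes (256-sample Hann window, hop of 64) and the 440 Hz test tone are arbitrary choices, and the phase starts from random values. Each iteration inverts the current complex spectrogram, re-analyzes the result, and keeps only the new phase while restoring the known magnitude.

```python
import numpy as np

N_FFT, HOP = 256, 64  # toy sizes chosen for this sketch

def stft(x):
    """Hann-windowed STFT, shape (frames, 1 + N_FFT // 2)."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec):
    """Least-squares (weighted overlap-add) inverse of `stft`."""
    window = np.hanning(N_FFT)
    length = N_FFT + HOP * (spec.shape[0] - 1)
    out, norm = np.zeros(length), np.zeros(length)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame * window
        norm[i * HOP:i * HOP + N_FFT] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim_sketch(mag, n_iter, seed=0):
    """Estimate a waveform whose STFT magnitude is close to `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        est = stft(istft(mag * phase))               # invert, then re-analyze
        phase = est / np.maximum(1e-8, np.abs(est))  # keep only the phase
    return istft(mag * phase)

def spectral_error(x, mag):
    """Relative distance between |STFT(x)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(x)) - mag) / np.linalg.norm(mag)

t = np.arange(4096) / 8000.0
x = np.sin(2 * np.pi * 440.0 * t)
mag = np.abs(stft(x))  # the phase is discarded here
y_few = griffin_lim_sketch(mag, n_iter=1)
y_many = griffin_lim_sketch(mag, n_iter=50)
print(spectral_error(y_few, mag), spectral_error(y_many, mag))
```

The printed error after 50 iterations should not exceed the error after one iteration, which is the sense in which the overlapping frames "agree" on a phase.
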
From audio waveform to mel-spectrogram

import copy

import librosa
import numpy as np
from scipy import signal

sr = 24000 # Sample rate.
n_fft = 2048 # fft points (samples)
frame_shift = 0.0125 # seconds
frame_length = 0.05 # seconds
hop_length = int(sr*frame_shift) # samples.
win_length = int(sr*frame_length) # samples.
n_mels = 512 # Number of Mel banks to generate
power = 1.2 # Exponent for amplifying the predicted magnitude
n_iter = 100 # Number of inversion iterations
preemphasis = .97 # or None
max_db = 100
ref_db = 20
top_db = 15
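
As a quick check of what these settings mean in samples, the arithmetic can be worked out directly. This is a small sketch: `n_frames` is a hypothetical helper that assumes librosa's default centered STFT (`center=True`), and the two-second clip length is an arbitrary example.

```python
sr = 24000
n_fft = 2048
frame_shift = 0.0125   # seconds
frame_length = 0.05    # seconds

hop_length = int(sr * frame_shift)    # samples between frame starts
win_length = int(sr * frame_length)   # samples per analysis window
n_bins = 1 + n_fft // 2               # frequency bins per frame

def n_frames(n_samples, hop_length):
    """Frame count for a centered STFT (assumes center=True)."""
    return 1 + n_samples // hop_length

print(hop_length, win_length, n_bins)   # 300 1200 1025
print(n_frames(2 * sr, hop_length))     # 161 frames for a 2-second clip
print(1 - hop_length / win_length)      # 0.75, i.e. 75% overlap
```

Because `win_length` (1200) is smaller than `n_fft` (2048), each windowed frame is zero-padded to 2048 samples before the FFT.
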
def get_spectrograms(fpath):
    '''Returns normalized log(melspectrogram) and log(magnitude) from `fpath`.
    Args:
      fpath: A string. The full path of a sound file.

    Returns:
      mel: A 2d array of shape (T, n_mels) <- Transposed
      mag: A 2d array of shape (T, 1+n_fft/2) <- Transposed
    '''
    # Loading sound file, resampled to the global `sr`.
    # The returned rate is discarded: assigning it to `sr` would make `sr`
    # a local name and raise UnboundLocalError on this very line.
    y, _ = librosa.load(fpath, sr=sr)

    # Trimming leading and trailing silence
    y, _ = librosa.effects.trim(y, top_db=top_db)

    # Preemphasis
    y = np.append(y[0], y[1:] - preemphasis * y[:-1])

    # stft
    linear = librosa.stft(y=y,
                          n_fft=n_fft,
                          hop_length=hop_length,
                          win_length=win_length)

    # magnitude spectrogram
    mag = np.abs(linear)  # (1+n_fft//2, T)

    # mel spectrogram (keyword arguments are required by recent librosa versions)
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (n_mels, 1+n_fft//2)
    mel = np.dot(mel_basis, mag)  # (n_mels, T)

    # to decibel
    mel = 20 * np.log10(np.maximum(1e-5, mel))
    mag = 20 * np.log10(np.maximum(1e-5, mag))

    # normalize to (0, 1]
    mel = np.clip((mel - ref_db + max_db) / max_db, 1e-8, 1)
    mag = np.clip((mag - ref_db + max_db) / max_db, 1e-8, 1)

    # Transpose
    mel = mel.T.astype(np.float32)  # (T, n_mels)
    mag = mag.T.astype(np.float32)  # (T, 1+n_fft//2)

    return mel, mag
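
The decibel conversion and normalization are easier to follow on a few concrete amplitudes. The sketch below reuses the post's `max_db = 100` and `ref_db = 20`; `normalize` and `denormalize` are hypothetical helper names that mirror the forward steps in `get_spectrograms` and the inverse steps used during reconstruction, and the sample amplitudes are made up.

```python
import numpy as np

max_db, ref_db = 100, 20

def normalize(amplitude):
    """Amplitude -> dB -> clipped range (0, 1], as in get_spectrograms."""
    db = 20 * np.log10(np.maximum(1e-5, amplitude))
    return np.clip((db - ref_db + max_db) / max_db, 1e-8, 1)

def denormalize(norm):
    """Clipped (0, 1] value -> dB -> amplitude, as in the inverse functions."""
    db = np.clip(norm, 0, 1) * max_db - max_db + ref_db
    return np.power(10.0, db * 0.05)

amps = np.array([1e-4, 1e-2, 1.0, 10.0, 100.0])
norm = normalize(amps)
print(np.round(norm, 3))   # approximately 0, 0.4, 0.8, 1, 1
print(denormalize(norm))   # approximately 1e-4, 1e-2, 1, 10, 10
```

An amplitude of 10 (20 dB, equal to `ref_db`) maps to 1.0 and anything at or below -80 dB maps to the floor, so the round trip is lossy outside that 100 dB window: the amplitude 100 comes back as 10.
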

From mel-spectrogram to audio waveform

def melspectrogram2wav(mel):
    '''Generate a waveform from a normalized mel spectrogram of shape (T, n_mels).'''
    # transpose
    mel = mel.T

    # de-normalize
    mel = (np.clip(mel, 0, 1) * max_db) - max_db + ref_db

    # to amplitude
    mel = np.power(10.0, mel * 0.05)
    m = _mel_to_linear_matrix(sr, n_fft, n_mels)
    mag = np.dot(m, mel)

    # wav reconstruction
    wav = griffin_lim(mag)

    # de-preemphasis
    wav = signal.lfilter([1], [1, -preemphasis], wav)

    # trim
    wav, _ = librosa.effects.trim(wav)

    return wav.astype(np.float32)

def spectrogram2wav(mag):
    '''Generate a waveform from a normalized magnitude spectrogram of shape (T, 1+n_fft//2).'''
    # transpose
    mag = mag.T

    # de-normalize
    mag = (np.clip(mag, 0, 1) * max_db) - max_db + ref_db

    # to amplitude
    mag = np.power(10.0, mag * 0.05)

    # wav reconstruction
    wav = griffin_lim(mag)

    # de-preemphasis
    wav = signal.lfilter([1], [1, -preemphasis], wav)

    # trim
    wav, _ = librosa.effects.trim(wav)

    return wav.astype(np.float32)
A few helper functions:

def _mel_to_linear_matrix(sr, n_fft, n_mels):
    '''Approximate inverse of the mel filter bank: a column-normalized transpose.'''
    m = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    m_t = np.transpose(m)
    p = np.matmul(m, m_t)
    d = [1.0 / x if np.abs(x) > 1.0e-8 else x for x in np.sum(p, axis=0)]
    return np.matmul(m_t, np.diag(d))

def griffin_lim(spectrogram):
    '''Applies the Griffin-Lim algorithm to a magnitude spectrogram of shape [f, t].
    '''
    X_best = copy.deepcopy(spectrogram)
    for i in range(n_iter):
        X_t = invert_spectrogram(X_best)
        # Keyword arguments are required by recent librosa versions.
        est = librosa.stft(y=X_t, n_fft=n_fft, hop_length=hop_length, win_length=win_length)
        # Keep only the estimated phase and restore the known magnitude.
        phase = est / np.maximum(1e-8, np.abs(est))
        X_best = spectrogram * phase
    X_t = invert_spectrogram(X_best)
    y = np.real(X_t)

    return y

def invert_spectrogram(spectrogram):
    '''
    spectrogram: [f, t]
    '''
    return librosa.istft(spectrogram, hop_length=hop_length, win_length=win_length, window="hann")
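
It is worth seeing what `_mel_to_linear_matrix` actually computes, since a mel filter bank has no exact inverse. The sketch below applies the same formula to a made-up 2 x 4 filter bank (the matrix `m` and the name `mel_to_linear_matrix` are assumptions for illustration) and compares it with NumPy's Moore-Penrose pseudo-inverse.

```python
import numpy as np

def mel_to_linear_matrix(m):
    """Same formula as _mel_to_linear_matrix, applied to a given filter bank."""
    p = m @ m.T
    col_sums = p.sum(axis=0)
    d = np.array([1.0 / s if abs(s) > 1.0e-8 else s for s in col_sums])
    return m.T @ np.diag(d)

# Toy filter bank: 2 mel bands over 4 linear-frequency bins.
m = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 1.0, 0.5]])

inv = mel_to_linear_matrix(m)
round_trip = m @ inv                 # mel -> linear -> mel
print(inv.shape)                     # (4, 2)
print(round_trip.sum(axis=0))        # each column sums to 1
print(m @ np.linalg.pinv(m))         # identity, up to rounding
```

The normalized transpose only guarantees that each column of the mel-to-linear-to-mel round trip sums to one. It is cheap and keeps every entry non-negative, but it smears energy between overlapping bands, whereas the pseudo-inverse round-trips exactly at the cost of possibly negative magnitudes.
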

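Pre-emphasis:
The average power spectrum of a speech signal is shaped by the glottal excitation and by radiation at the lips and nostrils: above roughly 800 Hz, the high-frequency end falls off at about 6 dB/octave. Pre-emphasis boosts the high-frequency components to flatten the spectrum, which makes spectral analysis and vocal-tract parameter analysis easier.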
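
A short sketch shows that the pre-emphasis filter y[n] = x[n] - 0.97 * x[n-1] is exactly undone by the recursive de-emphasis filter x[n] = y[n] + 0.97 * x[n-1], which is what `signal.lfilter([1], [1, -preemphasis], wav)` computes. The helper names, the random test signal, and the two probe frequencies are assumptions for illustration.

```python
import numpy as np

def preemphasize(x, coef=0.97):
    """First-order high-pass: y[n] = x[n] - coef * x[n-1]."""
    return np.append(x[0], x[1:] - coef * x[:-1])

def deemphasize(y, coef=0.97):
    """Inverse recursion: x[n] = y[n] + coef * x[n-1], zero initial state."""
    x = np.zeros_like(y)
    prev = 0.0
    for n, sample in enumerate(y):
        prev = sample + coef * prev
        x[n] = prev
    return x

def gain_db(freq_hz, sr=24000, coef=0.97):
    """Magnitude response of the pre-emphasis filter at one frequency, in dB."""
    w = 2 * np.pi * freq_hz / sr
    return 20 * np.log10(np.abs(1 - coef * np.exp(-1j * w)))

x = np.random.default_rng(0).standard_normal(1000)
restored = deemphasize(preemphasize(x))
print(np.allclose(restored, x))           # True
print(gain_db(100.0) < gain_db(8000.0))   # True: highs are boosted relative to lows
```
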