[Continuously updated]

For brevity, everything below assumes: import librosa

display

specshow(data[, x_coords, y_coords, x_axis, …]) Display a spectrogram/chromagram/cqt/etc.
waveplot(y[, sr, max_points, x_axis, …]) Plot the amplitude envelope of a waveform.
cmap(data[, robust, cmap_seq, cmap_bool, …]) Get a default colormap from the given data.
TimeFormatter([lag, unit]) A tick formatter for time axes.
NoteFormatter([octave, major]) Ticker formatter for Notes
LogHzFormatter([major]) Ticker formatter for logarithmic frequency
ChromaFormatter A formatter for chroma axes
TonnetzFormatter A formatter for tonnetz axes

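Reference [1] covers many applications of librosa and points out that the librosa.display module is not imported together with librosa by default, so it has to be imported explicitly: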

import librosa.display

waveplot

Plot the amplitude envelope of a waveform.

If y is monophonic, a filled curve is drawn between [-abs(y), abs(y)].

If y is stereo, the curve is drawn between [-abs(y[1]), abs(y[0])], so that the left and right channels are drawn above and below the axis, respectively.

Long signals (more than max_points samples) are down-sampled to an effective rate of at most max_sr before plotting.

librosa.display.waveplot(y, sr=22050, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000, ax=None, **kwargs)
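The envelope rules described above can be sketched with NumPy. This is a simplified approximation, not librosa's actual code: it assumes the envelope is the maximum of abs(y) over non-overlapping blocks of hop = sr // target_sr samples, with target_sr = min(max_sr, sr * n_samples // max_points). The helpers block_envelope and waveplot_bounds are hypothetical names used only for illustration.

```python
import numpy as np


def block_envelope(y, hop):
    """Max of |y| over non-overlapping blocks of `hop` samples along the last axis.

    Hypothetical helper; trailing samples that do not fill a whole block are dropped.
    """
    n = (y.shape[-1] // hop) * hop
    blocks = np.abs(y[..., :n]).reshape(*y.shape[:-1], -1, hop)
    return blocks.max(axis=-1)


def waveplot_bounds(y, sr=22050, max_points=50000, max_sr=1000):
    """Return the (upper, lower) curves to fill between.

    Assumption: the down-sampling rule approximates librosa's waveplot.
    """
    hop = 1
    if y.shape[-1] > max_points:
        # Long signal: reduce to an effective rate of at most max_sr
        target_sr = min(max_sr, (sr * y.shape[-1]) // max_points)
        hop = max(1, sr // target_sr)
    env = block_envelope(y, hop)
    if env.ndim == 1:
        # Mono: symmetric curve between -abs(y) and abs(y)
        return env, -env
    # Stereo: left channel y[0] above the axis, right channel y[1] below it
    return env[0], -env[1]


sr = 22050
mono = np.sin(2 * np.pi * 440 * np.arange(5 * sr) / sr)  # 5 s test tone
upper, lower = waveplot_bounds(mono, sr=sr)
print(len(upper))  # number of points that would actually be drawn
```

With these assumptions a 5 s signal at 22050 Hz is reduced to blocks of 22 samples, so only a few thousand points reach the plot instead of over 100 000.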

specshow

Display a spectrogram/chromagram/cqt/etc.

librosa.display.specshow(data, x_coords=None, y_coords=None, x_axis=None, y_axis=None, sr=22050, hop_length=512, fmin=None, fmax=None, tuning=0.0, bins_per_octave=12, ax=None, **kwargs)

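Note: in the source code sr defaults to 22050 Hz. If the audio file is sampled at 8 kHz or 16 kHz, you must pass the correct sample rate explicitly.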
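Why the sample rate matters: the time axis maps frame t to t * hop_length / sr seconds, and a linear frequency axis tops out at the Nyquist frequency sr / 2. The plain-Python arithmetic below (helper names are hypothetical) compares the true values for a 16 kHz file with what the default sr=22050 would label.

```python
def frame_to_seconds(frame, hop_length, sr):
    """Time (in seconds) assigned to STFT frame `frame`."""
    return frame * hop_length / sr


def nyquist(sr):
    """Highest frequency shown on a linear frequency axis."""
    return sr / 2


true_sr, default_sr, hop = 16000, 22050, 512
for sr in (true_sr, default_sr):
    print(f"sr={sr}: frame 100 at {frame_to_seconds(100, hop, sr):.3f} s, "
          f"top of axis {nyquist(sr):.0f} Hz")
```

With the wrong sr, every time label is scaled by 16000 / 22050 (about 0.73) and every frequency label by 22050 / 16000 (about 1.38), so the plot looks plausible but is mislabeled.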

The spectrogram can be displayed on different frequency scales by choosing y_axis={'linear', 'log', 'mel', 'cqt_hz', ...}.

stft / istft

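Short-time Fourier transform / inverse short-time Fourier transform. See also the blog post on the librosa source code, [librosa语音信号处理] ("librosa speech signal processing").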

librosa.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, pad_mode='reflect')

librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window='hann', center=True, dtype=np.complex64, pad_mode='reflect')   # This function caches at level 20.

The STFT represents a signal in the time-frequency domain by computing discrete Fourier transforms (DFT) over short overlapping windows. This function returns a complex-valued matrix D such that

  • np.abs(D[f, t]) is the magnitude of frequency bin f at frame t, and
  • np.angle(D[f, t]) is the phase of frequency bin f at frame t.
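As a rough illustration of this definition, here is a minimal NumPy sketch of a centered STFT: reflect-pad by n_fft // 2, slice a frame every hop_length samples, apply a periodic Hann window, and take a real FFT of each frame. It assumes win_length == n_fft and is not librosa's implementation (which also handles dtype, caching and arbitrary windows); stft_sketch is a hypothetical name.

```python
import numpy as np


def stft_sketch(y, n_fft=2048, hop_length=None):
    """Simplified centered STFT; returns shape (1 + n_fft // 2, n_frames)."""
    if hop_length is None:
        hop_length = n_fft // 4  # same default rule as librosa (win_length / 4)
    # Periodic Hann window of length n_fft (win_length == n_fft assumed)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    # center=True: pad so that frame t is centered at y[t * hop_length]
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)], axis=1
    )
    # One-sided DFT of each windowed column: rows are frequency bins
    return np.fft.rfft(frames * window[:, None], axis=0)


sr, n_fft = 22050, 2048
f0 = 100 * sr / n_fft  # a tone that falls exactly on frequency bin 100
y = np.sin(2 * np.pi * f0 * np.arange(sr) / sr)
D = stft_sketch(y, n_fft=n_fft)
magnitude, phase = np.abs(D), np.angle(D)
print(D.shape, np.argmax(magnitude[:, 20]))
```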
Parameters:
y : np.ndarray [shape=(n,)], real-valued

input signal

n_fft : int > 0 [scalar]

length of the windowed signal after padding with zeros. The number of rows in the STFT matrix D is (1 + n_fft/2). The default value, n_fft=2048 samples, corresponds to a physical duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the default sample rate in librosa. This value is well adapted for music signals. However, in speech processing, the recommended value is 512, corresponding to 23 milliseconds at a sample rate of 22050 Hz. In any case, we recommend setting n_fft to a power of two for optimizing the speed of the fast Fourier transform (FFT) algorithm.
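The figures quoted above follow from simple arithmetic: a frame of n_fft samples lasts n_fft / sr seconds, D has 1 + n_fft // 2 rows, and adjacent rows are sr / n_fft Hz apart. A quick check in plain Python (the helper names are illustrative only):

```python
def frame_ms(n_fft, sr):
    """Duration of one analysis frame in milliseconds."""
    return 1000 * n_fft / sr


def n_rows(n_fft):
    """Number of frequency bins (rows) in the STFT matrix D."""
    return 1 + n_fft // 2


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; FFT sizes like these are the fastest."""
    return n > 0 and (n & (n - 1)) == 0


for n_fft in (2048, 512):
    print(n_fft, round(frame_ms(n_fft, 22050), 1), "ms,",
          n_rows(n_fft), "rows,", round(22050 / n_fft, 2), "Hz per bin,",
          "power of two:", is_power_of_two(n_fft))
```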

hop_length : int > 0 [scalar]

number of audio samples between adjacent STFT columns.

Smaller values increase the number of columns in D without affecting the frequency resolution of the STFT.

If unspecified, defaults to win_length / 4 (see below).

win_length : int <= n_fft [scalar]

Each frame of audio is windowed by window() of length win_length and then padded with zeros to match n_fft.

Smaller values improve the temporal resolution of the STFT (i.e. the ability to discriminate impulses that are closely spaced in time) at the expense of frequency resolution (i.e. the ability to discriminate pure tones that are closely spaced in frequency). This effect is known as the time-frequency localization tradeoff and needs to be adjusted according to the properties of the input signal y.

If unspecified, defaults to win_length = n_fft.
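When win_length < n_fft, the shorter window is zero-padded to n_fft samples. A NumPy sketch, assuming (as librosa's util.pad_center does) that the padding is split evenly so the window sits in the middle of the frame; padded_hann is a hypothetical helper.

```python
import numpy as np


def padded_hann(win_length, n_fft):
    """Periodic Hann window of win_length samples, zero-padded (centered) to n_fft."""
    if win_length > n_fft:
        raise ValueError("win_length must be <= n_fft")
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_length) / win_length)
    left = (n_fft - win_length) // 2
    return np.pad(w, (left, n_fft - win_length - left))


w = padded_hann(1024, 2048)
print(len(w), np.count_nonzero(w))
```

The shorter window sharpens time resolution; the extra zeros only interpolate the spectrum more finely and do not recover the frequency resolution given up by shortening the window.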

window : string, tuple, number, function, or np.ndarray [shape=(n_fft,)]

Either:

  • a window specification (string, tuple, or number); see scipy.signal.get_window
  • a window function, such as scipy.signal.hanning
  • a vector or array of length n_fft

Defaults to a raised cosine window (“hann”), which is adequate for most applications in audio signal processing.

center : boolean

If True, the signal y is padded so that frame D[:, t] is centered at y[t * hop_length].

If False, then D[:, t] begins at y[t * hop_length].

Defaults to True, which simplifies the alignment of D onto a time grid by means of librosa.core.frames_to_samples. Note, however, that center must be set to False when analyzing signals with librosa.stream.
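The effect of center on framing can be checked with integer arithmetic. A sketch, assuming win_length == n_fft with n_fft even: with center=True the signal gains n_fft // 2 samples on each side, so n_frames = 1 + len(y) // hop_length; with center=False, n_frames = 1 + (len(y) - n_fft) // hop_length. The helper names are hypothetical.

```python
def n_frames(n_samples, n_fft=2048, hop_length=512, center=True):
    """Number of STFT columns for a signal of n_samples samples."""
    if center:
        n_samples += 2 * (n_fft // 2)  # padding added on both sides
    if n_samples < n_fft:
        return 0  # too short for a single full frame
    return 1 + (n_samples - n_fft) // hop_length


def frame_center(t, n_fft=2048, hop_length=512, center=True):
    """Index of the sample of y that lies at the middle of frame t."""
    return t * hop_length if center else t * hop_length + n_fft // 2


for c in (True, False):
    print("center =", c, "->", n_frames(22050, center=c), "frames,",
          "frame 1 centered at sample", frame_center(1, center=c))
```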

dtype : numeric type

Complex numeric type for D. Default is single-precision floating-point complex (np.complex64).

pad_mode : string or function

If center=True, this argument is passed to np.pad for padding the edges of the signal y. By default (pad_mode="reflect"), y is padded on both sides with its own reflection, mirrored around its first and last sample respectively. If center=False, this argument is ignored.
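What pad_mode="reflect" does at the edges is easiest to see on a tiny array with np.pad itself; "constant" (zero padding) is shown for comparison.

```python
import numpy as np

y = np.array([1, 2, 3, 4])
# Mirror around the first and last samples (the edge samples are not repeated)
reflected = np.pad(y, 2, mode="reflect")
# Zero padding for comparison
zeros = np.pad(y, 2, mode="constant")
print(reflected)  # [3 2 1 2 3 4 3 2]
print(zeros)      # [0 0 1 2 3 4 0 0]
```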

Returns:
D : np.ndarray [shape=(1 + n_fft/2, n_frames), dtype=dtype]

Complex-valued matrix of short-term Fourier transform coefficients.


librosa.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, length=None)

librosa.core.istft(stft_matrix, hop_length=None, win_length=None, window='hann', center=True, dtype=np.float32, length=None)       # This function caches at level 30.

Converts a complex-valued spectrogram stft_matrix to time-series y by minimizing the mean squared error between stft_matrix and STFT of y as described in [2] up to Section 2 (reconstruction from MSTFT).

In general, the window function, hop length and other parameters should be the same as in stft; this usually gives perfect reconstruction of a signal from an unmodified stft_matrix.
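For an unmodified STFT, the least-squares reconstruction of [2] reduces to a weighted overlap-add: inverse-FFT each column, multiply by the window again, add the frames at their hop offsets, and divide by the sum of squared windows. The NumPy sketch below assumes a periodic Hann window, win_length == n_fft and center=True, and round-trips a random signal to illustrate the perfect reconstruction mentioned above. The function names are hypothetical; this is not librosa's code.

```python
import numpy as np


def hann(n):
    """Periodic Hann window."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(y, n_fft, hop):
    """Centered STFT (reflect padding), shape (1 + n_fft // 2, n_frames)."""
    w = hann(n_fft)
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames * w[:, None], axis=0)


def istft(D, hop, length):
    """Least-squares inverse: windowed overlap-add divided by the window-sum-square."""
    n_fft = 2 * (D.shape[0] - 1)
    w = hann(n_fft)
    frames = np.fft.irfft(D, n=n_fft, axis=0) * w[:, None]
    out_len = n_fft + hop * (D.shape[1] - 1)
    y = np.zeros(out_len)
    wss = np.zeros(out_len)
    for t in range(D.shape[1]):
        y[t * hop : t * hop + n_fft] += frames[:, t]
        wss[t * hop : t * hop + n_fft] += w ** 2
    nonzero = wss > 1e-10
    y[nonzero] /= wss[nonzero]
    # Drop the n_fft // 2 samples of centering padding and trim to `length`
    start = n_fft // 2
    return y[start : start + length]


rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
D = stft(x, n_fft=512, hop=128)
x_hat = istft(D, hop=128, length=len(x))
print(np.max(np.abs(x - x_hat)))  # reconstruction error, at round-off level
```

Dividing by the window-sum-square is what makes the result exact here; once the STFT has been modified (e.g. by masking), the same formula gives the least-squares estimate rather than an exact inverse.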

Parameters:
stft_matrix : np.ndarray [shape=(1 + n_fft/2, t)]

STFT matrix from stft

hop_length : int > 0 [scalar]

Number of audio samples between STFT columns. If unspecified, defaults to win_length / 4.

win_length : int <= n_fft = 2 * (stft_matrix.shape[0] - 1)

When reconstructing the time series, each frame is windowed and each sample is normalized by the sum of squared window according to the window function (see below).

If unspecified, defaults to n_fft.

window : string, tuple, number, function, np.ndarray [shape=(n_fft,)]
  • a window specification (string, tuple, or number); see scipy.signal.get_window
  • a window function, such as scipy.signal.hanning
  • a user-specified window vector of length n_fft
center : boolean
  • If True, D is assumed to have centered frames.
  • If False, D is assumed to have left-aligned frames.
dtype : numeric type

Real numeric type for y. Default is 32-bit float.

length : int > 0, optional

If provided, the output y is zero-padded or clipped to exactly length samples.

Returns:
y : np.ndarray [shape=(n,)]

time domain signal reconstructed from stft_matrix

Useful functions

effects.split

librosa.effects.split(y, top_db=60, ref=np.max, frame_length=2048, hop_length=512)

Split an audio signal into non-silent intervals. Parameter descriptions (from the source code):

Parameters:
y : np.ndarray, shape=(n,) or (2, n)

An audio signal

top_db : number > 0

The threshold (in decibels) below reference to consider as silence

ref : number or callable

The reference power. By default, it uses np.max and compares to the peak power in the signal.

frame_length : int > 0

The number of samples per analysis frame

hop_length : int > 0

The number of samples between analysis frames

Returns:
intervals : np.ndarray, shape=(m, 2)

intervals[i] == (start_i, end_i) are the start and end time (in samples) of non-silent interval i.
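A simplified NumPy sketch of the same idea, assuming non-centered frames, mean power per frame, and ref equal to the loudest frame's power (the np.max default). librosa's own implementation differs in its framing details, so its interval boundaries will not match exactly; split_sketch is a hypothetical name.

```python
import numpy as np


def split_sketch(y, top_db=60, frame_length=2048, hop_length=512):
    """Return an (m, 2) array of [start, end) sample indices of non-silent runs."""
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    # Mean power of each (non-centered) frame
    power = np.array([
        np.mean(y[t * hop_length : t * hop_length + frame_length] ** 2)
        for t in range(n_frames)
    ])
    # Level in dB relative to the loudest frame (ref = max power)
    ref = max(power.max(), 1e-10)
    db = 10 * np.log10(np.maximum(power, 1e-10) / ref)
    non_silent = db > -top_db
    # Run boundaries: diff is +1 where a run starts and -1 one frame past its end
    edges = np.flatnonzero(np.diff(np.concatenate(([0], non_silent.astype(int), [0]))))
    starts, ends = edges[0::2], edges[1::2]
    # Frame indices -> sample indices, clipped to the signal length
    start_samples = starts * hop_length
    end_samples = np.minimum((ends - 1) * hop_length + frame_length, len(y))
    return np.stack([start_samples, end_samples], axis=1)


sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])  # 1 s silence, 1 s tone, 1 s silence
print(split_sketch(y))
```

Because each frame spans 2048 samples, the detected interval starts slightly before and ends slightly after the tone; a smaller frame_length and hop_length tighten the boundaries at the cost of noisier level estimates.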

References

[1] https://www.cnblogs.com/xingshansi/p/6816308.html

[2] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236–243, Apr. 1984.
