Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

  1. The use of a linear prediction (LP) model to model the vocal tract
  2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
  3. A closed-loop search performed in a "perceptually weighted domain"

Note that this introduction is still incomplete.

Linear Prediction (LPC)

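Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal $x[n]$ using a linear combination of its past samples:

\[ y[n] = \sum_{i=1}^{N} a_i x[n-i] \]

where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:

\[ e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i] \]

The goal of the LPC analysis is to find the best prediction coefficients $a_i$ which minimize the quadratic error function over a frame of $L$ samples:

\[ E = \sum_{n=0}^{L-1} \left[ e[n] \right]^2 = \sum_{n=0}^{L-1} \left[ x[n] - \sum_{i=1}^{N} a_i x[n-i] \right]^2 \]

That can be done by making all derivatives $\partial E / \partial a_i$ equal to zero:

\[ \frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left[ x[n] - \sum_{j=1}^{N} a_j x[n-j] \right]^2 = 0 \]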
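
The step from the zero-derivative condition to a linear system can be written out as follows. This is a sketch under one assumption that the text does not state explicitly: the signal is windowed so that it is zero outside the analysis frame (the usual "autocorrelation method"), which lets the sums run over all $n$ and makes them depend only on the lag $|i-j|$. The symbol $R(m)$ denotes the auto-correlation of $x[n]$ at lag $m$.

```latex
\begin{align*}
\frac{\partial E}{\partial a_j}
  &= -2 \sum_{n} x[n-j] \left( x[n] - \sum_{i=1}^{N} a_i x[n-i] \right) = 0 \\
\Rightarrow \quad
\sum_{i=1}^{N} a_i \sum_{n} x[n-i]\, x[n-j]
  &= \sum_{n} x[n]\, x[n-j] \\
\Rightarrow \quad
\sum_{i=1}^{N} a_i R(|i-j|)
  &= R(j), \qquad j = 1, \dots, N
\end{align*}
```

These $N$ equations in the $N$ unknowns $a_i$ form the linear system that the Levinson-Durbin algorithm solves.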

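The $a_i$ filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation $R(m)$ of the signal $x[n]$:

\[ R(m) = \sum_{n=0}^{L-1} x[n] x[n-m] \]

For an order $N$ filter, we have:

\[ \mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix} \]

The filter coefficients $a_i$ are found by solving the system $\mathbf{R}\mathbf{a} = \mathbf{r}$. The Levinson-Durbin algorithm reduces the cost of solving this system to $\mathcal{O}(N^2)$ instead of $\mathcal{O}(N^3)$ by exploiting the fact that the matrix $\mathbf{R}$ is Toeplitz and Hermitian. Also, it can be proven that all the roots of $A(z)$ are within the unit circle, which means that $1/A(z)$ is always stable. That holds in theory; in practice, because of finite precision, two techniques are commonly used to make sure the filter is stable. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Second, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain and reduces sharp resonances.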
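
The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the Speex implementation: the function names are made up for this example, and the Gaussian shape and width of the lag window in `condition` are assumptions (the text only says that a window is applied). Only the factor 1.0001 comes from the text.

```python
import math


def autocorrelation(x, order):
    """R(m) = sum_n x[n] * x[n-m] for m = 0..order (zero outside the frame)."""
    return [sum(x[n] * x[n - m] for n in range(m, len(x)))
            for m in range(order + 1)]


def condition(R, noise_floor=1.0001, lag_sigma=0.01):
    """Stabilise R: Gaussian lag window (assumed shape) and a scaled R(0)."""
    out = [r * math.exp(-0.5 * (2.0 * math.pi * lag_sigma * m) ** 2)
           for m, r in enumerate(R)]
    out[0] *= noise_floor  # equivalent to adding a little white noise
    return out


def levinson_durbin(R, order):
    """Solve the Toeplitz system R a = r in O(N^2) operations.

    Returns (a, err) where a[i-1] holds a_i in the convention
    y[n] = sum_i a_i x[n-i], and err is the final prediction error energy.
    """
    a = [0.0] * order
    err = R[0]
    for i in range(order):
        # Reflection coefficient for stage i+1.
        acc = R[i + 1] - sum(a[j] * R[i - j] for j in range(i))
        k = acc / err
        # Update the first i coefficients, then append the new one.
        prev = a[:i]
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

A typical call chain would be `levinson_durbin(condition(autocorrelation(frame, 10)), 10)` for an order-10 filter, where `frame` is one windowed block of samples.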

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual):

\[ x[n] = \sum_{i=1}^{N} a_i x[n-i] + e[n] \]

In the z-domain, this can be expressed as

\[ X(z) = \frac{1}{A(z)} E(z) \]

where $A(z)$ is defined as

\[ A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i} \]

We usually refer to $A(z)$ as the analysis filter and $1/A(z)$ as the synthesis filter. The whole process is called short-term prediction because it predicts the signal $x[n]$ using only the $N$ past samples, where $N$ is usually around 10.
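
To make the roles of the two filters concrete, here is a small Python sketch. The function names are hypothetical and the filter memory is assumed to be zero before the first sample. Passing a signal through the analysis filter yields the excitation, and passing that excitation through the synthesis filter gives the signal back.

```python
def analysis_filter(x, a):
    """Apply A(z): e[n] = x[n] - sum_i a_i x[n-i], with a[i-1] holding a_i."""
    order = len(a)
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(min(order, n)))
            for n in range(len(x))]


def synthesis_filter(e, a):
    """Apply 1/A(z): x[n] = e[n] + sum_i a_i x[n-i]."""
    x = []
    for n in range(len(e)):
        prediction = sum(a[i] * x[n - 1 - i] for i in range(min(len(a), n)))
        x.append(e[n] + prediction)
    return x


signal = [1.0, 0.5, -0.25, 0.75, 0.1]
coeffs = [0.9, -0.2]            # illustrative values, not from a real frame
residual = analysis_filter(signal, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
```

Up to rounding, `rebuilt` equals `signal`, which is why a decoder only needs the coefficients and the excitation.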

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization; in particular, it is easy to keep the filter stable.

Pitch Prediction

During voiced segments, the speech signal is nearly periodic, so it is possible to take advantage of that property by approximating the excitation signal $e[n]$ by a gain times the past of the excitation:

\[ e[n] \approx p[n] = \beta e[n-T] \]

where $T$ is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.
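
One simple way to estimate the pitch period and gain is an open-loop search over candidate lags, sketched below in Python. This is an assumption made for illustration: an actual CELP encoder searches these parameters in closed loop in the weighted domain, and the function name and lag range here are hypothetical. For each lag, the gain that minimises the squared error is the correlation divided by the energy of the delayed signal.

```python
def pitch_search(e, t_min, t_max):
    """Return (T, beta) that best satisfies e[n] ~ beta * e[n-T]."""
    best_lag, best_gain, best_score = 0, 0.0, 0.0
    for lag in range(t_min, t_max + 1):
        corr = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        energy = sum(e[n - lag] ** 2 for n in range(lag, len(e)))
        if energy <= 0.0 or corr <= 0.0:
            continue
        # corr^2 / energy is the error reduction obtained with the best gain.
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


# Synthetic excitation: one pulse repeated every 20 samples, decaying by 0.8.
excitation = [0.0] * 100
excitation[0] = 1.0
for n in range(20, 100):
    excitation[n] = 0.8 * excitation[n - 20]

period, gain = pitch_search(excitation, 10, 40)
```

On this synthetic input the search recovers the period of 20 samples and a gain close to 0.8; the lag of 40 also correlates, but less strongly.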

Innovation Codebook

The final excitation $e[n]$ will be the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

\[ e[n] = p[n] + c[n] = \beta e[n-T] + c[n] \]

The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as

\[ X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)} \]
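
The decoder side of this model can be sketched as two cascaded recursions: the long-term (pitch) predictor turns the innovation into the excitation, and the short-term synthesis filter turns the excitation into the signal. The Python below is a minimal illustration with hypothetical function names, a single-tap pitch predictor, and zero initial filter memory; it is not how any particular codec organises its buffers.

```python
def build_excitation(c, beta, T, history):
    """e[n] = beta * e[n-T] + c[n]; history holds at least T past samples."""
    if T < 1 or len(history) < T:
        raise ValueError("need at least T past excitation samples")
    buf = list(history)
    for sample in c:
        buf.append(beta * buf[-T] + sample)  # buf[-T] is e[n-T]
    return buf[len(history):]


def synthesize(e, a):
    """Apply 1/A(z): x[n] = e[n] + sum_i a_i x[n-i], with a[i-1] holding a_i."""
    x = []
    for n in range(len(e)):
        prediction = sum(a[i] * x[n - 1 - i] for i in range(min(len(a), n)))
        x.append(e[n] + prediction)
    return x


innovation = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # a single pulse from the codebook
exc = build_excitation(innovation, beta=0.5, T=2, history=[0.0, 0.0])
speech = synthesize(exc, [0.5])                # toy first-order filter
```

Here the single pulse is repeated every 2 samples with gain 0.5, and the synthesis filter then smears each pulse into a decaying tail.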

Analysis-by-Synthesis and Error Weighting

Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error

\[ E = \sum_{n} \left( x[n] - \overline{x}[n] \right)^2 \]

where $\overline{x}[n]$ is the signal synthesized by the encoder, we minimize the error for the perceptually weighted signal

\[ X_w(z) = W(z) X(z) \]

where $W(z)$ is the weighting filter, usually of the form

\[ W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1) \]

with control parameters $\gamma_1 > \gamma_2$. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

\[ A_{noise}(z) = \frac{1}{W(z)} = \frac{A(z/\gamma_2)}{A(z/\gamma_1)} \]

If a filter $A(z)$ has (complex) roots at $p_i$ in the $z$-plane, the filter $A(z/\gamma)$ will have its roots at $p'_i = \gamma p_i$, making it a flatter version of $A(z)$.
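
In terms of coefficients, replacing $z$ by $z/\gamma$ simply scales the $i$-th coefficient by $\gamma^i$, so the weighting filter is cheap to build from the LPC coefficients. The Python sketch below shows this under stated assumptions: the function names are hypothetical, the default values 0.9 and 0.6 are illustrative choices satisfying $\gamma_1 > \gamma_2$, and the filter memory starts at zero.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma**i."""
    return [a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to the signal x."""
    num = bandwidth_expand(a, gamma1)   # numerator, FIR part
    den = bandwidth_expand(a, gamma2)   # denominator, recursive part
    y = []
    for n in range(len(x)):
        fir = x[n] - sum(num[i] * x[n - 1 - i] for i in range(min(len(num), n)))
        iir = sum(den[i] * y[n - 1 - i] for i in range(min(len(den), n)))
        y.append(fir + iir)
    return y


impulse_response = weighting_filter([1.0, 0.0, 0.0], [0.5])
```

With equal values of `gamma1` and `gamma2` the numerator and denominator cancel and the filter leaves the signal unchanged, which is a convenient sanity check.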

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters ($T$, $\beta$) and innovation signal $c[n]$, we do not work by making the excitation $e[n]$ as close as possible to the original one (which would be simpler), but apply the synthesis (and weighting) filter and try making $X_w(z)$ as close to the original as possible.
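
A toy version of such a closed-loop search over a fixed codebook might look like the Python below. It is a sketch under several assumptions: the names are hypothetical, `h` stands for the impulse response of the combined weighting and synthesis filter truncated to the frame length, the filter starts from zero state, and the codebook is searched exhaustively. Each candidate is synthesized, the best gain is computed in closed form, and the entry leaving the smallest weighted error is kept.

```python
def filter_with_impulse_response(c, h):
    """Zero-state response of the weighted synthesis filter to the vector c."""
    return [sum(c[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def search_codebook(target, codebook, h):
    """Return (index, gain) minimising |target - gain * (h * c_index)|^2."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, c in enumerate(codebook):
        y = filter_with_impulse_response(c, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        # With the optimal gain corr/energy, the error is |target|^2 - score.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


h = [1.0, 0.5, 0.25, 0.125]
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [1.0, 0.0, -1.0, 0.0]]
# Target built from entry 2 with gain 2, so the search should recover both.
target = [2.0 * v for v in filter_with_impulse_response(codebook[2], h)]
index, gain = search_codebook(target, codebook, h)
```

The comparison happens on the filtered vectors rather than on the codebook entries themselves, which is the difference between analysis-by-synthesis and matching the excitation directly.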

References:

1 Encyclopedia summary: https://zh.wikipedia.org/wiki/%E7%A0%81%E6%BF%80%E5%8A%B1%E7%BA%BF%E6%80%A7%E9%A2%84%E6%B5%8B
2 Detailed introduction: http://ntools.net/arc/Documents/speex/manual/node8.html
