Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

  1. The use of a linear prediction (LP) model to model the vocal tract
  2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
  3. The search performed in closed-loop in a ``perceptually weighted domain''

This section describes the basic ideas behind CELP. Note that it's still incomplete.

Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal  using a linear combination of its past samples:

where  is the linear prediction of . The prediction error is thus given by:

The goal of the LPC analysis is to find the best prediction coefficients  which minimize the quadratic error function:

That can be done by making all derivatives  equal to zero:

The  filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation  of the signal .

For an order  filter, we have:

The filter coefficients  are found by solving the system . What the Levinson-Durbin algorithm does here is making the solution to the problem instead of  by exploiting the fact that matrix  is toeplitz hermitian. Also, it can be proven that all the roots of  are within the unit circle, which means that  is always stable. This is in theory; in practice because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply  by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual).

In the z-domain, this can be expressed as

where  is defined as

We usually refer to  as the analysis filter and  as the synthesis filter. The whole process is called short-term prediction as it predicts the signal using a prediction using only the  past samples, where  is usually around 10.

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients which have a much better behaviour with quantization, one of them being that it's easy to keep the filter stable.

Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal  by a gain times the past of the excitation:

where  is the pitch period,  is the pitch gain. We call that long-term prediction since the excitation is predicted from  with .

Innovation Codebook

The final excitation  will be the sum of the pitch prediction and an innovation signal  taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

The quantization of  is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal  as

Analysis-by-Synthesis and Error Weighting

Most (if not all) modern audio codecs attempt to ``shape'' the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error

where  is the encoder signal, we minimize the error for the perceptually weighted signal

where  is the weighting filter, usually of the form

(1)

with control parameters . If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

If a filter  has (complex) poles at  in the -plane, the filter  will have its poles at , making it a flatter version of .

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters () and innovation signal , we do not work by making the excitation  as close as the original one (which would be simpler), but apply the synthesis (and weighting) filter and try making  as close to the original as possible.

参考资料:

1 百科总结: https://zh.wikipedia.org/wiki/%E7%A0%81%E6%BF%80%E5%8A%B1%E7%BA%BF%E6%80%A7%E9%A2%84%E6%B5%8B
2 详细介绍: http://ntools.net/arc/Documents/speex/manual/node8.html

Introduction to CELP Coding的更多相关文章

  1. Spark 大数据平台 Introduction part 2 coding

    Basic Functions sc.parallelize(List(1,2,3,4,5,6)).map(_ * 2).filter(_ > 5).collect() *** res: Arr ...

  2. 算术编码Arithmetic Coding-高质量代码实现详解

    关于算术编码的具体讲解我不多细说,本文按照下述三个部分构成. 两个例子分别说明怎么用算数编码进行编码以及解码(来源:ARITHMETIC CODING FOR DATA COIUPRESSION): ...

  3. Zen Coding in Visual Studio 2012

    http://www.johnpapa.net/zen-coding-in-visual-studio-2012 Zen Coding is a faster way to write HTML us ...

  4. Introduction to ASP.NET Web Programming Using the Razor Syntax (C#)

    1, http://www.asp.net/web-pages/overview/getting-started/introducing-razor-syntax-c 2, Introduction ...

  5. Top 10 Algorithms for Coding Interview--reference

    By X Wang Update History:Web Version latest update: 4/6/2014PDF Version latest update: 1/16/2014 The ...

  6. 转:Top 10 Algorithms for Coding Interview

    The following are top 10 algorithms related concepts in coding interview. I will try to illustrate t ...

  7. Github Coding Developer Book For LiuGuiLinAndroid

    Github Coding Developer Book For LiuGuiLinAndroid 收集了这么多开源的PDF,也许会帮到一些人,现在里面的书籍还不是很多,我也在一点点的上传,才上传不到 ...

  8. 使用Travis CI自动部署博客到github pages和coding pages

    每次换系统或换电脑之后重新部署博客总是很苦恼?想像jekyll那样,一次性部署完成后,以后本地不用安装环境直接 git push 就能生成博客?那推荐你应该使用使用 Travis CI了. 这篇文章我 ...

  9. Introduction to Parallel Computing

    Copied From:https://computing.llnl.gov/tutorials/parallel_comp/ Author: Blaise Barney, Lawrence Live ...

随机推荐

  1. MFC---关于string.h相关函数

    1.在VS2005中使用strcpy.strcat.sprintf出现如:mfc中'strcpy' was declared deprecated警告 这是因为VS2005中认为CRT中的一组函数如果 ...

  2. Codeforces Round #439 C. The Intriguing Obsession

    题意:给你三种不同颜色的点,每种若干(小于5000),在这些点中连线,要求同色的点的最短路大于等于3或者不连通,求有多少种连法. Examples Input 1 1 1 Output 8 Input ...

  3. OpenStack 安装:基本环境准备

    刚刚学完openstack,这几篇文章就算对过去课程的一个总结吧. 首先说说基本的结构:在一台Dell的workstation上面安装了VMware,在VMware上面安装两台CentOS,现在给每台 ...

  4. sqlserver truncate清空表时候,无法删除 'B表',因为该表正由一个 FOREIGN KEY 约束引用。

    外键: 查询:select object_name(a.parent_object_id) 'tables'  from sys.foreign_keys a  where a.referenced_ ...

  5. 实验吧“解码磁带”的write up

    在“实验吧”的做CTF题时遇到的一道题,地址在这里:http://ctf5.shiyanbar.com/misc/cidai.html 因为正在学python,做这道题的时候正好用python写个简单 ...

  6. python list初始化技巧

    一维列表 # 初始化递增的list,与L = [i for i in range(10)] 效果相同 L = range(10) # print(L) # [0,1,2,3,4,5,6,7,8,9] ...

  7. 169. Majority Element (Array)

    Given an array of size n, find the majority element. The majority element is the element that appear ...

  8. FortiGate安全策略说明

    1.安全策略原理 1)为了对数据流进行统一控制,方便用户配置和管理,FGT设备引入了安全策略的概念.通过配置安全策略,防火墙能够对经过设备的数据流进行有效的控制和管理. 2)当防火墙收到数据报文时,把 ...

  9. 113. Path Sum II 输出每个具体路径

    [抄题]: Given a binary tree and a sum, find all root-to-leaf paths where each path's sum equals the gi ...

  10. Java15-java语法基础(十五)——内部类

    java16-java语法基础(十五)内部类 一.内部类: 可以在一个类的内部定义另一个类,这种类称为内部类. 二.内部类分为两种类型: 1.静态内部类: 静态内部类是一个具有static修饰词的类, ...