[Audio processing] Dataset generation & gender/age classification training in Python

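1. Renaming. Python throws all kinds of errors on paths that contain Chinese characters, so every file first has to be renamed to a non-Chinese name. I am on a Mac, so a Windows command-line batch script is not an option, and I did it in C++ instead.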

//  Created by Carl on 16.
//  Copyright (c) 2016 Carl. All rights reserved.
//

#include <iostream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>

using namespace std;

// Move every regular file in sourceDir to targetDir as 0.wav, 1.wav, ...
void getFileList()
{
    string sourceDir = "/Users/karl/Work/database/rawdata/children_CN/";
    string targetDir = "/Users/karl/Work/database/rawdata/children/";
    DIR *dir;
    struct dirent *ptr;
    int i = 0;

    if ((dir = opendir(sourceDir.c_str())) == NULL)
    {
        perror("Open dir error...");
        exit(1);
    }
    while ((ptr = readdir(dir)) != NULL)
    {
        if (strcmp(ptr->d_name, ".") == 0 || strcmp(ptr->d_name, "..") == 0)    // current dir or parent dir
            continue;
        else if (ptr->d_type == DT_REG)    // regular file
        {
            string src = sourceDir + ptr->d_name;
            string dst = targetDir + to_string(i++) + ".wav";
            printf("%s %s\n", src.c_str(), dst.c_str());
            if (rename(src.c_str(), dst.c_str()) < 0)
                cout << "error" << endl;
            else
                cout << "ok" << endl;
        }
    }
    closedir(dir);
    return;
}

int main() {
    getFileList();
    return 0;
}
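For comparison, the same renaming can be sketched in Python 3, where paths are Unicode strings and non-ASCII file names are usually much less of a problem. This is only a sketch, not the code used above: `rename_to_numbers` and the demo file names are made up for illustration, and the names are sorted so the numbering is reproducible (the C++ version uses whatever order `readdir` returns).

```python
import os
import tempfile


def rename_to_numbers(source_dir, target_dir, ext=".wav"):
    """Move every regular file in source_dir to target_dir as 0.wav, 1.wav, ..."""
    os.makedirs(target_dir, exist_ok=True)
    mapping = {}
    # Sort so that the same input always gets the same numbering.
    names = sorted(n for n in os.listdir(source_dir)
                   if os.path.isfile(os.path.join(source_dir, n)))
    for i, name in enumerate(names):
        dst = os.path.join(target_dir, str(i) + ext)
        os.rename(os.path.join(source_dir, name), dst)
        mapping[name] = dst
    return mapping


# Demo in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "children_CN")
    dst = os.path.join(root, "children")
    os.makedirs(src)
    for name in ["小孩一.wav", "小孩二.wav"]:
        open(os.path.join(src, name), "wb").close()
    mapping = rename_to_numbers(src, dst)
    renamed = sorted(os.listdir(dst))
    print(renamed)
```

Returning the old-name to new-name mapping is a design choice of the sketch: it makes it possible to trace a numbered file back to its original recording later.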

2. Next, use the Python code from the FFMPEG post to convert all the audio files to one uniform format

#!/usr/bin/env python
# coding=utf-8
'''CREATED:2016-03-08
Use example of ffmpeg
'''

from __future__ import print_function

import os
import subprocess as sp

# Full path of ffmpeg
FFMPEG_BIN = "/Users/karl/Documents/python/audio/tool/ffmpeg"
# Full path of sourceDir
sourceDir = "/Users/karl/Work/database/rawdata/male/"
# Full path of targetDir
targetDir = "/Users/karl/Work/database/age/male/"
# Channel setting, 1 for mono
ac = 1
# Sample frequency in Hz
sf = 16000
# Extension setting
ext = 'wav'

def convert(sourceDir, targetDir, ac, sf, ext):
    i = 0
    if not os.path.exists(targetDir):
        os.makedirs(targetDir)
    files = os.listdir(sourceDir)
    for f in files:
        if f.endswith('.wav'):
            command = [FFMPEG_BIN,
                       '-i', os.path.join(sourceDir, f),
                       '-ac', str(ac),
                       '-ar', str(sf), os.path.join(targetDir, str(i) + "." + ext)]
            i += 1
            print(command)
            # Wait for each conversion to finish before starting the next one
            sp.call(command)

if __name__ == '__main__':
    convert(sourceDir, targetDir, ac, sf, ext)

3. Remove silent frames using time-domain RMS (optional)

#---Cut the silent head and tail of audio
import numpy as np

def rmsdemo(y):
    # Root mean square of one window
    return np.sqrt((y**2).mean())

def cutheadntail(y, winlen, threshold):
    totallen = y.shape[0]
    num = totallen // winlen    # number of complete windows
    i = 0
    j = num - 1
    # Scan from the head for the first window louder than the threshold
    for i in range(num):
        if rmsdemo(y[i * winlen : (i + 1) * winlen]) > threshold:
            break
    # Scan from the tail for the last such window, never going past i
    for j in range(num - 1, -1, -1):
        if rmsdemo(y[j * winlen : (j + 1) * winlen]) > threshold or j == i:
            break
    #percentage = (j - i + 1) * 1.0 / num;
    #print(i, j, percentage)
    yy = y[i * winlen : (j + 1) * winlen]
    return yy
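To see the trimming idea in action without any audio files, here is a small vectorized sketch on a synthetic signal. The names `frame_rms` and `trim_silence`, the 160-sample window, and the 0.01 threshold are all assumptions made for the example. Unlike the loop version above, it returns an empty array when no window exceeds the threshold.

```python
import numpy as np


def frame_rms(y, winlen):
    """RMS of each non-overlapping window of winlen samples (leftover tail is dropped)."""
    num = len(y) // winlen
    frames = y[:num * winlen].reshape(num, winlen)
    return np.sqrt((frames ** 2).mean(axis=1))


def trim_silence(y, winlen, threshold):
    """Keep the samples from the first to the last window whose RMS exceeds threshold."""
    loud = np.flatnonzero(frame_rms(y, winlen) > threshold)
    if loud.size == 0:
        return y[:0]  # nothing is louder than the threshold
    return y[loud[0] * winlen:(loud[-1] + 1) * winlen]


# Synthetic signal: 3 silent windows, 4 windows of a 440 Hz tone, 2 silent windows.
winlen = 160  # 10 ms at 16 kHz
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(4 * winlen) / 16000.0)
y = np.concatenate([np.zeros(3 * winlen), tone, np.zeros(2 * winlen)])

trimmed = trim_silence(y, winlen, threshold=0.01)
print(len(y), len(trimmed))
```

On real recordings the threshold would have to be tuned to the noise floor of the data, since a noisy "silent" head can easily exceed a value chosen for clean audio.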

4. Extract features with librosa, including MFCC and DMFCC

from __future__ import print_function

import os
import pprint

import numpy as np
import sklearn as sl
import librosa
# With the pip "libsvm" package this module may instead be libsvm.svmutil
from svmutil import svm_train, svm_predict, svm_save_model, svm_load_model

#---Feature extraction and store, including MFCC, DMFCC
def mfcclist(data_dir):
    m = []
    dm = []
    for i in range(300):
        filepath = os.path.join(data_dir, str(i) + '.wav')
        print(filepath)
        am, adm = mfccfile(filepath)
        m.append(am)
        dm.append(adm)
    # One row per file, 13 values per row
    np.savetxt("TrainFemaleMFCC", m, fmt='%s', newline='\n')
    np.savetxt("TrainFemaleDMFCC", dm, fmt='%s', newline='\n')
    #print(m)
    #print(dm)
    #fout = open(output_file,'w')
    #fout.write(str(am) + '\n')
    #fout.write(str(adm))
    #fout.close()

def mfccfile(input_file):
    print('Loading ', input_file)
    y, sr = librosa.load(input_file)
    # 13 MFCCs per frame, shape (13, n_frames)
    M = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # DMFCC: first-order difference between adjacent frames
    DM = M[:, 1:] - M[:, :-1]
    # Average over time to get one 13-dimensional vector per file
    am = np.mean(M, axis=1)
    adm = np.mean(DM, axis=1)
    return (am, adm)

#---Loading stored features file
def loadfeatures(features_file):
    with open(features_file, 'r') as fin:
        features = [[float(v) for v in ln.split()]
                    for ln in fin.read().splitlines() if ln.strip()]
    #pprint.pprint(features)
    print(features)
    return features
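The DMFCC step is easier to inspect on plain numbers. The sketch below uses a random matrix as a stand-in for a real MFCC matrix (the shape 13 x 50 and the random seed are assumptions). It checks that the slice-based difference is the same as `np.diff` along the frame axis, and that averaging the differences over time telescopes to (last frame - first frame) / (number of frames - 1).

```python
import numpy as np

# Stand-in for an MFCC matrix: 13 coefficients x 50 frames of random numbers.
rng = np.random.RandomState(0)
M = rng.randn(13, 50)

# First-order difference between adjacent frames, written two ways.
DM = M[:, 1:] - M[:, :-1]
DM_diff = np.diff(M, axis=1)

# Per-file features: time averages of the MFCCs and of their differences.
am = M.mean(axis=1)
adm = DM.mean(axis=1)

# The mean of the differences only depends on the first and last frame.
adm_closed = (M[:, -1] - M[:, 0]) / (M.shape[1] - 1)

print(np.allclose(DM, DM_diff), np.allclose(adm, adm_closed))
```

One consequence worth keeping in mind: after time averaging, the DMFCC vector carries little information about what happens in the middle of the clip. If that matters, statistics such as the mean absolute difference or the standard deviation of the differences could be tried instead, or `librosa.feature.delta` for a smoothed delta estimate.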

5. Train and predict with libsvm, including normalization

#---SVM training and predicting process
def svmtraindemo(x, modelname, scalar):
    # scalar: an already fitted scaler object that exposes transform()
    x = scalar.transform(x)
    #x = sl.preprocessing.scale(x)
    x = x.tolist()
    print(x)
    # Labels: the first 600 samples are +1, the last 600 are -1
    y = [1.0] * 300 + [1] * 300 + [-1.0] * 600
    model = svm_train(y, x, '-b 1')
    svm_save_model(modelname + str(0), model)
    # Accuracy on the training set itself
    p_label, p_acc, p_val = svm_predict(y[:1200], x[:1200], model, '-b 1')

def svmpredictdemo(x, modelname, scalar):
    # Reuse the scaler that was fitted on the training data
    x = scalar.transform(x)
    #x = sl.preprocessing.scale(x)
    x = x.tolist()
    print(len(x))
    # Labels: the first 200 samples are +1, the last 200 are -1
    y = [1.0] * 100 + [1] * 100 + [-1.0] * 200
    m = svm_load_model(modelname + str(0))
    p_label, p_acc, p_val = svm_predict(y[:400], x[:400], m, '-b 1')
    print(p_label)
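The functions above take a `scalar` object but do not show how it is built. A plausible reading is that it is a scaler fitted once on the training features and then reused unchanged on the test features, for example scikit-learn's `StandardScaler`. The sketch below illustrates that idea with a small NumPy stand-in: the class `MeanStdScaler`, the 26-dimensional features (13 MFCC + 13 DMFCC), and the random data are all assumptions, not taken from the original code.

```python
import numpy as np


class MeanStdScaler(object):
    """Hypothetical stand-in with the same fit/transform shape as a scikit-learn scaler."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)
        std = x.std(axis=0)
        # Avoid dividing by zero for constant columns.
        self.scale_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean_) / self.scale_


# Random stand-ins for the real feature matrices.
rng = np.random.RandomState(0)
train_x = rng.randn(1200, 26) * 5.0 + 3.0
test_x = rng.randn(400, 26) * 5.0 + 3.0

# Fit on the training set only, then apply the same statistics to both sets.
scalar = MeanStdScaler().fit(train_x)
train_scaled = scalar.transform(train_x)
test_scaled = scalar.transform(test_x)

print(train_scaled.mean(axis=0)[:3], train_scaled.std(axis=0)[:3])
```

Fitting on the training set only keeps the test data from leaking into the normalization statistics, which is also why scaling each set independently (as the commented-out `sl.preprocessing.scale(x)` line would do) can give slightly different results.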

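Notes:

1. Experiments showed that an unsupervised approach (more precisely, a rule-based one) is not very reliable for telling male, female, and child voices apart. Letting a supervised model learn the frequency-domain distribution by itself should be more dependable.

2. For children's voices, a model trained on noisy data and tested on noise-free data reached about 80% accuracy, while a model trained on noise-free data and tested on noisy data only reached 60+%. So it is better to train on datasets recorded in noisy environments; my earlier idea that training should use noise-free samples was naive. Mixing the two actually works quite well.

3. Male/female accuracy is also only about 80%. The sample distribution is fairly good and all samples contain noise, so I expect real-world performance not to be far below 80%.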
