Voice Gender Recognition: Extracting Features with R
Steps
1) Install R. Windows installer: https://cran.r-project.org/bin/windows/base/
2) Change the working directory to the folder containing the script
Click File > Change dir...
3) Run the script
Click File > Source R code...
Run the training-data script if you want to generate the training data yourself; run the test-data script if you only need to generate test data.
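If you prefer typing commands over the GUI menus, the same two steps can be done from the R console; the directory and file name below are placeholders for wherever you saved the script.
# Console equivalent of the menu steps (path and file name are placeholders).
setwd('C:/path/to/script/folder')     # File > Change dir...
source('generate_training_data.R')    # File > Source R code...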
Training-data script
Place the male audio files in the male folder and the female audio files in the female folder.
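Before running the script, an optional quick check (assuming only the male and female folder names used below) confirms that both folders are visible from the working directory and actually contain .wav files:
# Optional sanity check: count the .wav files the script will process.
sapply(c('male', 'female'), function(d) length(list.files(d, '\\.wav$', ignore.case = TRUE)))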
packages <- c('tuneR', 'seewave', 'fftw', 'caTools', 'warbleR', 'mice', 'e1071', 'rpart')
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
install.packages(setdiff(packages, rownames(installed.packages())))
}
library(tuneR)
library(seewave)
library(caTools)
library(rpart)
library(warbleR)
library(mice)
library(e1071)
specan3 <- function(X, bp = c(0,22), wl = 2048, threshold = 5, parallel = 1){
# To use parallel processing: library(devtools), install_github('nathanvan/parallelsugar')
if(class(X) == "data.frame") {if(all(c("sound.files", "selec",
"start", "end") %in% colnames(X)))
{
start <- as.numeric(unlist(X$start))
end <- as.numeric(unlist(X$end))
sound.files <- as.character(unlist(X$sound.files))
selec <- as.character(unlist(X$selec))
} else stop(paste(paste(c("sound.files", "selec", "start", "end")[!(c("sound.files", "selec",
"start", "end") %in% colnames(X))], collapse=", "), "column(s) not found in data frame"))
} else stop("X is not a data frame")
#if there are NAs in start or end stop
if(any(is.na(c(end, start)))) stop("NAs found in start and/or end")
#if end or start are not numeric stop
if(all(class(end) != "numeric" & class(start) != "numeric")) stop("'end' and 'start' must be numeric")
#if any start higher than end stop
if(any(end - start < 0)) stop(paste("The start is higher than the end in", length(which(end - start < 0)), "case(s)"))
#if any selections longer than 20 secs stop
if(any(end - start > 20)) stop(paste(length(which(end - start > 20)), "selection(s) longer than 20 sec"))
options(show.error.messages = TRUE)
#if bp is not vector or length!=2 stop
if(!is.vector(bp)) stop("'bp' must be a numeric vector of length 2") else{
if(!length(bp) == 2) stop("'bp' must be a numeric vector of length 2")}
#return warning if not all sound files were found
fs <- list.files(path = getwd(), pattern = ".wav$", ignore.case = TRUE)
if(length(unique(sound.files[(sound.files %in% fs)])) != length(unique(sound.files)))
cat(paste(length(unique(sound.files))-length(unique(sound.files[(sound.files %in% fs)])),
".wav file(s) not found"))
#count number of sound files in working directory and if stop
d <- which(sound.files %in% fs)
if(length(d) == 0){
stop("The .wav files are not in the working directory")
} else {
start <- start[d]
end <- end[d]
selec <- selec[d]
sound.files <- sound.files[d]
}
# If parallel is not numeric
if(!is.numeric(parallel)) stop("'parallel' must be a numeric vector of length 1")
if(any(!(parallel %% 1 == 0), parallel < 1)) stop("'parallel' should be a positive integer")
# If parallel was called
if(parallel > 1)
{ options(warn = -1)
if(all(Sys.info()[1] == "Windows", requireNamespace("parallelsugar", quietly = TRUE) == TRUE))
lapp <- function(X, FUN) parallelsugar::mclapply(X, FUN, mc.cores = parallel) else
if(Sys.info()[1] == "Windows"){
cat("Windows users need to install the 'parallelsugar' package for parallel computing (you are not doing it now!)")
lapp <- pbapply::pblapply} else lapp <- function(X, FUN) parallel::mclapply(X, FUN, mc.cores = parallel)} else lapp <- pbapply::pblapply
options(warn = 0)
if(parallel == 1) cat("Measuring acoustic parameters:")
x <- as.data.frame(lapp(1:length(start), function(i) {
r <- tuneR::readWave(file.path(getwd(), sound.files[i]), from = start[i], to = end[i], units = "seconds")
b <- bp #in case bp is higher than what the sampling rate allows
if(b[2] > ceiling(r@samp.rate/2000) - 1) b[2] <- ceiling(r@samp.rate/2000) - 1
#frequency spectrum analysis
songspec <- seewave::spec(r, f = r@samp.rate, plot = FALSE)
analysis <- seewave::specprop(songspec, f = r@samp.rate, flim = c(0, 280/1000), plot = FALSE)
#save parameters
meanfreq <- analysis$mean/1000
sd <- analysis$sd/1000
median <- analysis$median/1000
Q25 <- analysis$Q25/1000
Q75 <- analysis$Q75/1000
IQR <- analysis$IQR/1000
skew <- analysis$skewness
kurt <- analysis$kurtosis
sp.ent <- analysis$sh
sfm <- analysis$sfm
mode <- analysis$mode/1000
centroid <- analysis$cent/1000
#Frequency with amplitude peaks
peakf <- 0 #seewave::fpeaks(songspec, f = r@samp.rate, wl = wl, nmax = 3, plot = FALSE)[1, 1]
#Fundamental frequency parameters
ff <- seewave::fund(r, f = r@samp.rate, ovlp = 50, threshold = threshold,
fmax = 280, ylim = c(0, 280/1000), plot = FALSE, wl = wl)[, 2]
meanfun <- mean(ff, na.rm = T)
minfun <- min(ff, na.rm = T)
maxfun <- max(ff, na.rm = T)
#Dominant frequency parameters
y <- seewave::dfreq(r, f = r@samp.rate, wl = wl, ylim = c(0, 280/1000), ovlp = 0, plot = F, threshold = threshold, bandpass = b * 1000, fftw = TRUE)[, 2]
meandom <- mean(y, na.rm = TRUE)
mindom <- min(y, na.rm = TRUE)
maxdom <- max(y, na.rm = TRUE)
dfrange <- (maxdom - mindom)
duration <- (end[i] - start[i])
#modulation index calculation
changes <- vector()
for(j in which(!is.na(y))){
change <- abs(y[j] - y[j + 1])
changes <- append(changes, change)
}
if(mindom == maxdom) modindx <- 0 else modindx <- mean(changes, na.rm = T)/dfrange
#save results
return(c(duration, meanfreq, sd, median, Q25, Q75, IQR, skew, kurt, sp.ent, sfm, mode,
centroid, peakf, meanfun, minfun, maxfun, meandom, mindom, maxdom, dfrange, modindx))
}))
#change result names
rownames(x) <- c("duration", "meanfreq", "sd", "median", "Q25", "Q75", "IQR", "skew", "kurt", "sp.ent",
"sfm","mode", "centroid", "peakf", "meanfun", "minfun", "maxfun", "meandom", "mindom", "maxdom", "dfrange", "modindx")
x <- data.frame(sound.files, selec, as.data.frame(t(x)))
colnames(x)[1:2] <- c("sound.files", "selec")
rownames(x) <- c(1:nrow(x))
return(x)
}
processFolder <- function(folderName) {
# Start with empty data.frame.
data <- data.frame()
# Get list of files in the folder.
list <- list.files(folderName, '\\.wav')
# Add file list to data.frame for processing.
for (fileName in list) {
row <- data.frame(fileName, 0, 0, 20)
data <- rbind(data, row)
}
# Set column names.
names(data) <- c('sound.files', 'selec', 'start', 'end')
# Move into folder for processing.
setwd(folderName)
# Process files.
acoustics <- specan3(data, parallel=1)
# Move back into parent folder.
setwd('..')
acoustics
}
gender <- function(filePath) {
if (!exists('genderBoosted')) {
load('model.bin')
}
# Setup paths.
currentPath <- getwd()
fileName <- basename(filePath)
path <- dirname(filePath)
# Set directory to read file.
setwd(path)
# Start with empty data.frame.
data <- data.frame(fileName, 0, 0, 20)
# Set column names.
names(data) <- c('sound.files', 'selec', 'start', 'end')
# Process files.
acoustics <- specan3(data, parallel=1)
# Restore path.
setwd(currentPath)
predict(genderCombo, newdata=acoustics)
}
# Load data
males <- processFolder('male')
females <- processFolder('female')
# Set labels.
males$label <- 1
females$label <- 2
data <- rbind(males, females)
data$label <- factor(data$label, labels=c('male', 'female'))
# Remove unused columns.
data$duration <- NULL
data$sound.files <- NULL
data$selec <- NULL
data$peakf <- NULL
# Remove rows containing NA's.
data <- data[complete.cases(data),]
# Write out csv dataset.
write.csv(data, file='voice.csv', row.names=FALSE)
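Note that this script only extracts features and writes voice.csv; the gender() helper above expects a saved model ('model.bin' containing a genderCombo object), which the script never creates. As a minimal illustrative sketch of that missing step (an assumption, not part of the original post), one could train an SVM on voice.csv with e1071, which is already loaded, and save it under the expected object name:
# Hypothetical sketch: train a classifier on the extracted features and save it for gender().
voice <- read.csv('voice.csv')
voice$label <- factor(voice$label)           # 'male' / 'female'
genderCombo <- svm(label ~ ., data = voice)  # e1071 SVM with default parameters
save(genderCombo, file = 'model.bin')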
Test-data script
Place the test audio files in the test folder.
packages <- c('tuneR', 'seewave', 'fftw', 'caTools', 'warbleR', 'mice', 'e1071', 'rpart')
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
install.packages(setdiff(packages, rownames(installed.packages())))
}
library(tuneR)
library(seewave)
library(caTools)
library(rpart)
library(warbleR)
library(mice)
library(e1071)
specan3 <- function(X, bp = c(0,22), wl = 2048, threshold = 5, parallel = 1){
# To use parallel processing: library(devtools), install_github('nathanvan/parallelsugar')
if(class(X) == "data.frame") {if(all(c("sound.files", "selec",
"start", "end") %in% colnames(X)))
{
start <- as.numeric(unlist(X$start))
end <- as.numeric(unlist(X$end))
sound.files <- as.character(unlist(X$sound.files))
selec <- as.character(unlist(X$selec))
} else stop(paste(paste(c("sound.files", "selec", "start", "end")[!(c("sound.files", "selec",
"start", "end") %in% colnames(X))], collapse=", "), "column(s) not found in data frame"))
} else stop("X is not a data frame")
#if there are NAs in start or end stop
if(any(is.na(c(end, start)))) stop("NAs found in start and/or end")
#if end or start are not numeric stop
if(all(class(end) != "numeric" & class(start) != "numeric")) stop("'end' and 'start' must be numeric")
#if any start higher than end stop
if(any(end - start < 0)) stop(paste("The start is higher than the end in", length(which(end - start < 0)), "case(s)"))
#if any selections longer than 20 secs stop
if(any(end - start > 20)) stop(paste(length(which(end - start > 20)), "selection(s) longer than 20 sec"))
options(show.error.messages = TRUE)
#if bp is not vector or length!=2 stop
if(!is.vector(bp)) stop("'bp' must be a numeric vector of length 2") else{
if(!length(bp) == 2) stop("'bp' must be a numeric vector of length 2")}
#return warning if not all sound files were found
fs <- list.files(path = getwd(), pattern = ".wav$", ignore.case = TRUE)
if(length(unique(sound.files[(sound.files %in% fs)])) != length(unique(sound.files)))
cat(paste(length(unique(sound.files))-length(unique(sound.files[(sound.files %in% fs)])),
".wav file(s) not found"))
#count number of sound files in working directory and if stop
d <- which(sound.files %in% fs)
if(length(d) == 0){
stop("The .wav files are not in the working directory")
} else {
start <- start[d]
end <- end[d]
selec <- selec[d]
sound.files <- sound.files[d]
}
# If parallel is not numeric
if(!is.numeric(parallel)) stop("'parallel' must be a numeric vector of length 1")
if(any(!(parallel %% 1 == 0), parallel < 1)) stop("'parallel' should be a positive integer")
# If parallel was called
if(parallel > 1)
{ options(warn = -1)
if(all(Sys.info()[1] == "Windows", requireNamespace("parallelsugar", quietly = TRUE) == TRUE))
lapp <- function(X, FUN) parallelsugar::mclapply(X, FUN, mc.cores = parallel) else
if(Sys.info()[1] == "Windows"){
cat("Windows users need to install the 'parallelsugar' package for parallel computing (you are not doing it now!)")
lapp <- pbapply::pblapply} else lapp <- function(X, FUN) parallel::mclapply(X, FUN, mc.cores = parallel)} else lapp <- pbapply::pblapply
options(warn = 0)
if(parallel == 1) cat("Measuring acoustic parameters:")
x <- as.data.frame(lapp(1:length(start), function(i) {
r <- tuneR::readWave(file.path(getwd(), sound.files[i]), from = start[i], to = end[i], units = "seconds")
b <- bp #in case bp is higher than what the sampling rate allows
if(b[2] > ceiling(r@samp.rate/2000) - 1) b[2] <- ceiling(r@samp.rate/2000) - 1
#frequency spectrum analysis
songspec <- seewave::spec(r, f = r@samp.rate, plot = FALSE)
analysis <- seewave::specprop(songspec, f = r@samp.rate, flim = c(0, 280/1000), plot = FALSE)
#save parameters
meanfreq <- analysis$mean/1000
sd <- analysis$sd/1000
median <- analysis$median/1000
Q25 <- analysis$Q25/1000
Q75 <- analysis$Q75/1000
IQR <- analysis$IQR/1000
skew <- analysis$skewness
kurt <- analysis$kurtosis
sp.ent <- analysis$sh
sfm <- analysis$sfm
mode <- analysis$mode/1000
centroid <- analysis$cent/1000
#Frequency with amplitude peaks
peakf <- 0 #seewave::fpeaks(songspec, f = r@samp.rate, wl = wl, nmax = 3, plot = FALSE)[1, 1]
#Fundamental frequency parameters
ff <- seewave::fund(r, f = r@samp.rate, ovlp = 50, threshold = threshold,
fmax = 280, ylim = c(0, 280/1000), plot = FALSE, wl = wl)[, 2]
meanfun <- mean(ff, na.rm = T)
minfun <- min(ff, na.rm = T)
maxfun <- max(ff, na.rm = T)
#Dominant frequency parameters
y <- seewave::dfreq(r, f = r@samp.rate, wl = wl, ylim = c(0, 280/1000), ovlp = 0, plot = F, threshold = threshold, bandpass = b * 1000, fftw = TRUE)[, 2]
meandom <- mean(y, na.rm = TRUE)
mindom <- min(y, na.rm = TRUE)
maxdom <- max(y, na.rm = TRUE)
dfrange <- (maxdom - mindom)
duration <- (end[i] - start[i])
#modulation index calculation
changes <- vector()
for(j in which(!is.na(y))){
change <- abs(y[j] - y[j + 1])
changes <- append(changes, change)
}
if(mindom == maxdom) modindx <- 0 else modindx <- mean(changes, na.rm = T)/dfrange
#save results
return(c(duration, meanfreq, sd, median, Q25, Q75, IQR, skew, kurt, sp.ent, sfm, mode,
centroid, peakf, meanfun, minfun, maxfun, meandom, mindom, maxdom, dfrange, modindx))
}))
#change result names
rownames(x) <- c("duration", "meanfreq", "sd", "median", "Q25", "Q75", "IQR", "skew", "kurt", "sp.ent",
"sfm","mode", "centroid", "peakf", "meanfun", "minfun", "maxfun", "meandom", "mindom", "maxdom", "dfrange", "modindx")
x <- data.frame(sound.files, selec, as.data.frame(t(x)))
colnames(x)[1:2] <- c("sound.files", "selec")
rownames(x) <- c(1:nrow(x))
return(x)
}
processFolder <- function(folderName) {
# Start with empty data.frame.
data <- data.frame()
# Get list of files in the folder.
list <- list.files(folderName, '\\.wav')
# Add file list to data.frame for processing.
for (fileName in list) {
row <- data.frame(fileName, 0, 0, 20)
data <- rbind(data, row)
}
# Set column names.
names(data) <- c('sound.files', 'selec', 'start', 'end')
# Move into folder for processing.
setwd(folderName)
# Process files.
acoustics <- specan3(data, parallel=1)
# Move back into parent folder.
setwd('..')
acoustics
}
gender <- function(filePath) {
if (!exists('genderBoosted')) {
load('model.bin')
}
# Setup paths.
currentPath <- getwd()
fileName <- basename(filePath)
path <- dirname(filePath)
# Set directory to read file.
setwd(path)
# Start with empty data.frame.
data <- data.frame(fileName, 0, 0, 20)
# Set column names.
names(data) <- c('sound.files', 'selec', 'start', 'end')
# Process files.
acoustics <- specan3(data, parallel=1)
# Restore path.
setwd(currentPath)
predict(genderCombo, newdata=acoustics)
}
# Load data
data <- processFolder('test')
# Remove unused columns.
data$duration <- NULL
data$sound.files <- NULL
data$selec <- NULL
data$peakf <- NULL
# Remove rows containing NA's.
data <- data[complete.cases(data),]
# Write out csv dataset.
write.csv(data, file='test.csv', row.names=FALSE)
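test.csv holds only the extracted features, with no labels. Assuming a model was trained and saved as sketched after the training script, the test files could be scored roughly as follows (again an illustrative sketch, not part of the original post):
# Hypothetical sketch: score the extracted test features with the saved model.
load('model.bin')                    # assumed to contain the genderCombo object
test <- read.csv('test.csv')
predictions <- predict(genderCombo, newdata = test)
print(predictions)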