Paper notes: "Song from PI: A Musically Plausible Network for Pop Music Generation"
Source: ICLR 2017
Motivation
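The paper proposes a general RNN-based model for pop music generation that encodes prior knowledge about how pop music is composed in a hierarchical structure. The bottom layers generate the melody, and the higher levels generate the drums and chords. In human listening tests, its output was preferred over that of the model proposed by Google. The authors also build two applications on top of the model: neural dancing and karaoke, as well as neural story singing.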
Introduction
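The authors open with machine learning's spread into the arts, where progress has already been made in imitating Van Gogh's painting style, generating stories, and producing Shakespeare-style text; music is one branch of this field. RNNs have well-known strengths in natural-language text processing, so building music generation on top of them is feasible, as in [1,2,3,4]. Most of this prior work, however, generates notes for a single track; multi-track generation is studied in [5] (polyphonic music). The authors want to generate the melody, chords, drums, and other instrument tracks together, so that the result is a complete pop song. The idea was inspired by a YouTube video in which a piano piece is played from the digit sequence of $\pi$ (https://youtu.be/OMq9he-5HUU). A few composition rules turn this random, non-repeating digit sequence into music. In the paper's words, the video "shows both the randomness and the regularity of music. On one hand, since any possible digit sequence is a subset of the $\pi$ digit sequence, this implies that pleasing music can be created even from a totally random base signal. On the other hand, the composer uses specific rules such as A Harmonic Minor scale and harmonies to convert the digit sequence into a music sheet. It is these rules that play the key role in converting randomness into music."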
Related work
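Roughly, automatic composition has moved from early machine learning combined with music theory [6], to neural-network learning [1,2,3], and then to deep learning (RNNs) [4,7,8] with a reduced role for music theory.
Music basics
Note: defines the basic unit that music is composed of.
Twelve-tone equal temperament: music follows the 12-tone system, i.e., 12 is the cycle length of all notes. The 12 tones are: $C$, $C^\sharp = D^\flat$, $D$, $D^\sharp = E^\flat$, $E$, $F$, $F^\sharp = G^\flat$, $G$, $G^\sharp = A^\flat$, $A$, $A^\sharp = B^\flat$, $B$.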
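The 12-note cycle can be illustrated with a minimal sketch. It assumes notes are given as MIDI pitch numbers (60 is middle C); the helper names `pitch_class` and `note_name` are made up for this example and do not come from the paper.

```python
# Pitch-class names written with sharps; each sharp has an enharmonic flat (C# = Db, ...).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(midi_pitch: int) -> int:
    """Return the position (0-11) of a MIDI pitch within the 12-tone cycle."""
    return midi_pitch % 12


def note_name(midi_pitch: int) -> str:
    """Return the pitch-class name of a MIDI pitch, ignoring the octave."""
    return NOTE_NAMES[pitch_class(midi_pitch)]


# 60 and 72 are one octave (12 semitones) apart, so they share a name.
print(note_name(60), note_name(72), note_name(61))  # C C C#
```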
A bar is a short segment of time that corresponds to a specific number of beats (notes).
Scale is a subset of notes. The four most common scale types are Major (Minor), Harmonic Minor, Melodic Minor, and Blues. For example, the C Major scale starts from C.
The subset of notes specified by C Major is thus C, D, E, F, G, A, and B (a subset of seven notes). All scale types have a subset of seven notes except for Blues, which has six. In total we have 48 unique scales, i.e. 4 scale types and 12 possible starting notes. We treat Major and Minor as one type because for every Major scale there is always a Minor scale that has exactly the same set of notes. In music theory, this is referred to as the Relative Minor.
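The scale arithmetic above can be checked with a short sketch. The interval patterns below are the usual textbook ones (semitone offsets from the starting note, with the six-note blues scale); the notes do not list them, so treat them as an assumption, and `scale_notes` is a hypothetical helper.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed interval patterns: semitone offsets from the scale's starting note.
SCALE_INTERVALS = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "harmonic_minor": (0, 2, 3, 5, 7, 8, 11),
    "melodic_minor": (0, 2, 3, 5, 7, 9, 11),
    "blues": (0, 3, 5, 6, 7, 10),
}
# Natural minor is used only to illustrate the relative-minor relation.
NATURAL_MINOR = (0, 2, 3, 5, 7, 8, 10)


def scale_notes(start: int, intervals) -> frozenset:
    """Return the set of pitch classes (0-11) of a scale starting at `start`."""
    return frozenset((start + i) % 12 for i in intervals)


# 4 scale types x 12 starting notes.
all_scales = {
    (name, start): scale_notes(start, intervals)
    for name, intervals in SCALE_INTERVALS.items()
    for start in range(12)
}

c_major = scale_notes(0, SCALE_INTERVALS["major"])
a_minor = scale_notes(9, NATURAL_MINOR)  # A is pitch class 9

print([NOTE_NAMES[p] for p in sorted(c_major)])  # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(len(all_scales))                           # 48
print(c_major == a_minor)                        # True: A minor is the relative minor of C major
```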
Chord
The Circle of Fifths

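The circle of fifths makes it easy to identify strong chord progressions, which keep the whole piece harmonious.
Model architecture
Music generation is conditioned on the scale so that the model can choose notes. At each time step, the melody is represented by two random variables: the key layer, which gives the key that is pressed, and the press layer, which gives how long it is held. For chords and drums, the authors assume they are independent of each other given the melody; at each time step, the chord layer and the drum layer are generated conditioned on the melody.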
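As a small illustration of how the circle itself is built (not of how the paper uses it), the sketch below steps upward by a perfect fifth, i.e. 7 semitones, until all 12 notes have been visited. The function name is hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start: int = 0) -> list:
    """List the 12 notes reached by repeatedly stepping up a perfect fifth (7 semitones)."""
    return [NOTE_NAMES[(start + 7 * k) % 12] for k in range(12)]


print(circle_of_fifths())
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']
```

Neighbouring entries on the circle are a fifth apart, which is why moving between adjacent positions tends to give chord changes that sound strong.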
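One way to read this structure is as a factorization of the per-step distribution. This is a sketch of these notes' interpretation, not a formula quoted from the paper: $s$ is the scale, $h^{<t}$ stands for everything generated before step $t$, and $y_{chd}^t$, $y_{drm}^t$ are labels introduced here for the chord and drum outputs.

$$
p\left(y_{key}^t, y_{prs}^t, y_{chd}^t, y_{drm}^t \mid s, h^{<t}\right)
= p\left(y_{key}^t \mid s, h^{<t}\right)\,
  p\left(y_{prs}^t \mid y_{key}^t, h^{<t}\right)\,
  p\left(y_{chd}^t \mid y_{key}^t, y_{prs}^t, h^{<t}\right)\,
  p\left(y_{drm}^t \mid y_{key}^t, y_{prs}^t, h^{<t}\right)
$$

The last two factors express that chord and drums each depend on the melody but, under this reading, not on each other.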

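For the experiments, the authors preprocess the scale condition. A given scale type typically uses only 7 of the 12 tones, or 6 in the case of Blues. After sampling 100 hours of pop songs from the midi_man dataset, they normalize all notes by shifting each song so that its scale starts on C (all other notes are shifted by the same amount), which makes it easy to assign every song to one of the 4 scale types.
Melody generation uses a two-layer RNN (LSTM) that generates notes conditioned on the chosen scale: the first layer is the key layer and the second is the press layer.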
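The normalization step can be sketched as a transposition. The code assumes the song's starting note (tonic) is already known as a pitch class 0-11 and that notes are MIDI pitches; how the authors detect the scale is not described in these notes, and picking the shorter of the two possible shifts is a choice made for this example only.

```python
def transpose_to_c(pitches, tonic_pc):
    """Shift every MIDI pitch by the same amount so the scale's starting note becomes C."""
    # Shift down by tonic_pc, or up by 12 - tonic_pc, whichever is smaller
    # (an illustrative choice that keeps the melody close to its original register).
    shift = -tonic_pc if tonic_pc <= 6 else 12 - tonic_pc
    return [p + shift for p in pitches]


# A short phrase in D major (D is pitch class 2): D, E, F#, A.
print(transpose_to_c([62, 64, 66, 69], tonic_pc=2))  # [60, 62, 64, 67]
```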

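Since there are different scales, do the parameters differ per scale, so that the model has to be retrained for each one? (Open question.) The input and output range of notes is restricted to C3 to C6. The model encourages, but does not force, output notes to stay within the scale. This gives 3 octaves (12 notes each) plus silence, i.e. 37 possible output values. The output of the press layer uses a softmax (why?).
The inputs to the LSTM are: the note output of the previous time step, encoded as a one-hot vector; the lookback features (proposed by Google Magenta; they make it easier for the model to remember what it generated recently and repeat it later. There are some data-structure details here, such as how the outputs one bar and two bars earlier are related to the current output, which need a close reading of the code to understand); and the melody profile (which captures the high-level music flow). The paper describes the profile as follows: "To get the profile for each song, we compute the local note histogram at each time step with width of two bars, and cluster all local histograms within the song into 10 clusters via k-means. We order the 10 clusters with mean note ordered from low to high as cluster 1 to 10, and apply moving averages on the cluster id sequence to encourage local smoothness. This results in a 10-dimensional one-hot vector representation of the cluster id for each time step. This additional information allows the user to set the melody's ups and downs of the song." My understanding is that the profile defines whether the melody is heading up or down.
The press layer uses an increasing sequence 1, 2, 3, ... to represent how long a key is held. The authors point out that this has an advantage over Magenta's single note-on message: "This is important, as Waite et al. has extremely unbalanced output distributions dominated by the repeat-of-holding event. We represent press $y_{prs}^t$ as an 8-dimensional one-hot vector. The input to our LSTM is $y_{prs}^{t-1}$, concatenated with the 37-dimensional one-hot encoding of the melody key $y_{key}^t$."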
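The 37-way output can be sketched as a one-hot encoding. The MIDI numbers are an assumption: the sketch takes C3 to be MIDI 48 (the convention where middle C is C4 = 60), uses the 36 pitches from there upward, and reserves the last index for silence. The index layout is made up for this example.

```python
LOW = 48               # assumed MIDI number of C3
NUM_PITCHES = 36       # three octaves of 12 notes
SILENCE = NUM_PITCHES  # index 36 is reserved for silence (illustrative layout)
DIM = NUM_PITCHES + 1  # 37 output classes


def encode_key(midi_pitch=None):
    """Return a 37-dimensional one-hot list; None encodes silence."""
    vec = [0] * DIM
    if midi_pitch is None:
        vec[SILENCE] = 1
        return vec
    idx = midi_pitch - LOW
    if not 0 <= idx < NUM_PITCHES:
        raise ValueError("pitch outside the assumed three-octave window")
    vec[idx] = 1
    return vec


print(len(encode_key(60)), encode_key(60).index(1))  # 37 12
print(encode_key(None).index(1))                     # 36
```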
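The press representation and the concatenated LSTM input can be sketched as follows. This is one possible reading, with assumptions: a key index repeated on consecutive time steps is treated as a held note, the counter is capped at 8 to fit the 8-dimensional one-hot vector, and all function names are hypothetical.

```python
PRESS_DIM = 8   # size of the one-hot press vector
KEY_DIM = 37    # size of the one-hot key vector (36 pitches + silence)


def press_counters(keys):
    """For each time step, count how long the current key has been held: 1, 2, 3, ..."""
    counters = []
    for t, key in enumerate(keys):
        if t > 0 and key == keys[t - 1]:
            # Same key as the previous step: keep counting, capped at PRESS_DIM (assumption).
            counters.append(min(counters[-1] + 1, PRESS_DIM))
        else:
            counters.append(1)
    return counters


def one_hot(index, dim):
    vec = [0] * dim
    vec[index] = 1
    return vec


def press_lstm_input(prev_press, key_index):
    """Concatenate one-hot y_prs^{t-1} (counter 1..8) with one-hot y_key^t."""
    return one_hot(prev_press - 1, PRESS_DIM) + one_hot(key_index, KEY_DIM)


keys = [12, 12, 12, 14, 14, 36]        # key indices per time step; 36 = silence
print(press_counters(keys))            # [1, 2, 3, 1, 2, 1]
print(len(press_lstm_input(3, 14)))    # 45
```

Compared with a "note-on followed by hold, hold, hold" encoding, the counter spreads the targets over several classes instead of concentrating them on a single hold event, which is the imbalance the quoted passage refers to.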
Chord layer
The authors find that 99.19% of the chords belong to one of 72 chord classes (6 types × 12 start notes), and that chord is strongly correlated with melody.
[Figure: statistics of the correspondence between the first note and the chord]

[1]Jamshed J. Bharucha and Peter M. Todd. Modeling the perception of tonal structure with neural nets. Computer Music Journal, 13(4):44–53, 1989.
[2]Michael C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3), 1994.
[3]Chun-Chi J. Chen and Risto Miikkulainen. Creating melodies with evolving recurrent neural networks. In International Joint Conference on Neural Networks, 2001.
[4]Douglas Eck and Juergen Schmidhuber. A first look at music composition using LSTM recurrent neural networks. 2002.
[5]Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML, 2012.
[6]Michael Chan, John Potter, and Emery Schubert. Improving algorithmic music composition with machine learning. In 9th International Conference on Music Perception and Cognition, 2006.
[7]Semin Kang, Soo-Yol Ok, and Young-Min Kang. Automatic Music Generation and Machine Learning Based Evaluation, pp. 436–443. Springer Berlin Heidelberg, 2012. (polyphonic, but scale type is enforced)
[8]Allen Huang and Raymond Wu. Deep learning for music. arXiv preprint arXiv:1606.04930, 2016. (2-layer LSTM, able to create chords)