17 Great Machine Learning Libraries
After wonderful feedback on my previous post on Scikit-learn from the guys at /r/MachineLearning, I decided to collect the list of machine learning libraries into this separate note. Let me know if there’s a library that should be included here.
Update (15 May 2014): thanks to Djalel Benbouzid and Dwayne Campbell for additional suggestions. Sorry it’s taken me so long to add them…
Python
- Scikit-learn: comprehensive and easy to use, I wrote a whole article on why I like this library.
- PyBrain: neural networks are one thing missing from Scikit-learn, but this module makes up for it.
- nltk: really useful if you’re doing anything NLP or text mining related.
- Theano: efficient evaluation of mathematical expressions, with optional GPU acceleration. Excellent for deep learning.
- Pylearn2: machine learning toolbox built on top of Theano - in very early stages of development.
- MDP (Modular toolkit for Data Processing): a framework that is useful when setting up workflows.
Java
- Spark: Apache’s new upstart, supposedly up to a hundred times faster than Hadoop, now includes MLlib, which contains a good selection of machine learning algorithms, including classification, clustering and recommendation generation. Currently undergoing rapid development. You can develop in Python as well as in JVM languages.
- Mahout: Apache’s machine learning framework built on top of Hadoop, this looks promising, but comes with all the baggage and overhead of Hadoop.
- Weka: this is a Java based library with a graphical user interface that allows you to run experiments on small datasets. This is great if you restrict yourself to playing around to get a feel for what is possible with machine learning. However, I would avoid using this in production code at all costs: the API is very poorly designed, the algorithms are not optimised for production use and the documentation is often lacking.
- Mallet: another Java based library with an emphasis on document classification. I’m not so familiar with this one, but if you have to use Java this is bound to be better than Weka.
- JSAT: stands for “Java Statistical Analysis Tool”. It was created by Edward Raff and was born out of his frustration with Weka (I know the feeling). Looks pretty cool.
.NET
- Accord.NET: this seems to be pretty comprehensive, and comes recommended by primaryobjects on Reddit. There is perhaps a slight slant towards image processing and computer vision, as it builds on the popular library AForge.NET for this purpose.
- Another option is to use one of the Java libraries compiled to .NET using IKVM - I have used this approach with success in production.
C++
- Vowpal Wabbit: designed for very fast learning and released under a BSD license, this comes recommended by terath on Reddit.
- MultiBoost: a fast C++ framework implementing several boosting algorithms as well as some cascades (like the Viola-Jones cascades). It is mainly focused on AdaBoost.MH, so it handles multi-class and multi-label problems.
- Shogun: large machine learning library with a focus on kernel methods and support vector machines. Bindings to Matlab, R, Octave and Python.
General
- LibSVM and LibLinear: these are C libraries for support vector machines; there are also bindings or implementations for many other languages. These are the libraries used for support vector machine learning in Scikit-learn.
Conclusion
This article is a work in progress, so please send me your comments or criticisms!
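Over the past few days I have started tinkering with open-source ML libraries. There are plenty of them, such as Torch, MLC, Weka (Java-based), Waffles, Shark, scikit, opencv-ml and so on. After comparing their pros and cons, I decided to work with the following:
1. Shark, based on C++
2. scikit, based on Python
3. Weka, based on Java
4. opencv-ml, based on C++; widely used in image processing, and I have used it before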
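After spending an afternoon on it, I finally got Shark installed and configured. Shark feels quite powerful: it covers most of the commonly used ML algorithms, and since it is C++-based it is comfortable for me to use.
Environment: Win32, VS10 (Visual Studio 2010)
There are very few articles online about installing Shark. What follows is largely based on these (thanks for sharing):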
http://www.cnblogs.com/xiangwengao/archive/2013/05/04/3059632.html
http://www.cnblogs.com/xiangwengao/archive/2013/05/01/3052821.html
http://www.cnblogs.com/xiangwengao/archive/2013/05/01/3052827.html
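I. Shark: Getting the Right Download
Two articles describe installation methods that do not work: the ways they give for obtaining Shark are broken, either the result is unusable or the files cannot be downloaded at all (I have verified this myself).
Wrong article 1 (http://www.iteye.com/news/27669) is badly wrong, because SVN gives you the development version, which sometimes lacks files, so the VS build fails and you end up with nothing usable. When I installed from SVN, the LinAlg files were missing and the library simply could not be used. I strongly advise against this approach.
Wrong article 2 points to http://shark-project.sourceforge.net/, where the files cannot be found at all; the address has long been dead. The installation and usage instructions later in that article are passable, though.
Correct download location: https://sourceforge.net/projects/shark-project/files/Shark%20Core/ (download the zip file and install from it).
Version: 2.3.4
Shark is built with CMake and requires the C++ Boost libraries. Details follow.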
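II. Shark: Installation
The homepage of the Shark Machine Learning Library is http://shark-project.sourceforge.net/. Shark was developed at Ruhr-Universität Bochum in Germany and won a gold award in a 2011 worldwide open-source competition. Shark is built on C++ generic programming and makes extensive use of templates, so its encapsulation and inheritance design are excellent. Because it is written in C++, its functions are also quite efficient.
The Shark library is divided into four main parts:
- ReClaM: regression and classification module, covering linear methods, neural networks, SVMs, kernels, etc.
- EALib: evolutionary computation module
- MOO-EALib: multi-objective evolutionary computation
- Fuzzy: fuzzy computation module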
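OK, let's get started with the installation. The Shark library can be installed on Microsoft Windows, Linux and Mac; this article covers installation on Microsoft Windows. Note that the downloaded Shark package includes installation instructions for the various platforms under Shark/doc/TutorialsOld/, but they are fairly old.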
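Step 1: install the required software and generate the build files. You need the cross-platform build tool CMake v2.8 and Microsoft Visual Studio 2005 or later. My Shark package is at D:/shark; in CMake I use it as the source directory and D:/build_shark as the build directory.
Click the Configure button, select the compiler we need (VS2005), and then click Generate.
Now take a look in D:/build_shark: CMake has generated the build files that VS2005 needs.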
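Step 2: compile and link with VS2005 to obtain the shark.lib static library we need.
Double-click shark.sln in the build_shark folder to load the solution into VS2005.
Here you can see all of Shark's bundled example projects as well as the shark.lib project. Choose Build > Rebuild Solution from the menu, and VS2005 will build all the example programs. Because there are many examples, this can take several minutes, so go have a cup of tea and be patient. I rebuilt everything only to demonstrate the example programs; you can build just the projects you need. For example, building shark.vcproj produces shark.lib.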
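A word of praise here for German thoroughness: 70 projects, and this open-source library compiled on the first try without a single error. Fine craftsmanship.
OK, once the build finishes, several new folders appear under build_shark. The examples folder contains all the example programs (not yet run or debugged; try whichever ones you need yourself). The key one is the debug folder, where we finally find what we were after: shark.lib.
(You can repeat the same for a Release build.)
In the next part I will explain how to import the resulting shark.lib into your own project and run an example.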
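III. Shark: Running an Example
In the previous part we ended up with shark.lib, the static library of the Shark Machine Learning Library. In this part we use it to run one of Shark's bundled examples in VS2005. The example is called “TSP_GA” and, as the name suggests, it solves the traveling salesman problem (TSP) with a genetic algorithm.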
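TSP_GA.cpp itself is not reproduced in this post. To illustrate the technique the example is named after, here is a minimal standalone sketch of a genetic algorithm for the TSP in plain C++11. It does not use Shark's EALib API and is not the code in TSP_GA.cpp: the names (`City`, `tourLength`, `orderCrossover`, `tournament`, `solveTspGa`) are hypothetical, and the operators chosen (order crossover, swap mutation, tournament selection, elitism) are illustrative assumptions about a typical setup.

```cpp
// Illustrative sketch only: not Shark's EALib API or the contents of TSP_GA.cpp.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// A city is a point in the plane; a tour is a permutation of city indices.
struct City { double x, y; };

// Length of the closed tour that visits cities in `order` and returns to the start.
double tourLength(const std::vector<City>& cities, const std::vector<int>& order) {
    double total = 0.0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        const City& a = cities[order[i]];
        const City& b = cities[order[(i + 1) % order.size()]];
        total += std::hypot(a.x - b.x, a.y - b.y);
    }
    return total;
}

// Order crossover (OX1): copy the slice [lo, hi] from p1, then fill the other
// positions, starting after hi, with the remaining cities in p2's order.
std::vector<int> orderCrossover(const std::vector<int>& p1, const std::vector<int>& p2,
                                std::mt19937& rng) {
    const int n = static_cast<int>(p1.size());
    std::uniform_int_distribution<int> pick(0, n - 1);
    int lo = pick(rng), hi = pick(rng);
    if (lo > hi) std::swap(lo, hi);
    std::vector<int> child(n, -1);
    std::vector<bool> used(n, false);
    for (int i = lo; i <= hi; ++i) {
        child[i] = p1[i];
        used[p1[i]] = true;
    }
    int pos = (hi + 1) % n;
    for (int k = 0; k < n; ++k) {
        const int city = p2[(hi + 1 + k) % n];
        if (used[city]) continue;
        child[pos] = city;
        used[city] = true;
        pos = (pos + 1) % n;
    }
    return child;
}

// Tournament selection: the shortest of k randomly drawn tours wins.
const std::vector<int>& tournament(const std::vector<std::vector<int>>& pop,
                                   const std::vector<double>& len, std::mt19937& rng, int k = 3) {
    std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
    std::size_t best = pick(rng);
    for (int i = 1; i < k; ++i) {
        const std::size_t c = pick(rng);
        if (len[c] < len[best]) best = c;
    }
    return pop[best];
}

// Evolves random tours for `generations` rounds and returns the best tour found.
std::vector<int> solveTspGa(const std::vector<City>& cities, int populationSize,
                            int generations, double mutationRate, unsigned seed) {
    assert(cities.size() >= 2 && populationSize >= 2);
    const int n = static_cast<int>(cities.size());
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> slot(0, n - 1);
    // Initial population: random permutations of 0..n-1.
    std::vector<std::vector<int>> pop(populationSize, std::vector<int>(n));
    for (auto& tour : pop) {
        std::iota(tour.begin(), tour.end(), 0);
        std::shuffle(tour.begin(), tour.end(), rng);
    }
    std::vector<double> len(populationSize);
    // Scores every tour and returns the index of the shortest one.
    auto evaluate = [&]() {
        for (int i = 0; i < populationSize; ++i) len[i] = tourLength(cities, pop[i]);
        return static_cast<int>(std::min_element(len.begin(), len.end()) - len.begin());
    };
    for (int g = 0; g < generations; ++g) {
        // Elitism: the current best tour is copied unchanged into the next generation.
        std::vector<std::vector<int>> next{pop[evaluate()]};
        while (static_cast<int>(next.size()) < populationSize) {
            std::vector<int> child = orderCrossover(tournament(pop, len, rng), tournament(pop, len, rng), rng);
            // Swap mutation: exchange two random positions with probability mutationRate.
            if (coin(rng) < mutationRate) {
                const int a = slot(rng), b = slot(rng);
                std::swap(child[a], child[b]);
            }
            next.push_back(std::move(child));
        }
        pop.swap(next);
    }
    return pop[evaluate()];
}
```

For example, `solveTspGa(cities, 100, 500, 0.2, 42u)` evolves 100 tours for 500 generations with a 20% mutation rate. Elitism means the best tour length never gets worse from one generation to the next, but, as with any GA, there is no guarantee of reaching the optimum on larger instances.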
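OK, let's begin.
Step 1: go to Shark\examples\EALib and find the source file used here, TSP_GA.cpp. Create a new project, and in its directory create two folders, include and lib, to hold Shark's header files and library respectively.
Step 2: add the static library and header paths to the project. Open Project > Properties, select C/C++ > General, and add the header path under Additional Include Directories.
Next, go to Linker > General and add the folder containing shark.lib under Additional Library Directories.
Then go to Linker > Input and enter the library name (shark.lib) under Additional Dependencies.
That's it: the project's library and header settings are now configured.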
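Step 3: run the TSP_GA project. Success! Congratulations, you have installed the Shark library.
One note: since this is a console application, the window may just flash and close when the program finishes. A simple trick is to add getchar(); (declared in <cstdio>) at the end of the program, so it only exits after you press Enter.
Summary: the installation went fairly smoothly. Installation on Linux to follow...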
