from: http://www.erogol.com/broad-view-machine-learning-libraries/

http://www.slideshare.net/VincenzoLomonaco/deep-learning-libraries-and-rst-experiments-with-theano

With so many different and intricate Machine Learning algorithms around, it is very hard to write your own code for every problem. Choosing a library is therefore an essential step before you start a project. However, there are many libraries with different quirks, written in different languages (some available in several languages at once), so the choice is not as straightforward as it seems.

Before you start, I strongly recommend that you experiment with the library you are interested in, so that you do not end up saying "Ohh Buda!" at the end. As a simple guide, I will point out some possible libraries and mark some of them as my choices, with the reasons behind them.

My simple bundle for small projects ----

I generally use Python for my problems. Here are the libraries I use most often.

  • Scikit-learn - A very broad and well-established library. It has functionality for nearly every step of your workflow. If you do not need some unusual algorithm, Scikit-learn is enough for everything. It is built on Numpy and Scipy. It also offers a very easy way to parallelize your code.
  • Pandas - Rather than being a machine learning library, pandas is a "Data Analysis Library". It gives you very handy features for making observations on your data just before you design your workflow. It supports both in-memory and on-disk storage functions. Hence, it is especially useful if your data is too large to be handled by simple methods or cannot fit into memory as a whole.
  • Theano - Yet another Python library, but an unmatched one. Simply put, it interfaces your Python code to low-level languages. You write Python much as you would with Numpy, and it converts your code into the corresponding low-level counterparts and compiles them at that level. It gives very significant performance gains, particularly for large matrix operations. It can also use the GPU after a simple configuration of the library, without any further code change. One caveat is that it is not easy to debug because of that compilation layer.
  • NLTK - A natural language processing toolkit with unique and notable features. It also includes some basic classifiers like Naive Bayes. If your work is about text processing, this is the right tool to process your data.
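
To make the Pandas and Scikit-learn points above a bit more concrete, here is a minimal sketch. It assumes recent versions of both libraries; the in-memory CSV, the column names and the chunk size are made up for illustration only. The first half streams a CSV in chunks rather than loading it whole, and the second half asks Scikit-learn to use two workers through its n_jobs parameter.

```python
import io

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# --- pandas: process a CSV chunk by chunk instead of loading it whole ---
# An in-memory CSV stands in for a large file on disk (illustrative data).
csv_text = "x,y\n" + "\n".join(f"{i},{i % 3}" for i in range(1000))

total_rows = 0
running_sum = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    # Each chunk is an ordinary DataFrame holding at most 250 rows.
    total_rows += len(chunk)
    running_sum += int(chunk["x"].sum())

mean_x = running_sum / total_rows  # mean computed without holding all rows

# --- scikit-learn: parallelism through the n_jobs parameter ---
# Synthetic data; any (X, y) pair of arrays would do.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# n_jobs=2 asks for two workers while fitting the trees; -1 means all cores.
clf = RandomForestClassifier(n_estimators=20, n_jobs=2, random_state=0)
clf.fit(X, y)
train_accuracy = clf.score(X, y)
```

Many other Scikit-learn estimators and helpers accept the same n_jobs argument, so switching parallelism on is usually a one-parameter change.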

Other Libraries -- (This list is being constantly updated.)

Deep Learning Libraries
  • Pylearn2 - "A machine learning research library". It is widely used, especially among deep learning researchers. It also includes some other features like Latent Dirichlet Allocation based on Gibbs sampling.
  • Theanets (new) - Yet another neural network library based on Theano. It is very simple to use and I think it is one of the best libraries for quickly prototyping new ideas.
  • Hebel - Another young alternative for deep learning implementations. "Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA."
  • Caffe - A Convolutional Neural Network library for large-scale tasks. It differs by having its own CNN implementation in low-level C++ instead of the well-known ImageNet implementation of Alex Krizhevsky. It offers a faster alternative to Alex's code. It also provides MATLAB and Python interfaces.
  • cxxnet - Very similar to Caffe. It supports multi-GPU training as well. I have not used it extensively, but it seems promising after my small experiments with the MNIST dataset. It also offers a very modular and easy development interface for new ideas. It has Python and Matlab interfaces as well.
  • mxnet - A library from the same developers as cxxnet. It adds features based on the experience gathered from cxxnet and other backend libraries. Unlike cxxnet, it has a good Python interface, which supports development of deep learning and even general-purpose algorithms requiring GPU parallelism.
  • Pybrain - "PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library."
  • Brainstorm - A Python-based deep learning library with optional GPU support, released by the IDSIA lab. It is at a very early stage of development but it is still eye-catching. At least for now, it targets recurrent networks and 2D convolution layers.
Linear Model and SVM Libraries
  • LibLinear - A Library for Large Linear Classification. It is also interfaced by Scikit-learn.
  • LibSVM - A state-of-the-art SVM library with kernel support. It also has third-party plug-ins if its built-in capabilities are not enough for you.
  • Vowpal Wabbit - I hear the name very often but have not used it yet. However, it seems to be a decent library for fast machine learning.
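
Since Scikit-learn wraps both LibLinear and LibSVM, the easiest way to try them from Python is through its estimators. The sketch below assumes a recent Scikit-learn; the synthetic data and the parameter values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# LinearSVC is Scikit-learn's wrapper around LibLinear (linear models only).
# dual=False solves the primal problem, a common choice when samples > features.
linear_clf = LinearSVC(C=1.0, dual=False).fit(X, y)

# SVC is Scikit-learn's wrapper around LibSVM and supports kernels.
kernel_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

linear_acc = linear_clf.score(X, y)
kernel_acc = kernel_clf.score(X, y)
```

The linear model exposes its weights in coef_, while the kernel model keeps the support vectors it selected in support_vectors_.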
General Purpose Libraries
  • Shogun - A general-purpose ML library, similar to Scikit-learn. It supports several programming languages.
  • MLPACK - "a scalable c++ machine learning library".
  • Orange - Another general-purpose ML library. "Open source data visualization and analysis for novice and experts". It has a Self-Organizing Maps implementation (a topic I am studying), which sets it apart from the others.
  • MILK - "SVMs (based on libsvm), k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be joined in many ways to form different classification systems."
  • Weka - Weka is a very common machine learning tool with GUI support. If you do not want to code, you can load your data into Weka, select your algorithm from a drop-down menu, set the parameters and go. Moreover, you can call its functions from your Java code. It supports some other languages as well.
  • KNIME - Although I am not a big fan of this kind of tool, KNIME is another example of a GUI-based framework. You define your workflow visually: drag some process boxes onto the workspace, connect them as you want, set the parameters and run.
  • Rapid-Miner - Yet another GUI-based tool. It is very similar to KNIME but, in my experience, it has wider capabilities suited to different domains of expertise.
Others
  • MontePython - Monte (python) is a Python framework for building gradient based learning machines, like neural networks, conditional random fields, logistic regression, etc. Monte contains modules (that hold parameters, a cost-function and a gradient-function) and trainers (that can adapt a module's parameters by minimizing its cost-function on training data).
  • Modular Toolkit for Data Processing - From the user’s perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.
  • Statsmodels is another great library which focuses on statistical models and is used mainly for predictive and exploratory analysis. If you want to fit linear models, do statistical analysis, maybe a bit of predictive modeling, then Statsmodels is a great fit.
  • PyMVPA is another statistical learning library which is similar to Scikit-learn in terms of its API. It has cross-validation and diagnostic tools as well, but it is not as comprehensive as Scikit-learn.
  • PyMC is the tool of choice for Bayesians. It includes Bayesian models, statistical distributions and diagnostic tools for the convergence of models. It includes some hierarchical models as well. If you want to do Bayesian Analysis, you should check it out.
  • Gensim is a topic modelling tool centered on the Latent Dirichlet Allocation model. It also offers some NLP functionality.
  • Pattern - A web mining module for Python.
  • Mirador - A data visualization tool for complicated datasets, supporting Mac and Windows.
  • XGBoost (new) - If you like Gradient Boosting models and you would like to do it faster and stronger, this is a very useful library with a C++ backend and Python and R wrappers. I should say that it is far faster than Sklearn's implementation.

My computation stack ---

After the libraries, I feel the need to say something about the computation environment that I use.

  • Numpy, Scipy, IPython, IPython Notebook, Spyder - After wasting some time with Matlab, I discovered these tools, which make scientific computing in Python powerful enough for my needs. Numpy and Scipy are the well-known scientific computing libraries. IPython is an alternative to the native Python interpreter with very useful features. IPython Notebook is a distinctive editor that runs in the web browser, so it is especially good if you are working on a remote machine. Spyder is a Python IDE with capabilities that make the experience very similar to Matlab. Last but not least, all of them are free. I really suggest looking at these before you select a framework for your scientific work.

Finally, as a bit of self-promotion, I list my own ML codes ----

  • KLP_KMEANS - A very fast clustering procedure based on Kohonen's Learning Procedure. It comes in two alternative implementations: a basic one in Numpy and a Theano one that is faster on large data.
  • Random Forests - Matlab code based on a C++ back-end.
  • Dominant Set Clustering - Matlab code implementing a very fast graph-based clustering method formulated with Replicator Dynamics Optimization.
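
For readers unfamiliar with the idea behind a Kohonen-style k-means, here is a rough Numpy sketch of the general technique: winner-take-all online updates with a decaying learning rate. This is not the KLP_KMEANS code itself. The function name online_kmeans, its parameters and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def online_kmeans(X, k, epochs=10, lr0=0.5, seed=0):
    """Winner-take-all online clustering (hypothetical helper, not KLP_KMEANS)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    total_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Learning rate decays linearly from lr0 towards zero.
            lr = lr0 * (1.0 - step / total_steps)
            # The closest centroid "wins" and moves towards the sample.
            winner = np.argmin(((centroids - X[i]) ** 2).sum(axis=1))
            centroids[winner] += lr * (X[i] - centroids[winner])
            step += 1
    # Final hard assignment of every sample to its nearest centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(dists, axis=1)
    return centroids, labels


# Toy data: two well-separated 2D blobs (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(100, 2)),
    rng.normal(5.0, 0.3, size=(100, 2)),
])
centroids, labels = online_kmeans(X, k=2)
```

Unlike batch k-means, each sample updates only the winning centroid as soon as it is seen, which is what makes this family of methods cheap per step.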
