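1. Download Mahout

Download page: http://mahout.apache.org

I downloaded what was then the latest release: mahout-distribution-0.9

2. Extract Mahout to wherever you want to keep it. I put it under /Users/jia/Documents/hadoop-0.20.2, i.e. inside the Hadoop installation directory.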
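The extraction step could be scripted roughly as follows. This is only a sketch: it assumes the download is a gzipped tar archive, and the function name extract_mahout and the archive file name in the example call are hypothetical.

```shell
# Extract a Mahout archive into a destination directory.
# $1: path to the archive (assumed to be a .tar.gz file)
# $2: destination directory, created if it does not exist yet
extract_mahout() {
    [ -f "$1" ] || { echo "archive not found: $1" >&2; return 1; }
    mkdir -p "$2" || return 1
    # -x extract, -z gunzip, -f read this archive, -C unpack inside the destination
    tar -xzf "$1" -C "$2"
}

# Example call (both paths are assumptions, adjust them to your machine):
# extract_mahout ~/Downloads/mahout-distribution-0.9.tar.gz /Users/jia/Documents/hadoop-0.20.2
```

Unpacking with -C keeps the archive's own top-level folder, so Mahout ends up in a mahout-distribution-0.9 subdirectory of the destination.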

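3. Set up the environment for Mahout

Open a terminal, then open the directory that contains the profile file:

JIAS-MacBook-Pro:~ jia$ open /etc

Copy the profile file to the desktop, edit it, and append the following environment variables: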

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home
export HADOOP_HOME=Documents/hadoop-0.20.2
export MAHOUT_HOME=Documents/hadoop-0.20.2/mahout-distribution-0.9
export MAVEN_HOME=Documents/apache-maven-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$MAHOUT_HOME/bin
export HADOOP_CONF_DIR=Documents/hadoop-0.20.2/conf
export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/conf
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$MAHOUT_HOME/lib:$HADOOP_CONF_DIR:$MAHOUT_CONF_DIR

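Then overwrite /etc/profile with the edited copy from the desktop; you will be asked for the administrator password. Note that values such as Documents/hadoop-0.20.2 are relative paths, so they resolve only when the shell's current directory is the home directory (/Users/jia here).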
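After editing the profile and opening a new terminal, a quick sanity check along these lines may help. It is a sketch, not part of Mahout or Hadoop: the helper name check_dir_var is hypothetical, and it assumes each variable is meant to hold an absolute path to an existing directory.

```shell
# Report whether the variable named by $1 holds an absolute path
# to an existing directory. Returns non-zero if it does not.
check_dir_var() {
    name=$1
    eval "value=\${$name:-}"          # read the variable whose name is in $name
    case $value in
        /*) ;;                        # absolute path, keep checking
        *)  echo "$name is not an absolute path: '$value'"; return 1 ;;
    esac
    [ -d "$value" ] || { echo "$name points to a missing directory: $value"; return 1; }
    echo "$name ok: $value"
}

# Check the variables from the profile; problems are reported, not fatal.
for v in JAVA_HOME HADOOP_HOME MAHOUT_HOME MAVEN_HOME HADOOP_CONF_DIR MAHOUT_CONF_DIR; do
    check_dir_var "$v" || true
done
```

A value that is reported as not absolute will only work from one particular working directory, which is an easy mistake to miss.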

Notes:

1. If you are configuring this on Ubuntu with Hadoop 2.6, the paths are:
export JAVA_HOME=/usr/lib/jvm/java-1.7.-openjdk-amd64
export HADOOP_HOME=/home/sendi/hadoop-2.6.
export MAHOUT_HOME=/home/sendi/mahout-distribution-0.9
export MAVEN_HOME=/home/sendi/apache-maven-3.3.
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$MAHOUT_HOME/bin
export HADOOP_CONF_DIR=/home/sendi/hadoop-2.6./etc/hadoop
export MAHOUT_CONF_DIR=/home/sendi/mahout-distribution-0.9/conf
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$MAHOUT_HOME/lib:$HADOOP_CONF_DIR:$MAHOUT_CONF_DIR

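2. For MAHOUT_CONF_DIR, some websites say to use export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/src/conf

For version 0.9 the correct setting is export MAHOUT_CONF_DIR=Documents/hadoop-0.20.2/mahout-distribution-0.9/conf, because when you open the Mahout folder you will find that it has no src directory.

The Mahout website lists several archives for version 0.9. I tried them myself, and the first two small archives do not work.

I chose the fifth one, which is 78M.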
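The conf versus src/conf question from note 2 can also be settled by looking at the extracted folder instead of guessing. The helper below is a hypothetical sketch (the name mahout_conf_dir is not part of Mahout); it assumes the only two candidate layouts are the ones mentioned in the note.

```shell
# Print the conf directory of a Mahout installation rooted at $1.
# Prefers $1/conf (the layout of the 0.9 distribution described in note 2)
# and falls back to $1/src/conf.
mahout_conf_dir() {
    if [ -d "$1/conf" ]; then
        echo "$1/conf"
    elif [ -d "$1/src/conf" ]; then
        echo "$1/src/conf"
    else
        echo "no conf directory under $1" >&2
        return 1
    fi
}

# Possible use in a profile (assumes MAHOUT_HOME is already set):
# export MAHOUT_CONF_DIR=$(mahout_conf_dir "$MAHOUT_HOME")
```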


4. Verify that Mahout is configured correctly

4.1 Start Hadoop

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/start-all.sh

4.2 Check Mahout

JIAS-MacBook-Pro:mahout-distribution-0.9 jia$ bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence

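One thing needs explaining here: when you see the two lines below you may think something went wrong, but nothing did. The reason:

MAHOUT_LOCAL controls whether Mahout runs locally. If it is set, Hadoop is not used, and the HADOOP_CONF_DIR and HADOOP_HOME settings are ignored. Here it is not set, so the first line only reports that HADOOP_CONF_DIR was added to the classpath. The second line says that no hadoop binary was found in PATH, HADOOP_HOME/bin, or HADOOP_PREFIX/bin, so Mahout runs locally anyway.

I was stuck on this for a long time.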

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
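The choice those two messages describe can be pictured with the small function below. It is a simplified illustration written for this post, not the actual contents of bin/mahout; the function name mahout_mode is hypothetical, and the lookup order simply follows the wording of the second message.

```shell
# Print "local" or "hadoop" depending on how a launcher that behaves
# like the messages above would run. Illustration only.
mahout_mode() {
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        echo local      # MAHOUT_LOCAL is set: Hadoop settings are ignored
    elif command -v hadoop >/dev/null 2>&1 \
        || [ -x "${HADOOP_HOME:-}/bin/hadoop" ] \
        || [ -x "${HADOOP_PREFIX:-}/bin/hadoop" ]; then
        echo hadoop     # a hadoop executable was found
    else
        echo local      # nothing found: "running locally"
    fi
}

mahout_mode
```

If this prints local although Hadoop is installed, the likely cause is that HADOOP_HOME does not resolve to the installation from the current directory.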

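5. Run a Mahout algorithm

5.1 Download the test data from the following address

http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

5.2 Create the test directory testdata and import the data into it

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -mkdir testdata

5.3 Upload the test data to HDFS. Do not save the test data in a document created with Pages on the Mac; create a new plain file instead, for example with the command: touch data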

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -put workspace/data testdata/
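Because the file has to be plain text rather than a word-processor document, a check before uploading may save a failed run. The sketch below assumes the data is whitespace-separated decimal numbers with the same number of values on every line; the function name check_plain_data is hypothetical.

```shell
# Succeed if every line of file $1 holds the same number of
# whitespace-separated decimal numbers; print the table shape on success.
check_plain_data() {
    [ -s "$1" ] || { echo "missing or empty file: $1" >&2; return 1; }
    awk '
        NR == 1 { cols = NF }                      # first line fixes the column count
        NF == 0 || NF != cols { bad = 1; exit }    # blank or ragged line
        { for (i = 1; i <= NF; i++)
              if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) { bad = 1; exit } }
        END {
            if (bad) { print "line " NR " is not a row of " cols " numbers"; exit 1 }
            print NR " rows x " cols " columns"
        }
    ' "$1"
}

# Possible use before the upload step (path taken from the command above):
# check_plain_data workspace/data && bin/hadoop fs -put workspace/data testdata/
```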

5.4 Run Mahout's k-means algorithm

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop jar mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

5.5 View the results

JIAS-MacBook-Pro:~ jia$ cd Documents/hadoop-0.20.2/
JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop fs -ls output/
Found 15 items
-rwxrwxrwx 1 jia staff 194 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/_policy
drwxr-xr-x - jia staff 136 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusteredPoints
drwxr-xr-x - jia staff 544 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-0
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-1
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-10-final
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-2
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-3
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-4
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-5
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-6
drwxr-xr-x - jia staff 204 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/clusters-7
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-8
drwxr-xr-x - jia staff 204 2014-08-03 14:42 /Users/jia/Documents/hadoop-0.20.2/output/clusters-9
drwxr-xr-x - jia staff 136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/data
drwxr-xr-x - jia staff 136 2014-08-03 14:41 /Users/jia/Documents/hadoop-0.20.2/output/random-seeds
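In the listing above, the directory whose name ends in -final holds the last k-means iteration. Since the paths shown are on the local file system, it can be located with a plain shell glob. This is a sketch under that assumption (it would not work against a real HDFS), and the function name final_clusters_dir is hypothetical.

```shell
# Print the final k-means iteration directory under output directory $1,
# i.e. the entry matching clusters-*-final.
final_clusters_dir() {
    for d in "$1"/clusters-*-final; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    echo "no clusters-*-final directory under $1" >&2
    return 1
}

# Example call (run from the Hadoop directory used above):
# final_clusters_dir output
```

The printed path is the natural input for the clusterdump program that appears in the list of valid program names.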

 
