lk@lk-virtual-machine:~$ cd hadoop-1.0.1

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin dfs -mkdir input

bash: ./bin: Is a directory

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop  dfs -mkdir input

14/05/11 21:12:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).

14/05/11 21:12:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).

14/05/11 21:12:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).

14/05/11 21:12:10 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).

14/05/11 21:12:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).

14/05/11 21:12:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).

14/05/11 21:12:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).

14/05/11 21:12:14 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).

14/05/11 21:12:15 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).

14/05/11 21:12:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).

Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop namenode -format

14/05/11 21:12:48 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = lk-virtual-machine/127.0.1.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.1

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012

************************************************************/

14/05/11 21:12:48 INFO util.GSet: VM type       = 32-bit

14/05/11 21:12:48 INFO util.GSet: 2% max memory = 19.33375 MB

14/05/11 21:12:48 INFO util.GSet: capacity      = 2^22 = 4194304 entries

14/05/11 21:12:48 INFO util.GSet: recommended=4194304, actual=4194304

14/05/11 21:12:50 INFO namenode.FSNamesystem: fsOwner=lk

14/05/11 21:12:50 INFO namenode.FSNamesystem: supergroup=supergroup

14/05/11 21:12:50 INFO namenode.FSNamesystem: isPermissionEnabled=true

14/05/11 21:12:50 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

14/05/11 21:12:50 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

14/05/11 21:12:50 INFO namenode.NameNode: Caching file names occuring more than 10 times

14/05/11 21:12:50 INFO common.Storage: Image file of size 108 saved in 0 seconds.

14/05/11 21:12:50 INFO common.Storage: Storage directory /tmp/hadoop-lk/dfs/name has been successfully formatted.

14/05/11 21:12:50 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at lk-virtual-machine/127.0.1.1

************************************************************/

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop namenode -format

14/05/11 21:13:12 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = lk-virtual-machine/127.0.1.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.1

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012

************************************************************/

Re-format filesystem in /tmp/hadoop-lk/dfs/name ? (Y or N) n

Format aborted in /tmp/hadoop-lk/dfs/name

14/05/11 21:13:21 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at lk-virtual-machine/127.0.1.1

************************************************************/

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/start-all.sh

starting namenode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-namenode-lk-virtual-machine.out

localhost: starting datanode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-datanode-lk-virtual-machine.out

localhost: starting secondarynamenode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-secondarynamenode-lk-virtual-machine.out

starting jobtracker, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-jobtracker-lk-virtual-machine.out

localhost: starting tasktracker, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-tasktracker-lk-virtual-machine.out

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop dfs -mkdir input

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop dfs -put ~/input/file* input

lk@lk-virtual-machine:~/hadoop-1.0.1$ javac -classpath hadoop-core-1.0.1.jar:lib/commons-cli-1.2.jar -d WordCount WordCount.java

javac: file not found: WordCount.java

Usage: javac &lt;options&gt; &lt;source files&gt;

use -help for a list of possible options

lk@lk-virtual-machine:~/hadoop-1.0.1$ javac -classpath hadoop-core-1.0.1.jar:lib/commons-cli-1.2.jar -d WordCount ~/WordCount.java

lk@lk-virtual-machine:~/hadoop-1.0.1$ jar -cvf wordcount.jar -C WordCount .

added manifest

adding: wordcount/(in = 0) (out = 0)(stored 0%)

adding: wordcount/WordCount$Map.class(in = 1765) (out = 771)(deflated 56%)

adding: wordcount/WordCount.class(in = 1808) (out = 963)(deflated 46%)

adding: wordcount/WordCount$Reduce.class(in = 1741) (out = 738)(deflated 57%)

lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop jar wordcount.jar wordcount input output

Exception in thread "main" java.lang.ClassNotFoundException: wordcount

    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

    at java.lang.Class.forName0(Native Method)

    at java.lang.Class.forName(Class.java:266)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
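The `ClassNotFoundException: wordcount` comes from RunJar's `Class.forName` call: the jar listing above shows the classes under `wordcount/` (package `wordcount`), so the driver has to be named by its fully qualified name, `wordcount.WordCount`, not the bare token `wordcount`. The same resolution rule can be shown with any JDK class (a minimal, Hadoop-free sketch):

```java
// Class.forName resolves only fully qualified (package-prefixed) names,
// which is why passing the bare name to `hadoop jar` fails here.
public class ForNameDemo {
    public static void main(String[] args) throws Exception {
        // Works: fully qualified name, package included.
        Class<?> ok = Class.forName("java.util.ArrayList");
        System.out.println("loaded: " + ok.getName());

        // Fails: the bare class name is not resolvable.
        try {
            Class.forName("ArrayList");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```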

lk@lk-virtual-machine:~/hadoop-1.0.1$


lk@lk-virtual-machine:~/hadoop-1.0.1$ jps

2598 SecondaryNameNode

2341 DataNode

2693 JobTracker

2950 TaskTracker

3247 Jps

2061 NameNode

lk@lk-virtual-machine:~/hadoop-1.0.1$ cd bin

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Runer.jar wordcount input output

Exception in thread "main" java.lang.ClassNotFoundException: wordcount

    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

    at java.lang.Class.forName0(Native Method)

    at java.lang.Class.forName(Class.java:266)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Runer.jar WordCount input output

Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/apache/hadoop/examples/WordCount)

    at java.lang.ClassLoader.defineClass1(Native Method)

    at java.lang.ClassLoader.defineClass(ClassLoader.java:634)

    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)

    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)

    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)

    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:314)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

    at java.lang.Class.forName0(Native Method)

    at java.lang.Class.forName(Class.java:266)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Run.jar wordcount.WordCount  input output

Exception in thread "main" java.net.UnknownHostException: unknown host: master01

    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)

    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1192)

    at org.apache.hadoop.ipc.Client.call(Client.java:1046)

    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)

    at sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)

    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)

    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)

    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)

    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)

    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)

    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)

    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)

    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)

    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)

    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)

    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)

    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)

    at wordcount.WordCount.main(WordCount.java:61)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:616)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
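The `unknown host: master01` error means this jar's configuration (either bundled in the jar or hard-coded in the driver; the transcript does not show which) points `fs.default.name` at a host called `master01` that this VM cannot resolve. For a single-node Hadoop 1.x setup, `conf/core-site.xml` would normally point the default filesystem at localhost, matching the `hdfs://localhost:9000` address used elsewhere in this session (a typical fragment; `fs.default.name` is the 1.x property name):

```xml
<!-- conf/core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```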

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Ru.jar wordcount.WordCount  input output

14/05/11 22:43:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

****hdfs://localhost:9000/user/lk/input

14/05/11 22:43:52 INFO input.FileInputFormat: Total input paths to process : 4

14/05/11 22:43:56 INFO mapred.JobClient: Running job: job_201405112114_0001

14/05/11 22:43:57 INFO mapred.JobClient:  map 0% reduce 0%

14/05/11 22:45:45 INFO mapred.JobClient:  map 50% reduce 0%

14/05/11 22:46:59 INFO mapred.JobClient:  map 100% reduce 0%

14/05/11 22:47:02 INFO mapred.JobClient:  map 100% reduce 16%

14/05/11 22:47:05 INFO mapred.JobClient:  map 100% reduce 100%

14/05/11 22:47:33 INFO mapred.JobClient: Job complete: job_201405112114_0001

14/05/11 22:47:34 INFO mapred.JobClient: Counters: 29

14/05/11 22:47:34 INFO mapred.JobClient:   Job Counters

14/05/11 22:47:34 INFO mapred.JobClient:     Launched reduce tasks=1

14/05/11 22:47:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=321173

14/05/11 22:47:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/11 22:47:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/11 22:47:34 INFO mapred.JobClient:     Launched map tasks=4

14/05/11 22:47:34 INFO mapred.JobClient:     Data-local map tasks=2

14/05/11 22:47:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=84371

14/05/11 22:47:34 INFO mapred.JobClient:   File Output Format Counters

14/05/11 22:47:34 INFO mapred.JobClient:     Bytes Written=41

14/05/11 22:47:34 INFO mapred.JobClient:   FileSystemCounters

14/05/11 22:47:34 INFO mapred.JobClient:     FILE_BYTES_READ=104

14/05/11 22:47:34 INFO mapred.JobClient:     HDFS_BYTES_READ=480

14/05/11 22:47:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=106420

14/05/11 22:47:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41

14/05/11 22:47:34 INFO mapred.JobClient:   File Input Format Counters

14/05/11 22:47:34 INFO mapred.JobClient:     Bytes Read=50

14/05/11 22:47:34 INFO mapred.JobClient:   Map-Reduce Framework

14/05/11 22:47:34 INFO mapred.JobClient:     Map output materialized bytes=122

14/05/11 22:47:34 INFO mapred.JobClient:     Map input records=2

14/05/11 22:47:34 INFO mapred.JobClient:     Reduce shuffle bytes=122

14/05/11 22:47:34 INFO mapred.JobClient:     Spilled Records=16

14/05/11 22:47:34 INFO mapred.JobClient:     Map output bytes=82

14/05/11 22:47:34 INFO mapred.JobClient:     CPU time spent (ms)=20800

14/05/11 22:47:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=718479360

14/05/11 22:47:34 INFO mapred.JobClient:     Combine input records=0

14/05/11 22:47:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=430

14/05/11 22:47:34 INFO mapred.JobClient:     Reduce input records=8

14/05/11 22:47:34 INFO mapred.JobClient:     Reduce input groups=5

14/05/11 22:47:34 INFO mapred.JobClient:     Combine output records=0

14/05/11 22:47:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=550727680

14/05/11 22:47:34 INFO mapred.JobClient:     Reduce output records=5

14/05/11 22:47:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1881079808

14/05/11 22:47:34 INFO mapred.JobClient:     Map output records=8

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -ls output

Found 3 items

-rw-r--r--   1 lk supergroup          0 2014-05-11 22:47 /user/lk/output/_SUCCESS

drwxr-xr-x   - lk supergroup          0 2014-05-11 22:43 /user/lk/output/_logs

-rw-r--r--   1 lk supergroup         41 2014-05-11 22:47 /user/lk/output/part-r-00000

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -cat output/part-r-00000

Bye    1

Goodbye    1

Hadoop    2

Hello    2

World    2

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/topk.jar topk.TopK  input output

14/05/11 23:00:26 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-lk/mapred/staging/lk/.staging/job_201405112114_0002

14/05/11 23:00:26 ERROR security.UserGroupInformation: PriviledgedActionException as:lk cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists

    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)

    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)

    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:416)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)

    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)

    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)

    at topk.TopK.run(TopK.java:86)

    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

    at topk.TopK.main(TopK.java:90)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:616)

    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -rmr output

Deleted hdfs://localhost:9000/user/lk/output

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/topk.jar topk.TopK  input output

****hdfs://localhost:9000/user/lk/input

14/05/11 23:01:28 INFO input.FileInputFormat: Total input paths to process : 4

14/05/11 23:01:29 INFO mapred.JobClient: Running job: job_201405112114_0003

14/05/11 23:01:30 INFO mapred.JobClient:  map 0% reduce 0%

14/05/11 23:02:32 INFO mapred.JobClient:  map 50% reduce 0%

14/05/11 23:02:34 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_0, Status : FAILED

java.lang.ArrayIndexOutOfBoundsException: 8

    at topk.TopK$MapClass.map(TopK.java:43)

    at topk.TopK$MapClass.map(TopK.java:1)

    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:416)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

    at org.apache.hadoop.mapred.Child.main(Child.java:249)



14/05/11 23:02:36 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_0, Status : FAILED

java.lang.ArrayIndexOutOfBoundsException: 8

    at topk.TopK$MapClass.map(TopK.java:43)

    at topk.TopK$MapClass.map(TopK.java:1)

    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:416)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

    at org.apache.hadoop.mapred.Child.main(Child.java:249)



14/05/11 23:02:39 INFO mapred.JobClient:  map 0% reduce 0%

14/05/11 23:03:02 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_1, Status : FAILED

14/05/11 23:03:02 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_1, Status : FAILED

14/05/11 23:03:27 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_2, Status : FAILED

java.lang.ArrayIndexOutOfBoundsException: 8

    at topk.TopK$MapClass.map(TopK.java:43)

    at topk.TopK$MapClass.map(TopK.java:1)

    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:416)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

    at org.apache.hadoop.mapred.Child.main(Child.java:249)



14/05/11 23:03:29 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_2, Status : FAILED

java.lang.ArrayIndexOutOfBoundsException: 8

    at topk.TopK$MapClass.map(TopK.java:43)

    at topk.TopK$MapClass.map(TopK.java:1)

    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

    at java.security.AccessController.doPrivileged(Native Method)

    at javax.security.auth.Subject.doAs(Subject.java:416)

    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)

    at org.apache.hadoop.mapred.Child.main(Child.java:249)



14/05/11 23:03:59 INFO mapred.JobClient: Job complete: job_201405112114_0003

14/05/11 23:04:00 INFO mapred.JobClient: Counters: 7

14/05/11 23:04:00 INFO mapred.JobClient:   Job Counters

14/05/11 23:04:00 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=255192

14/05/11 23:04:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/05/11 23:04:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/05/11 23:04:00 INFO mapred.JobClient:     Launched map tasks=8

14/05/11 23:04:00 INFO mapred.JobClient:     Data-local map tasks=8

14/05/11 23:04:00 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

14/05/11 23:04:00 INFO mapred.JobClient:     Failed map tasks=1

lk@lk-virtual-machine:~/hadoop-1.0.1/bin$


package topk;

/**
 * Created with IntelliJ IDEA.
 * User: Isaac Li
 * Date: 12/4/12
 * Time: 5:48 PM
 * To change this template use File | Settings | File Templates.
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.TreeMap;

// Use MapReduce to find the top K values in a massive data set.
public class TopK extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, NullWritable, Text> {
        public static final int K = 1;
        // Keyed on the weight; note duplicate weights overwrite each other.
        private TreeMap<Integer, Text> fatcats = new TreeMap<Integer, Text>();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] str = value.toString().split(",", -2);
            // Assumes every line has at least 9 comma-separated fields; lines
            // with fewer fields are what caused the
            // ArrayIndexOutOfBoundsException: 8 in the job log above.
            int temp = Integer.parseInt(str[8]);
            // Copy the value: Hadoop reuses the Text object between map()
            // calls, so storing it directly would leave every retained entry
            // pointing at the last line read.
            fatcats.put(temp, new Text(value));
            if (fatcats.size() > K)
                fatcats.remove(fatcats.firstKey());
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            for (Text text : fatcats.values()) {
                context.write(NullWritable.get(), text);
            }
        }
    }

    public static class Reduce extends Reducer<NullWritable, Text, NullWritable, Text> {
        public static final int K = 1;
        private TreeMap<Integer, Text> fatcats = new TreeMap<Integer, Text>();

        public void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text val : values) {
                // Note: this expects tab-separated values with the weight in
                // field 1, which does not match the comma-separated lines the
                // mapper actually emits.
                String[] v = val.toString().split("\t");
                Integer weight = Integer.parseInt(v[1]);
                // Copy: the values iterable also reuses its Text instance.
                fatcats.put(weight, new Text(val));
                if (fatcats.size() > K)
                    fatcats.remove(fatcats.firstKey());
            }
            for (Text text : fatcats.values())
                context.write(NullWritable.get(), text);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "TopK");
        job.setJarByClass(TopK.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        // job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        // Return the exit status instead of calling System.exit() inside
        // run(), so ToolRunner can report it from main().
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new TopK(), args);
        System.exit(res);
    }
}
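The `ArrayIndexOutOfBoundsException: 8` in the job log corresponds to `Integer.parseInt(str[8])`: the mapper assumes every line has at least nine comma-separated fields, but the wordcount text files in `input` do not. A guarded version of the parsing and top-K bookkeeping, with the Hadoop types stripped out so the logic stands alone (the field index and separator are the original code's assumptions; the sample lines are made up):

```java
import java.util.TreeMap;

public class TopKDemo {
    static final int K = 1;
    static final int FIELD = 8; // column the original mapper reads

    // Returns the weight, or null when the line cannot supply field 8 --
    // the case that crashed the mapper above.
    static Integer parseWeight(String line) {
        String[] str = line.split(",", -2);
        if (str.length <= FIELD) return null;   // guard: too few fields
        try {
            return Integer.parseInt(str[FIELD].trim());
        } catch (NumberFormatException e) {
            return null;                        // guard: non-numeric field
        }
    }

    public static void main(String[] args) {
        // TreeMap keyed on the weight, as in the original code; duplicate
        // weights overwrite each other, so ties keep only one line.
        TreeMap<Integer, String> fatcats = new TreeMap<>();
        String[] lines = {
            "a,b,c,d,e,f,g,h,42,x",
            "Hello World",                  // would have thrown before
            "a,b,c,d,e,f,g,h,7,x",
        };
        for (String line : lines) {
            Integer w = parseWeight(line);
            if (w == null) continue;        // skip malformed lines
            fatcats.put(w, line);
            if (fatcats.size() > K) fatcats.remove(fatcats.firstKey());
        }
        System.out.println(fatcats.lastKey()); // prints 42
    }
}
```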
