Hadoop: how do I run a MapReduce program from the Ubuntu terminal?


My Hadoop path is
/usr/local/hadoop
, and the jars are included under
/usr/local/hadoop/share
along with Java 7. Please help me solve this problem.
JAVA_HOME=/ust/lib/jvm/jdk-7-amd64

Recently I executed it through the terminal using the following steps. My system is Ubuntu 14.04 LTS.

Follow these steps.

Compilation Process for MapReduce By Kamalakar Thakare:

--> STEP 1. Start Hadoop.

$ start-all.sh          <comment: start-all.sh is deprecated in Hadoop 2; you can run start-dfs.sh followed by start-yarn.sh instead.>

--> STEP 2. Check whether all Hadoop daemons are up and running.

$ jps

--> STEP 3. Assuming environment variables are set as follows:

export JAVA_HOME=/usr/java/default          <comment: Don't worry if you have a different Java version or install path; point this at your actual JDK.>
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar  <comment: this tools.jar file is the MOST IMPORTANT one. Make sure you have it. If you can't find it at this path, don't worry; it may live at a different location on your machine.>

--> STEP 4. Now copy your source code to the home directory. One note: it is not necessary to store the source code on HDFS.

--> STEP 5. Now it is time to compile the main code. Run the command below:

$ javac -classpath <hadoop-core.jar file> -d <Your New Directory>/ <sourceCode.java>

Meaning of this command:
* It simply compiles your Java source file, sourceCode.java.
* The <hadoop-core.jar file> must contain all the libraries referenced in your source code. Here is one version and its download location:

http://www.java2s.com/Code/Jar/h/Downloadhadoop0201devcorejar.htm

At the bottom of that page is a download link for a file named hadoop-0.20.1-dev-core.jar.zip. Download and extract it; it produces a single .jar file, which is the most important piece for compiling. That generated .jar file is what you pass as <hadoop-core.jar file> in the command above.

* The -d option creates the given directory and stores all generated class files in it.

--> STEP 6. MapReduce code consists of three main components: 1. the Mapper class, 2. the Driver class, 3. the Reducer class.
So the goal is to create one jar file that contains the class definitions of all three components.
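To illustrate what the Mapper and Reducer conceptually do, here is a small pure-Java sketch with no Hadoop dependencies (the class and method names are my own invention, not Hadoop APIs) that mimics the map → shuffle → reduce flow for a line count:

```java
import java.util.*;

// A self-contained simulation of the MapReduce flow for a line count.
// In real Hadoop code the Mapper and Reducer extend base classes from
// org.apache.hadoop.mapreduce; here plain static methods stand in for them.
public class LineCountSim {

    // "Mapper": emit the key-value pair ("lines", 1) for every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            out.add(new AbstractMap.SimpleEntry<>("lines", 1));
        }
        return out;
    }

    // "Shuffle": group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reducer": sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("first line", "second line", "third line");
        Map<String, Integer> counts = reduce(shuffle(map(input)));
        System.out.println(counts.get("lines")); // 3
    }
}
```

In a real job the Driver class wires the actual Mapper and Reducer classes into a Job object and submits it; the shuffle step is performed by the framework, not by your code.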

So run the command below to generate the jar file:

$ jar -cvf <jar file to create> -C <directory obtained in the previous command> .

* Remember, the trailing dot '.' is mandatory; it stands for "all contents of the directory".
* Option -c creates a new archive,
  option -v generates verbose output on standard output,
  option -f specifies the archive file name.


For example:

$ javac -classpath hadoop-0.20.1-dev-core.jar -d LineCount/ LineCount.java   : we create the LineCount/ directory here.
$ jar -cvf LineCount.jar -C LineCount/ .                                     : here LineCount.jar is the jar file being created, and LineCount/ is my directory.
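What `jar -cvf` does can also be sketched with the JDK's own java.util.jar API, which may help demystify the archive step (the entry name and file here are hypothetical, purely for illustration):

```java
import java.io.*;
import java.util.*;
import java.util.jar.*;

// Sketch: create a jar archive and list its entries, mirroring what
// `jar -cvf LineCount.jar -C LineCount/ .` does on the command line.
public class JarDemo {

    // Pack the given (entry name -> bytes) pairs into a jar file.
    static void createJar(File jarFile, Map<String, byte[]> entries) throws IOException {
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(jarFile))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                jos.putNextEntry(new JarEntry(e.getKey()));
                jos.write(e.getValue());
                jos.closeEntry();
            }
        }
    }

    // Return the entry names stored in the jar, like `jar -tf` does.
    static List<String> listJar(File jarFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jf = new JarFile(jarFile)) {
            for (Enumeration<JarEntry> en = jf.entries(); en.hasMoreElements(); ) {
                names.add(en.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("demo", ".jar");
        createJar(jar, Map.of("LineCount.class", new byte[]{(byte) 0xCA}));
        System.out.println(listJar(jar)); // [LineCount.class]
    }
}
```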


-->STEP 7. Now it is time to run your code on the Hadoop framework.
Make sure your input files are already on HDFS. If not, add them using:

$ hadoop fs -put <source file path> /input


-->STEP 8. Now run your program using your jar file.

$ hadoop jar <your jar file> <driver class name, here the same as the directory name without the /> /input/<your file name> /output/<output file name>

For example:

if my jar file is test.jar,
the directory I created is test/,
my input file is /input/a.txt,
and I want the entire output under /output/test, then my command is:

$ hadoop jar test.jar test /input/a.txt /output/test

--> STEP 9. Wow, you are lucky: by now you have crossed thousands of error bridges where other programmers are still stuck.

After your program completes successfully, the /output directory contains two files:

one is _SUCCESS, an empty marker file indicating the job completed successfully;
the second is part-r-00000, a text file containing the actual output.

Read it using:

$ hadoop fs -cat /output/<your file>/part-r-00000
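The part-r-00000 file is plain text with one key-value pair per line, separated by a tab. Here is a small pure-Java sketch (class name and sample data are my own, for illustration) of parsing such output:

```java
import java.io.*;
import java.util.*;

// Parse a MapReduce text output file (part-r-00000):
// each line has the form "key<TAB>value".
public class OutputReader {

    static Map<String, Long> parse(BufferedReader reader) throws IOException {
        Map<String, Long> result = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue;               // skip malformed lines
            result.put(line.substring(0, tab),
                       Long.parseLong(line.substring(tab + 1)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "hello\t3\nworld\t1\n";  // what a word-count job might emit
        Map<String, Long> counts = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(counts); // {hello=3, world=1}
    }
}
```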


IMPORTANT NOTES :

1. If you get an auxService error while creating a job, make sure YARN (the resource manager) has the auxiliary-services configuration. If it does not, add the following lines to your yarn-site.xml file.

Its location is /usr/local/hadoop/etc/hadoop

Copy this and paste it into yarn-site.xml:

<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

2. If you get an error for Job.getInstance while running code on Hadoop, your Hadoop version's API most likely does not provide that factory method (it was added in the newer MapReduce API). Simply replace your job-instance statement with the older constructor:

Job job = new Job(configurationObject, "Job Dummy Name");


References:
https://dataheads.wordpress.com/2013/11/21/hadoop-2-setup-on-64-bit-ubuntu-12-04-part-1/
https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-count-number-of-lines-in-a-file-using-map-reduce-framework
https://sites.google.com/site/hadoopandhive/home/how-to-run-and-compile-a-hadoop-program
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

You have certainly given a detailed answer! But you can also execute the jar file with the following steps:

1 - Add the dependencies in .bashrc:

export HADOOP_PREFIX=/path/to/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin
export CLASSPATH=$CLASSPATH:$HADOOP_PREFIX/*:.

2 - Run the following command from /bin:

hadoop jar /path/to/jar/jar-name name.of.the.driver.class.in.jar <input-path> <output-path>

It would be better if you shared your own system's command.
Hope this helps.

It would help if you showed the command you used and the actual error you get. What problem are you facing? Please avoid one-line questions; provide enough information in the question. In your case, you need to provide the command used for execution and the error you got.