Hadoop: How do I import/load a .csv file in Pig?


Suppose there is a tab-separated text file (datetemp.txt). I want to load this file in Pig for processing, but when I type the lines below, I get the following error:

grunt> inputfile = load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);

grunt> dump inputfile;

2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file job27391717857739333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file job27391717857739333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more

2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.1.1 0.10.0-cdh4.1.1 training 2014-09-06 08:41:23 2014-09-06 08:42:40 UNKNOWN

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
N/A inputfile MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.ja
hdfs://192.168.195.130:8020/training/pig/datetemp.txt 
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);

grunt> dump inputfile;
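
ERROR 2118 in the log says the path Pig resolved does not exist in HDFS, so the load statement can only succeed once the file is actually there. The following is a minimal sketch of how one might check for the file and upload it before starting grunt. It rests on assumptions: the sample rows and the local location of datetemp.txt are hypothetical, and only the HDFS path is taken from the error message. It uses the standard `hadoop fs -test`, `-put` and `-ls` subcommands and skips the HDFS steps when no `hadoop` command is on the PATH.

```shell
#!/bin/sh
# Sketch: make sure the file Pig is asked to load exists in HDFS.

HDFS_PATH=/training/pig/datetemp.txt       # path taken from the ERROR 2118 message
LOCAL_FILE="${TMPDIR:-/tmp}/datetemp.txt"  # assumed local copy of the data

# Hypothetical comma-separated rows matching (EventID, eventdate, count).
printf '%s\n' 'E1,2014-09-01,3' 'E2,2014-09-02,5' 'E3,2014-09-03,8' > "$LOCAL_FILE"

if command -v hadoop >/dev/null 2>&1; then
    # `hadoop fs -test -e` exits with status 0 only when the path exists in HDFS.
    if hadoop fs -test -e "$HDFS_PATH"; then
        echo "already in HDFS: $HDFS_PATH"
    else
        # Copy the local file into HDFS. Depending on the Hadoop version,
        # the parent directory may need to exist beforehand.
        hadoop fs -put "$LOCAL_FILE" "$HDFS_PATH"
    fi
    # List the file to confirm it is visible at the expected path.
    hadoop fs -ls "$HDFS_PATH"
else
    echo "hadoop command not found; skipping the HDFS steps" >&2
fi
```

Once the listing shows the file, the load statement with PigStorage(',') should find its input. If the data is meant to stay on the local disk, starting Pig with `pig -x local` makes load paths resolve against the local filesystem instead of HDFS.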