Hadoop PIG拉丁语-DUMP命令未显示
我只是尝试使用DUMP显示分组记录的结果,但是没有显示数据,而是有很多日志数据。我只是在玩10张唱片 详情如下:Hadoop PIG拉丁语-DUMP命令未显示,hadoop,apache-pig,cloudera-cdh,Hadoop,Apache Pig,Cloudera Cdh,我只是尝试使用DUMP显示分组记录的结果,但是没有显示数据,而是有很多日志数据。我只是在玩10张唱片 详情如下: grunt> DUMP grouped_records; 2016-02-21 17:34:24,338 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER 2016-02-21 17:34:24,339 [main]
grunt> DUMP grouped_records;
2016-02-21 17:34:24,338 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-02-21 17:34:24,339 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2016-02-21 17:34:24,354 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-21 17:34:24,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-21 17:34:24,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-21 17:34:24,434 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:24,440 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2016-02-21 17:34:24,527 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-21 17:34:24,530 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-02-21 17:34:24,534 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-02-21 17:34:24,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=142
2016-02-21 17:34:24,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-02-21 17:34:25,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job662989067023626482.jar
2016-02-21 17:34:31,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job662989067023626482.jar created
2016-02-21 17:34:31,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2016-02-21 17:34:31,338 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-02-21 17:34:31,549 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-02-21 17:34:31,550 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:34:31,556 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:31,607 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:34:31,918 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:34:31,918 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-21 17:34:31,921 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-02-21 17:34:31,979 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-02-21 17:34:32,092 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1454294818944_0034
2016-02-21 17:34:32,192 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1454294818944_0034
2016-02-21 17:34:32,198 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1454294818944_0034/
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1454294818944_0034
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases filtered_records,grouped_records,records
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R:
2016-02-21 17:34:32,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1454294818944_0034
2016-02-21 17:34:32,428 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-21 17:35:02,623 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-02-21 17:35:23,469 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-02-21 17:35:23,470 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0-cdh5.5.0 0.12.0-cdh5.5.0 cloudera 2016-02-21 17:34:24 2016-02-21 17:35:23 GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1454294818944_0034 1 1 12 12 12 12 16 16 16 16 filtered_records,grouped_records,records GROUP_BY hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361,
Input(s):
Successfully read 10 records (525 bytes) from: "/user/hduser/input/maxtemppig.tsv"
Output(s):
Successfully stored 0 records in: "hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1454294818944_0034
2016-02-21 17:35:23,646 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-02-21 17:35:23,648 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:35:23,648 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:35:23,649 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-21 17:35:23,660 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:35:23,660 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
我尝试过的命令:
记录=加载“/user/hduser/input/maxtemppig.tsv”为(年份:chararray,温度:int,质量:int);
过滤记录=按(-10,19)中的温度和(0,1,4,5,9)中的质量过滤记录;
转储已过滤的数据记录
分组的_记录=按年份分组过滤的_记录;
转储分组的用户记录
max_temp=FOREACH grouped_records GENERATE group,max(过滤的_records.temperature);
卸载最大温度
我的输入tsv文件
1950 32001459
1951 33 01459
1950 21 01459
1940 24 01459
1950 33 01459
2000 30 01459
2010 44 01459
2014 -10 01459
2016 -20 01459
2011 1901459
我遗漏了什么?解析很可能不起作用,而您正在筛选所有记录 试一试
请用您的文件和PIG代码的其余部分编辑您的问题。有些人可能想要重现问题。请在您的问题下找到相应的“编辑”链接。请不要将注释用于代码。另外,再次添加tsv文件的内容。请删除这些评论和您的问题谢谢您的回复…我将检查解析部分。
records = LOAD '/user/hduser/input/maxtemppig.tsv' USING PigStorage('\t') AS (year:chararray, temperature:int, quality:int);