Pig REPLACE produces an error


Suppose I have a file named "data" that looks like this:

2724    1919    2012-11-18T23:57:56.000Z    {(33.80981975),(-118.105289)}
2703    6401    2012-11-18T23:57:56.000Z    {(55.83525609),(-4.07733138)}
1200    4015    2012-11-18T23:57:56.000Z    {(41.49609152),(13.8411998)}
7104    9227    2012-11-18T23:57:56.000Z    {(-24.95351118),(-53.46538723)}
2343234{23.8375,-2.339921102}{(343.34333,-20.0000022)}5-23-2013-11-am

I need to convert the second field into a pair of coordinate numbers, so I wrote the following code and saved it as basic.pig:

A = LOAD 'data' AS (f1:int, f2:chararray, f3:chararray. f4:chararray);

B = foreach A generate STRSPLIT(f2,',').$0 as f5, STRSPLIT(f2,',').$1 as f6;

C = foreach B generate REPLACE(f5,'{',' ') as f7, REPLACE(f6,'}',' ') as f8;
I then use (float) to cast the strings to float. However, the REPLACE command does not work, and I get the following error:

-bash-3.2$ pig -x local basic.pig
2013-06-24 16:38:45,030 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-06-24 16:38:45,031 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/--/p/--test/pig_1372117125028.log
2013-06-24 16:38:45,321 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/isl/pmahboubi/.pigbootup not found
2013-06-24 16:38:45,425 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-06-24 16:38:46,069 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 7, column 0.  Encountered: <EOF> after : ""
Details at logfile: /home/--/p/--test/pig_1372117125028.log
Here are the details from the Pig log:

Pig Stack Trace
---------------
ERROR 1000: Error during parsing. Lexical error at line 7, column 0.  Encountered: <EOF> after : ""

org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 7, column 0.  Encountered: <EOF> after : ""
    at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3266)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1134)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:104)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
================================================================================

I have data that looks like this:

2724    1919    2012-11-18T23:57:56.000Z    {(33.80981975),(-118.105289)}
2703    6401    2012-11-18T23:57:56.000Z    {(55.83525609),(-4.07733138)}
1200    4015    2012-11-18T23:57:56.000Z    {(41.49609152),(13.8411998)}
7104    9227    2012-11-18T23:57:56.000Z    {(-24.95351118),(-53.46538723)}
I can do:

A = LOAD 'my_tsv_data' USING PigStorage('\t') AS (id1:int, id2:int, date:chararray, loc:chararray);
B = FOREACH A GENERATE REPLACE(loc,'\\{|\\}|\\(|\\)','');
C = LIMIT B 10;
DUMP C;
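
The REPLACE call above leaves a single string such as `33.80981975,-118.105289`. To get two float columns, which is what the question asks for, one option is a sketch like the following. It is not part of the original answer. It uses REGEX_EXTRACT, which returns a chararray that can be cast with (float). The field names `coord1` and `coord2` are made up for illustration, and the pattern assumes every value has exactly the `{(x),(y)}` shape shown in the sample data.

```pig
A = LOAD 'my_tsv_data' USING PigStorage('\t') AS (id1:int, id2:int, date:chararray, loc:chararray);

-- The pattern covers the whole field "{(x),(y)}" and captures x and y.
-- Braces and parentheses are regex metacharacters, hence the doubled backslashes.
-- coord1 and coord2 are hypothetical names, not taken from the original post.
B = FOREACH A GENERATE id1, id2, date,
      (float)REGEX_EXTRACT(loc, '\\{\\(([^)]*)\\),\\(([^)]*)\\)\\}', 1) AS coord1,
      (float)REGEX_EXTRACT(loc, '\\{\\(([^)]*)\\),\\(([^)]*)\\)\\}', 2) AS coord2;

C = LIMIT B 10;
DUMP C;
```

A float keeps only about seven significant digits, so casting to (double) instead may be preferable if the full precision of the coordinates matters.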
This error

ERROR 1000: Error during parsing. Lexical error at line 7, column 0.  Encountered: <EOF> after : ""
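came up for me because I had used different kinds of quotation marks: I opened a string with one kind of quote character and closed it with another, and it took a long time to find out what was wrong. So it had nothing to do with line 7 (my script was not that long, and shortening the data to four lines naturally did not help), nothing to do with column 0, nothing to do with the EOF of the data, and hardly anything to do with the " marks, which I did not use. The error message is therefore very misleading.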
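
As an illustration of the point about quotes, here is a hedged sketch of what basic.pig could look like with every string literal opened and closed by the same ASCII apostrophe. It is a guess at a working version, not the asker's final script. It assumes the file is tab-delimited (PigStorage's default) and that the coordinates sit in the fourth column, as in the sample data. The alias `coords` is made up for illustration.

```pig
-- Every string literal opens and closes with the plain ASCII apostrophe (U+0027).
-- A literal closed by a different quote character is never terminated, so the
-- parser can run to the end of the file looking for the closing quote.
A = LOAD 'data' AS (f1:int, f2:int, f3:chararray, f4:chararray);

-- REPLACE takes a Java regular expression, so braces and parentheses are escaped.
-- coords is a hypothetical alias, not taken from the original post.
B = FOREACH A GENERATE f1, f2, f3, REPLACE(f4, '\\{|\\}|\\(|\\)', '') AS coords;

DUMP B;
```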


I found the cause by using Grunt, Pig's interactive command shell.

This answer saved my day. That is a very misleading error message.