Python: Loading JSON data from Pig streaming

Tags: python, json, stream, streaming, apache-pig

I have a Python script, stream.py, that reads JSON lines from stdin, processes them, and writes JSON lines to stdout.
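For readers who want something concrete, here is a minimal sketch of what the core of such a script might look like. The actual processing in stream.py is not shown in the question, so `process` is a hypothetical stand-in that appends " PROCESSED" to the description, mirroring the example output line; `run` is likewise an illustrative name.

```python
import json
import sys


def process(record):
    """Hypothetical processing step (an assumption, not the asker's real logic)."""
    out = dict(record)
    out["description"] = "{} PROCESSED".format(record.get("description", ""))
    return out


def run(lines, out):
    """Read JSON lines from `lines` and write one processed JSON line each to `out`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        out.write(json.dumps(process(record)) + "\n")
```

To use this as a streaming script, it would end with a call to `run(sys.stdin, sys.stdout)`, so that Pig can pipe records in and read the results back.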

Example input line in data.json:

Example output line:

{"user_id":3217,"description":"some text PROCESSED","rating":1.78}
In Pig, I tried to stream the data like this:

data = LOAD 'data.json';

DEFINE my_stream `./stream.py` output (stdout USING JsonLoader('user_id:int, description:chararray, rating:float'));
data_streamed = STREAM data THROUGH my_stream;

ratings = FOREACH data_streamed GENERATE rating;
ratings_unique = DISTINCT ratings;

ratings_test = LIMIT ratings_unique 10;
DUMP ratings_test;
When I try to run it, I get the following error:

pig script failed to validate: java.lang.ClassCastException: class org.apache.pig.builtin.JsonLoader does not implement interface org.apache.pig.StreamToPig
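So far I see only two solutions, both of which I would like to avoid if possible:

1. Store the streamed data in a temporary file and load it with JsonLoader.
2. Modify stream.py to write TSV lines instead of JSON lines, so that I can load it with the default PigStorage.

Is it possible to make Pig streaming work with JsonLoader?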
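For reference, the second workaround (TSV output) could be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: `FIELDS`, `to_tsv`, and `run` are hypothetical names, and the field order is taken from the schema in the DEFINE statement above.

```python
import json

# Assumed column order, copied from the schema used in the Pig script.
FIELDS = ("user_id", "description", "rating")


def to_tsv(record, fields=FIELDS):
    """Render one record as a tab-separated line in a fixed field order."""
    cells = []
    for name in fields:
        value = record.get(name)
        text = "" if value is None else str(value)
        # A tab or newline inside a value would break the row layout,
        # so replace both with spaces.
        cells.append(text.replace("\t", " ").replace("\n", " "))
    return "\t".join(cells)


def run(lines, out):
    """Read JSON lines from `lines` and write one TSV line each to `out`."""
    for line in lines:
        line = line.strip()
        if line:
            out.write(to_tsv(json.loads(line)) + "\n")
```

As far as I know, Pig's default streaming deserializer splits on tabs, so the schema could then be attached with an `AS (user_id:int, description:chararray, rating:float)` clause on the `STREAM` statement instead of going through JsonLoader.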
