Pig Latin - loading data with a text qualifier
I am trying to load a data file in Pig Latin. The data has 2 columns, but the second column uses a text qualifier (double quotes). Sample data:
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"
When I load the data as follows, the second column is not recognized as a single column:
deviceList = load 'deviceList.csv' Using PigStorage(',') as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
How do I define a text qualifier when loading the dataset?

Try this, and let me know if you need a different output format.

input.txt:
DEVICE_ID,SUPPORTED_TECH
a2334,"GSM900,GSM1500,GSM200"
a54623,"GSM900,GSM1500"
a86646,"GSM1500,GSM200"
PigScript:
A = LOAD 'input.txt' AS (line:chararray);
deviceList = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\w+),(.*)$')) as (DEVICE_ID:chararray, SUPPORTED_TECH:chararray );
DUMP deviceList;
Output:
(DEVICE_ID,SUPPORTED_TECH)
(a2334,"GSM900,GSM1500,GSM200")
(a54623,"GSM900,GSM1500")
(a86646,"GSM1500,GSM200")
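As the output shows, the regex also matches the header line, and the double quotes stay part of SUPPORTED_TECH. A minimal follow-up sketch, assuming the `deviceList` alias from the script above, drops the header row and strips the quotes using the built-in `FILTER` operator and `REPLACE` function. The aliases `noHeader` and `cleaned` are illustrative names only.

```pig
-- Assumes deviceList was produced by the REGEX_EXTRACT_ALL script above.
-- The pattern ^(\w+),(.*)$ also matches the header line, so filter it out.
noHeader = FILTER deviceList BY DEVICE_ID != 'DEVICE_ID';

-- REPLACE(string, regex, replacement): remove the surrounding double quotes
-- so SUPPORTED_TECH holds e.g. GSM900,GSM1500,GSM200.
cleaned = FOREACH noHeader GENERATE
    DEVICE_ID,
    REPLACE(SUPPORTED_TECH, '"', '') AS SUPPORTED_TECH;

DUMP cleaned;
```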
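What output are you expecting?

Thanks for the answer. This works for 2 columns, but my original file has 625 columns. Can you recommend an approach that doesn't require defining every column?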
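For a wide file, one option is a quote-aware loader instead of a regex. The sketch below uses `CSVExcelStorage` from Piggybank (Pig's contributed UDF library), which treats a double-quoted field such as "GSM900,GSM1500,GSM200" as a single value. It assumes Piggybank is available; the jar path in `REGISTER` is a hypothetical placeholder to adjust for your installation, and `raw`, `rows` and `subset` are illustrative alias names. Because the `LOAD` has no `AS` clause, columns are addressed by position (`$0` through `$624`), so the 625-column schema never has to be written out.

```pig
-- Hypothetical path: point this at the piggybank.jar shipped with your Pig install.
REGISTER /path/to/piggybank.jar;

-- CSVExcelStorage respects double-quoted fields, so embedded commas
-- inside "..." do not split the column.
raw = LOAD 'deviceList.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');

-- Drop the header line by checking the first column positionally.
rows = FILTER raw BY (chararray)$0 != 'DEVICE_ID';

-- No schema was declared, so fields are referenced as $0, $1, ...;
-- name only the columns you actually need.
subset = FOREACH rows GENERATE
    (chararray)$0 AS DEVICE_ID,
    (chararray)$1 AS SUPPORTED_TECH;

DUMP subset;
```

Without a declared schema the fields load as bytearray, which is why the sketch casts them to chararray where they are used.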