Hadoop: how to map data from a CSV to a nested Avro schema
Suppose I have a schema like the following:
{
  "name": "phoneNumber",
  "type": {
    "type": "record",
    "name": "internalNumber",
    "namespace": "com.wiki",
    "fields": [{
      "name": "areacode",
      "type": "string"
    }, {
      "name": "phone",
      "type": ["null", "string"],
      "doc": "Actual full number",
      "default": null
    }]
  }
}
I have a CSV that spreads this data across separate columns, for example:
areaCode phoneNumber
+1 1234512345
How can I produce an Avro file like the following from a Pig script:
"phoneNumber" : {
"areacode" : "+1",
"phone" : "1234512345"
}
The tricky part is that the target field is a nested record.
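A = LOAD 'input_path' USING org.apache.pig.piggybank.storage.CSVLoader() AS (areaCode:chararray, phoneNumber:chararray);
-- TOTUPLE nests the two flat columns; the tuple's inner field names must match the Avro record's fields
B = FOREACH A GENERATE TOTUPLE(areaCode, phoneNumber) AS phoneNumber:tuple(areacode:chararray, phone:chararray);
STORE B INTO 'output_path' USING org.apache.pig.piggybank.storage.avro.AvroStorage();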
CSVLoader and AvroStorage both come from the piggybank jar, so you need to REGISTER it before running the script.
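Without an explicit schema, AvroStorage derives the Avro schema from the Pig schema, so the record name, namespace and nullability may not match the schema in the question. A sketch that passes the schema explicitly might look like this. It assumes the piggybank build of AvroStorage and its 'schema' option (the built-in AvroStorage in newer Pig releases takes different arguments). The jar paths and the top-level record name Contact are hypothetical placeholders, not part of the original question.

```pig
-- Assumption: placeholder jar paths; piggybank's AvroStorage also needs the
-- Avro and json-simple jars on the classpath.
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/avro.jar;
REGISTER /path/to/json-simple.jar;

A = LOAD 'input_path'
    USING org.apache.pig.piggybank.storage.CSVLoader()
    AS (areaCode:chararray, phoneNumber:chararray);

-- Nest the flat columns; inner names must match the Avro field names (areacode, phone).
B = FOREACH A GENERATE
    TOTUPLE(areaCode, phoneNumber) AS phoneNumber:tuple(areacode:chararray, phone:chararray);

-- 'Contact' is a hypothetical top-level record name; the phoneNumber field
-- definition is copied from the schema in the question.
STORE B INTO 'output_path' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
    'schema', '{"type":"record","name":"Contact","fields":[{"name":"phoneNumber","type":{"type":"record","name":"internalNumber","namespace":"com.wiki","fields":[{"name":"areacode","type":"string"},{"name":"phone","type":["null","string"],"default":null}]}}]}');
```

Keeping the schema on a single line avoids relying on multi-line string literals inside the Pig function arguments.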