Getting null values when parsing XML with Pig


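When I parse an XML file with Pig, I get null values as output. Below is the XML file I am trying to parse.

This XML file does not appear to have any style information associated with it. The document tree is shown below.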

    <PhysicalProgress>
    <row>
        <State_Name>Andhra Pradesh</State_Name>
        <District_Name>ADILABAD</District_Name>
        <Project_Objectives_IHHL_BPL>247475</Project_Objectives_IHHL_BPL>
        <Project_Objectives_IHHL_APL>148181</Project_Objectives_IHHL_APL>
        <Project_Objectives_IHHL_TOTAL>395656</Project_Objectives_IHHL_TOTAL>
        <Project_Objectives_SCW>0</Project_Objectives_SCW>
        <Project_Objectives_School_Toilets>4462</Project_Objectives_School_Toilets>
        <Project_Objectives_Anganwadi_Toilets>427</Project_Objectives_Anganwadi_Toilets>
        <Project_Objectives_RSM>10</Project_Objectives_RSM>
        <Project_Objectives_PC>0</Project_Objectives_PC>
        <Project_Performance-IHHL_BPL>176300</Project_Performance-IHHL_BPL>
        <Project_Performance-IHHL_APL>52431</Project_Performance-IHHL_APL>
        <Project_Performance-IHHL_TOTAL>228731</Project_Performance-IHHL_TOTAL>
        <Project_Performance-SCW>0</Project_Performance-SCW>
        <Project_Performance-School_Toilets>4462</Project_Performance-School_Toilets>
        <Project_Performance-Anganwadi_Toilets>427</Project_Performance-Anganwadi_Toilets>
        <Project_Performance-RSM>0</Project_Performance-RSM>
        <Project_Performance-PC>0</Project_Performance-PC>
    </row>
    <row>
        <State_Name>Andhra Pradesh</State_Name>
        <District_Name>ANANTAPUR</District_Name>
        <Project_Objectives_IHHL_BPL>363314</Project_Objectives_IHHL_BPL>
        <Project_Objectives_IHHL_APL>181335</Project_Objectives_IHHL_APL>
        <Project_Objectives_IHHL_TOTAL>544649</Project_Objectives_IHHL_TOTAL>
        <Project_Objectives_SCW>0</Project_Objectives_SCW>
        <Project_Objectives_School_Toilets>3421</Project_Objectives_School_Toilets>
        <Project_Objectives_Anganwadi_Toilets>284</Project_Objectives_Anganwadi_Toilets>
        <Project_Objectives_RSM>10</Project_Objectives_RSM>
        <Project_Objectives_PC>0</Project_Objectives_PC>
        <Project_Performance-IHHL_BPL>366557</Project_Performance-IHHL_BPL>
        <Project_Performance-IHHL_APL>42000</Project_Performance-IHHL_APL>
        <Project_Performance-IHHL_TOTAL>408557</Project_Performance-IHHL_TOTAL>
        <Project_Performance-SCW>0</Project_Performance-SCW>
        <Project_Performance-School_Toilets>4258</Project_Performance-School_Toilets>
        <Project_Performance-Anganwadi_Toilets>302</Project_Performance-Anganwadi_Toilets>
        <Project_Performance-RSM>0</Project_Performance-RSM>
        <Project_Performance-PC>0</Project_Performance-PC>
    </row>
    <row>
        <State_Name>Andhra Pradesh</State_Name>
        <District_Name>CHITTOOR</District_Name>
        <Project_Objectives_IHHL_BPL>296465</Project_Objectives_IHHL_BPL>
        <Project_Objectives_IHHL_APL>236986</Project_Objectives_IHHL_APL>
        <Project_Objectives_IHHL_TOTAL>533451</Project_Objectives_IHHL_TOTAL>
        <Project_Objectives_SCW>0</Project_Objectives_SCW>
        <Project_Objectives_School_Toilets>8171</Project_Objectives_School_Toilets>
        <Project_Objectives_Anganwadi_Toilets>375</Project_Objectives_Anganwadi_Toilets>
        <Project_Objectives_RSM>10</Project_Objectives_RSM>
        <Project_Objectives_PC>0</Project_Objectives_PC>
        <Project_Performance-IHHL_BPL>269750</Project_Performance-IHHL_BPL>
        <Project_Performance-IHHL_APL>190905</Project_Performance-IHHL_APL>
        <Project_Performance-IHHL_TOTAL>460655</Project_Performance-IHHL_TOTAL>
        <Project_Performance-SCW>0</Project_Performance-SCW>
        <Project_Performance-School_Toilets>8171</Project_Performance-School_Toilets>
        <Project_Performance-Anganwadi_Toilets>375</Project_Performance-Anganwadi_Toilets>
        <Project_Performance-RSM>11</Project_Performance-RSM>
        <Project_Performance-PC>0</Project_Performance-PC>
    </row>
    </PhysicalProgress>

My Pig script is:

    REGISTER '/home/training/pig_xml.jar';

    xml_input_data = load '/home/training/project/StatewiseDistrictwisePhysicalProgress.xml' using pig.XML.newloader('row') as (x:chararray);

    data_from_xml_ip = foreach xml_input_data GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<row>\\s*<State_Name>(.*)</State_Name>\\s*<District_Name>(.*)</District_Name>\\s*<Project_Objectives_IHHL_BPL>(.*)</Project_Objectives_IHHL_BPL>\\s*<Project_Objectives_IHHL_APL>(.*)</Project_Objectives_IHHL_APL>\\s*<Project_Objectives_IHHL_TOTAL>(.*)</Project_Objectives_IHHL_TOTAL>\\s*<Project_Objectives_SCW>(.*)</Project_Objectives_SCW>\\s*<Project_Objectives_School_Toilets>(.*)</Project_Objectives_School_Toilets>\\s*<Project_Objectives_Anganwadi_Toilets>(.*)</Project_Objectives_Anganwadi_Toilets>\\s*<Project_Objectives_RSM>(.*)</Project_Objectives_RSM>\\s*<Project_Objectives_PC>(.*)</Project_Objectives_PC>\\s*<Project_Performance-IHHL_BPL>(.*)</Project_Performance-IIHL_BPL>\\s*<Project_Performance-IHHL_APL>(.*)</Project_Performance-IHHL_APL>\\s*<Project_Performance-IHHL_TOTAL>(.*)</Project_Performance-IHHL_TOTAL>//s*<Project_Performance-SCW>(.*)</Project_Performance-SCW>\\s*<Project_performance-School_Toilets>(.*)</Project_Performance-School_Toilets>\\s*<Project_Performance-Anganwadi_Toilets>(.*)</Project_Performance-Anganwadi_Toilets>\\s*<Project_Performance-RSM>(.*)</Project_Performance-RSM>\\s*<Project_Performance-PC>(.*)</Project_Performance-PC>\\s*</row>'));

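What error message did you receive? Please format your Pig script so it is easier to read.

I am not getting any error, but I get empty output, like ()()().

xml_input_data = load '/home/training/project/StatewiseDistrictwisePhysicalProgress.xml' using pig.XML.newloader('row') as (x:chararray);

I have added an image of the script. Could you check it?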
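One likely cause of the empty tuples is that the pattern never matches a whole row. Compared with the sample XML, the pattern in the script differs in three places: the closing tag `</Project_Performance-IIHL_BPL>` (the XML has `IHHL`), the separator `//s*` after `</Project_Performance-IHHL_TOTAL>` (everywhere else it is `\\s*`), and the opening tag `<Project_performance-School_Toilets>` (lowercase `p`). The sketch below reproduces this outside Pig, using Python's `re` module as a stand-in for the Java regex engine that Pig uses. It assumes that the loader hands each `<row>...</row>` element to the regex as one string and that `REGEX_EXTRACT_ALL` yields nothing unless the whole string matches; neither was verified against this particular `pig_xml.jar`. The names `FIELDS`, `VALUES`, `build_row_pattern`, and `extract` are made up for illustration.

```python
import re

# Tag names in the order they appear inside each <row> of the sample XML.
FIELDS = [
    "State_Name",
    "District_Name",
    "Project_Objectives_IHHL_BPL",
    "Project_Objectives_IHHL_APL",
    "Project_Objectives_IHHL_TOTAL",
    "Project_Objectives_SCW",
    "Project_Objectives_School_Toilets",
    "Project_Objectives_Anganwadi_Toilets",
    "Project_Objectives_RSM",
    "Project_Objectives_PC",
    "Project_Performance-IHHL_BPL",
    "Project_Performance-IHHL_APL",
    "Project_Performance-IHHL_TOTAL",
    "Project_Performance-SCW",
    "Project_Performance-School_Toilets",
    "Project_Performance-Anganwadi_Toilets",
    "Project_Performance-RSM",
    "Project_Performance-PC",
]

# Values of the first <row> in the sample XML, in the same order.
VALUES = [
    "Andhra Pradesh", "ADILABAD", "247475", "148181", "395656", "0",
    "4462", "427", "10", "0", "176300", "52431", "228731", "0",
    "4462", "427", "0", "0",
]


def build_row_pattern(fields):
    """Build the row regex from the tag names so open/close tags cannot drift apart."""
    parts = ["<{0}>(.*)</{0}>".format(name) for name in fields]
    return r"<row>\s*" + r"\s*".join(parts) + r"\s*</row>"


def extract(pattern, text):
    """Return the captured groups if the pattern matches the whole text, else None."""
    match = re.fullmatch(pattern, text)
    return match.groups() if match else None


CORRECTED = build_row_pattern(FIELDS)

# Re-create the three deviations found in the posted pattern.
POSTED = (
    CORRECTED
    .replace("</Project_Performance-IHHL_BPL>", "</Project_Performance-IIHL_BPL>")
    .replace(r"</Project_Performance-IHHL_TOTAL>\s*", "</Project_Performance-IHHL_TOTAL>//s*")
    .replace("<Project_Performance-School_Toilets>", "<Project_performance-School_Toilets>")
)

# One <row> element as text, one child element per line, as in the sample file.
ROW = "<row>\n" + "".join(
    "    <{0}>{1}</{0}>\n".format(name, value) for name, value in zip(FIELDS, VALUES)
) + "</row>"

print(extract(POSTED, ROW))     # no match with the three deviations
print(extract(CORRECTED, ROW))  # 18 captured values with the corrected pattern
```

If the same reasoning holds in Pig, correcting those three spots in the `REGEX_EXTRACT_ALL` pattern (while keeping the doubled backslashes in `\\s*`) should be enough to get the 18 fields per row instead of empty tuples.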