SQL: How to parse the XML output of Postgres as input for BaseX on Linux
How can the XML output of Postgres be parsed as input for BaseX on Linux?
Setup
Data source: vie_101.csv from www.wien.gv.at (downloaded with wget below)
Research questions (RQ):
RQ1: How many people lived in Vienna in total at each census?
RQ2: How many people lived in each Vienna district at each census?
Preparation
To answer the RQs, a Postgres DB was chosen. Following the saying "where there
is a shell, there is a way", this code shows a compact solution for BASH (CLI,
Debian/Ubuntu flavoured). In addition, interacting with Postgres from BASH makes
it much easier to create the files needed for further processing. For the
installation process, consult the Postgres and BaseX installation guides.
First, download the file using wget:
cd /path/to/directory/ ;
wget -O ./vie_101.csv http://www.wien.gv.at/statistik/ogd/vie_101.csv ;
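Then inspect the file with your favourite spreadsheet program (LibreOffice Calc).
vie_101 should be UTF-8 encoded and probably uses the semicolon (;) as delimiter.
Open, check, change, save.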
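If you prefer to stay on the command line, the delimiter check can also be sketched in BASH. The helper name guess_delimiter, the sample content and the choice of inspecting line 2 (assuming line 1 is a title line) are illustrative assumptions; the sketch does not check the encoding.

```shell
#!/usr/bin/env bash
# Sketch: guess the delimiter of a CSV file by counting candidate
# characters in a single line (hypothetical helper, not from the original).
# Usage: guess_delimiter FILE [LINE_NUMBER]
guess_delimiter() {
  local file=$1 line_no=${2:-2} line semicolons commas
  line=$(sed -n "${line_no}p" "$file")
  semicolons=$(printf '%s' "$line" | tr -cd ';' | wc -c)
  commas=$(printf '%s' "$line" | tr -cd ',' | wc -c)
  if [ "$semicolons" -gt "$commas" ]; then echo ';'; else echo ','; fi
}

# Made-up sample: a title line followed by a semicolon-separated header.
sample=$(mktemp)
printf 'title\nA;B;C\n1;2;3\n' > "$sample"
guess_delimiter "$sample"
```

On the real download you would call guess_delimiter ./vie_101.csv and compare the result with what the spreadsheet import dialog shows.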
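To ease subsequent processing, some reformatting is necessary. First, a header
file with appropriate column names is created. Second, the downloaded file is
"beheaded" (the first two lines are removed) and "cut" (down to the columns of
interest). Finally, it is appended to the header file.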
echo 'DISTRICT,POPULATION,MALE,FEMALE,DATE' > ./vie.csv ;
declare=$(sed -e 's/,/ INT,/g' ./vie.csv)' INT' ;
sed 's/\;/\,/g' ./vie_101.csv | sed 's/\.//g' | tail -n+3 | cut -d ',' -f4,6-9 >> ./vie.csv ;
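To make the reformatting step concrete, here is a minimal sketch that runs the same pipeline on a tiny, made-up sample instead of the real download. The sample rows, their column layout, the variable name columns and the temporary directory are assumptions for illustration only; the real vie_101.csv has its own content.

```shell
#!/usr/bin/env bash
# Sketch: run the behead/cut pipeline on a hypothetical three-line sample.
workdir=$(mktemp -d)

# Made-up sample: two leading lines to drop, then one data row with nine
# semicolon-separated fields and a dot used as thousands separator.
cat > "$workdir/sample.csv" <<'EOF'
Title line;;;;;;;;
A;B;C;DISTRICT;E;POPULATION;MALE;FEMALE;DATE
x;y;z;90100;q;1.000;400;600;18690101
EOF

# Header file with the column names used later on.
echo 'DISTRICT,POPULATION,MALE,FEMALE,DATE' > "$workdir/vie.csv"

# Column definitions for CREATE TABLE, derived from the header line.
columns=$(sed -e 's/,/ INT,/g' "$workdir/vie.csv")' INT'

# Semicolons to commas, strip dots, drop two lines, keep fields 4 and 6-9.
sed 's/;/,/g' "$workdir/sample.csv" | sed 's/\.//g' | tail -n +3 \
  | cut -d ',' -f4,6-9 >> "$workdir/vie.csv"

echo "$columns"
cat "$workdir/vie.csv"
```

For this sample, the data row ends up in vie.csv as 90100,1000,400,600,18690101, directly below the header line.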
Postgres
To load the data into Postgres, a schema (the table definition) has to be created first:
echo "create table vie ($declare);" | sudo -u postgres psql ;
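To actually load the data into Postgres, the previously created and formatted
file (vie.csv) needs to be copied into a folder that is accessible to the
superuser postgres. Only then can the copy command be executed to load the data
into Postgres. Keep in mind that root privileges (sudo) are required for this
operation.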
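The copy step described above could look roughly like the following sketch. The staging path /tmp/vie.csv, the variable names and the helper load_vie are assumptions, not the original commands; the block itself only prints the SQL statement, and load_vie has to be called explicitly on a machine where Postgres is installed.

```shell
#!/usr/bin/env bash
# Sketch: stage vie.csv where the postgres user can read it, then COPY it.
csv_src=./vie.csv
csv_tmp=/tmp/vie.csv   # assumed staging location readable by postgres

# COPY statement: comma-delimited CSV whose first line is the header.
copy_sql="copy vie from '${csv_tmp}' delimiter ',' csv header;"
echo "$copy_sql"

# Hypothetical helper: copy the file (needs sudo) and pipe the statement
# to psql as the postgres superuser.
load_vie() {
  sudo cp "$csv_src" "$csv_tmp" &&
    echo "$copy_sql" | sudo -u postgres psql
}
```

The header option tells COPY to skip the first line of vie.csv, which holds the column names rather than data.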
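XML Schema
Before creating the XML document, we have to design the structure of the file.
We decided to create an XML Schema (schema.xsd) rather than a DTD.
Our schema defines a root element (table) and its child element (row), both of
which are complex elements. The row element may occur any number of times. The
children of row are district, population, male, female and date. These 5
elements (siblings) are simple elements, and the defined value type is always
integer.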
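Creating XML with Postgres
Since the final goal is to answer the RQs via XQuery, an XML file is needed.
This file (vie_data.xml) has to be properly formatted and well-formed. As the
next step, the query_to_xml command is piped to postgres. The psql options -Aqt
are used for: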
-A [aligned mode disable, remove header and + at end of line]
-q [quiet output]
-t [tuples only, removes footer]
echo "select query_to_xml( 'select * from vie order by date asc', true,
false, 'vie' ) ;" | sudo -u postgres psql -Aqt > ./vie_data.xml ;
Next, it is important to export the schema of the table with table_to_xmlschema(), so that the XML file can be validated later.
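The export command itself is not shown here, so the following is only a sketch of what it might look like. The arguments mirror the query_to_xml call above (nulls true, tableforest false, target namespace 'vie'); the output file name vie_schema.xsd is taken from the validation step further down, while the variable name and the helper export_schema are assumptions. The block only prints the statement; export_schema has to be called on a machine with Postgres.

```shell
#!/usr/bin/env bash
# Sketch: export the XML Schema that describes table "vie".
# Arguments: table, nulls=true, tableforest=false, targetns='vie'
# (chosen to match the query_to_xml call).
schema_sql="select table_to_xmlschema('vie', true, false, 'vie');"
echo "$schema_sql"

# Hypothetical helper: run the statement as the postgres superuser and
# write the unaligned, quiet, tuples-only output to vie_schema.xsd.
export_schema() {
  echo "$schema_sql" | sudo -u postgres psql -Aqt > ./vie_schema.xsd
}
```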
This concludes all tasks in Postgres and BASH. As the last command, BaseX can be started:
basexgui
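XQuery
With BaseX, the XML file can easily be validated against the schema:
validate:xsd('vie_data.xml', 'vie_schema.xsd')
The XML file can then be imported through the BaseX GUI. Query results are
written to disk with file:write('path/to/directory/file_name', ...).
RQ1 is answered with a single for loop that groups the rows by date and sums
the population per date:
file:write('/path/to/directory/population_year_total.xml',
for $row in //table/row
group by $date := $row/date
order by $date ascending
(: The original return expression was lost; the element below is an
   assumed reconstruction modelled on the RQ2 query. :)
return <sum date="{$date}" population="{sum($row/population)}"/>
)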
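RQ2 is answered by nesting two for loops. The outer loop groups by date and
returns the total population for each given date. The inner loop groups by
district and thus returns the subtotal of the population per district: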
file:write( '/path/to/directory/district_year_subtotal.xml',
for $row in //table/row
group by $date:= $row/date
order by $date ascending
return <sub_sum date="{$date}"
population="{sum($row/population)}">{
for $sub_item in $row
group by $district := $sub_item/district
order by $district ascending
return <sub_item district="{$district}"
population="{sum($sub_item/population)}"/>
}</sub_sum>)
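Done!
Oh, I see my answer is somewhat outdated; however, I will leave it here,
because in my opinion the approach described in your answer may be overkill for
the task at hand.
I am not sure whether you actually have a question, but I would like to suggest
a leaner approach ;-) I hope it helps a little. Have fun!
For the current use case you can throw away awk, sed, postgres and wget;
everything you need can be done in 25 lines of XQuery: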
1) Some basics: fetch the file from the remote server:
fetch:text('https://www.wien.gv.at/statistik/ogd/vie_101.csv')
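2) Skip the first line.
I decided to use the header row that comes with the original file, so only the
first line has to be dropped: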
fetch:text('https://www.wien.gv.at/statistik/ogd/vie_101.csv')
=> tokenize(out:nl()) (: Split string by newline :)
=> tail() (: Skip first line :)
=> string-join(out:nl()) (: Join strings with newline :)
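So, in total, your requirements boil down to two queries. RQ1 only needs the
outer loop that groups by date; RQ2 adds the nested per-district loop: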
(: Fetch CSV as Text, split it per line, skip the first line: :)
let $lines := fetch:text('https://www.wien.gv.at/statistik/ogd/vie_101.csv')
=> tokenize(out:nl()) (: Split string by newline :)
=> tail() (: Skip first line :)
=> string-join(out:nl()) (: Join strings with newline :)
(: Parse the csv file, first line contains element names.:)
let $csv := csv:parse($lines, map { "header": true(), "separator": ";"})
for $record in $csv/csv/record
group by $date := $record/REF_DATE
order by $date ascending
return element year_total {
attribute date { $date },
attribute population { sum($record/POP_TOTAL) => format-number("0000000")},
for $sub_item in $record
group by $per-district := $sub_item/DISTRICT_CODE
return element district {
attribute name { $per-district },
attribute population { sum($sub_item/POP_TOTAL) => format-number("0000000")}
}
}
Including the file write and the date formatted in a more readable way:
(: wrap elements in single root element :)
let $result := element result {
(: Fetch CSV as Text, split it per line, skip the first line: :)
let $lines := fetch:text('https://www.wien.gv.at/statistik/ogd/vie_101.csv')
=> tokenize(out:nl()) (: Split string by newline :)
=> tail() (: Skip first line :)
=> string-join(out:nl()) (: Join strings with newline :)
(: Parse the csv file, first line contains element names.:)
let $csv := csv:parse($lines, map { "header": true(), "separator": ";"})
for $record in $csv/csv/record
group by $date := $record/REF_DATE
order by $date ascending
return element year_total {
attribute date { $date => replace("^(\d{4})(\d{2})(\d{2})","$3.$2.$1")},
attribute population { sum($record/POP_TOTAL) => format-number("0000000")},
for $sub_item in $record
group by $per-district := $sub_item/DISTRICT_CODE
return element district {
attribute name { $per-district },
attribute population { sum($sub_item/POP_TOTAL) => format-number("0000000")},
$sub_item
}
}
}
return file:write("result.xml", $result)
RQ1 on its own (yearly totals only, without the per-district loop):
(: Fetch CSV as Text, split it per line, skip the first line: :)
let $lines := fetch:text('https://www.wien.gv.at/statistik/ogd/vie_101.csv')
=> tokenize(out:nl()) (: Split string by newline :)
=> tail() (: Skip first line :)
=> string-join(out:nl()) (: Join strings with newline :)
(: Parse the csv file, first line contains element names.:)
let $csv := csv:parse($lines, map { "header": true(), "separator": ";"})
for $record in $csv/csv/record
group by $date := $record/REF_DATE
order by $date ascending
return element year_total {
attribute date { $date },
attribute population { sum($record/POP_TOTAL) => format-number("0000000")}
}
If you are going to post a self-answered question, at least try to ask a question... The body of your question is not there for you to spam tags for every product you can think of.