Azure Data Factory: U-SQL cannot extract data from a JSON file
I am trying to extract data from a JSON file using U-SQL. The query either runs successfully without producing any output data, or fails with a "vertex failed fast" error. The JSON file looks like this:
{
  "results": [
    {
      "name": "sales/account",
      "id": "7367e3f2-e1a5-11e5-80e8-0933ecd4cd8c",
      "deviceName": "HP",
      "deviceModel": "pavilion g6",
      "clientip": "0.41.4.1"
    },
    {
      "name": "sales/account",
      "id": "c01efba0-e0d5-11e5-ae20-af6dc1f2c036",
      "deviceName": "acer",
      "deviceModel": "veriton",
      "clientip": "10.10.14.36"
    }
  ]
}
My U-SQL script is:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";
@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@jsonify=SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(results,"name","id","deviceName","deviceModel","clientip") AS rec FROM @trail2;
@logSchema=SELECT rec["name"] AS sysName,
rec["id"] AS sysId,
rec["deviceName"] AS domainDeviceName,
rec["deviceModel"] AS domainDeviceModel,
rec["clientip"] AS domainClientIp
FROM @jsonify;
OUTPUT @logSchema TO @out USING Outputters.Tsv();
Sara
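The problem is that the @trail2 output is a JSON array, "[{…},{…}]", and as far as I know JsonFunctions cannot parse such an array. So I wrote it to a file and then read it back with the extractor, which can parse arrays: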
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";
DECLARE @mid string="adl://xyz.azuredatalakestore.net/intermediate.txt";
@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false);
@jsonify=EXTRACT name string,
id string,
deviceName string ,
deviceModel string,
clientip string
FROM @mid USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@logSchema=SELECT name AS sysName,
id AS sysId,
deviceName AS domainDeviceName,
deviceModel AS domainDeviceModel,
clientip AS domainClientIp
FROM @jsonify;
OUTPUT @logSchema TO @out USING Outputters.Tsv();
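Actually, the JsonExtractor supports a rowpath parameter, expressed in JSONPath, that lets you identify the JSON objects or JSON array items that you want to map to rows. So you can extract the data from the JSON document with a single statement: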
@logSchema =
EXTRACT name string, id string, deviceName string, deviceModel string, clientip string
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");
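For context, a complete script built around that statement might look like the sketch below. It is only a sketch under a few assumptions: the input and output paths are reused from the question (with its placeholder account name xyz), @input is declared here because the snippet above uses it without defining it, the JSON property names in the file are assumed to match the column names, and the @renamed rowset name is introduced purely for illustration.

```usql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Paths reused from the question; replace them with your own store and files.
DECLARE @input string = "adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string = "adl://xyz.azuredatalakestore.net/todelete.tsv";

// The rowpath "results[*]" maps each item of the results array to one row.
@logSchema =
    EXTRACT name string,
            id string,
            deviceName string,
            deviceModel string,
            clientip string
    FROM @input
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");

// Rename the columns as in the question (@renamed is a hypothetical name).
@renamed =
    SELECT name AS sysName,
           id AS sysId,
           deviceName AS domainDeviceName,
           deviceModel AS domainDeviceModel,
           clientip AS domainClientIp
    FROM @logSchema;

OUTPUT @renamed TO @out USING Outputters.Tsv();
```

Because everything happens in one EXTRACT, no intermediate file is written and the whole thing runs as a single job.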
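You can do this more efficiently, without the intermediate file (which would actually require submitting two jobs, because a script cannot read data that it creates). See my alternative answer.
A script cannot read data that it creates, so how is he doing an OUTPUT and then an EXTRACT on the same resource, @mid??!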
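If the intermediate-file approach is kept anyway, the two-job point raised in the comments could be sketched as below. This is an assumption-laden illustration, not a verified fix: it simply splits the earlier answer's script at the OUTPUT to @mid, reuses that answer's paths and statements unchanged, and assumes the second script is submitted only after the first job has finished. Whether JsonExtractor maps each array item to a row without a rowpath is taken from that answer and has not been checked here.

```usql
// ---- Job 1: write the results array to an intermediate file ----
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string = "adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @mid string = "adl://xyz.azuredatalakestore.net/intermediate.txt";

@trail2 =
    EXTRACT results string
    FROM @in
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false);

// ---- Job 2: submit as a separate script after job 1 completes ----
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @mid string = "adl://xyz.azuredatalakestore.net/intermediate.txt";
DECLARE @out string = "adl://xyz.azuredatalakestore.net/todelete.tsv";

@jsonify =
    EXTRACT name string,
            id string,
            deviceName string,
            deviceModel string,
            clientip string
    FROM @mid
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @jsonify TO @out USING Outputters.Tsv();
```

The single-statement rowpath version avoids this split entirely, which is why the comments recommend it.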