Lateral flatten Snowpipe data with a mixture of arrays and dicts
I have two differently structured JSON files that are ingested through a Snowpipe. The only difference is that, instead of nested dicts, one of them has many nested arrays. I am trying to figure out how to transform Structure 1 into a finalized table. I have successfully transformed Structure 2 into a table and have included that code below. I know I need to use lateral flatten, but I have not been successful.
**Structure 1: Nested Arrays (Need help on)**
This json lives within a table and in column **JSONTEXT**
[
{
"ID": "xxx-xxxx-xxxx xxx-xxx",
"caseTypeID": "xx-xxxx-xxxx-xxxxx",
"content": {
"AccountID": "xx-xxxxx-xxxx-xxxx xxxx-xxxxx",
"AccountName": "XXXX",
"Address": {
"pxObjClass": "Data-Address-Postal"
},
"Addresses": [],
"AllKickoffsComplete": "true",
"BillingContactList": [],
"ClientCurrency": "USD",
"ClientID": "XXXXXX",
"ClientNSID": "XXXXXXXX-00",
"ClientName": "XXXXX XXXX Inc.",
"CompanyPhoneNumber": "XXX-XXX-XXXX",
"CrmSearchOrg": "XXXX",
"EEList": [
{
"AccountID": "xxx-xxxxx-xxxx-xxxxx xxxx-xxxxx",
"AccountName": "XXXX",
"AllowanceList": [
{
"AllowanceAmount": "327",
"AllowanceName": "Car Allowance",
"pxObjClass": "xxxxx-xxxxx-xxxxx"
]
**Structure 2: Nested Dict**
This json lives within a table and in column **JSONTEXT**
[
{
"OppID": "xxxx-xxxxx",
"pxObjClass": "xx-xxxxx-xxxx-xxxxxx",
"pxPages": {
"EEList": {
"Country": "xxx",
"CountryName": "xxx",
"Currency": "xxx",
"EstimatedICPCost": "xxxxxxxxxxx",
"ICPCurrency": "xxxxx",
"ICPID": "xxxxxxxxx.",
"ICPNSID": "xxxx-xx",
"ICPName": "xxx xx xx.",
"LocalMonthlySalary": "xxxxxx",
"MinFee": "xxxx",
"MonthlyGrossCost": "xxxxx",
"NewOrRepeatCustomer": "xxxxx",
"OppCloseDate": "xxx-xxx-xx",
"OppID": "xxx-xxxx",
"OpportunityName": "xxx - xxx xxx - xxx - xxxx",
"ReferralSource": "xxxxxx",
"pxObjClass": "Index-xx-xxxx-xxxx-xxxxxx",
"pxSubscript": "EEList"
}
},
"pyID": "xxxxxx",
"pzInsKey": "xxxx-xxxx-xxxx xxxxx-xxx"
},
]
Here is my code for the second structure:
create or replace table xxxx
as select
value:ID::varchar as ID,
value:caseTypeID::varchar as caseTypeID,
value:content:AccountID::varchar as AccountID,
value:content:AccountName::varchar as AccountName,
value:content:AllKickoffsComplete::boolean as AllKickoffsComplete,
value:content:ClientCurrency::varchar as ClientCurrency,
value:content:ClientID::varchar as ClientID,
value:content:ClientNSID::varchar as ClientNSID,
value:content:ClientName::varchar as ClientName,
value:content:CompanyAddressCountryName::varchar as CompanyAddressCountryName,
value:content:CompanyPhoneNumber::varchar as CompanyPhoneNumber,
value:content:CreateNew::boolean as CreateNew,
value:content:CrmSearchOrg::varchar as CrmSearchOrg,
value:content:EEList:AccountID::varchar as EE_AccountID,
value:content:EEList:AccountName::varchar as EE_AccountName
from new_raw_json,
lateral flatten (input =>jsontext);
This is the code I have tried. It only works when I put in jsontext[n]:
select
value:ID::varchar as ID,
value:EEListID::varchar as EEListID,
value:caseTypeID::varchar as caseTypeID
from new_raw_json,
lateral flatten (input => jsontext[0]:content:EEList);
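Thanks for your help!

You can chain lateral flattens to keep exploding into the nested structures (arrays within arrays).
Written out explicitly, the approach could look like this (only a few columns are projected here, to illustrate the levels):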
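select
outer_object.value:caseTypeID as caseTypeID,
outer_object.value:content.AccountID as parentAccountID,
eelist_object.value:AccountID as eeListAccountID,
allowance_object.value:AllowanceName
from
new_raw_json,
lateral flatten (input => jsontext) outer_object,
lateral flatten (input => outer_object.value:content.EEList) eelist_object,
lateral flatten (input => eelist_object.value:AllowanceList) allowance_object;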
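To make the row multiplication concrete, here is a small plain-Python sketch that mimics three chained lateral flattens with nested loops. It is not Snowflake code: the sample document, its values, and the function name flatten_allowances are hypothetical, shaped only loosely like Structure 1.

```python
# Hypothetical sample shaped like Structure 1: a top-level array of cases,
# each with content.EEList (array) whose items carry AllowanceList (array).
jsontext = [
    {
        "ID": "case-1",
        "caseTypeID": "type-1",
        "content": {
            "AccountID": "acct-parent",
            "EEList": [
                {
                    "AccountID": "acct-ee-1",
                    "AllowanceList": [
                        {"AllowanceAmount": "327", "AllowanceName": "Car Allowance"},
                        {"AllowanceAmount": "50", "AllowanceName": "Phone Allowance"},
                    ],
                },
                # An employee with no allowances: produces no output rows.
                {"AccountID": "acct-ee-2", "AllowanceList": []},
            ],
        },
    }
]


def flatten_allowances(doc):
    """Emit one row per (case, EEList item, AllowanceList item)."""
    rows = []
    for outer in doc:  # level 1: the top-level array
        content = outer.get("content", {})
        for ee in content.get("EEList", []):  # level 2: content.EEList
            for allowance in ee.get("AllowanceList", []):  # level 3: AllowanceList
                rows.append(
                    {
                        "caseTypeID": outer.get("caseTypeID"),
                        "parentAccountID": content.get("AccountID"),
                        "eeListAccountID": ee.get("AccountID"),
                        "AllowanceName": allowance.get("AllowanceName"),
                    }
                )
    return rows


rows = flatten_allowances(jsontext)
for row in rows:
    print(row)
```

The outermost loop plays the role of the first flatten over the top-level array, which is why indexing with jsontext[n] is no longer needed. As in the loops, FLATTEN by default drops input rows whose array is empty; passing OUTER => TRUE to FLATTEN keeps such rows with NULL values instead.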
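Note that this only explodes one identified multi-valued path (List -> EEList -> AllowanceList). It is unclear from the question whether all paths must be exploded (such as List -> EEList -> Addresses AND AllowanceList), or whether some of them can be stored as VARIANT (or other complex) types in the final result.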
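For example, if the AllowanceList values need to be duplicated for each address listed in Addresses under EEList, this can be achieved by joining the results of two exploding queries (one chaining List -> Addresses, the other chaining List -> EEList -> AllowanceList).

I think the main problem is that the first row of my table is actually a list, not just a raw json. I am trying to reconstruct in Python how the data comes across! Thanks for the advice and guidance, much appreciated!

@Harsh J, could you take a look at my post? I am facing a Snowflake-related issue: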