Python: JSON array is split into multiple arrays
Tags: python, json, pyspark
I am reading the content of an API (https://myapi.com/allcolor) into a DataFrame with the PySpark code below, in an Azure Databricks notebook. Somehow, when I read the JSON content into the DataFrame, the JSON payload, which has a single "colors" array, is converted into multiple arrays.
Code:
spark.sql("set spark.databricks.delta.preview.enabled=true")
spark.sql("set spark.databricks.delta.retentionDurationCheck.preview.enabled=false")

import json
import requests
from requests.auth import HTTPDigestAuth
import pandas as pd

user = "username"
password = "password"
myResponse = requests.get('https://myapi.com/allcolor', auth=(user, password))

if myResponse.ok:
    jData = json.loads(myResponse.content)
    s1 = json.dumps(jData)
    # load data from api
    x = json.loads(s1)
    data = pd.read_json(json.dumps(x))
    # create dataframe
    spark_df = spark.createDataFrame(data)
    spark_df.show()
    spark.conf.set("fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net", "<your-storage-account-access-key>")
    spark_df.write.mode("overwrite").json("wasbs://<container>@<storage-account-name>.blob.core.windows.net/<directory>/")
else:
    myResponse.raise_for_status()
JSON payload returned by the API:
{
  "colors": [
    {
      "color": "black",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [255, 255, 255, 1],
        "hex": "#000"
      }
    },
    {
      "color": "white",
      "category": "value",
      "code": {
        "rgba": [0, 0, 0, 1],
        "hex": "#FFF"
      }
    },
    {
      "color": "red",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [255, 0, 0, 1],
        "hex": "#FF0"
      }
    },
    {
      "color": "blue",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [0, 0, 255, 1],
        "hex": "#00F"
      }
    },
    {
      "color": "yellow",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [255, 255, 0, 1],
        "hex": "#FF0"
      }
    },
    {
      "color": "green",
      "category": "hue",
      "type": "secondary",
      "code": {
        "rgba": [0, 255, 0, 1],
        "hex": "#0F0"
      }
    }
  ]
}
Resulting DataFrame:
{
  "colors": {
    "color": "black",
    "category": "hue",
    "type": "primary",
    "code": {
      "rgba": [255, 255, 255, 1],
      "hex": "#000"
    }
  }
}
{
  "colors": {
    "color": "white",
    "category": "value",
    "code": {
      "rgba": [0, 0, 0, 1],
      "hex": "#FFF"
    }
  }
}
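The one-object-per-row shape above is what you would expect if the payload is tabulated column-wise: with its default orientation, pd.read_json appears to treat "colors" as a column name and each element of its list as a separate row, so a six-element array becomes six rows. The sketch below imitates that in plain Python rather than calling pandas itself; `tabulate_columnwise` is a hypothetical helper written for illustration, and the payload is trimmed to two colors.

```python
import json

# Trimmed copy of the payload from the question (two of the six colors).
payload = {
    "colors": [
        {"color": "black", "category": "hue", "type": "primary",
         "code": {"rgba": [255, 255, 255, 1], "hex": "#000"}},
        {"color": "white", "category": "value",
         "code": {"rgba": [0, 0, 0, 1], "hex": "#FFF"}},
    ]
}


def tabulate_columnwise(doc):
    """Imitate column-wise tabulation: each key is a column, each list element a row."""
    n_rows = max(len(values) for values in doc.values())
    return [
        {column: values[i] for column, values in doc.items() if i < len(values)}
        for i in range(n_rows)
    ]


rows = tabulate_columnwise(payload)

# Writing one JSON object per row gives the same shape as the result in the question.
for row in rows:
    print(json.dumps(row))
```

Each printed line is an object whose "colors" value is a single color, not the original array, which matches what the question reports.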
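Does json.dumps(x) look like the source?
Hi @Kris, no. json.dumps(x) has multiple arrays; jData is the same as the source.
If jData is correct, then pd.DataFrame(jData['colors']) seems to work for me.
Hi @Kris, pd.DataFrame(jData['colors']) still gives me multiple arrays.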
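Since jData already matches the source, one possible way forward is to skip pandas and decide explicitly which shape you want: one record that keeps the whole "colors" array, or one record per color. The following is a minimal sketch under that assumption, not a confirmed fix; `to_single_record` and `to_color_records` are hypothetical helper names, and the payload is trimmed to two colors.

```python
import json

# Trimmed copy of the payload from the question (two of the six colors).
payload = {
    "colors": [
        {"color": "black", "category": "hue", "type": "primary",
         "code": {"rgba": [255, 255, 255, 1], "hex": "#000"}},
        {"color": "white", "category": "value",
         "code": {"rgba": [0, 0, 0, 1], "hex": "#FFF"}},
    ]
}


def to_single_record(doc):
    """Serialize the whole payload as one JSON line, so "colors" stays one array."""
    return json.dumps(doc)


def to_color_records(doc):
    """Serialize each color as its own JSON line (one record per color)."""
    return [json.dumps(color) for color in doc["colors"]]


single_line = to_single_record(payload)
color_lines = to_color_records(payload)

print(len(json.loads(single_line)["colors"]))  # colors kept together in one record
print(len(color_lines))                        # one JSON line per color

# Hypothetical hand-off in the notebook, assuming the `spark` session from the question:
#   df = spark.read.json(spark.sparkContext.parallelize([single_line]))
```

spark.read.json treats each input string as one JSON document, so passing the single line should give one row with an array column, while passing `color_lines` should give one row per color.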