Python: converting JSON content into new columns


I have a dataset with semi-structured data, and I need to convert the JSON in the content column into other columns.

Data:

Expected result:

+--------+-----+-------+-------------------+-------------------+-----+--------------+------------------+
|customer|flow |session|first_answer_dt    |last_answer_dt     |name |cpf           |delivery_confirmed|
+--------+-----+-------+-------------------+-------------------+-----+--------------+------------------+
|C1000   |F1000|S1000  |2019-12-16T13:59:58|2019-12-16T14:00:01|maria|305.584.960-40|sim               |
|C1000   |F1000|S2000  |2019-12-16T13:59:59|2019-12-16T14:00:00|joao |733.600.420-26|não               |
+--------+-----+-------+-------------------+-------------------+-----+--------------+------------------+

I have been searching the internet, but I am having a hard time finding a solution to this problem.

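IIUC, you can try .join with pd.Series:

import pandas as pd

# Use eval if your JSON is stored as a string: it turns each string into a dict.
# (For untrusted input, json.loads or ast.literal_eval is a safer choice.)
df1 = df.join(df['content'].map(eval).apply(pd.Series)).drop('content', axis=1)
# Or, if the column already holds dicts rather than strings:
df1 = df.join(df['content'].apply(pd.Series)).drop('content', axis=1)
print(df1)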
  customer   flow session                 timestamp  name             cpf
0    C1000  F1000   S2000 2019-12-16 13:59:58+00:00                   NaN
1    C1000  F1000   S2000 2019-12-16 13:59:59+00:00  joao             NaN
2    C1000  F1000   S2000 2019-12-16 13:59:59+00:00   NaN  733.600.420-26
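The output above still has one row per answer, while the expected result has one row per session. A minimal sketch of one way to collapse it, assuming each content cell is a JSON string holding a single key, and using made-up sample rows (the question's real data was not shown): expand the JSON, then group by customer, flow and session, taking the earliest and latest timestamp and the first non-null value of each field. The names `expanded`, `wide` and `result` are only illustrative.

```python
import json

import pandas as pd

# Hypothetical sample rows; the original data was not shown in the question.
df = pd.DataFrame({
    "customer": ["C1000"] * 3,
    "flow": ["F1000"] * 3,
    "session": ["S2000"] * 3,
    "timestamp": pd.to_datetime([
        "2019-12-16T13:59:59Z",
        "2019-12-16T13:59:59Z",
        "2019-12-16T14:00:00Z",
    ]),
    "content": [
        '{"name": "joao"}',
        '{"cpf": "733.600.420-26"}',
        '{"delivery_confirmed": "não"}',
    ],
})

# Parse each JSON string into a dict, then spread the keys into columns.
expanded = pd.DataFrame(df["content"].map(json.loads).tolist(), index=df.index)
wide = df.drop(columns="content").join(expanded)

grouped = wide.groupby(["customer", "flow", "session"])

# GroupBy.first() keeps the first non-null value of each column per group.
fields = grouped[list(expanded.columns)].first()

# Named aggregation gives the earliest and latest answer time per session.
times = grouped["timestamp"].agg(first_answer_dt="min", last_answer_dt="max")

result = times.join(fields).reset_index()
print(result)
```

If a session can answer the same field more than once, first() keeps only the earliest non-null value, which may or may not be what you want.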


Hey, thanks. But I don't understand what eval does.

@RafaelLima It converts the raw string into an object.

In my case, this column is an object: customer object, flow object, session object, timestamp datetime64[ns, UTC], content object, dtype: object

Related: for pyspark
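On the eval question in the comments: a column dtype of object does not say whether the cells are strings or dicts, since pandas can report object for both. A small sketch of how one might check the cell type and parse only when needed, using json.loads in place of eval (a substitution, not what the answer used); the helper name `to_dict` and the sample value are hypothetical.

```python
import ast
import json

raw = '{"name": "joao"}'  # hypothetical cell value stored as a string

# json.loads parses JSON text; ast.literal_eval parses Python literals.
# Unlike eval, neither of them executes arbitrary code.
parsed_json = json.loads(raw)
parsed_literal = ast.literal_eval(raw)


def to_dict(cell):
    """Return a dict whether the cell holds JSON text or is already a dict."""
    return json.loads(cell) if isinstance(cell, str) else cell


print(type(raw).__name__, type(parsed_json).__name__)
print(to_dict(raw) == to_dict({"name": "joao"}))
```

With a helper like this, df['content'].map(to_dict) could replace the .map(eval) step regardless of which form the column holds.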