Python: How to unnest a column that is stored as a string

python pandas

I am reading a .parquet file that contains the following string column:

{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z",
"event": "CIRCUIT_CREATION"}, 
{"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", 
"diff": [], "event": "CIRCUIT_UPDATED"}]}

I want to unnest this column, but the attempt fails because the column is a string.

This is the original dataframe:

This is how I need it:

I managed to do the unnesting manually in my Jupyter notebook:

df = pd.concat([df.drop(['B'], axis=1), df['B'].apply(pd.Series)], axis=1)

But this only works when the column is not a string:

import pandas as pd

# Column B holds real Python dicts here, not JSON strings
df = pd.DataFrame({'A':'7e1ab727-a9e9-4c00-b6dc-9e65e91b9e4f','B':[{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", "event": "CIRCUIT_CREATION"}, {"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", "diff": [], "event": "CIRCUIT_UPDATED"}]})
df2 = pd.DataFrame({'A':'22222222-a9e9-4c00-b6dc-9e65e91b9e4f','B':[{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", "event": "CIRCUIT_CREATION"}, {"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", "diff": [], "event": "CIRCUIT_UPDATED"}]})
df3 = pd.concat([df, df2])
# Expand each dict in B into its own columns and attach them to the remaining columns
df3 = pd.concat([df3.drop(['B'], axis=1), df3['B'].apply(pd.Series)], axis=1)
df3

When I run the same code on the data read from the .parquet file, it does not throw an error, but the unnesting is not performed either.
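A likely explanation for the silent no-op can be sketched as follows. The frame `df_str` is hypothetical and assumes each cell of `B` holds the text of one JSON object; the real parquet column may be shaped differently. When the cell is a string, `pd.Series` wraps it as a single value, so `apply(pd.Series)` returns one column that still contains the raw string.

```python
import pandas as pd

# Hypothetical stand-in for the parquet data: B holds JSON *text*, not dicts.
df_str = pd.DataFrame({
    "A": ["7e1ab727-a9e9-4c00-b6dc-9e65e91b9e4f"],
    "B": ['{"circuitStatus": "CREATED", "event": "CIRCUIT_CREATION"}'],
})

# pd.Series("some string") is a one-element Series, so nothing is unnested.
expanded = df_str["B"].apply(pd.Series)

print(expanded.shape)               # one row, one column
print(type(expanded.iloc[0, 0]))    # the cell is still a plain str
```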

You can use json.loads(), but I would suggest not doing this in pandas: reading the parquet file with pyspark can avoid this kind of problem.
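As a minimal sketch of the json.loads() route in pandas, the example below parses each string into a dict first and only then expands it. The frame `df_str` and its contents are assumptions made for illustration (one JSON object per cell, with a shortened value in `A`); `pd.json_normalize` is used here in place of `apply(pd.Series)` as one possible way to expand the parsed dicts.

```python
import json
import pandas as pd

# Hypothetical data: each cell of B is the text of one JSON object.
df_str = pd.DataFrame({
    "A": ["7e1ab727", "7e1ab727"],
    "B": [
        '{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", "event": "CIRCUIT_CREATION"}',
        '{"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", "diff": [], "event": "CIRCUIT_UPDATED"}',
    ],
})

# Step 1: turn the JSON text into Python dicts.
parsed = df_str["B"].apply(json.loads)

# Step 2: expand the dicts into columns, keeping the original row index.
expanded = pd.json_normalize(parsed.tolist())
expanded.index = df_str.index

# Step 3: replace B with the expanded columns.
result = pd.concat([df_str.drop(columns=["B"]), expanded], axis=1)
print(result)
```

If a cell instead holds a JSON array of objects, the parsed column would contain lists, and a call to `Series.explode` before the expansion step would probably be needed.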