Python: splitting a column containing lists into multiple rows based on a condition


I have data that looks like this:

[(datetime.datetime(2021, 2, 6, 8, 18, 1, 212763),u'[["London", "New York", "BUSY"]]'), 
(datetime.datetime(2021, 2, 6, 8, 17, 1, 18633), u'[["Mumbai", "Tokyo", "IDLE"]]'), (datetime.datetime(2021, 2, 6, 8, 16, 1, 182888), u'[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"], ["Amsterdam", "Berlin", "BUSY"]]'), (datetime.datetime(2021, 2, 6, 8, 15, 1, 245619), u'[["Tokyo", "Moscow", "IDLE"]]'), (datetime.datetime(2021, 2, 6, 7, 18, 1, 413066), u'[["Mumbai", "Los Angeles", "IDLE"], ["Mumbai", "Berlin", "IDLE"]]'), 
(datetime.datetime(2021, 2, 6, 7, 17, 1, 154138), u'[]'), 
(datetime.datetime(2021, 2, 6, 7, 16, 1, 253111), u'[]')]
It has two columns: the first is a date, and the second is a string representation of a list of lists.

This is what it looks like in pandas:

                        date                                             status
0 2021-02-06 08:18:01.212763                   [["London", "New York", "BUSY"]]
1 2021-02-06 08:17:01.018633                      [["Mumbai", "Tokyo", "IDLE"]]
2 2021-02-06 08:16:01.182888  [["Amsterdam", "Chicago", "IDLE"], ["Amsterdam...
3 2021-02-06 08:15:01.245619                      [["Tokyo", "Moscow", "IDLE"]]
4 2021-02-06 07:18:01.413066  [["Mumbai", "Los Angeles", "IDLE"], ["Mumbai",...
5 2021-02-06 07:17:01.154138                                                 []
6 2021-02-06 07:16:01.253111                                                 []
There are two problems.

First, I need to convert the string representation of a list of lists into a regular list, which is done as follows:

df[column].apply(literal_eval)
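
As a minimal sketch of what this step does to a single value (the variable names `raw` and `parsed` are illustrative and not part of the original data), `ast.literal_eval` parses the string into real Python lists:

```python
from ast import literal_eval

# One raw value from the status column: a string, not a list
raw = '[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"]]'

# literal_eval only evaluates Python literals, so it is safer than eval
parsed = literal_eval(raw)
print(type(parsed).__name__)  # list
print(parsed[0])              # ['Amsterdam', 'Chicago', 'IDLE']

# The empty case parses to an empty list
empty = literal_eval('[]')
print(empty)                  # []
```
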
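The second problem is that some column values contain more than one item in the list of lists. I need to split out each item and create a new row that holds that item as a standalone list. In addition, every column value should end up as a list rather than a list of lists.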

For example, I have this particular column value:

(datetime.datetime(2021, 2, 6, 8, 16, 1, 182888), u'[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"], ["Amsterdam", "Berlin", "BUSY"]]')
Here, each item in the list of lists should form a new row whose column value is that item as a list, like this:

2021-02-06 08:16:01.182888 ["Amsterdam", "Chicago", "IDLE"]
2021-02-06 08:16:01.182888 ["Amsterdam", "London", "IDLE"]
2021-02-06 08:16:01.182888 ["Amsterdam", "Berlin", "BUSY"]
For any other column value that has only one item in the list of lists, such as

(datetime.datetime(2021, 2, 6, 8, 18, 1, 212763),u'[["London", "New York", "BUSY"]]')
the value should be converted to

2021-02-06 08:18:01.212763 ["London", "New York", "BUSY"]
So the final dataframe should look like this:

                        date                                             status
0 2021-02-06 08:18:01.212763                   ["London", "New York", "BUSY"]
1 2021-02-06 08:17:01.018633                   ["Mumbai", "Tokyo", "IDLE"]
2 2021-02-06 08:16:01.182888                   ["Amsterdam", "Chicago", "IDLE"]
3 2021-02-06 08:16:01.182888                   ["Amsterdam", "London", "IDLE"]
4 2021-02-06 08:16:01.182888                   ["Amsterdam", "Berlin", "BUSY"]
5 2021-02-06 08:15:01.245619                   ["Tokyo", "Moscow", "IDLE"]
6 2021-02-06 07:18:01.413066                   ["Mumbai", "Los Angeles", "IDLE"]
7 2021-02-06 07:18:01.413066                   ["Mumbai", "Berlin", "IDLE"]
8 2021-02-06 07:17:01.154138                                                 []
9 2021-02-06 07:16:01.253111                                                 []
This is what I have done so far:

import datetime
import pandas as pd
import json
from ast import literal_eval

df = pd.DataFrame(data)
df.columns = ["date", "status"]
df = df[df["status"] != '[]'] # remove empty lists
df['status'] = df['status'].apply(literal_eval) # convert string list of list into regular list

How do I perform the next set of operations described above?

After evaluating the strings in the status column with literal_eval, you can proceed as follows.

For pandas versions >= 0.25, you can use explode:

# Explode dataframe
df_out = df.explode('status').reset_index(drop=True)

# fill the NaN with empty lists
df_out['status'] = df_out['status'].dropna().reindex(df_out.index, fill_value=[])
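
For reference, here is a self-contained sketch of the explode approach on a shortened version of the question's data. It assumes pandas >= 0.25 and that the empty-list rows are kept (not filtered out beforehand). Restoring the empty lists with `apply` and `isinstance` is a substitute for the `reindex` call above, not the answer's own method:

```python
import datetime
from ast import literal_eval

import pandas as pd

# Shortened sample of the data from the question
data = [
    (datetime.datetime(2021, 2, 6, 8, 18, 1, 212763),
     '[["London", "New York", "BUSY"]]'),
    (datetime.datetime(2021, 2, 6, 8, 16, 1, 182888),
     '[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"]]'),
    (datetime.datetime(2021, 2, 6, 7, 17, 1, 154138), '[]'),
]

df = pd.DataFrame(data, columns=["date", "status"])
df["status"] = df["status"].apply(literal_eval)  # strings -> lists of lists

# One row per inner list; an empty outer list becomes a single NaN row
df_out = df.explode("status").reset_index(drop=True)

# Replace the NaN produced by empty lists with [] again
df_out["status"] = df_out["status"].apply(
    lambda s: s if isinstance(s, list) else []
)
print(df_out)
```
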
For pandas versions < 0.25, where explode is not available, you can replicate explode-like behavior with Index.repeat and then flatten the nested lists with chain:

from itertools import chain

l = df['status'].str.len()
m = l > 0

df_out = df.reindex(df[m].index.repeat(l[m]))
df_out['status'] = list(chain(*df.loc[m, 'status']))
df_out = df_out.append(df[~m]).sort_index().reset_index(drop=True)
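To see what the repeat-and-flatten logic does independently of pandas, here is a pure-Python sketch on (date, status) pairs. The `rows` data and the shortened time strings are illustrative stand-ins, not the original dataframe:

```python
from itertools import chain

# (date, status) pairs; the dates are shortened for readability
rows = [
    ("08:18", [["London", "New York", "BUSY"]]),
    ("08:16", [["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "Berlin", "BUSY"]]),
    ("07:17", []),
]

# Repeat each date once per inner list (the Index.repeat step);
# an empty status still yields one row so that it is not lost.
dates = list(chain.from_iterable([d] * max(len(s), 1) for d, s in rows))

# Flatten one level of nesting (the chain step); [] stands in for empty rows.
statuses = list(chain.from_iterable(s if s else [[]] for _, s in rows))

flat = list(zip(dates, statuses))
for row in flat:
    print(row)
```

The pandas version above does the same thing in two passes: it repeats and flattens only the non-empty rows, then adds the empty rows back and restores the order with sort_index.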


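Hi, I'm currently on pandas 0.24, which I think is why I get the error "'DataFrame' object has no attribute 'explode'". I don't usually mention version numbers because most solutions work for me, but I should have mentioned it here. Sorry about that. Will this work with pandas < 0.25?

No problem. I'll update the solution.

@SouvikRay Check the edit for your pandas version.

Thanks! It works now!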