Python: split a column containing lists into multiple rows based on a condition
I have data that looks like this:
[(datetime.datetime(2021, 2, 6, 8, 18, 1, 212763),u'[["London", "New York", "BUSY"]]'),
(datetime.datetime(2021, 2, 6, 8, 17, 1, 18633), u'[["Mumbai", "Tokyo", "IDLE"]]'),
(datetime.datetime(2021, 2, 6, 8, 16, 1, 182888), u'[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"], ["Amsterdam", "Berlin", "BUSY"]]'),
(datetime.datetime(2021, 2, 6, 8, 15, 1, 245619), u'[["Tokyo", "Moscow", "IDLE"]]'),
(datetime.datetime(2021, 2, 6, 7, 18, 1, 413066), u'[["Mumbai", "Los Angeles", "IDLE"], ["Mumbai", "Berlin", "IDLE"]]'),
(datetime.datetime(2021, 2, 6, 7, 17, 1, 154138), u'[]'),
(datetime.datetime(2021, 2, 6, 7, 16, 1, 253111), u'[]')]
It has two columns: the first is the date, and the second is a string representation of a list of lists.
This is how it looks in pandas:
date status
0 2021-02-06 08:18:01.212763 [["London", "New York", "BUSY"]]
1 2021-02-06 08:17:01.018633 [["Mumbai", "Tokyo", "IDLE"]]
2 2021-02-06 08:16:01.182888 [["Amsterdam", "Chicago", "IDLE"], ["Amsterdam...
3 2021-02-06 08:15:01.245619 [["Tokyo", "Moscow", "IDLE"]]
4 2021-02-06 07:18:01.413066 [["Mumbai", "Los Angeles", "IDLE"], ["Mumbai",...
5 2021-02-06 07:17:01.154138 []
6 2021-02-06 07:16:01.253111 []
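There are two problems.
First, I need to convert the string representation of a list of lists into a regular list of lists, which is done like this: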
df[column].apply(literal_eval)
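As a minimal sketch of what this conversion does to a single cell, the snippet below parses one of the status strings with `literal_eval`. The variable names `raw` and `parsed` are only illustrative and do not appear in the original code.

```python
from ast import literal_eval

# One cell of the status column, still stored as a string
raw = '[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"]]'

# literal_eval safely evaluates the Python literal into a real list of lists
parsed = literal_eval(raw)

print(type(parsed).__name__, len(parsed))
print(parsed[0])

# An empty list string becomes an empty list
print(literal_eval('[]'))
```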
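The second problem is that some column values contain multiple items in the outer list. I need to split out each item and create a new row that holds that item as a standalone list. In other words, each column value should end up as a list rather than a list of lists.
For example, I have this particular column value:
(datetime.datetime(2021, 2, 6, 8, 16, 1, 182888), u'[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"], ["Amsterdam", "Berlin", "BUSY"]]')
Here, each item in the list of lists should form a new row, with that item stored in the column as a list, like this: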
2021-02-06 08:16:01.182888 ["Amsterdam", "Chicago", "IDLE"]
2021-02-06 08:16:01.182888 ["Amsterdam", "London", "IDLE"]
2021-02-06 08:16:01.182888 ["Amsterdam", "Berlin", "BUSY"]
For any other column value that has only one item in the list of lists, such as
(datetime.datetime(2021, 2, 6, 8, 18, 1, 212763),u'[["London", "New York", "BUSY"]]')
this should be converted to
2021-02-06 08:18:01.212763 ["London", "New York", "BUSY"]
So the final dataframe should look like this:
date status
0 2021-02-06 08:18:01.212763 ["London", "New York", "BUSY"]
1 2021-02-06 08:17:01.018633 ["Mumbai", "Tokyo", "IDLE"]
2 2021-02-06 08:16:01.182888 ["Amsterdam", "Chicago", "IDLE"]
3 2021-02-06 08:16:01.182888 ["Amsterdam", "London", "IDLE"]
4 2021-02-06 08:16:01.182888 ["Amsterdam", "Berlin", "BUSY"]
5 2021-02-06 08:15:01.245619 ["Tokyo", "Moscow", "IDLE"]
6 2021-02-06 07:18:01.413066 ["Mumbai", "Los Angeles", "IDLE"]
7 2021-02-06 07:18:01.413066 ["Mumbai", "Berlin", "IDLE"]
8 2021-02-06 07:17:01.154138 []
9 2021-02-06 07:16:01.253111 []
This is what I have done so far:
import datetime
import pandas as pd
import json
from ast import literal_eval
df = pd.DataFrame(data)
df.columns = ["date", "status"]
df = df[df["status"] != '[]'] # remove empty lists
df['status'] = df['status'].apply(literal_eval) # convert string list of list into regular list
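How do I perform the next set of operations described above?

After evaluating the strings in the status column with literal_eval, you have the following options.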
For pandas version >= 0.25, you can use explode:
# Explode dataframe
df_out = df.explode('status').reset_index(drop=True)
# fill the NaN with empty lists
df_out['status'] = df_out['status'].dropna().reindex(df_out.index, fill_value=[])
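For reference, here is a self-contained sketch of the explode approach on a shortened copy of the sample data, assuming pandas >= 0.25. Unlike the snippet above, it restores the empty lists with an `isinstance` check instead of `reindex(..., fill_value=[])`. That is a swapped-in alternative that gives each row its own fresh list, not the answer's original method. The `statuses` variable is only there to inspect the result.

```python
import datetime
from ast import literal_eval

import pandas as pd

# Shortened copy of the sample data from the question
data = [
    (datetime.datetime(2021, 2, 6, 8, 16, 1, 182888),
     '[["Amsterdam", "Chicago", "IDLE"], ["Amsterdam", "London", "IDLE"]]'),
    (datetime.datetime(2021, 2, 6, 8, 15, 1, 245619),
     '[["Tokyo", "Moscow", "IDLE"]]'),
    (datetime.datetime(2021, 2, 6, 7, 17, 1, 154138), '[]'),
]
df = pd.DataFrame(data, columns=["date", "status"])
df["status"] = df["status"].apply(literal_eval)

# One row per inner list; rows that held an empty list become NaN
df_out = df.explode("status").reset_index(drop=True)

# Replace the NaN placeholders with fresh empty lists (assumed alternative)
df_out["status"] = df_out["status"].apply(
    lambda v: v if isinstance(v, list) else []
)

statuses = df_out["status"].tolist()
print(df_out)
```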
For pandas version < 0.25, explode is not available, so you can replicate explode-like behavior with Index.repeat and then flatten the nested lists using chain:
from itertools import chain
l = df['status'].str.len()
m = l > 0
df_out = df.reindex(df[m].index.repeat(l[m]))
df_out['status'] = list(chain(*df.loc[m, 'status']))
df_out = df_out.append(df[~m]).sort_index().reset_index(drop=True)
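Note that `DataFrame.append` was removed in pandas 2.0, so the last line above only runs on older versions. The following sketch shows the same repeat-and-chain idea with `pd.concat` instead. The toy frame, the names `lengths` and `non_empty`, and the use of a stable `mergesort` (to keep rows that share an index label in their original order) are assumptions for illustration, not part of the original answer.

```python
from itertools import chain

import pandas as pd

# Toy frame: status already holds real lists (i.e. after literal_eval)
df = pd.DataFrame({
    "date": ["t0", "t1", "t2"],
    "status": [
        [["A", "B", "IDLE"], ["A", "C", "BUSY"]],
        [],
        [["D", "E", "IDLE"]],
    ],
})

lengths = df["status"].str.len()   # number of inner lists per row
non_empty = lengths > 0            # rows that actually need expanding

# Repeat each non-empty row once per inner list
df_out = df.reindex(df[non_empty].index.repeat(lengths[non_empty]))

# Flatten the nested lists so each repeated row gets one inner list
df_out["status"] = list(chain.from_iterable(df.loc[non_empty, "status"]))

# Add the empty-list rows back and restore the original ordering
df_out = (
    pd.concat([df_out, df[~non_empty]])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(df_out)
```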
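Hi, I am currently on pandas 0.24, so I think that is why I get the error "'DataFrame' object has no attribute 'explode'". I usually don't mention version numbers because most solutions just work for me, but I should have mentioned it here. Sorry about that. Will this work for pandas < 0.25?
No problem. Will update the solution.
@SouvikRay check the edit for pandas version < 0.25.
Thanks! It works now!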