Python: converting a DataFrame into a list of lists
I have a pandas DataFrame in the following format:
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2', 'col_3', 'col_4', 'label'])
For a more visual view, this is what the DataFrame looks like. It has a person_id column and an object_id column, then several col_x columns, and finally the label:
   person_id  object_id  col_1  col_2  col_3  col_4  label
0          1          2      4      5      7      8      1
1          1          3      1      3      4      6      1
2          1          4      1      2      6      5      0
3          1          5      1      3      3      6      0
4          2          6      3      5      1      3      1
5          2          7      3      2      6      8      1
6          2          1      3      1      0      4      1
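I want to use a function from a library that expects its input in a specific format. Specifically, I want to group the rows by person_id and label, then build a list of lists from the col_x values (one inner list per object_id row) and a plain list with the labels. For the example above, that would be: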
bags = [
[[4, 5, 7, 8],[1, 3, 4, 6]],
[[1, 2, 6, 5],[1, 3, 3, 6]],
[[3, 5, 1, 3],[3, 2, 6, 8],[3, 1, 0, 4]]
]
labels = [1,0,1]
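What I do right now is iterate over the DataFrame in pandas and build the two new lists on the fly. I know this is unwise, though, and I am looking for a more Pythonic and better-performing way.
My ugly solution: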
bags = []
labels = []
uniquePeople = df['person_id'].unique()
predictors = ['col_1', 'col_2', 'col_3', 'col_4']
for unp in uniquePeople:
    # rows of this person with label 1 (element-wise AND is `&`, not `&&`)
    person = df[(df['person_id'] == unp) & (df['label'] == 1)][predictors].values
    label = 1
    if len(person) > 0:
        bags.append(person)
        labels.append(label)
    # rows of this person with label 0
    person = df[(df['person_id'] == unp) & (df['label'] == 0)][predictors].values
    label = 0
    if len(person) > 0:
        bags.append(person)
        labels.append(label)
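By the way, I heavily modified the code to make it fit Stack Overflow. If you spot something off in it, don't bother with it. The aim is to find a better approach, not to fix the ugly one :P

Use GroupBy.apply with a lambda function to build a Series indexed by the two columns: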
predictors = ['col_1', 'col_2', 'col_3', 'col_4']
s = (df.groupby(['person_id', 'label'], sort=False)[predictors]
       .apply(lambda x: x.values.tolist()))
print(s)
person_id  label
1          1                  [[4, 5, 7, 8], [1, 3, 4, 6]]
           0                  [[1, 2, 6, 5], [1, 3, 3, 6]]
2          1        [[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]
dtype: object
Then convert the Series to a list:
bags = s.tolist()
print (bags)
[[[4, 5, 7, 8], [1, 3, 4, 6]],
[[1, 2, 6, 5], [1, 3, 3, 6]],
[[3, 5, 1, 3], [3, 2, 6, 8], [3, 1, 0, 4]]]
The labels can be read from the second level of the Series' MultiIndex.
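A minimal sketch of that last step, assuming `s` is the grouped Series built as above. `get_level_values` is a standard pandas index method; the example data is repeated only to keep the snippet self-contained:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2', 'col_3', 'col_4', 'label'])
predictors = ['col_1', 'col_2', 'col_3', 'col_4']

# sort=False keeps the groups in order of first appearance
s = (df.groupby(['person_id', 'label'], sort=False)[predictors]
       .apply(lambda x: x.values.tolist()))

bags = s.tolist()
# the 'label' level of the MultiIndex lines up one-to-one with the bags
labels = s.index.get_level_values('label').tolist()
print(labels)  # [1, 0, 1]
```

Because both lists come from the same Series, `bags[i]` and `labels[i]` always describe the same (person_id, label) group.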

Not sure if this is what you are looking for:
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2', 'col_3', 'col_4', 'label'])  # example dataframe
# create a new column holding the col_x values of each row as one list
df['cols'] = df[['col_1', 'col_2', 'col_3', 'col_4']].apply(lambda x: list(x), axis=1)
# group by and create list of lists (groupby sorts the group keys by default)
tmp = df.groupby(['person_id', 'label'])[['cols']].agg(list)
bags = tmp['cols'].tolist()  # unpack
labels = tmp.index.droplevel(0)  # an Index of labels; call .tolist() for a plain list
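One thing worth checking with either approach is the group order: `groupby` sorts the group keys by default, so for person 1 the label-0 bag comes before the label-1 bag unless `sort=False` is passed. A small sketch comparing the two, where `to_bags` is a hypothetical helper written only for this illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 7, 8, 1],
                   [1, 3, 1, 3, 4, 6, 1],
                   [1, 4, 1, 2, 6, 5, 0],
                   [1, 5, 1, 3, 3, 6, 0],
                   [2, 6, 3, 5, 1, 3, 1],
                   [2, 7, 3, 2, 6, 8, 1],
                   [2, 1, 3, 1, 0, 4, 1]],
                  columns=['person_id', 'object_id', 'col_1', 'col_2', 'col_3', 'col_4', 'label'])
predictors = ['col_1', 'col_2', 'col_3', 'col_4']


def to_bags(frame, sort):
    """Hypothetical helper: return (bags, labels) for the given sort setting."""
    s = (frame.groupby(['person_id', 'label'], sort=sort)[predictors]
              .apply(lambda x: x.values.tolist()))
    return s.tolist(), s.index.get_level_values('label').tolist()


bags_sorted, labels_sorted = to_bags(df, sort=True)   # keys sorted: (1, 0), (1, 1), (2, 1)
bags_seen, labels_seen = to_bags(df, sort=False)      # order of first appearance

print(labels_sorted)  # [0, 1, 1]
print(labels_seen)    # [1, 0, 1]
```

Bags and labels stay aligned in both cases; only `sort=False` reproduces the exact order shown in the question.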
An insane performance improvement! Amazing! I am always in awe of what pandas can do. Thanks a lot.