Python: processing lists in a DataFrame column

python pandas numpy dataframe

I created a DataFrame from sim_measure_i, which is itself a DataFrame:

neighbours = sim_measure_i.apply(lambda s: s.nlargest(k).index.tolist(), axis=1)

neighbours looks like this:

1500                       [0, 1, 2, 3, 4]
1501                       [0, 1, 2, 3, 4]
1502                       [0, 1, 2, 3, 4]
1503     [7230, 12951, 13783, 8000, 18077]
1504                     [1, 3, 6, 27, 47]

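The second column here holds lists. I want to iterate over this DataFrame and process each list so that I can read every element of it, say 7230, and look up the score for 7230 in another DataFrame that contains (id, score) pairs.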

Then I want to add a column to this DataFrame so that it looks like this:

test_case_id               nbr_list             scores             
1500                       [0, 1, 2, 3, 4]        [+1, -1, -1, +1, -1]
1501                       [0, 1, 2, 3, 4]        [+1, +1, +1, -1, -1]
1502                       [0, 1, 2, 3, 4]        [+1, +1, +1, -1, -1]
1503     [7230, 12951, 13783, 8000, 18077]        [+1, +1, +1, -1, -1]
1504                     [1, 3, 6, 27, 47]        [+1, +1, +1, -1, -1]

Edit: I wrote a method get_scores(), and when I try to apply it to each nbr_list with a lambda, I get the following error:

TypeError: ("cannot do positional indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [0] of <type 'str'>", u'occurred at index 1500')

You can try nested loops:

for i in range(neighbours.shape[0]): #iterate over each row
    for j in range(len(neighbours['neighbours_lists'].iloc[i])): #iterate over each element of the list
        a = neighbours['neighbours_lists'].iloc[i][j] #access the element of the list index j in cell location of row i

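Here i is the outer loop variable, which iterates over the rows, and j is the inner loop variable, which iterates over the length of the list in each cell.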
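The loop above only reads each element, so here is a minimal sketch of how it could be extended to look up a score for every element and store the results in a new column. The sample data, the id_score table, and the name score_by_id are assumptions made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the loop above.
neighbours = pd.DataFrame({
    'test_case_id': [1500, 1503],
    'neighbours_lists': [[0, 1, 2], [7230, 12951]],
})

# Hypothetical (id, score) table standing in for the asker's other DataFrame.
id_score = pd.DataFrame({
    'id': [0, 1, 2, 7230, 12951],
    'score': [1, -1, -1, 1, 1],
})

# Build a plain dict once so each lookup inside the loop is cheap.
score_by_id = dict(zip(id_score['id'], id_score['score']))

all_scores = []
for i in range(neighbours.shape[0]):                     # iterate over each row
    nbr_list = neighbours['neighbours_lists'].iloc[i]
    row_scores = []
    for j in range(len(nbr_list)):                       # iterate over each element of the list
        a = nbr_list[j]
        row_scores.append(score_by_id[a])                # raises KeyError if the id is unknown
    all_scores.append(row_scores)

# Wrap in a Series so each list is stored as a single cell value.
neighbours['scores'] = pd.Series(all_scores, index=neighbours.index)
print(neighbours)
```

Explicit loops like this are easy to follow but tend to be slow on large frames; they are shown here mainly to make the lookup step concrete.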

The original DataFrame:

In [68]: df
Out[68]: 
   test_case_id                   neighbours_lists
0          1500                    [0, 1, 2, 3, 4]
1          1501                    [0, 1, 2, 3, 4]
2          1502                    [0, 1, 2, 3, 4]
3          1503  [7230, 12951, 13783, 8000, 18077]
4          1504                  [1, 3, 6, 27, 47]

A custom function that takes an id and a list and does some computation to evaluate the scores:

In [69]: def g(_id, nbs):
    ...:     return ['-1' if (_id + 1) % (nb + 1) else '+1' for nb in nbs]
    ...:

Apply the custom function to every row of the original DataFrame:

In [70]: scores = df.apply(lambda x: g(x.test_case_id, x.neighbours_lists), axis=1)

Add the scores Series to the DataFrame and compare it with the original DataFrame:

In [71]: df = pd.concat([df, scores.to_frame(name='scores')], axis=1)

In [72]: df
Out[72]: 
   test_case_id                   neighbours_lists                scores
0          1500                    [0, 1, 2, 3, 4]  [+1, -1, -1, -1, -1]
1          1501                    [0, 1, 2, 3, 4]  [+1, +1, -1, -1, -1]
2          1502                    [0, 1, 2, 3, 4]  [+1, -1, +1, -1, -1]
3          1503  [7230, 12951, 13783, 8000, 18077]  [-1, -1, -1, -1, -1]
4          1504                  [1, 3, 6, 27, 47]  [-1, -1, +1, -1, -1]
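As a possible alternative to the apply and concat steps, the same column can be built with a list comprehension over the two columns. This is only a sketch under the assumption that g is the placeholder scoring function defined above; the two-row frame is a shortened copy of the example data.

```python
import pandas as pd

# Shortened copy of the example frame above.
df = pd.DataFrame({
    'test_case_id': [1500, 1501],
    'neighbours_lists': [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
})

def g(_id, nbs):
    # Same placeholder scoring rule as in the answer above.
    return ['-1' if (_id + 1) % (nb + 1) else '+1' for nb in nbs]

# Pair each id with its list and compute the scores row by row.
scores = [g(_id, nbs) for _id, nbs in zip(df['test_case_id'], df['neighbours_lists'])]

# Wrap in a Series so each list lands in one cell of the new column.
df['scores'] = pd.Series(scores, index=df.index)
print(df)
```

This avoids building an intermediate frame for concat; whether it is faster than apply with axis=1 depends on the data and is worth measuring.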

Suppose you start with neighbors, which looks like this:

In [87]: neighbors = pd.DataFrame({'neighbors_list': [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]})

In [88]: neighbors
Out[88]: 
    neighbors_list
0  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]

You did not specify exactly what the other DataFrame (the one containing id/score pairs) looks like, so here is an approximation:

In [89]: id_score = pd.DataFrame({'id': [0, 1, 2, 3, 4], 'score': [1, -1, -1, 1, -1]})

In [90]: id_score
Out[90]: 
   id  score
0   0      1
1   1     -1
2   2     -1
3   3      1
4   4     -1

You can convert it to a dictionary:

In [91]: d = id_score.set_index('id')['score'].to_dict()

Then apply:

In [92]: neighbors.neighbors_list.apply(lambda l: [d[e] for e in l])
Out[92]: 
0    [1, -1, -1, 1, -1]
1    [1, -1, -1, 1, -1]
Name: neighbors_list, dtype: object
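The lookup d[e] raises a KeyError when a neighbour id has no entry in the score table. The sketch below shows one way that case might be handled, using dict.get so unknown ids map to None, and also stores the result as a new column. The id 99 is a made-up example of a missing id, and the column name scores is an assumption taken from the desired output in the question.

```python
import pandas as pd

# Second row contains 99, a hypothetical id with no score entry.
neighbors = pd.DataFrame({'neighbors_list': [[0, 1, 2, 3, 4], [0, 1, 99]]})
id_score = pd.DataFrame({'id': [0, 1, 2, 3, 4], 'score': [1, -1, -1, 1, -1]})

# Same id -> score dictionary as above.
d = id_score.set_index('id')['score'].to_dict()

# dict.get returns None instead of raising KeyError for unknown ids.
neighbors['scores'] = neighbors['neighbors_list'].apply(
    lambda l: [d.get(e) for e in l]
)
print(neighbors)
```

Whether a missing id should become None, a default score, or an error is a design choice that depends on the data; d.get(e, 0) would substitute 0 instead.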

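Thank you very much. This worked for my case with some minor modifications. Thanks! I was able to do it without a dict, but I learned another approach from your answer.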
