Python: create a new column from two columns using apply()


python pandas dataframe


I want to create a column s['C'] using apply() on a DataFrame.

My dataset looks similar to this:

[In]:

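I need to create a column s['C'] in which each row's value is a list of 1s and 0s, depending on whether the word in column A appears in the list in column B, and at which position in that list. My output should look like this: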

[Out]: 
    A       B                   C
0   hello   [all, say, hello]   [0, 0, 1]
1   good    [good, for, you]    [1, 0, 0]
2   my      [so, hard]          [0, 0]
3   pandas  [pandas]            [1]
4   wrong   []                  [0]
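The [In]: listing above did not survive on this page. To try out the answers, the sample frame can be rebuilt from the A and B columns of the expected output; this reconstruction is an assumption, not the asker's original data.

```python
import pandas as pd

# Hypothetical reconstruction of the input, taken from the A and B
# columns of the expected output (the original [In] listing is missing).
s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})
print(s)
```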

I have been trying with a function and apply(), but I still can't see where the error is:

[In]:
def func(valueA,listB):
  new_list=[]
  for i in listB:
    if listB[i] == valueA:
      new_list.append(1)
    else:
      new_list.append(0)
  return new_list

s['C']=s.apply( lambda x: func(x.loc[:,'A'], x.loc[:,'B']))

The error is: Too many indexers
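Why the first attempt fails: without axis=1, DataFrame.apply calls the function once per column and passes each column as a one-dimensional Series. Indexing that Series with two indexers, as in x.loc[:, 'A'], is what raises "Too many indexers". A small sketch (the two-row frame is only illustrative data):

```python
import pandas as pd

s = pd.DataFrame({'A': ['hello', 'good'],
                  'B': [['all', 'say', 'hello'], ['good', 'for', 'you']]})

# Default axis=0: the function receives each column; x.name is the column label.
per_column = s.apply(lambda x: x.name)
# axis=1: the function receives each row; x.name is the row label.
per_row = s.apply(lambda x: x.name, axis=1)
print(per_column.tolist(), per_row.tolist())   # ['A', 'B'] [0, 1]

# A column is a 1-D Series, so a 2-D style .loc[:, 'A'] lookup fails.
try:
    s['A'].loc[:, 'A']
    raised = False
except Exception as exc:                       # pandas reports "Too many indexers"
    raised = True
    print(type(exc).__name__, exc)
```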

I also tried:

[In]:
list=[]
listC=[]
for i in s['A']:
  for j in s['B'][i]:
     if s['A'][i] == s['B'][i][j]:
        list.append(1)
     else:
        list.append(0)
  listC.append(list)

s['C']=listC

The error is: KeyError: 'hello'
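The KeyError comes from label lookups: for i in s['A'] yields the words themselves, so s['B'][i] looks for a row labelled 'hello'. A minimal sketch of the same nested-loop idea, iterating over the values with zip and starting a fresh list for every row (the original reuses a single list named list, which also shadows the built-in). The sample data is the reconstruction from the expected output, an assumption.

```python
import pandas as pd

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

listC = []
# zip pairs the value of A with the list in B for each row,
# so no label lookup such as s['B']['hello'] is needed.
for word, words in zip(s['A'], s['B']):
    row = []                          # a new list for every row
    for item in words:                # iterate over elements, not labels
        row.append(1 if item == word else 0)
    listC.append(row)

s['C'] = listC
print(s)
```

Note that the empty list in row 4 gives [] here rather than the [0] shown in the expected output.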

Any suggestions?

If you are using pandas 0.25+, explode is an option:

(s.explode('B')
  .assign(C=lambda x: x['A'].eq(x['B']).astype(int))
  .groupby(level=0).agg({'A':'first','B':list,'C':list})
)

Output:

        A                  B          C
0   hello  [all, say, hello]  [0, 0, 1]
1    good   [good, for, you]  [1, 0, 0]
2      my         [so, hard]     [0, 0]
3  pandas           [pandas]        [1]
4   wrong              [nan]        [0]
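A self-contained run of the explode approach, assuming pandas 0.25 or later and the sample data reconstructed from the question. explode turns the empty list in row 4 into a single NaN, which is why B shows [nan] there and C gets [0].

```python
import pandas as pd

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

out = (s.explode('B')                                        # one row per list element
         .assign(C=lambda x: x['A'].eq(x['B']).astype(int))  # 1 where the word matches
         .groupby(level=0)                                   # regroup by original row
         .agg({'A': 'first', 'B': list, 'C': list}))
print(out)
```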

Option 2: following your logic, you can use a list comprehension. This works with any version of pandas:

s['C'] = [[x==a for x in b] if b else [0] for a,b in zip(s['A'],s['B'])]

Output:

        A                  B                     C
0   hello  [all, say, hello]  [False, False, True]
1    good   [good, for, you]  [True, False, False]
2      my         [so, hard]        [False, False]
3  pandas           [pandas]                [True]
4   wrong                 []                   [0]
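The list comprehension above yields booleans (True/False). To get the 1s and 0s asked for in the question, wrapping the comparison in int() is enough; the if b else [0] part keeps [0] for an empty list, as in the expected output. A sketch on the reconstructed sample data (an assumption):

```python
import pandas as pd

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

# int(True) == 1 and int(False) == 0; an empty list in B falls back to [0]
s['C'] = [[int(x == a) for x in b] if b else [0]
          for a, b in zip(s['A'], s['B'])]
print(s)
```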

Another approach, which uses numpy for easier indexing:

import numpy as np

def create_vector(word, vector):

    out = np.zeros(len(vector))
    indices = [i for i, x in enumerate(vector) if x == word]
    out[indices] = 1

    return out.astype(int)


s['C'] = s.apply(lambda x: create_vector(x.A, x.B), axis=1)

# Output
#      A        B                   C
# 0    hello    [all, say, hello]   [0, 0, 1]
# 1    good     [good, for, you]    [1, 0, 0]
# 2    my       [so, hard]          [0, 0]
# 3    pandas   [pandas]            [1]
# 4    wrong    []                  []
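As a variant of create_vector, NumPy can do the comparison itself instead of collecting indices: comparing an object array against the word produces a boolean mask directly. The name create_vector_eq is hypothetical, used only for illustration, and the sample data is the reconstruction from the question.

```python
import numpy as np
import pandas as pd

def create_vector_eq(word, vector):
    # dtype=object keeps the comparison elementwise,
    # including for an empty list (which yields an empty array).
    return (np.array(vector, dtype=object) == word).astype(int)

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

s['C'] = s.apply(lambda x: create_vector_eq(x.A, x.B), axis=1)
print(s)
```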

Your function works with a couple of small changes:

def func(valueA, listB):
    new_list = []
    for i in range(len(listB)):  # iterate over positions (range(len(listB))) instead of the elements of listB
        if listB[i] == valueA:
            new_list.append(1)
        else:
            new_list.append(0)
    return new_list

and add the argument axis=1 to the apply call:

s['C'] = s.apply(lambda x: func(x.A, x.B), axis=1)
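The root bug in the question's func is that for i in listB already yields the elements, not positions, so listB[i] indexes a list with a string. An alternative minimal fix keeps the original loop and compares the element directly. A sketch on the reconstructed sample data; func_direct is a hypothetical name.

```python
import pandas as pd

def func_direct(valueA, listB):
    new_list = []
    for item in listB:          # item is the element itself, not an index
        if item == valueA:
            new_list.append(1)
        else:
            new_list.append(0)
    return new_list

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

# axis=1 passes each row, so x.A is a word and x.B is a list
s['C'] = s.apply(lambda x: func_direct(x.A, x.B), axis=1)
print(s)
```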

Using apply:

s['c'] = s.apply(lambda x: [int(x.A == i) for i in x.B], axis=1)
s
        A                  B          c
0   hello  [all, say, hello]  [0, 0, 1]
1    good   [good, for, you]  [1, 0, 0]
2      my         [so, hard]     [0, 0]
3  pandas           [pandas]        [1]
4   wrong                 []         []
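The apply one-liner above returns [] for the empty list in row 4, while the question asks for [0]. Because an empty list is falsy, appending or [0] to the comprehension covers that case. A sketch on the reconstructed sample data (an assumption):

```python
import pandas as pd

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

# [] or [0] evaluates to [0]; non-empty lists are returned unchanged
s['c'] = s.apply(lambda x: [int(x.A == i) for i in x.B] or [0], axis=1)
print(s)
```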

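Do you actually need these lists? You could organize the data with a MultiIndex, where the first level is the original index and the second level is the position within the list. All of these operations then become much more efficient.

@ALollz, that's interesting. Do you have an example? My GitHub username is Ignacio Ibarra, thanks.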
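A sketch of the long layout described in the comment, assuming pandas 0.25+ for explode and the sample data reconstructed from the question. The level names row and pos are hypothetical. Each list element gets its own row, labelled by (original row, position in list), so the comparison runs as one vectorized operation; groupby can rebuild the lists if they are still needed.

```python
import pandas as pd

s = pd.DataFrame({
    'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
    'B': [['all', 'say', 'hello'], ['good', 'for', 'you'],
          ['so', 'hard'], ['pandas'], []],
})

long = s.explode('B')                      # one row per element; [] becomes NaN
# Position of each element within its original list
long['pos'] = long.groupby(level=0).cumcount().to_numpy()
long = long.rename_axis('row').set_index('pos', append=True)

# One vectorized comparison over every (row, pos) pair
long['C'] = long['A'].eq(long['B']).astype(int)
print(long)

# Back to one list per original row, if needed
lists = long.groupby(level='row')['C'].agg(list)
print(lists)
```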