Python "in" mutability issue


python pandas


This is super basic, but I need a hand.

I have a DataFrame that looks like this:

   name     yes      no
0  'a'     ('b',)
1  'b'     ('a',)
2  'c'              ('a', 'b')

I am trying to score the data like this:

def score(x):
    if x[0] in x[1] == True:   
        return 1
    if x[0] in x[2] == True:  
        return 0
    else:
        []

sh['label']= sh.apply(score, axis=1)

On the second if statement (but not the first one), I get this error:

TypeError: ("argument of type 'float' is not iterable", 'occurred at index 1')

It seems to have no problem with one-item tuples, but it doesn't like two-item tuples.

How can I fix it?

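The problem is the NaN values in the empty cells, so one possible solution is to first replace them with fillna, using some value that does not appear in the name column:
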
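# If you need to select by position, use iloc
def score(x):
    print(x)  # debug: show the row being scored
    # 'tmp' is a string, so after fillna this "in" is a substring test:
    # the placeholder must not contain any name (e.g. 't' in 'tmp' is True)
    if x.iloc[0] in x.iloc[1]:
        return 1
    elif x.iloc[0] in x.iloc[2]:
        return 0
    # no match: the function returns None, which shows up as NaN in 'label'

sh['label'] = sh.fillna('tmp').apply(score, axis=1)

print(sh)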
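
Why the message mentions float: an empty cell holds NaN, and NaN is a Python float. The in operator needs a container or an iterable on its right-hand side, so testing membership against NaN raises TypeError. The minimal sketch below reproduces this outside pandas. try_contains is a hypothetical helper written only for this demonstration.

```python
import numpy as np


def try_contains(value, container):
    """Return `value in container`, or the TypeError message if the test fails."""
    try:
        return value in container
    except TypeError as exc:
        return str(exc)


print(type(np.nan))                   # <class 'float'>
print(try_contains('a', ('a', 'b')))  # True
print(try_contains('a', np.nan))      # TypeError message mentioning 'float'
```

The exact wording of the message can differ between Python versions. It always names the float type, which is the clue pointing at the NaN cell rather than at the two-item tuple.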


Sample:
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b','b','b'],
    'yes': [('b',),('a',),np.nan],
    'no':[np.nan, np.nan, ('a','b')]
})
print(sh)
  name      no   yes
0    b     NaN  (b,)
1    b     NaN  (a,)
2    b  (a, b)   NaN

def score(x):
    #print (x)
    if x['name'] in x['yes']:   
        return 1
    elif x['name'] in x['no']:  
        return 0

sh['label']= sh.fillna('tmp').apply(score, axis=1)
print(sh)
  name      no   yes  label
0    b     NaN  (b,)    1.0
1    b     NaN  (a,)    NaN
2    b  (a, b)   NaN    0.0
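
One caveat about the fillna('tmp') placeholder: because 'tmp' is a string, in performs a substring test on the filled cells. A name such as 't' would therefore be "found" in every empty cell. The sketch below avoids the placeholder and treats any non-tuple cell (such as NaN) as empty. It assumes that non-empty cells always hold tuples. contains is a hypothetical helper, and the third name is changed to 't' only to show the difference.

```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b', 'b', 't'],
    'yes': [('b',), ('a',), np.nan],
    'no': [np.nan, np.nan, ('a', 'b')],
})

# Placeholder approach: 't' in 'tmp' is a substring match, so row 2 is wrongly scored 1
naive = sh.fillna('tmp').apply(
    lambda r: 1 if r['name'] in r['yes'] else (0 if r['name'] in r['no'] else np.nan),
    axis=1,
)


def contains(container, value):
    # Hypothetical helper: anything that is not a tuple (e.g. NaN) counts as empty
    return isinstance(container, tuple) and value in container


def score(row):
    if contains(row['yes'], row['name']):
        return 1
    if contains(row['no'], row['name']):
        return 0
    return np.nan  # no match in either column


sh['label'] = sh.apply(score, axis=1)
print(naive)
print(sh)
```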

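But the code above has a problem if a value appears in both the yes and no columns: the first match returns 1 and the elif is never reached. One possible solution is to create 2 new columns of boolean True/False values and then convert them to int (1/0) with astype: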
sh = pd.DataFrame({
    'name': ['b','b','b'],
    'yes': [('b',),('a',),np.nan],
    'no':[np.nan, ('b',), ('a','b')]        
})
print(sh)
  name      no   yes
0    b     NaN  (b,)
1    b    (b,)  (a,)
2    b  (a, b)   NaN

sh['label-yes']= sh.fillna('tmp').apply(lambda x: x['name'] in x['yes'], axis=1)
sh['label-no']= sh.fillna('tmp').apply(lambda x: x['name'] in x['no'], axis=1)
sh[['label-yes', 'label-no']] = sh[['label-yes', 'label-no']].astype(int)
print(sh)
  name      no   yes  label-yes  label-no
0    b     NaN  (b,)          1         0
1    b    (b,)  (a,)          0         1
2    b  (a, b)   NaN          0         1
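
The same two flag columns can also be built without the fillna placeholder and without calling apply twice. The sketch below zips the name column with each tuple column and uses a list comprehension. It is an alternative technique, not the answer's code. flag is a hypothetical helper, and the sketch assumes that non-empty cells are tuples.

```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b', 'b', 'b'],
    'yes': [('b',), ('a',), np.nan],
    'no': [np.nan, ('b',), ('a', 'b')],
})


def flag(names, containers):
    # 1 if the name is inside the tuple, else 0; non-tuple cells (NaN) count as empty
    return [int(isinstance(c, tuple) and n in c) for n, c in zip(names, containers)]


sh['label-yes'] = flag(sh['name'], sh['yes'])
sh['label-no'] = flag(sh['name'], sh['no'])
print(sh)
```

Iterating plain Python values with zip is typically cheaper than a row-wise apply(axis=1), because no Series is built per row. Because the in test only ever runs against real tuples, the substring issue with 'tmp' cannot occur.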

What is the desired output? And why the == True? … x probably doesn't contain what you expect. Put print(x[2]) before the second if.
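
The == True asked about above is more than redundant. In Python, in and == are both comparison operators, so they chain. As a result, x[0] in x[1] == True means (x[0] in x[1]) and (x[1] == True), and that is False for any tuple. The minimal sketch below uses plain Python values in place of the row.

```python
name, yes = 'a', ('a',)

# Chained comparison: evaluated as (name in yes) and (yes == True)
chained = name in yes == True
# Parenthesised: compares the membership result with True
explicit = (name in yes) == True
# What was actually intended
plain = name in yes

print(chained)   # False, because ('a',) == True is False
print(explicit)  # True
print(plain)     # True
```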