Python: How do I reclassify a DataFrame column? (python, pandas, dataframe) - Fatal编程技术网


I have a pandas DataFrame that looks like this:

> print(df)

           image_name                       tags
0                img1       class1 class2 class3
1                img2                     class2
2                img3              class2 class3
3                img4                     class1
How can I reclassify the tags column so that any row containing the value class3 is assigned the string "yes", and every other row is assigned the string "no"?

I know I can check for instances of the search term with:

df['tags'].str.contains('class3')
However, I don't know how to incorporate this into the task at hand.

Here is the expected output:

           image_name                       tags
0                img1                        yes
1                img2                         no
2                img3                        yes
3                img4                         no
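For anyone who wants to follow along, here is a minimal sketch that rebuilds the sample frame and shows what the str.contains call from the question actually returns. The DataFrame construction is an assumption, with values copied from the printout above. The result is a boolean Series with one value per row, and the answers below convert that Series into "yes"/"no" labels.

```python
import pandas as pd

# Sample frame rebuilt from the question's printout (assumed values).
df = pd.DataFrame({
    "image_name": ["img1", "img2", "img3", "img4"],
    "tags": ["class1 class2 class3", "class2", "class2 class3", "class1"],
})

# str.contains returns a boolean Series aligned with df's index:
# True where the tags string contains the substring "class3".
mask = df["tags"].str.contains("class3")
print(mask.tolist())  # [True, False, True, False]
```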
Use np.where:

import numpy as np
df['tags'] = np.where(df['tags'].str.contains('class3'), 'yes', 'no')

Output of the above:

print(df)
  image_name tags
0       img1  yes
1       img2   no
2       img3  yes
3       img4   no
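Below is a self-contained version of the np.where approach. It assumes the frame is rebuilt from the question's sample data. np.where takes the boolean mask and picks "yes" wherever the mask is True and "no" everywhere else. The result is a NumPy array, which is then assigned back over the tags column.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed values).
df = pd.DataFrame({
    "image_name": ["img1", "img2", "img3", "img4"],
    "tags": ["class1 class2 class3", "class2", "class2 class3", "class1"],
})

# Elementwise choice: "yes" where the mask is True, "no" where it is False.
df["tags"] = np.where(df["tags"].str.contains("class3"), "yes", "no")
print(df)
```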
You can also do the following:

df['tags'] = df.tags.str.contains('class3').map({True:'Yes',False:'No'})
>>> df
  image_name tags
0       img1  Yes
1       img2   No
2       img3  Yes
3       img4   No
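One caveat applies to every approach on this page: str.contains('class3') is a substring test, so a tag such as class30 would also count as a match. When tags are space-separated tokens, as in the sample, a word-boundary regex is one way to match class3 only as a whole tag. The class30 row below is hypothetical and exists only to show the difference.

```python
import pandas as pd

# Question's sample plus a hypothetical "class30" row to expose the pitfall.
df = pd.DataFrame({
    "image_name": ["img1", "img2", "img3", "img4", "img5"],
    "tags": ["class1 class2 class3", "class2", "class2 class3", "class1", "class30"],
})

# Plain substring search also matches "class30".
substring_hit = df["tags"].str.contains("class3")

# \b word boundaries: "class3" must not be followed or preceded by a word character.
token_hit = df["tags"].str.contains(r"\bclass3\b", regex=True)

df["tags"] = token_hit.map({True: "Yes", False: "No"})
print(substring_hit.tolist())  # [True, False, True, False, True]
print(df["tags"].tolist())     # ['Yes', 'No', 'Yes', 'No', 'No']
```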

This may be a bit faster than str.contains:

v=np.array(['No','Yes'])[np.array(['class3' in x for x in df.tags]).astype(int)]
v
Out[267]: array(['Yes', 'No', 'Yes', 'No'], dtype='<U3')
#df['tags']=v
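The indexing trick works as follows. Casting the per-row booleans to int gives 0 for False and 1 for True, and those integers are used as positions into a two-element label array. This means the label for False must sit at position 0 and the label for True at position 1. Reversing the array silently swaps every label. The sketch assumes the question's sample frame. Note that 'class3' in x is also a substring test.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed values).
df = pd.DataFrame({
    "image_name": ["img1", "img2", "img3", "img4"],
    "tags": ["class1 class2 class3", "class2", "class2 class3", "class1"],
})

# Python-level substring test per row; True/False become 1/0.
idx = np.array(["class3" in x for x in df["tags"]]).astype(int)

# Fancy indexing: position 0 is the label for False, position 1 for True.
labels = np.array(["No", "Yes"])
df["tags"] = labels[idx]
print(idx.tolist())  # [1, 0, 1, 0]
```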
Timings:

#df=pd.concat([df]*1000)
#sacul
%timeit df.tags.str.contains('class3').map({True:'Yes',False:'No'})
The slowest run took 10.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.11 ms per loop
#Mine
%timeit np.array(['No','Yes'])[np.array(['class3' in x for x in df.tags]).astype(int)]
1000 loops, best of 3: 390 µs per loop
#Borealis
%timeit np.where(df['tags'].str.contains('class3'),'yes','no')
100 loops, best of 3: 2.46 ms per loop
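To rerun the comparison outside IPython, here is a sketch that uses the standard timeit module instead of the %timeit magic. It assumes the sample frame is tiled 1000 times, mirroring the commented-out pd.concat line. It also uses the same Yes/No labels for all three approaches so their outputs can be checked against each other before timing. Absolute numbers will depend on the machine and on the pandas/NumPy versions, so the ranking above may not hold everywhere.

```python
import timeit

import numpy as np
import pandas as pd

# Question's sample frame (assumed values), tiled to 4,000 rows.
base = pd.DataFrame({
    "image_name": ["img1", "img2", "img3", "img4"],
    "tags": ["class1 class2 class3", "class2", "class2 class3", "class1"],
})
df = pd.concat([base] * 1000, ignore_index=True)


def via_map():
    # Boolean mask mapped through a dict (the "#sacul" line above).
    return df["tags"].str.contains("class3").map({True: "Yes", False: "No"}).to_numpy()


def via_indexing():
    # Python-level membership test, then fancy indexing (the "#Mine" line above).
    hits = np.array(["class3" in x for x in df["tags"]]).astype(int)
    return np.array(["No", "Yes"])[hits]


def via_where():
    # Boolean mask passed to np.where (the "#Borealis" line above).
    return np.where(df["tags"].str.contains("class3"), "Yes", "No")


approaches = {"map": via_map, "indexing": via_indexing, "np.where": via_where}

# Sanity check: all three must produce identical labels before timing.
results = {name: fn().tolist() for name, fn in approaches.items()}
assert results["map"] == results["indexing"] == results["np.where"]

for name, fn in approaches.items():
    per_call = timeit.timeit(fn, number=20) / 20
    print(f"{name}: {per_call * 1e3:.3f} ms per call")
```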