Extracting strings from a DataFrame column with NumPy

numpy pandas dataframe

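My DataFrame has a column named 'a' whose values may contain 'apple' or 'orange'. I want to extract those words where they occur and label everything else 'others'.

I could simply loop over the rows and extract them. However, I have seen numpy.where used for a similar purpose, but only with two categories: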

result = numpy.where(df['a'].str.contains('apple'), 'apple', 'others')

Can this be applied to the three-category case? In other words, the result should contain 'apple', 'orange', or 'others' entries.

Is there a better way than a plain loop?

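Use np.where together with np.in1d.

Simply look for the items that are either apple or orange with np.in1d to create a boolean mask, which can then be used with np.where to set the rest to 'others'. So we would have: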

df['b'] = np.where(np.in1d(df.a,['apple','orange']),df.a,'others')
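For reference, here is a minimal self-contained sketch of the exact-match approach above. The sample values are made up for illustration, and np.isin is swapped in for np.in1d on the assumption that a recent NumPy is installed (np.isin is the recommended replacement there); the logic is otherwise the same.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({"a": ["apple", "orange", "apple-juice", "mango"]})

values = df["a"].to_numpy()

# Boolean mask: True only where the whole cell equals 'apple' or 'orange'.
mask = np.isin(values, ["apple", "orange"])

# Keep the matching cells as they are and label everything else 'others'.
df["b"] = np.where(mask, values, "others")
print(df)
```

Note that 'apple-juice' ends up as 'others' here, because the mask tests whole-cell equality rather than substrings.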

If you want to keep those words when they appear as part of larger strings, you can use str.extract and then np.where, like so:

# expand=False returns a Series instead of a one-column DataFrame
strings = df.a.str.extract('(orange|apple)', expand=False)
df['b'] = np.where(np.in1d(strings,['apple','orange']),strings,'others')

Sample run:

In [294]: df
Out[294]: 
             a
0  apple-shake
1       orange
2  apple-juice
3        apple
4        mango
5       orange
6       banana

In [295]: strings = df.a.str.extract('(orange|apple)', expand=False)

In [296]: df['b'] = np.where(np.in1d(strings,['apple','orange']),strings,'others')

In [297]: df
Out[297]: 
             a       b
0  apple-shake   apple
1       orange  orange
2  apple-juice   apple
3        apple   apple
4        mango  others
5       orange  orange
6       banana  others
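As an alternative that stays closer to the numpy.where / str.contains pattern in the question, np.select takes one boolean condition per category plus a default. This is not part of the answer above; it is a sketch that assumes each row should receive the first label whose substring it contains.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["apple-shake", "orange", "apple-juice", "apple",
                         "mango", "orange", "banana"]})

# One boolean condition per category, checked in order; the first match wins.
conditions = [
    df["a"].str.contains("apple", regex=False).to_numpy(),
    df["a"].str.contains("orange", regex=False).to_numpy(),
]
choices = ["apple", "orange"]

# Rows that match no condition fall back to the default label.
df["b"] = np.select(conditions, choices, default="others")
print(df)
```

Adding a fourth category only means appending one more condition and one more choice, which tends to read more clearly than nesting several np.where calls.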

I want the result to be one of three possibilities: 'apple', 'orange', or 'others'.