Python: if a column contains a string from an array of strings, then use... [python, pandas, dataframe]


I have a DataFrame like this:

     Col1     Col2    
0     22     Apple
1     43     Carrot 
2     54     Orange
3     74     Spinach
4     14     Cucumber

I need to add a new column that classifies each row as "Fruit", "Vegetable", or "Leaf". I created a list for each category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

The result should look like this:

    Col1      Col2     Category 
0     22     Apple      Fruit
1     43     Carrot     Vegetable 
2     54     Orange     Fruit
3     74     Spinach    Leaf
4     14     Cucumber   Vegetable

I tried np.where and contains, but both raise: 'in <string>' requires string as left operand, not set

Use map with the new dictionary d1:

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit': Fru, 'Vegetable': Veg, 'Leaf': Leaf}

# swap keys and values in the dict, so each item points to its category
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)

# look up each value of Col2 in d1
df['Category'] = df['Col2'].map(d1)
print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
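One detail of the map approach worth seeing in action: a value that is missing from the lookup dictionary comes back as NaN. The sketch below illustrates this under stated assumptions. The row with 'Banana' and the 'Unknown' fallback label are hypothetical and not part of the original question.

```python
import pandas as pd

# 'Banana' is a hypothetical value that appears in none of the category sets
df = pd.DataFrame({'Col1': [22, 43, 7],
                   'Col2': ['Apple', 'Carrot', 'Banana']})

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit': Fru, 'Vegetable': Veg, 'Leaf': Leaf}

# invert the dict: each item becomes a key pointing to its category
d1 = {item: cat for cat, items in d.items() for item in items}

# values missing from d1 are mapped to NaN
mapped = df['Col2'].map(d1)

# 'Unknown' is an assumed fallback label, chosen only for illustration
df['Category'] = mapped.fillna('Unknown')
print(df)
```

Whether to keep the NaN or replace it with a label depends on how unmatched rows should be treated downstream.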

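This happens because you created sets rather than lists, as the error message indicates. You can try converting each set to a list and passing it as the argument to .isin().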
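A minimal sketch of the .isin() idea follows. The snippet for this answer is not shown here, so the use of np.select to pick the label and the 'Unknown' default are assumptions, not necessarily what the answer used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

# one boolean mask per category; each set is converted to a list for isin
conditions = [
    df['Col2'].isin(list(Fru)),
    df['Col2'].isin(list(Veg)),
    df['Col2'].isin(list(Leaf)),
]
choices = ['Fruit', 'Vegetable', 'Leaf']

# np.select takes the first matching choice per row;
# the 'Unknown' default for unmatched rows is an assumed label
df['Category'] = np.select(conditions, choices, default='Unknown')
print(df)
```

A string default keeps the result a plain string array, since np.select combines the choices and the default into one dtype.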

Output:

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable


Another approach is to use a for loop:

import pandas as pd

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14], 'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})

Fruit = ['Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber', 'Carrot', 'Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']

# collect one label per row, in row order
mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # keep the list the same length as the DataFrame when nothing matches
        mylist.append(None)

df['Category'] = mylist

print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable


Now it's time to time our answers! Celius: 0.000997304916381836, jezrael: 0.0009975433349609375. Go for 500k +3

@CeliusStingher - Thanks, but how many rows did you test with? I tested 1k, 10k, and 100k, and the performance was almost the same.

Not as flexible or robust... but in this case, given the small number of categories... writing d1 = {**dict.fromkeys(Fru, 'Fruit'), **dict.fromkeys(Veg, 'Vegetable'), **dict.fromkeys(Leaf, 'Leaf')} directly is also an option. Some people may find it more obvious what it is doing when reading it... just throwing it out there.

Yes, of course, it is dynamic; that is the advantage of the answer.

Yes, I thought there was no other possibility, but I will edit to account for it, thanks.

Yes, you are right, I just tested it and list() is not necessary.

I like your approach. I tried it, but used contain (my mistake). However, the original data has 200+ items across 15 categories, so this approach would not be efficient.

Yes, the other solutions from experts like jezrael and Celius Stingher are more efficient. I just wanted to make sure I posted my solution, since I had spent some time writing the code :)
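The dict.fromkeys alternative suggested in the comments can be checked against the dictionary comprehension from the map answer. This is a small sketch using the sets from the question. The name d1_comprehension is introduced here only for the comparison.

```python
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

# explicit version: one dict.fromkeys call per category, merged with ** unpacking
d1 = {**dict.fromkeys(Fru, 'Fruit'),
      **dict.fromkeys(Veg, 'Vegetable'),
      **dict.fromkeys(Leaf, 'Leaf')}

# dynamic version: invert a category -> items dict with a comprehension
d = {'Fruit': Fru, 'Vegetable': Veg, 'Leaf': Leaf}
d1_comprehension = {k: cat for cat, items in d.items() for k in items}

# both constructions yield the same item -> category lookup
print(d1 == d1_comprehension)

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})
df['Category'] = df['Col2'].map(d1)
print(df)
```

The explicit form reads well with a handful of categories. The comprehension scales better when categories are added, since only d changes.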