Python: create a new column based on conditions on other columns of a DataFrame


I have this DataFrame:

+------+--------------+------------+
| ID   | Education    |      Score | 
+------+--------------+------------+
|    1 |  High School |      7.884 |     
|    2 |  Bachelors   |      6.952 |     
|    3 |  High School |      8.185 |   
|    4 |  High School |      6.556 | 
|    5 |  Bachelors   |      6.347 | 
|    6 |  Master      |      6.794 |   
+------+--------------+------------+

I want to create a new column that classifies the Score column, using the labels 'Bad', 'Good' and 'Very good'.

Something like this:

+------+--------------+------------+------------+
| ID   | Education    |      Score | Labels     |
+------+--------------+------------+------------+
|    1 |  High School |      7.884 | Good       |
|    2 |  Bachelors   |      6.952 | Bad        |
|    3 |  High School |      8.185 | Very good  |   
|    4 |  High School |      6.556 | Bad        |
|    5 |  Bachelors   |      6.347 | Bad        |
|    6 |  Master      |      6.794 | Bad        |
+------+--------------+------------+------------+

How can I do this?

Thanks in advance.

I assume it is the score that you want to map to a label. You can define a mapping function that takes a score as input and returns the label:

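def map_score(score):
    # 8 and above -> Very good, 7 up to 8 -> Good, below 7 -> Bad
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

# Series.apply calls map_score once per value; no lambda wrapper is needed
df["Labels"] = df["Score"].apply(map_score)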
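For reference, a self-contained version of this approach run on the sample data from the question. The thresholds 7 and 8 are read off the expected output in the question; the asker never states them explicitly, so treat them as an assumption.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})


def map_score(score):
    # Assumed thresholds: below 7 -> Bad, 7 up to 8 -> Good, 8 and above -> Very good
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"


# Apply the function element-wise to the Score column
df["Labels"] = df["Score"].apply(map_score)
print(df)
```

With these comparisons a score of exactly 7 is labelled 'Good' and a score of exactly 8 is labelled 'Very good'.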
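
import pandas as pd
# initialize a list of lists
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952], [3, 'High School', 8.185], [4, 'High School', 6.556], [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
# create the DataFrame
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])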

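df['Labels'] = ['Bad' if x < 7 else 'Good' if x < 8 else 'Very good' for x in df['Score']]

Here is my solution. I tried to avoid if/else and to make the solution more flexible.

The main idea is to build a DataFrame of labels, each with a minimum and a maximum value, and then look up the right label for each score.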

Code:

import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


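class Labels:
    # One row per label: [name, lower bound (inclusive), upper bound (exclusive)]
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]

    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        lbs = Labels.labels_df
        # Keep the row whose [Min, Max) interval contains the score
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]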


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
        'ID': range(6),
        'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
        'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

df['Label'] = df.apply(lambda row: Labels.get_label(row['Score']), axis=1)

print(df)

Output:
   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
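The Min/Max label table above is essentially binning. As an alternative that the answer itself does not use, a minimal sketch with pandas' built-in pd.cut, assuming the same edges 0, 7, 8 and 100 as the Label table:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": range(6),
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# Bin edges mirror the Min/Max rows: [0, 7), [7, 8), [8, 100)
bins = [0, 7, 8, 100]
names = ["Bad", "Good", "Very good"]

# right=False closes each interval on the left, i.e. Min <= score < Max
df["Label"] = pd.cut(df["Score"], bins=bins, labels=names, right=False)
print(df)
```

The resulting column is categorical. Unlike get_label, which fails with an IndexError when no interval matches, pd.cut returns NaN for scores outside [0, 100).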

Just a hint:
import numpy as np
df['labels'] = np.select([df['Score'] < 7, df['Score'] < 8], ['Bad', 'Good'], default='Very good')
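A runnable sketch of the np.select hint, assuming the same thresholds as the other answers (below 7 is Bad, below 8 is Good, anything else is Very good):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Conditions are checked in order; the first one that is True picks the label
conditions = [df["Score"] < 7, df["Score"] < 8]
choices = ["Bad", "Good"]

# Rows matching no condition (score >= 8) fall through to the default
df["labels"] = np.select(conditions, choices, default="Very good")
print(df)
```

Because np.select takes the first matching condition, the second condition only has to check the upper bound; rows below 7 have already been claimed by the first.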