Python: creating a new column based on conditions on other columns in a DataFrame

python pandas dataframe mapping

I have this DataFrame:

+------+--------------+------------+
| ID   | Education    |      Score | 
+------+--------------+------------+
|    1 |  High School |      7.884 |     
|    2 |  Bachelors   |      6.952 |     
|    3 |  High School |      8.185 |   
|    4 |  High School |      6.556 | 
|    5 |  Bachelors   |      6.347 | 
|    6 |  Master      |      6.794 |   
+------+--------------+------------+

I want to create a new column that categorizes the Score column. I want to label each score as 'Bad', 'Good', or 'Very good'.

Something like this:

+------+--------------+------------+------------+
| ID   | Education    |      Score | Labels     |
+------+--------------+------------+------------+
|    1 |  High School |      7.884 | Good       |
|    2 |  Bachelors   |      6.952 | Bad        |
|    3 |  High School |      8.185 | Very good  |   
|    4 |  High School |      6.556 | Bad        |
|    5 |  Bachelors   |      6.347 | Bad        |
|    6 |  Master      |      6.794 | Bad        |
+------+--------------+------------+------------+

How can I do this?

Thanks in advance.

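I assume it is the score that you want to map to a label. You can define a mapping function that takes the score as input and returns the label: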

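def map_score(score):
    # Thresholds: 8 and above is "Very good", 7 and above is "Good"
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

# The function can be passed to apply directly; no lambda wrapper is needed
df["Labels"] = df["Score"].apply(map_score)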

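import pandas as pd
# initialize list of lists
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952], [3, 'High School', 8.185], [4, 'High School', 6.556], [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
# Create the DataFrame
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])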
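
The statement that follows this setup breaks off after `['Bad' if x`. As a rough sketch of how a list comprehension of that shape could label the scores, the block below chains two conditional expressions. The thresholds 7 and 8 are an assumption borrowed from the mapping function in the previous answer, not something this answer states.

```python
import pandas as pd

# Same data as in the question
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952],
        [3, 'High School', 8.185], [4, 'High School', 6.556],
        [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])

# Assumed thresholds: below 7 is 'Bad', below 8 is 'Good', otherwise 'Very good'
df['Labels'] = ['Bad' if x < 7 else 'Good' if x < 8 else 'Very good'
                for x in df['Score']]

print(df)
```

Chained conditional expressions are evaluated left to right, so the second test only runs for scores that are already known to be at least 7.
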
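df['Labels'] = ['Bad' if x

This is my solution. I tried to avoid if/else and to make the solution more flexible. The main idea is to create a DataFrame of labels with minimum and maximum values, and then find the right label for each score value.

Code: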
import pandas as pd


class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]


class Labels:
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]

    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        # Select the row whose half-open range [Min, Max) contains the score
        lbs = Labels.labels_df
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]


class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'


df = pd.DataFrame({
        'ID': range(6),
        'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
        'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Look up the label for each score; only the Score column is needed
df['Label'] = df['Score'].apply(Labels.get_label)

print(df)

   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
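
The min/max label table above is close to what `pandas.cut` does with bin edges and labels. The following is only a sketch of that alternative, not part of the answer's own code: the names `bins` and `names` are illustrative, and the edges 0, 7, 8, 100 are taken from the `Label(...)` definitions.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(6),
    'Education': ['High School', 'Bachelors', 'High School',
                  'High School', 'Bachelors', 'Master'],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Bin edges mirror the Min/Max values of the label table: [0, 7), [7, 8), [8, 100)
bins = [0, 7, 8, 100]
names = ['Bad', 'Good', 'Very good']

# right=False closes each bin on the left, matching Min <= score < Max
df['Label'] = pd.cut(df['Score'], bins=bins, labels=names, right=False)

print(df)
```

Note that `pd.cut` returns a categorical column rather than plain strings, and scores outside the outermost edges become NaN instead of raising an error.
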

Just a hint: df['labels'] = np.select([df['Score']
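
The hint breaks off after the first condition. A minimal sketch of how `np.select` could be used for this task is shown below; the conditions and the thresholds 7 and 8 are assumptions borrowed from the mapping function in the first answer, not a reconstruction of the original hint.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Conditions are checked in order; the first one that is True picks the choice
conditions = [df['Score'] < 7, df['Score'] < 8]
choices = ['Bad', 'Good']

# Rows that match no condition (scores of 8 and above) get the default
df['labels'] = np.select(conditions, choices, default='Very good')

print(df)
```

Because the whole column is evaluated at once, this avoids calling a Python function once per row.
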