Python: create a new column based on conditions on other columns in a DataFrame
I have this DataFrame:
+------+--------------+------------+
| ID | Education | Score |
+------+--------------+------------+
| 1 | High School | 7.884 |
| 2 | Bachelors | 6.952 |
| 3 | High School | 8.185 |
| 4 | High School | 6.556 |
| 5 | Bachelors | 6.347 |
| 6 | Master | 6.794 |
+------+--------------+------------+
I want to create a new column that categorizes the Score column, with the labels 'Bad', 'Good', and 'Very good'.
Something like this:
+------+--------------+------------+------------+
| ID | Education | Score | Labels |
+------+--------------+------------+------------+
| 1 | High School | 7.884 | Good |
| 2 | Bachelors | 6.952 | Bad |
| 3 | High School | 8.185 | Very good |
| 4 | High School | 6.556 | Bad |
| 5 | Bachelors | 6.347 | Bad |
| 6 | Master | 6.794 | Bad |
+------+--------------+------------+------------+
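How can I do this?
Thanks in advance.

I assume it is the score that you want to map to a label. You can define a mapping function that takes a score as input and returns the label: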
def map_score(score):
    if score >= 8:
        return "Very good"
    elif score >= 7:
        return "Good"
    else:
        return "Bad"

df["Labels"] = df["Score"].apply(map_score)
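The same thresholds can also be expressed without a per-row Python function. The following is a minimal sketch using `pd.cut`, assuming the cut-offs from the function above (below 7 is "Bad", 7 up to 8 is "Good", 8 and above is "Very good"); the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Education": ["High School", "Bachelors", "High School",
                  "High School", "Bachelors", "Master"],
    "Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794],
})

# right=False makes every bin closed on the left:
# [-inf, 7) -> "Bad", [7, 8) -> "Good", [8, inf) -> "Very good"
df["Labels"] = pd.cut(
    df["Score"],
    bins=[-np.inf, 7, 8, np.inf],
    labels=["Bad", "Good", "Very good"],
    right=False,
)
print(df)
```

Note that `pd.cut` returns a categorical column; call `.astype(str)` on it if plain strings are preferred.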
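
import pandas as pd

# Initialize a list of lists
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952], [3, 'High School', 8.185], [4, 'High School', 6.556], [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]

# Create the DataFrame
df = pd.DataFrame(data, columns=['ID', 'Education', 'Score'])
df['Labels'] = ['Bad' if x

This is my solution. I tried to avoid if/else and to make the solution more flexible.
The main idea is to create a DataFrame of labels, each with a minimum and a maximum value, and then find the matching label for each score.
Code: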
import pandas as pd

class Label(object):
    name = ''
    min = 0
    max = 100

    def __init__(self, name, min, max):
        self.name = name
        self.min = min
        self.max = max

    def data(self):
        return [self.name, self.min, self.max]

class Labels:
    # Each label covers the half-open range [Min, Max)
    labels = [
        Label('Bad', 0, 7).data(),
        Label('Good', 7, 8).data(),
        Label('Very good', 8, 100).data()]
    labels_df = pd.DataFrame(labels, columns=['Label', 'Min', 'Max'])

    @staticmethod
    def get_label(score):
        # Select the row whose range contains the score and return its label
        lbs = Labels.labels_df
        tlab = lbs[(lbs.Min <= score) & (lbs.Max > score)]
        return tlab.Label.values[0]

class edu:
    hs = 'High School'
    b = 'Bachelors'
    m = 'Master'

df = pd.DataFrame({
    'ID': range(6),
    'Education': [edu.hs, edu.b, edu.hs, edu.hs, edu.b, edu.m],
    'Score': [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})
df['Label'] = df.apply(lambda row: Labels.get_label(row['Score']), axis=1)
print(df)
   ID    Education  Score      Label
0   0  High School  7.884       Good
1   1    Bachelors  6.952        Bad
2   2  High School  8.185  Very good
3   3  High School  6.556        Bad
4   4    Bachelors  6.347        Bad
5   5       Master  6.794        Bad
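
The lookup above filters the label table once per row. If that becomes slow on larger data, the same Min/Max table can be converted into left-closed intervals and matched in a single call. This is only a sketch, assuming the ranges do not overlap; the names `bounds`, `intervals`, and `positions` are illustrative and are not part of the answer above.

```python
import pandas as pd

# Label table with the same half-open ranges [Min, Max) as above
bounds = pd.DataFrame({
    "Label": ["Bad", "Good", "Very good"],
    "Min": [0.0, 7.0, 8.0],
    "Max": [7.0, 8.0, 100.0],
})
intervals = pd.IntervalIndex.from_arrays(
    bounds["Min"].to_numpy(), bounds["Max"].to_numpy(), closed="left"
)

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# get_indexer returns the position of the interval containing each score,
# or -1 when a score falls outside every interval
positions = intervals.get_indexer(df["Score"].to_numpy())
if (positions < 0).any():
    raise ValueError("some scores fall outside every label range")

df["Label"] = bounds["Label"].to_numpy()[positions]
print(df)
```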

Just a hint: df['labels'] = np.select([df['Score']
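
The hint above is cut off in the source, so its exact conditions are unknown. As a sketch of what an `np.select` approach could look like, assuming the same thresholds as the earlier answers (below 7 is "Bad", below 8 is "Good", everything else "Very good"):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [7.884, 6.952, 8.185, 6.556, 6.347, 6.794]})

# Conditions are checked in order; the first one that is True wins.
# The thresholds are an assumption taken from the other answers.
conditions = [df["Score"] < 7, df["Score"] < 8]
choices = ["Bad", "Good"]

# Rows that match no condition receive the default label
df["labels"] = np.select(conditions, choices, default="Very good")
print(df)
```

Because the conditions are evaluated in order, the second one only needs an upper bound: any score below 7 has already been claimed by the first.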