Python: How to filter and create a new column in a DataFrame using pandas
I am trying to filter on 3 columns of a DataFrame, set conditions on those 3 columns, and return a binary value: 1 if all conditions are met, 0 if they are not. Here is an example:
import pandas as pd
from numpy import array

data = {'PassengerId': array([2255, 2257, 2258, 2256, 2257, 2258, 2255, 2258, 2257, 2257, 2255,
2255, 2257, 2256, 2257, 2256, 2255, 2258, 2258, 2256, 2256, 2257,
2258, 2258, 2257]),
'Pclass': array([3, 2, 2, 2, 4, 3, 3, 4, 3, 1, 1, 1, 1, 2, 4, 3, 1, 2, 4, 3, 2, 3,
1, 1, 2]),
'Age': array([40, 33, 32, 40, 48, 24, 33, 29, 29, 31, 45, 47, 28, 32, 54, 39, 28,
50, 40, 31, 51, 26, 41, 46, 27]),
'SibSp': array([11, 13, 12, 19, 22, 17, 23, 12, 12, 12, 12, 24, 16, 21, 12, 15, 20,
18, 10, 17, 20, 12, 17, 17, 10]),
'Comf' : array([236.66883531, 235.46750709, 235.64574546, 241.16838089,
239.40728836, 239.95592634, 236.67806901, 237.73350635,
238.74497849, 235.17486552, 235.8457374 , 236.85133744,
240.9359547 , 236.27703374, 237.81871052, 241.62788018,
241.29185342, 235.0058136 , 240.69989317, 238.8073828 ,
238.08841364, 236.55259788, 237.58108419, 239.66916186,
241.97479544]),
'Parch': array([232.37686437, 232.39153096, 230.56566556, 232.77980061,
232.19436342, 232.2165835 , 232.28145641, 231.26988217,
230.55287196, 232.26528521, 230.45185855, 230.87525326,
231.38775744, 232.80960083, 232.33105822, 232.65782351,
231.64457366, 230.45225829, 231.05404057, 232.38229998,
232.57354117, 232.08690375, 230.40414215, 230.14361969,
231.40414745]),
'Fare': array([238.80427104, 239.32031287, 238.02212358, 238.40333494,
238.85929097, 239.51666683, 239.87771029, 238.06772515,
238.22734658, 238.54682118, 238.68880278, 239.79658425,
238.2642908 , 239.22884058, 239.84423352, 239.69438831,
238.85871719, 238.64632848, 238.7085097 , 239.5700877 ,
239.06199698, 238.37341378, 239.16126748, 239.01280153,
239.77047796])}
df = pd.DataFrame(data)
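I am trying to set a condition row by row: if "Pclass" == 1 and "Comf" is between "Parch" and "Fare", create a new column "Survived" and assign 1, otherwise assign 0.
Then do the same for "Pclass" == 2, 3.
I would like to do this with pandas, but any solution to this problem is welcome.
Just evaluate the condition and convert the result to the int type: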
df = pd.DataFrame(data=data)
# between() compares row by row; astype(int) turns True/False into 1/0
df = df.assign(Survived=lambda x: x['Comf'].between(x['Parch'], x['Fare']).astype(int))
print(df.to_string())
Or use a plain assignment with =:
df = pd.DataFrame(data=data)
df['Survived'] = df['Comf'].between(df['Parch'], df['Fare']).astype(int)
print(df.to_string())
Output:
PassengerId Pclass Age SibSp Comf Parch Fare Survived
0 2255 3 40 11 236.668835 232.376864 238.804271 1
1 2257 2 33 13 235.467507 232.391531 239.320313 1
2 2258 2 32 12 235.645745 230.565666 238.022124 1
3 2256 2 40 19 241.168381 232.779801 238.403335 0
4 2257 4 48 22 239.407288 232.194363 238.859291 0
5 2258 3 24 17 239.955926 232.216584 239.516667 0
6 2255 3 33 23 236.678069 232.281456 239.877710 1
7 2258 4 29 12 237.733506 231.269882 238.067725 1
8 2257 3 29 12 238.744978 230.552872 238.227347 0
9 2257 1 31 12 235.174866 232.265285 238.546821 1
10 2255 1 45 12 235.845737 230.451859 238.688803 1
11 2255 1 47 24 236.851337 230.875253 239.796584 1
12 2257 1 28 16 240.935955 231.387757 238.264291 0
13 2256 2 32 21 236.277034 232.809601 239.228841 1
14 2257 4 54 12 237.818711 232.331058 239.844234 1
15 2256 3 39 15 241.627880 232.657824 239.694388 0
16 2255 1 28 20 241.291853 231.644574 238.858717 0
17 2258 2 50 18 235.005814 230.452258 238.646328 1
18 2258 4 40 10 240.699893 231.054041 238.708510 0
19 2256 3 31 17 238.807383 232.382300 239.570088 1
20 2256 2 51 20 238.088414 232.573541 239.061997 1
21 2257 3 26 12 236.552598 232.086904 238.373414 1
22 2258 1 41 17 237.581084 230.404142 239.161267 1
23 2258 1 46 17 239.669162 230.143620 239.012802 0
24 2257 2 27 10 241.974795 231.404147 239.770478 0
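One detail worth checking in the answer above is how `between` treats the endpoints. The following is a minimal sketch on a small made-up frame (the values are hypothetical; only the column names come from the question). It assumes pandas 1.3 or later, where `inclusive` accepts the strings "both", "neither", "left" and "right".

```python
import pandas as pd

# Small hypothetical frame that reuses the question's column names.
df = pd.DataFrame({
    "Comf":  [236.0, 241.0, 232.0],
    "Parch": [232.0, 232.0, 232.0],
    "Fare":  [238.0, 238.0, 238.0],
})

# Element-wise comparison: row i checks Parch[i] <= Comf[i] <= Fare[i].
# The default is inclusive="both", so a value equal to an endpoint counts.
inside = df["Comf"].between(df["Parch"], df["Fare"])

# Strict variant that excludes both endpoints (assumes pandas >= 1.3).
inside_strict = df["Comf"].between(df["Parch"], df["Fare"], inclusive="neither")

df["Survived"] = inside.astype(int)
df["SurvivedStrict"] = inside_strict.astype(int)
print(df.to_string())
```

In the third row "Comf" equals "Parch", so the default call counts it as inside the range while the strict call does not.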
If you want to do this for all rows regardless of the Pclass value, you can use
df["Survived"] = df["Comf"].between(df["Parch"], df["Fare"]).astype(int)
However, if you want to do this for a specific Pclass, you can use
df["Survived"] = ((df["Pclass"] == 1) & df["Comf"].between(df["Parch"], df["Fare"])).astype(int)
Note the parentheses around df["Pclass"] == 1: in Python, & binds more tightly than ==, so without them the expression is not evaluated as intended.
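The question also asks for the same rule on Pclass 2 and 3. One possible way to cover several classes at once is `Series.isin`. This is a sketch on made-up values, and the column names `Survived_1` and `Survived_123` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 4, 1],
    "Comf":   [236.0, 236.0, 241.0, 236.0, 241.0],
    "Parch":  [232.0] * 5,
    "Fare":   [238.0] * 5,
})

# Row-wise range check, computed once and reused.
in_range = df["Comf"].between(df["Parch"], df["Fare"])

# Single class: the comparison is parenthesised because & binds tighter than ==.
df["Survived_1"] = ((df["Pclass"] == 1) & in_range).astype(int)

# Several classes at once: isin() returns a boolean Series, so no == is needed.
df["Survived_123"] = (df["Pclass"].isin([1, 2, 3]) & in_range).astype(int)

print(df.to_string())
```

Rows with Pclass 4, or with "Comf" outside the range, get 0 in both columns.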
Try this
Steps
indexesOfTrue = df[(df["Pclass"] == 1) & (df["Comf"] > df["Parch"]) & (df["Comf"] < df["Fare"])].index
Use loc to fill in the matching indexes
df.loc[indexesOfTrue, "Survived"] = 1
Fill in the indexes that did not match
df.loc[~df.index.isin(indexesOfTrue), "Survived"] = 0
Output
PassengerId Pclass Age SibSp Comf Parch Fare Survived
5 2258 3 24 17 239.955926 232.216584 239.516667 2
6 2255 3 33 23 236.678069 232.281456 239.877710 2
7 2258 4 29 12 237.733506 231.269882 238.067725 2
8 2257 3 29 12 238.744978 230.552872 238.227347 2
9 2257 1 31 12 235.174866 232.265285 238.546821 1
10 2255 1 45 12 235.845737 230.451859 238.688803 1
11 2255 1 47 24 236.851337 230.875253 239.796584 1
12 2257 1 28 16 240.935955 231.387757 238.264291 2
13 2256 2 32 21 236.277034 232.809601 239.228841 2
14 2257 4 54 12 237.818711 232.331058 239.844234 2
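The two `loc` assignments can also be collapsed into a single step with `numpy.where`, which picks 1 where the mask is True and 0 elsewhere. This is an alternative sketch, not the answer author's code, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3],
    "Comf":   [236.0, 241.0, 236.0, 236.0],
    "Parch":  [232.0, 232.0, 232.0, 232.0],
    "Fare":   [238.0, 238.0, 238.0, 238.0],
})

# Boolean mask: Pclass is 1 and Comf lies strictly between Parch and Fare.
mask = (
    (df["Pclass"] == 1)
    & (df["Comf"] > df["Parch"])
    & (df["Comf"] < df["Fare"])
)

# np.where(condition, value_if_true, value_if_false) fills the column in one pass.
df["Survived"] = np.where(mask, 1, 0)

print(df.to_string())
```

Only the first row satisfies all three conditions, so it is the only one that receives 1.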
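What should the result be for Pclass 2, 3, ...? A boolean in the same "Survived" column if "Comf" is between "Parch" and "Fare"?
@AndrejKesely Yes, for Pclass 2, 3, ...: if Pclass == 2 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0; then move on to the next Pclass, i.e. 3: if Pclass == 3 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. This is a row-wise operation: the conditions on Pclass and Comf are evaluated per row.
I don't fully understand... Pclass is just a column. Can you edit your question and put the expected result there?
Thanks for your answer, but I want to get the result for Pclass 2, 3, ...: if Pclass == 2 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0; then move on to the next Pclass, i.e. 3: if Pclass == 3 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. PS. This is a row-wise operation: the conditions on Pclass and Comf are evaluated per row. @HenryEcker
I have updated my answer and removed the filter on Pclass, but I am fairly sure my general answer does what you want. Can you point out the specific rows in my output that are unexpected for your use case?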
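Since the comments turn on whether the check is done per row, here is a small sketch suggesting that the vectorized `between` call is already row-wise: it gives the same result as an explicit per-row function applied with `DataFrame.apply(axis=1)`. The sample values and the helper `row_rule` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 2],
    "Comf":   [236.0, 241.0, 236.0, 230.0],
    "Parch":  [232.0, 232.0, 232.0, 232.0],
    "Fare":   [238.0, 238.0, 238.0, 238.0],
})

# Vectorized form used in the answers: each row is compared with its own bounds.
vectorized = df["Comf"].between(df["Parch"], df["Fare"]).astype(int)


def row_rule(row):
    """Explicit per-row version of the same inclusive range check."""
    return int(row["Parch"] <= row["Comf"] <= row["Fare"])


# axis=1 passes one row at a time to row_rule (slower, but easy to read).
row_by_row = df.apply(row_rule, axis=1)

print(vectorized.tolist())
print(row_by_row.tolist())
```

Both lists are identical, so the Pclass value does not change the result unless it is added to the condition explicitly.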