
Python: How to filter and create a new column in a DataFrame using pandas


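I'm trying to filter on 3 columns of a DataFrame, set a condition across those 3 columns, and return a binary value: 1 if all the conditions are met, 0 if they are not. Here is an example: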

import pandas as pd
from numpy import array

data = {'PassengerId': array([2255, 2257, 2258, 2256, 2257, 2258, 2255, 2258, 2257, 2257, 2255,
        2255, 2257, 2256, 2257, 2256, 2255, 2258, 2258, 2256, 2256, 2257,
        2258, 2258, 2257]),
 'Pclass': array([3, 2, 2, 2, 4, 3, 3, 4, 3, 1, 1, 1, 1, 2, 4, 3, 1, 2, 4, 3, 2, 3,
        1, 1, 2]),
 'Age': array([40, 33, 32, 40, 48, 24, 33, 29, 29, 31, 45, 47, 28, 32, 54, 39, 28,
        50, 40, 31, 51, 26, 41, 46, 27]),
 'SibSp': array([11, 13, 12, 19, 22, 17, 23, 12, 12, 12, 12, 24, 16, 21, 12, 15, 20,
        18, 10, 17, 20, 12, 17, 17, 10]),
 'Comf' : array([236.66883531, 235.46750709, 235.64574546, 241.16838089,
        239.40728836, 239.95592634, 236.67806901, 237.73350635,
        238.74497849, 235.17486552, 235.8457374 , 236.85133744,
        240.9359547 , 236.27703374, 237.81871052, 241.62788018,
        241.29185342, 235.0058136 , 240.69989317, 238.8073828 ,
        238.08841364, 236.55259788, 237.58108419, 239.66916186,
        241.97479544]),
 'Parch': array([232.37686437, 232.39153096, 230.56566556, 232.77980061,
        232.19436342, 232.2165835 , 232.28145641, 231.26988217,
        230.55287196, 232.26528521, 230.45185855, 230.87525326,
        231.38775744, 232.80960083, 232.33105822, 232.65782351,
        231.64457366, 230.45225829, 231.05404057, 232.38229998,
        232.57354117, 232.08690375, 230.40414215, 230.14361969,
        231.40414745]),
 'Fare': array([238.80427104, 239.32031287, 238.02212358, 238.40333494,
        238.85929097, 239.51666683, 239.87771029, 238.06772515,
        238.22734658, 238.54682118, 238.68880278, 239.79658425,
        238.2642908 , 239.22884058, 239.84423352, 239.69438831,
        238.85871719, 238.64632848, 238.7085097 , 239.5700877 ,
        239.06199698, 238.37341378, 239.16126748, 239.01280153,
        239.77047796])}

df = pd.DataFrame(data)
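I'm trying to set a condition for the first class: if 'Pclass' == 1 and 'Comf' is between 'Parch' and 'Fare', create a new column 'Survived' and assign 1, otherwise assign 0.

Then do the same for 'Pclass' == 2, 3.

I'd like to do this with pandas, but any solution to this problem is welcome.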

Just evaluate the condition and convert the result to int:

df = pd.DataFrame(data=data)
df = df.assign(Survived=lambda x: x['Comf'].between(x['Parch'], x['Fare']).astype(int))
print(df.to_string())
Or use plain assignment with =:

df = pd.DataFrame(data=data)
df['Survived'] = df['Comf'].between(df['Parch'], df['Fare']).astype(int)
print(df.to_string())
Output:

    PassengerId  Pclass  Age  SibSp        Comf       Parch        Fare  Survived
0          2255       3   40     11  236.668835  232.376864  238.804271         1
1          2257       2   33     13  235.467507  232.391531  239.320313         1
2          2258       2   32     12  235.645745  230.565666  238.022124         1
3          2256       2   40     19  241.168381  232.779801  238.403335         0
4          2257       4   48     22  239.407288  232.194363  238.859291         0
5          2258       3   24     17  239.955926  232.216584  239.516667         0
6          2255       3   33     23  236.678069  232.281456  239.877710         1
7          2258       4   29     12  237.733506  231.269882  238.067725         1
8          2257       3   29     12  238.744978  230.552872  238.227347         0
9          2257       1   31     12  235.174866  232.265285  238.546821         1
10         2255       1   45     12  235.845737  230.451859  238.688803         1
11         2255       1   47     24  236.851337  230.875253  239.796584         1
12         2257       1   28     16  240.935955  231.387757  238.264291         0
13         2256       2   32     21  236.277034  232.809601  239.228841         1
14         2257       4   54     12  237.818711  232.331058  239.844234         1
15         2256       3   39     15  241.627880  232.657824  239.694388         0
16         2255       1   28     20  241.291853  231.644574  238.858717         0
17         2258       2   50     18  235.005814  230.452258  238.646328         1
18         2258       4   40     10  240.699893  231.054041  238.708510         0
19         2256       3   31     17  238.807383  232.382300  239.570088         1
20         2256       2   51     20  238.088414  232.573541  239.061997         1
21         2257       3   26     12  236.552598  232.086904  238.373414         1
22         2258       1   41     17  237.581084  230.404142  239.161267         1
23         2258       1   46     17  239.669162  230.143620  239.012802         0
24         2257       2   27     10  241.974795  231.404147  239.770478         0
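The question also asks about repeating the check for Pclass 1, 2 and 3. One possible reading is that only rows in those classes should be eligible for a 1. A minimal sketch under that assumption, using a small made-up frame (`toy` and its values are illustrative, not the question's data):

```python
import pandas as pd

# Toy data (illustrative values, not the question's full dataset)
toy = pd.DataFrame({
    "Pclass": [1, 2, 3, 4, 1],
    "Comf":   [235.0, 241.0, 236.0, 237.0, 242.0],
    "Parch":  [232.0, 232.0, 232.0, 231.0, 231.0],
    "Fare":   [238.0, 238.0, 239.0, 238.0, 238.0],
})

# Series.between is inclusive on both ends by default
in_range = toy["Comf"].between(toy["Parch"], toy["Fare"])

# Assumption: only classes 1, 2 and 3 are eligible; other classes get 0
wanted_class = toy["Pclass"].isin([1, 2, 3])

toy["Survived"] = (wanted_class & in_range).astype(int)
print(toy)
```

Row 3 falls inside the range but belongs to class 4, so it gets 0 here. If every class should be eligible, dropping `wanted_class` gives the same result as the answer above.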
If you want to do this for all rows, regardless of the Pclass value, you can use

df["Survived"] = df["Comf"].between(df["Parch"], df["Fare"]).astype(int)
However, if you want to do it for a specific Pclass, you can use

df["Survived"] = ((df["Pclass"] == 1) & df["Comf"].between(df["Parch"], df["Fare"])).astype(int)
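When combining a class test with the range test, keep in mind that in Python `&` binds more tightly than `==`, so the comparison needs its own parentheses. A small sketch of the per-class version, assuming a hypothetical helper `flag_for_class` and made-up values:

```python
import pandas as pd

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2],
    "Comf":   [235.0, 242.0, 235.0, 242.0],
    "Parch":  [232.0, 232.0, 232.0, 232.0],
    "Fare":   [238.0, 238.0, 238.0, 238.0],
})


def flag_for_class(frame, pclass):
    """Hypothetical helper: 1 where Pclass == pclass and Comf lies in [Parch, Fare], else 0."""
    in_range = frame["Comf"].between(frame["Parch"], frame["Fare"])
    # `&` has higher precedence than `==`, so the comparison is parenthesized.
    return ((frame["Pclass"] == pclass) & in_range).astype(int)


flags_1 = flag_for_class(toy, 1)
flags_2 = flag_for_class(toy, 2)
print(flags_1.tolist(), flags_2.tolist())
```

Each call produces a separate 0/1 Series, so the result for one class does not overwrite the result for another.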
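Try this

Steps

  • Get the indexes of the rows that match your condition
  • indexesOfTrue = df[(df["Pclass"] == 1) & (df["Comf"] > df["Parch"]) & (df["Comf"] < df["Fare"])].index

  • Use loc to fill those indexes
  • df.loc[indexesOfTrue, "Survived"] = 1

  • Fill the indexes where the condition is not true
  • df.loc[~df.index.isin(indexesOfTrue), "Survived"] = 0

    Output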

    PassengerId  Pclass  Age  SibSp Comf       Parch        Fare  Survived
        5   2258    3   24  17  239.955926  232.216584  239.516667  2
        6   2255    3   33  23  236.678069  232.281456  239.877710  2
        7   2258    4   29  12  237.733506  231.269882  238.067725  2
        8   2257    3   29  12  238.744978  230.552872  238.227347  2
        9   2257    1   31  12  235.174866  232.265285  238.546821  1
        10  2255    1   45  12  235.845737  230.451859  238.688803  1
        11  2255    1   47  24  236.851337  230.875253  239.796584  1
        12  2257    1   28  16  240.935955  231.387757  238.264291  2
        13  2256    2   32  21  236.277034  232.809601  239.228841  2
        14  2257    4   54  12  237.818711  232.331058  239.844234  2
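The steps above can also be sketched with a boolean mask passed straight to `loc`, which skips extracting the index first. This is only an illustration on made-up values; it keeps the strict `>` comparison used in the steps, whereas `between` includes the endpoints by default.

```python
import pandas as pd

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "Pclass": [1, 1, 3],
    "Comf":   [235.0, 238.0, 236.0],
    "Parch":  [232.0, 232.0, 232.0],
    "Fare":   [238.0, 238.0, 239.0],
})

# Strict inequalities: a Comf equal to Fare does not count as "between"
mask = (
    (toy["Pclass"] == 1)
    & (toy["Comf"] > toy["Parch"])
    & (toy["Comf"] < toy["Fare"])
)

toy["Survived"] = 0              # default for every row
toy.loc[mask, "Survived"] = 1    # overwrite only the rows where the mask is True
print(toy)
```

Setting the default first and then overwriting the matching rows means the non-matching rows never need a separate index lookup.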
    

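What should the result be for class 2, class 3..? A boolean in the same column "Survived" if "Comf" is between "Parch" and "Fare"?

@AndrejKesely Yes, the result for classes 2, 3... If PClass = 2 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. Then move to the next PClass, which is 3: if PClass = 1 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. This is a row-wise operation; the conditions on PClass and Comf are evaluated row by row.

I don't fully understand... PClass is just a column. Can you edit your question and put the expected result there?

Thanks for your answer.. However, I want to get the result for classes 2, 3... If PClass = 2 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. Then move to the next PClass, which is 3: if PClass 3 == 1 and "Comf" is between "Parch" and "Fare", assign 1, otherwise 0. PS. This is a row-wise operation; the conditions on PClass and Comf are evaluated row by row. @HenryEcker

I have updated my answer and removed the filter on PClass, but I'm fairly sure my general answer does what you're asking. Can you point out the specific rows in my output that are unexpected for your use case?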