Python: applying a row-wise function to multiple columns

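I have a DataFrame and a function that I want to apply to several columns with pandas' apply. At the moment I do this with a for loop, and I would like to replace the loop with a single line of code.

This is my DataFrame: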

import pandas as pd

d1 = {'id': [1, 1, 2], 'event': ['e', 'c', 'e'], 'var1': [1, 2, 2], 'time_difference': [0, 5, 2]}
df1 = pd.DataFrame(data=d1)
...
>> df1 
   id event  var1  time_difference
0   1     e     1                0
1   1     c     2                5
2   2     e     2                2
This is the function I want to apply:

def merge_based_on_timelimit(row):
    return row[column_of_interest] if row['time_difference'] <= 1\
        else pd.NA
Currently I apply the function to every column of interest in a for loop:

for column_of_interest in columns_of_interest:
    df1[column_of_interest] = df1.apply(merge_based_on_timelimit, axis=1)
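However, I am looking for a way to skip the loop and apply the function directly to all columns of interest at once. How can I do that? So far I have tried the following: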

df1[columns_of_interest] = df1[columns_of_interest].apply(merge_based_on_timelimit, axis=1)
This raises the following error:

...
redcap[columns_of_interest] = redcap[columns_of_interest].apply(merge_based_on_timelimit, axis=1)
...
KeyError: 'time_difference'

In my opinion apply is not necessary here. Set the values with the inverted mask, time_difference > 1:

df1.loc[df1['time_difference'] > 1, columns_of_interest] = pd.NA

print (df1)
   id event  var1  time_difference
0   1     e     1                0
1   1  <NA>  <NA>                5
2   2  <NA>  <NA>                2
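
For reference, the mask-based assignment above can be wrapped in a small helper. This is only a sketch: the function name `blank_out_after_limit` and its `limit` parameter are hypothetical and not part of the original answer. It works on a copy, so the input frame is left unchanged.

```python
import pandas as pd


def blank_out_after_limit(df, columns, limit=1):
    """Return a copy of df in which `columns` are pd.NA wherever
    time_difference exceeds `limit` (hypothetical helper)."""
    out = df.copy()
    # Boolean mask selects the rows, `columns` selects the columns to overwrite.
    out.loc[out['time_difference'] > limit, columns] = pd.NA
    return out


df1 = pd.DataFrame({
    'id': [1, 1, 2],
    'event': ['e', 'c', 'e'],
    'var1': [1, 2, 2],
    'time_difference': [0, 5, 2],
})

result = blank_out_after_limit(df1, ['event', 'var1'])
print(result)
```

Because the comparison is evaluated once for the whole column, no Python-level function is called per row, which is why apply can be avoided here.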
Your own solution can be made to work like this:

def merge_based_on_timelimit(row):
    #added s to column_of_interest
    return row[columns_of_interest] if row['time_difference'] <= 1\
        else pd.NA
columns_of_interest = ['event', 'var1']

#added column time_difference to list
df1[columns_of_interest] = df1[columns_of_interest + ['time_difference']].apply(merge_based_on_timelimit, axis=1)

print (df1)
   id event  var1  time_difference
0   1     e     1                0
1   1  <NA>  <NA>                5
2   2  <NA>  <NA>                2
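
The need for the extra column can be checked directly. The sketch below rebuilds the example frame and inspects which labels a row carries under each selection. The variable names `subset_row`, `full_row`, `in_subset` and `in_full` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 1, 2],
    'event': ['e', 'c', 'e'],
    'var1': [1, 2, 2],
    'time_difference': [0, 5, 2],
})
columns_of_interest = ['event', 'var1']

# A row taken from the plain selection has no 'time_difference' label,
# so row['time_difference'] inside the applied function raises KeyError.
subset_row = df1[columns_of_interest].iloc[0]
in_subset = 'time_difference' in subset_row.index

# Adding the column to the selection makes the label available to the function.
full_row = df1[columns_of_interest + ['time_difference']].iloc[0]
in_full = 'time_difference' in full_row.index

print(in_subset, in_full)  # False True
```

With `axis=1`, apply passes each row as a Series built only from the selected columns, which would explain the KeyError reported in the question.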
Thanks! I used your first answer; it is much more elegant than what I was doing.