Python: changing a column's values to NaN based on another column's values in a loop

Tags: python, pandas, loops, conditional-statements, calculated-columns

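I have many columns whose suffix is either "mean" or "sum". Sometimes the value in a "mean" column is NaN. When that happens, I want the value in the matching "sum" column to become NaN as well. I have a lot of variables, so I think I need a loop. I have created a fake data frame and included three things I tried, based on similar posts on SO. Unfortunately, none of them work.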

import numpy as np
import pandas as pd

original_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [100, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, 330],
        'usage_2_mean': [45, np.nan],
    }
)

print('original dataset')
original_data_set

desired_data_set = pd.DataFrame(
    {
        'customerId': [1, 2],
        'usage_1_sum': [np.nan, 200],
        'usage_1_mean': [np.nan, 100],
        'usage_2_sum': [420, np.nan],
        'usage_2_mean': [45, np.nan],
    }
)

print('desired dataset')
desired_data_set



holder_set = original_data_set.copy()

for number in range(1, 3):
    holder_set['usage_{}_sum'.format(number)] = (
        holder_set['usage_{}_sum'.format(number)]
        .where(holder_set['usage_{}_mean'.format(number)] == np.nan, np.nan)
    )

print('using an np.where statement changed all sum variables into NaN with no discretion')
holder_set


holder_set = original_data_set.copy()

for number in range(1,3):
    conditions = [holder_set['usage_{}_mean'.format(number)]==np.nan]
    outcome = [np.nan]
    holder_set['usage_{}_sum'.format(number)] = np.select(conditions, outcome, default=holder_set['usage_{}_sum'.format(number)])
    
    
print('using an np.select did not have any effect on the dataframe')
holder_set


holder_set = original_data_set.copy()

for number in range(1,3):
    holder_set.loc[holder_set['usage_{}_mean'.format(number)]==np.nan, 'usage_{}_sum'.format(number)] = 12

print('using a loc did not have any effect on the dataframe')
holder_set


Assuming the original DataFrame is df:

df = pd.DataFrame({'customerId': [1, 2],
                   'usage_1_sum': [100, 200], 'usage_1_mean': [np.nan, 100],
                   'usage_2_sum': [420, 330], 'usage_2_mean': [45, np.nan]})
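Use str.endswith to select the columns whose names end in _mean. Then, for each of those columns, set the values in the corresponding _sum column to NaN wherever the value in the mean column is NaN: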

for col in df.columns[df.columns.str.endswith('_mean')]:
    # Slice off the suffix: rstrip('_mean') strips a set of characters, not a suffix
    sum_col = col[:-len('_mean')] + '_sum'
    df.loc[df[col].isna(), sum_col] = np.nan
Result:

# print(df)
   customerId  usage_1_sum  usage_1_mean  usage_2_sum  usage_2_mean
0           1          NaN           NaN        420.0          45.0
1           2        200.0         100.0          NaN           NaN
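
The .isna() call is what makes this work where the == np.nan comparisons in the question did not: NaN never compares equal to anything, including itself, so a mask built with == np.nan is False everywhere. A minimal sketch of the difference follows; the variable names (mean_col, sum_col, eq_mask, isna_mask) are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

mean_col = pd.Series([np.nan, 100.0], name='usage_1_mean')
sum_col = pd.Series([100, 200], name='usage_1_sum')

# Element-wise equality against NaN is always False, even for NaN entries.
eq_mask = mean_col == np.nan
# isna() is the NaN-aware test.
isna_mask = mean_col.isna()

print(eq_mask.tolist())    # [False, False]
print(isna_mask.tolist())  # [True, False]

# Series.where keeps values where the condition is True and replaces the rest,
# so an all-False condition replaces every value with NaN.
all_nan = sum_col.where(eq_mask, np.nan)

# Keeping values only where the mean is present gives the intended result.
fixed = sum_col.where(~isna_mask)
print(fixed.tolist())      # [nan, 200.0]
```

This would also explain the three outcomes reported in the question: the where attempt blanked every value because its condition was never True, while the np.select and loc attempts changed nothing because their masks never selected a row.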

You might try looking at the DataFrame.where() function. You should be able to index directly into the problem areas without writing a loop yourself.
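
One possible reading of that suggestion, sketched without an explicit loop over the columns. This is not necessarily what the commenter had in mind. It assumes every _mean column has a _sum column with the same prefix, and the names mean_cols, sum_cols, and keep are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'customerId': [1, 2],
                   'usage_1_sum': [100, 200], 'usage_1_mean': [np.nan, 100],
                   'usage_2_sum': [420, 330], 'usage_2_mean': [45, np.nan]})

# Pair each *_mean column with its *_sum counterpart (assumed to exist).
mean_cols = [c for c in df.columns if c.endswith('_mean')]
sum_cols = [c[:-len('_mean')] + '_sum' for c in mean_cols]

# Boolean array, True where the mean is present. Converting to NumPy drops the
# column labels so the mask lines up with the sum columns by position.
keep = df[mean_cols].notna().to_numpy()

# DataFrame.where keeps values where the condition is True, NaN elsewhere.
df[sum_cols] = df[sum_cols].where(keep)

print(df)
```

The mask is applied by position, so mean_cols and sum_cols must be in the same order, which the list comprehension above guarantees.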