
Python: creating a temporary column for aggregation


Suppose I create the following (temporary) column for an aggregation:

df['count_of_source_videos'] = np.where(df['is_main_video'] & df['file_name'].str.contains('DIGITAL_SOURCE'), 1, 0)
And then the aggregation itself:

summary_df = df.groupby(['provider', 'id']).agg(
  num_source_videos = ('count_of_source_videos', 'sum'),
).reset_index()
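With the approach above, the `count_of_source_videos` column stays in `df` permanently. Is there a way to do the aggregation without adding a new column? If so, how?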

You can use `.rename()` on the existing column instead of creating a new one in the aggregation:

df['count_of_source_videos'] = np.where(df['is_main_video'] &
                                 df['file_name'].str.contains('DIGITAL_SOURCE'),
                                 1, 0)
summary_df = (df.groupby(['provider', 'id'])['count_of_source_videos'].sum()
                .rename('num_source_videos').reset_index())
Or as a one-liner, using `.assign()` so that nothing is added to `df`:

summary_df = (df.assign(count_of_source_videos=
                        np.where(df['is_main_video'] &
                                 df['file_name'].str.contains('DIGITAL_SOURCE'),
                                 1, 0))
                .groupby(['provider', 'id'])['count_of_source_videos'].sum()
                .rename('num_source_videos').reset_index())
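
To check that the `.assign()` version really leaves `df` untouched, here is a minimal sketch on made-up data. The rows are hypothetical; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 1, 2, 2],
    "is_main_video": [True, True, True, False, True],
    "file_name": [
        "a_DIGITAL_SOURCE.mov",
        "b_DIGITAL_SOURCE.mov",
        "a_trailer.mov",
        "c_DIGITAL_SOURCE.mov",
        "c_trailer.mov",
    ],
})
columns_before = list(df.columns)

# assign() returns a new DataFrame, so the helper column only exists
# inside this chain and never lands on df itself.
summary_df = (
    df.assign(
        count_of_source_videos=np.where(
            df["is_main_video"]
            & df["file_name"].str.contains("DIGITAL_SOURCE"),
            1,
            0,
        )
    )
    .groupby(["provider", "id"])["count_of_source_videos"]
    .sum()
    .rename("num_source_videos")
    .reset_index()
)

print(summary_df)
print(list(df.columns) == columns_before)
```

The helper column still exists for a moment, but only on the intermediate copy that `.assign()` returns, so `df` keeps its original columns.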
Try this:

s = df['is_main_video'] & df['file_name'].str.contains('DIGITAL_SOURCE')
summary_df = s.groupby([df.provider, df.id]).agg(num_source_videos = 'sum').reset_index()
If you don't want to create a temporary Series `s`, you can chain everything together, though it is less readable:

summary_df = ((df['is_main_video'] & df['file_name'].str.contains('DIGITAL_SOURCE'))
                      .groupby([df.provider, df.id])
                      .agg(num_source_videos = 'sum').reset_index())
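
As a rough illustration of the mask-based approach, the sketch below runs it on made-up rows. The data is hypothetical. The `na=False` argument and the `.astype(int)` cast are additions of this sketch, not part of the answer: the first treats missing file names as non-matches, the second makes the summed result an integer count.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "provider": ["A", "A", "B", "B", "B"],
    "id": [1, 1, 2, 2, 2],
    "is_main_video": [True, True, False, True, True],
    "file_name": [
        "a_DIGITAL_SOURCE.mov",
        None,  # missing name, counted as "no match" via na=False
        "c_DIGITAL_SOURCE.mov",
        "d_DIGITAL_SOURCE.mov",
        "e_DIGITAL_SOURCE.mov",
    ],
})

# Boolean mask that lives outside df, so no column is ever added.
mask = df["is_main_video"] & df["file_name"].str.contains(
    "DIGITAL_SOURCE", na=False
)

# Group the standalone Series by columns of df; the group keys are
# aligned with the mask through the shared index.
summary_df = (
    mask.astype(int)
    .groupby([df["provider"], df["id"]])
    .sum()
    .rename("num_source_videos")
    .reset_index()
)

print(summary_df)
```

Grouping a standalone Series by other Series works because pandas aligns them on the index, so this only holds while `mask` and `df` share the same index.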

I see. What if the first line were dropped entirely and the function were evaluated inside `(...)`? Or is that not possible?

@David542 The way I do it requires summing over a column that already exists. You can use `.assign` to make it a "one-liner", though.