Pandas: group a Series by percentage (tags: pandas, dataframe, pandas-groupby) - Fatal编程技术网


I have a dataframe. I grouped the status column by date using:

y = news_dataframe.groupby(by=[news_dataframe['date'].dt.date,news_dataframe['status']])['status'].count()
My output is:


Now I want to calculate the percentage of each status group per date. How can I do this with a pandas dataframe?

Compute two different groupbys and divide one by the other:

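# rows per (date, status)
y_numerator = news_dataframe.groupby(by=[news_dataframe['date'].dt.date, news_dataframe['status']])['status'].count()

# rows per date
y_denominator = news_dataframe.groupby(by=news_dataframe['date'].dt.date)['status'].count()

# fraction of each status within its date; multiply by 100 for a percentage
y = y_numerator / y_denominator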
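
For reference, here is a minimal runnable sketch of the two-groupby approach. The sample rows are invented for illustration, and it uses `Series.div(..., level="date")` so that the division is matched on the date level explicitly instead of relying on automatic index alignment. The name `day` is a helper introduced here, not something from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
news_dataframe = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 09:30", "2019-01-01 17:45",
        "2019-01-02 10:00", "2019-01-02 11:15",
    ]),
    "status": ["approved", "approved", "rejected", "approved", "rejected"],
})

# Calendar date of each row, named "date" so the index level gets that name.
day = news_dataframe["date"].dt.date.rename("date")

# Rows per (date, status) and rows per date.
y_numerator = news_dataframe.groupby([day, "status"])["status"].count()
y_denominator = news_dataframe.groupby(day)["status"].count()

# Divide on the shared "date" level, then scale to a percentage.
percentage = y_numerator.div(y_denominator, level="date") * 100
print(percentage)
```

Within each date the percentages should add up to 100, which is a quick way to check the result.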
Try this:

# forward-fill the date cells left blank in the grouped output
df = df.ffill()
df.columns = ['date', 'status', 'count']
# total count for each date
df1 = df.groupby('date')['count'].sum().reset_index()
# rename the summed column to total
df1.columns = ['date', 'total']

# join the totals back so each row has both its own count and its date's total
df2 = df.merge(df1, on='date')

# calculate the percentage
df2['percentage'] = (df2['count'] / df2['total']) * 100
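If you need a one-liner, it is:

# assumes df is sorted by date and has the default 0..n-1 index, so the merged rows line up with df
df['percentage'] = (df.ffill()['count'] / df.ffill().groupby(['date'])['count'].sum().reset_index().rename(columns={'count': 'total'}).merge(df.ffill(), on=['date'])['total']) * 100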
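
A shorter variant of the same idea, offered as a sketch and not as the answerer's method: `groupby(...).transform("sum")` returns the per-date total already aligned with the original rows, so no merge is needed. The sample table below is invented; it assumes the counts are already in a flat frame with columns `date`, `status`, and `count`.

```python
import pandas as pd

# Hypothetical counts table with the columns used in the answer above.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "status": ["approved", "rejected", "approved", "rejected"],
    "count": [3, 1, 2, 2],
})

# transform("sum") yields the per-date total on df's own index,
# so the division is row by row and independent of row order.
df["percentage"] = df["count"] / df.groupby("date")["count"].transform("sum") * 100
print(df)
```

Because the totals keep the original index, this does not depend on the frame being sorted by date.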

I think this is the shortest:

news_dataframe['date'] = news_dataframe['date'].dt.date
news_dataframe.groupby(['date','status'])['status'].count()/news_dataframe.groupby(['date'])['status'].count()
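
Another option that skips the explicit division, shown as a sketch on invented sample rows: `value_counts(normalize=True)` on the grouped `status` column returns each status's share within its date directly. The helper name `day` is introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
news_dataframe = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 09:30", "2019-01-01 17:45",
        "2019-01-02 10:00", "2019-01-02 11:15",
    ]),
    "status": ["approved", "approved", "rejected", "approved", "rejected"],
})

day = news_dataframe["date"].dt.date.rename("date")

# Share of each status within its date, scaled to a percentage.
share = news_dataframe.groupby(day)["status"].value_counts(normalize=True) * 100
print(share)
```

Unlike the division above, the rows within each date are ordered by frequency, not alphabetically by status.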

Please try to align your output better so we can understand what you mean.
I have updated the output.
Please try:
y['percentage']=y['count'].div(y['count'].sum())*100