Python df: add rows by date so that every group ends on the same date, then ffill the remaining rows

Tags: python, pandas, group-by, row, ffill

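To use animation frames in a geographic plot, I would like all of my groups to end on the same date. This would prevent the last frame from greying out some countries. Currently, the most recent data point by date is Timestamp('2021-05-13 00:00:00').

So, as a next step, I would like to add new rows for every country so that each one has rows up to the latest date in the df. The columns 'people_vaccinated_per_hundred' and 'people_fully_vaccinated_per_hundred' can be filled using ffill.

Data:

Ideally, if Norway stops 1 day short of the latest data point '2021-05-13', a new row should be added as shown below. The same should be done for all other countries in the df.

Example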

       country  iso_code  date        people_vaccinated_per_hundred  people_fully_vaccinated_per_hundred
12028  Norway   NOR       2021-05-02  0.00                           NaN
12029  Norway   NOR       2021-05-03  0.00                           NaN
12188  Norway   NOR       ...         ...                            ...
12188  Norway   NOR       2021-05-11  27.81                          9.55
12189  Norway   NOR       2021-05-12  28.49                          10.42

Add new row
12189  Norway   NOR       2021-05-13  28.49                          10.42

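A straightforward approach could be to build the Cartesian product of countries and dates, then join it against the original data, which creates null values for every missing combination of date and country.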

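# Unique (country, iso_code) pairs and unique dates
countries = df.loc[:, ['country', 'iso_code']].drop_duplicates()
dates = df.loc[:, 'date'].drop_duplicates()
# Cartesian product of countries and dates (how='cross' requires pandas >= 1.2.0)
all_countries_dates = countries.merge(dates, how='cross')

# Right merge keeps every combination; combinations missing from df get NaN values
result = df.merge(all_countries_dates, how='right', on=['country', 'iso_code', 'date'])

With the following dataset: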

country       iso_code  date        people_vaccinated   people_fully_vaccinated
Norway        NOR       2021-05-09  0.00                1.00
Norway        NOR       2021-05-10  0.00                3.00
Norway        NOR       2021-05-11  27.81               9.55
Norway        NOR       2021-05-12  28.49               10.42
Norway        NOR       2021-05-13  28.49               10.42
United States USA       2021-05-09  23.00               3.00
United States USA       2021-05-10  23.00               3.00
this transformation gives you:

country       iso_code  date        people_vaccinated   people_fully_vaccinated
Norway        NOR       2021-05-09  0.00                1.00
Norway        NOR       2021-05-10  0.00                3.00
Norway        NOR       2021-05-11  27.81               9.55
Norway        NOR       2021-05-12  28.49               10.42
Norway        NOR       2021-05-13  28.49               10.42
United States USA       2021-05-09  23.00               3.00
United States USA       2021-05-10  23.00               3.00
United States USA       2021-05-11  NaN                 NaN
United States USA       2021-05-12  NaN                 NaN
United States USA       2021-05-13  NaN                 NaN

After this, you can use fillna to change the null values of the added rows.
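Since the question asks for a forward fill, the fill step can be sketched as follows. This is a minimal sketch, assuming the sample dataset above and pandas 1.2.0 or later; the names `filled` and `value_cols` are hypothetical and not part of the original answer. It forward-fills within each country so that values never carry over from one country to the next.

```python
import pandas as pd

# Sample data mirroring the dataset above.
df = pd.DataFrame({
    "country": ["Norway"] * 5 + ["United States"] * 2,
    "iso_code": ["NOR"] * 5 + ["USA"] * 2,
    "date": pd.to_datetime([
        "2021-05-09", "2021-05-10", "2021-05-11", "2021-05-12", "2021-05-13",
        "2021-05-09", "2021-05-10",
    ]),
    "people_vaccinated": [0.00, 0.00, 27.81, 28.49, 28.49, 23.00, 23.00],
    "people_fully_vaccinated": [1.00, 3.00, 9.55, 10.42, 10.42, 3.00, 3.00],
})

# Every (country, iso_code) pair combined with every date.
countries = df[["country", "iso_code"]].drop_duplicates()
dates = df[["date"]].drop_duplicates()
all_countries_dates = countries.merge(dates, how="cross")

# Right merge: combinations missing from df appear as rows of NaN.
filled = df.merge(all_countries_dates, how="right", on=["country", "iso_code", "date"])

# Put each country's rows in date order, then forward-fill per country.
filled = filled.sort_values(["country", "date"]).reset_index(drop=True)
value_cols = ["people_vaccinated", "people_fully_vaccinated"]
filled[value_cols] = filled.groupby("country")[value_cols].ffill()

print(filled)
```

Note that a forward fill only fills gaps after a country's first observation. If a country starts later than the earliest date in the data, its leading rows stay NaN unless they are handled separately, for example with fillna(0).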

Code for the cross join in pandas 1.1.5 and earlier (how='cross' was only added in pandas 1.2.0)

import pandas as pd

# creating a df with all unique countries and iso_codes
countries = animation_covid_df.loc[:, ['country', 'iso_code']].drop_duplicates()
# creating a new table with all the dates in the original dataframe
dates_df = animation_covid_df.loc[:, ['date']].drop_duplicates()

# creating a key called row_number (0..n-1) to later merge the dates table with the countries table on
dates_df['row_number'] = range(len(dates_df))

number_of_dates = len(dates_df)  # the number of dates (rows) in the dates table

# creating as many rows for each country as there are dates in dates_df
# (pd.concat is used because DataFrame.append was removed in pandas 2.0)
indexed_country = pd.concat([countries] * number_of_dates, ignore_index=True)
indexed_country = indexed_country.sort_values(['country', 'iso_code'], ascending=True)
# numbering each country's rows 0..n-1 so they line up with dates_df['row_number']
indexed_country['row_number'] = indexed_country.groupby(['country', 'iso_code']).cumcount()

# merging all the indexed countries with all the possible dates on the row number
indexed_country_date_df = indexed_country.merge(dates_df, on='row_number', how='left', suffixes=('_1', '_2'))

# setting the 'date' column in both tables to datetime so they can be merged on
animation_covid_df['date'] = pd.to_datetime(animation_covid_df['date'])
indexed_country_date_df['date'] = pd.to_datetime(indexed_country_date_df['date'])
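A shorter way to get the same Cartesian product on older pandas versions is the constant-key merge. The following is a minimal sketch of that alternative, not the code from the Kaggle project; the helper name `cross_join` and the temporary `_key` column are hypothetical names chosen for illustration.

```python
import pandas as pd


def cross_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Cartesian product of two frames without relying on how='cross'."""
    # A constant key makes every left row match every right row.
    return (
        left.assign(_key=1)
        .merge(right.assign(_key=1), on="_key")
        .drop(columns="_key")
    )


countries = pd.DataFrame({
    "country": ["Norway", "United States"],
    "iso_code": ["NOR", "USA"],
})
dates = pd.DataFrame({"date": pd.to_datetime(["2021-05-12", "2021-05-13"])})

all_countries_dates = cross_join(countries, dates)
print(all_countries_dates)
```

The result has one row per country and date combination and can then be right-merged with the original dataframe on ['country', 'iso_code', 'date'], as in the answer above.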

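Thank you @rich_morty! This logic worked. Kaggle uses pandas version 1.1.5, so I couldn't use your exact code and had to write out the whole cross join. But the cross join logic worked. You can see the complete code in my Kaggle project.

I'm really glad it helped!!!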