Resampling data to monthly frequency in Python

Below is a sample of my large CSV file:

import pandas as pd

data = pd.read_csv('C:/Users/Ene_E/Desktop/Data/data.csv')
data.head()



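My large CSV covers more than 200 countries from 1800 to 2040, with one record per country per year. My goal is to resample this data to monthly frequency and interpolate the value column, as shown below. I have used Afghanistan in 1800 to illustrate the final result I expect.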

Expected output:

name               date    value
Afghanistan        Jan     1800   start_value
Afghanistan        Feb     1800   .
Afghanistan        Mar     1800   . 
Afghanistan        May     1800   .
Afghanistan        Jun     1800   .
Afghanistan        Jul     1800   .This column is interpolated smoothly
Afghanistan        Aug     1800   .
Afghanistan        Sep     1800   .
Afghanistan        Oct     1800   .
Afghanistan        Nov     1800   .
Afghanistan        Dec     1800   603(end value in that year)
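I want all of my data resampled in Python as shown above, because that is the only way my model will work. Note: the dates should be in the format shown above.

I have tried several times without success: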

data['year'] = pd.to_datetime(data.year, format='%Y')
head(data)

Error:

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    head(data)
NameError: name 'head' is not defined



Error:

Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    data.resample('1M', how='interpolate')
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 8145, in resample
    base=base, key=on, level=level)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1251, in resample
    return tg._get_resampler(obj, kind=kind)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1381, in _get_resampler
    "but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Error:

Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    data.groupby(name).resample('1M', how='interpolate')
NameError: name 'name' is not defined

Any ideas?

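Use np.where to conditionally assign the labels, after filling in any missing date values.

Forward-fill missing dates

data['date'] = pd.to_datetime(data['date']).ffill()

Group by date and reset back to a DataFrame

data.set_index('date', inplace=True)
# Label the January and December rows
data['value'] = np.where(data.index.month == 1, 'start_value', data['value'])
data['value'] = np.where(data.index.month == 12, 'End_value', data['value'])
# Select columns with a list; tuple-style ['name', 'value'] fails on newer pandas
data.groupby(data.index.month)[['name', 'value']].ffill().reset_index().sort_values(by=['name', 'date'], ascending=True).drop_duplicates()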

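@Developer - I'm not familiar with interpolation or resampling, but I wanted to try a different approach. What I actually produced is similar to your expected output: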

import pandas as pd
import numpy as np

data = pd.DataFrame({'name': ['Afghanistan', 'Albania', 'Zimbabwe',
                              'Afghanistan', 'Albania', 'Zimbabwe'],
                     'year': [1800, 1800, 1800, 2040, 2040, 2040],
                     'value': [603, 667, 59, 2415, 2804, 3210]})

# Unique names, years and month labels
df_name_unique = data[['name']].drop_duplicates()
df_year_unique = data[['year']].drop_duplicates()
df_month_unique = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']})

# Cross joins (pandas >= 1.2) give one row per name/year/month combination
df_year_month_name = (df_name_unique
                      .merge(df_year_unique, how='cross')
                      .merge(df_month_unique, how='cross'))

# Attach each yearly value to all twelve months of its year
df = pd.merge(df_year_month_name, data, how='left', on=['name', 'year'])

# Keep the real value only in December; other months get placeholder text
df['value'] = np.where(df['Month'] != 'Dec', '.', df['value'])
df['value'] = np.where(df['Month'] == 'Jan', 'start_value', df['value'])
df['value'] = np.where(df['Month'] == 'Jul', '.This column is interpolated smoothly',
                       df['value'])
df

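data.groupby('name').resample('1M', how='interpolate'): it is not name, it should be 'name'.

Do all readings start in January and end in December, or do you want the minimum date labelled as start and the maximum date labelled as end?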
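
Quoting 'name' removes the NameError, but the call would probably still fail. The TypeError above shows that resample needs a DatetimeIndex, and as far as I know recent pandas versions no longer accept the how= argument. Here is a minimal sketch of one way to do the per-country upsampling, not a definitive solution. The small DataFrame is made-up sample data in the question's name/year/value layout. It also assumes each yearly value belongs to January 1 of its year, whereas the question's table puts it in December, so the dates may need shifting.

```python
import pandas as pd

# Hypothetical yearly data in the question's layout (name, year, value).
data = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "year": [1800, 1801, 1800, 1801],
    "value": [603.0, 615.0, 667.0, 679.0],
})

# resample() only works on a datetime-like index, so build real timestamps.
# Assumption: each yearly value sits on January 1 of its year.
data["date"] = pd.to_datetime(data["year"].astype(str), format="%Y")

pieces = []
for name, group in data.groupby("name"):
    # Upsample this country's series to month starts; the new months are
    # empty (NaN) until interpolate() fills them linearly.
    series = group.set_index("date")["value"].resample("MS").interpolate(method="linear")
    piece = series.reset_index()
    piece.insert(0, "name", name)
    pieces.append(piece)

monthly = pd.concat(pieces, ignore_index=True)

# Format the dates as month name plus year, as in the expected output.
monthly["date"] = monthly["date"].dt.strftime("%b %Y")
print(monthly.head(13))
```

Looping over the groups and concatenating keeps the result in long format with one row per country and month. Linear interpolation is only one choice here; interpolate accepts other method values if a smoother curve is wanted.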