
Resampling yearly data to monthly in Python

Tags: python, pandas

I have a large CSV file, loaded as shown below:

data = pd.read_csv('C:/Users/Ene_E/Desktop/Data/data.csv')
data.head()

The CSV covers more than 200 countries from 1800 to 2040, with one record per country per year. My goal is to resample this data to monthly frequency and interpolate the value column. I use Afghanistan in 1800 to illustrate the final result I expect.

Expected output:



name               date    value
Afghanistan        Jan     1800   start_value
Afghanistan        Feb     1800   .
Afghanistan        Mar     1800   . 
Afghanistan        May     1800   .
Afghanistan        Jun     1800   .
Afghanistan        Jul     1800   .This column is interpolated smoothly
Afghanistan        Aug     1800   .
Afghanistan        Sep     1800   .
Afghanistan        Oct     1800   .
Afghanistan        Nov     1800   .
Afghanistan        Dec     1800   603(end value in that year)
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    head(data)
NameError: name 'head' is not defined
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    data.resample('1M', how='interpolate')
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 8145, in resample
    base=base, key=on, level=level)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1251, in resample
    return tg._get_resampler(obj, kind=kind)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1381, in _get_resampler
    "but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    data.groupby(name).resample('1M', how='interpolate')
NameError: name 'name' is not defined
I would like all of my data resampled as shown above, because that is the only way my model works. Note: the dates should be in the format shown above.
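Each of the three tracebacks above has a separate cause: `head` is a DataFrame method rather than a built-in function, `resample` only accepts a datetime-like index, and the column label passed to `groupby` has to be a quoted string. The `how=` argument to `resample` was also deprecated and later removed from pandas, so it is left out here. The snippet below is a minimal sketch of those points; the two-row frame and the `year` column name are assumptions standing in for the real CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the 'year' column name is an assumption.
data = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan"],
    "year": [1800, 1801],
    "value": [603.0, 615.0],
})

first_rows = data.head()        # head is a method on the DataFrame
groups = data.groupby("name")   # the column label is a string, so it is quoted

# With the default RangeIndex, resample raises the TypeError from the traceback.
try:
    data.resample("MS")
    raised = False
except TypeError:
    raised = True

# After switching to a DatetimeIndex, the same call is accepted.
indexed = data.set_index(pd.to_datetime(data["year"].astype(str), format="%Y"))
resampler = indexed.resample("MS")
```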


data['year'] = pd.to_datetime(data.year, format='%Y')
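Once the year column holds datetimes, one possible way to get monthly rows is to resample each country separately and interpolate linearly between the yearly points. This is only a sketch under assumptions: the sample values for 1801 are invented for illustration, each yearly value is treated as the January value of that year (so December is interpolated, not pinned to the yearly figure as in the expected output), and the `label` column is a hypothetical name for the `Jan 1800` style date string.

```python
import pandas as pd

# Hypothetical sample standing in for the CSV; values are illustrative only.
data = pd.DataFrame({
    "name": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "year": [1800, 1801, 1800, 1801],
    "value": [603.0, 615.0, 667.0, 679.0],
})
data["year"] = pd.to_datetime(data["year"].astype(str), format="%Y")

pieces = []
for name, group in data.groupby("name"):
    # One series per country, indexed by date so that resample is valid.
    series = group.set_index("year")["value"]
    # "MS" = month start; asfreq inserts NaN rows, interpolate fills them.
    monthly_values = series.resample("MS").asfreq().interpolate(method="linear")
    pieces.append(monthly_values.to_frame().assign(name=name))

monthly = pd.concat(pieces).rename_axis("date").reset_index()
# Month abbreviation plus year; month names follow the current locale.
monthly["label"] = monthly["date"].dt.strftime("%b %Y")
monthly = monthly[["name", "date", "label", "value"]]
```

A plain loop is used instead of `groupby(...).apply(...)` because `apply` can return a wide frame when every group produces a series with the same index.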











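# Assumes `data` has a datetime 'date' column and that numpy is imported as np.
data.set_index('date', inplace=True)
# Label the January and December rows.
data['value'] = np.where(data.index.month == 1, 'start_value', data['value'])
data['value'] = np.where(data.index.month == 12, 'End_value', data['value'])
# Forward-fill within each calendar month, then sort by country and date.
# Selecting several columns after groupby needs a list: [['name', 'value']].
(data.groupby(data.index.month)[['name', 'value']]
     .ffill()
     .reset_index()
     .sort_values(by=['name', 'date'], ascending=True)
     .drop_duplicates())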


import pandas as pd
import numpy as np
data = pd.DataFrame({'name':['Afghanistan', 'Albania', 'Zimbabwe','Afghanistan', 
                             'Albania', 'Zimbabwe'],
                     'value' : [603,667,59,2415,2804,3210]
df_year_unique = pd.DataFrame(data['year'].drop_duplicates().reset_index(drop=True))
df_name_unique = pd.DataFrame(data['name'].drop_duplicates().reset_index(drop=True))
df_month_unique = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']})
df_name = pd.DataFrame(pd.concat([df_name_unique]*len(
df_month = pd.DataFrame(pd.concat([df_month_unique]*len(
df_year = pd.DataFrame(pd.concat([df_year_unique]*len(
df_year_month = pd.merge(df_month, df_year, how='inner', left_index=True, 
df_year_month_name = pd.merge(df_year_month, df_name, how='inner', left_index=True, 
df = pd.merge(df_year_month_name, data, how='left', on=['name','year'])
df['value'] = np.where(df['Month'] != 'Dec', '.', df['value'])
df['value'] = np.where(df['Month'] == 'Jan', 'start_value', df['value'])
df['value'] = np.where(df['Month'] == 'Jul', '.This column is interpolated smoothly', 
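Several lines of the code above are cut off in this copy, so it cannot be run as shown. The sketch below is one way to build a similar name × year × month grid and merge the yearly values onto it; it is not a reconstruction of the missing lines. The `year` values are assumptions (the column is truncated in the source), and `pd.MultiIndex.from_product` is swapped in for the repeated `pd.concat` and `pd.merge` calls.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Zimbabwe"] * 2,
    "year": [1800] * 3 + [1801] * 3,  # assumed years, not from the source
    "value": [603, 667, 59, 2415, 2804, 3210],
})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Every combination of country, year and month as a flat frame.
grid = pd.MultiIndex.from_product(
    [data["name"].unique(), data["year"].unique(), months],
    names=["name", "year", "Month"],
).to_frame(index=False)

# Attach the yearly value to all twelve months of its year.
df = grid.merge(data, how="left", on=["name", "year"])

# Keep the yearly value only in December; other months become NaN
# so that they can be interpolated afterwards.
df["value"] = df["value"].where(df["Month"] == "Dec")
```

Leaving the other months as NaN, instead of writing placeholder strings such as '.', keeps the column numeric so that it can still be interpolated.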
