Resample data monthly in Python
I have a large CSV file; this is how I load it and preview a sample:
import pandas as pd
data = pd.read_csv('C:/Users/Ene_E/Desktop/Data/data.csv')
data.head()
My large CSV covers more than 200 countries from 1800 to 2040, with one data point recorded per year. My goal is to resample this data to monthly frequency and interpolate the value column as shown below. I used Afghanistan in 1800 to illustrate the end result I expect. Expected output:
name         date      value
Afghanistan  Jan 1800  start_value
Afghanistan  Feb 1800  .
Afghanistan  Mar 1800  .
Afghanistan  Apr 1800  .
Afghanistan  May 1800  .
Afghanistan  Jun 1800  .
Afghanistan  Jul 1800  .  (this column is interpolated smoothly)
Afghanistan  Aug 1800  .
Afghanistan  Sep 1800  .
Afghanistan  Oct 1800  .
Afghanistan  Nov 1800  .
Afghanistan  Dec 1800  603 (end value for that year)
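The per-country step described above could be sketched as follows. Each yearly value is spread over 12 months and the months in between are interpolated. This is a minimal sketch under assumptions, not a drop-in solution.

It assumes the following:
- The columns are named name, year and value, as in the attempts and the expected output.
- Each yearly value is the December (year-end) value, matching the Dec 1800 ... 603 row.
- The helper yearly_to_monthly and the 1801 numbers are hypothetical.

Because nothing precedes the first December, each country's output starts at Dec 1800 rather than Jan 1800.

```python
import pandas as pd


def yearly_to_monthly(df):
    """Hypothetical helper: spread yearly values over months, interpolating per country.

    Assumption: each yearly value is the December (year-end) value.
    """
    pieces = []
    for name, group in df.groupby('name'):
        # Pin each year to December 1st, e.g. 1800 -> 1800-12-01 (a month start)
        dates = pd.to_datetime(group['year'].astype(str) + '-12', format='%Y-%m')
        yearly = pd.Series(group['value'].to_numpy(dtype=float), index=dates).sort_index()
        # Upsample to month starts ('MS'); the new months are NaN until interpolated
        monthly = yearly.resample('MS').asfreq().interpolate(method='linear')
        pieces.append(pd.DataFrame({
            'name': name,
            'date': monthly.index.strftime('%b %Y'),  # e.g. 'Dec 1800'
            'value': monthly.to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)


# Hypothetical sample: the 1801 values are invented for illustration
sample = pd.DataFrame({'name': ['Afghanistan', 'Afghanistan', 'Albania', 'Albania'],
                       'year': [1800, 1801, 1800, 1801],
                       'value': [603, 615, 667, 691]})
result = yearly_to_monthly(sample)
print(result.head(3))
```

Looping over the groups keeps each country's interpolation separate. The same function could then be applied to the full CSV as yearly_to_monthly(data).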
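I want all of my data resampled like the example above, because that is the only format my model works with.
Note: the dates should be in the format shown above (e.g. Jan 1800).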
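On the date format: once the dates are real datetimes, text labels such as Jan 1800 can be produced with Series.dt.strftime. In the format string, %b is the abbreviated month name and %Y the four-digit year. A small sketch on three made-up dates:

```python
import pandas as pd

# Three hypothetical month-start dates, just to show the formatting
dates = pd.Series(pd.date_range('1800-01-01', periods=3, freq='MS'))

# %b = abbreviated month name (locale-dependent), %Y = four-digit year
labels = dates.dt.strftime('%b %Y')
print(labels.tolist())  # ['Jan 1800', 'Feb 1800', 'Mar 1800'] in the default C/English locale
```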
I have tried several times without success:
data['year'] = pd.to_datetime(data.year, format='%Y')
head(data)
Error:
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    head(data)
NameError: name 'head' is not defined

Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    data.resample('1M', how='interpolate')
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 8145, in resample
    base=base, key=on, level=level)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1251, in resample
    return tg._get_resampler(obj, kind=kind)
  File "C:\Python27\lib\site-packages\pandas\core\resample.py", line 1381, in _get_resampler
    "but got an instance of %r" % type(ax).__name__)
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    data.groupby(name).resample('1M', how='interpolate')
NameError: name 'name' is not defined
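Each of the errors above has its own cause:
- head is a DataFrame method, so it is called as data.head().
- name without quotes is an undefined Python variable, while the column label is the string 'name'.
- resample only accepts a DatetimeIndex, TimedeltaIndex or PeriodIndex. The frame still has its default RangeIndex, because converting the year column does not make it the index.

Below is a minimal single-country sketch of the fix, assuming year and value columns and made-up values. The how= keyword of resample was deprecated and has been removed in current pandas, so the sketch upsamples with asfreq() and then calls interpolate(). Note that format='%Y' places each yearly value on January 1st.

```python
import pandas as pd

# Hypothetical two-year sample for one country (values are made up)
data = pd.DataFrame({'name': ['Afghanistan', 'Afghanistan'],
                     'year': [1800, 1801],
                     'value': [603.0, 615.0]})

# head() is a DataFrame method: data.head(), not head(data)
print(data.head())

# resample() needs a DatetimeIndex; the default RangeIndex raises the TypeError above.
# format='%Y' turns 1800 into Timestamp('1800-01-01').
data['year'] = pd.to_datetime(data['year'], format='%Y')
series = data.set_index('year')['value']

# Instead of how='interpolate': upsample to month starts ('MS'),
# then fill the new empty months by linear interpolation.
monthly = series.resample('MS').asfreq().interpolate()
print(monthly)
```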
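Idea? Use np.where to conditionally assign labels based on the date. First, forward-fill any missing dates:
import numpy as np
import pandas as pd
data['date'] = pd.to_datetime(data['date']).ffill()
Then set the date as the index, label the January and December rows, group by month, and reset back to a DataFrame:
data.set_index('date', inplace=True)
# Label January rows as the start value and December rows as the end value
data['value'] = np.where(data.index.month == 1, 'start_value', data['value'])
data['value'] = np.where(data.index.month == 12, 'End_value', data['value'])
# Select several columns with a list; newer pandas versions reject the tuple form ['name', 'value']
data.groupby(data.index.month)[['name', 'value']].ffill().reset_index().sort_values(by=['name', 'date'], ascending=True).drop_duplicates()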
@Developer - I'm not familiar with interpolation or resampling, but I wanted to try a different approach. What I produced is similar to your expected output:
import pandas as pd
import numpy as np

data = pd.DataFrame({'name': ['Afghanistan', 'Albania', 'Zimbabwe', 'Afghanistan',
                              'Albania', 'Zimbabwe'],
                     'year': [1800, 1800, 1800, 2040, 2040, 2040],
                     'value': [603, 667, 59, 2415, 2804, 3210]
                     })

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Build every (name, year, Month) combination. Stacking and sorting the name and
# year columns separately, then joining on the row index, misaligns them
# (e.g. Afghanistan never gets 2040), so take the Cartesian product directly.
df_year_month_name = pd.MultiIndex.from_product(
    [data['name'].drop_duplicates(), data['year'].drop_duplicates(), months],
    names=['name', 'year', 'Month']).to_frame(index=False)

# Attach each yearly value to all 12 months of that year
df = pd.merge(df_year_month_name, data, how='left', on=['name', 'year'])

# Replace everything except December with placeholder labels, as in the expected output
df['value'] = np.where(df['Month'] != 'Dec', '.', df['value'])
df['value'] = np.where(df['Month'] == 'Jan', 'start_value', df['value'])
df['value'] = np.where(df['Month'] == 'Jul', '.This column is interpolated smoothly',
                       df['value'])
df
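The answer above fills the non-December months with placeholder strings. If numbers are wanted instead, the same grid idea could keep only the December value and let interpolate fill the remaining months within each country.

This is a sketch with some assumptions and limits:
- The yearly figure is assumed to be the year-end value.
- The 1801 numbers are invented for illustration.
- Jan to Nov of the first year stay NaN, because there is no earlier value to interpolate from.

```python
import pandas as pd

# Hypothetical sample: the 1801 values are made up
data = pd.DataFrame({'name': ['Afghanistan', 'Albania', 'Afghanistan', 'Albania'],
                     'year': [1800, 1800, 1801, 1801],
                     'value': [603.0, 667.0, 615.0, 691.0]})

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# One row per (name, year, Month), in name -> year -> month order
grid = pd.MultiIndex.from_product(
    [data['name'].unique(), sorted(data['year'].unique()), months],
    names=['name', 'year', 'Month']).to_frame(index=False)
df = grid.merge(data, how='left', on=['name', 'year'])

# Keep the yearly figure only in December (assumed year-end value)
df.loc[df['Month'] != 'Dec', 'value'] = float('nan')

# Interpolate linearly within each country; leading NaNs are left as-is
df['value'] = df.groupby('name')['value'].transform(lambda s: s.interpolate())
print(df.head(14))
```

Grouping by name before interpolating stops one country's December value from being blended into the next country's months.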
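data.groupby('name').resample('1M', how='interpolate'): it is not name, it should be 'name'.
Do all readings start in January and end in December, or do you want the earliest date marked as start and the latest date marked as end?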