Python: mapping multiple DataFrames and filling in column values
Suppose I have the following three DataFrames.
DataFrame 1:
import pandas as pd
df1 = {'year': ['2010','2012','2014','2015'], 'count': [1,1,1,1]}
df1 = pd.DataFrame(data=df1)
df1 = df1.set_index('year')
df1
year count
2010 1
2012 1
2014 1
2015 1
DataFrame 2:
df2 = {'year': ['2010','2011','2016','2017'], 'count': [2,1,3,1]}
df2 = pd.DataFrame(data=df2)
df2 = df2.set_index('year')
df2
year count
2010 2
2011 1
2016 3
2017 1
DataFrame 3:
df3 = {'year': ['2010','2011','2012','2013','2014','2015','2017'], 'count': [4,2,5,4,4,1,1]}
df3 = pd.DataFrame(data=df3)
df3 = df3.set_index('year')
df3
year count
2010 4
2011 2
2012 5
2013 4
2014 4
2015 1
2017 1
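Now I want all three DataFrames to contain every year, each with its count. For example, df1 is missing 2011, 2013, 2016 and 2017, so those years should be added to df1's index, with a count of 0 for each newly added index value.
So for df1, my expected output is: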
year count
2010 1
2012 1
2014 1
2015 1
2011 0
2013 0
2016 0
2017 0
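The same goes for df2 and df3. Thanks.
You can build the union of all three indexes, for example with functools.reduce and np.union1d:
from functools import reduce
import numpy as np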
idx = reduce(np.union1d,[df1.index, df2.index, df3.index])
print (idx)
['2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017']
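The idx array above still has to be applied to each frame. The following is a minimal sketch of that step, assuming the three DataFrames from the question; the name `aligned` is introduced here purely for illustration and does not appear in the original answer.

```python
from functools import reduce

import numpy as np
import pandas as pd

# The three DataFrames from the question, indexed by year (as strings).
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Sorted union of all index labels.
idx = reduce(np.union1d, [df1.index, df2.index, df3.index])

# `aligned` is a hypothetical name: each frame reindexed to the full set
# of years, with 0 filled in for years it did not have.
aligned = [df.reindex(idx, fill_value=0) for df in (df1, df2, df3)]

for frame in aligned:
    print(frame)
```

Because np.union1d returns sorted labels, the reindexed frames come out in year order rather than with the new years appended at the end.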
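Use reindex with the union of all years. (On current pandas, build the union with Index.union; the | operator on an Index no longer performs a set union.)
In [257]: all_years = df1.index.union(df2.index).union(df3.index)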
In [258]: df1.reindex(all_years, fill_value=0)
Out[258]:
count
year
2010 1
2011 0
2012 1
2013 0
2014 1
2015 1
2016 0
2017 0
In [259]: df2.reindex(all_years, fill_value=0)
Out[259]:
count
year
2010 2
2011 1
2012 0
2013 0
2014 0
2015 0
2016 3
2017 1
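The same reindex step can be wrapped so it works for any number of frames. This is only a sketch under the assumption that every frame should be aligned to the union of all indexes; `align_to_union` is a hypothetical helper name, not part of pandas.

```python
from functools import reduce

import pandas as pd


def align_to_union(frames, fill_value=0):
    """Hypothetical helper: reindex every frame to the union of all indexes."""
    # Index.union is the set-union method on a pandas Index.
    all_idx = reduce(lambda a, b: a.union(b), (f.index for f in frames))
    return [f.reindex(all_idx, fill_value=fill_value) for f in frames]


df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

new1, new2, new3 = align_to_union([df1, df2, df3])
print(new3)
```

Here df3 gains only the year 2016, since it already contains every other year.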
I would go with union, but you can also use unique, i.e.:
idx = pd.Series(np.concatenate([df1.index,df2.index,df3.index])).unique()
# or idx = set(np.concatenate([df1.index,df2.index,df3.index]))
df1.reindex(idx).fillna(0)
count
year
2010 1.0
2012 1.0
2014 1.0
2015 1.0
2011 0.0
2016 0.0
2017 0.0
2013 0.0
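The counts in the output above are floats because reindex first inserts NaN for the missing years, and fillna then replaces them. The sketch below compares that with passing fill_value directly; the variable names `via_fillna` and `via_fill_value` are illustrative only, and idx is written out by hand in the order shown above.

```python
import pandas as pd

df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')

# Same label order as the unique-based idx in the answer above.
idx = ['2010', '2012', '2014', '2015', '2011', '2016', '2017', '2013']

# NaN is introduced first, so the column is upcast to float.
via_fillna = df1.reindex(idx).fillna(0)

# No NaN is ever created, so the integer dtype is kept.
via_fill_value = df1.reindex(idx, fill_value=0)

print(via_fillna['count'].dtype)
print(via_fill_value['count'].dtype)
```

If the float result is not wanted, fill_value=0 avoids the round trip through NaN.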
You can also do it by iteration:
# find the years missing from df1's index
# (the range 2010-2017 is hard-coded here):
morelist = [str(y) for y in range(2010, 2018)
            if str(y) not in df1.index]
# build a DataFrame of zero counts for the missing years:
df2add = pd.DataFrame(data=[0] * len(morelist),
                      columns=['count'],
                      index=morelist)
# append the new rows to the original:
df1 = pd.concat([df1, df2add])
print(df1)
Output:
count
2010 1
2012 1
2014 1
2015 1
2011 0
2013 0
2016 0
2017 0
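The hard-coded range of years can be avoided by deriving the missing labels from the other frames. A minimal sketch, assuming the three DataFrames from the question and using Index.difference; the names `missing`, `zeros` and `df1_full` are illustrative and not from the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Every year that appears in any of the three frames.
all_years = df1.index.union(df2.index).union(df3.index)

# Years present elsewhere but absent from df1.
missing = all_years.difference(df1.index)

# One zero-count row per missing year, appended after the existing rows.
zeros = pd.DataFrame({'count': 0}, index=missing)
df1_full = pd.concat([df1, zeros])
print(df1_full)
```

This keeps the original rows first and appends the new years afterwards, which matches the row order requested in the question.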
+1, but df1.reindex(idx, fill_value=0) would be better.
Sir, I had the same thing, but that would have made all the answers very similar, so I used fillna.
I came here with the union solution, lost it within a second, so I switched to unique.