Python: map multiple DataFrames and fill in column values

Suppose I have the following three DataFrames:

DataFrame 1:

import pandas as pd

df1 = {'year': ['2010','2012','2014','2015'], 'count': [1,1,1,1]}
df1 = pd.DataFrame(data=df1)
df1 = df1.set_index('year')
df1

year    count
2010    1
2012    1
2014    1
2015    1

DataFrame 2:

df2 = {'year': ['2010','2011','2016','2017'], 'count': [2,1,3,1]}
df2 = pd.DataFrame(data=df2)
df2 = df2.set_index('year')
df2

year    count
2010    2
2011    1
2016    3
2017    1

DataFrame 3:

df3 = {'year': ['2010','2011','2012','2013','2014','2015','2017'], 'count': [4,2,5,4,4,1,1]}
df3 = pd.DataFrame(data=df3)
df3 = df3.set_index('year')
df3

year    count
2010    4
2011    2
2012    5
2013    4
2014    4
2015    1
2017    1

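Now I want all three DataFrames to contain every year, each with a count. For example, df1 is missing the years 2011, 2013, 2016 and 2017, so those years should be added to df1's index, each with a count of 0.

So for df1, my output would be: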

year    count
2010    1
2012    1
2014    1
2015    1
2011    0
2013    0
2016    0
2017    0

The same goes for df2 and df3. Thanks.

You can build the union of all three indexes, for example with reduce and np.union1d:

from functools import reduce
import numpy as np

idx = reduce(np.union1d, [df1.index, df2.index, df3.index])
print(idx)

['2010' '2011' '2012' '2013' '2014' '2015' '2016' '2017']
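
The snippet above stops once idx has been built. As a minimal sketch of the remaining step, assuming the sample data from the question and using df1_full as a purely illustrative name, the union can be passed to reindex so that the missing years appear with a count of 0:

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Sorted union of all three indexes, as a NumPy array.
idx = reduce(np.union1d, [df1.index, df2.index, df3.index])

# Years that df1 lacks are added as new rows with a count of 0.
# df1_full is an illustrative name, not part of the original answer.
df1_full = df1.reindex(idx, fill_value=0)
print(df1_full)
```

Note that np.union1d returns the labels sorted, so the added years are interleaved with the existing ones rather than appended at the end.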

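Use reindex with the union of all years (Index.union is used here because, from pandas 2.0 on, the | operator no longer performs a set union on an Index):

In [257]: all_years = df1.index.union(df2.index).union(df3.index)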

In [258]: df1.reindex(all_years, fill_value=0)
Out[258]:
      count
year
2010      1
2011      0
2012      1
2013      0
2014      1
2015      1
2016      0
2017      0

In [259]: df2.reindex(all_years, fill_value=0)
Out[259]:
      count
year
2010      2
2011      1
2012      0
2013      0
2014      0
2015      0
2016      3
2017      1
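
The same call works for df3 as well. A short sketch, assuming the question's sample data, that reindexes all three frames in one pass over the union of their indexes:

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Union of the three indexes; Index.union sorts the result by default.
all_years = df1.index.union(df2.index).union(df3.index)

# Reindex every frame against the same set of years, filling gaps with 0.
df1, df2, df3 = (df.reindex(all_years, fill_value=0) for df in (df1, df2, df3))

print(df3)
```

Because every frame is reindexed against the same all_years, the three results share an identical index and can be compared or combined row by row.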

I would go with union, but you can also use unique, i.e.
idx = pd.Series(np.concatenate([df1.index,df2.index,df3.index])).unique()
# or idx = list(set(np.concatenate([df1.index,df2.index,df3.index])))  # order not guaranteed
df1.reindex(idx).fillna(0)

      count
year       
2010    1.0
2012    1.0
2014    1.0
2015    1.0
2011    0.0
2016    0.0
2017    0.0
2013    0.0
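
The counts above come out as floats because reindex first inserts NaN for the new years, which forces the column to a float dtype before fillna runs. A small sketch, assuming the question's sample data and using via_fillna and via_fill_value as illustrative names only, comparing this with passing fill_value=0 directly:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')

# Unique years in order of first appearance across the three indexes.
idx = pd.Series(np.concatenate([df1.index, df2.index, df3.index])).unique()

# NaN is inserted first, so the column becomes float before the fill.
via_fillna = df1.reindex(idx).fillna(0)

# The fill happens during reindexing, so the integer dtype is kept.
via_fill_value = df1.reindex(idx, fill_value=0)

print(via_fillna['count'].dtype)      # a float dtype
print(via_fill_value['count'].dtype)  # an integer dtype
```

If integer counts matter downstream, fill_value=0 avoids a separate astype(int) step.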

You can also use iteration:
# find the missing years:
morelist = [j                                     # keep each year that meets the condition below
            for j in map(str, range(2010, 2018))  # all years 2010..2017, converted to strings
            if j not in df1.index]                # only those not already in the index

# create a DataFrame of zero counts for the missing years:
df2add = pd.DataFrame(data=[0]*len(morelist),
                      columns=['count'],
                      index=morelist)

# append the new DataFrame to the original:
df1 = pd.concat([df1, df2add])

print(df1)

Output:
      count
2010      1
2012      1
2014      1
2015      1
2011      0
2013      0
2016      0
2017      0
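
The iteration above hard-codes the range 2010 to 2017. As a rough sketch, the range could instead be derived from the data. Here add_missing_years is a hypothetical helper introduced only for illustration, and the code assumes every index label is a year string that int() can parse and that every year between the earliest and latest one is wanted:

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({'year': ['2010', '2012', '2014', '2015'],
                    'count': [1, 1, 1, 1]}).set_index('year')
df2 = pd.DataFrame({'year': ['2010', '2011', '2016', '2017'],
                    'count': [2, 1, 3, 1]}).set_index('year')
df3 = pd.DataFrame({'year': ['2010', '2011', '2012', '2013', '2014', '2015', '2017'],
                    'count': [4, 2, 5, 4, 4, 1, 1]}).set_index('year')


def add_missing_years(df, years):
    """Hypothetical helper: append a zero-count row for each year not in df.index."""
    missing = [y for y in years if y not in df.index]
    if not missing:
        return df.copy()
    extra = pd.DataFrame({'count': [0] * len(missing)}, index=missing)
    return pd.concat([df, extra])


# Assumption: labels are year strings, and the full contiguous range is wanted.
frames = [df1, df2, df3]
first = min(int(y) for df in frames for y in df.index)
last = max(int(y) for df in frames for y in df.index)
years = [str(y) for y in range(first, last + 1)]

result = add_missing_years(df1, years)
print(result)
```

As in the original iteration, the new rows are appended after the existing ones, so the result is not sorted by year; calling sort_index() on it would restore chronological order.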

+1, but df1.reindex(idx, fill_value=0) would be better.
Sir, I had the same, but it made all the answers very similar, so I used fillna.
I came here with the union solution, lost out by a second, so I switched to unique.