Python: how to merge two dataframes on a timeseries

python pandas dataframe merge

What I'm hoping to create is a dataframe that looks like this:

    amount  months      category    
0   6460    2018-01-31  budgeted    
1   7905    2018-01-31  actual  
2   11509   2018-02-28  budgeted    
3   21502   2018-02-28  actual 
...
...

Here is my sample code and the base data I'm working with:

import pandas as pd
import string
import altair as alt

from random import randint

# 
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months                  = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts           = [ "actual", "budgeted", "difference" ]

summary = []

summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )

index   = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )

summaryDF = pd.DataFrame( summary, index = index, columns = columns )

#
# From this point, I am trying to transform the summaryDF into something 
# I can use in a different context...
#

# 'BM' = business month end; pandas 2.2+ deprecates it in favor of 'BME'
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )

idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual   = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()

budgeted.index = budgetMonths
actual.index = budgetMonths

budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF   = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })

print( budgetedDF )
print( actualDF )

df3 = pd.merge( budgetedDF, actualDF, on = 'months' )
df3

df3 looks like:

    amount_x    months      category_x  amount_y    category_y
0   6460        2018-01-31  budgeted    7905        actual
1   11509       2018-02-28  budgeted    21502       actual
...
...
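For reference, here is a minimal sketch of why the merge produces `_x`/`_y` columns, using two small stand-ins for `budgetedDF` and `actualDF` (the amounts are copied from the tables above, not real data). `pd.merge` joins rows that share a `months` value side by side, so each overlapping non-key column appears twice and pandas adds its default suffixes. Passing `suffixes` only renames those columns; the result still has one row per month.

```python
import pandas as pd

# Illustrative stand-ins for budgetedDF / actualDF (two months only).
months = pd.to_datetime(["2018-01-31", "2018-02-28"])
budgetedDF = pd.DataFrame({"amount": [6460, 11509], "months": months, "category": "budgeted"})
actualDF = pd.DataFrame({"amount": [7905, 21502], "months": months, "category": "actual"})

# Rows with equal 'months' are joined into ONE row; the overlapping
# columns 'amount' and 'category' get the default suffixes _x and _y.
merged = pd.merge(budgetedDF, actualDF, on="months")
print(list(merged.columns))
# ['amount_x', 'months', 'category_x', 'amount_y', 'category_y']

# Custom suffixes only rename the columns; there is still one row per month.
named = pd.merge(budgetedDF, actualDF, on="months", suffixes=("_budgeted", "_actual"))
print(list(named.columns))
# ['amount_budgeted', 'months', 'category_budgeted', 'amount_actual', 'category_actual']
print(len(named))  # 2
```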

I think I'm close to what I want... I just need the final merge step.

Use pd.concat to "merge" these dataframes:
df3 = (pd.concat([budgetedDF, actualDF])
         .sort_index(kind='mergesort')  # stable sort keeps budgeted before actual
         .reset_index(drop=True)
)
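A self-contained sketch of the `concat` approach on three months of made-up values, built the same way as the question's frames (a Series indexed by dates plus a `months` column). One point worth flagging: the default `sort_index` algorithm is not guaranteed to be stable, so this sketch passes `kind="mergesort"` to keep each month's budgeted row ahead of its actual row, which is the order of the `concat` inputs.

```python
import pandas as pd

# Made-up stand-ins for the question's cumulative series; the DatetimeIndex
# plays the role of budgetMonths.
months = pd.to_datetime(["2018-01-31", "2018-02-28", "2018-03-30"])
budgeted = pd.Series([6460, 11509, 15200], index=months)
actual = pd.Series([7905, 21502, 24800], index=months)

budgetedDF = pd.DataFrame({"amount": budgeted, "months": months, "category": "budgeted"})
actualDF = pd.DataFrame({"amount": actual, "months": months, "category": "actual"})

# Stack the frames vertically, then order by the date index.
# mergesort is stable, so within a month the budgeted row (first input)
# stays ahead of the actual row.
long_df = (pd.concat([budgetedDF, actualDF])
           .sort_index(kind="mergesort")
           .reset_index(drop=True))
print(long_df)
```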


However, you might prefer this representation:
df3 = (pd.concat([budgetedDF, actualDF])
         .drop(columns='months')
         .set_index('category', append=True)
         .unstack()
)

df3
           amount         
category   actual budgeted
2018-01-31   3612     2183
2018-02-28   3357     8902
2018-03-30   2828     9956
2018-04-30   2990    14475
2018-05-31   4446    25385
2018-06-29  19119    29119
2018-07-31  27296    40869
2018-08-31  38443    43400
2018-09-28  47978    52686
2018-10-31  49612    63384
2018-11-30  49272    74107
2018-12-31  56907    83653
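A similar wide table can also be sketched with `DataFrame.pivot` starting from long-format rows; this is an alternative to the `set_index`/`unstack` chain, shown on made-up data. Because `values` is a single column name here, the result has flat `actual`/`budgeted` columns instead of the `('amount', ...)` MultiIndex above. `pivot` also raises if a `(months, category)` pair occurs twice, which doubles as a check that each month has exactly one value per category.

```python
import pandas as pd

# Made-up long-format rows: one row per (month, category).
long_df = pd.DataFrame({
    "amount": [6460, 7905, 11509, 21502],
    "months": pd.to_datetime(["2018-01-31", "2018-01-31", "2018-02-28", "2018-02-28"]),
    "category": ["budgeted", "actual", "budgeted", "actual"],
})

# Dates become the index and each category becomes its own column.
wide = long_df.pivot(index="months", columns="category", values="amount")
print(wide)
```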

It sounds like what you want is
pd.concat([budgetedDF, actualDF]).sort_values('months', kind='mergesort').reset_index(drop=True)
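If the order within each month matters (budgeted before actual, as in the desired output at the top of the question), a sketch that does not depend on sort stability is to sort on both columns using an explicitly ordered categorical. The category order `["budgeted", "actual"]` is an assumption taken from that desired output, and the amounts are illustrative.

```python
import pandas as pd

months = pd.to_datetime(["2018-01-31", "2018-02-28"])
budgetedDF = pd.DataFrame({"amount": [6460, 11509], "months": months, "category": "budgeted"})
actualDF = pd.DataFrame({"amount": [7905, 21502], "months": months, "category": "actual"})

# Assumed display order within a month: budgeted first, then actual.
order = pd.CategoricalDtype(["budgeted", "actual"], ordered=True)

# Sort by month, then by the categorical's declared order (not alphabetically).
long_df = (pd.concat([budgetedDF, actualDF])
           .astype({"category": order})
           .sort_values(["months", "category"])
           .reset_index(drop=True))
print(long_df)
```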

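How is that a merge on the common month values?

@coldspeed: As I read the desired behavior at the top of the post, they want to keep separate values for each month.

No, they merge on months at the bottom of their script with pd.merge(budgetedDF, actualDF, on='months'). The merge result has one row holding the values from both dataframes. As I read it, that is exactly the behavior they are trying to avoid.

No, see. They want to merge on the common months and then have the merged results exist as separate rows. They even use an explicit indicator column.

The concat line after "you might prefer this representation" is not the representation I ultimately need, as I noted at the top of the post. The reason is that I need a representation that works well with the way altair wants to build charts from a dataframe.

@Eric Sure, that's your call. I just wanted you to know these are your options. I don't know altair, so I couldn't say.

Thanks, much appreciated. I'm still not sure how your answer differs from the other answer.

@Eric They're effectively the same, but my answer covers more ground than fuglede's.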
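On the altair point in the comments: Altair generally works best with long-form data (one row per observation, with a column such as `category` that can be mapped to a color). A minimal sketch, assuming the merged frame keeps the default `_x`/`_y` names produced by the question's `pd.merge` call, is to melt that merge result back into long form; the values below are made up.

```python
import pandas as pd

# Made-up stand-in for the question's df3 (output of pd.merge on 'months').
df3 = pd.DataFrame({
    "amount_x": [6460, 11509],
    "months": pd.to_datetime(["2018-01-31", "2018-02-28"]),
    "category_x": "budgeted",
    "amount_y": [7905, 21502],
    "category_y": "actual",
})

# Name the amount columns after their category, then melt:
# one row per (month, category) pair. The stable sort keeps
# budgeted before actual within each month.
long_df = (df3.rename(columns={"amount_x": "budgeted", "amount_y": "actual"})
              .melt(id_vars="months", value_vars=["budgeted", "actual"],
                    var_name="category", value_name="amount")
              .sort_values("months", kind="mergesort")
              .reset_index(drop=True))
print(long_df)
```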