Python: how to merge two dataframes on a timeseries
What I want to create is a dataframe that looks like:
   amount     months  category
0    6460 2018-01-31  budgeted
1    7905 2018-01-31    actual
2   11509 2018-02-28  budgeted
3   21502 2018-02-28    actual
...
The sample code I have, and the basic data I am working with, is:
import pandas as pd
import string
import altair as alt
from random import randint
#
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]
summary = []
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
index = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )
summaryDF = pd.DataFrame( summary, index = index, columns = columns )
#
# From this point, I am trying to transform the summaryDF into something
# I can use in a different context...
#
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )
idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()
budgeted.index = budgetMonths
actual.index = budgetMonths
budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })
print( budgetedDF )
print( actualDF )
df3 = pd.merge( budgetedDF, actualDF, on = 'months' )
df3
df3 looks like:
   amount_x     months category_x  amount_y category_y
0      6460 2018-01-31   budgeted      7905     actual
1     11509 2018-02-28   budgeted     21502     actual
...
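The `_x` and `_y` columns come from how `pd.merge` treats column names that appear in both inputs: rows with the same key are placed side by side, and the overlapping names get the default suffixes. A minimal sketch follows, using two hypothetical two-month frames (`budgeted_demo`, `actual_demo`) as stand-ins for `budgetedDF` and `actualDF`; the amounts are copied from the sample output and are for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for budgetedDF and actualDF (two months only).
months = pd.to_datetime(["2018-01-31", "2018-02-28"])
budgeted_demo = pd.DataFrame(
    {"amount": [6460, 11509], "months": months, "category": "budgeted"}
)
actual_demo = pd.DataFrame(
    {"amount": [7905, 21502], "months": months, "category": "actual"}
)

# merge joins on 'months' and puts matching rows side by side; the columns
# present in both frames ('amount', 'category') receive the _x / _y suffixes.
wide = pd.merge(budgeted_demo, actual_demo, on="months")
print(list(wide.columns))
print(len(wide))  # one row per month, not one row per (month, category)
```

So a join of this kind gives one row per month, whereas the desired layout keeps one row per month and category.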
I think I am close to getting what I want... I just need the final merge step.
Use pd.concat to "merge" these dataframes:
df3 = (pd.concat([budgetedDF, actualDF])
         .sort_index()
         .reset_index(drop=True)
      )
However, you might prefer this representation:
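df3 = (pd.concat([budgetedDF, actualDF])
         # pass the axis by keyword: the positional form drop('months', 1)
         # is no longer accepted by recent pandas versions
         .drop(columns='months')
         .set_index('category', append=True)
         .unstack()
      )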
df3
            amount
category    actual budgeted
2018-01-31    3612     2183
2018-02-28    3357     8902
2018-03-30    2828     9956
2018-04-30    2990    14475
2018-05-31    4446    25385
2018-06-29   19119    29119
2018-07-31   27296    40869
2018-08-31   38443    43400
2018-09-28   47978    52686
2018-10-31   49612    63384
2018-11-30   49272    74107
2018-12-31   56907    83653
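The wide table and the long table hold the same information, so one can be rebuilt from the other. Below is a rough sketch of the round trip using `melt`; the frames `budgeted_demo` and `actual_demo` and their values are hypothetical stand-ins for the question's data, and `melt` is just one possible way to go back to long form, not something the answer itself uses.

```python
import pandas as pd

# Hypothetical stand-ins, indexed by month end like budgetedDF / actualDF.
months = pd.to_datetime(["2018-01-31", "2018-02-28"])
budgeted_demo = pd.DataFrame({"amount": [6460, 11509], "category": "budgeted"}, index=months)
actual_demo = pd.DataFrame({"amount": [7905, 21502], "category": "actual"}, index=months)

# Wide form: one row per month, one column per category.
wide = (pd.concat([budgeted_demo, actual_demo])
          .set_index("category", append=True)
          .unstack())

# Back to long form: one row per (month, category) pair.
long_df = (wide["amount"]
             .rename_axis(index="months")
             .reset_index()
             .melt(id_vars="months", var_name="category", value_name="amount")
             .sort_values(["months", "category"])
             .reset_index(drop=True))
print(long_df)
```

Here the rows within a month are ordered alphabetically by category, so `actual` comes before `budgeted`.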
It sounds like what you want is:
pd.concat([budgetedDF, actualDF]).sort_values('months').reset_index(drop=True)
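One detail worth checking with this one-liner: the default sort algorithm is not guaranteed to be stable, so the relative order of the `budgeted` and `actual` rows within a month may not be fixed. A small sketch follows, assuming a stable sort is wanted so that `budgeted` always precedes `actual`; the demo frames and their values are hypothetical stand-ins for `budgetedDF` and `actualDF`.

```python
import pandas as pd

# Hypothetical stand-ins for budgetedDF and actualDF (two months only).
months = pd.to_datetime(["2018-01-31", "2018-02-28"])
budgeted_demo = pd.DataFrame(
    {"amount": [6460, 11509], "months": months, "category": "budgeted"}
)
actual_demo = pd.DataFrame(
    {"amount": [7905, 21502], "months": months, "category": "actual"}
)

# concat stacks the rows; a stable sort (mergesort) keeps the concat order
# within each month, so 'budgeted' stays ahead of 'actual'.
long_df = (pd.concat([budgeted_demo, actual_demo])
             .sort_values("months", kind="mergesort")
             .reset_index(drop=True))
print(long_df)
```

This gives one row per month and category, which is the long layout shown at the top of the question.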
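How does this merge on the common month values?
@coldspeed: As I read the desired behavior at the top of the post, they want to keep a separate row for each value of months.
No, they merge on months at the bottom of their script with pd.merge(budgetedDF, actualDF, on='months').
After that merge, the rows from the two dataframes end up combined into a single row. That is exactly the behavior that, as I read it, they are trying to avoid.
No, see. They want to merge on the common months and then have the merged results exist as separate rows. They even use an explicit indicator column.
The concat line after "you might prefer this representation" is not the representation I ultimately need, as I noted at the top of the post. The reason is that I need a representation that works well with how altair expects to generate a chart from a dataframe.
@Eric Sure, that is your choice. I am just letting you know these are your options. I don't know altair, so I couldn't say. Thanks.
Much appreciated. I am still not sure how your answer differs from the other answer.
@Eric They are effectively the same, but I cover more ground in my answer than fuglede does in theirs.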