Python pandas: how to get the difference between DataFrames on the same column


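I have two DataFrames that are identical except for the "value" column. I need to compute the difference between the two DataFrames on the "value" column, matched on the year + name + month columns, and append the result to the dataset.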

import pandas as pd

x1 = {
    "year": ["2018", "2018", "2018", "2018", "2018", "2018"],
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [100, 200, 300, 400, 500, 600],
}
x2 = {
    "year": ["2019", "2019", "2019", "2019", "2019", "2019"],
    "name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
    "month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
    "value": [700, 300, 200, 500, 600, 100],
}
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
y1 = pd.concat([pd.DataFrame(x1), pd.DataFrame(x2)], ignore_index=True)

print(y1)

The result should look like rows 12 and 13 below:

    year name   month  value
0   2018  abc  Jan-18    100
1   2018  xyz  Feb-18    200
...
...
6   2019  abc  Jan-18    700
7   2019  xyz  Feb-18    300
...
...
12   diff  abc  Jan-18    (700-100)
13   diff  xyz  Feb-18    (300-200)

Use groupby with diff after sort_values:

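y2 = y1.copy()
# Sort by year so that 2018 precedes 2019 inside every (name, month) group
y2 = y2.sort_values('year')
# diff() is NaN for the first row of each group and (2019 - 2018) for the second
y2['value'] = y2.groupby(['name', 'month']).value.diff()
# DataFrame.append was removed in pandas 2.0, so use pd.concat to add the diff rows
y1 = pd.concat([y1, y2.dropna().assign(year='diff')])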
y1
    year name   month  value
0   2018  abc  Jan-18  100.0
1   2018  xyz  Feb-18  200.0
2   2018  pqr  Mar-18  300.0
3   2018  stu  Apr-18  400.0
4   2018  hij  May-18  500.0
5   2018  efg  Jun-18  600.0
6   2019  abc  Jan-18  700.0
7   2019  xyz  Feb-18  300.0
8   2019  pqr  Mar-18  200.0
9   2019  stu  Apr-18  500.0
10  2019  hij  May-18  600.0
11  2019  efg  Jun-18  100.0
6   diff  abc  Jan-18  600.0
7   diff  xyz  Feb-18  100.0
8   diff  pqr  Mar-18 -100.0
9   diff  stu  Apr-18  100.0
10  diff  hij  May-18  100.0
11  diff  efg  Jun-18 -500.0
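
In the output above the diff rows keep the index labels 6 to 11 of the 2019 rows they were computed from, whereas the question asks for them to continue at row 12. A minimal sketch of one way to get that numbering, assuming it is acceptable to renumber the whole frame with ignore_index=True; the names ordered, diff_rows and result are illustrative and not from the original answer, and the sample data is cut down to two names:

```python
import pandas as pd

# Shortened version of the question's data (two names only, for brevity)
x1 = {"year": ["2018", "2018"], "name": ["abc", "xyz"],
      "month": ["Jan-18", "Feb-18"], "value": [100, 200]}
x2 = {"year": ["2019", "2019"], "name": ["abc", "xyz"],
      "month": ["Jan-18", "Feb-18"], "value": [700, 300]}
y1 = pd.concat([pd.DataFrame(x1), pd.DataFrame(x2)], ignore_index=True)

# Stable sort so the earlier year comes first inside every (name, month) group
ordered = y1.sort_values("year", kind="mergesort")

# Keep only rows that have a difference, and label them as 'diff'
diff_rows = (
    ordered.assign(value=ordered.groupby(["name", "month"])["value"].diff())
    .dropna(subset=["value"])
    .assign(year="diff")
)

# ignore_index=True renumbers the result, so the diff rows follow the data rows
result = pd.concat([y1, diff_rows], ignore_index=True)
print(result)
```

With the full six-name data set from the question, the same steps would place the diff rows at index 12 to 17.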

First, when you want to stack two DataFrames on top of each other, use pd.concat.

Second, we can use groupby(...).diff() to compute the difference within each group.

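# x1 and x2 are plain dicts, so build DataFrames from them before concatenating
y1 = pd.concat([pd.DataFrame(x1), pd.DataFrame(x2)], ignore_index=True)

# abs() discards the sign; drop it to keep signed (2019 - 2018) differences
y1['difference'] = abs(y1.groupby(['name', 'month']).value.diff())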

print(y1)
    year name   month  value  difference
0   2018  abc  Jan-18    100         NaN
1   2018  xyz  Feb-18    200         NaN
2   2018  pqr  Mar-18    300         NaN
3   2018  stu  Apr-18    400         NaN
4   2018  hij  May-18    500         NaN
5   2018  efg  Jun-18    600         NaN
6   2019  abc  Jan-18    700       600.0
7   2019  xyz  Feb-18    300       100.0
8   2019  pqr  Mar-18    200       100.0
9   2019  stu  Apr-18    500       100.0
10  2019  hij  May-18    600       100.0
11  2019  efg  Jun-18    100       500.0
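
The groupby approach depends on row order inside each group. As an alternative that does not, here is a hedged sketch that pairs the rows explicitly with DataFrame.merge on name and month and keeps the sign of the difference. This swaps in a different technique from the answer above; df_2018, df_2019, paired and diff_rows are hypothetical names, and the data is a two-row excerpt of the question's sample:

```python
import pandas as pd

# Two-row excerpt of the question's data, one frame per year
df_2018 = pd.DataFrame({"year": ["2018", "2018"], "name": ["abc", "pqr"],
                        "month": ["Jan-18", "Mar-18"], "value": [100, 300]})
df_2019 = pd.DataFrame({"year": ["2019", "2019"], "name": ["abc", "pqr"],
                        "month": ["Jan-18", "Mar-18"], "value": [700, 200]})

# Pair rows on (name, month); the suffixes tell the two value columns apart
paired = df_2018.merge(df_2019, on=["name", "month"], suffixes=("_old", "_new"))

# Build the diff rows in the same column layout as the source frames
diff_rows = pd.DataFrame({
    "year": "diff",
    "name": paired["name"],
    "month": paired["month"],
    "value": paired["value_new"] - paired["value_old"],  # signed: 2019 - 2018
})

result = pd.concat([df_2018, df_2019, diff_rows], ignore_index=True)
print(result)
```

Because merge defaults to an inner join, a name and month pair that exists in only one of the two frames produces no diff row here.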

You can try the following:

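import pandas as pd

X1, X2 = pd.DataFrame(x1), pd.DataFrame(x2)
rows = []
for i in X1.name:
    # Pull the scalar value and month for this name out of each frame
    v1 = X1.loc[X1.name == i, 'value'].iloc[0]
    v2 = X2.loc[X2.name == i, 'value'].iloc[0]
    d = X1.loc[X1.name == i, 'month'].iloc[0]
    rows.append({'year': 'diff', 'name': i, 'month': d, 'value': v2 - v1})
# DataFrame.append was removed in pandas 2.0, so collect the rows and concat once
df = pd.concat([X1, X2, pd.DataFrame(rows)], ignore_index=True)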