Python: adding a calculated field between pivot 'columns' based on 'values' data
I am working on a report that shows the variance between two quarters. I have a SQL query that I read into a DataFrame and then pivot. Here is my code:
df = pd.read_sql_query(mtd_query, cnxn, params=[report_start, end_mtd, report_start, end_mtd, whse])
## (m - 1) // 3 + 1 determines which quarter each month falls in
## Create the "PERIOD" column by combining the year and the quarter
df['QUARTER'] = (df['INV_MONTH'].astype(int) - 1)//3 + 1
df['PERIOD'] = df['INV_YEAR'].astype(str) + 'Q' + df['QUARTER'].astype(int).astype(str)
df['MARGIN'] = (df['PROFIT'].astype(float) / df['SALES'].astype(float))
df = df.drop('INV_MONTH', axis=1)
df = df.drop('INV_YEAR', axis=1)
df = pd.pivot_table(df, index=['REP', 'REP_NAME', 'CUST_NO', 'CUST_NAME', 'TOTALSALES'], columns=['PERIOD'], values=['SALES', 'PROFIT', 'MARGIN'], fill_value=0)
df = df.reorder_levels([1, 0], axis=1).sort_index(axis=1, ascending=False)
df = df.sort_index(level=0, ascending=True)
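I am trying to compute the variance in the 'MARGIN' column between each 'PERIOD'. I have not been able to find any way to do this. Any suggestions would be appreciated.
Current output: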
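For anyone who wants to check the quarter arithmetic used above, here is a minimal sketch on a tiny frame. The sample rows are made up for illustration; the real values come from the SQL query.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the SQL query.
sample = pd.DataFrame({
    'INV_YEAR': [2016, 2017, 2017, 2017],
    'INV_MONTH': [12, 1, 6, 7],
})

# Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
sample['QUARTER'] = (sample['INV_MONTH'].astype(int) - 1) // 3 + 1

# Combine year and quarter into a label such as '2017Q2'
sample['PERIOD'] = sample['INV_YEAR'].astype(str) + 'Q' + sample['QUARTER'].astype(str)

periods = sample['PERIOD'].tolist()
print(periods)
```

Because the label is year first and quarter second, sorting the strings also sorts the periods chronologically.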
PERIOD 2017Q4 2017Q3 2017Q2 2017Q1 2016Q4
SALES PROFIT MARGIN SALES PROFIT MARGIN SALES PROFIT MARGIN SALES PROFIT MARGIN SALES PROFIT MARGIN
REP REP_NAME CUST_NO CUST_NAME TOTALSALES
1.0 Greensboro - House 245.0 TE CONNECTIVITY CORPORATION 103361.05 0.000000 0.000000 0.000000 434.500000 69.520000 0.160000 20391.666667 3262.666667 0.160000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1789.0 GOOD HOUSEKEEPER 50108.47 678.508182 80.170909 0.145883 585.301429 64.180476 0.121915 718.685000 92.033125 0.130453 720.729333 97.955333 0.134821 1237.308333 88.210000 0.099450
The desired output looks like this:
PERIOD 2017Q4 2017Q3 2017Q2 2017Q1 2016Q4
SALES PROFIT MARGIN VARIANCE SALES PROFIT MARGIN VARIANCE SALES PROFIT MARGIN VARIANCE SALES PROFIT MARGIN VARIANCE SALES PROFIT MARGIN
REP REP_NAME CUST_NO CUST_NAME TOTALSALES
1.0 Greensboro - House 245.0 TE CONNECTIVITY CORPORATION 103361.05 0.000000 0.000000 0.000000 -.16 434.500000 69.520000 0.160000 0 20391.666667 3262.666667 0.160000 .16 0.000000 0.000000 0.000000 0 0.000000 0.000000 0.000000
1789.0 GOOD HOUSEKEEPER 50108.47 678.508182 80.170909 0.145883 .023968 585.301429 64.180476 0.121915 -0.008537 718.685000 92.033125 0.130453 -.004368 720.729333 97.955333 0.134821 .035372 1237.308333 88.210000 0.099450
IIUC:
Source DataFrame:
In [60]: df
Out[60]:
2016Q4 2017Q1 2017Q2 \
MARGIN PROFIT SALES MARGIN PROFIT SALES MARGIN PROFIT
0 0.0 0.00 0.000000 0.0 0.000000 0.0 0.16 3262.666667
1 NaN 88.21 1237.308333 NaN 97.955333 NaN NaN NaN
2017Q3 2017Q4
SALES MARGIN PROFIT SALES MARGIN PROFIT SALES
0 20391.666667 0.160000 69.520000 434.5 0.0 0.0 0.0
1 718.685000 0.121915 64.180476 NaN NaN NaN NaN
Solution:
In [61]: tmp = (df.loc[:, pd.IndexSlice[:, 'MARGIN']]
...: .fillna(0)
...: .diff(axis=1)
...: .rename(columns=lambda x: 'VARIANCE' if x=='MARGIN' else x))
...:
In [62]: pd.concat([df, tmp], axis=1).sort_index(axis=1)
Out[62]:
2016Q4 2017Q1 2017Q2 \
MARGIN PROFIT SALES VARIANCE MARGIN PROFIT SALES VARIANCE MARGIN
0 0.0 0.00 0.000000 NaN 0.0 0.000000 0.0 0.0 0.16
1 NaN 88.21 1237.308333 NaN NaN 97.955333 NaN 0.0 NaN
2017Q3 \
PROFIT SALES VARIANCE MARGIN PROFIT SALES VARIANCE
0 3262.666667 20391.666667 0.16 0.160000 69.520000 434.5 0.000000
1 NaN 718.685000 0.00 0.121915 64.180476 NaN 0.121915
2017Q4
MARGIN PROFIT SALES VARIANCE
0 0.0 0.0 0.0 -0.160000
1 NaN NaN NaN -0.121915
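The same idea as a standalone script, in case the IPython session is hard to copy. This is only a sketch: the frame `pivot` and its numbers are hypothetical, and it assumes the columns are a two-level MultiIndex of (PERIOD, field) with periods sorted oldest first.

```python
import pandas as pd

# Hypothetical pivoted frame: (PERIOD, field) column MultiIndex, one row per customer.
cols = pd.MultiIndex.from_product(
    [['2017Q1', '2017Q2', '2017Q3'], ['MARGIN', 'PROFIT', 'SALES']]
)
pivot = pd.DataFrame(
    [[0.10, 10.0, 100.0, 0.15, 30.0, 200.0, 0.12, 12.0, 100.0]],
    columns=cols,
)

# Sort the columns so periods run oldest to newest before slicing and differencing.
pivot = pivot.sort_index(axis=1)

variance = (
    pivot.loc[:, pd.IndexSlice[:, 'MARGIN']]   # every period's MARGIN column
    .fillna(0)
    .diff(axis=1)                              # current period minus the previous one
    .rename(columns={'MARGIN': 'VARIANCE'}, level=1)
)

# Put VARIANCE next to the other fields of each period.
result = pd.concat([pivot, variance], axis=1).sort_index(axis=1)
print(result)
```

The first period has nothing to compare against, so its VARIANCE stays NaN.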
I was not able to use the solution above.
So I did this instead:
df['MARGIN'] = (df['PROFIT'].astype(float) / df['SALES'].astype(float))
df['MARGIN'] = df['MARGIN'].astype(float)
df['PREV_MARGIN'] = df['MARGIN'].shift(-1)
df['VARIANCE'] = df['MARGIN'] - df['PREV_MARGIN']
df = df.drop('PREV_MARGIN', axis=1)
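This gave me the data I needed to get the job done.
@MaxU Done... sorry, I had to edit it manually :)
Could you please post the output of
print(df.to_dict('r'))
for your source (pivoted) df? It would take us quite a while to reproduce this multi-index, multi-column df...
@MaxU Done. Thanks again! It doesn't like df.loc, so I tried df.iloc, and now I get an error on the slice: TypeError: unorderable types: slice() >= str(), which I don't know how to resolve. I tried calling .astype(float) on the MARGIN column, but no luck.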
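On the df.iloc error: iloc is purely positional, so handing it a label such as 'MARGIN' is not supported, which may be what triggers the TypeError. One way to sidestep label slicing altogether is `DataFrame.xs`. The sketch below assumes (PERIOD, field) columns sorted newest first, as in the question; the frame `pivot` and its values are made up.

```python
import pandas as pd

# Hypothetical frame with periods sorted newest first, as in the question.
cols = pd.MultiIndex.from_product(
    [['2017Q3', '2017Q2', '2017Q1'], ['SALES', 'PROFIT', 'MARGIN']]
)
pivot = pd.DataFrame(
    [[100.0, 12.0, 0.12, 200.0, 30.0, 0.15, 100.0, 10.0, 0.10]],
    columns=cols,
)

# xs picks one label on one column level; the result has one column per period.
margin = pivot.xs('MARGIN', axis=1, level=1)

# Sort periods oldest to newest so diff gives current minus previous.
variance = margin.sort_index(axis=1).diff(axis=1)
variance.columns = pd.MultiIndex.from_product([variance.columns, ['VARIANCE']])

result = pd.concat([pivot, variance], axis=1)
# Back to newest-first periods; fields within a period come out in reverse alphabetical order.
result = result.sort_index(axis=1, ascending=False)
print(result)
```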
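A note on the shift(-1) workaround: shift compares each row with the next row regardless of which customer it belongs to, so the result depends on how the rows happen to be ordered. A grouped variant is sketched below, assuming long-form data with one row per customer and period; `long_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical long-form data, one row per customer and period.
long_df = pd.DataFrame({
    'CUST_NO': [245, 245, 245, 1789, 1789, 1789],
    'PERIOD':  ['2017Q1', '2017Q2', '2017Q3', '2017Q1', '2017Q2', '2017Q3'],
    'PROFIT':  [10.0, 30.0, 12.0, 5.0, 8.0, 9.0],
    'SALES':   [100.0, 200.0, 100.0, 50.0, 40.0, 90.0],
})
long_df['MARGIN'] = long_df['PROFIT'] / long_df['SALES']

# Order each customer's rows chronologically, then difference within the customer only.
long_df = long_df.sort_values(['CUST_NO', 'PERIOD'])
long_df['VARIANCE'] = long_df.groupby('CUST_NO')['MARGIN'].diff()
print(long_df)
```

VARIANCE can then be passed to pivot_table as one more entry in values, alongside SALES, PROFIT and MARGIN.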