Python: how to derive a column from a filtered dataframe with a condition
Tags: python, pandas, numpy, dataframe
The code below produces two dataframes, dftopdata and dfbottomdata:
import easygui as gui
import pandas as pd
filename = gui.fileopenbox(msg='Please choose the Excel workbook containing the bank data.') #select workbook containing FC and WF data
colnames=['1','2','3','4','5','6','7','8','9','10','11','12'] #define col names because variable number of col won't read unless max col# is defined
dfdata = pd.read_csv(filename,names=colnames) #set dataframe equal to csv file
key = dfdata["12"].isnull() #set criteria for splitting data equal to null value in column 12
dftopdata = dfdata.loc[key] #set new df equal to key criteria
dfbottomdata = dfdata.loc[~key] #set new df NOT equal to key criteria
dftopdata = dftopdata.dropna(axis=1, how='all') #drop any column with all values = NaN (recent pandas rejects how= together with thresh=)
dftopdata = dftopdata.dropna(axis=0, how='all') #drop any row with all values = NaN
header = dftopdata.iloc[1] #Creates a header variable at row index location 1
dftopdata = dftopdata[2:] #Resets dataframe equal to row 2 and beyond
dftopdata.rename(columns = header, inplace = True) #sets names of columns in the dataframe equal to header
header = dfbottomdata.iloc[0] #Creates a header variable at row index location 0
dfbottomdata = dfbottomdata[1:] #Resets dataframe equal to row 1 and beyond
dfbottomdata.rename(columns = header, inplace = True) #sets names of columns in the dataframe equal to header
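The splitting step above can be tried on a small in-memory file. This is only a minimal sketch, not the asker's workbook: the CSV text, the five-column width and the `promote_header` helper are assumptions made for illustration, and the summary block's header sits at position 0 here rather than 1.

```python
import io
import pandas as pd

# Hypothetical miniature export: summary rows have 3 fields,
# transaction rows have 5 (the last one, Description, is always filled).
raw = io.StringIO(
    "Account Number,Opening Ledger,Closing Ledger\n"
    "1111111112,717.57,716.85\n"
    "Date,Account Number,CR Amount,DB Amount,Description\n"
    "12/10/2019,1111111112,,28.69,MTHLY ANALYSIS CHARGE\n"
    "12/20/2019,1111111112,100,,XFR TO DDA\n"
)

# Naming the maximum number of columns lets read_csv pad short rows with NaN.
colnames = ["1", "2", "3", "4", "5"]
df = pd.read_csv(raw, names=colnames, dtype=str)

# Rows whose last column is empty belong to the summary ("top") block.
key = df["5"].isnull()


def promote_header(frame, header_pos):
    """Use the row at header_pos as column names and keep the rows after it."""
    header = frame.iloc[header_pos]
    body = frame.iloc[header_pos + 1:].reset_index(drop=True)
    body.columns = list(header)
    return body


top = promote_header(df.loc[key].dropna(axis=1, how="all"), 0)
bottom = promote_header(df.loc[~key], 0)
```

Assigning `body.columns` on a fresh frame avoids calling `rename(..., inplace=True)` on a slice, which can trigger a SettingWithCopyWarning.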
Here is a sample of the data from the dataframe called top data (dftopdata):
Routing    Currency  Account Number  Account Name  Opening Ledger  Credits Amt  Credits Num  Debits Amt  Debits Num  Closing Ledger
123456789  USD       1111111112      A             717.57          100.00       1            100.72      3           716.85
123456789  USD       1111111113      B             1,350.30        NaN          0            28.53       1           1,321.77
123456789  USD       1111111114      C             26,570.34       320.52       1            42.17       1           26,848.69
123456789  USD       1111111115      D             1,031.95        2,000.00     1            703.95      2           2,328.00
123456789  USD       1111111116      E             1,000.00        600.00       2            72.03       2           1,527.97
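I would like to add a new column called Balance to the bottom data df, holding the running balance of each bank account.
For the earliest transaction date of a given bank account in the bottom data df, the balance should equal that account's Opening Ledger value from the first dataframe, plus any credit and minus any debit on that row.
For each subsequent transaction of the same bank account, the balance should equal the balance at the previous transaction, plus any credit and minus any debit on that row.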
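As a worked check of this rule, the sketch below applies it to account 1111111112 using the sample values from this question (Opening Ledger 717.57, then one fee, one credit and two debits). `Decimal` is used only to keep the cents exact, and the variable names are illustrative.

```python
from decimal import Decimal
from itertools import accumulate

opening = Decimal("717.57")

# (credit, debit) for each transaction of the account, in date order.
transactions = [
    (Decimal("0"), Decimal("28.69")),
    (Decimal("100"), Decimal("0")),
    (Decimal("0"), Decimal("69.08")),
    (Decimal("0"), Decimal("2.95")),
]

# Each balance is the previous balance plus the credit minus the debit;
# the first element returned by accumulate is the opening value itself, so drop it.
net_changes = (credit - debit for credit, debit in transactions)
balances = list(accumulate(net_changes, initial=opening))[1:]

for balance in balances:
    print(balance)
```

The last value produced is the account's Closing Ledger in the top data, which is a convenient sanity check for the rule.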
Here is a sample of the data from the dataframe called bottom data (dfbottomdata); after the analysis it should look the same, with an added Balance column:
Date        Routing    Currency  Account Number  Account Name  BAI Type            BAI Code  CR Amount  DB Amount  Serial Num  Ref Num   Description
12/10/2019  123456789  USD       1111111112      A             Miscellaneous Fees  7         NaN        28.69      NaN         69650977  MTHLY ANALYSIS CHARGE
12/20/2019  123456789  USD       1111111112      A             Misc Credit         1         100        NaN        NaN         70069250  XFR TO DDA FR DDA 001111085716122019RF#1452300...
12/24/2019  123456789  USD       1111111112      A             Misc Debit          4         NaN        69.08      NaN         70184768  ACCESSIBLEINSURA WEBPAYMENTPCOF PROPERTIES SERIES
12/24/2019  123456789  USD       1111111112      A             Misc Debit          5         NaN        2.95       NaN         70184769  SEP INSURANC ACH WEBPAYMENTPCOF PROPERTIES SERIES
12/10/2019  123456789  USD       1111111113      B             Miscellaneous Fees  6         NaN        28.53      NaN         69645166  MTHLY ANALYSIS CHARGE
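But I don't know what to do next.
I thought about creating a separate dataframe for each bank account, but that seems inefficient.
Can someone point me in the right direction?

Answer:
Assuming dfbottomdata is sorted in ascending order by Routing, Account Number and Date, so that each account's transactions sit together in date order, the following code should work: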
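# Add the Opening Ledger value from dftopdata (assumes the amount columns are already numeric)
dfbottomdata = dfbottomdata.merge(dftopdata[['Routing', 'Account Number', 'Opening Ledger']], on=['Routing', 'Account Number'])
dfbottomdata.rename(columns={'Opening Ledger': 'Balance'}, inplace=True)
# Replace NaN with 0 for the calculation
dfbottomdata['CR Amount'] = dfbottomdata['CR Amount'].fillna(0)
dfbottomdata['DB Amount'] = dfbottomdata['DB Amount'].fillna(0)
# Handle the first row
dfbottomdata.loc[0, 'Balance'] = dfbottomdata.loc[0, 'Balance'] + dfbottomdata.loc[0, 'CR Amount'] - dfbottomdata.loc[0, 'DB Amount']
# Iterate over the remaining rows; carry the previous balance forward only when the previous row has the same Routing and Account Number
for i in range(1, len(dfbottomdata)):
    if dfbottomdata.loc[i-1, 'Routing'] == dfbottomdata.loc[i, 'Routing'] and dfbottomdata.loc[i-1, 'Account Number'] == dfbottomdata.loc[i, 'Account Number']:
        dfbottomdata.loc[i, 'Balance'] = dfbottomdata.loc[i-1, 'Balance'] + dfbottomdata.loc[i, 'CR Amount'] - dfbottomdata.loc[i, 'DB Amount']
    else:
        # First row of a new account: start from that account's own opening ledger
        dfbottomdata.loc[i, 'Balance'] = dfbottomdata.loc[i, 'Balance'] + dfbottomdata.loc[i, 'CR Amount'] - dfbottomdata.loc[i, 'DB Amount']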
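For larger files, the same running balance can probably be computed without a Python loop by taking a per-account cumulative sum of credits minus debits and adding the opening ledger. The sketch below is a hedged alternative, not the answer's own method: the small `top` and `bottom` frames stand in for dftopdata and dfbottomdata, `to_number` is a hypothetical helper, and it assumes the amounts arrive as strings such as '1,350.30'.

```python
import pandas as pd

# Stand-ins for dftopdata / dfbottomdata, built from the sample rows in the question.
top = pd.DataFrame({
    "Routing": ["123456789", "123456789"],
    "Account Number": ["1111111112", "1111111113"],
    "Opening Ledger": ["717.57", "1,350.30"],
})
bottom = pd.DataFrame({
    "Date": ["12/10/2019", "12/20/2019", "12/24/2019", "12/24/2019", "12/10/2019"],
    "Routing": ["123456789"] * 5,
    "Account Number": ["1111111112"] * 4 + ["1111111113"],
    "CR Amount": [None, "100", None, None, None],
    "DB Amount": ["28.69", None, "69.08", "2.95", "28.53"],
})


def to_number(series):
    """Hypothetical helper: strip thousands separators, parse, treat missing as 0."""
    cleaned = series.str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned).fillna(0.0)


top["Opening Ledger"] = to_number(top["Opening Ledger"])
bottom["CR Amount"] = to_number(bottom["CR Amount"])
bottom["DB Amount"] = to_number(bottom["DB Amount"])

keys = ["Routing", "Account Number"]
bottom["Date"] = pd.to_datetime(bottom["Date"], format="%m/%d/%Y")
bottom = bottom.sort_values(keys + ["Date"]).reset_index(drop=True)

# Bring in each account's opening ledger, then add the running net movement.
merged = bottom.merge(top[keys + ["Opening Ledger"]], on=keys, how="left")
merged["Net"] = merged["CR Amount"] - merged["DB Amount"]
running = merged.groupby(keys)["Net"].cumsum()
merged["Balance"] = (merged["Opening Ledger"] + running).round(2)

result = merged.drop(columns=["Opening Ledger", "Net"])
print(result[["Date", "Account Number", "Balance"]])
```

If several transactions share a date, their relative order decides the intermediate balances, so a tiebreaker column (for example Ref Num) may be worth adding to the sort.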
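Comments:
Please edit your sample data into the question as text, not as images, so that we can reproduce it.
Can you provide the expected output dataframe?
What exactly have you tried in order to solve this? You can't expect others to do everything for you, right?
I have tried to address all of your comments. Please let me know if I can clarify further!
Thank you so much for your answer! I am working through it now to see whether it does what I need. If it does, I will mark it as answered; if it does not do what I hoped, I will provide clarification.
It worked! The only thing I had to change in your answer was from closing ledger to opening ledger. Thanks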
Desired output for the bottom data df, including the Balance column:
Date        Routing    Currency  Account Number  Account Name  BAI Type            BAI Code  CR Amount  DB Amount  Serial Num  Ref Num   Description                                        Balance
12/10/2019  123456789  USD       1111111112      A             Miscellaneous Fees  7         NaN        28.69      NaN         69650977  MTHLY ANALYSIS CHARGE                              688.88
12/20/2019  123456789  USD       1111111112      A             Misc Credit         1         100        NaN        NaN         70069250  XFR TO DDA FR DDA 001111085716122019RF#1452300...  788.88
12/24/2019  123456789  USD       1111111112      A             Misc Debit          4         NaN        69.08      NaN         70184768  ACCESSIBLEINSURA WEBPAYMENTPCOF PROPERTIES SERIES  719.80
12/24/2019  123456789  USD       1111111112      A             Misc Debit          5         NaN        2.95       NaN         70184769  SEP INSURANC ACH WEBPAYMENTPCOF PROPERTIES SERIES  716.85
12/10/2019  123456789  USD       1111111113      B             Miscellaneous Fees  6         NaN        28.53      NaN         69645166  MTHLY ANALYSIS CHARGE                              1321.77
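One way to sanity-check a Balance column like this is to compare each account's last balance with the Closing Ledger reported in the top data; in the sample, accounts 1111111112 and 1111111113 end at 716.85 and 1,321.77, matching their closing values. The sketch below assumes numeric columns and uses illustrative names (`closing`, `balances`, `unreconciled`) with a half-cent tolerance chosen arbitrarily.

```python
import pandas as pd

# Closing Ledger per account, taken from the top data sample.
closing = pd.Series(
    {"1111111112": 716.85, "1111111113": 1321.77}, name="Closing Ledger"
)

# Balance column from the desired output, reduced to the two relevant columns.
balances = pd.DataFrame({
    "Account Number": ["1111111112"] * 4 + ["1111111113"],
    "Balance": [688.88, 788.88, 719.80, 716.85, 1321.77],
})

# The last row of each account (rows are in date order) should equal its closing ledger.
last_balance = balances.groupby("Account Number")["Balance"].last()
mismatch = (last_balance - closing).abs() > 0.005
unreconciled = mismatch[mismatch].index.tolist()

print("Accounts that do not reconcile:", unreconciled)
```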