Python: how to update a 'Balance' column based on the values of other columns in a DataFrame
I have the following DataFrame:
   Date        Status    Amount   Balance
0  06-10-2000  Deposit     40.00     40.0
1  09-12-2002  Withdraw  1000.00      NaN
2  27-06-2001  Deposit     47.00      NaN
3  07-12-2021  Withdraw   100.00      NaN
4  06-10-2022  Deposit    120.00      NaN
5  06-10-2000  Deposit     40.00      NaN
6  09-12-2024  Withdraw    50.00      NaN
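The goal is to update the balance depending on whether each transaction is a deposit or a withdrawal. The initial balance equals the starting amount, so I hard-coded it as 40.0.
Below is my code, but somehow I'm not getting the expected result.
Expected result: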
   Date        Status    Amount   Balance
0  06-10-2000  Deposit     40.00     40.0
1  09-12-2002  Withdraw  1000.00   -960.0
2  27-06-2001  Deposit     47.00   -913.0
3  07-12-2021  Withdraw   100.00  -1013.0
4  06-10-2022  Deposit    120.00   -893.0
5  06-10-2000  Deposit     40.00   -853.0
6  09-12-2024  Withdraw    50.00   -903.0
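The expected Balance column is just a running total of signed amounts: deposits add, withdrawals subtract. A minimal plain-Python sketch that reproduces the column, assuming the rows are typed in by hand as (Status, Amount) pairs rather than read from the file. Note that "first balance = first amount" only holds here because the first row happens to be a deposit.

```python
from itertools import accumulate

# Rows from the table above, typed in by hand: (Status, Amount)
rows = [
    ("Deposit", 40.00),
    ("Withdraw", 1000.00),
    ("Deposit", 47.00),
    ("Withdraw", 100.00),
    ("Deposit", 120.00),
    ("Deposit", 40.00),
    ("Withdraw", 50.00),
]

# A deposit adds to the balance, a withdrawal subtracts from it.
signed = [amount if status == "Deposit" else -amount for status, amount in rows]

# Running total of the signed amounts; the first element is simply 40.0.
balances = list(accumulate(signed))
print(balances)  # [40.0, -960.0, -913.0, -1013.0, -893.0, -853.0, -903.0]
```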
What am I doing wrong in my code? Here it is:
import pandas as pd
with open(r"transactions.txt", "r") as Account:
    details = Account.read().split(",")
print("details of txt", details)
df = pd.DataFrame(details)
fg = df[0].str.extract('(?P<Date>.*) (?P<Status>.*) (?P<Amount>.*)')
print(fg)
fg['Amount'] = fg.Amount.str.replace('$', '')  # removing $ sign
# setting first row value of balance as 40, as equal to amount in 1st row
fg.loc[fg.index[0], 'Balance'] = 40.00
print(fg)
for index, row in fg.iterrows():
    if index == 0:
        continue
    if fg.loc[index, 'Status'] == 'Deposit':
        print("reached here")
        fg.at[float(index), 'Balance'] = sum(fg.loc[float(index), 'Amount'], fg.loc[float(index-1), 'Balance'])
    elif fg.loc[index, 'Status'] == 'withdraw':
        fg.at[float(index), 'Balance'] = fg.loc[float(index), 'Amount'] - fg.loc[float(index-1), 'Balance']
print(fg)
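If you want to keep the loop, these are the problems in it: `Amount` is still a string column after `str.extract`, so it has to be converted to numbers; `sum(a, b)` expects an iterable as its first argument, so plain `+` is what's needed; the `elif` compares against `'withdraw'` while the data says `'Withdraw'`, so it never matches; the withdrawal formula is reversed (it should be previous balance minus amount); and the `float(index)` conversions are unnecessary. A minimal corrected sketch, assuming the file contains comma-separated `"<date> <status> $<amount>"` records. The string `raw` is a hypothetical stand-in for the contents of transactions.txt.

```python
import pandas as pd

# Hypothetical stand-in for transactions.txt: comma-separated
# "<date> <status> $<amount>" records (the real file format is an assumption).
raw = ("06-10-2000 Deposit $40.00,09-12-2002 Withdraw $1000.00,"
       "27-06-2001 Deposit $47.00,07-12-2021 Withdraw $100.00,"
       "06-10-2022 Deposit $120.00,06-10-2000 Deposit $40.00,"
       "09-12-2024 Withdraw $50.00")
details = raw.split(",")

df = pd.DataFrame(details)
fg = df[0].str.extract(r"(?P<Date>.*) (?P<Status>.*) (?P<Amount>.*)")

# Strip the "$" literally and convert to floats; extract() returns strings.
fg["Amount"] = fg["Amount"].str.replace("$", "", regex=False).astype(float)

# Seed the balance with the first amount (40.0) instead of hard-coding it.
fg["Balance"] = float("nan")
fg.loc[fg.index[0], "Balance"] = fg.loc[fg.index[0], "Amount"]

for index in fg.index[1:]:
    prev = fg.at[index - 1, "Balance"]
    amount = fg.at[index, "Amount"]
    # Compare against the exact capitalisation used in the data.
    if fg.at[index, "Status"] == "Deposit":
        fg.at[index, "Balance"] = prev + amount  # "+", not sum()
    elif fg.at[index, "Status"] == "Withdraw":
        fg.at[index, "Balance"] = prev - amount  # previous balance minus amount

print(fg)
```

This works, but it still visits rows one by one; the vectorized answer avoids the loop entirely.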
IIUC, use np.where and cumsum:
import numpy as np
# Signed amounts (withdrawals negated), then a running total
df['Balance'] = np.where(df['Status'].eq('Deposit'), df['Amount'], df['Amount'] * -1)
df['Balance'] = df['Balance'].cumsum()
   Date        Status    Amount   Balance
0  06-10-2000  Deposit     40.0     40.0
1  09-12-2002  Withdraw  1000.0   -960.0
2  27-06-2001  Deposit     47.0   -913.0
3  07-12-2021  Withdraw   100.0  -1013.0
4  06-10-2022  Deposit    120.0   -893.0
5  06-10-2000  Deposit     40.0   -853.0
6  09-12-2024  Withdraw    50.0   -903.0
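To reproduce the answer without the text file, a self-contained sketch that types the question's data in directly with `Amount` already numeric (an assumption standing in for the parsed `fg`). No 40.0 seed is needed: the first signed amount is +40.0, so the running sum starts there.

```python
import numpy as np
import pandas as pd

# The question's data, typed in directly (Amount already numeric).
df = pd.DataFrame({
    "Date": ["06-10-2000", "09-12-2002", "27-06-2001", "07-12-2021",
             "06-10-2022", "06-10-2000", "09-12-2024"],
    "Status": ["Deposit", "Withdraw", "Deposit", "Withdraw",
               "Deposit", "Deposit", "Withdraw"],
    "Amount": [40.0, 1000.0, 47.0, 100.0, 120.0, 40.0, 50.0],
})

# Deposits keep their sign, withdrawals are negated...
df["Balance"] = np.where(df["Status"].eq("Deposit"), df["Amount"], -df["Amount"])
# ...and the running sum of the signed amounts is the balance.
df["Balance"] = df["Balance"].cumsum()
print(df)
```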
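I can't reproduce the solution. Do you mean I don't need to loop over the DataFrame at all? If not, how and where does your solution fit into my for loop and if blocks above? Please clarify.
@user10083444 In pandas we avoid loops and apply vectorized operations across whole columns instead. I used your source data: you can delete everything after fg['Amount'] = fg.Amount.str.replace('$',''), but make sure Amount is a numeric column (check with print(df.dtypes)).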
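The dtype check matters because `str.extract` always returns strings, and the vectorized answer needs real numbers. A minimal sketch of the conversion, assuming a hypothetical three-row `fg` whose `Amount` values still carry the `$` sign. `regex=False` makes the literal replacement explicit, and `pd.to_numeric` produces floats since the amounts have cents.

```python
import pandas as pd

# Hypothetical raw values as they come out of str.extract(): strings with "$".
fg = pd.DataFrame({
    "Status": ["Deposit", "Withdraw", "Deposit"],
    "Amount": ["$40.00", "$1000.00", "$47.00"],
})
print(fg.dtypes)  # Amount is object (strings) at this point

# Remove the "$" literally, then convert the column to a numeric dtype.
fg["Amount"] = pd.to_numeric(fg["Amount"].str.replace("$", "", regex=False))
print(fg.dtypes)  # Amount is now float64
```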
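Understood, @datanovel. Thanks for this elegant solution. I think I'll stop using for loops over DataFrames from now on.
@user10083444 It can be confusing, because when you learn Python you learn to use if statements and for loops, but they don't fit the pandas API, which is built for speed and performance. Do you know how np.where and cumsum work?
Yes, now I do. I googled a bit and read the SciPy docs. I also found this excellent explanation.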
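For anyone else wondering how the two pieces behave in isolation, a small sketch on three hypothetical rows: `np.where(cond, x, y)` picks `x[i]` where `cond[i]` is true and `y[i]` otherwise, and `cumsum()` replaces each element with the sum of itself and everything before it.

```python
import numpy as np
import pandas as pd

# Three hypothetical rows: deposit 40, withdraw 1000, deposit 47.
is_deposit = pd.Series([True, False, True])
amount = pd.Series([40.0, 1000.0, 47.0])

# Element-wise choice: keep the amount for deposits, negate it otherwise.
signed = np.where(is_deposit, amount, -amount)

# Running total of the signed amounts.
running = pd.Series(signed).cumsum()
print(running.tolist())  # [40.0, -960.0, -913.0]
```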