Python: Is there a solution when some cell values are summed with other values?
Tags: python, pandas, dataframe, sequencematcher

I'm using Python 3 with the pandas package and SequenceMatcher to fill in the A.OUT-B.IN and A.OUT-C.IN columns. When the program computes the values for the last two rows, it always raises ValueError: Must have equal len keys and value when setting with an iterable. I think this is because part of the expression yields two values rather than one. For example, when I print(df.loc[(df.StopName.isin(["B"])) & (df.MO.str.contains(df.loc[index-1, "MO"])), "AmountOfInput"].sum()), it shows the sum of AmountOfInput for stop B over the MOs that are the same (almost 90% similar), which is 31526, and then another value, the second-to-last AmountOfInput, 7074. As a result the subtraction cannot be done and the error pops up. I have searched Google and read the pandas documentation but still cannot find a solution.
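To make the "almost 90% similar" comparison concrete, here is a minimal sketch that uses only the standard library and two MO values taken from the question's data. The variable names parent and child are illustrative and do not appear in the original code. It also shows why a substring test between the two MOs only works in one direction, which is relevant to the str.contains calls in the question (these patterns contain no regex metacharacters, so str.contains behaves like a plain substring test for them).

```python
from difflib import SequenceMatcher

# Two MO values from the question's data (names are illustrative).
parent = "510-20200701006_02"
child = "510-20200701006_02_01"

# ratio() is 2*M/T, where M is the number of matched characters and
# T is the combined length of both strings: here 2*18 / (18 + 21).
ratio = SequenceMatcher(None, parent, child).ratio()
print(round(ratio, 3))  # 0.923

# The child MO contains the parent MO as a substring, but not the reverse.
print(parent in child, child in parent)  # True False
```

So a pattern taken from the parent MO matches both MOs, while a pattern taken from the child MO matches only the child rows.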
Here is the DataFrame:
import pandas as pd
from difflib import SequenceMatcher
df = pd.DataFrame({'MO': ['510-20200701001', '510-20200701001', '510-20200701001', '510-20200701002', '510-20200701002', '510-20200701002', '510-20200701003', '510-20200701003', '510-20200701003', '510-20200701004', '510-20200701004', '510-20200701004', '510-20200701005', '510-20200701005', '510-20200701005', '510-20200701006', '510-20200701006', '510-20200701006', '510-20200701006_02', '510-20200701006_02', '510-20200701006_02', '510-20200701006_02_01', '510-20200701006_02_01'],
'StopName': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'B', 'C'],
'AmountOfInput': [21000, 22112, 22476, 12000, 12609, 12775, 15000, 15595, 15844, 600, 775, 790, 1000, 1149, 1176, 6000, 6225, 6289, 32180, 24452, 24859, 7074, 7271],
'AmountOfOutput': [22400, 22057, 22330, 12800, 12586, 12685, 16000, 15587, 15718, 800, 775, 783, 1200, 1139, 1162, 6400, 6225, 6278, 32180, 24437, 24532, 6958, 7108],
'A.OUT-B.IN':['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
'A.OUT-C.IN': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
'Match': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, False]})
Here is the filtering code:
for index, row in df.iterrows():
    if index not in df.loc[df['Match'].isin([False]), "MO"].index:
        # Matched rows: A output of the same MO minus this row's input.
        if index in df.loc[(df.StopName.isin(["B"])) & (df["A.OUT-B.IN"].values == "")].index:
            df.loc[index, "A.OUT-B.IN"] = df.loc[index-1, "AmountOfOutput"] - df.loc[index, "AmountOfInput"]
        elif index in df.loc[(df.StopName.isin(["C"])) & (df["A.OUT-C.IN"].values == "")].index:
            df.loc[index, "A.OUT-C.IN"] = df.loc[index-2, "AmountOfOutput"] - df.loc[index, "AmountOfInput"]
    else:
        # Unmatched rows: compare this MO with the MO of the previous row.
        ratio = SequenceMatcher(None, df.loc[index-1, "MO"], df.loc[index, "MO"]).ratio()
        if ratio >= 0.9:
            if "B" in df[df.MO.str.contains(df.loc[index, "MO"])]["StopName"].values:
                df.loc[(df.StopName.isin(["B"])) & (df.MO.isin([df.loc[index-1, "MO"]])), "A.OUT-B.IN"] = df.loc[(df.StopName.isin(["A"])) & (df.MO.isin([df.loc[index-1, "MO"]])), "AmountOfOutput"].values - df.loc[(df.StopName.isin(["B"])) & (df.MO.str.contains(df.loc[index-1, "MO"])), "AmountOfInput"].sum()
                df.loc[(df.StopName.isin(["B"])) & (df.MO.isin([df.loc[index, "MO"]])), "A.OUT-B.IN"] = df.loc[(df.StopName.isin(["B"])) & (df.MO.isin([df.loc[index-1, "MO"]])), "A.OUT-B.IN"].values
            elif "C" in df[df.MO.str.contains(df.loc[index, "MO"])]["StopName"].values:
                df.loc[(df.StopName.isin(["C"])) & (df.MO.isin([df.loc[index-1, "MO"]])), "A.OUT-C.IN"] = df.loc[(df.StopName.isin(["A"])) & (df.MO.isin([df.loc[index-1, "MO"]])), "AmountOfOutput"].values - df.loc[(df.StopName.isin(["C"])) & (df.MO.str.contains(df.loc[index-1, "MO"])), "AmountOfInput"].sum()
                df.loc[(df.StopName.isin(["C"])) & (df.MO.isin([df.loc[index, "MO"]])), "A.OUT-C.IN"] = df.loc[(df.StopName.isin(["C"])) & (df.MO.isin([df.loc[index, "MO"]])), "A.OUT-C.IN"].values
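One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: on the very last row, df.loc[index-1, "MO"] is the MO of the sibling row (510-20200701006_02_01), not of the parent MO. That MO has no A row, so the AmountOfOutput selection is an empty array, and assigning zero values to a one-row .loc selection raises the ValueError. The snippet below reproduces this on a reduced copy of the last five rows, then shows one hypothetical workaround. The ParentMO column and the rule behind it (an unmatched row inherits the MO of the nearest preceding row with Match == True) are assumptions, not part of the original code, and may not fit the real data.

```python
import pandas as pd

# Reduced copy of the last five rows of the question's frame.
df = pd.DataFrame({
    "MO": ["510-20200701006_02"] * 3 + ["510-20200701006_02_01"] * 2,
    "StopName": ["A", "B", "C", "B", "C"],
    "AmountOfInput": [32180, 24452, 24859, 7074, 7271],
    "AmountOfOutput": [32180, 24437, 24532, 6958, 7108],
    "A.OUT-B.IN": pd.Series([""] * 5, dtype=object),
    "A.OUT-C.IN": pd.Series([""] * 5, dtype=object),
    "Match": [True, True, True, False, False],
})

# Part 1: what the loop evaluates on the last row (label 4 here).
prev_mo = df.loc[3, "MO"]  # index-1 is the sibling row, not the parent MO
a_out = df.loc[(df.StopName == "A") & (df.MO == prev_mo), "AmountOfOutput"].values
raised = False
try:
    # One selected row on the left, zero values on the right.
    df.loc[(df.StopName == "B") & (df.MO == prev_mo), "A.OUT-B.IN"] = a_out - 7074
except ValueError:
    raised = True
print(len(a_out), raised)

# Part 2: hypothetical workaround using an assumed helper column.
# Unmatched rows inherit the MO of the nearest preceding matched row.
df["ParentMO"] = df["MO"].where(df["Match"]).ffill()
a_out_by_mo = df.loc[df.StopName == "A"].set_index("MO")["AmountOfOutput"]
for stop, col in [("B", "A.OUT-B.IN"), ("C", "A.OUT-C.IN")]:
    is_stop = df.StopName == stop
    # Total input of this stop per parent MO, then A output minus that total.
    total_in = df.loc[is_stop].groupby("ParentMO")["AmountOfInput"].sum()
    diff = a_out_by_mo - total_in  # aligned on the MO value
    df.loc[is_stop, col] = df.loc[is_stop, "ParentMO"].map(diff)

print(df[["MO", "StopName", "A.OUT-B.IN", "A.OUT-C.IN"]])
```

On this reduced frame the B rows come out as 654 (32180 - 31526) and the C rows as 50 (32180 - 32130). Grouping by a parent key avoids relying on index-1 pointing at the right row, at the cost of assuming the rows are ordered so that each unmatched MO follows its parent.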