Python complex logic with pandas: dynamic programming?

My goal is to combine two columns into a third column, 'Priority' (step 1). Next, I count each instance of the combined values in the new 'Priority' column (step 2). I then filter out instances where the combined value (i.e. 'Priority') occurs only once (step 3). Next, I delete every row whose 'WO_Stat' column is Cancelled, but only if the count of the combined value from step 2 is greater than 1 (step 4).
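
For reference, steps 1 through 4 can be expressed without iterrows. Below is a minimal vectorized sketch, using the column names from the code further down; treat it as an untested outline rather than a drop-in replacement:

df['Priority'] = df['Cust_PO_Number'].astype(str) + df['Item_Number'].astype(str)  # step 1: combine the two columns
df['Mfg_Co'] = df.groupby('Priority')['Priority'].transform('count')  # step 2: count each combined value
df['Sub_Priority'] = df['Mfg_Co'] == 1  # step 3: flag values that occur exactly once
cancel_mask = (df['WO_Stat'] == 'Cancelled') & ~df['Sub_Priority']  # step 4: cancelled rows whose value count is > 1
df_deleted = df[cancel_mask]  # keep the removed rows, mirroring df_deleted below
df = df[~cancel_mask]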

I believe I have the earlier steps right; in my code comments I noted where I got lost. That part ran fine as of 9.24, but I'm not sure it makes sense, and I still need to work on what comes below.

It's the step below that I need the most help with.

Step 5: For values in 'Priority' whose count is greater than 1, delete the rows whose 'Order_Qty' is less than 16, but only if another 'Order_Qty' for the same 'Priority' value is greater than 99. Note that each Priority value may have up to 10 occurrences, so with order quantities of 10, 10, 9, 8, 2000, 2000, 3000, 300 you might only delete 4 of them.

If you can't help with the logic, even just helping speed the code up would be appreciated: it takes almost an hour to process 40k rows of data. Maybe I could bring in dynamic programming, or format the column dtypes better?
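
Step 5 can likewise be phrased with groupby/transform instead of a row loop, which should matter far more for runtime than dynamic programming here. A sketch, assuming 'Order_Qty' can be coerced to numeric and 'Priority' already holds the combined value (untested against the real file):

df['Order_Qty'] = pd.to_numeric(df['Order_Qty'], errors='coerce')
group_size = df.groupby('Priority')['Order_Qty'].transform('size')  # number of rows sharing each Priority value
group_max = df.groupby('Priority')['Order_Qty'].transform('max')  # largest quantity within each group
drop_mask = (group_size > 1) & (group_max > 99) & (df['Order_Qty'] < 16)
df_deleted = pd.concat([df_deleted, df[drop_mask]])  # assumes df_deleted already exists, as in the code below
df = df[~drop_mask]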

import pandas as pd
import numpy as np
from numpy import NaN
df = pd.read_excel("ors_final.xlsx", encoding = "ISO-8859-1", dtype=object) #used to read xls file named vlookuped but now changed to ors_final as of 2.20.19
df['Priority']= df['Priority'].astype('str')
df['Cust_PO_Number']= df['Cust_PO_Number'].astype('str')
df['Item_Number']= df['Item_Number'].astype('str')
df['Sub_Priority']= df['Sub_Priority'].astype('str')
# creating second df
df_deleted = df.copy(deep=True)
df_deleted.drop(df.index,inplace=True)
# threshold variable for small-quantity "first article" rows
LOWER_THRESHOLD = 16
#
print("1. combine po number and item number")
for i, row in df.iterrows(): # iterate through each row, with the row index and its contents
    a = str(row['Cust_PO_Number'])
    b = str(row['Item_Number'])

    concat = a + b

    df.set_value(i, 'Priority', concat)
#worked 9.23
print('2. Count all the duplicates of the combined values above')
seen = {}
for i, row in df.iterrows(): # now count the combined values; dict keys can't have duplicates
    c = row['Priority']

    if c not in seen: # haven't seen this value before, so initialize it
        seen[c] = 0

    seen[c] += 1 # seen this concatenated value once more; add one
for i, row in df.iterrows(): # write the recorded counts back: loop through each row and use its value of c as the dict key
    c = row['Priority']

    times_seen = seen[c]

    df.set_value(i, 'Mfg_Co', times_seen)
print("3. Ignore instances of rowes  where concat is not one")
for i, row in df.iterrows():
      d = row['Mfg_Co']
      if d == 1.0:
          df.set_value(i,'Sub_Priority',True)
      else:
          df.set_value(i,'Sub_Priority',False)

print('4. Delete all rows where orders are cancelled but the concatenated value count is more than 1')
delete_these = []
for i, row in df.iterrows():
      f = row['WO_Stat']
      d = row['Sub_Priority']

      if str(f) == 'Cancelled' and d != True:
          delete_these.append(i)
          df_deleted = df_deleted.append(row) # this wasn't appending to the dataframe yet; looking into it 9.23

df.drop(delete_these, axis=0, inplace=True)

#above this was working 9.24 but had not tested the data integrity , looked pretty good tho
over_numbers = {}
for i, row in df.iterrows(): # determine whether a group has a quantity over 99; still working out kinks 9.24
      c = row['Priority']
      g = row['Order_Qty']

      if float(g) > float(99):
          over_numbers[c] = True
# a little confused on the part below
print('step 5')
for i, row in df.iterrows(): # flag rows whose Priority group had a quantity over 99
    c = row['Priority']

    if c in over_numbers:
        df.set_value(i, 'Comments_Status',True)
    else:
        df.set_value(i,'Comments_Status',False)
#above, this was working fine 9.24 but not sure if it makes sense, also need to work on below
## 
delete_these = []

for i, row in df.iterrows(): # Remove all rows that have over_number = True and also number less than 16
    d = row['Sub_Priority'] # should this be changed?
    f = row['Comments_Status']

    if d <= LOWER_THRESHOLD and f is True: # grouping first articles
        delete_these.append(i) # store row number to drop later
        df_deleted = df_deleted.append(row) # Add the row to other dataframe

df.drop(delete_these, axis=0, inplace=True)

#step 5 was not working as of 10.2, it was breaking out the first article data wrong

writer = pd.ExcelWriter('1start.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()

writer = pd.ExcelWriter('deleted1start.xlsx', engine='xlsxwriter')
df_deleted.to_excel(writer, sheet_name='Sheet1')
writer.save()
Please see the sample dataframe above and my step-by-step goals:

Step 1: Combine Column_A and Column_B into 'Column_A_B'.
Step 2: Count each instance of a value in 'Column_A_B'.
Step 3: Filter out rows whose value occurs only once in 'Column_A_B'.
Step 4: Delete every row whose 'Status' is Cancelled, and only those rows; with the step 3 filter applied, there may be rows that share the same 'Column_A_B' value but have different Status values.
Step 5: With the 'Column_A_B' filter still on (count-of-1 rows filtered out), look at the remaining duplicate values, i.e. where a 'Column_A_B' value has a count of 2 or greater, and for each such group examine the 'Qty' column. If the group has a Qty below 16 and another above 99, delete only the rows whose 'Qty' is below 16. If all of the group's Qtys are below 99, delete nothing; likewise, if all of the 'Qty' values are above 99, delete nothing.

The result of that logic would be the frame below (a sketch implementing these rules follows it):

import pandas as pd

goaldf = pd.DataFrame({'Column_A': ['test1', 'test4', 'test6', 'test6', 'test7'],
                       'Column_B': ['WO1', 'WO6', 'WO6', 'WO6', 'WO7'],
                       'Column_A_B': ['test1WO1', 'test4WO6', 'test6WO6', 'test6WO6', 'test7WO7'],
                       'Status': ['Cancelled', 'Active', 'Open', 'Active', 'Active'],
                       'Qty': ['12', '3000', '14', '88', '1500']})
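
For completeness, here is a sketch that applies all five steps to a hypothetical input frame. The input rows are invented for illustration (only the goal frame was posted); the extra test4WO6 and test7WO7 rows exist solely so steps 4 and 5 have something to delete:

import pandas as pd

testdf = pd.DataFrame({'Column_A': ['test1', 'test4', 'test4', 'test6', 'test6', 'test7', 'test7'],
                       'Column_B': ['WO1', 'WO6', 'WO6', 'WO6', 'WO6', 'WO7', 'WO7'],
                       'Status': ['Cancelled', 'Active', 'Cancelled', 'Open', 'Active', 'Active', 'Active'],
                       'Qty': [12, 3000, 500, 14, 88, 1500, 10]})

# Step 1: combine the two columns
testdf['Column_A_B'] = testdf['Column_A'] + testdf['Column_B']
# Step 2: per-row count of each combined value
counts = testdf.groupby('Column_A_B')['Column_A_B'].transform('count')
# Steps 3-4: drop Cancelled rows only where the combined value is duplicated
testdf = testdf[~((counts > 1) & (testdf['Status'] == 'Cancelled'))]
# Step 5: within each remaining duplicated group, drop Qty < 16 only if the group also has a Qty > 99
qty = testdf.groupby('Column_A_B')['Qty']
testdf = testdf[~((qty.transform('size') > 1) & (qty.transform('max') > 99) & (testdf['Qty'] < 16))]
# testdf now matches goaldf above, except Qty is integer rather than string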

I second @PeterLeimbigler's comment, but I'd offer a few overall suggestions to help with your code. I recommend using iterrows only when absolutely necessary; I've found it much slower than standard pandas methods. Here are some changes I would make.

#To concat two columns into one as a string type 
df["NewCol"] = df["Col1"].astype(str) + df["Col2"].astype(str) # assigns the concated values to the new column instead of iterating over each row, much faster this way

# To assign a count column, giving each row the number of times its NewCol value appears in the whole dataframe
df['Counts'] = df.groupby(['NewCol'])['NewCol'].transform('count') # The count ignores nan values 

# If you just want duplicate counts based on both columns, keep your data as ints and do this
df['Counts'] = df.groupby(['col1', 'col2'])['coltocount'].transform('count')

# Alternate method to count values 
countcol1 = df['Col1'].value_counts() # returns a Series of counts
counts = countcol1.to_dict() #converts to dict
df['Count'] = df['Col1'].map(counts) 

# To get true false values based on a specific column's data 
df["Truethiness"] = (df["ColToCompare"] == 1.0)  # This can be multiple conditions if need be. 

# To conditionally drop rows from a pandas dataframe
df = df.drop(df[<some condition>].index)

# If you need to save the data from the conditional drop
df2 = df.drop(df[<Alternate condition of above>].index)
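
To make the last pattern concrete, here is one illustrative usage (the 'Counts' condition is made up for the example):

dropped = df[df['Counts'] == 1]  # save the rows you are about to remove
df = df.drop(df[df['Counts'] == 1].index)  # then remove them from the main frame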

Could you post some sample data, along with the expected output from that data? Without those it's hard to offer much help. Please also see @PeterLeimbigler's comments.

@PeterLeimbigler thank you very much for the input. I've edited my post and added below what I believe are the sample input and sample output data you were looking for, along with what your example produces. I didn't include code tied directly to the sample df, since I'm not sure my original code is correct, so I wrote the logic out instead. Please let me know if I can help further. @PeterLeimbigler I'm very sorry, and I understand if you're tired of hearing from me, but I've updated the test df and the goal df.

Sorry for not getting back to this! Thanks for the sample data and output. Overall, though, I don't see a clean, compact way to translate these particular rules into code. johnnyb's answer covers about all the improvements I could have come up with myself.

@PeterLeimbigler I understand, and I appreciate the time you've given me. It's really just step 5 I need help with: filtering out quantities under 16 when the same group also has quantities over 99. I have another post with code that runs directly on the sample data, so you can easily run it on your machine and try it at your leisure, but I understand if you don't have the time or it doesn't fully make sense. Thanks again for the feedback!