Python: reading a file to determine rules


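I have an Excel file containing user-defined business rules, as shown below:

Column_Name | Operator | Column_Value1 | Operand | RuleID | Result
ABC         | Equal    | 12            | and     | 1      | 1
CDE         | Equal    | 10            | and     | 1      | 1
XYZ         | Equal    | AD            |         | 1      | 1.5
ABC         | Equal    | 11            | and     | 2      | 1
CDE         | Equal    | 10            |         | 2      | 1.2

and so on. (The | symbols are included for formatting purposes only.)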

The input file (CSV) looks like this:

ABC,CDE,XYZ
12,10,AD
11,10,AD

The goal is to derive an output column named Result by looking up the user-defined business rules in the Excel file.

Expected output:

ABC,CDE,XYZ,Result
12,10,AD,1.5
11,10,AD,1.2
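
So far I have tried generating an if statement and assigning the whole if/elif statement to a function, so that I can pass it to the statement below to apply the rules:

output_df['result'] = input_df.apply(result_func, axis=1)

When I use a function with manually coded rules, it works, as shown below: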

def result_func(input_df):
    # input_df is a single row here, because apply is called with axis=1
    if input_df['ABC'] == 12:
        return '1.25'
    elif input_df['ABC'] == 11:
        return '0.25'
    else:
        return '1'

Is this the right way to handle this scenario? If so, how do I pass the whole dynamically generated if/elif to the function?

Code

import pandas as pd
import csv

# Load rules table
rules_table = []
with open('rules.csv') as csvfile:
  reader = csv.DictReader(csvfile, delimiter='|')
  for row in reader:
    rules_table.append([x.strip() for x in row.values()])

# Load CSV file into DataFrame
df = pd.read_csv('data.csv', sep=",")

def rules_eval(row, rules):
  " Steps through rules table for appropriate value "
  def operator_eval(op, col, value):
    if op == 'Equal':
      return str(row[col]) == str(value)
    else:
      # Currently only Equal is supported
      raise ValueError(f"Unsupported Operator Value {op}, only Equal allowed")

  prev_rule = '~'
  for col, op, val, operand, rule, res in rules:
    # loop through rows of rule table
    if prev_rule != rule:
      # rule ID changed so we can follow rule chains again
      ignore_rule = False

    if not ignore_rule:
      if operator_eval(op, col, val):
        if operand != 'and':
          return res
      else:
        # Rule didn't work for an item in group
        # ignore subsequent rules with this id
        ignore_rule = True

    prev_rule = rule

  return None

df['results'] = df.apply(lambda row: rules_eval(row, rules_table), axis=1)
print(df)

Output

   ABC  CDE XYZ results
0   12   10  AD     1.5
1   11   10  AD     1.2

Explanation

df.apply applies the rules_eval function to each row of the DataFrame.

The output is placed in the 'results' column via

df['results'] = ...
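
To make the row-wise call concrete, here is a minimal sketch. It assumes a small in-memory DataFrame in place of data.csv and a hypothetical label_row function; neither comes from the answer. With axis=1, the function receives each row as a Series, and the returned values become the new column.

```python
import pandas as pd

# Hypothetical two-row frame standing in for data.csv
df = pd.DataFrame({"ABC": [12, 11], "CDE": [10, 10], "XYZ": ["AD", "AD"]})

def label_row(row):
    # row is a pandas Series indexed by column name
    return f"{row['ABC']}-{row['XYZ']}"

# One call per row; the results are collected into a new column
df["results"] = df.apply(label_row, axis=1)
print(df["results"].tolist())
```

Any function that takes a row and returns a single value can be used in the same way, which is why a table-driven rules_eval can replace a hand-written if/elif chain.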
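
Handling rule priority

Changes

Added a Priority column to the rules table so that rules with the same RuleID are processed in priority order.

The priority order is determined by the order of the tuples added to the heap, currently

Priority, Column_Name, Operator, Column_Value, Operand, RuleID, Result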
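
As a rough illustration of why the tuple layout matters, the sketch below pushes hypothetical (priority, column) pairs onto a heap. heapq compares tuples element by element, so the first element decides the pop order. The pairs are made up for the example and are not the answer's rule tuples.

```python
from heapq import heappush, heappop

# Hypothetical (priority, column) pairs pushed in arbitrary order
queue = []
for priority, column in [(2, "ABC"), (1, "CDE"), (3, "XYZ")]:
    heappush(queue, (priority, column))

# Popping yields the lowest priority number first
order = [heappop(queue)[1] for _ in range(len(queue))]
print(order)
```

If two entries share the same priority, the comparison falls through to the next tuple element, so those elements must be comparable with each other.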
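
The code, data file, rules table, and output for this version are listed at the end of the page, after the comments.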


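Thanks for your reply, DarrylG. But my problem here is reading the Excel file to work out the rules and turn them into if statements, and after that, how to pass them to a function. For example, if you look at the Excel business rules in my first post, the dynamically derived if condition would look like this: if ('ABC' == 12 and 'CDE' == 10 and 'XYZ' == 'AD'): return '1.25' elif ('ABC' == 11 and 'CDE' == 10): return '1.2'. If I am able to generate this dynamically by reading the Excel file, how do I pass this statement to a function?

@pythoner -- Thanks for the explanation. Updated my answer with a parser, rules_eval, which steps through the rules table to determine the appropriate value.

Thank you very much, you're awesome. I have a couple of questions. 1. If my rules table is xlsx rather than csv, is there an Excel equivalent of csv.DictReader? 2. If I introduce a new field, sub-rule, to identify the order within a rule, is there a way to exit the rule based on the sub-rule instead of checking for "and"?

@pythoner -- 1. I'd suggest trying a reader for Excel workbooks (rather than CSV). 2. Sure, but I would need to know more about the desired input and behavior.

@pythoner Are you asking how to have a sub-rule column, e.g. sub-rules 3,1,2,2,1, alongside a RuleID column with values 1,1,1,2,2? For rule ID 1 we have the sub-rule sequence 3,1,2, and for rule 2 we have the sub-rule sequence 2,1. We want to apply the rules for ID 1 in sub-rule order 1, 2, 3. Is that what you mean?

You could take a look at the operator module, then create a mapping between the business rules and Python operators:

ops = {'Equal': operator.eq}

and then apply that function.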
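
The mapping idea can be sketched as follows. Only 'Equal' appears in the rules table; the 'NotEqual' and 'GreaterThan' entries and the check helper are hypothetical additions for illustration.

```python
import operator

# Map rule-table operator names to Python functions.
# Only 'Equal' comes from the rules table; the other two are hypothetical.
ops = {
    "Equal": operator.eq,
    "NotEqual": operator.ne,
    "GreaterThan": operator.gt,
}

def check(op_name, left, right):
    """Look up the operator by name and apply it to the two values."""
    try:
        return ops[op_name](left, right)
    except KeyError:
        raise ValueError(f"Unsupported operator {op_name}") from None

print(check("Equal", 12, 12))
print(check("GreaterThan", 11, 12))
```

Supporting a new operator then means adding one dictionary entry rather than another if/elif branch. Note that this sketch compares values as given, whereas the answer's code compares their string forms.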

Code (rule priority version)

import pandas as pd
import csv
from collections import namedtuple
from heapq import (heappush, heappop)

# Load CSV file into DataFrame
df = pd.read_csv('data.csv', sep=",")

class RulesEngine():
  ###########################################
  # Static members
  ###########################################
  # Named tuple for rules
  fieldnames = 'Column_Name|Operator|Column_Value1|Operand|RuleID|Priority|Result'
  Rule = namedtuple('Rule', fieldnames.replace('|', ' '))
  number_fields = fieldnames.count('|') + 1

  ###########################################
  # members
  ###########################################
  def __init__(self, table_file):
    # Load rules table
    rules_table = []
    with open(table_file) as csvfile:
      reader = csv.DictReader(csvfile, delimiter='|')
      for row in reader:
        fields = [self.convert(x.strip()) for x in row.values() if x is not None]
        if len(fields) != self.number_fields:
          # Incorrect number of values
          error = f"Rules require {self.number_fields} fields per row, was given {len(fields)}"
          raise ValueError(error)

        # Reuse the validated, converted fields instead of converting twice
        rules_table.append(fields)
    self.rules_table = rules_table

  def convert(self, s):
    " Convert string to (int, float, or leave current value) "
    try:
      return int(s)
    except ValueError:
      try:
        return float(s)
      except ValueError:
        return s

  def operator_eval(self, row, rule):
    " Determines value for a rule "
    if rule.Operator == 'Equal':
      return str(row[rule.Column_Name]) == str(rule.Column_Value1)
    else:
      # Currently only Equal is supported
      error = f"Unsupported Operator {rule.Operator}, only Equal allowed"
      raise ValueError(error)

  def get_rule_value(self, row, rule_queue):
    " Value of a rule or None if no matching rule "
    found_match = True
    while rule_queue:
      priority, rule_to_process = heappop(rule_queue)

      if not self.operator_eval(row, rule_to_process):
        found_match = False
        break

    return rule_to_process.Result if found_match else None

  def rules_eval(self, row):
    " Steps through rules table for appropriate value "
    rule_queue = []
    for r in self.rules_table:
      # Create named tuple with current rule values
      current_rule = self.Rule(*r)

      if rule_queue and rule_queue[0][1].RuleID != current_rule.RuleID:
        # RuleID changed, so the queued group is complete: evaluate it
        rule_value = self.get_rule_value(row, rule_queue)
        if rule_value is not None:
          return rule_value
        # Group did not match; start over with the new rule group
        rule_queue = []

      # heap orders rules by priority
      #   (lowest numbers are processed first)
      heappush(rule_queue, (current_rule.Priority, current_rule))

    # Process final group if not empty
    return self.get_rule_value(row, rule_queue) if rule_queue else None

# Init rules engine with rules from CSV file
rules_engine = RulesEngine('rules.csv')

df['results'] = df.apply(rules_engine.rules_eval, axis=1)
print(df)

Data file (data.csv)

ABC,CDE,XYZ
12,10,AD
11,10,AD
12,12,AA

Rules table (rules.csv)

Column_Name|Operator|Column_Value1|Operand|RuleID|Priority|Result
ABC        |   Equal|           12|    and|     1|        2|1
CDE        |   Equal|           10|    and|     1|        1|1
XYZ        |   Equal|           AD|    and|     1|        3|1.5
ABC        |   Equal|           11|    and|     2|        1|1
CDE        |   Equal|           10|    foo|     2|        2|1.2
ABC        |   Equal|           12|    foo|     3|        1|1.8

Output

   ABC  CDE XYZ results
0   12   10  AD     1.5
1   11   10  AD     1.2
2   12   12  AA     1.8