Python: Converting an imperative algorithm that "grows" a table into pure functions


python functional-programming


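My program is written in Python 3. In many places it starts with a (very large) table-like numeric data structure and adds columns to it following some algorithm. (The algorithm is different in each place.)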

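I'm trying to convert this to a purely functional approach, because I've run into problems with the imperative one: it is hard to reuse, hard to memoize interim steps, hard to achieve "lazy" computation, bug-prone because of its reliance on state, and so on.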

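The Table class is implemented as a dictionary of dictionaries: the outer dictionary contains the rows, indexed by row_id; the inner one contains the values within a row, indexed by column_title. The table's methods are very simple: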

class Table:
    # return the value at the specified row_id, column_title
    def get_value(self, row_id, column_title): ...

    # return the inner dictionary representing the row given by row_id
    def get_row(self, row_id): ...

    # add a column new_column_title, defined by func
    # func signature must be: take a row and return a value
    def add_column(self, new_column_title, func): ...
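For reference, here is a minimal sketch of that imperative starting point. The method bodies and the constructor (taking a dict of row dicts) are assumptions, since only the signatures are given above. Note how add_column mutates every row in place: that is exactly what makes intermediate states hard to reuse or memoize.

```python
class Table:
    """Imperative table: a dict of dicts, mutated in place (hypothetical body)."""

    def __init__(self, rows):
        # outer dict: row_id -> inner dict (column_title -> value)
        self._rows = {row_id: dict(row) for row_id, row in rows.items()}

    def get_value(self, row_id, column_title):
        return self._rows[row_id][column_title]

    def get_row(self, row_id):
        return self._rows[row_id]

    def add_column(self, new_column_title, func):
        # func takes a row (the inner dict) and returns the new value.
        # Every row is modified in place, so the old table state is lost.
        for row in self._rows.values():
            row[new_column_title] = func(row)


t = Table({'r1': {'price': 10, 'qty': 3}, 'r2': {'price': 4, 'qty': 5}})
t.add_column('total', lambda row: row['price'] * row['qty'])
print(t.get_value('r2', 'total'))  # 20
```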

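So far I have simply added columns to the original table, and every function took the whole table as an argument. Moving to pure functions, I have to make all arguments immutable, so the initial table becomes immutable. Any additional columns will be created as standalone columns and passed only to the functions that need them. A typical function would take the initial table and a few already-created columns, and return a new column.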

The problem I'm running into is how to implement the standalone column (Column).

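I could make each of them a dictionary, but that seems expensive. If I need to perform an operation on, say, 10 fields in each logical row, I will need 10 dictionary lookups. On top of that, each column will hold both the keys and the values, doubling its size.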
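A minimal sketch of the dictionary-per-column option, assuming types.MappingProxyType is used to make each column read-only. The helpers make_column and margin, and the price column, are hypothetical. Each column carries its own copy of every row_id key, and a function reading k columns pays k dictionary lookups per row.

```python
from types import MappingProxyType


def make_column(table_rows, func):
    """Build a standalone, read-only column keyed by row_id (hypothetical helper)."""
    return MappingProxyType({row_id: func(row) for row_id, row in table_rows.items()})


def margin(table_rows, cost):
    """Pure function: initial table + one derived column -> new column."""
    return MappingProxyType({
        # one lookup in the row dict and one in the cost column, per row
        row_id: row['price'] - cost[row_id]
        for row_id, row in table_rows.items()
    })


rows = MappingProxyType({
    'r1': MappingProxyType({'price': 10, 'qty': 3}),
    'r2': MappingProxyType({'price': 4, 'qty': 5}),
})
cost = make_column(rows, lambda row: row['price'] * 0.5)
m = margin(rows, cost)
print(dict(m))  # {'r1': 5.0, 'r2': 2.0}
```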

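I could make Column a simple list and store in it a reference to a mapping from row_id to array index. The advantage is that this mapping can be shared among all columns that correspond to the same initial table, and once a position has been looked up, it works for all of those columns. But does this create other problems?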
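A minimal sketch of the list-based idea, assuming a frozen dataclass for the column and a read-only MappingProxyType for the shared index (both are assumptions, not the asker's code). The position is looked up once and reused for every column, and derived columns can be computed positionally with no lookups at all.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Column:
    index: Mapping[str, int]  # row_id -> position, shared by sibling columns
    values: tuple             # positional storage, no per-column keys

    def __getitem__(self, row_id):
        return self.values[self.index[row_id]]


row_ids = ('r1', 'r2', 'r3')
index = MappingProxyType({rid: pos for pos, rid in enumerate(row_ids)})
price = Column(index, (10, 4, 7))
qty = Column(index, (3, 5, 2))

# One index lookup serves every column that shares the mapping.
pos = index['r2']
print(price.values[pos] * qty.values[pos])  # 20

# A new column built positionally, without any dictionary lookups.
total = Column(index, tuple(p * q for p, q in zip(price.values, qty.values)))
print(total['r3'])  # 14
```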

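If I do this, can I go one step further and store the mapping in the initial table itself? And can I put references in the Column objects back to the initial table that created them? This seems very different from how I imagined a functional approach would work, but I can't see what problems it would cause, since everything is immutable.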
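A sketch of that variant, under the same assumptions (frozen dataclasses; the names InitialTable, Column and derive are hypothetical). The table builds its row_id-to-position mapping once, and every Column keeps a back-reference to its source table. Because both objects are immutable, the back-reference cannot go stale; the main cost is that any surviving Column keeps the whole initial table alive.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True, eq=False)
class InitialTable:
    row_ids: tuple
    data: Mapping[str, tuple]                     # column_title -> values
    index: Mapping[str, int] = field(init=False)  # row_id -> position

    def __post_init__(self):
        # Build the mapping once and keep it in the table itself;
        # frozen=True blocks normal assignment, hence object.__setattr__.
        object.__setattr__(self, 'index', MappingProxyType(
            {rid: pos for pos, rid in enumerate(self.row_ids)}))

    def column(self, title):
        return Column(self, self.data[title])


@dataclass(frozen=True, eq=False)
class Column:
    source: InitialTable  # back-reference to the table it was derived from
    values: tuple

    def __getitem__(self, row_id):
        return self.values[self.source.index[row_id]]

    def derive(self, func, *others):
        """Return a new Column computed element-wise from self and others."""
        if any(o.source is not self.source for o in others):
            raise ValueError('columns come from different tables')
        rows = zip(self.values, *(o.values for o in others))
        return Column(self.source, tuple(func(*vals) for vals in rows))


t = InitialTable(('r1', 'r2', 'r3'),
                 MappingProxyType({'price': (10, 4, 7), 'qty': (3, 5, 2)}))
total = t.column('price').derive(lambda p, q: p * q, t.column('qty'))
print(total['r2'])        # 20
print(total.source is t)  # True
```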

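In general, does the functional approach frown on keeping, in a return value, a reference to one of the arguments? It doesn't seem to break anything (such as optimization or lazy evaluation), since that argument was already known. But maybe I'm missing something.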

Here is how I would do it:

Derive your table class from frozenset.

Each row should be a subclass of tuple.
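A quick illustration of what those two choices buy you, assuming collections.namedtuple for the rows (a namedtuple is a tuple subclass). Neither the table nor a row can be changed in place; "changing" the table means building a new one, and the old one is untouched.

```python
from collections import namedtuple

Row = namedtuple('Row', ['c1', 'c2', 'c3'])  # a tuple subclass

table = frozenset([Row(1, 2, 3), Row(4, 5, 6)])

try:
    table.add(Row(7, 8, 9))      # frozenset has no add()
except AttributeError:
    print('cannot add rows')

try:
    next(iter(table)).c1 = 0     # tuple fields cannot be reassigned
except AttributeError:
    print('cannot modify a row')

# A "modified" table is a new frozenset; the original keeps its 2 rows.
table2 = table | {Row(7, 8, 9)}
print(len(table), len(table2))   # 2 3
```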

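Now you can't modify the table -> immutability, great! The next step is to think of each function as a mutation that you apply to the table to produce a new one: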

f T -> T'

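This should be read as: apply the function f to the table T to produce a new table T'. You can also try to objectify the actual processing of the table data and treat it as an Action that you apply, or add, to the table.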

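The great thing here is that an addition can just as well be a subtraction, which gives you an easy way to model undo. Once you get into this mindset, your code becomes very easy to reason about, because there is no state that can mess things up.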
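One possible reading of "actions as data", as a sketch: here a table is assumed to be a (names, rows) pair of tuples (not the frozenset class used in the full example) so that row order survives. Each action is a plain tuple, the actions are folded over the table with reduce, and undo is the reversed list of inverse actions. The names apply_action and inverse are hypothetical; only 'add' is inverted, because reversing a drop would need the dropped values.

```python
from functools import reduce


def add_column(table, name, func):
    names, rows = table
    return names + (name,), tuple(row + (func(row),) for row in rows)


def drop_column(table, name):
    names, rows = table
    i = names.index(name)
    return names[:i] + names[i + 1:], tuple(row[:i] + row[i + 1:] for row in rows)


def apply_action(table, action):
    kind, *args = action
    return add_column(table, *args) if kind == 'add' else drop_column(table, *args)


def inverse(action):
    # The inverse of adding a column is dropping it.
    kind, name, *_ = action
    if kind != 'add':
        raise ValueError('only add actions can be inverted here')
    return ('drop', name)


T0 = (('a', 'b'), ((1, 2), (3, 4)))
actions = [('add', 'sum', sum), ('add', 'prod', lambda r: r[0] * r[1])]
T2 = reduce(apply_action, actions, T0)
print(T2)  # (('a', 'b', 'sum', 'prod'), ((1, 2, 3, 2), (3, 4, 7, 12)))

T_undo = reduce(apply_action, [inverse(a) for a in reversed(actions)], T2)
print(T_undo == T0)  # True
```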

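Below is an example of how to implement and process table structures in a purely functional way in Python. That said, Python is not the best language to learn FP in, because it makes imperative programming too easy. I think Haskell, F# or Erlang are better choices.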

from functools import reduce
from random import randint


class Table(frozenset):
    """An immutable set of row tuples, plus a tuple of column names.

    Being a set, it has no row order and collapses duplicate rows.
    """

    def __new__(cls, names, rows):
        self = frozenset.__new__(cls, rows)
        # frozenset subclasses get an instance __dict__, so the column
        # names can be attached right after construction.
        self.names = tuple(names)
        return self


def add_column(rows, func):
    """Return new row tuples, each extended by func(row, idx)."""
    return [row + (func(row, idx),) for idx, row in enumerate(rows)]


def table_process(t, action):
    """Return a new Table with one computed column appended.

    action is a (name, func) pair; func maps a row tuple to the new value.
    Each func also sees the columns added by earlier actions.
    """
    name, func = action
    return Table(
        t.names + (name,),
        add_column(t, lambda row, idx: func(row))
        )


def table_filter(t, condition):
    """Return a new Table keeping rows whose value in column name satisfies func."""
    name, func = condition
    idx = t.names.index(name)
    return Table(
        t.names,
        [row for row in t if func(row[idx])]
        )


def table_rank(t, name):
    """Return a new Table with a 'rank' column (0 = smallest value of name)."""
    idx = t.names.index(name)
    rows = sorted(t, key=lambda row: row[idx])
    return Table(
        t.names + ('rank',),
        add_column(rows, lambda row, pos: pos)
        )


def table_print(t):
    format_row = lambda r: ' '.join('%15s' % c for c in r)
    print(format_row(t.names))
    print('\n'.join(format_row(row) for row in t))


if __name__ == '__main__':
    cols = ('c1', 'c2', 'c3')
    T = Table(
        cols,
        [tuple(randint(0, 9) for _ in cols) for _ in range(10)]
        )
    table_print(T)

    # Columns to add to the table, this is a perfect fit for a
    # reduce. I'd honestly use a boring for loop instead, but reduce
    # is a perfect example for how in FP data and code "becomes one."
    # In fact, this whole program could have been written as just one
    # big reduce.
    actions = [
        ('max', max),
        ('min', min),
        ('sum', sum),
        ('avg', lambda r: sum(r) / len(r))
        ]
    T = reduce(table_process, actions, T)
    table_print(T)

    # Ranking is different because it requires an ordering, which a
    # table does not have.
    T2 = table_rank(T, 'sum')
    table_print(T2)

    # Simple where filter: select * from T2 where c2 < 5.
    T3 = table_filter(T2, ('c2', lambda c: c < 5))
    table_print(T3)


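Have you considered using NumPy arrays? This sounds like exactly what NumPy was designed for: accessing specific columns or rows and passing them around as arguments is easy, and it is very fast, especially for purely numeric operations. A numpy.array would be very efficient here.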
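A small sketch of how that could look, assuming the initial table fits in one 2-D float array with a separate row_id-to-position dict (both are assumptions about the data). Marking the base array read-only with setflags gives the immutability the question asks for. Column slices are views rather than copies and inherit the read-only flag, and np.column_stack "grows" the table by returning a new array.

```python
import numpy as np

row_ids = ('r1', 'r2', 'r3')
index = {rid: pos for pos, rid in enumerate(row_ids)}  # row_id -> position

# Initial table: one row per row_id, columns (price, qty).
base = np.array([[10, 3], [4, 5], [7, 2]], dtype=float)
base.setflags(write=False)             # freeze the initial table

price, qty = base[:, 0], base[:, 1]    # column views: no copies, also read-only
total = price * qty                    # vectorised; returns a new array

# "Growing" the table yields a new array; base is left untouched.
grown = np.column_stack((base, total))

print(total[index['r2']])  # 20.0
print(grown.shape)         # (3, 3)

try:
    price[0] = 0.0
except ValueError:
    print('price is read-only')
```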
