Python: counting all possible combinations in a table

I have a table like the following:

   PotA  PotB  PotC  PotD  PotE
A   +     +     +     +     +
B   -     ?     +     +     ?
C   +     +     +     +     +
D   +     -     +     -     +
E   +     +     +     +     +

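From here, I have to count every possible combination of "+", "-" and "?" for (PotA and PotB), (PotA and PotC), and so on, then for (PotA, PotB and PotC), and finally for (PotA, PotB, PotC, PotD and PotE). In reality the row of "Pot" columns keeps going, but to keep things simple I only show it up to PotE here.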

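To do this, I first read the file as shown below, and then generate every possible pattern for a combination of two columns, so that each pattern can be counted later.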

def readDatafile():
    filename = "data.txt"
    # Use a context manager so the file is closed once reading is done.
    with open(filename, 'r') as infile:
        for line in infile:
            line = line.strip()
            print (line)      # just to check the data, need to carry on from here.

"""Generate all possible permutations for later count"""
def doPermutations(items, n):
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in doPermutations(items, n - 1):
                yield str(items[i]) + str(base)

def makeAllPossibleList():
    nlength      = 2          # This should go outside of the function and will be same as the number of Pots
    lpossibility = ['+', '-', '?']
    litems       = []

    for i in doPermutations(lpossibility, int(nlength)):
        litems.append(i)

    for x in litems:          # the list built above is named litems, not items
        print (x)             # This generates all the possible items for a combination of two
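
The hand-written generator above yields the same strings, in the same order, as itertools.product from the standard library. The following is a minimal sketch of that equivalence; the helper name all_patterns is an assumption made for illustration and is not part of the original code.

```python
import itertools

def all_patterns(symbols, n):
    """Return every length-n string over `symbols` (hypothetical helper).

    The first position varies slowest, as in the recursive generator.
    """
    return [''.join(p) for p in itertools.product(symbols, repeat=n)]

patterns = all_patterns('+-?', 2)
print(patterns)
```

For two columns this gives the nine patterns from "++" to "??"; for five columns it gives 3**5 = 243 patterns.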

So the final result should look like this:

Combination: Possibility Count
PotA, PotB: ++ 3
PotA, PotB: +- 1
PotA, PotB: +? 0
PotA, PotB: -+ 0
PotA, PotB: -- 0
PotA, PotB: -? 1
PotA, PotB: ?+ 0
PotA, PotB: ?- 0
PotA, PotB: ?? 0
PotA, PotC: ...
PotA, PotC: ...
.......
PotA, PotB, PotC, PotD, PotE: +++++ 3
PotA, PotB, PotC, PotD, PotE: ++++- 0
PotA, PotB, PotC, PotD, PotE: ++++? 0
.......

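Is there a good Pythonic way to get the logic right for this problem? Do I have to read the data with the headers as keys and the columns as list values?

I can't work out the right logic. Please give me some help.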

Assuming I understand what you're after, how about something like this:

import itertools
import collections

def read_table(filename):
    with open(filename) as fp:
        header = next(fp).split()
        rows = [line.split()[1:] for line in fp if line.strip()]
        columns = zip(*rows)
    data = dict(zip(header, columns))
    return data

table = read_table("data.txt")
pots = sorted(table)

alphabet = "+-?"
for num in range(2, len(table)+1):
    for group in itertools.combinations(pots, num):
        # Re-zip the chosen columns into one tuple of symbols per row.
        patterns = zip(*[table[p] for p in group])
        counts = collections.Counter(patterns)
        # List every possible pattern; a Counter returns 0 for unseen ones.
        for poss in itertools.product(alphabet, repeat=num):
            print(', '.join(group) + ':', ''.join(poss), counts[poss])

which produces:

PotA, PotB: ++ 3
PotA, PotB: +- 1
PotA, PotB: +? 0
PotA, PotB: -+ 0
PotA, PotB: -- 0
PotA, PotB: -? 1
PotA, PotB: ?+ 0
PotA, PotB: ?- 0
PotA, PotB: ?? 0
PotA, PotC: ++ 4
[...]
PotA, PotB, PotC, PotD, PotE: +++++ 3
PotA, PotB, PotC, PotD, PotE: ++++- 0
[...]

Note that I'm assuming your desired output is in error, because in this line:

PotA, PotB, PotC, PotD, PotE: ++++++ 2

there are five columns on the left and six symbols on the right.
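
To see how the zip and Counter steps fit together, here is a minimal sketch that uses only the PotA and PotB columns. The table literal is typed in by hand from the question's table rather than read from data.txt, which is an assumption made to keep the example self-contained.

```python
import collections

# Columns copied by hand from the question's table (rows A to E).
table = {
    "PotA": ("+", "-", "+", "+", "+"),
    "PotB": ("+", "?", "+", "-", "+"),
}

# zip over the selected columns yields one tuple of symbols per row.
row_patterns = list(zip(table["PotA"], table["PotB"]))
counts = collections.Counter(row_patterns)

print(counts[("+", "+")])  # rows A, C and E share this pattern
print(counts[("?", "?")])  # a Counter reports 0 for patterns it never saw
```

Because a Counter returns 0 for missing keys, the zero-count lines in the output need no special handling.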

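Take a look at [...], especially [...].

OK, I'm trying to learn it.

Or [...] might help with the counting.

It's an absolute shame that this has only 1 vote, and even that one is from me... :(

Yes, that was a mistake, so I fixed it. Thanks also to @DSM for the answer. I'll try it.

This answer is great! But as the permutations proceed, memory can't cope with input data that has more than 5 columns and the alphabet "+-?" (even without the "?"). Is there any additional code (such as del) that would reduce memory usage? Or a way to do the permutations only up to a given number of columns?

@Karyo If memory is the problem, can you post a larger data set to test with? What error are you getting?

@PeterGibson My actual data has 23 columns and 39 rows. With my 8 GB of RAM, the program ran for almost a whole day and got as far as the 6-column calculation, at which point memory could no longer handle the data. To reproduce the problem, just generate random columns of +s and -s. The code runs very well for a small number of columns, but not for a large number.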
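
One possible way to limit the work described in the comments above, offered only as a sketch and not as something proposed in the thread: yield results lazily, report only the patterns that actually occur, and cap the group size. The answer's loop prints 3**num lines per group, most of them with a count of 0, whereas a group can never have more distinct observed patterns than the table has rows. The function name observed_counts and its max_size parameter are hypothetical, and the small table is typed in from the question.

```python
import collections
import itertools

def observed_counts(table, max_size):
    """Yield (group, pattern, count) for patterns that actually occur.

    `table` maps a column name to its sequence of symbols, the same
    layout the answer's read_table returns. `max_size` caps the number
    of columns per group. Both names are illustrative assumptions.
    """
    pots = sorted(table)
    for num in range(2, min(max_size, len(pots)) + 1):
        for group in itertools.combinations(pots, num):
            # One tuple of symbols per row for the chosen columns.
            counts = collections.Counter(zip(*(table[p] for p in group)))
            for pattern, count in sorted(counts.items()):
                yield group, ''.join(pattern), count

# First three columns of the question's table, one string per column.
table = {
    "PotA": "+-+++",
    "PotB": "+?+-+",
    "PotC": "+++++",
}
for group, pattern, count in observed_counts(table, max_size=2):
    print(', '.join(group) + ':', pattern, count)
```

Any pattern that is not reported has a count of 0. Note that the number of column groups still grows quickly with the number of columns, so the cap on group size matters for wide tables.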