Counting all possible combinations in a table with Python
Tags: python, matrix, combinations, permutation
I have a table like the one below:
  PotA PotB PotC PotD PotE
A +    +    +    +    +
B -    ?    +    +    ?
C +    +    +    +    +
D +    -    +    -    +
E +    +    +    +    +
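From this I have to count every possible combination of "+", "-" and "?" for (PotA and PotB), (PotA and PotC), and so on, then for (PotA, PotB and PotC), and finally for (PotA, PotB, PotC, PotD and PotE). In reality the "Pot" row keeps going, but to keep things simple I only show it up to PotE here.
To do this, I first read the file as shown below, and then generate every possible pattern for a combination of two columns so that each one can be counted.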
def readDatafile():
    filename = "data.txt"
    # A context manager closes the file once reading is done.
    with open(filename, 'r') as infile:
        for line in infile:
            line = line.strip()
            print(line)  # just to check the data, need to carry on from here.


def doPermutations(items, n):
    """Generate all possible permutations for later count"""
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in doPermutations(items, n - 1):
                yield str(items[i]) + str(base)


def makeAllPossibleList():
    nlength = 2  # This should go outside of the function and will be the same as the number of Pots
    lpossibility = ['+', '-', '?']
    litems = []
    for i in doPermutations(lpossibility, int(nlength)):
        litems.append(i)
    for x in litems:
        print(x)  # This generates all the possible items for a combination of two
So the final result should look like this:
Combination: Possibility Count
PotA, PotB: ++ 3
PotA, PotB: +- 1
PotA, PotB: +? 0
PotA, PotB: -+ 0
PotA, PotB: -- 0
PotA, PotB: -? 1
PotA, PotB: ?+ 0
PotA, PotB: ?- 0
PotA, PotB: ?? 0
PotA, PotC: ...
PotA, PotC: ...
.......
PotA, PotB, PotC, PotD, PotE: +++++ 3
PotA, PotB, PotC, PotD, PotE: ++++- 0
PotA, PotB, PotC, PotD, PotE: ++++? 0
.......
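Is there a good Pythonic way to get the logic right for this problem?
Do I have to read the data with the headers as keys and the columns as list values?
I can't work out the right logic. Please give me some help.

Assuming I understand what you're after, how about something like this: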
import itertools
import collections


def read_table(filename):
    with open(filename) as fp:
        # The header row has no label above the row names, so it splits into the Pot names only.
        header = next(fp).split()
        # Drop the row label (A, B, ...) from each line and skip blank lines.
        rows = [line.split()[1:] for line in fp if line.strip()]
        columns = zip(*rows)
        data = dict(zip(header, columns))
    return data


table = read_table("data.txt")
pots = sorted(table)

alphabet = "+-?"
for num in range(2, len(table)+1):
    for group in itertools.combinations(pots, num):
        # Each pattern is one table row restricted to the columns in this group.
        patterns = zip(*[table[p] for p in group])
        counts = collections.Counter(patterns)
        for poss in itertools.product(alphabet, repeat=num):
            # A Counter returns 0 for patterns that never occur.
            print(', '.join(group) + ':', ''.join(poss), counts[poss])
which produces:
PotA, PotB: ++ 3
PotA, PotB: +- 1
PotA, PotB: +? 0
PotA, PotB: -+ 0
PotA, PotB: -- 0
PotA, PotB: -? 1
PotA, PotB: ?+ 0
PotA, PotB: ?- 0
PotA, PotB: ?? 0
PotA, PotC: ++ 4
[...]
PotA, PotB, PotC, PotD, PotE: +++++ 3
PotA, PotB, PotC, PotD, PotE: ++++- 0
[...]
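For anyone who wants to check these counts without creating data.txt, here is a minimal sketch of the same idea with the sample table inlined. The names `SAMPLE` and `count_patterns` are made up for this illustration, and `read_table` is changed to take an open file object instead of a filename. It relies on two standard-library behaviours: `itertools.product` enumerates every pattern, and a `collections.Counter` returns 0 for a key it has never seen, which is where the zero rows come from.

```python
import collections
import io
import itertools

# The sample table from the question, inlined so the sketch runs on its own.
SAMPLE = """\
PotA PotB PotC PotD PotE
A + + + + +
B - ? + + ?
C + + + + +
D + - + - +
E + + + + +
"""


def read_table(fp):
    """Map each column header to the tuple of symbols in that column."""
    header = next(fp).split()
    rows = [line.split()[1:] for line in fp if line.strip()]
    return dict(zip(header, zip(*rows)))


def count_patterns(table, group):
    """Count how often each row pattern occurs across the columns in `group`."""
    return collections.Counter(zip(*(table[p] for p in group)))


table = read_table(io.StringIO(SAMPLE))
pair = count_patterns(table, ("PotA", "PotB"))
for poss in itertools.product("+-?", repeat=2):
    # Missing patterns are not stored in the Counter; lookup simply yields 0.
    print("PotA, PotB:", "".join(poss), pair[poss])
```

Wrapping the counting step in a function makes it easy to check one column group at a time before looping over all of them.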
Note that I assume your desired output was wrong, because in this line:
PotA, PotB, PotC, PotD, PotE: ++++++ 2
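there are five columns on the left and six symbols on the right.

Take a look at …, especially …

OK, I'm trying to learn it.

Or … may help with the counting.

It's an absolute shame that this has only 1 vote, and even that one is from me… :(

Yes, that was a mistake, so I fixed it. Thanks also to @DSM for the answer. I'll give it a try.

This answer is very nice! But as the permutations go on, memory can't handle input data with more than 5 columns and the alphabet pattern "+-?" (even without "?"). Is there any additional code (such as del) to reduce memory usage? Or a way to run the permutations only up to a given number of columns?

@Karyo If memory is a problem, could you post a larger data set to test with? What error are you getting?

@PeterGibson My actual data has 23 columns and 39 rows. With my 8 GB of RAM the program ran for almost a whole day, got as far as the calculation for 6 columns, and at that point memory could not handle the data. To reproduce the error, just use randomly chosen columns of +s and -s. The code runs fine for a small number of columns, but not for a large number.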
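One possible direction for the larger data set, offered as a sketch rather than a tested fix: listing every pattern for every column group produces the sum over k of C(n, k) * 3^k lines, which is 1008 lines for 5 columns but roughly 7 × 10^13 for 23 columns. With 39 rows, at most 39 distinct patterns can occur in any group, so nearly all of those lines are zeros. The code below assumes the zero-count lines can be dropped and that the group size may be capped. The names `output_lines`, `observed_counts` and `max_size` are hypothetical and not part of the answer above, and `math.comb` needs Python 3.8 or later.

```python
import collections
import itertools
from math import comb


def output_lines(n_cols, n_symbols=3, min_size=2):
    """Lines printed when every pattern of every column group is listed."""
    return sum(comb(n_cols, k) * n_symbols ** k
               for k in range(min_size, n_cols + 1))


def observed_counts(table, max_size):
    """Yield (group, pattern, count) only for patterns that occur in the data.

    `max_size` is a hypothetical cap on the number of columns per group.
    """
    pots = sorted(table)
    for num in range(2, min(max_size, len(pots)) + 1):
        for group in itertools.combinations(pots, num):
            counts = collections.Counter(zip(*(table[p] for p in group)))
            for pattern, count in sorted(counts.items()):
                yield group, "".join(pattern), count


# First three columns of the sample table, written out as a dict.
table = {
    "PotA": ("+", "-", "+", "+", "+"),
    "PotB": ("+", "?", "+", "-", "+"),
    "PotC": ("+", "+", "+", "+", "+"),
}
results = list(observed_counts(table, max_size=2))
for group, pattern, count in results:
    print(", ".join(group) + ":", pattern, count)
```

Because it is a generator, nothing accumulates between groups. The number of column groups still grows exponentially (roughly 8.4 million groups of two or more columns for 23 columns), so capping `max_size` may still be needed in practice.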