Optimizing an algorithm that finds co-occurrences between two sorted lists
Currently, my algorithm is estimated to take more than ten hours to finish. It is still running right now, only so that I can get a better estimate of how bad it really is.

Say I have a set of people P, each with a sorted list of occurrences of varying length, where i is an index variable. I want to create a graph G such that G(Pi, Pj) = n, where n is the edge weight between Pi and Pj and represents the number of times they co-occur within some static range r.

My current algorithm is mindless and is implemented in Python (to be both readable and unambiguous); it has been modified for brevity.

I have considered modifying this algorithm so that when oB passes the range of the current oA, the loop simply continues to the next oA.

Given that the lists are sorted, is there a better way to achieve this?

One option is to put B.occurances into a data structure that lets you quickly query all oB in the range (oA - radius, oA + radius).
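Since the lists are already sorted, one possible reading of this option is to use the sorted list itself as the query structure and find the range boundaries by binary search. The sketch below uses the standard-library `bisect` module, which is a substitute technique and not necessarily the structure the answer had in mind. The function and parameter names are hypothetical, and the bounds are treated as inclusive.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_values, lo, hi):
    """Count the values v in sorted_values with lo <= v <= hi."""
    # bisect_left gives the first index with value >= lo,
    # bisect_right gives the first index with value > hi.
    return bisect_right(sorted_values, hi) - bisect_left(sorted_values, lo)


def co_occurrences(a_occurrences, b_occurrences, radius):
    """Count pairs (oA, oB) with |oA - oB| <= radius.

    b_occurrences must be sorted in ascending order.
    """
    return sum(
        count_in_range(b_occurrences, oA - radius, oA + radius)
        for oA in a_occurrences
    )
```

Each query costs O(log m) for a list of m occurrences, so a pair of people costs roughly O(n log m) instead of O(n * m).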
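Another option is to index B.occurances into buckets, e.g. [0, 1], [1, 2], and so on. You can then quickly find all oB in the range (oA - radius, oA + radius) by selecting the buckets with indices (oA - radius) through (oA + radius). The buckets are approximate, so you still need to iterate over and verify every oB in the first and last selected buckets.

Your idea is to move on to the next oA once you pass the upper bound. In addition, if the ranges of A.occurances and B.occurances are large compared to radius, it will be more efficient not to start from the beginning of B.occurances every time: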
from itertools import combinations

# `people` and `network` (a networkx graph) are assumed to be defined earlier.
print('>Generating combinations...', end=' ')
pairs = combinations(people, 2)
print('Done')
print('Finding co-occurrences')
radius = 5
for A, B in pairs:
    i = 0
    b = B.occurances
    if not b:  # nothing to compare against
        continue
    maxi = len(b) - 1
    for oA in A.occurances:
        lo = oA - radius
        hi = oA + radius
        # >= (not >) so that duplicates equal to lo are not skipped
        while b[i] >= lo and i > 0:    # while we're at or above the low end of the range
            i -= 1                     # go towards the low end of the range
        while b[i] < lo and i < maxi:  # while we're below the low end of the range
            i += 1                     # go towards the low end of the range
        if b[i] >= lo:
            while b[i] <= hi:          # while we're below the high end of the range
                # increase the edge weight, creating the edge on first use
                if network.has_edge(A.common_name, B.common_name):
                    network[A.common_name][B.common_name]['weight'] += 1
                else:
                    network.add_edge(A.common_name, B.common_name, weight=1)
                if i < maxi:           # and go towards the high end of the range
                    i += 1
                else:
                    break
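The same idea can be separated from the graph bookkeeping. The following is a minimal sketch, assuming both lists are sorted in ascending order and radius is non-negative; the function name and parameters are hypothetical. It keeps two indices into b that only ever move forward, so each pair of lists is processed in time linear in their combined length.

```python
def count_co_occurrences(a, b, radius):
    """Count pairs (oA, oB) with |oA - oB| <= radius.

    Both a and b must be sorted ascending, and radius must be >= 0.
    """
    total = 0
    start = 0  # first index in b with b[start] >= oA - radius
    end = 0    # first index in b with b[end] > oA + radius
    for oA in a:
        # Both bounds grow with oA, so neither index ever moves backwards.
        while start < len(b) and b[start] < oA - radius:
            start += 1
        while end < len(b) and b[end] <= oA + radius:
            end += 1
        total += end - start  # everything in b[start:end] is within range
    return total
```

With a count per pair in hand, the edge weight could be set once per pair (for example, only when the count is non-zero) rather than incremented once per co-occurrence.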
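The bucket option mentioned in the answer can be sketched as follows. This is only an illustration under assumptions: occurrences are taken to be plain numbers, buckets have a fixed width (1 in the answer's example), and the helper names `build_buckets` and `in_range` are hypothetical. The bounds are treated as inclusive.

```python
from collections import defaultdict


def build_buckets(occurrences, width=1):
    """Map bucket index k to the occurrences in [k*width, (k+1)*width)."""
    buckets = defaultdict(list)
    for o in occurrences:
        buckets[o // width].append(o)
    return buckets


def in_range(buckets, lo, hi, width=1):
    """Return all bucketed values v with lo <= v <= hi."""
    found = []
    first, last = lo // width, hi // width
    for k in range(int(first), int(last) + 1):
        for v in buckets.get(k, ()):
            # Buckets are approximate: only the first and last selected
            # buckets can hold values outside [lo, hi], so verify those.
            if k in (first, last) and not (lo <= v <= hi):
                continue
            found.append(v)
    return found
```

The index is built once per person and reused for every pair that person takes part in; inner buckets are accepted wholesale, which is where the saving over a full scan comes from.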