Making object-oriented conditionals more specific - Python
Tags: python, oop, if-statement, conditional-statements
This is a continuation of my previous question, using the second answer. I am comparing nucleotide positions in DNA, which are given in the form [chromosome, start, end]. I have a list of existing positions and a list of new positions. The aim is to compare the new positions against the existing ones and to add a new position element if it is unique or if it overlaps an existing position. If it overlaps, this should be reported. A new position element should be discarded only when it is completely contained within an existing element.
Thanks to @Hugh Bothwell, I currently have the following code:
import sys
from bisect import bisect_left
from collections import defaultdict


class ChromoSegments:
    def __init__(self, ChromoSegments_args=None):
        # Creates an empty defaultdict of lists, filled in the style chromo -> [(start, end), ...]; each list stays ordered
        self.segments = defaultdict(list)
        # If a list is passed to the constructor, its values are added subject to the conditions in add_seg
        if ChromoSegments_args is not None:
            for chromo, start, end in ChromoSegments_args:
                try:
                    self.add_seg(chromo, start, end)
                except ValueError:
                    pass

    # Function for adding positions to the expos list
    def add_seg(self, chromo, start, end):
        seg = self.segments[chromo]
        val = (start, end)
        ndx = bisect_left(seg, val)
        if (ndx == 0 or seg[ndx - 1][1] < start):
            if (ndx == len(seg) or end < seg[ndx][0]):
                seg.insert(ndx, val)
            else:
                # collision with following element
                nstart, nend = seg[ndx]
                raise ValueError("Hit ({}, {}, {}) \t\t\t overlaps with ({}, {}, {})".format(chromo, start, end, chromo, nstart, nend))
        # collision with preceding element
        else:
            nstart, nend = seg[ndx - 1]
            raise ValueError("Hit ({}, {}, {}) \t\t\t overlaps with ({}, {}, {})".format(chromo, start, end, chromo, nstart, nend))

    def to_list(self):
        keys = sorted(self.segments.keys())
        return [(k, s, e) for k in keys for s, e in self.segments[k]]


def main():
    # expos_list and newpos_list are defined elsewhere (not shown in the question)
    expos = ChromoSegments(expos_list)
    newpos = (newpos_list)
    error_file = open("discarded_hits.txt", "w")
    for seg in newpos:
        try:
            expos.add_seg(*seg)
        except ValueError as e:
            collision = str(e)
            error_file.write(collision + "\n")
    error_file.close()
    # convert results back into text files of positions
    updated_expos = expos.to_list()
    updated_expos_file = open(sys.argv[2], "w")
    for element in updated_expos:
        c1 = str(element[0])
        c2 = str(element[1])
        c3 = str(element[2])
        updated_expos_file.write(c1 + "\t" + c2 + "\t" + c3 + "\n")
    updated_expos_file.close()


if __name__ == "__main__":
    main()
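So, after a successful run, I would like a final expos list like this (bearing in mind that the first number, currently 1 in all cases, could be any number):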
expos_list = [[1, 12, 25], [1, 20, 40], [1, 60, 80], [1, 75, 90], [1, 100, 150]]
I would try this:
START = 1
END = 2
expos_list = [[1, 20, 40], [1, 60, 80]]
newpos_list = [[1, 12, 25], [1, 22, 38], [1, 75, 90], [1, 100, 150]]
# sort the combined list by chromosome, then start, then end
# (lists compare element by element, so no comparison function is needed)
expos_list.extend(newpos_list)
expos_list.sort()
# remove and print the overlaps
i = 0
l = expos_list
while i < len(l) - 1:
    if l[i][END] > l[i + 1][START]:
        print('overlap', l[i], l[i + 1])
        if l[i][START] <= l[i + 1][START] and l[i][END] >= l[i + 1][END]:
            # i+1 is in i
            l.pop(i + 1)
        elif l[i][START] >= l[i + 1][START] and l[i][END] <= l[i + 1][END]:
            # i is in i+1
            l.pop(i)
        else:
            # there is a partial overlap
            i += 1
    else:
        i += 1
# overlap [1, 12, 25] [1, 20, 40]
# overlap [1, 20, 40] [1, 22, 38]
# [1, 22, 38] # removed
# overlap [1, 60, 80] [1, 75, 90]
# outcome
print(expos_list)
# [[1, 12, 25], [1, 20, 40], [1, 60, 80], [1, 75, 90], [1, 100, 150]]
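The scan above compares neighbouring entries by start and end only, so two segments on different chromosomes could be reported as overlapping. Below is a minimal sketch of the same idea wrapped in a function with a chromosome check added. The name merge_positions is hypothetical, and the min/max normalisation assumes that a segment whose end is lower than its start covers the same bases as its reversed form. Like the scan above, it treats both lists symmetrically, so an existing segment that lies inside a new one is dropped as well.

```python
def merge_positions(existing, new):
    """Return (merged, overlaps) for lists of [chromosome, start, end]."""
    # Assumption: a reversed segment (end < start) covers the same bases,
    # so start and end are normalised before sorting and comparing.
    merged = sorted([c, min(s, e), max(s, e)] for c, s, e in existing + new)
    overlaps = []
    i = 0
    while i < len(merged) - 1:
        a, b = merged[i], merged[i + 1]
        # Only segments on the same chromosome can overlap.
        if a[0] == b[0] and a[2] > b[1]:
            overlaps.append((a, b))
            if a[1] <= b[1] and a[2] >= b[2]:
                merged.pop(i + 1)   # b lies inside a: discard b
            elif a[1] >= b[1] and a[2] <= b[2]:
                merged.pop(i)       # a lies inside b: discard a
            else:
                i += 1              # partial overlap: keep both
        else:
            i += 1
    return merged, overlaps


merged, overlaps = merge_positions(
    [[1, 20, 40], [1, 60, 80]],
    [[1, 12, 25], [1, 22, 38], [1, 75, 90], [1, 100, 150]],
)
for a, b in overlaps:
    print('overlap', a, b)
print(merged)
```

Returning the overlaps instead of printing them inside the loop leaves it to the caller to decide whether to write them to a file such as discarded_hits.txt.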
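Looks like a one-dimensional geometric intersection problem. You can find solutions on the net.
@asmason It seems you had mixed tabs and spaces. I fixed the code formatting; please check that it is correct.
@alko - yes, when I pasted it into the question it added spaces that were not in my original code. Sorry if I did not edit it correctly! Also, Ber - I couldn't find anything really useful because of the specifics, especially as it uses a dictionary in this way.
Four questions: Does the huge code example help in understanding how to insert into the list? Can you shorten the question so that more people read it? What happens if I insert [2, 20, 40] into [[1,…],…,[1,…]]? Do the elements of expos_list always stay together in the result?
Hi @User - I know it is long, but because the functions are linked I did not want to be asked where variables come from, etc. The core is the add_seg function, but I wanted people to have context; I was criticised in my previous question for not being specific enough... Third: it is treated as unique because the first number (the chromosome) is different, so it would be added to the list. Fourth: stay together? I'm not sure what you mean, but expos_list always serves as the reference, and bisect_left keeps the list ordered so that it can be compared continuously.
Thanks for the answer, but I don't just want to join the lists together; I want to be selective about how the expos list grows. Using the example above: the chromosome region from base 20 to 40 is already in the list, so the region from base 22 to 38 is already covered and you don't want it duplicated. However, if there is an overlap (which provides new sequence), it needs to be recorded, for example the region 12 to 25, [1, 12, 25]. That is why I think I need an extra conditional statement.
Thanks! Much appreciated. I have adapted it to my input and it works great. Nice and simple! I added extra conditions for strand (the end may be a lower number than the start) and for chromosome.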
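For the "extra conditional statement" discussed in these comments, one possible shape is sketched below: a variant of add_seg that raises only when the new segment lies completely inside an existing one, and otherwise inserts it and returns the segments it overlaps. This is an illustration, not the code from the earlier answer. The class name OverlapTolerantSegments is hypothetical, and a linear scan replaces the bisect_left neighbour check because, once overlapping segments are allowed in the list, a containing segment is not guaranteed to be an immediate neighbour.

```python
from bisect import insort
from collections import defaultdict


class OverlapTolerantSegments:
    """Hypothetical variant of ChromoSegments that keeps partial overlaps."""

    def __init__(self):
        # chromosome -> sorted list of (start, end) tuples
        self.segments = defaultdict(list)

    def add_seg(self, chromo, start, end):
        """Insert the segment unless an existing one fully contains it.

        Returns the existing (start, end) segments it overlaps.
        Raises ValueError if the new segment is fully contained.
        """
        seg = self.segments[chromo]
        overlaps = []
        for s, e in seg:
            if s <= start and end <= e:
                raise ValueError(
                    "Hit ({}, {}, {}) is contained in ({}, {}, {})".format(
                        chromo, start, end, chromo, s, e))
            # Touching ends count as overlapping, as in the original add_seg.
            if s <= end and start <= e:
                overlaps.append((s, e))
        insort(seg, (start, end))  # keeps the list ordered
        return overlaps

    def to_list(self):
        return [(k, s, e) for k in sorted(self.segments)
                for s, e in self.segments[k]]


segs = OverlapTolerantSegments()
for pos in [(1, 20, 40), (1, 60, 80), (1, 12, 25), (1, 22, 38),
            (1, 75, 90), (1, 100, 150)]:
    try:
        for s, e in segs.add_seg(*pos):
            print("overlap", pos, (pos[0], s, e))
    except ValueError as err:
        print("discarded:", err)
print(segs.to_list())
```

In this sketch a new segment that completely covers an existing one is treated as an ordinary overlap and both are kept; whether the smaller existing segment should then be removed is a separate decision the sketch does not make.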
lambda l1, l2: l1[0] < l2[0] or (l1[0] == l2[0] and (l1[1] < l2[1] or (l1[1] == l2[1] and (l1[2] <= l2[2]))))
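The expression above orders two positions by chromosome, then start, then end. In Python 3, list.sort no longer accepts a comparison function, so the sketch below shows two ways to get the same ordering: wrapping an old-style comparison with functools.cmp_to_key, or relying on the element-by-element comparison that lists already provide. The helper compare_positions and the sample data are illustrative only.

```python
from functools import cmp_to_key


def compare_positions(l1, l2):
    """Old-style comparison: negative, zero or positive, like Python 2's cmp."""
    for a, b in zip(l1, l2):
        if a != b:
            return -1 if a < b else 1
    return 0


positions = [[1, 60, 80], [2, 5, 9], [1, 20, 40], [1, 20, 30]]

# Keep an explicit comparison function by wrapping it as a sort key.
by_cmp = sorted(positions, key=cmp_to_key(compare_positions))

# Lists compare element by element, so the default sort gives the same order.
by_default = sorted(positions)

print(by_cmp == by_default)
```

The default sort is the simpler choice here; cmp_to_key is mainly useful when the ordering cannot be expressed as a plain key.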