Python CS50 DNA works for the small .csv but not for the large one


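I have a problem with my CS50 pset6 DNA. When I use the small.csv file, it gets all the right values and gives the correct answer, but when I use the large file, it does not. I have been stepping through it with debug50 for over a week and still cannot figure it out. I assume the problem is somewhere in the loop that searches the sample for STRs, but I cannot see what it does wrong as it walks through the sample.

If you are not familiar with the CS50 DNA problem set: the code should read a DNA sequence (argv[2]) and compare it against a CSV file of people's DNA STR counts (argv[1]) to determine who it belongs to (if anyone).

Note: my code fails on this case, if that helps: python dna.py databases/large.csv sequences/5.txt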

from sys import argv
from csv import reader


#ensures correct number of arguments
if (len(argv) != 3):
    print("usage: python dna.py data sample")


#dict for storage
peps = {}
#storage for strands we look for.
types = []

#opens csv table
with open(argv[1],'r') as file:
    data = reader(file)
    line = 0
    number = 0
    for l in data:
        if line == 0:
            for col in l:
                if col[2].islower() and col != 'name':
                    break
                if col == 'name':
                    continue
                else:
                    types.append(col)
            line += 1
        else:
            row_mark = 0
            for col in l:
                if row_mark == 0:
                    peps[col] = []
                    row_mark += 1
                else:
                    peps[l[0]].append(col)



#convert sample to string
samples = ""

with open(argv[2], 'r') as sample:
    for c in sample:
        samples = samples + c




#DNA STR GROUPS
dna = { "AGATC" : 0,
        "AATG" : 0,
        "TATC" : 0,
        "TTTTTTCT" : 0,
        "TCTAG" : 0,
        "GATA" : 0,
        "GAAA" : 0,
        "TCTG" : 0 }

#go through all the strs in dna
for keys in dna:
    #the longest run of the sequence
    longest = 0
    #the current run of the sequence
    run = 0
    size = len(keys)
    #look through sample for longest
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            #ensure the code does not go outside len of samples
            if ((i + size) < len(samples)):
                i = i + size
            continue
        if run > longest:
            longest = run
            run = 0
        i += 1
    dna[keys] = longest

#see who it is
positive = True
person = ''
for key in peps:
    positive = True
    for entry in types:
        x = types.index(entry)
        test = dna.get(entry)
        can = int(peps.get(key)[x])
        if (test != can):
            positive = False
    if positive == True:
        person = key
        break
if person != '':
    print(person)
else:
    print("No match")


The problem is in this while loop. Look carefully at this code:

while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
        continue
    if run > longest:
        longest = run
        run = 0
    i += 1
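
To make the failure concrete, here is a minimal sketch that wraps the loop above in a function. The name longest_run_buggy and the test string are made up for illustration; the loop body is unchanged. Because run is only reset when a new record is set, a shorter run that follows a longer one is carried over into the next run, which is one plausible reason short sequences pass while long ones do not.

```python
def longest_run_buggy(samples, keys):
    """The loop from the question, wrapped in a (hypothetical) function."""
    longest = 0
    run = 0
    size = len(keys)
    i = 0
    while i < len(samples):
        hold = samples[i:(i + size)]
        if hold == keys:
            run += 1
            if (i + size) < len(samples):
                i = i + size
            continue
        if run > longest:
            longest = run
            run = 0  # reset happens only when a new record is set
        i += 1
    return longest


# Runs of 3, 2 and 2 copies of "AATG", separated by "C", ending in a newline
# (the question's file-reading code keeps the trailing newline too).
sample = "AATG" * 3 + "C" + "AATG" * 2 + "C" + "AATG" * 2 + "\n"
print(longest_run_buggy(sample, "AATG"))  # prints 4, but the true longest run is 3
```

The second run of 2 is never cleared, so the third run continues counting from 2 and ends at 4.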

earik87 is absolutely right! I would just add that the code is missing an = to handle all cases, especially when you have redundant sequences.

while i


while i < len(samples):
    hold = samples[i:(i + size)]
    if hold == keys:
        run += 1
        #ensure the code does not go outside len of samples
        if ((i + size) < len(samples)):
            i = i + size
        continue
    else: #the current position does not match, so the run (if any) has ended
        if run > longest:
            longest = run
            run = 0
        else: #the run is not longer than longest, so just reset it to zero
            run = 0
    i += 1
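
As an alternative, here is a sketch of the same idea written as a standalone function that counts the run starting at every position, so there is no run state to forget to reset. The name longest_run is hypothetical, not part of the original code. Since slicing past the end of a string just returns a shorter string, it also copes with a match that ends exactly at the end of the sample, with or without a trailing newline.

```python
def longest_run(sequence, key):
    """Return the largest number of back-to-back copies of key in sequence."""
    if not key:
        return 0  # an empty key would otherwise match forever
    longest = 0
    size = len(key)
    for start in range(len(sequence)):
        run = 0
        # Count consecutive copies of key beginning at this start position.
        while sequence[start + run * size:start + (run + 1) * size] == key:
            run += 1
        longest = max(longest, run)
    return longest


sample = "AATG" * 3 + "C" + "AATG" * 2 + "C" + "AATG" * 2 + "\n"
print(longest_run(sample, "AATG"))      # 3
print(longest_run("AATGAATG", "AATG"))  # 2, match ends at the end of the string
```

In the question's code this could replace the whole while loop, for example dna[keys] = longest_run(samples, keys). It checks every start position instead of jumping ahead, which is slower but simpler to reason about for inputs of this size.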