Why does Python give a "list index out of range" error?

python list python-2.7

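I have a list of files that I want to read through, printing out some information from each. It keeps giving me the error "list index out of range", and I'm not sure what went wrong. If I slice the list as matches[:10], it works for the first 10 files, but I need it to handle all of the files. I checked some older posts but still can't get my code to work. re.findall worked when I first wrote this code; I'm not sure whether it has stopped working. Thanks.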

import re, os
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1' # Topdir has to be an object rather than a string, which means that there are no parentheses.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.pdf')):
            matches.append(os.path.join(root, filename))

capturedorgs = []
capturedfiles = []
capturedabstracts = []
orgAwards={}
for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
    capturedorgs.append(matchOrg)

    # code to capture files
    matchFile=re.findall(r'File\s+\:\s+(\w\d{7})',mytext)[0]
    capturedfiles.append(matchFile)

    # code to capture abstracts
    matchAbs=re.findall(r'Abstract\s+\:\s+(\w.+)',mytext)[0]
    capturedabstracts.append(matchAbs)

    # total awarded money
    matchAmt=re.findall(r'Total\s+Amt\.\s+\:\s+\$(\d+)',mytext)[0]

    if matchOrg not in orgAwards:
        orgAwards[matchOrg]=[]
    orgAwards[matchOrg].append(int(matchAmt))

for each in capturedorgs:
    print(each,"\n")
for each in capturedfiles:
    print(each,"\n")
for each in capturedabstracts:
    print (each,"\n")

# add code to print what is in your other two lists
from collections import Counter
countOrg=Counter(capturedorgs)
print (countOrg)

for each in orgAwards:
    print(each,sum(orgAwards[each]))

Error message:

Traceback (most recent call last):
  File "C:\Python32\Assignment1.py", line 17, in <module>
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
IndexError: list index out of range

The problem is here:

matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]

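Evidently, at least one of your files doesn't contain this text at all, so re.findall returns an empty list. When you then index it with [0], there is no such item.

You need to handle that case.

One way is simply not to include anything when nothing is found: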

for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)
        if len(matchOrg) > 0:
            capturedorgs.append(matchOrg[0])

Also, if a file can contain more than one match and you want to capture all of them, you may want to use capturedorgs.extend(matchOrg) instead.
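To make the difference between append and extend concrete, here is a small sketch. The sample string is made up for illustration and is not one of the asker's files; the pattern is the one from the question.

```python
import re

# Hypothetical sample text containing two "NSF Org" entries.
sample = "NSF Org : DMS\nsome text\nNSF Org : CHE\n"
pattern = r'NSF\s+Org\s+\:\s+(\w+)'

first_only = []
all_orgs = []

found = re.findall(pattern, sample)
if found:
    first_only.append(found[0])   # keeps only the first match
all_orgs.extend(found)            # keeps every match; does nothing when found is empty

print(first_only)
print(all_orgs)
```

Note that extend needs no length check: extending with an empty list simply adds nothing.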

If findall finds no matches, it returns an empty list, []. Trying to take the first item from that empty list fails and raises an exception:

>>> import re
>>> i = 'hello'
>>> re.findall('abc', i)
[]
>>> re.findall('abc', i)[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

You have to do this for every re.findall statement.
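Since the same check is needed for each pattern, one option is to wrap it in a small helper. This is only a sketch: first_match is a hypothetical name, not something from the original code, and the sample text is invented.

```python
import re

def first_match(pattern, text):
    """Return the first re.findall result for pattern in text, or None if there is none."""
    found = re.findall(pattern, text)
    return found[0] if found else None

# Hypothetical file contents: has an org and an amount, but no "File :" line.
text = "NSF Org : DMS\nTotal Amt. : $12345\n"

org = first_match(r'NSF\s+Org\s+\:\s+(\w+)', text)
file_id = first_match(r'File\s+\:\s+(\w\d{7})', text)
amount = first_match(r'Total\s+Amt\.\s+\:\s+\$(\d+)', text)

print(org, file_id, amount)
```

The caller then decides what a missing field means, for example by skipping the append when the helper returns None.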

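for filepath in matches[]: ?

I tried a different approach and forgot to remove the []. I've updated my code.

Sorry, I copied your typo, matches[]. It's fixed now. If the code in the OP is what you're actually running, you still need to take the [0] off the end of the matchOrg= line. If it isn't what you're running, it would help if you posted the traceback.

OK, what I posted was only part of my code. When I tried to make changes along the lines you suggested, it broke other things. I've now posted the whole script without your suggestion applied; please tell me what to do. Thanks!

It's the same problem. Every time you write re.findall(matcher, text)[0], you need to remove that trailing [0], check the length, and only index and append (or extend) once you've verified that element 0 actually exists.

It runs for my first 10 and 100 files. How should I fix the code?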

try:
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
    capturedorgs.append(matchOrg)
except IndexError:
    print('No organization match for {}'.format(filepath))
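The same try/except idea can cover all of the fields for one file at once, so that a file missing any field is skipped as a whole. The sketch below assumes that skipping is the desired behaviour; parse_award, the documents dict, and its made-up paths are hypothetical stand-ins for reading real files.

```python
import re

def parse_award(text):
    """Extract org and amount; raises IndexError if either field is missing."""
    org = re.findall(r'NSF\s+Org\s+\:\s+(\w+)', text)[0]
    amount = re.findall(r'Total\s+Amt\.\s+\:\s+\$(\d+)', text)[0]
    return org, int(amount)

# Hypothetical stand-ins for file contents, keyed by a made-up path.
documents = {
    'a0000001.txt': "NSF Org : DMS\nTotal Amt. : $100\n",
    'notes.txt': "no award fields here\n",
    'a0000002.txt': "NSF Org : DMS\nTotal Amt. : $250\n",
}

org_awards = {}
skipped = []
for filepath, text in documents.items():
    try:
        org, amount = parse_award(text)
    except IndexError:
        skipped.append(filepath)   # remember which file had no match
        continue
    org_awards.setdefault(org, []).append(amount)

totals = {org: sum(amounts) for org, amounts in org_awards.items()}
print(totals)
print(skipped)
```

Recording the skipped paths makes it easy to inspect which files lack the expected fields.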