Python: list index out of range when parsing a website using .lower()

python python-2.7 nlp

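I am parsing a website in order to count the number of lines on which a keyword is mentioned. With the following code, everything works fine: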

import time
import urllib2
from urllib2 import urlopen
import datetime

website = 'http://www.dailyfinance.com/2014/11/13/market-wrap-seventh-dow-record-in-eight-days/#!slide=3077515'
topSplit = 'NEW YORK -- '
bottomSplit = "<div class=\"knot-gallery\""

# Count mentions on newlines
def main():
    try:
        x = 0
        sourceCode = urllib2.urlopen(website).read()
        sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]
        content = sourceSplit.split('\n') # provides an array
        
        for line in content:
            if 'gain' in line:
                x += 1
        
        print x
    
    except Exception,e:
        print 'Failed in the main loop'
        print str(e)

main()

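But once I lowercase the page source with .lower(), I get an error:

Failed in the main loop
list index out of range

Assuming .lower() is throwing off the index, why does this happen?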

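You are working with a lowercase-only string (that is what lower() does), but you are still trying to split it with topSplit = 'NEW YORK -- '. That marker no longer occurs in the lowercased text, so the split returns a list with a single item.

You then try to access that list at index 1, which will always fail: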

sourceCode.split(topSplit)[1]
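
A minimal sketch of the failure, assuming a short made-up string in place of the real page (the sample text is hypothetical; only topSplit comes from the question). After lower(), the mixed-case marker is gone, so split hands back the whole string as one item:

```python
# Hypothetical stand-in for the downloaded page source.
sample = 'header NEW YORK -- Stocks gain again <div class="knot-gallery"'
topSplit = 'NEW YORK -- '

original_parts = sample.split(topSplit)
lowered_parts = sample.lower().split(topSplit)

print(len(original_parts))  # marker found: two pieces
print(len(lowered_parts))   # marker absent after lower(): one piece

try:
    lowered_parts[1]
except IndexError as e:
    # This is the message the question's except block prints.
    error_message = str(e)
    print(error_message)
```

Checking the length of the split result before indexing is a cheap way to turn this into a clearer error message.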

To account for both cases, look at using regular expressions from the re module. Here is an example:

>>> import re
>>> string = "some STRING lol"
>>> re.split("string", string, flags=re.IGNORECASE)
['some ', ' lol']
>>> re.split("STRING", string, flags=re.IGNORECASE)
['some ', ' lol']
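
Applied to the question's markers, the same idea might look like the sketch below. The helper name extract_between and the sample text are assumptions made for illustration; re.escape is used so that characters in the markers are matched literally rather than as regex syntax.

```python
import re

def extract_between(text, top, bottom):
    """Return the text after `top` and before `bottom`, ignoring case.

    Returns None when the top marker is not found at all.
    """
    after_top = re.split(re.escape(top), text, maxsplit=1, flags=re.IGNORECASE)
    if len(after_top) < 2:
        return None
    # As in the question's code, a missing bottom marker keeps the remainder.
    return re.split(re.escape(bottom), after_top[1], maxsplit=1,
                    flags=re.IGNORECASE)[0]

# Hypothetical page source with the markers in a different case.
sample = ('header\nnew york -- Stocks gain\nno change\n'
          'Further Gains seen\n<DIV CLASS="knot-gallery" footer')

body = extract_between(sample, 'NEW YORK -- ', '<div class="knot-gallery"')
mentions = sum(1 for line in body.split('\n') if 'gain' in line.lower())
print(mentions)
```

Unlike lowercasing the whole page, this keeps the original casing of the extracted text, and only the keyword check is done on a lowercased copy of each line.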

Great answer. Following your suggestion, I used topSplit = 'NEW YORK -- '.lower() to get it running. I will also look into the re module; thanks for the pointer.
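
The fix described in this comment can be sketched as follows, assuming a made-up sample string instead of the live page. Lowercasing the source and both markers keeps them consistent, so the index lookup succeeds again:

```python
# Hypothetical stand-in for urllib2.urlopen(website).read().
sample = ('Intro\nNEW YORK -- Stocks Gain on Thursday\nBonds flat\n'
          'Tech gains too\n<div class="knot-gallery" rest')

# Lowercase the markers as well as the source so they still match.
topSplit = 'NEW YORK -- '.lower()
bottomSplit = '<div class="knot-gallery"'.lower()

sourceCode = sample.lower()
sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]

# Count the lines that mention the keyword.
count = sum(1 for line in sourceSplit.split('\n') if 'gain' in line)
print(count)
```

The trade-off is that the extracted text is all lowercase, which is fine for counting keywords but loses the original capitalization.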