Python: tuple problem when trying to count elements in a list?

python python-2.7 loops beautifulsoup tuples


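I am trying to count the number of contractions that politicians use in certain speeches. I have many speeches, but here are a few example URLs: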

every_link_test = ['http://www.millercenter.org/president/obama/speeches/speech-4427',
 'http://www.millercenter.org/president/obama/speeches/speech-4424',
 'http://www.millercenter.org/president/obama/speeches/speech-4453',
 'http://www.millercenter.org/president/obama/speeches/speech-4612',
 'http://www.millercenter.org/president/obama/speeches/speech-5502']

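Right now I have a very crude counter: it only keeps a running total of the contractions used across all links. For example, the code below returns 79, 101, 101, 182, 224 for the five links above. However, I want to attach filename, a variable I create below, so that I end up with something like (speech_1, 79), (speech_2, 22), (speech_3, 0), (speech_4, 81), (speech_5, 42). That way I can keep track of the number of contractions used in each individual speech. My code raises the following error:

AttributeError: 'tuple' object has no attribute 'split'

Here is my code: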

import urllib2,sys,os
from bs4 import BeautifulSoup,NavigableString
from string import punctuation as p
from multiprocessing import Pool
import re, nltk
import requests
reload(sys)

url = 'http://www.millercenter.org/president/speeches'
url2 = 'http://www.millercenter.org'

conn = urllib2.urlopen(url)
html = conn.read()

miller_center_soup = BeautifulSoup(html)
links = miller_center_soup.find_all('a')

linklist = [tag.get('href') for tag in links if tag.get('href') is not None]

# remove all items in list that don't contain 'speeches'
linkslist = [_ for _ in linklist if re.search('speeches',_)]
del linkslist[0:2]

# concatenate 'http://www.millercenter.org' with each speech's URL ending
every_link_dups = [url2 + end_link for end_link in linkslist]

# remove duplicates
seen = set()
every_link = [] # no duplicates array
for l in every_link_dups:
    if l not in seen:
        every_link.append(l)
        seen.add(l)

def processURL_short_2(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div',{'id':'transcript'},{'class':'displaytext'})
    item_str = item_div.text.lower()

    splitlink = l.split("/")
    president = splitlink[4]
    speech_num = splitlink[-1]
    filename = "{0}_{1}".format(president, speech_num)
    return item_str, filename

every_link_test = every_link[0:5]
print every_link_test
count = 0
for l in every_link_test:
    content_1 = processURL_short_2(l)
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1        
    print count, filename

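Rather than printing count, filename, you should save this data in a data structure such as a dictionary. Since processURL_short_2 has been modified to return a tuple, you need to unpack it: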

data = {} # initialize a dictionary
for l in every_link_test:
    content_1, filename = processURL_short_2(l) # unpack the content and filename
    count = 0 # reset the counter so each speech gets its own total
    for word in content_1.split():
        word = word.strip(p)
        if word in contractions:
            count = count + 1
    data[filename] = count # add this to the dictionary as filename:count

This gives you a dictionary along the lines of {'obama_4424': 79, 'obama_4453': 101, ...}, which lets you store and look up the parsed data easily.
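The unpack-and-store pattern can be tried without touching the network. The following is only a minimal sketch: FAKE_PAGES, CONTRACTIONS, process_url and count_contractions are hypothetical stand-ins invented for illustration (the real code fetches transcripts with urllib2 and uses the asker's own contractions collection), and the counter is reset for every speech so that each value is a per-speech total rather than a running one.

```python
from string import punctuation

# Hypothetical stand-ins: a tiny contraction set and canned "transcripts"
CONTRACTIONS = {"don't", "can't", "we're"}
FAKE_PAGES = {
    "http://example.org/president/obama/speeches/speech-1":
        "We're ready. Don't stop; we can't.",
    "http://example.org/president/obama/speeches/speech-2":
        "No short forms appear here.",
}


def process_url(link):
    """Return (lowercased text, filename), mimicking processURL_short_2."""
    text = FAKE_PAGES[link].lower()
    parts = link.split("/")
    filename = "{0}_{1}".format(parts[4], parts[-1])
    return text, filename  # two values, so the caller receives a tuple


def count_contractions(links):
    """Map each speech's filename to its own contraction count."""
    data = {}
    for link in links:
        text, filename = process_url(link)  # unpack the returned tuple
        count = 0  # reset per speech
        for word in text.split():
            if word.strip(punctuation) in CONTRACTIONS:
                count += 1
        data[filename] = count
    return data


counts = count_contractions(sorted(FAKE_PAGES))
```

Because the result is keyed by filename, a single speech can be looked up directly, for example counts["obama_speech-1"].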

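As the error message explains, you cannot use split the way you are using it here: split is a string method, and content_1 is a tuple.

So you need to change this: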

for word in content_1.split():

to this:

for word in content_1[0].split():

I picked [0] after running your code; I believe it gives you the block of text you want to search.
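To make the difference between indexing and unpacking concrete, here is a small sketch. The function fetch is a hypothetical stand-in for processURL_short_2 that returns fixed values instead of downloading a page. It shows that the returned object is a tuple, that calling split on it raises AttributeError, and that element 0 is a string which still has to be split: iterating over a string directly yields single characters, not words.

```python
def fetch(link):
    # Hypothetical stand-in for processURL_short_2: returning two values
    # separated by a comma produces a tuple
    return "we're here, aren't we", "obama_speech-1"


result = fetch("ignored")
kind = type(result).__name__  # name of the returned type

try:
    result.split()  # tuples have no split method
    error = None
except AttributeError as exc:
    error = str(exc)

# Iterating over the string itself walks it character by character
chars = [c for c in result[0]][:3]

# Option 1: index the tuple, then split the string into words
words_by_index = result[0].split()

# Option 2: unpack the tuple into two names, then split
text, filename = fetch("ignored")
words_by_unpack = text.split()
```

Both options produce the same word list; unpacking has the added benefit of giving the filename a name of its own.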

@TigerhawkT3 makes a good suggestion in their answer that you should follow as well.

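Well, Python doesn't lie: split is for strings. Either the indentation is off or there is a variable naming problem.

Note that the code around content_1.split() has a syntax error. content_1 is a tuple. I explain how to handle it in my answer.

@idjaw - Thanks. I think you mean AttributeError rather than SyntaxError, right? Either way, fixed. The OP said they were already getting results, so I assumed they had working code that they wanted to add a feature to, rather than a failed attempt at adding that feature.

Yes, AttributeError. Sorry... doing too many things at once :p

That is what I intended to do: save it in a data structure. I want it to look just like your dictionary. Thanks for your help - I got it. The contractions do exist in a list like this: {"you's": "you's/you's"}

Presumably, after indexing content_1, it should still use .split(). Or avoid the indexing and unpack the returned tuple into content, filename or some such.

@CarterMasterson Could you also check the indentation? It is way off.

With the data sample you just provided I no longer get a syntax error... but the code is still not aligned correctly. Can you confirm in your OP how it should be aligned?

I fixed it above. Which change to the code made the syntax error go away?

Everything runs on my side. The output doesn't match what you specified, but there are no more errors. Are you still having trouble?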
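Since the contractions are stored in a dictionary, one detail of the membership test is worth illustrating. The sketch below uses a hypothetical two-entry mapping (the asker's real data is not shown in full) to demonstrate that the in operator on a dictionary checks keys only, and that stripping string.punctuation from both ends of a word also removes a leading apostrophe, so a contraction such as 'tis would be missed by this cleanup.

```python
from string import punctuation

# Hypothetical mapping from contraction to expansion, for illustration only
contractions = {"don't": "do not", "'tis": "it is"}

key_hit = "don't" in contractions      # membership is tested against the keys
value_hit = "do not" in contractions   # values are not searched


def normalize(token):
    # Same cleanup as in the thread: lowercase, trim punctuation at both ends
    return token.lower().strip(punctuation)


trailing = normalize("Don't,")  # trailing comma removed, inner apostrophe kept
leading = normalize("'Tis")     # the leading apostrophe is stripped as well
leading_found = leading in contractions
```

If contractions with a leading apostrophe matter for the count, one option is to strip a narrower set of characters that excludes the apostrophe.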