Python + BeautifulSoup: scraping New York Times article text


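I'm trying to extract the content of any New York Times article and put it into a string so I can count certain words. All of the article content is in HTML "p" tags. I can get the paragraphs one at a time (commented out in my code), but I can't iterate over the variable paragraphs, because I keep getting the following error: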

 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-52-ccc2f7cf5763> in <module>()
     16 
     17 for i in paragraphs:
---> 18     article = article + paragraphs[i].get_text()
     19 
     20 print(article)

TypeError: list indices must be integers, not Tag
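The error comes from treating loop elements as indices: a for loop over a list yields the elements themselves, so i is already a Tag, and paragraphs[i] then fails. Below is a minimal sketch that reproduces the same mistake with a plain list of strings standing in for the Tag objects (an assumption made so the example runs without any HTML). Note that Python 3 words the message as "list indices must be integers or slices, not str"; the traceback in the question appears to come from Python 2.

```python
# A plain list of strings stands in for the list of Tag objects
paragraphs = ["First paragraph.", "Second paragraph."]

# Buggy pattern from the question: i is an element, not an index
try:
    for i in paragraphs:
        _ = paragraphs[i]
except TypeError as exc:
    error_message = str(exc)

# Fix: use each element directly
article = ""
for p in paragraphs:
    article = article + p

print(error_message)
print(article)
```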
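paragraphs is a list of Tag objects, and looping over a list gives you the elements, not their indices, so i is already a Tag. You want: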

for p in paragraphs:
    article = article + p.get_text()
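A self-contained sketch of the corrected loop, assuming the beautifulsoup4 package is installed and using a small made-up HTML string in place of a downloaded article page. It joins the paragraphs with a space so the last word of one paragraph does not run into the first word of the next, which matters if you count words afterwards:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded article page
html = """
<html><body>
  <p>The mayor spoke on Monday.</p>
  <p>Critics said the plan was late.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find_all("p")  # a list-like ResultSet of Tag objects

# Iterate over the Tag objects themselves, not over indices
parts = []
for p in paragraphs:
    parts.append(p.get_text())

# A space separator keeps words from adjacent paragraphs apart
article = " ".join(parts)
print(article)
```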
Or:

for i in range(len(paragraphs)):
    article = article + paragraphs[i].get_text()

Don't forget to check the New York Times terms of service, especially if you are using their articles for anything beyond learning.
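If you need each paragraph's position as well as its text (for example to number the paragraphs), Python's built-in enumerate is the usual alternative to range(len(...)). A short sketch, again assuming beautifulsoup4 is installed and using made-up inline markup:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for an article page
html = "<p>First paragraph.</p><p>Second paragraph.</p>"
paragraphs = BeautifulSoup(html, "html.parser").find_all("p")

# enumerate yields (index, Tag) pairs, so no manual indexing is needed
numbered = []
for i, p in enumerate(paragraphs, start=1):
    numbered.append(f"{i}: {p.get_text()}")

print("\n".join(numbered))
```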
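
Another option is to select the article paragraphs by their CSS class. This assumes soup is a BeautifulSoup object already built from the article page, and that the class attribute matches the string exactly, with the class names in the same order:

p_tags = soup.find_all(class_="story-body-text story-content")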
# method 1: accumulate the text in a loop
article = ''
for p_tag in p_tags:
    p_text = p_tag.get_text()
    article += p_text
print(article)

# method 2: build the string in one pass with str.join
article2 = ''.join(p_tag.get_text() for p_tag in p_tags)
print(article2)
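Since the goal in the question is to count certain words, here is one way to do that once the article text is in a string. This sketch uses only the standard library; the sample text, the target words, and the count_words helper are hypothetical, and the tokenizer (lowercased runs of letters and apostrophes) is a simplifying assumption that, for example, splits hyphenated words in two:

```python
import re
from collections import Counter


def count_words(text, targets):
    """Count case-insensitive occurrences of each target word in text."""
    # Simplified tokenizer (an assumption): lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear
    return {word: counts[word.lower()] for word in targets}


article = "The city said the budget was final. The Mayor disagreed with the city."
print(count_words(article, ["the", "city", "mayor"]))
```

Because the tokenizer splits on punctuation, this also works on text joined with '' as in method 2, where a paragraph ending in a period is glued to the next one.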