Python: How do I access the text from both <p> elements using BeautifulSoup4?


I want to get the text from both of these <p> elements. How can I do that? My code works for the first <p>, but I cannot get the second one.

  <p>
        <a href="https://www.japantimes.co.jp/news/2019/03/19/world/crime-legal-world/emerging-online-threats-changing-homeland-securitys-role-merely-fighting-terrorism/">
         Emerging online threats changing Homeland Security's role from merely fighting terrorism
        </a>
       </p>
      </hgroup>
     </header>
     <p>
      Homeland Security Secretary Kirstjen Nielsen said Monday that her department may have been founded to combat terrorism, but its mission is shifting to also confront emerging online threats.

    China, Iran and other countries are mimicking the approach that Russia used to interfere in the U.S. ...
      <a class="more_link" href="https://www.japantimes.co.jp/news/2019/03/19/world/crime-legal-world/emerging-online-threats-changing-homeland-securitys-role-merely-fighting-terrorism/">
       <span class="icon-arrow-2">
       </span>
      </a>
     </p>

You can use .find_next(). However, what you get is not the full article text:

from bs4 import BeautifulSoup
import requests

# Listing page for the "cybersecurity" tag
url = "https://www.japantimes.co.jp/tag/cybersecurity/page/1/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# First article block on the listing page
article = soup.find('div', class_="content_col")

date = article.h3.find('span', class_="right date")
date_text = date.text

# article.p is the first <p>, which wraps the headline link
headline = article.p.find('a')
headline_text = headline.text

# find_next('p') returns the next <p> in document order (the summary)
content_text = article.p.find_next('p').text
print(date_text, headline_text, content_text)
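
To see why .find_next() reaches the second <p>, here is a minimal offline sketch. It assumes a made-up fragment modelled on the markup in the question; the `sample_html` string, the example.com URLs, and the variable names are illustrative and are not taken from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup shown in the question
sample_html = """
<div class="content_col">
  <header>
    <hgroup>
      <p><a href="https://example.com/article">Headline text</a></p>
    </hgroup>
  </header>
  <p>Summary paragraph text. <a class="more_link" href="https://example.com/article"><span class="icon-arrow-2"></span></a></p>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
article = soup.find("div", class_="content_col")

first_p = article.p                  # first <p> descendant: the headline
second_p = first_p.find_next("p")    # next <p> in document order: the summary

headline_text = first_p.get_text(strip=True)
summary_text = second_p.get_text(strip=True)

# Alternative: collect every <p> inside the block and index into the list
all_paragraphs = [p.get_text(strip=True) for p in article.find_all("p")]

print(headline_text)
print(summary_text)
```

The two <p> elements are not siblings (the first is nested inside <hgroup>), so .find_next_sibling() would not work here. .find_next() walks the document in order and is not affected by the nesting.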

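Use the parent id together with a p selector, then index into the returned list to get as many paragraphs as you need. You can use the time tag to get the posting time.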

import requests
from bs4 import BeautifulSoup as bs

# Request the article page itself rather than the tag listing
r = requests.get('https://www.japantimes.co.jp/news/2019/03/19/world/crime-legal-world/emerging-online-threats-changing-homeland-securitys-role-merely-fighting-terrorism/#.XJIQNDj7TX4')
soup = bs(r.content, 'lxml')
# First <time> element on the page holds the posting time
posted = soup.select_one('time').text
print(posted)
# Every <p> inside the element with id="jtarticle"
paras = [item.text.strip() for item in soup.select('#jtarticle p')]
print(paras[:2])
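
The selector approach can be tried without a network request. This is a rough sketch, assuming a made-up page in which the article body sits in an element with id="jtarticle", as the selector above implies. The `sample_html` content is hypothetical, and html.parser is swapped in for lxml only so that the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical page structure; only the id "jtarticle" comes from the answer
sample_html = """
<article>
  <time datetime="2019-03-19">Mar 19, 2019</time>
  <div id="jtarticle">
    <p> First paragraph. </p>
    <p>Second paragraph.</p>
    <p>Third paragraph.</p>
  </div>
  <p>Footer text outside the article body.</p>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# select_one returns the first match for the CSS selector
posted = soup.select_one("time").get_text(strip=True)

# "#jtarticle p" matches only <p> elements nested inside id="jtarticle"
paras = [p.get_text(strip=True) for p in soup.select("#jtarticle p")]
first_two = paras[:2]

print(posted)
print(first_two)
```

Because the selector is scoped to the parent id, the footer paragraph is excluded, and slicing the list controls how many paragraphs are kept.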

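Do you want the full blog post? I see you are using code from the answer to your previous question. Would you consider accepting the answer to that question?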
