Python: Removing HTML elements from scraped results (Python, Web Scraping, BeautifulSoup)

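I am scraping an Indonesian news site. When I scrape the article from each news link, some HTML elements appear above the article text. The output looks like this: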

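I want to remove those elements so that the output is only the article. I have already used .strip(), but it has no effect on the output. Here is my code: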

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch and parse the "most popular" listing page
detik = requests.get('https://www.detik.com/terpopuler')
beautify = BeautifulSoup(detik.content, 'html5lib')

# Each news item is an <article> element with this class
news = beautify.find_all('article', {'class': 'list-content__item'})
arti = []
for each in news:
  try:
    title = each.find('h3', {'class': 'media__title'}).text
    lnk = each.a.get('href')
    # Fetch and parse the linked article page
    r = requests.get(lnk)
    soup = BeautifulSoup(r.text, 'html5lib')
    # .strip() only trims leading and trailing whitespace
    content = soup.find('div', {'class': 'detail__body-text itp_bodycontent'}).text.strip()

    print(title)
    print(lnk)

    arti.append({
      'Headline': title,
      'Content': content,
      'Link': lnk
    })
  except (AttributeError, requests.RequestException):
    # Skip items with missing elements or failed requests
    continue

# Write the collected articles to a CSV file
df = pd.DataFrame(arti)
df.to_csv('detik.csv', index=False)

Any help would be appreciated.

You may be dealing with invalid markup. This thread may be useful:
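
Separately, .strip() only trims whitespace at the start and end of a string, so it cannot remove anything embedded in the text. One possible cause, assumed here because the actual output is not shown, is that the body div contains script or style blocks whose contents end up in .text (this can depend on the parser and the Beautiful Soup version). The sketch below removes such tags with decompose() before extracting the text. SAMPLE_HTML and extract_article_text are hypothetical names used only for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one article page; the real markup may differ.
SAMPLE_HTML = """
<div class="detail__body-text itp_bodycontent">
  <script>var adSlot = "top";</script>
  <style>.ad { display: none; }</style>
  <p>First paragraph of the article.</p>
  <p>Second paragraph.</p>
</div>
"""


def extract_article_text(html):
    """Return the visible text of the article body, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="detail__body-text")
    if body is None:
        return None
    # Drop tags whose contents are code or styling rather than article text.
    for tag in body.find_all(["script", "style", "noscript", "iframe"]):
        tag.decompose()
    # Join the remaining text nodes, trimming whitespace around each one.
    return body.get_text(separator=" ", strip=True)


print(extract_article_text(SAMPLE_HTML))
```

In the scraping loop, the same idea would apply to the div found on each article page before reading its text. If the unwanted elements turn out to be something else, such as inline widgets or captions, the list of tag names passed to find_all would need adjusting.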