Removing unwanted markup in Python with BeautifulSoup

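I'm trying to build a simple web scraper that returns the list of names from this website (the URL in the code below), but I can't figure out how to get them in a "clean" format. The code below returns the names, but all the HTML markup is still included.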

import requests
from bs4 import BeautifulSoup

URL = 'https://www.verywellfamily.com/top-1000-baby-girl-names-2757832'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# find() returns the whole Tag (or None if no element has this id)
names = soup.find(id='mntl-sc-block_1-0-13')

# Printing a Tag prints its HTML, markup included
print(names)

I use this code to write it to a file:

file_name = URL.rsplit('/', 1)[1].rsplit('.')[0]
with open('./{}.txt'.format(file_name), mode='wt', encoding='utf-8') as file:
    file.write(names)

You can pass the separator='\n' parameter to the get_text() method:
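A minimal, self-contained sketch of the get_text(separator="\n") approach, assuming a small hypothetical SAMPLE_HTML snippet in place of the live page (whose real markup may differ) and a hypothetical helper named extract_lines. With separator="\n", BeautifulSoup puts a newline between text nodes, so splitting on newlines and dropping blank lines leaves just the names, without any tags.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div id="mntl-sc-block_1-0-13">
  <ol>
    <li>Olivia</li>
    <li>Emma</li>
    <li>Ava</li>
  </ol>
</div>
"""


def extract_lines(html: str, block_id: str) -> List[str]:
    """Return the non-blank text lines inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find(id=block_id)
    if block is None:  # find() returns None when no element matches
        return []
    # separator="\n" puts a newline between text nodes instead of nothing
    text = block.get_text(separator="\n")
    return [line.strip() for line in text.split("\n") if line.strip()]


print("\n".join(extract_lines(SAMPLE_HTML, "mntl-sc-block_1-0-13")))
```

Against the live page you would pass page.content and the real id instead of SAMPLE_HTML.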

Edit: printing with a counter (like the website):
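A sketch of numbering the names with enumerate(..., start=1), assuming a small hypothetical SAMPLE_HTML snippet and a hypothetical helper numbered_names. Unlike the answer's file-writing code, it drops blank lines instead of slicing with [2:], since the sample has no introductory text to skip. The output listed next comes from the answer run against the live page, not from this sample.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = (
    '<div id="mntl-sc-block_1-0-13">'
    "<ol><li>Olivia</li><li>Emma</li><li>Ava</li></ol>"
    "</div>"
)


def numbered_names(html: str, block_id: str) -> List[str]:
    """Return "1 Name", "2 Name", ... for the names in the given block."""
    block = BeautifulSoup(html, "html.parser").find(id=block_id)
    if block is None:
        return []
    lines = block.get_text(separator="\n").split("\n")
    # Filter blank lines before numbering so the counter has no gaps
    names = [line.strip() for line in lines if line.strip()]
    # enumerate(..., start=1) numbers from 1, like the list on the page
    return ["{0} {1}".format(count, name) for count, name in enumerate(names, start=1)]


for row in numbered_names(SAMPLE_HTML, "mntl-sc-block_1-0-13"):
    print(row)
```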

Output:

1 Olivia
2 Emma
3 Ava
4 Sophia
5 Isabella
6 Charlotte
...
998 Zendaya
999 Ariadne
1000 Dixie

Edit 2, writing to a file:

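import requests
from bs4 import BeautifulSoup

URL = 'https://www.verywellfamily.com/top-1000-baby-girl-names-2757832'
soup = BeautifulSoup(requests.get(URL).content, 'html.parser')

# Name the output file after the page slug: top-1000-baby-girl-names-2757832.txt
file_name = URL.split(".com/")[1]

# One text node per line; [2:] drops the first two lines, as in the original answer
lines = soup.find(id="mntl-sc-block_1-0-13").get_text(separator="\n").split("\n")[2:]
# Remove blank lines before numbering so the counter has no gaps
names = [line.strip() for line in lines if line.strip()]

with open("./{}.txt".format(file_name), "wt", encoding="utf-8") as f:
    for count, name in enumerate(names, start=1):
        f.write("{0} {1}\n".format(count, name))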
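An alternative worth considering (a swapped-in technique, not what the answer uses) is BeautifulSoup's stripped_strings generator, which yields each text node with surrounding whitespace removed and skips whitespace-only nodes, so no manual split or blank-line filtering is needed. This sketch assumes a hypothetical SAMPLE_HTML snippet, a hypothetical helper format_names, and a hypothetical output file sample-names.txt; on the real page you may still need to skip leading lines, as the answer's [2:] does.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div id="mntl-sc-block_1-0-13">
  <ol>
    <li>Olivia</li>
    <li>Emma</li>
    <li>Ava</li>
  </ol>
</div>
"""


def format_names(html: str, block_id: str) -> str:
    """Return one numbered name per line, or "" if the block is missing."""
    block = BeautifulSoup(html, "html.parser").find(id=block_id)
    if block is None:
        return ""
    # stripped_strings strips each text node and skips whitespace-only ones
    return "".join(
        f"{count} {name}\n"
        for count, name in enumerate(block.stripped_strings, start=1)
    )


if __name__ == "__main__":
    # Hypothetical output file name for this sample
    with open("sample-names.txt", "wt", encoding="utf-8") as f:
        f.write(format_names(SAMPLE_HTML, "mntl-sc-block_1-0-13"))
```

Keeping the formatting in a function that returns a str also makes it easy to check the text before it is written.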

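@MendelG, that's strange... I'm getting 194 lines of text on my end.

Fixed. If you also want to print with a counter, see my edit.

OK, thanks! But I think there's a problem with the data type of the output. I'm trying to store the output in a file (I added the code to the question), and I keep getting this error: TypeError: write() argument must be str, not None. If I try to define it as a string, it just outputs "None".

See my edit for writing to a file.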
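The error in the comment is consistent with soup.find() returning None (no element with that id in the HTML that was actually downloaded) and that None being passed to write(); wrapping it in str() would then write the literal text "None". A defensive sketch, assuming a hypothetical helper block_text and a small hypothetical SAMPLE_HTML snippet: check for None explicitly, and give write() the str from get_text(), never the Tag itself.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = '<div id="mntl-sc-block_1-0-13"><p>Olivia</p><p>Emma</p></div>'


def block_text(html: str, block_id: str) -> str:
    """Return the block's text, raising a clear error if the id is missing."""
    block = BeautifulSoup(html, "html.parser").find(id=block_id)
    if block is None:
        # find() returns None when nothing matches; passing that to a
        # text-mode file's write() raises a TypeError
        raise ValueError(f"no element with id={block_id!r} in the downloaded HTML")
    # get_text() returns a plain str, which is what a text-mode file expects
    return block.get_text(separator="\n")


buffer = io.StringIO()  # stands in for a file opened with mode="wt"
buffer.write(block_text(SAMPLE_HTML, "mntl-sc-block_1-0-13"))
print(buffer.getvalue())
```

Raising early with the missing id in the message makes it obvious when the served page differs from the markup the scraper expects.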