Python: Scraping information with BeautifulSoup


python web-scraping


I need to get some information about the following fields:

Website Address 
Last Analysis
Blacklist Status
Domain Registration
Server Location

from this website:

I use requests and BeautifulSoup to access the site and get the information:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.urlvoid.com/scan/gordonramsay.com/')
soup = BeautifulSoup(r.content, 'lxml')

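However, I can't manage to select these fields. Each field should be added as a separate column in a dataset. Do you have any suggestions for how to get this information and add the fields as columns?

Any help is very welcome.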

Try:

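# Select the results table by its CSS classes (select returns a list)
tab = soup.select("table.table.table-custom.table-striped")
# All rows of the first matching table
dat = tab[0].select('tr')
for d in dat:
    row = d.select('td')
    # Cell 0 holds the label, cell 1 the value
    print(row[0].text, ' ', row[1].text)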

Output:

Website Address   Gordonramsay.com
Last Analysis   5 years ago  |   Rescan
Blacklist Status   0/34
Domain Registration   2000-02-03 | 20 years ago
Domain Information    WHOIS Lookup | DNS Records | Ping
IP Address   89.206.225.168   Find Websites  |  IPVoid  |  Whois
Reverse DNS   unallocated.star.net.uk
ASN   AS6656 Star Technology Services Limited
Server Location    (GB) United Kingdom
Latitude\Longitude   51.9864 / -4.5578    Google Map
City   Star
Region   Pembrokeshire

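The loop above assumes the table is present and that every row has at least two cells; if either assumption fails, `tab[0]` or `row[1]` raises an IndexError. A slightly more defensive sketch, assuming the same table classes as the selector above, skips rows that are not simple label/value pairs and strips surrounding whitespace. `SAMPLE_HTML` and `table_to_pairs` are hypothetical names used only for illustration, and the live page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the URLVoid results table;
# the live page markup may differ.
SAMPLE_HTML = """
<table class="table table-custom table-striped">
  <tr><td>Website Address</td><td>Gordonramsay.com</td></tr>
  <tr><td>Blacklist Status</td><td> 0/34 </td></tr>
  <tr><td colspan="2">A row with a single cell</td></tr>
</table>
"""


def table_to_pairs(soup):
    """Return (label, value) tuples from every two-cell row of the results table."""
    pairs = []
    # select_one returns None when nothing matches, so there is no list to index into.
    table = soup.select_one("table.table.table-custom.table-striped")
    if table is None:
        return pairs
    for tr in table.select("tr"):
        cells = tr.select("td")
        if len(cells) != 2:
            continue  # skip rows that are not simple label/value pairs
        label = cells[0].get_text(strip=True)
        value = cells[1].get_text(" ", strip=True)
        pairs.append((label, value))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for label, value in table_to_pairs(soup):
    print(label, value)
```

`"html.parser"` is used here so the sketch needs no extra parser; `'lxml'`, as in the question, works the same way for this selector.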

If you only want to output the 5 specific entries, use this instead:

# Select the rows of the results table directly with one CSS selector
tab2 = soup.select("table.table.table-custom.table-striped tr")
targets = ['Website Address', 'Last Analysis', 'Blacklist Status', 'Domain Registration', 'Server Location']
for t in tab2:
    item = t.select('td')
    # Keep only label/value rows whose label is one of the wanted fields
    if len(item) == 2 and item[0].text in targets:
        print(item[0].text, ' ', item[1].text)

Output:

Website Address   Gordonramsay.com
Last Analysis   5 years ago  |   Rescan
Blacklist Status   0/34
Domain Registration   2000-02-03 | 20 years ago
Server Location    (GB) United Kingdom
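The question also asks for these fields as separate columns of a dataset. One way to sketch that, assuming the same table markup as the selector above, is to turn each page into a dict keyed by the five labels; a list of such dicts maps directly onto rows (one per domain) and columns (one per field). `SAMPLE_HTML` and `extract_record` are hypothetical, and `csv.DictWriter` is used only to show the column layout without extra dependencies.

```python
import csv
import io

from bs4 import BeautifulSoup

TARGETS = ["Website Address", "Last Analysis", "Blacklist Status",
           "Domain Registration", "Server Location"]

# Hypothetical sample markup standing in for one downloaded URLVoid page.
SAMPLE_HTML = """
<table class="table table-custom table-striped">
  <tr><td>Website Address</td><td>Gordonramsay.com</td></tr>
  <tr><td>Last Analysis</td><td>5 years ago | Rescan</td></tr>
  <tr><td>Blacklist Status</td><td>0/34</td></tr>
  <tr><td>Domain Registration</td><td>2000-02-03 | 20 years ago</td></tr>
  <tr><td>IP Address</td><td>89.206.225.168</td></tr>
  <tr><td>Server Location</td><td>(GB) United Kingdom</td></tr>
</table>
"""


def extract_record(html, targets=TARGETS):
    """Map each target label to its value; labels missing from the page map to None."""
    soup = BeautifulSoup(html, "html.parser")
    record = dict.fromkeys(targets)  # every column exists even if a row is missing
    for tr in soup.select("table.table.table-custom.table-striped tr"):
        cells = tr.select("td")
        if len(cells) == 2:
            label = cells[0].get_text(strip=True)
            if label in record:  # ignore rows such as "IP Address"
                record[label] = cells[1].get_text(" ", strip=True)
    return record


records = [extract_record(SAMPLE_HTML)]  # one dict per scanned domain

# Write the records as a table whose columns are the five fields.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=TARGETS)
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

With pandas installed, `pandas.DataFrame(records)` builds the same table, one column per field.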

Jack, thank you very much for your help. @Val Glad it helped! Hi Jack, a quick question: how can I iterate over different queries? I tried the following:

querys=['bbc.com','bbc.co.uk','thesun']
for x in querys:
    query=x
    r=requests.get('https://www.urlvoid.com/scan/'+x+'.it/')
    soup=BeautifulSoup(r.content,'lxml')
    tab=soup.select("table.table.table-custom.table-striped")
    dat=tab[0].select('tr')
    for d in dat:
        row=d.select('td')
        print(row[0].text,' ',row[1].text)
    c+=1

but it doesn't work, because dat=tab[0].select('tr') is out of range. @Val In requests.get('https://www.urlvoid.com/scan/'+x+'.it/'), what is the .it at the end for? For example, are you looking for bbc.com.it? @Val - I'm looking into it; check the answer.
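The "out of range" error in the loop above means `tab` came back empty, i.e. the requested page had no results table (appending `.it` turns `bbc.com` into `bbc.com.it`). A minimal sketch of a multi-domain loop, assuming each list entry is already a complete domain name and that the table markup matches the selector used earlier, checks for the table before using it and collects one dict per domain. `scan_domains`, `fetch_with_requests`, `fake_fetch` and `FAKE_PAGES` are hypothetical names; the fetch function is passed in so the parsing can be exercised offline.

```python
from bs4 import BeautifulSoup

SELECTOR = "table.table.table-custom.table-striped"


def fetch_with_requests(url):
    """Download a page. requests is imported here so the parser below can be tested offline."""
    import requests

    # Error pages are returned as-is; they contain no results table and get skipped.
    return requests.get(url, timeout=30).content


def scan_domains(domains, fetch):
    """Return one dict per domain whose page has a results table; skip the rest."""
    records = []
    for domain in domains:
        # Assumes a complete domain name; no ".it" suffix is appended.
        html = fetch(f"https://www.urlvoid.com/scan/{domain}/")
        table = BeautifulSoup(html, "html.parser").select_one(SELECTOR)
        if table is None:
            # This is the situation in which tab[0] raised "index out of range".
            print(f"No results table for {domain!r}; skipping")
            continue
        record = {"domain": domain}
        for tr in table.select("tr"):
            cells = tr.select("td")
            if len(cells) == 2:
                record[cells[0].get_text(strip=True)] = cells[1].get_text(" ", strip=True)
        records.append(record)
    return records


# Offline demo: canned (hypothetical) HTML instead of live requests.
FAKE_PAGES = {
    "https://www.urlvoid.com/scan/bbc.com/":
        "<table class='table table-custom table-striped'>"
        "<tr><td>Website Address</td><td>Bbc.com</td></tr></table>",
}


def fake_fetch(url):
    return FAKE_PAGES.get(url, "<html><body>No report here</body></html>")


print(scan_domains(["bbc.com", "thesun"], fake_fetch))
```

Against the live site this would be called as `scan_domains(['bbc.com', 'bbc.co.uk'], fetch_with_requests)`; the returned list of dicts can then be turned into columns as in the single-page case.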