Python: Scraping information with BeautifulSoup
I need to get some information about the following fields:
Website Address
Last Analysis
Blacklist Status
Domain Registration
Server Location
from this website:
I use requests and BeautifulSoup to access the website and fetch the page:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.urlvoid.com/scan/gordonramsay.com/')
soup = BeautifulSoup(r.content, 'lxml')
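However, I am not able to select these fields.
Each field should be added to a dataset as a separate column.
Do you have any suggestions on how to extract this information and add the fields as columns?
Any help is very welcome.

Try this: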
tab = soup.select("table.table.table-custom.table-striped")
dat = tab[0].select('tr')
for d in dat:
    row = d.select('td')
    print(row[0].text, ' ', row[1].text)
Output:
Website Address Gordonramsay.com
Last Analysis 5 years ago | Rescan
Blacklist Status 0/34
Domain Registration 2000-02-03 | 20 years ago
Domain Information WHOIS Lookup | DNS Records | Ping
IP Address 89.206.225.168 Find Websites | IPVoid | Whois
Reverse DNS unallocated.star.net.uk
ASN AS6656 Star Technology Services Limited
Server Location (GB) United Kingdom
Latitude\Longitude 51.9864 / -4.5578 Google Map
City Star
Region Pembrokeshire
If you only want to output the 5 specific entries, use the following:
tab2 = soup.select("table.table.table-custom.table-striped tr")
targets = ['Website Address', 'Last Analysis', 'Blacklist Status', 'Domain Registration', 'Server Location']
for t in tab2:
    item = t.select('td')
    if len(item) == 2 and item[0].text in targets:
        print(item[0].text, ' ', item[1].text)
Output:
Website Address Gordonramsay.com
Last Analysis 5 years ago | Rescan
Blacklist Status 0/34
Domain Registration 2000-02-03 | 20 years ago
Server Location (GB) United Kingdom
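The question asks for the fields as separate columns rather than printed lines. A minimal sketch of one way to do that with only the standard library is shown here. It assumes the (label, value) pairs have already been pulled out of the `td` cells as in the loop above; `rows_to_record`, `records_to_csv` and `sample_pairs` are hypothetical names introduced for illustration, and the sample values are copied from the printed output.

```python
import csv
import io

TARGETS = ['Website Address', 'Last Analysis', 'Blacklist Status',
           'Domain Registration', 'Server Location']


def rows_to_record(pairs, targets=TARGETS):
    """Keep only the wanted labels and return one {column: value} row."""
    found = {label.strip(): value.strip() for label, value in pairs}
    # Missing labels become empty strings so every record has the same columns.
    return {t: found.get(t, '') for t in targets}


def records_to_csv(records, targets=TARGETS):
    """Serialise a list of records to CSV text, one column per target field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=targets)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Sample pairs copied from the output above; in practice they would be built as
# (item[0].text, item[1].text) for every table row that has exactly two td cells.
sample_pairs = [
    ('Website Address', 'Gordonramsay.com'),
    ('Last Analysis', '5 years ago | Rescan'),
    ('Blacklist Status', '0/34'),
    ('Domain Registration', '2000-02-03 | 20 years ago'),
    ('IP Address', '89.206.225.168 Find Websites | IPVoid | Whois'),
    ('Server Location', '(GB) United Kingdom'),
]

record = rows_to_record(sample_pairs)
print(records_to_csv([record]))
```

Each scanned site then becomes one row, so several records can be passed to `records_to_csv` at once. If pandas is already in use, the same list of dicts could probably be handed to `pandas.DataFrame` instead of the CSV writer.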
Comments:

Jack, thank you very much for your help.

@Val Glad it helped!

Hi Jack, a quick question: how can I iterate over different queries? I tried the following:
querys = ['bbc.com', 'bbc.co.uk', 'thesun']
for x in querys:
    query = x
    r = requests.get('https://www.urlvoid.com/scan/' + x + '.it/')
    soup = BeautifulSoup(r.content, 'lxml')
    tab = soup.select("table.table.table-custom.table-striped")
    dat = tab[0].select('tr')
    for d in dat:
        row = d.select('td')
        print(row[0].text, ' ', row[1].text)
    c += 1
but it does not work, because dat = tab[0].select('tr') is out of range.

@Val In requests.get('https://www.urlvoid.com/scan/' + x + '.it/'), what is the .it at the end? For example, are you looking for bbc.com.it?

@Val I'm looking into it; check the answer.
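The "out of range" error in the comments comes from indexing `tab[0]` when the selector matched no table for a given URL. A hedged sketch of a loop that tolerates that case follows. It assumes each entry is a complete domain name with no extra suffix appended, which may not be what the commenter intended; `scan_url`, `fetch_pairs` and `scan_many` are hypothetical helper names, not part of requests or BeautifulSoup.

```python
def scan_url(domain):
    """Build the scan URL for one domain (no extra suffix is appended)."""
    return 'https://www.urlvoid.com/scan/' + domain.strip('/') + '/'


def fetch_pairs(domain):
    """Download one report and return its (label, value) pairs, or [] if no table."""
    # Imported here so the other helpers can be used without network libraries installed.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(scan_url(domain), timeout=30)
    soup = BeautifulSoup(r.content, 'lxml')
    pairs = []
    # Selecting the rows directly avoids tab[0], which raises IndexError
    # when no table matches; an unmatched selector just yields no rows.
    for tr in soup.select('table.table.table-custom.table-striped tr'):
        cells = tr.select('td')
        if len(cells) == 2:
            pairs.append((cells[0].text.strip(), cells[1].text.strip()))
    return pairs


def scan_many(domains, fetch=fetch_pairs):
    """Return {domain: pairs}, skipping domains whose page has no report table."""
    results = {}
    for domain in domains:
        pairs = fetch(domain)
        if not pairs:
            print('no report table found for', domain)
            continue
        results[domain] = pairs
    return results
```

A call such as `scan_many(['bbc.com', 'bbc.co.uk'])` would then return only the domains for which a report table was found. The `fetch` parameter exists so the loop can be tried with a stand-in function before making real requests.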