Python: scraping information with BeautifulSoup

I need to get some information about the following fields:

Website Address 
Last Analysis
Blacklist Status
Domain Registration
Server Location
from this website (the URLVoid report page requested in the code below):

I used requests and BeautifulSoup to access the site and fetch the information:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.urlvoid.com/scan/gordonramsay.com/')
soup = BeautifulSoup(r.content, 'lxml')
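However, I can't manage to select these fields. Each field should be added to a dataset as a separate column. Do you have any suggestions on how to extract this information and add the fields as columns?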

Any help is very welcome.

Try this:

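# The report is the table carrying the classes table, table-custom and table-striped
tab = soup.select("table.table.table-custom.table-striped")
dat = tab[0].select('tr')  # raises IndexError if the table is not on the page
for d in dat:
    row = d.select('td')
    print(row[0].text, ' ', row[1].text)  # label, value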
Output:

Website Address   Gordonramsay.com
Last Analysis   5 years ago  |   Rescan
Blacklist Status   0/34
Domain Registration   2000-02-03 | 20 years ago
Domain Information    WHOIS Lookup | DNS Records | Ping
IP Address   89.206.225.168   Find Websites  |  IPVoid  |  Whois
Reverse DNS   unallocated.star.net.uk
ASN   AS6656 Star Technology Services Limited
Server Location    (GB) United Kingdom
Latitude\Longitude   51.9864 / -4.5578    Google Map
City   Star
Region   Pembrokeshire
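Rather than printing each row, the rows can be collected into a dictionary, which is the first step toward separate columns. A minimal sketch, assuming the same table classes as the selector above; `report_to_dict` and the sample HTML are hypothetical, and it uses the built-in `html.parser` so `lxml` is not required. Note that `get_text(" ", strip=True)` only trims whitespace, so link text such as "Rescan" stays in the value.

```python
from bs4 import BeautifulSoup


def report_to_dict(html):
    """Map each two-cell row of the report table to {label: value}.

    Assumes the table has the classes table/table-custom/table-striped,
    as in the answer's selector; returns an empty dict if it is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for tr in soup.select("table.table.table-custom.table-striped tr"):
        cells = tr.select("td")
        if len(cells) != 2:
            continue  # skip header or malformed rows
        # Join nested text with spaces and trim surrounding whitespace
        label = cells[0].get_text(" ", strip=True)
        value = cells[1].get_text(" ", strip=True)
        result[label] = value
    return result


# Hypothetical sample mimicking two rows of the report table
sample = """
<table class="table table-custom table-striped">
  <tr><td>Website Address</td><td>Gordonramsay.com</td></tr>
  <tr><td>Blacklist Status</td><td> 0/34 </td></tr>
</table>
"""
print(report_to_dict(sample))
# → {'Website Address': 'Gordonramsay.com', 'Blacklist Status': '0/34'}
```

On the live page, `report_to_dict(r.content)` would be called with the response fetched in the question.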
If you only want to output the 5 specific entries, use the following:

# Select every row of the report table with a single CSS selector
tab2 = soup.select("table.table.table-custom.table-striped tr")
targets = ['Website Address', 'Last Analysis', 'Blacklist Status', 'Domain Registration', 'Server Location']
for t in tab2:
    item = t.select('td')
    # Keep only two-cell rows whose label is one of the requested fields
    if len(item) == 2 and item[0].text in targets:
        print(item[0].text, ' ', item[1].text)
Output:

Website Address   Gordonramsay.com
Last Analysis   5 years ago  |   Rescan
Blacklist Status   0/34
Domain Registration   2000-02-03 | 20 years ago
Server Location    (GB) United Kingdom
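To get the five fields as separate columns, as the question asks, one option is to build one dict per scanned site and write the rows with `csv.DictWriter`. A sketch under the same assumptions about the table markup; `TARGETS`, `extract_targets`, `rows_to_csv` and the sample HTML are hypothetical, and fields missing from a page are left empty.

```python
import csv
import io

from bs4 import BeautifulSoup

# The five fields requested in the question; each becomes one CSV column
TARGETS = ['Website Address', 'Last Analysis', 'Blacklist Status',
           'Domain Registration', 'Server Location']


def extract_targets(html, targets=TARGETS):
    """Return {field: value} for the target labels only (missing -> '')."""
    soup = BeautifulSoup(html, "html.parser")
    row = dict.fromkeys(targets, "")
    for tr in soup.select("table.table.table-custom.table-striped tr"):
        cells = tr.select("td")
        if len(cells) == 2:
            label = cells[0].get_text(strip=True)
            if label in row:  # ignore rows such as 'IP Address'
                row[label] = cells[1].get_text(" ", strip=True)
    return row


def rows_to_csv(rows, targets=TARGETS):
    """Serialize a list of row dicts as CSV text, one column per target."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=targets)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample: one target row is missing, one row is not a target
sample = """
<table class="table table-custom table-striped">
  <tr><td>Website Address</td><td>Gordonramsay.com</td></tr>
  <tr><td>IP Address</td><td>89.206.225.168</td></tr>
  <tr><td>Blacklist Status</td><td>0/34</td></tr>
</table>
"""
row = extract_targets(sample)
print(rows_to_csv([row]))
```

With pandas installed, `pd.DataFrame(rows)` builds the same one-column-per-field table from the list of dicts.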

Jack, thank you very much for your help.

@Val Glad I could help!

Hi Jack, a quick question: how can I iterate over different queries? I tried the following:

querys = ['bbc.com', 'bbc.co.uk', 'thesun']
for x in querys:
    query = x
    r = requests.get('https://www.urlvoid.com/scan/' + x + '.it/')
    soup = BeautifulSoup(r.content, 'lxml')
    tab = soup.select("table.table.table-custom.table-striped")
    dat = tab[0].select('tr')
    for d in dat:
        row = d.select('td')
        print(row[0].text, ' ', row[1].text)
    c += 1

but it doesn't work, because dat = tab[0].select('tr') raises "list index out of range".

@Val In requests.get('https://www.urlvoid.com/scan/' + x + '.it/'), what is the .it at the end? For example, are you looking for bbc.com.it?

@Val - I looked into it; check the answer.
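In the loop from the comments, `tab[0]` fails whenever a fetched page has no report table; Jack's question suggests the appended `.it` (turning `bbc.com` into `bbc.com.it`) may be the reason. A hedged sketch that builds the URL from the domain as given and returns `None` instead of raising when the table is missing; `SCAN_URL`, `parse_report` and `scan_all` are hypothetical names, and the 30-second timeout is an arbitrary choice. The unused `query` and the undefined counter `c` are dropped.

```python
from bs4 import BeautifulSoup

# URL pattern from the question, without the '.it' suffix (an assumption)
SCAN_URL = "https://www.urlvoid.com/scan/{}/"


def parse_report(html):
    """Return {label: value} from the report table, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.select("table.table.table-custom.table-striped")
    if not tables:  # the case where tab[0] raised IndexError
        return None
    report = {}
    for tr in tables[0].select("tr"):
        cells = tr.select("td")
        if len(cells) == 2:
            report[cells[0].get_text(strip=True)] = cells[1].get_text(" ", strip=True)
    return report


def scan_all(domains):
    """Fetch each domain's report; domains without a report table map to None."""
    import requests  # imported here so parse_report can be used offline

    results = {}
    for domain in domains:
        r = requests.get(SCAN_URL.format(domain), timeout=30)
        results[domain] = parse_report(r.content)
    return results
```

Calling `scan_all(['bbc.com', 'bbc.co.uk', 'thesun'])` returns a dict keyed by domain; a `None` value marks a domain whose page had no report table, so the loop keeps going and that domain can be checked by hand.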