python whois library returns nothing for an active domain
Tags: python, dns, records, whois
I am trying to use the python whois library to collect whois records for a number of websites. The problem is that I get almost nothing back for some sites, such as nih.gov, which is an active domain:
import whois
w = whois.whois("nih.gov")
print(w)
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}
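I don't understand where the problem is. Which library should I use, or how should I use this one, to cover all cases?
Here is some code that does the job: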
import sys
import socket


def whois(ip):
    # Ask ARIN's WHOIS server (port 43) about the network that holds this IP.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(3)  # time limit in seconds for connect and for each recv
    s.connect(("whois.arin.net", 43))
    s.sendall(('n ' + ip + '\r\n').encode())
    response = b""
    try:
        while True:
            data = s.recv(4096)
            if not data:  # the server closed the connection
                break
            response += data
    except socket.timeout:
        pass  # keep whatever arrived before the time limit
    finally:
        s.close()
    print(response.decode(errors="replace"))


def main():
    domain = sys.argv[1]
    ip = socket.gethostbyname(domain)
    whois(ip)


main()
For example:
c:\Temp>py test.py www.google.com
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 216.58.192.0 - 216.58.223.255
CIDR: 216.58.192.0/19
NetName: GOOGLE
NetHandle: NET-216-58-192-0-1
Parent: NET216 (NET-216-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google LLC (GOGL)
RegDate: 2012-01-27
Updated: 2012-01-27
Ref: https://whois.arin.net/rest/net/NET-216-58-192-0-1
OrgName: Google LLC
OrgId: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US
RegDate: 2000-03-30
Updated: 2017-12-21
Ref: https://whois.arin.net/rest/org/GOGL
OrgAbuseHandle: ABUSE5250-ARIN
OrgAbuseName: Abuse
OrgAbusePhone: +1-650-253-0000
OrgAbuseEmail: network-abuse@google.com
OrgAbuseRef: https://whois.arin.net/rest/poc/ABUSE5250-ARIN
OrgTechHandle: ZG39-ARIN
OrgTechName: Google LLC
OrgTechPhone: +1-650-253-0000
OrgTechEmail: arin-contact@google.com
OrgTechRef: https://whois.arin.net/rest/poc/ZG39-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
For www.nih.gov in particular, we get:
c:\Temp>py test.py www.nih.gov
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#
NetRange: 23.20.0.0 - 23.23.255.255
CIDR: 23.20.0.0/14
NetName: AMAZON-EC2-USEAST-10
NetHandle: NET-23-20-0-0-1
Parent: NET23 (NET-23-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS16509
Organization: Amazon.com, Inc. (AMAZO-4)
RegDate: 2011-09-19
Updated: 2014-09-03
Comment: The activity you have detected originates from a dynamic hosting environment.
Comment: For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment: For more information regarding EC2 see:
Comment: http://ec2.amazonaws.com/
Comment: All reports MUST include:
Comment: * src IP
Comment: * dest IP (your IP)
Comment: * dest port
Comment: * Accurate date/timestamp and timezone of activity
Comment: * Intensity/frequency (short log extracts)
Comment: * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref: https://whois.arin.net/rest/net/NET-23-20-0-0-1
OrgName: Amazon.com, Inc.
OrgId: AMAZO-4
Address: Amazon Web Services, Inc.
Address: P.O. Box 81226
City: Seattle
StateProv: WA
PostalCode: 98108-1226
Country: US
RegDate: 2005-09-29
Updated: 2017-01-28
Comment: For details of this service please see
Comment: http://ec2.amazonaws.com/
Ref: https://whois.arin.net/rest/org/AMAZO-4
OrgAbuseHandle: AEA8-ARIN
OrgAbuseName: Amazon EC2 Abuse
OrgAbusePhone: +1-206-266-4064
OrgAbuseEmail: abuse@amazonaws.com
OrgAbuseRef: https://whois.arin.net/rest/poc/AEA8-ARIN
OrgTechHandle: ANO24-ARIN
OrgTechName: Amazon EC2 Network Operations
OrgTechPhone: +1-206-266-4064
OrgTechEmail: amzn-noc-contact@amazon.com
OrgTechRef: https://whois.arin.net/rest/poc/ANO24-ARIN
OrgNOCHandle: AANO1-ARIN
OrgNOCName: Amazon AWS Network Operations
OrgNOCPhone: +1-206-266-4064
OrgNOCEmail: amzn-noc-contact@amazon.com
OrgNOCRef: https://whois.arin.net/rest/poc/AANO1-ARIN
#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
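The responses above are plain "Key: value" text, so individual fields can be pulled out of the decoded response with a few lines of parsing. The following is a minimal sketch, not part of the original answer: the function name parse_whois_text and the shortened sample are hypothetical, and it assumes every data line has the form "Key: value" while lines starting with "#" are notices. Keys such as Address, RegDate and Comment repeat, so values are collected into lists.

```python
from collections import defaultdict


def parse_whois_text(text):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' notice lines ARIN adds to every reply.
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()].append(value.strip())
    return dict(fields)


# Shortened sample taken from the www.nih.gov output above.
sample = """\
#
# ARIN WHOIS data and services are subject to the Terms of Use
#
NetName: AMAZON-EC2-USEAST-10
OrgName: Amazon.com, Inc.
Address: Amazon Web Services, Inc.
Address: P.O. Box 81226
Ref: https://whois.arin.net/rest/org/AMAZO-4
"""

record = parse_whois_text(sample)
print(record["OrgName"][0])
print(record["Address"])
```

Note that OrgName here names the holder of the IP network (Amazon, in the nih.gov case), which is not the same thing as the registrar or registrant of the domain.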
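A different option
Here is another option.
This code creates a file in the script's folder containing the text extracted from the HTML page that a different service returns for a whois request. You can modify it to suit your needs; I have only written the basics.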
import sys
import urllib.request
from bs4 import BeautifulSoup


def writeFile(text):
    # The with-block closes the file, so no explicit close() is needed.
    with open('whoisData.txt', "w", encoding="utf-8") as f:
        f.write(text)


def readHTML(domain):
    url = 'https://www.whois.com/whois/' + domain
    html = urllib.request.urlopen(url).read()
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, "html.parser")
    # remove all script and style elements
    for script in soup(["script", "style"]):
        script.extract()
    # get text
    text = soup.get_text()
    # break into lines and strip leading and trailing whitespace on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines (separated by a double space) into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    writeFile(text)


def main():
    domain = sys.argv[1]
    readHTML(domain)


main()
I took some reference from (parsing HTML).
Comments:
Compare, for example; this seems to be all the information whois provides.
Thanks @oBit91, I will use it. It is very efficient. One more thing: when I use the python whois library I can extract the registrar record from the whois response. Can we extract the registrar name from response.decode()?
Sorry, I don't have access to my laptop right now.
But it doesn't seem to return the whois record. I tested with python test.py bbc.com, and the printed values differ from the output of whois.whois("bbc.com"), which is more useful.
@ShahroozPooryousef I added a second option that parses the HTML page of an online whois service. Either way, you can see what suits you best.
whois.arin.net is for IP addresses, so you query it with what the hostname resolves to (www.google.com, in your example), not with the domain name, which may in fact not resolve at all. As for whois.com, it is just one company among many that provide whois access; make sure you read their TOS, and be happy if they do not collect or use your data... (in other words: it is better to query the registry's whois server than a third party)
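The last comment's suggestion (query the registry's whois server, with the domain name rather than an IP address) can be sketched as below. This is a rough illustration under stated assumptions, not a tested client: it assumes that whois.iana.org answers a domain query with a "refer:" or "whois:" line naming the registry's server, and that the registry still offers a port-43 whois service, which is not true for every TLD. The function names query_whois, find_referral and domain_whois are hypothetical.

```python
import socket


def query_whois(server, query, timeout=5):
    """Send one query to a WHOIS server on port 43 and return the reply text."""
    with socket.create_connection((server, 43), timeout=timeout) as s:
        s.sendall((query + "\r\n").encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def find_referral(text):
    """Return the server named on the first 'refer:' or 'whois:' line, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("refer", "whois") and value.strip():
            return value.strip()
    return None


def domain_whois(domain):
    """Ask IANA which server handles the domain, then query that server."""
    referral = find_referral(query_whois("whois.iana.org", domain))
    if referral is None:
        return None  # no referral found; there may be no port-43 service
    return query_whois(referral, domain)
```

Calling print(domain_whois("bbc.com")) would then print the registry's own record for the domain. The reply format differs from registry to registry, so any field extraction on top of it has to be adapted per TLD.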