Python 3: getting a URL's creation date, update date, and expiration date
I have a DataFrame of URLs. My code is meant to classify each URL as benign or malicious using machine learning. I want to use host-based features, namely the creation date, last-updated date, and expiration date of each URL's domain, obtained with the whois package. However, it raises an error. Can anyone help me fix it? My code and the error are shown below.
# URL DataFrame
   URL                                                Lable
0  http://ovaismirza-politicalthoughts.blogspot.com/  0
1  http://www.bluemoontea.com/                        0
2  http://www.viettiles.com/public/default/ckedit...  1
3  http://173.212.217.250/hescientiststravelled/o...  1
4  http://www.hole-in-the-wall.com/                   0
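Row 3 in the sample uses a bare IP address as its host, so a domain WHOIS lookup is unlikely to return registration dates for it. A minimal sketch of filtering such hosts before querying is shown here; `whois_host` is a hypothetical helper name, not part of the whois package, and it uses only the standard library.

```python
import ipaddress
from urllib.parse import urlparse


def whois_host(url):
    """Return the host to look up in WHOIS, or None if the URL has no usable domain."""
    host = urlparse(url).hostname  # lower-cased, without port or credentials
    if not host:
        return None
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP address, so treat it as a domain name
    return None  # IP literal: there is no domain registration to query
```

URLs for which this returns `None` could be given the "dates missing" value directly, without a network lookup.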
### Code
date = []
for i in range(len(df)):
    item = df["URL"].loc[i]
    domain = urlparse(item).netloc
    cr = whois.query(domain).creation_date
    up = whois.query(domain).last_updated
    exp = whois.query(domain).expiration_date
    if cr is not None and up is not None and exp is not None:
        date.append(0)
    else:
        date.append(1)
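The loop above issues three WHOIS queries per URL and stops at the first lookup that raises. One possible restructuring, sketched here, queries once per URL and treats any failure as "dates missing". The function name `has_missing_dates` and the injected `lookup` argument are illustrative choices, and the `None` check is an assumption about what the lookup may return for unknown domains.

```python
from urllib.parse import urlparse


def has_missing_dates(url, lookup):
    """Return 0 if all three WHOIS dates are present for the URL's host, else 1.

    `lookup` is any callable that takes a domain string and returns an object
    with creation_date, last_updated and expiration_date attributes
    (for example whois.query from the package used above).
    """
    domain = urlparse(url).netloc
    try:
        record = lookup(domain)  # one WHOIS query per URL instead of three
    except Exception:
        # Timeouts, unknown TLDs and similar failures: treat the dates as unavailable.
        return 1
    if record is None:  # assumed: the lookup may return None for unknown domains
        return 1
    dates = (record.creation_date, record.last_updated, record.expiration_date)
    return 0 if all(d is not None for d in dates) else 1
```

With the whois package imported, the feature column could then be built as `date = [has_missing_dates(u, whois.query) for u in df["URL"]]`.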
### Error
Exception                                 Traceback (most recent call last)
<ipython-input-26-0d7930e66020> in <module>
3 item = df["URL"].loc[i]
4 domain = urlparse(item).netloc
----> 5 cr = whois.query(domain).creation_date
6 up = whois.query(domain).last_updated
7 exp = whois.query(domain).expiration_date
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/__init__.py in query(domain, force, cache_file, slow_down, ignore_returncode)
48
49 while 1:
---> 50 pd = do_parse(do_query(d, force, cache_file, slow_down, ignore_returncode), tld)
51 if (not pd or not pd['domain_name'][0]) and len(d) > 2: d = d[1:]
52 else: break
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/_1_query.py in do_query(dl, force, cache_file, slow_down, ignore_returncode)
42 CACHE[k] = (
43 int(time.time()),
---> 44 _do_whois_query(dl, ignore_returncode),
45 )
46 if cache_file: cache_save(cache_file)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/_1_query.py in _do_whois_query(dl, ignore_returncode)
59 r = p.communicate()[0]
60 r = r.decode() if PYTHON_VERSION == 3 else r
---> 61 if not ignore_returncode and p.returncode != 0: raise Exception(r)
62 return r
63
Exception: whois: connect(): Operation timed out
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
refer: whois.verisign-grs.com
domain: COM
organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston Virginia 20190
address: United States
contact: administrative
name: Registry Customer Service
organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston Virginia 20190
address: United States
phone: +1 703 925-6999
fax-no: +1 703 948 3978
e-mail: info@verisign-grs.com
contact: technical
name: Registry Customer Service
organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston Virginia 20190
address: United States
phone: +1 703 925-6999
fax-no: +1 703 948 3978
e-mail: info@verisign-grs.com
nserver: A.GTLD-SERVERS.NET 192.5.6.30 2001:503:a83e:0:0:0:2:30
nserver: B.GTLD-SERVERS.NET 192.33.14.30 2001:503:231d:0:0:0:2:30
nserver: C.GTLD-SERVERS.NET 192.26.92.30 2001:503:83eb:0:0:0:0:30
nserver: D.GTLD-SERVERS.NET 192.31.80.30 2001:500:856e:0:0:0:0:30
nserver: E.GTLD-SERVERS.NET 192.12.94.30 2001:502:1ca1:0:0:0:0:30
nserver: F.GTLD-SERVERS.NET 192.35.51.30 2001:503:d414:0:0:0:0:30
nserver: G.GTLD-SERVERS.NET 192.42.93.30 2001:503:eea3:0:0:0:0:30
nserver: H.GTLD-SERVERS.NET 192.54.112.30 2001:502:8cc:0:0:0:0:30
nserver: I.GTLD-SERVERS.NET 192.43.172.30 2001:503:39c1:0:0:0:0:30
nserver: J.GTLD-SERVERS.NET 192.48.79.30 2001:502:7094:0:0:0:0:30
nserver: K.GTLD-SERVERS.NET 192.52.178.30 2001:503:d2d:0:0:0:0:30
nserver: L.GTLD-SERVERS.NET 192.41.162.30 2001:500:d937:0:0:0:0:30
nserver: M.GTLD-SERVERS.NET 192.55.83.30 2001:501:b1f9:0:0:0:0:30
ds-rdata: 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CFC41A5766
whois: whois.verisign-grs.com
status: ACTIVE
remarks: Registration information: http://www.verisigninc.com
created: 1985-01-01
changed: 2017-10-05
source: IANA
Domain Name: VIETTILES.COM
Registry Domain ID: 1827514943_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.pavietnam.vn
Registrar URL: http://www.pavietnam.vn
Updated Date: 2018-09-07T01:13:32Z
Creation Date: 2013-09-14T04:35:12Z
Registry Expiry Date: 2019-09-14T04:35:12Z
Registrar: P.A. Viet Nam Company Limited
Registrar IANA ID: 1649
Registrar Abuse Contact Email: abuse@pavietnam.vn
Registrar Abuse Contact Phone: +84.19009477
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Name Server: NS1.PAVIETNAM.VN
Name Server: NS2.PAVIETNAM.VN
Name Server: NSBAK.PAVIETNAM.NET
DNSSEC: unsigned
URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
Last update of whois database: 2018-12-25T13:33:54Z
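The traceback shows that `_do_whois_query` raises only when `ignore_returncode` is false and the `whois` command exits with a non-zero status. The exception text already contains the registry record for VIETTILES.COM, including its creation, update, and expiry dates, which suggests that a later step (possibly the referral to the registrar's server) timed out after usable data had been returned. A possible workaround, sketched under that assumption, is to retry with `ignore_returncode=True`, a parameter visible in the `query` signature in the traceback. The helper name `query_tolerant` and the injected `query` argument are hypothetical.

```python
def query_tolerant(domain, query):
    """Call query(domain); on failure, retry once ignoring the whois exit status.

    `query` is expected to behave like whois.query from the traceback above,
    i.e. to accept an `ignore_returncode` keyword argument.
    Returns the record, or None if both attempts fail.
    """
    try:
        return query(domain)
    except Exception:
        pass  # e.g. "whois: connect(): Operation timed out"
    try:
        # Parse whatever output the whois command produced, even if it exited non-zero.
        return query(domain, ignore_returncode=True)
    except Exception:
        return None
```

Calling `whois.query(domain, ignore_returncode=True)` directly would avoid the second lookup, at the cost of accepting possibly partial output for every domain.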