Python 3: getting a URL's creation date, update date, and expiration date

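I have a DataFrame of URLs. My code is meant to classify each URL as benign or malicious using machine learning. I want to use host-based features, namely the creation date, last-updated date, and expiration date of the URL's domain, obtained with the whois package. However, it raises an error. Can anyone help me fix it?

My code and the error are shown below.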

# URL DataFrame
   URL                                                Lable
0  http://ovaismirza-politicalthoughts.blogspot.com/  0
1  http://www.bluemoontea.com/                        0
2  http://www.viettiles.com/public/default/ckedit...  1
3  http://173.212.217.250/hescientiststravelled/o...  1
4  http://www.hole-in-the-wall.com/                   0


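### Code
# Imports assumed from an earlier notebook cell; `df` is the DataFrame above.
from urllib.parse import urlparse
import whois

date = []
for i in range(len(df)):
    item = df["URL"].loc[i]
    domain = urlparse(item).netloc
    cr = whois.query(domain).creation_date
    up = whois.query(domain).last_updated
    exp = whois.query(domain).expiration_date
    # 0 if all three WHOIS dates are available, 1 otherwise
    if cr is not None and up is not None and exp is not None:
        date.append(0)
    else:
        date.append(1)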

### Error
Exception                                 Traceback (most recent call last)
<ipython-input-26-0d7930e66020> in <module>
  3     item = df["URL"].loc[i]
  4     domain = urlparse(item).netloc
----> 5     cr = whois.query(domain).creation_date
  6     up = whois.query(domain).last_updated
  7     exp = whois.query(domain).expiration_date

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/__init__.py in query(domain, force, cache_file, slow_down, ignore_returncode)
 48 
 49         while 1:
---> 50                 pd = do_parse(do_query(d, force, cache_file, slow_down, ignore_returncode), tld)
 51                 if (not pd or not pd['domain_name'][0]) and len(d) > 2: d = d[1:]
 52                 else: break

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/_1_query.py in do_query(dl, force, cache_file, slow_down, ignore_returncode)
 42         CACHE[k] = (
 43                         int(time.time()),
---> 44                         _do_whois_query(dl, ignore_returncode),
 45         )
 46                 if cache_file: cache_save(cache_file)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/whois/_1_query.py in _do_whois_query(dl, ignore_returncode)
 59         r = p.communicate()[0]
 60         r = r.decode() if PYTHON_VERSION == 3 else r
---> 61         if not ignore_returncode and p.returncode != 0: raise Exception(r)
 62         return r
 63 

Exception: whois: connect(): Operation timed out
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object

refer:        whois.verisign-grs.com

domain:       COM

organisation: VeriSign Global Registry Services
address:      12061 Bluemont Way
address:      Reston Virginia 20190
address:      United States

contact:      administrative
name:         Registry Customer Service
organisation: VeriSign Global Registry Services
address:      12061 Bluemont Way
address:      Reston Virginia 20190
address:      United States
phone:        +1 703 925-6999
fax-no:       +1 703 948 3978
e-mail:       info@verisign-grs.com

contact:      technical
name:         Registry Customer Service
organisation: VeriSign Global Registry Services
address:      12061 Bluemont Way
address:      Reston Virginia 20190
address:      United States
phone:        +1 703 925-6999
fax-no:       +1 703 948 3978
e-mail:       info@verisign-grs.com

nserver:      A.GTLD-SERVERS.NET 192.5.6.30 2001:503:a83e:0:0:0:2:30
nserver:      B.GTLD-SERVERS.NET 192.33.14.30 2001:503:231d:0:0:0:2:30
nserver:      C.GTLD-SERVERS.NET 192.26.92.30 2001:503:83eb:0:0:0:0:30
nserver:      D.GTLD-SERVERS.NET 192.31.80.30 2001:500:856e:0:0:0:0:30
nserver:      E.GTLD-SERVERS.NET 192.12.94.30 2001:502:1ca1:0:0:0:0:30
nserver:      F.GTLD-SERVERS.NET 192.35.51.30 2001:503:d414:0:0:0:0:30
nserver:      G.GTLD-SERVERS.NET 192.42.93.30 2001:503:eea3:0:0:0:0:30
nserver:      H.GTLD-SERVERS.NET 192.54.112.30 2001:502:8cc:0:0:0:0:30
nserver:      I.GTLD-SERVERS.NET 192.43.172.30 2001:503:39c1:0:0:0:0:30
nserver:      J.GTLD-SERVERS.NET 192.48.79.30 2001:502:7094:0:0:0:0:30
nserver:      K.GTLD-SERVERS.NET 192.52.178.30 2001:503:d2d:0:0:0:0:30
nserver:      L.GTLD-SERVERS.NET 192.41.162.30 2001:500:d937:0:0:0:0:30
nserver:      M.GTLD-SERVERS.NET 192.55.83.30 2001:501:b1f9:0:0:0:0:30
ds-rdata:     30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CFC41A5766

whois:        whois.verisign-grs.com

status:       ACTIVE
remarks:      Registration information: http://www.verisigninc.com

created:      1985-01-01
changed:      2017-10-05
source:       IANA

   Domain Name: VIETTILES.COM
   Registry Domain ID: 1827514943_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.pavietnam.vn
   Registrar URL: http://www.pavietnam.vn
   Updated Date: 2018-09-07T01:13:32Z
   Creation Date: 2013-09-14T04:35:12Z
   Registry Expiry Date: 2019-09-14T04:35:12Z
   Registrar: P.A. Viet Nam Company Limited
   Registrar IANA ID: 1649
   Registrar Abuse Contact Email: abuse@pavietnam.vn
   Registrar Abuse Contact Phone: +84.19009477
   Domain Status: clientTransferProhibited       https://icann.org/epp#clientTransferProhibited
   Name Server: NS1.PAVIETNAM.VN
   Name Server: NS2.PAVIETNAM.VN
   Name Server: NSBAK.PAVIETNAM.NET
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
   Last update of whois database: 2018-12-25T13:33:54Z
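
The traceback shows the loop aborting as soon as one lookup times out, and the code issues three queries per URL. A minimal sketch of one way to make the feature extraction tolerant of that, assuming it is acceptable to treat a failed lookup the same as missing dates: the helper name `whois_date_flag` and the `lookup` parameter are hypothetical and not part of the whois package. `lookup` stands in for `whois.query`, so the function can be exercised without network access.

```python
from urllib.parse import urlparse


def whois_date_flag(url, lookup):
    """Return 0 if all three WHOIS dates are present for the URL's host, else 1.

    `lookup` (hypothetical parameter) is any callable that takes a host name
    and returns an object with `creation_date`, `last_updated` and
    `expiration_date` attributes, or None. In the question this role is
    played by `whois.query`.
    """
    # hostname, unlike netloc, drops any port or user info and is lowercased
    host = urlparse(url).hostname
    if not host:
        return 1
    try:
        record = lookup(host)  # one query per URL instead of three
    except Exception:
        # The traceback shows a plain Exception being raised on a timeout,
        # so a failed lookup is treated here as "dates unavailable".
        return 1
    if record is None:
        return 1
    dates = (
        getattr(record, "creation_date", None),
        getattr(record, "last_updated", None),
        getattr(record, "expiration_date", None),
    )
    return 0 if all(d is not None for d in dates) else 1
```

With the question's DataFrame this could be applied as `date = [whois_date_flag(u, whois.query) for u in df["URL"]]`. Note that mapping a timeout to 1 conflates "lookup failed" with "dates missing"; if that distinction matters for the classifier, a separate value or a retry would be needed.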