Python script to check for a tag on a website
I'm trying to figure out how to write a website monitoring script (eventually run as a cron job) that opens a given URL and checks whether a tag exists. If the tag is missing, or does not contain the expected data, the script should write something to a log file or send an email. The tag would be a known one, or something relatively similar.
Tags: python, html, linux, scripting, crontab
Does anyone have any ideas?
Your best bet, in my opinion, is to check out BeautifulSoup. Something like this:
import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and parse it (Python 2, BeautifulSoup 3)
page = urllib2.urlopen("http://yoursite.com")
soup = BeautifulSoup(page)
# See the docs on how to search through the soup. I'm not sure what
# you're looking for so my example stops here :)
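The example above stops before the actual search. As a rough illustration of that missing step, here is a minimal sketch that uses Python 3's standard-library html.parser rather than BeautifulSoup, so it is a swapped-in technique and not the answer's own method. The names TagFinder and tag_text, and the idea of locating the element by its id attribute, are assumptions made for the example. It is meant for ordinary container elements such as div or span, not void elements.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the text inside elements matching a tag name and id."""

    def __init__(self, tag, element_id):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0        # > 0 while the parser is inside a match
        self.found = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same name so the matching
            # close tag is recognised correctly.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text_parts.append(data)


def tag_text(html, tag, element_id):
    """Return the stripped text of the matching element(s), or None."""
    finder = TagFinder(tag, element_id)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    return "".join(finder.text_parts).strip()
```

A monitoring check could then treat a None result as "tag missing" and compare the returned text against the expected data.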
After that, emailing it or logging it is pretty standard fare.
The following untested code uses urllib2 to fetch the page and re to search it:
import re
import urllib2

pageString = urllib2.urlopen('**insert url here**').read()
m = re.search(r'**insert regex for the tag you want to find here**', pageString)
if m is None:
    # take action for NOT found here
    pass
else:
    # take action for found here
    pass
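To make the regex approach concrete, here is a small Python 3 sketch of the same idea (the code above targets Python 2). The pattern, the span id "build", and the function names fetch and check_tag are hypothetical, chosen only for illustration. It separates the three outcomes the question cares about: tag missing, tag present with unexpected data, and tag present with the expected data.

```python
import re
from urllib.request import urlopen

# Hypothetical pattern: captures the body of <span id="build">...</span>.
TAG_PATTERN = re.compile(
    r'<span\s+id="build"[^>]*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def fetch(url, timeout=10):
    """Download a page and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def check_tag(page_text, expected):
    """Return 'missing', 'mismatch' or 'ok' for the monitored tag."""
    m = TAG_PATTERN.search(page_text)
    if m is None:
        return "missing"
    if expected not in m.group(1):
        return "mismatch"
    return "ok"
```

Regular expressions are fragile against real-world HTML, so this is only reasonable when the markup around the tag is stable and under your control.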
The following untested code uses pycurl and StringIO to fetch the page and re to search it:
import re
import StringIO
import pycurl

# Collect the response body in an in-memory buffer
b = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, '**insert url here**')
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
c.close()
m = re.search(r'**insert regex for the tag you want to find here**', b.getvalue())
if m is None:
    # take action for NOT found here
    pass
else:
    # take action for found here
    pass
Here is some untested sample code for logging and sending mail:
#!/usr/bin/env python
import logging
import smtplib
import urllib2

# Log config
logging.basicConfig(filename='/tmp/yourscript.log', level=logging.INFO)

def check_content(data):
    # Your BeautifulSoup logic here; set content_found accordingly
    content_found = False
    return content_found

def send_mail(message_body):
    server = 'localhost'
    recipients = ['you@yourdomain.com']
    sender = 'script@yourdomain.com'
    # Header lines must start at column 0, followed by a blank line
    message = 'From: %s\nSubject: script result\n\n%s' % (sender, message_body)
    session = smtplib.SMTP(server)
    session.sendmail(sender, recipients, message)
    session.quit()

# Open requested url
url = "http://yoursite.com/tags/yourTag"
data = urllib2.urlopen(url)
if check_content(data):
    # Report to log
    logging.info('Content found')
else:
    # Send mail
    send_mail('Content not found')
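For readers on Python 3, the reporting step could look roughly like the sketch below. It assumes the same placeholder addresses and local SMTP server as the code above. The function names build_message, send_via_smtp and report, and the logger name, are made up for this example. The mail sender is passed in as a parameter so the logic can be exercised without a running mail server.

```python
import logging
import smtplib
from email.message import EmailMessage

log = logging.getLogger("sitecheck")  # hypothetical logger name


def build_message(sender, recipients, body):
    """Build a plain-text message with properly formatted headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "script result"
    msg.set_content(body)
    return msg


def send_via_smtp(msg, server="localhost"):
    """Deliver the message through an SMTP server (assumed to be local)."""
    with smtplib.SMTP(server) as session:
        session.send_message(msg)


def report(content_found, send=send_via_smtp):
    """Log success, or mail a warning; return the message that was sent."""
    if content_found:
        log.info("Content found")
        return None
    msg = build_message(
        "script@yourdomain.com",
        ["you@yourdomain.com"],
        "Content not found",
    )
    send(msg)
    return msg
```

Calling report(False) with the default sender attempts real delivery through localhost, so pass a stub such as a list's append method when trying it out.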
I would use BeautifulSoup to write the check_content function.