Python: scraping a website daily and setting up alerts

python python-2.7 web-scraping

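I need to run a script every day that scrapes the economic calendar site used in the code below. Whenever the script runs, it should scrape that day's calendar (the equivalent of clicking the "daily" button).

I want to extract all of that day's data/events, filter by the relevant currencies where appropriate, and then create some kind of alert or popup 10 minutes before each event.

So far I am using the following code to scrape the page and then viewing/printing the variable html, but I cannot find the calendar information I need.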

import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage


class Render(QWebPage):
    """Load a URL with QtWebKit and keep the rendered main frame."""

    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block here until _loadFinished() quits the event loop.
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()


url = 'http://www.fxempire.com/economic-calendar/'
r = Render(url)
html = r.frame.toHtml()

In my opinion, the best way to scrape data from a web page is to use BeautifulSoup. Here is a quick script that fetches the data you want.

import re
from urllib2 import urlopen
from bs4 import BeautifulSoup


# Get a file-like object using urllib2.urlopen
url = 'http://ecal.forexpros.com/e_cal.php?duration=daily'
html = urlopen(url)

# BS accepts a lot of different data types, so you don't have to do e.g.
# urlopen(url).read(). It accepts file-like objects, so we'll just send in html
# as a parameter.
# Naming the parser explicitly keeps the result the same across machines.
soup = BeautifulSoup(html, 'html.parser')

# Loop over all <tr> elements with class 'ec_bg1_tr' or 'ec_bg2_tr'
for tr in soup.find_all('tr', {'class': re.compile('ec_bg[12]_tr')}):
    # Find the event, currency and actual price by looking up <td> elements
    # with class names.
    event = tr.find('td', {'class': 'ec_td_event'}).text
    currency = tr.find('td', {'class': 'ec_td_currency'}).text
    actual = tr.find('td', {'class': 'ec_td_actual'}).text

    # The returned strings are unicode, so we need a unicode format string
    # to print them.
    print u'{:3}\t{:6}\t{}'.format(currency, actual, event)
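
The script above prints every row, while the question also asks for a currency filter and an alert 10 minutes before each event. Below is a minimal sketch of that step. It assumes the rows have already been reduced to (time, currency, event) tuples and that the time is an 'HH:MM' string. The helper name alert_times, the time format, and the sample rows are all hypothetical; how the calendar page exposes the event time is not shown in this thread.

```python
from datetime import date, datetime, timedelta


def alert_times(rows, currencies, day, lead_minutes=10):
    """Return (alert_datetime, currency, event) tuples, earliest first.

    rows       -- iterable of (time_str, currency, event); time_str is
                  assumed to look like 'HH:MM' (hypothetical format)
    currencies -- set of currency codes to keep, e.g. {'USD', 'EUR'}
    day        -- datetime.date that the calendar refers to
    """
    lead = timedelta(minutes=lead_minutes)
    alerts = []
    for time_str, currency, event in rows:
        currency = currency.strip()
        if currency not in currencies:
            continue
        try:
            event_time = datetime.strptime(time_str.strip(), '%H:%M').time()
        except ValueError:
            # Rows without a clock time (e.g. all-day events) are skipped.
            continue
        alerts.append((datetime.combine(day, event_time) - lead,
                       currency, event.strip()))
    return sorted(alerts)


# Made-up rows, only to show the shape of the input.
sample_rows = [
    ('14:30', ' USD ', 'Example event A'),
    ('08:00', 'EUR', 'Example event B'),
    ('All Day', 'USD', 'Example holiday'),
    ('09:15', 'JPY', 'Example event C'),
]

for when, currency, event in alert_times(sample_rows, {'USD', 'EUR'},
                                         date(2014, 1, 15)):
    print('{:%H:%M}  {:3}  {}'.format(when, currency, event))
```

Each returned alert time could then be handed to something like threading.Timer to show the popup. That part depends on the desktop environment and is left out here.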

To give you some hints on how to solve similar problems in the future, I have written down the steps I followed when solving yours. I hope it helps.

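1. I opened the web page in Chrome, right-clicked and selected Inspect Element.
2. Looking through the Elements tab, I found the iframe that contains the information, and opened that URL.
3. I inspected that page as well and found that all the elements containing data were tr elements with the class ec_bg1_tr or ec_bg2_tr.
4. I knew from earlier encounters with BS that it can find all tr elements with the class ec_bg1_tr using soup.find_all('tr', {'class': 'ec_bg1_tr'}). My first idea was to loop over those elements and then loop over the ec_bg2_tr elements.
5. Then I thought that BS might be smart enough to accept a regexp as input, so I checked their docs, and it looked like that should not be a problem.
6. Following the recipe in the docs, I tried the simple regexp 'ec_bg[12]_tr'.
7. Ka-ching!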
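
The regexp from the last steps can be checked on its own before pointing it at the page. According to the Beautiful Soup documentation, a compiled regular expression is applied with its search() method, so the sketch below uses search() as well. Only ec_bg1_tr and ec_bg2_tr come from the answer; the other class names are made up to show what the pattern accepts and rejects.

```python
import re

row_class = re.compile('ec_bg[12]_tr')

# Only the first two names appear in the answer; the rest are
# hypothetical examples.
candidates = ['ec_bg1_tr', 'ec_bg2_tr', 'ec_bg3_tr', 'ec_td_event',
              'ec_bg1_tr_extra']

matches = [name for name in candidates if row_class.search(name)]
print(matches)

# search() matches anywhere in the string, so 'ec_bg1_tr_extra' is
# accepted too. Anchoring the pattern restricts it to the exact names.
strict_class = re.compile(r'^ec_bg[12]_tr$')
strict_matches = [name for name in candidates if strict_class.search(name)]
print(strict_matches)
```

For the page in this thread the unanchored pattern was enough. The anchored form only matters if other class names happen to contain the same text.
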
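Can you show us what you have so far?

Sorry, I have now updated the original post to include the code I was trying to use.

This is a really nice solution. I am now using it for fundamental analysis, and I have other tools, such as the ystockquote Python library, that I use together with some of my own code for technical analysis of my stocks! This is great and customizable to the max, @Steinar Lima. Thanks!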