Python web scraper and parent name issue

Tags: python, web-scraping, beautifulsoup, python-3.3

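I am trying to retrieve the dates inside div class="ipo-cell-height" together with the company names, for example 2/21/2014 and SUNDANCE ENERGY AUSTRALIA LTD. Here is the link to the website, and here is the HTML. This code contains a second div class="genTable thin floatL" style="width:315px".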


You can build a list of divs based on their CSS class. This uses requests and BeautifulSoup 3:

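import requests
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2 only)

req = requests.get('http://nasdaq.com/markets/ipos')
soup = BeautifulSoup(req.content)

# First table container, then every date/company cell inside it
ipo_divs = soup.findAll('div', {'class':'genTable thin floatL'})[0]
c = ipo_divs.findAll('div', {'class':'ipo-cell-height'})

# Cells alternate date, company. A list of pairs is used rather than a dict
# keyed by date, because several IPOs can share a date.
ipos = [(c[i].text, c[i + 1].text) for i in xrange(0, len(c) - 1, 2)]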
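
The pairing step at the end of that snippet can be tried on its own, without network access. The following is only a minimal sketch: `pair_cells` is a hypothetical helper name, and it assumes the cell texts strictly alternate between a date and a company name. It returns a list of tuples because the same date can belong to more than one IPO, and a dict keyed by date would keep only one company per date.

```python
def pair_cells(texts):
    """Pair alternating (date, company) cell texts into a list of tuples."""
    # Assumption: texts[0::2] are dates and texts[1::2] are company names.
    # zip() stops at the shorter slice, so a trailing unpaired cell is ignored.
    return list(zip(texts[0::2], texts[1::2]))


# Illustrative cell texts in page order (hypothetical input for this sketch).
cells = [
    "2/21/2014", "SUNDANCE ENERGY AUSTRALIA LTD",
    "2/14/2014", "INOGEN INC",
    "2/14/2014", "SEMLER SCIENTIFIC, INC.",
]

for date, company in pair_cells(cells):
    print("{} - {}".format(date, company))
```

Both companies dated 2/14/2014 survive here, which would not be the case with a date-keyed dict.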

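One approach is to iterate over all div elements whose class attribute has the value ipo-cell-height, use a regular expression to check whether the element's text matches a date, then find the next div element and print the text of both elements: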

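from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://www.nasdaq.com/markets/ipos/").read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
for div in soup.find_all('div', attrs={'class':'ipo-cell-height'}):
    s = div.string
    # div.string is None when the div does not contain a single string
    if s and re.match(r'\d{1,2}/\d{1,2}/\d{4}$', s):
        # The company name is in the next div after the date
        div_next = div.find_next('div')
        print('{} - {}'.format(s, div_next.string))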

Run it like this:

python3 script.py

This yields:

2/21/2014 - SUNDANCE ENERGY AUSTRALIA LTD
2/14/2014 - INOGEN INC
2/14/2014 - SEMLER SCIENTIFIC, INC.
10/9/2013 - SFX ENTERTAINMENT, INC
2/13/2014 - IIM GLOBAL CORP
2/12/2014 - Q2 HOLDINGS, INC.
2/12/2014 - RIMINI STREET, INC.
2/12/2014 - MARY FEED & SUPPLIES, INC.
2/11/2014 - 21ST CENTURY ONCOLOGY HOLDINGS, INC.
2/3/2014 - GRASSMERE ACQUISITION CORP
1/31/2014 - APTALIS HOLDINGS INC.
1/27/2014 - UNITED STATES CURRENCY FUNDS TRUST
1/22/2014 - CHRYSLER GROUP LLC
1/10/2014 - GCT SEMICONDUCTOR INC
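
The date check used in the second answer can also be exercised in isolation. The sketch below is an illustration under stated assumptions, not part of either answer: `parse_ipo_date` is a hypothetical helper that applies the same regular expression, tolerates `None` (which `div.string` can return), and additionally converts the text to a `datetime.date`, assuming the page uses month/day/year order as in the output above.

```python
import re
from datetime import datetime

# Same pattern as in the answer: M/D/YYYY with one- or two-digit month and day.
DATE_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{4}$')


def parse_ipo_date(text):
    """Return a datetime.date for text like '2/21/2014', else None."""
    if text is None:
        return None
    text = text.strip()
    if not DATE_RE.match(text):
        return None
    try:
        # Assumption: month/day/year order.
        return datetime.strptime(text, '%m/%d/%Y').date()
    except ValueError:
        # Matches the pattern but is not a real calendar date, e.g. 99/99/2014.
        return None


print(parse_ipo_date('2/21/2014'))   # 2014-02-21
print(parse_ipo_date('INOGEN INC'))  # None
print(parse_ipo_date(None))          # None
```

Returning real date objects makes it possible to sort or filter the scraped pairs by date rather than by string.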