Web scraping with Python 3.x (using PyCharm)


I want to scrape incubator information from the web using Python, but when I run the code I get nothing back. Here is my code. I'd appreciate your help.

import requests
from requests.exceptions import RequestException
import re

def get_one_page(url):
    try:
        r = requests.get(url)
        if r.status_code == 200:
            return r.text
        return None
    except RequestException:
        return None

def parse_one_page(html):
    pattern = re.compile('f14px c-blue.*?><a.*?>(.*?)</a>.*?fn14px c-666>(.*?)</td>')
    items = re.findall(pattern, html)
    for item in items:
        yield {
            'name': item[0],
            'address': item[1]
        }

def main(offset):
    url = 'http://www.cnfuhuaqi.com/couveuse/0-0-0-0-0-d%.aspx' % offset
    html = get_one_page(url)
    for item in parse_one_page(html):
        print(item)

if __name__ == '__main__':
    for i in range(2, 72):
        main(i)
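
A likely first culprit, before any parsing happens: the URL template is malformed. In '0-0-0-0-0-d%.aspx' the percent sign comes after the d, so the '%' interpolation raises a ValueError instead of building a URL (the working answer below uses '%d.aspx'). Two smaller issues compound it: main() never checks whether get_one_page() returned None, and the regex is compiled without re.S, so '.' cannot match across newlines in the page source. A minimal sketch of the corrected pieces, assuming the pages really carry those class names:

# Sketch of the fixes, not a verified scrape of the live site
pattern = re.compile(
    'f14px c-blue.*?><a.*?>(.*?)</a>.*?fn14px c-666>(.*?)</td>',
    re.S  # let '.' span newlines in the HTML
)

def main(offset):
    url = 'http://www.cnfuhuaqi.com/couveuse/0-0-0-0-0-%d.aspx' % offset  # '%d', not 'd%'
    html = get_one_page(url)
    if html is None:  # download failed; skip this page
        return
    for item in parse_one_page(html):
        print(item)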
Use an HTML parser such as BeautifulSoup. In this case you just need to select the element with the "zjfw-list-con" class and extract the tables it contains. The following extracts the image src URL, the link, and the description over two iterations (pages 2 and 3):


Bertrand, could you also take a look at the other question I asked? Please help!
from bs4 import BeautifulSoup
import requests

incubators = []

def extract_data(url):
    print("get data from {}".format(url))
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # Select the container with class "zjfw-list-con" and grab its tables.
    tables = soup.find_all("div", {"class": "zjfw-list-con"})[0].find_all("table")

    for table in tables:
        for subtable in table.find_all('table'):
            # Each inner table's first row holds three cells:
            # the image, the link, and the description.
            items = subtable.find('tr').find_all('td')
            item_tuple = (
                items[0].find('img')['src'],
                items[1].find('a')['href'],
                items[2].text.strip()
            )
            print(item_tuple)
            incubators.append(item_tuple)

url = 'http://www.cnfuhuaqi.com/couveuse/0-0-0-0-0-%d.aspx'

# Pages 2 and 3 only; widen the range to crawl more pages.
for i in range(2, 4):
    extract_data(url % i)

print("the full list : ")
for i in incubators:
    print(' '.join(i))
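
To cover the whole listing as in the original question (pages 2 through 71), the same function can be looped over the wider range; the sketch below also skips pages that fail to download or lack the expected markup (the try/except guard is an addition, not part of the answer above):

for i in range(2, 72):
    try:
        extract_data(url % i)
    except (requests.RequestException, IndexError):
        # IndexError: page without a "zjfw-list-con" block; skip it
        continue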