Python 3.x: How to target a specific element by class

Tags: python-3.x, web-scraping, beautifulsoup, scrapy

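I am trying to scrape a website called Startup India, collecting each company's URL and name. To get the URL and name I have to target the right elements, but I don't know which approach is correct. Please help.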


import logging
from bs4 import BeautifulSoup
import requests
import csv
import scrapy

class WebCrawlerPipeline(object):
    def process_item(self, item, spider):
        return item


class ProfileCrawlerPipeline(object):
    def open_spider(self, spider):
        self.urls = list()
        self.companies = list()
        pass

    def process_item(self, item, spider):
        item = dict(item)
        url = item.get('item')
        # yield scrapy.Request(url=url, callback=self.parse_content)
        # logging.info(url)
        r = requests.get(url).content
        soup = BeautifulSoup(r, 'html.parser')
        # url_txt = soup.select('div.container')
        container = soup.find("div", class_="container")
        logging.info(container)
        # # self.write_to_csv()
        # A pipeline should return the item so later pipelines receive it.
        return item

    def parse_content(self, response):
        logging.info(response.url)

    def close_spider(self, spider):
        pass

    def write_to_csv(self):
        pass


Code would be welcome.

You don't need to use BeautifulSoup with Scrapy.


I suggest you take a look at the Scrapy tutorial and use XPath or CSS selectors.

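I'd suggest using a simpler crawler framework. Here is an example.

FYI, it's scraping (scrape, scraping, scraped, scraper), not scrapping.

I understand what you mean. I'm used to Scrapy and would be happy to use it, but the problem is that when I make the request, Scrapy doesn't fire the callback, because I'm using beautifulsoup4.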