Python Scrapy - populate start_urls from MySQL

python mysql scrapy web-crawler


I am trying to populate start_urls in spider.py with URLs selected from a MySQL table. When I run "scrapy runspider spider.py" I get no output; it just finishes without any errors.

I have tested the SELECT query in a standalone Python script, and there start_urls is populated with the entries from the MySQL table.

spider.py

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import MySQLdb


class ProductsSpider(BaseSpider):
    name = "Products"
    allowed_domains = ["test.com"]
    start_urls = []

    def parse(self, response):
        print self.start_urls

    def populate_start_urls(self, url):
        conn = MySQLdb.connect(
                user='user',
                passwd='password',
                db='scrapy',
                host='localhost',
                charset="utf8",
                use_unicode=True
                )
        cursor = conn.cursor()
        cursor.execute(
            'SELECT url FROM links;'
            )
        rows = cursor.fetchall()

        for row in rows:
            self.start_urls.append(row[0])
        conn.close()

Do the populating in __init__:

def __init__(self):
    super(ProductsSpider,self).__init__()
    self.start_urls = get_start_urls()

This assumes get_start_urls() returns the URLs.
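The answer leaves get_start_urls() undefined, so here is a minimal sketch of what it might look like. The body is an assumption, not the answerer's code: this version takes a DB-API connection as an argument, and an in-memory sqlite3 database (built by the hypothetical make_demo_connection) stands in for MySQL so the snippet runs without a server. With MySQLdb you would pass the result of MySQLdb.connect(...) instead.

```python
import sqlite3


def get_start_urls(conn, query="SELECT url FROM links"):
    """Return every URL produced by `query` as a list of strings."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # Each row is a 1-tuple; keep only the url column.
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def make_demo_connection():
    """Hypothetical stand-in for MySQLdb.connect(...): an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.executemany(
        "INSERT INTO links (url) VALUES (?)",
        [("http://test.com/a",), ("http://test.com/b",)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_connection()
    print(get_start_urls(demo))
    demo.close()
```

Inside the spider's __init__ the call would then be self.start_urls = get_start_urls(conn), with conn opened just before and closed right after.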

A better approach is to override the start_requests method.

It can query your database, much like populate_start_urls does, and return a sequence of Request objects.

You would just need to rename your populate_start_urls method to start_requests, and change the following lines:

for row in rows:
    yield self.make_requests_from_url(row[0])

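Thanks for the reply. It worked; I just had to change def populate_start_urls(self, url): to def start_requests(self):. I have marked this as accepted because it is closest to the code I posted.

What do you do if you have 22 million sites to go through? I suppose you would have to iterate 1000 at a time. Could you show how to iterate using start_requests?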
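One possible way to iterate in chunks is to fetch rows with the DB-API fetchmany call and yield them one by one. The sketch below is an illustration under assumptions, not something from the answers above: the names iter_urls_in_batches and make_demo_links_db are made up, and sqlite3 again stands in for MySQL so the code can be run as-is.

```python
import sqlite3


def iter_urls_in_batches(conn, batch_size=1000, query="SELECT url FROM links"):
    """Yield URLs one at a time, pulling at most `batch_size` rows per fetch."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:  # an empty sequence means the result set is exhausted
                break
            for row in rows:
                yield row[0]
    finally:
        cursor.close()


def make_demo_links_db(n_rows):
    """Hypothetical stand-in for a MySQL connection: in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.executemany(
        "INSERT INTO links (url) VALUES (?)",
        [("http://test.com/page/%d" % i,) for i in range(n_rows)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_links_db(2500)
    total = sum(1 for _ in iter_urls_in_batches(demo, batch_size=1000))
    print(total)
    demo.close()
```

In the spider, start_requests could loop over iter_urls_in_batches(conn) and yield one request per URL, exactly as in the accepted answer's for loop. Because start_requests is a generator, the requests are produced on demand rather than all built up front. One caveat: as far as I know, MySQLdb's default cursor still copies the whole result set to the client when execute runs, so for 22 million rows MySQLdb.cursors.SSCursor, which streams rows from the server, is probably the better fit.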