Python: How do I assign the URL of what's being scraped?


I'm fairly new to Python and Scrapy, and this site has been an invaluable resource for my project so far, but now I'm stuck on what seems like a really simple problem. I may be thinking about it the wrong way. What I want to do is add a column to my output CSV that lists the URL each row's data was scraped from. In other words, I want the table to look like this:

item1    item2    item_url
a        1        http://url/a
b        2        http://url/a
c        3        http://url/b
d        4        http://url/b    
I'm using psycopg2 to get a set of URLs stored in a database, which I then scrape from. The code looks like this:

class MySpider(CrawlSpider):
    name = "spider"

    # querying the database here...

    #getting the urls from the database and assigning them to the rows list
    rows = cur.fetchall()

    allowed_domains = ["www.domain.com"]

    start_urls = []

    for row in rows:

        #adding the urls from rows to start_urls
        start_urls.append(row)

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            sites = hxs.select("a bunch of xpaths here...")
            items = []
            for site in sites:
                item = SettingsItem()
                # a bunch of items and their xpaths...
                # here is my non-working code
                item['url_item'] = row
                items.append(item)
            return items

As you can see, I want to make an item that just grabs the URL the parse function is currently on. But when I run the spider, it gives me "exceptions.NameError: global name 'row' is not defined". I think this is because Python doesn't recognize row as a variable inside the XPathSelector function, or something like that? (Like I said, I'm new to this.) Anyway, I'm stuck, and any help would be much appreciated.

Generate the start requests in the spider's start_requests() method instead of in the class body:


Not sure which line you're getting the exception on. Ideally, you should define the parse function outside the loop and pass row to it as a third argument. Could you also post the output of print type(row)?

Hi, thanks for the reply. The exception happens on this line: item['url_item'] = row. It says: exceptions.NameError: global name 'row' is not defined. Also, print type(row) shows it as a list. Thanks for the suggestion, I'll give it a try.

Hmm, given how a typical Scrapy parse function is structured, I'm not sure how to rewrite it so that it accepts three arguments.
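On that last point: rather than adding a third parameter to parse(), Scrapy callbacks usually read extra data from response.meta (filled in when the Request is created), or simply use response.url. The pattern can be shown without Scrapy at all; FakeResponse and the plain dict below are illustrative stand-ins, not Scrapy classes:

```python
# Stand-in for Scrapy's Response: it exposes .url and .meta just like the
# real one, so the callback logic is the same either way.
class FakeResponse:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

def parse(response):
    # Plain dict in place of SettingsItem, for illustration only.
    item = {}
    # Prefer the value handed along via meta; fall back to the response URL.
    item["url_item"] = response.meta.get("source_url", response.url)
    return item

resp = FakeResponse("http://url/a", {"source_url": "http://url/a"})
print(parse(resp)["url_item"])  # http://url/a
```

The same two-line change inside the question's real parse method (reading response.url, or response.meta if start_requests sets it) is what removes the dependence on the loop variable row.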