Extracting links from a web page using Python Scrapy


I am a beginner in Python and am using Scrapy to extract links from the following web page.

The code I wrote is:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from basketball.items import BasketballItem

class BasketballSpider(CrawlSpider):

   name = 'basketball'
   allowed_domains = ['basketball-reference.com/']
   start_urls = ['http://www.basketball-reference.com/leagues/NBA_2015_games.html']
   rules = [Rule(LinkExtractor(allow=['http://www.basketball-reference.com/boxscores/^\w+$']), 'parse_item')]

   def parse_item(self, response):
       item = BasketballItem()
       item['url'] = response.url
       return item

I run this code from the command prompt, but the file it creates does not contain any links. Can anyone help?

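The spider cannot find the links because the regular expression in the rule never matches: ^ anchors the start of the string, so it cannot match in the middle of a URL. Fix the regular expression in the rule:

rules = [
    Rule(LinkExtractor(allow=r'boxscores/\w+'))
]

Note that parse_item is not a default callback: the callback argument of Rule defaults to None, and a Rule without a callback only follows the links it matches. Pass callback='parse_item' if each matched page should be handed to that method.

Also, allow can be set to a single string instead of a list.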
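
The regex fix can be checked outside Scrapy with the standard re module. This is a minimal sketch, assuming the link extractor applies allow patterns with search semantics (a match anywhere in the URL). The box-score URL is a hypothetical example for illustration, not one taken from the question.

```python
import re

# Hypothetical box-score URL, used only for illustration.
url = "http://www.basketball-reference.com/boxscores/201410280LAL.html"

# Pattern from the question: "^" sits in the middle of the pattern,
# so it can never match after the "boxscores/" prefix.
broken = re.compile(r"http://www.basketball-reference.com/boxscores/^\w+$")

# Pattern from the answer: matches any URL that contains "boxscores/"
# followed by at least one word character.
fixed = re.compile(r"boxscores/\w+")

broken_matches = broken.search(url) is not None
fixed_matches = fixed.search(url) is not None

print(broken_matches)  # False
print(fixed_matches)   # True
```

The raw-string prefix (r'...') keeps the backslash in \w from being read as a string escape.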

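The start URL is different from the links I am trying to extract. For example, the web page I am extracting links from is "". What changes do I need to make to the code for the format of the links I am extracting?

@anandsingh Did this answer help? If so, please mark it as accepted, or, if you can, post the answer that worked for you and accept that.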
rules = [
    # Raw string so that \w reaches the regex engine unchanged.
    Rule(LinkExtractor(allow=r'boxscores/\w+'), callback='parse_item')
]