Python regex: "exceptions.IndexError: no such group"
Yesterday I asked a similar question, but I don't think I explained clearly what I was trying to do. I have the following code:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import remove_tags
import time
import re
import json
import requests
class ExampleSpider(CrawlSpider):
    name = "goal2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Teams/32/"]
    rules = [Rule(SgmlLinkExtractor(allow=('\Teams'),deny=(),), follow=False, callback='parse_item')]

    def parse_item(self, response):
        stagematch = re.compile("data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{.*},",re.S)
        stagematch2 = re.search(stagematch, response.body)
        if stagematch2 is not None:
            stagematch3 = stagematch2.group(1)
            stageid = json.loads(stagematch3)
            stageid = stageid[0]['StageId']
            print stageid
With this, I am trying to parse a script on the page, which has the following format:
data:{
    url: 'stage-player-stat'
},
defaultParams: {
    stageId: 9155,
    teamId: 32,
    playerId: -1,
    field: 2
},
I want to extract the value of stageId from it, which is 9155 in this example. However, this raises the following error:
stagematch3 = stagematch2.group(1)
exceptions.IndexError: no such group
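I assume this is because the regular expression I am using is invalid, but I can't see where the problem is. Can anyone tell me what is going wrong?
Thanks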
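Your pattern contains no capturing parentheses, so the match has no group 1 and stagematch2.group(1) raises IndexError. Use this pattern instead, which adds a capture group: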
data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{\s*(.*?),.*},
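To see the difference, here is a minimal sketch that runs both patterns against the snippet from the question. It is written for Python 3 and works on a plain string rather than a Scrapy response. The names SCRIPT, ORIGINAL_PATTERN, ANSWER_PATTERN, STAGE_ID_PATTERN and extract_stage_id are made up for this illustration. Note that with the pattern above, group 1 is the text "stageId: 9155", which is not valid JSON, so passing it to json.loads would still fail. STAGE_ID_PATTERN is one possible narrower variant (an assumption, not part of the pattern above) that captures only the digits.

```python
import re

# Sample text copied from the script snippet in the question.
SCRIPT = """
data:{
    url: 'stage-player-stat'
},
defaultParams: {
    stageId: 9155,
    teamId: 32,
    playerId: -1,
    field: 2
},
"""

# The question's pattern: it matches, but defines no capture group.
ORIGINAL_PATTERN = re.compile(
    r"data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{.*},",
    re.S,
)

# The pattern from this answer: the parentheses around .*? define group 1.
ANSWER_PATTERN = re.compile(
    r"data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{\s*(.*?),.*},",
    re.S,
)

# Hypothetical narrower variant: capture only the digits after stageId.
STAGE_ID_PATTERN = re.compile(
    r"data:\s*{\s*url:\s*'stage-player-stat'\s*},\s*defaultParams:\s*{\s*stageId:\s*(\d+)"
)


def extract_stage_id(text):
    """Return the stageId as an int, or None if the block is not found."""
    match = STAGE_ID_PATTERN.search(text)
    return int(match.group(1)) if match else None


# Reproduce the error: the match succeeds, but there is no group 1 to return.
original_match = ORIGINAL_PATTERN.search(SCRIPT)
try:
    original_match.group(1)
    original_error = None
except IndexError as exc:
    original_error = str(exc)

# With the capture group in place, group 1 holds the first "key: value" pair.
captured = ANSWER_PATTERN.search(SCRIPT).group(1)

# The narrower variant yields the number directly.
stage_id = extract_stage_id(SCRIPT)

print(ORIGINAL_PATTERN.groups)  # 0
print(original_error)           # no such group
print(captured)                 # stageId: 9155
print(stage_id)                 # 9155
```

If you keep the pattern from this answer as it is, you can also split the captured text on the colon and convert the right-hand side with int() instead of calling json.loads.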