Unable to scrape by iterating over <br> tags using Scrapy and CSS
The HTML code is as follows:
<td class="column-3">
(price per 1,000 images)<br>
0-1M images -
<span class="price-data " data-amount='{"regional":{"asia-pacific-southeast":0.5,"australia-east":0.5,"brazil-south":0.5,"canada-central":0.5,"central-india":0.5,"europe-north":0.5,"europe-west":0.5,"united-kingdom-south":0.5,"us-east":0.5,"us-east-2":0.5,"us-south-central":0.5,"us-west-2":0.5,"us-west-central":0.5}}' data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.50</span> <br>
1M-5M images -
<span class="price-data " data-amount='{"regional":{"asia-pacific-southeast":0.4,"australia-east":0.4,"brazil-south":0.4,"canada-central":0.4,"central-india":0.4,"europe-north":0.4,"europe-west":0.4,"united-kingdom-south":0.4,"us-east":0.4,"us-east-2":0.4,"us-south-central":0.4,"us-west-2":0.4,"us-west-central":0.4}}' data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.40</span> <br>
5M+ images -
<span class="price-data " data-amount='{"regional":{"asia-pacific-southeast":0.325,"australia-east":0.325,"brazil-south":0.325,"canada-central":0.325,"central-india":0.325,"europe-north":0.325,"europe-west":0.325,"united-kingdom-south":0.325,"us-east":0.325,"us-east-2":0.325,"us-south-central":0.325,"us-west-2":0.325,"us-west-central":0.325}}' data-decimals="3" data-decimals-force="0" data-region-unavailable="N/A" data-has-valid-price="true">$0.325</span> <br>
</td>
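URL:
How can I iterate over this and scrape the data? I want to split the td tag into as many pieces as there are <br> tags and then scrape each piece. I don't want to use XPath; I want to get the result with CSS.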
import json
from scrapy.selector import Selector

dumb = 'Your response, or above text'
html_dumb = Selector(text=dumb)
# All non-empty text nodes directly inside the td, with the trailing "- " removed
td_vals = [x.strip().strip('- ') for x in
           html_dumb.xpath("//td/text()").extract() if x.strip()]
f_val = td_vals[0]  # separate the first one, here "(price per 1,000 images)"
td_vals = td_vals[1:]
# All span data-amount values; you can also get the span text if you need it
span_vals = [x.strip() for x in html_dumb.xpath("//span/@data-amount").extract() if x.strip()]
inner_json = {}
result = {}
for td_val, span_val in zip(td_vals, span_vals):
    inner_json[td_val] = json.loads(span_val)  # build the inner dictionary
result[f_val] = inner_json  # attach it to the outer one
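Output:
{u'(price per 1,000 images)': {
u'5M+ images': {u'regional': {u'united-kingdom-south': 0.325, u'europe-north': 0.325, u'brazil-south': 0.325, u'us-west-2': 0.325, u'us-south-central': 0.325, u'central-india': 0.325, u'us-east': 0.325, u'canada-central': 0.325, u'europe-west': 0.325, u'us-east-2': 0.325, u'us-west-central': 0.325, u'asia-pacific-southeast': 0.325, u'australia-east': 0.325}},
u'0-1M images': {u'regional': {u'united-kingdom-south': 0.5, u'europe-north': 0.5, u'brazil-south': 0.5, u'us-west-2': 0.5, u'us-south-central': 0.5, u'central-india': 0.5, u'us-east': 0.5, u'canada-central': 0.5, u'europe-west': 0.5, u'us-east-2': 0.5, u'us-west-central': 0.5, u'asia-pacific-southeast': 0.5, u'australia-east': 0.5}},
u'1M-5M images': {u'regional': {u'united-kingdom-south': 0.4, u'europe-north': 0.4, u'brazil-south': 0.4, u'us-west-2': 0.4, u'us-south-central': 0.4, u'central-india': 0.4, u'us-east': 0.4, u'canada-central': 0.4, u'europe-west': 0.4, u'us-east-2': 0.4, u'us-west-central': 0.4, u'asia-pacific-southeast': 0.4, u'australia-east': 0.4}}}}

It is completely unclear what you are asking for. CSS cannot "split" anything, nor can it "find matches". Could you clarify what data you are trying to retrieve? Or are you just asking whether you can iterate over the br elements?

I want the output to look like this:
{"price per 1,000 images": {
"0-1M images": {"germany-central": 0.009, "united-kingdom-south": 0.01, "europe-north": 0.008, "us-east-2": 0.009},
"1M-5M images": {"germany-central": 0.009, "united-kingdom-south": 0.01, "europe-north": 0.008, "us-east": 0.009, "asia-pacific-east": 0.01, "united-kingdom-west": 0.01},
"5M+ images": {"germany-central": 0.009, "united-kingdom-south": 0.01, "europe-north": 0.008}}}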