Python: How to export Scrapy's results to a specific JSON format?
I am using Scrapy to crawl and scrape StackOverflow.com. This is so.py:
import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        # Follow the link of every question listed on the start page.
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Each scraped item is a dict holding only the question URL.
        yield {
            'link': response.url,
        }
Expected result: so.json (in valid JSON format)
Then I run:
scrapy runspider so.py -o so.json
The result does not match what I expected, and I am stuck here. I tried using the FEED_FORMAT=jsonlines setting:
scrapy runspider so.py -o so.json --set FEED_FORMAT=jsonlines
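A note on why this setting does not produce the expected file: jsonlines writes one JSON object per line, so each record is still an object with a link key and the file as a whole is not a single JSON array. As a rough sketch of a post-processing workaround (not something proposed in this thread; the function name jsonlines_to_link_list is hypothetical), the lines could be flattened with the standard json module:

```python
import json


def jsonlines_to_link_list(jsonlines_text):
    """Collect the 'link' value of every non-empty JSON Lines record."""
    links = []
    for line in jsonlines_text.splitlines():
        line = line.strip()
        if line:
            links.append(json.loads(line)["link"])
    return links


# Two records in the shape the jsonlines feed produces for this spider.
sample = (
    '{"link": "https://stackoverflow.com/questions/36421917/'
    'exponential-number-in-custom-number-format-of-excel"}\n'
    '{"link": "https://stackoverflow.com/questions/36421343/'
    'can-not-install-requirements-txt"}\n'
)

# indent=0 puts each array element on its own line without indentation.
flat_json = json.dumps(jsonlines_to_link_list(sample), indent=0)
print(flat_json)
```

This keeps the spider untouched, at the cost of a second step after the crawl.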
If you want to get output like this:
[
"https://stackoverflow.com/questions/36421917/exponential-number-in-custom-number-format-of-excel",
"https://stackoverflow.com/questions/36421343/can-not-install-requirements-txt",
"https://stackoverflow.com/questions/36418815/difference-between-two-approaches-to-pass-parameters-to-web-server",
"https://stackoverflow.com/questions/36421743/sharing-an-oracle-database-connection-between-simultaneous-celery-tasks",
"https://stackoverflow.com/questions/36421941/jquery-add-css-style"
]
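then you should write your own ItemExporter.
The result of running the above command still does not match my expected result.
After applying the revised answer, I ran the command again, and the result is still not what I expected. Please help me solve this!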