
Python: scrape article links, then scrape each link to get the author from the article

python web-scraping scrapy


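So I'm using Scrapy.

As you can see in the first image, I'm getting the articles on the left through their links. [image 1]

Once I have this link, how do I keep scraping: follow the link to the article and then scrape the content of that article (image 2)?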

Here is my code:

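import scrapy


class QuotesSpider(scrapy.Spider):
    name = "japan"
    # allowed_domains takes bare domain names (subdomains included), not URLs;
    # a trailing slash makes follow-up requests get filtered as offsite
    allowed_domains = ['japantimes.co.jp']
    start_urls = ['https://www.japantimes.co.jp/']

    # parse must be indented so it is a method of the spider
    def parse(self, response):
        # one node per article block on the front page (stray "]" removed)
        all_articles = response.xpath('//div[@class="section_title small single_block"]')

        for links in all_articles:
            # href of the first link inside this block (may be relative)
            the_link = links.xpath('.//a/@href').extract_first()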

Now that I have the links, how do I scrape the data in each of them?

To make your spider issue new requests, you need to yield Request objects to Scrapy's engine:
from scrapy import Request
...
yield Request(url=URL_OF_THE_PAGE, callback=CALLBACK_PARSE_FUNCTION)


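However, in your case you should first make sure that the link variable actually holds a URL. Once that is fixed, check whether the URL is absolute; if it is not, you can use response.urljoin(link) to build the full URL, using your response's URL as the base.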
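For an HTML page without a <base href> tag, response.urljoin(href) resolves href against response.url the same way the standard library's urllib.parse.urljoin does. A small stdlib sketch of the absolute-URL check described above; the helper names absolute_url and is_absolute are hypothetical:

```python
from urllib.parse import urljoin, urlparse


def is_absolute(href):
    # An absolute URL carries its own scheme and host
    parts = urlparse(href)
    return bool(parts.scheme and parts.netloc)


def absolute_url(page_url, href):
    # Mirrors response.urljoin(href) when the page has no <base> tag:
    # absolute hrefs are returned unchanged, relative ones are resolved
    # against the URL of the page they were found on
    return urljoin(page_url, href)


print(absolute_url("https://www.japantimes.co.jp/", "/news/a1/"))
```

In practice you can call response.urljoin unconditionally, since joining an already absolute URL leaves it as it is.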
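Thanks, this works great. However, when I call yield Request(url=URL_OF_THE_PAGE, callback=CALLBACK_PARSE_FUNCTION) inside the for loop, it keeps sending the same link to the callback function over and over instead of all the different links. When I look at the .json file, only the first link was processed, not all of them.

You can create a new function to parse this request and pass that function as the request's callback argument.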



