Python: problem scraping multiple pages when web scraping

python selenium web-scraping

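Hello, I am trying to scrape the following link, "https://eprocure.gov.in/eprocure/app;jsessionid=9AD8A7A17E1B2868527E25799DBE45A2.eprocgep2?page=FrontEndLatestActiveTenders&service=page", using bs4 in Python. Everything works for the first page. But when I navigate to the next page, the URL pattern changes completely. Here is the URL of the next page: "https://eprocure.gov.in/eprocure/app?component=%24TablePages.linkPage&page=FrontEndLatestActiveTenders&service=direct&session=T&sp=AFrontEndLatestActiveTenders%2Ctable&sp=2". Because the pattern changes, I cannot automate scraping each page. When I tried to scrape the second page manually, the soup object did not pick up any tags, even though the browser's network inspector shows those tags on the second page. Can anyone help me scrape all the pages? Please share your solution.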
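The second URL in the question can be rebuilt from its query parameters, which makes the pattern easier to see. The sketch below is only an illustration: `BASE_URL` and `build_page_url` are hypothetical names, and it assumes that only the last `sp` value changes from page to page, which the question shows for page 2 only. The `session=T` parameter suggests the server expects an existing session, so such a URL may return an empty page unless it is requested with the cookies obtained from the first page.

```python
from urllib.parse import urlencode

# Assumption: the path is the same for every page of the listing.
BASE_URL = "https://eprocure.gov.in/eprocure/app"


def build_page_url(page_number):
    """Return the listing URL for a 1-based page number (hypothetical helper)."""
    # A list of pairs is used because the "sp" key appears twice.
    params = [
        ("component", "$TablePages.linkPage"),
        ("page", "FrontEndLatestActiveTenders"),
        ("service", "direct"),
        ("session", "T"),
        ("sp", "AFrontEndLatestActiveTenders,table"),
        ("sp", str(page_number)),  # assumed to be the page index
    ]
    return f"{BASE_URL}?{urlencode(params)}"


print(build_page_url(2))
```

For page 2 this reproduces the URL quoted in the question, including the `%24` and `%2C` escapes.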
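Can you post the code you have tried?

I can't paste the whole code. Do you have an email ID or Skype ID? That would be helpful.

You can add the code that is relevant to the question, especially how you plan to navigate through the pages.

    import re

    # Collect the links whose id starts with "linkPage"
    productLinks = soup.findAll('a', attrs={'id': re.compile(r'linkPage.*')})
    # Start with page 1, then add the page number shown in each link's text
    pageNumList = [1]
    for item in productLinks:
        pageNumList.append(int(item.text))
    urlList = ["{}&pageNum={}".format(baseUrl, str(page)) for page in pageNumList]

@ShirsenduMazumdar The link provided works without any problem!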
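One possible way to combine the pieces discussed above is to follow the `linkPage` links inside a single HTTP session instead of formatting page URLs by hand. This is a rough sketch, not a confirmed fix: it assumes the second page came back empty because the request did not carry the session cookies from the first page, which the thread does not verify. `absolute_page_urls` and `scrape_all_pages` are hypothetical names, and the pager may list only some of the pages at a time.

```python
import re
from urllib.parse import urljoin

# Same id filter as in the comment above.
LINK_ID = re.compile(r"linkPage.*")


def absolute_page_urls(hrefs, current_url):
    """Resolve pager hrefs against the page they came from, dropping duplicates."""
    seen, urls = set(), []
    for href in hrefs:
        url = urljoin(current_url, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def scrape_all_pages(first_url):
    """Return one BeautifulSoup object per page reachable from the first page."""
    # Third-party imports are kept local so the helper above works without them.
    import requests
    from bs4 import BeautifulSoup

    pages = []
    with requests.Session() as session:  # one session, so cookies are reused
        response = session.get(first_url, timeout=30)
        response.raise_for_status()
        first = BeautifulSoup(response.text, "html.parser")
        pages.append(first)
        hrefs = [a["href"] for a in first.find_all("a", id=LINK_ID, href=True)]
        for url in absolute_page_urls(hrefs, response.url):
            page = session.get(url, timeout=30)
            page.raise_for_status()
            pages.append(BeautifulSoup(page.text, "html.parser"))
    return pages
```

Calling `scrape_all_pages` with the first URL from the question would then give a list of soup objects to parse in the same way as the first page. If the listing is rendered by JavaScript, a browser driven by Selenium would be needed instead.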
