Python: Can't make the script finish its task in the conventional way
Tags: python, python-3.x, selenium, web-scraping, multiprocessing

I have written a script using selenium and implemented multiprocessing within it. The script works fine and I can see all the results in the console. However, when execution completes, I don't see any sign at the bottom of the IDE indicating that the process has finished. The images below were taken from Python's default IDE and Sublime Text.

How can I terminate the process once execution is complete?

It seems that the process terminates just fine, but if you want to make sure the process has terminated, just import sys and include sys.exit() at the end.

I think the only potential problem is that, in trying to be efficient by creating one selenium driver per thread, you neglected to "quit" the drivers. When all submitted jobs have completed, the drivers and their driver processes may very well not terminate, especially when running in an IDE. I would make the following changes:
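First, add a class Driver that creates the driver instance and stores it in thread-local storage, but that also has a destructor which quits the driver when the thread-local storage is deleted.

Second, update create_browser so that it uses the new Driver class.

Finally, after you have retrieved the future results, add lines that delete the thread-local storage (del threadLocal) and then run import gc and gc.collect(), which forces the destructors of the Driver instances to be called (hopefully).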
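The following is a minimal sketch of what the Driver wrapper and the revised create_browser could look like; it is not the answer's original code. The names threadLocal, Driver and create_browser follow the answer. The headless Chrome options, the make_chrome_driver helper and the factory parameter are assumptions added for illustration (the factory hook is there only so the sketch can be exercised without a real browser).

```python
import threading

# One storage object shared by all threads; each thread sees its own attributes.
threadLocal = threading.local()


def make_chrome_driver():
    """Build a headless Chrome driver (assumes selenium and chromedriver are installed)."""
    from selenium import webdriver  # imported lazily so the sketch loads without selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # assumption: headless mode
    return webdriver.Chrome(options=options)


class Driver:
    """Wraps one WebDriver and quits it when the wrapper is destroyed."""

    def __init__(self, factory=make_chrome_driver):
        # `factory` is a hypothetical hook; by default it builds a real Chrome driver.
        self.driver = factory()

    def __del__(self):
        # __init__ may have failed before self.driver was assigned.
        driver = getattr(self, "driver", None)
        if driver is not None:
            driver.quit()


def create_browser(factory=make_chrome_driver):
    """Return the calling thread's driver, creating it on first use."""
    the_driver = getattr(threadLocal, "the_driver", None)
    if the_driver is None:
        the_driver = Driver(factory)
        threadLocal.the_driver = the_driver
    return the_driver.driver
```

Each worker thread calls create_browser() and gets its own driver. In CPython the destructor normally runs as soon as the last reference to a Driver instance disappears, but the language does not guarantee when (or whether) __del__ is called, which is why the cleanup is followed by gc.collect() as extra insurance.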
I print "Done" after the call to gc.collect(). However, on my Windows desktop I do see the following logged messages:
[1024/092605.493:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092605.562:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092605.579:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092605.592:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092605.634:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
...
[1024/092617.865:INFO:CONSOLE(118)] "The deviceorientation events are blocked by feature policy. See https://github.com/WICG/feature-policy/blob/master/features.md#sensor-features", source: https://z.moatads.com/chaseusdcm562975626226/moatad.js (118)
[1024/092617.949:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.015:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.456:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.479:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.570:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.738:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.849:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
[1024/092618.928:INFO:CONSOLE(0)] "Error with Feature-Policy header: Unrecognized feature: 'speaker'.", source: (0)
Here is my output:
ImportXML XPath issue using Google Sheets on a web scraping query
Scrapy meta or cb_kwargs not passing properly between multiple methods
How to seperate a list into table formate using python
How can I extract a table from wikipedia using Beautiful soup
Load a series of payload requests and perform pagination for each one of them
Pandas read_html not reading text properly
Getting text nested text in non-static webpage with httr in R [closed]
Scraping data with duplicate column headers [closed]
I keep getting [ TypeError: 'function' object is not iterable ] every time I try to iterate over the result of my function which returns an iterable [closed]
selnium and beutifulsoup scrapper very inconsistent
Web scraping the required content from a url link in R
Web-scrapping pop-up info generated by hovering over canvas element (Python/Selenium)
Daily leaderboard or price tracking data
Scrape PDF embedded in .php page
Beautiful Soup returning only the last URL of a txt file
Having trouble in scraping table data using beautiful soup
Authentication - Security Window - Rvest R
How can I read an iframe content inside another iframe using Puppeteer?
Xamarin.Forms: is there a way to update the style of web page displayed in a WebView with scraping?
Counter not working in for(i=0; ++i) loop node.js
Python: selenium can't read an specific table
Scraped json data want to output CSV file
Unable to scrape “shopee.com.my” top selling products page
How to click a menu item from mobile based website in selenium Python?
Does selenium in standalone mode has limitation for maximum number of sessions can be present at a time?
Error while capturing full website screen shot
How to retrieve SharePoint webpage code(html) or Scrape a sharepoint webpage?
API web data capture
Selenium select disappearing webelement
Python SQlite Query to select recently added data in the table
Webscraping with varying page numbers
How to extract contents between div tags with rvest and then bind rows
Why is the previous request aborting if I send a new request to the flask server? [closed]
Does anyone know how to click() on an href within data-bind using selenium? [closed]
Web Scraping on login sites with Python
How do I render image, title and link to template from views using one 'for loop'
Regex on List Comprehension Not Producing List But List of Lists Instead [duplicate]
How to get all tr id by using python selenium?
Scrapy - TypeError: can only concatenate str (not “list”) to str
I need to save scraped urls to a csv file in URI format. file won't write to csv
Scrapy keeps giving me the errot AttributeError: 'str' object has no attribute 'text'
How to scrape the different content with the same html attributes and values?
I can not scrape Google news with Beautiful soup. I am getting the error:TypeError: 'NoneType' object is not callable [closed]
selenium while loop error on load more button
Python- Selenium/BeautifulSoup PDF & Table scraper
Crawling all page with scrapy and FormRequest
Web Scrape COVID19 Data from Download Button in R
How do 3rd party app stores know when a new app is added to Google Play?
Scraping hidden leaderboard data from site
Cannot access a table shown in a Tableau Public Dashboard
Update 2

If you are hanging on a result, you might consider using a timeout:
    if __name__ == '__main__':
        base = "https://stackoverflow.com{}"
        URL = "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=1&pagesize=50"
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            future_to_url = {executor.submit(get_title, link): link for link in get_links(URL)}
            for future in future_to_url:
                try:
                    # wait at most 30 seconds for each result
                    print(future.result(30))
                except concurrent.futures.TimeoutError as e:
                    url = future_to_url[future]
                    print('TimeoutError for URL', url)
        # outside the with block: drop the thread-local storage so the drivers get quit
        del threadLocal
        import gc
        gc.collect() # a little extra insurance
        print('done')
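The timeout behaviour of the loop above can be tried in isolation, without selenium. This is a small sketch under assumptions: slow_task is a hypothetical stand-in for get_title that simply sleeps, and collect is an illustrative helper, not part of the original script.

```python
import concurrent.futures
import time


def slow_task(seconds):
    """Hypothetical stand-in for get_title: sleeps, then returns its argument."""
    time.sleep(seconds)
    return seconds


def collect(delays, timeout):
    """Submit one task per delay and wait at most `timeout` seconds for each result."""
    outcomes = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
        future_to_delay = {executor.submit(slow_task, d): d for d in delays}
        for future in future_to_delay:
            try:
                outcomes.append(("ok", future.result(timeout)))
            except concurrent.futures.TimeoutError:
                # The task is still running; we only stop waiting for it here.
                outcomes.append(("timeout", future_to_delay[future]))
    return outcomes


if __name__ == "__main__":
    print(collect([0.01, 1.0], timeout=0.3))
```

A timeout in future.result() does not cancel the task: the with block still waits for every submitted task to finish when it exits, so a task that never returns would still keep the script from reaching the cleanup lines.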
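Note that I am no longer using as_completed, because I want to be able to specify a timeout value rather than wait indefinitely for a result. Here I have specified a value of 30 seconds, which should be ample time for a thread to initialize its driver and get its first result. If you are indeed hanging on a result, a TimeoutError message should be printed and execution should continue.

Can you clarify your question? The script runs fine and exits with code 0.

In my case, when execution completes I can't see any line like the one you are getting (Process finished with exit code 0), so the script appears to still be running even though it has finished. I just want to see that line to be sure it is done.

Maybe the problem is how you run it? I don't know which OS you are using, but running the script through PyCharm and through bash gives the same output. I'm on Linux and everything looks fine. Sometimes no (error) message is good news.

I'm on Win7, 32-bit. I used Python's default IDE and Sublime Text in my tests.

Although I was supposed to put the line outside the with block as you suggested, I put it just outside the for loop, and the script didn't seem to reach that line, so it still got stuck when execution finished. Thanks.

Using the line del threadLocal outside the with block, in the main function, seems to have fixed the issue. Thanks for your solution, @Booboo.

Did you also use the code that ensures driver.quit() is called, i.e. the Driver class I suggested?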