Run a Python script until it matches the desired result (Python, jQuery, Web Scraping, BeautifulSoup, Automation)

python jquery web-scraping automation

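I'm trying to scrape some data from a website that updates dynamically after some period of time. That means the HTML div I'm scraping does not exist on the page all the time.

I want to grab a number from it, copy it, and paste it wherever I need it.

So far I have tried something like the code below, which gives me the result locally. But when I scrape the live website, it raises an error because the HTML element does not exist.

I want the script to keep running even after the error occurs, because I'm sure it will do its job once the element is there.

My code: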


from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen as uReq  # Web client
import re
import time
import pyperclip


while True:

    page_url = "https://www.example.com/"

    uClient = uReq(page_url)

    page_soup = soup(uClient.read(), "html.parser")


    numbers = page_soup.find('div',{'id':'number-id'}).find('span').get_text()
    time.sleep(5*60)

It gives me this error:

  File "t.py", line 23, in <module>
    codes = page_soup.find('div',{'id':'number-id'}).find('span').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Can anyone help me solve this?

You can use try and except, for example:
try:
    numbers = page_soup.find('div',{'id':'number-id'}).find('span').get_text()
except AttributeError:
    # find() returned None because the element is not on the page yet
    pass

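That said, using try and except inside a while True loop is generally not recommended, because you can end up in an infinite loop. You can address this by adding a break condition, for example: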
attempts = 0

while True:
    if attempts == 10:
        break
    page_url = "https://www.example.com/"

    uClient = uReq(page_url)

    page_soup = soup(uClient.read(), "html.parser")

    try:
        numbers = page_soup.find('div',{'id':'number-id'}).find('span').get_text()
        match = re.search(r'\d{5,}', numbers)
        card = match.group(0)
        pyperclip.copy(card)
        pyperclip.paste()
    except AttributeError:
        # Raised when the div, the span, or the regex match is missing
        attempts += 1

    time.sleep(5*60)

This fails 10 times before breaking out of the loop; if you need more attempts, just change the 10 in if attempts==10: to a larger number.
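As a rough illustration of the bounded-retry idea, the loop can be factored into a small helper. This is only a sketch under assumptions: the name poll_until_found, its parameters, and the injectable sleep function are hypothetical and not part of the original answer, and unlike the loop above it returns as soon as a value is found.

```python
import time
from typing import Callable, Optional


def poll_until_found(
    fetch: Callable[[], Optional[str]],
    max_attempts: int = 10,
    delay_seconds: float = 5 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it returns a value or max_attempts misses occur.

    fetch is a hypothetical callable that returns the scraped text, returns
    None when the element is absent, or raises AttributeError the way a
    chained .find(...).get_text() does on a missing element.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
        except AttributeError:
            # Treat "element not on the page" as a miss, not a crash.
            result = None
        if result is not None:
            return result
        if attempt < max_attempts:
            # Wait before the next attempt; skipped after the final miss.
            sleep(delay_seconds)
    return None
```

Passing sleep as a parameter keeps the helper easy to try out without waiting five minutes between attempts.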
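I suggest doing this step by step and checking that each intermediate result is defined, instead of chaining everything in one expression like this:
numbers = page_soup.find('div',{'id':'number-id'}).find('span').get_text()

When there is no match, soup returns None, so the chained call ends up calling None.get_text(..), which is not valid.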
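A minimal sketch of that step-by-step check, written as a function so it can be tried on a small HTML string. The function name find_number_text and the sample markup in the usage note are assumptions for illustration; only the 'number-id' div and its span come from the question.

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_number_text(html: str) -> Optional[str]:
    """Return the text of the span inside div#number-id, or None if absent."""
    page_soup = BeautifulSoup(html, "html.parser")

    # Step 1: the div may be missing on some page loads.
    div = page_soup.find("div", {"id": "number-id"})
    if div is None:
        return None

    # Step 2: the div may be present while the span is not.
    span = div.find("span")
    if span is None:
        return None

    # Step 3: both elements exist, so get_text() is safe to call.
    return span.get_text()
```

For example, find_number_text('<div id="number-id"></div>') returns None instead of raising, so the calling loop can simply sleep and try again.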
Edit: changed the code to continue on (not span) instead of div.
Edit: the whole code should now look like this:
from bs4 import BeautifulSoup as soup  # HTML data structure
from urllib.request import urlopen  # Web client
import re
import time
import pyperclip

page_url = "https://www.example.com/"
while True:
    # urlopen returns a resource; the with statement closes it once the block ends.
    with urlopen(page_url) as response:
        page_soup = soup(response.read(), "html.parser")

    # Assignment expressions (:=) need Python 3.8+, see https://docs.python.org/3/whatsnew/3.8.html
    if div := page_soup.find('div', {'id': 'number-id'}):
        if span := div.find('span'):
            numbers = span.get_text()
            # re.search returns None when there is no run of 5+ digits
            if match := re.search(r'\d{5,}', numbers):
                card = match.group(0)
                pyperclip.copy(card)
                pyperclip.paste()
                # break
    time.sleep(5*60)
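The regular-expression step can fail on its own: re.search returns None when the text contains no run of five or more digits, and calling .group(0) on None raises the same kind of AttributeError as a missing element. A small sketch of guarding that step follows; the helper name extract_code is hypothetical.

```python
import re
from typing import Optional


def extract_code(text: str) -> Optional[str]:
    """Return the first run of five or more digits in text, or None."""
    # A raw string keeps \d from being treated as a string escape.
    match = re.search(r"\d{5,}", text)
    # Only call group() when a match object was actually returned.
    return match.group(0) if match else None
```

The caller can then copy the value to the clipboard only when extract_code returns something other than None.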

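try ... except: pass

@PedroLobito The script works fine now. I have tested it locally and it does what I want. But when I run it against the live website, it runs without errors yet gives no result. The span I'm trying to scrape appears every 5 minutes, I think through some jQuery script. Why do you think it doesn't catch that span when it appears on the website?

I want an infinite loop. Let me try your suggestion.

I tried using your code and got this error: Traceback (most recent call last): File "t.py", line 28, in match = re.search('\d{5,}', numbers) NameError: name 'numbers' is not defined

@aayus: Of course it isn't defined: when the previous line fails, there is no numbers to operate on.

@aayus: However, your other comments suggest you may need to start from …

File "t.py", line 26 continue ^ TabError: inconsistent use of tabs and spaces in indentation. Can you update it so it runs without errors?

No idea. My code is clean. Why is the use of tabs and spaces inconsistent?

Well.. Stack Overflow's editor doesn't allow tabs in code, or at least I don't know how to use them... Replace the leading whitespace of the indented lines with tabs, as in your editor.

That let me fix the indentation error.

@AljažMedič The script works fine now. I have tested it locally and it does what I want. But when I run it against the live website, it runs without errors yet gives no result. The span I'm trying to scrape appears every 5 minutes, I think through some jQuery script. Why do you think it doesn't catch that span when it appears on the website?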