Python: How do I extract specific digits from the HTML source and join them?


python python-3.x selenium


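A website shows 3 digits as images; you have to copy them into a given box and press "Continue". I want to write code that does this for me. I looked at the HTML source, and the PNG file names match the digits, so I only need to extract them, join them, and type them in.

I built a bot with Selenium that opens the site after I log in. As a test it filled the box with "123", so I know how to type the digits once I have them. I tried to use BeautifulSoup to turn the page into text, but it gave me an error: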

  File "C:\Users\user\Desktop\money.py", line 20
    soup = BeautifulSoup(driver)
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 20 of the file C:\Users\user\Desktop\money.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

Traceback (most recent call last):
  File "C:\Users\user\Desktop\money.py", line 20, in <module>
    soup = BeautifulSoup(driver)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\bs4\__init__.py", line 287, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'WebDriver' has no len()

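You forgot .page_source in BeautifulSoup(driver.page_source, "lxml"). BeautifulSoup needs the HTML text of the page, not the WebDriver object.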
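The last line of the traceback shows a len() call on the markup argument, which works for a string but not for a driver object. The sketch below reproduces that difference without Selenium; FakeDriver and describe() are hypothetical stand-ins made up for illustration, not real Selenium or BeautifulSoup code.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver (not the real class)."""
    page_source = '<img src="images/capchs/6.png">'


def describe(markup):
    """Try the same kind of length check that appears in the traceback."""
    try:
        return len(markup)
    except TypeError as exc:
        return str(exc)


driver = FakeDriver()
driver_result = describe(driver)              # the object itself has no length
source_result = describe(driver.page_source)  # the HTML text is a plain string

print(driver_result)
print(source_result)
```

Passing the string held in page_source is what makes the length check (and the parsing after it) possible.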

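As for the numbers: if you use BeautifulSoup, use its full functionality. Use find_all() to find all <img> tags. Then you can take src (which is a string) and get the digit character as string[-5].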
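Indexing with [-5] relies on the file name being one character followed by ".png". A slightly more defensive sketch, assuming the digit is the whole file name before the extension, takes the stem of the path instead; digit_from_src is a hypothetical helper name, and the ".jpeg" path is invented only to show the difference.

```python
import posixpath


def digit_from_src(src):
    """Return the file name without its extension, e.g. '6' for '.../6.png'."""
    filename = posixpath.basename(src)         # '6.png'
    stem, _ext = posixpath.splitext(filename)  # ('6', '.png')
    return stem


src = "images/capchs/6.png"
by_index = src[-5]             # fifth character from the end
by_stem = digit_from_src(src)  # file name without extension

print(by_index, by_stem)

# With a longer extension the fixed index no longer points at the digit.
other = "images/capchs/6.jpeg"  # hypothetical path, not from the question
print(repr(other[-5]), repr(digit_from_src(other)))
```

For the file names shown in the question both approaches give the same character, so string[-5] is fine there.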

If the page has more images, you can pass extra arguments to find_all(), e.g. width="35".
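The idea of selecting images by an extra attribute can be tried out without installing anything. The sketch below swaps in the standard library's html.parser in place of BeautifulSoup, purely so it runs on a bare Python install; ImgSrcCollector is a hypothetical class, and the logo image in the sample HTML is an invented extra that is not part of the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect src of <img> tags whose attributes match the given values."""

    def __init__(self, **required):
        super().__init__()
        self.required = required
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        matches = all(attributes.get(k) == v for k, v in self.required.items())
        if matches and attributes.get("src"):
            self.sources.append(attributes["src"])


# The first <img> is a made-up extra image that should be skipped.
html = '''<img width="120" height="40" src="images/logo.png">
<img width="35" height="55" src="images/capchs/6.png">
<img width="35" height="55" src="images/capchs/3.png">
<img width="35" height="55" src="images/capchs/1.png">'''

collector = ImgSrcCollector(width="35")
collector.feed(html)

number = "".join(src[-5] for src in collector.sources)
print('number:', number)
```

With BeautifulSoup the same filter is the single call mentioned above, find_all('img', width='35').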

You should read the BeautifulSoup documentation; it has many useful functions.


You can do this even without BeautifulSoup, because Selenium has many find_elements_by_... functions (e.g. find_elements_by_xpath()), and it also has get_attribute().

Read the Selenium documentation.

Not tested:
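# Collect every <img> element on the current page.
# (Selenium 4.3+ removed find_elements_by_*; there the equivalent call is
# driver.find_elements(By.TAG_NAME, 'img') with
# from selenium.webdriver.common.by import By.)
all_items = driver.find_elements_by_tag_name('img')
for item in all_items:
    # src ends in e.g. "6.png", so the 5th character from the end is the digit
    print('char:', item.get_attribute('src')[-5])

# Join the digits into one string, e.g. "631"
number = [item.get_attribute('src')[-5] for item in all_items]
number = "".join(number)

print('number:', number)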

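Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information.

You forgot .page_source in BeautifulSoup(driver.page_source, "lxml")

Thank you very much. I don't know whether I should write a new post, but how do I extract the number 6 from it? The digits are always between 1 and 9, and I need to write 3 of them into a given box. (I wrote a little about this in the post.)

"The code that caused this warning is on line 20 of the file C:\Users\user\Desktop\money.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor." This line gives the hint. Always pay attention to the traceback message.

I already had "lxml" in the code and the errors were still there, but after I added what @furas told me, they all disappeared.

I'm sorry to say that I found some problems with this very well-written code. I used it, and the URL sent back a bit more than expected: it returned a few letters as well. I removed them with code that tests whether each character is a digit. My only problem now is that the page has to be refreshed every time to get new digits and click the button. I'm thinking about looping the code from the previous question. Thanks for the quick and helpful answer.

This is what happens when you don't show the real URL in the question: we can't see the real HTML and can't create code that solves every problem. As I said in the answer, maybe you can use other attributes to find only the images with digits. To refresh the page you can use driver.get(your_url) again, or send JavaScript code like driver.execute_script('document.location=your_url')

No worries, I have already found a solution. Thanks to your help I have made a lot of progress, but there are still some obstacles. I can describe it like this: the generated number is correct, but on the next "survey" my program writes the previous number. I added some sleep time, but it had no effect. Unless you can help me again, I can make a new post.

Create a new question on a new page; you will have more space for the description and code. New people will see it, and maybe someone else will help you.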
html = '''<img width="35" height="55" src="images/capchs/6.png">
<img width="35" height="55" src="images/capchs/3.png">
<img width="35" height="55" src="images/capchs/1.png">'''

from bs4 import BeautifulSoup as BS

soup = BS(html, 'lxml')

all_items = soup.find_all('img')
#all_items = soup.find_all('img', width='35')
for item in all_items:
    print('char:', item['src'][-5])

number = [item['src'][-5] for item in all_items]
number = "".join(number)

print('number:', number)

char: 6
char: 3
char: 1
number: 631
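One of the comments mentions that the real page returned a few letters as well, and that they were removed by testing whether each character is a digit. A minimal sketch of that filtering step follows; the example.com URLs, and the logo and banner images in particular, are invented for illustration, since the real page was never shown.

```python
# Hypothetical src values: two unrelated images plus the three digit images.
sources = [
    "https://example.com/images/logo.png",
    "https://example.com/images/capchs/6.png",
    "https://example.com/images/capchs/3.png",
    "https://example.com/images/banner.png",
    "https://example.com/images/capchs/1.png",
]

# Same indexing as in the answer: fifth character from the end of each src.
chars = [src[-5] for src in sources]

# Keep only the characters that are digits, dropping stray letters.
digits = [char for char in chars if char.isdigit()]
number = "".join(digits)

# The site asks for exactly 3 digits, so anything else signals a problem.
if len(number) != 3:
    raise ValueError('expected 3 digits, got: ' + repr(number))

print('chars:', chars)
print('number:', number)
```

A stray image whose name happens to end in a digit would still slip through, so narrowing the search by attributes such as width, as the answer suggests, is the more reliable filter.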
