Python: How do I get the contents of a Beautiful Soup tag?

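I am trying to extract the problems from various AMC tests. For example, consider the problem at the URL in the code below. To get the problem text, I only need the regular string text in the first <p> element and the LaTeX in the first <img> element.

My code so far is: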

import bs4
import requests

res = requests.get('https://artofproblemsolving.com/wiki/index.php/2016_AMC_10B_Problems/Problem_1')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
latex_equation = soup.select('p img')[0].get('alt')  # the alt text of the first <img> inside a <p> holds the LaTeX

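This works for getting the LaTeX equation, but there is more of the problem before it, the part in double quotes. Is there a way to get that other part of the problem, namely "What is the value of"? I am thinking of using a regex, but I want to see whether Beautiful Soup has a feature that can do it for me.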

Try using:

Output:

What is the value of  $\frac{2a^{-1}+\frac{a^{-1}}{2}}{a}$

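Edit:

BS4 seems a bit awkward with this page. It took me a while to get this working, and I don't think BS4 alone copes well with the odd spacing and everything else. A regex is your best option. Let me know if this is good. I checked the first two problems and both came out fine. However, the AMC does have some geometry problems that rely on images, so I don't think it will work for those.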

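import bs4
import requests
import re

res = requests.get('https://artofproblemsolving.com/wiki/index.php/2016_AMC_10B_Problems/Problem_1')
# Keep only the first <p>, which holds the problem statement.
soup = bs4.BeautifulSoup(res.content, 'html.parser').find('p')
# One entry per non-empty prettified line; the slice drops the opening <p> line and the last two lines.
elements = [i for i in soup.prettify().split("\n") if i][1:-2]
latex_reg = re.compile(r'alt="(.*?)"')
for n, i in enumerate(elements):
    mo = latex_reg.search(i)
    if mo:
        # An <img> line: keep only the LaTeX stored in its alt attribute.
        elements[n] = mo.group(1)
    # Collapse runs of spaces and drop the indentation added by prettify().
    elements[n] = re.sub(' +', ' ', elements[n]).lstrip()
    if elements[n].startswith("$"):
        # Pad LaTeX fragments so they do not run into the surrounding words.
        elements[n] = " "+elements[n]+" "

print(elements)
print("".join(elements))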
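
For comparison, here is a rough sketch of the same regex idea applied to the raw HTML of one paragraph rather than to the prettified lines. SAMPLE is a made-up, simplified stand-in for the first <p> of the page (the real markup carries more attributes), and paragraph_to_text is a hypothetical helper name, so treat this as an illustration under those assumptions rather than a tested scraper.

```python
import html
import re

# Hypothetical, simplified stand-in for the first <p> of the problem page.
SAMPLE = (
    '<p>What is the value of <img src="a.png" class="latex" '
    'alt="$\\frac{2a^{-1}+\\frac{a^{-1}}{2}}{a}$" /> when '
    '<img src="b.png" class="latex" alt="$a= \\tfrac{1}{2}$" />?</p>'
)

# Matches a whole <img ...> tag and captures the value of its alt attribute.
IMG_ALT = re.compile(r'<img\b[^>]*?\balt="([^"]*)"[^>]*>')
# Matches any remaining tag, such as <p> or </p>.
ANY_TAG = re.compile(r'<[^>]+>')


def paragraph_to_text(fragment: str) -> str:
    """Replace each <img> by its alt text, drop other tags, tidy whitespace."""
    # A function replacement avoids any escape handling of the LaTeX backslashes.
    with_latex = IMG_ALT.sub(lambda m: m.group(1), fragment)
    no_tags = ANY_TAG.sub('', with_latex)
    # Unescape entities only after the tags are gone, then collapse whitespace.
    return re.sub(r'\s+', ' ', html.unescape(no_tags)).strip()


print(paragraph_to_text(SAMPLE))
```

Because every <img> is substituted in place, the text and the LaTeX stay in their original order. Problems whose images are diagrams rather than LaTeX would still need separate handling.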

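What is .mw-parser-output? Also, is there a way to make it loop over all the images and p's, since there are more images at the end?

@JamesHuang 1. It selects the class mw-parser-output. 2. Good question. See my edit; I hope it helps, because this page is hard to scrape.

The new solution prints the right answer, but it prints the pieces separately and also prints some extra content. I don't know how I should put the text in the right order. I tried using something called soup.children, but it is a bit problematic because it groups all the image tags together.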

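import requests
from bs4 import BeautifulSoup

URL = 'https://artofproblemsolving.com/wiki/index.php/2016_AMC_10B_Problems/Problem_1'
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# Pair paragraphs with images by position; print each paragraph's text and the image's LaTeX alt.
for text, tag in zip(soup.select(".mw-parser-output p"), soup.select("p img")):
    print(text.text.strip(), tag.get("alt"))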
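
On the ordering problem raised in the comments: one possible approach with Beautiful Soup alone is to walk a paragraph's descendants, which are yielded in document order, keeping text nodes as they are and substituting each <img> with its alt text. The sketch below runs on a made-up, simplified copy of the paragraph (SAMPLE) instead of the live page, and paragraph_in_order is a hypothetical helper name; the real markup may differ.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical, simplified stand-in for the first <p> of the problem page.
SAMPLE = (
    '<p>What is the value of <img src="a.png" class="latex" '
    'alt="$\\frac{2a^{-1}+\\frac{a^{-1}}{2}}{a}$" /> when '
    '<img src="b.png" class="latex" alt="$a= \\tfrac{1}{2}$" />?</p>'
)


def paragraph_in_order(p: Tag) -> str:
    """Join text nodes and <img> alt texts in the order they appear in p."""
    parts = []
    for node in p.descendants:
        if isinstance(node, NavigableString):
            # Plain text between the images.
            parts.append(str(node))
        elif isinstance(node, Tag) and node.name == "img":
            # The LaTeX source lives in the alt attribute of the image.
            parts.append(node.get("alt", ""))
    # Collapse any run of whitespace into a single space.
    return " ".join("".join(parts).split())


soup = BeautifulSoup(SAMPLE, "html.parser")
print(paragraph_in_order(soup.find("p")))
```

To cover every paragraph, the same helper could be called on each element returned by soup.select(".mw-parser-output p").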
