Python: find the <td> that belongs to a <th> using BeautifulSoup

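I want to find the value of the <td> that belongs to a given <th>. I can search the markup for the header text and find it, but I don't know the value, and there is no class to search for. The number of columns may also vary, so all I have to go on is the text in the <th>.

Example table:

Info1   Info2   Info3   Info4   Info5
Value1  Value2  Value3  Value4  Value5

Say I want to find Value4, which belongs to Info4. How can this be done in BeautifulSoup?

Python 3.7.4 and BeautifulSoup 4.9.3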

tr = soup.find_all('tr')[1]  # instead of this, you can search for Info4 and take its parent tr

idx = None  # stays None if no header matches
for i, th in enumerate(tr.find_all('th')):
    if th.get_text(strip=True) == 'Info4':
        idx = i
        break

This index can then be used to access the value that belongs to the selected header:

tr = soup.find_all('tr')[2]
value = tr.find_all('td')[idx]
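The two steps above can be combined so that no row number is hard-coded. This is only a sketch: the helper name `value_for_header` is made up for illustration, and it assumes the values sit in the `<tr>` directly after the header row, as in the example table.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th colspan="5">Table Title</th></tr>
  <tr><th>Info1</th><th>Info2</th><th>Info3</th><th>Info4</th><th>Info5</th></tr>
  <tr><td>Value1</td><td>Value2</td><td>Value3</td><td>Value4</td><td>Value5</td></tr>
</table>
"""


def value_for_header(soup, header_text):
    """Return the text of the <td> under the <th> named header_text, or None."""
    # Find the header cell by its text rather than by row number
    th = soup.find(
        lambda tag: tag.name == "th" and tag.get_text(strip=True) == header_text
    )
    if th is None:
        return None
    header_row = th.find_parent("tr")
    # Position of the matching cell within its own row
    idx = [cell is th for cell in header_row.find_all("th")].index(True)
    # Assumption: the values are in the next <tr> sibling
    value_row = header_row.find_next_sibling("tr")
    if value_row is None:
        return None
    cells = value_row.find_all("td")
    return cells[idx].get_text(strip=True) if idx < len(cells) else None


soup = BeautifulSoup(html, "html.parser")
print(value_for_header(soup, "Info4"))
```

Because the header is located by its text, this keeps working when the number of columns changes.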

You can use pandas to read the table and pick out the column:

html = '''
<table>
    <tbody>
        <tr>
            <th colspan="8">
                <span>
                    <a href="/link">Table Title</a>
                </span>
            </th>
        </tr>
        <tr>
            <th>Info1</th>
            <th>Info2</th>
            <th>Info3</th>
            <th>Info4</th>
            <th>Info5</th>
        </tr>
        <tr>
            <td>Value1</td>
            <td>Value2</td>
            <td>Value3</td>
            <td>Value4</td>
            <td>Value5</td>
        </tr>
    </tbody>
</table>'''
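A minimal sketch of the pandas route, under a few assumptions: it uses a simplified table without the title row, wraps the string in `StringIO` (newer pandas versions expect a file-like object rather than a literal HTML string), and needs a parser library such as lxml, or bs4 together with html5lib, to be installed. For a table with an extra title row like the one above, the `header` argument of `read_html` would probably be needed; that is not shown here.

```python
from io import StringIO

import pandas as pd

# Simplified table (assumption): one header row of <th>, one value row of <td>
html = """
<table>
  <tr><th>Info1</th><th>Info2</th><th>Info3</th><th>Info4</th><th>Info5</th></tr>
  <tr><td>Value1</td><td>Value2</td><td>Value3</td><td>Value4</td><td>Value5</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(html))
df = tables[0]

# The <th> row becomes the column labels, so the header text selects the column
value = df.loc[0, "Info4"]
print(value)
```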

Adding to Akasha's answer: by using .index on a list, that loop can be written as a one-liner

idx = [x.text for x in tr.find_all('th')].index('Info4')

which is the same as:

for i, th in enumerate(tr.find_all('th')):
    if th.text == 'Info4':
        idx = i
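One difference worth knowing: `list.index` raises `ValueError` when the header is missing, whereas the loop simply never assigns `idx`. A small sketch of handling that case, where the helper name `header_index` is only illustrative:

```python
from bs4 import BeautifulSoup

html = (
    "<table>"
    "<tr><th>Info1</th><th>Info2</th></tr>"
    "<tr><td>Value1</td><td>Value2</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")
header_row = soup.find_all("tr")[0]
headers = [th.get_text(strip=True) for th in header_row.find_all("th")]


def header_index(headers, wanted):
    """Return the position of wanted in headers, or None if it is absent."""
    try:
        return headers.index(wanted)
    except ValueError:
        # list.index raises instead of returning a sentinel
        return None


print(header_index(headers, "Info2"))
print(header_index(headers, "Info9"))
```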

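You said you can get the th values Info1, Info2, and so on.

Based on that, I wrote some very simple code.

You can improve on it if you like, but as long as you can get the header text from somewhere, it should already work.

The HTML is your HTML example.

The idea is to map the position of the header. For example, if I want the value for Info2, I find its position among the th cells and then walk through the td cells to the same position: position 10 for Info10 is position 10 for Value10.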

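from bs4 import BeautifulSoup

# Parse the saved page; index.html holds the HTML example
with open('index.html') as file:
    soup = BeautifulSoup(file, 'html.parser')

rows = soup.find_all('tr')

textS = 'Info3'  # header text to look for
pos = 0          # 1-based position of the matching <th>; 0 means not found

for cont, row in enumerate(rows, start=1):

    if cont == 2:
        # Header row: remember the position of the matching <th>
        for ths, th in enumerate(row.find_all('th'), start=1):
            if th.get_text(strip=True) == textS:
                pos = ths

    if cont == 3:
        # Value row: print only the <td> at the same position
        for tds, td in enumerate(row.find_all('td'), start=1):
            if tds == pos:
                print(td)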
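The position-mapping idea can also be expressed by pairing the header cells with the value cells directly. A sketch, assuming (as in the example) that the second row holds the headers and the third row holds the values:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tbody>
    <tr><th colspan="8"><span><a href="/link">Table Title</a></span></th></tr>
    <tr><th>Info1</th><th>Info2</th><th>Info3</th><th>Info4</th><th>Info5</th></tr>
    <tr><td>Value1</td><td>Value2</td><td>Value3</td><td>Value4</td><td>Value5</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# Assumption: rows[1] is the header row and rows[2] is the value row
headers = [th.get_text(strip=True) for th in rows[1].find_all("th")]
values = [td.get_text(strip=True) for td in rows[2].find_all("td")]

# zip pairs cells by position, so each header maps to the value below it
mapping = dict(zip(headers, values))
print(mapping["Info3"])
```

Building the dictionary once is convenient when several headers need to be looked up in the same table.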

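This may be relevant.

Awesome, thanks! Any idea which approach is faster in terms of performance? I need to do this on a large amount of data.

Well, pandas uses Beautiful Soup under the hood, so I'm not sure it makes much of a difference. I'm not entirely sure how pandas implements it. My guess is that it is either about the same or that pandas is faster. But honestly, that's just a guess.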