Python: Scraping data from span tags with BeautifulSoup


python python-3.x web-scraping


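I am trying to scrape a web page and need to parse an entire table into a DataFrame. I am using Beautiful Soup for this. Inside some <td> tags there are <span> tags that contain no text, yet the values are displayed on the web page in those particular <span> tags.

The following HTML corresponds to that page: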

<td>
  <span class="nttu">::after</span>
  <span class="ntbb">::after</span>
  <span class="ntyc">::after</span>
  <span class="nttu">::after</span>
</td>
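
The ::after entries above are what a browser's inspector shows for CSS pseudo-elements; they are not part of the markup itself, so a parser sees empty spans. A minimal standard-library sketch of that observation follows. It assumes the raw source contains empty spans like the ones above; the SpanCollector class and the raw_html sample are hypothetical, written only for illustration.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect [class, text] for every <span> found in the markup."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._in_span = True
            self.spans.append([dict(attrs).get("class"), ""])

    def handle_data(self, data):
        # Only text that sits directly inside a <span> is recorded.
        if self._in_span:
            self.spans[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False


# Hypothetical raw source: the spans are empty because the visible
# characters are injected later by CSS rules, not stored in the HTML.
raw_html = '<td><span class="nttu"></span><span class="ntbb"></span></td>'

collector = SpanCollector()
collector.feed(raw_html)
collector.close()

span_classes = [cls for cls, _ in collector.spans]
span_texts = [text for _, text in collector.spans]
print(span_classes, span_texts)
```

The class names survive parsing, so they can serve as keys for looking up the text that the stylesheet supplies.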

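I misunderstood your question at first because you referenced two different URLs. I see what you mean now.

Yes, oddly, in the second table they use CSS to fill in the content of some <span> tags. What you need to do is pull those special cases out of the <style> tag. Once you have them, you can substitute them into the HTML source and finally parse the result into a DataFrame. I use pandas here, since its read_html uses lxml or BeautifulSoup under the hood to parse <table> tags. I believe this gets you what you want: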

import re
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retrieved_data = requests.get(http_url).text

soup = BeautifulSoup(retrieved_data, "lxml")

# The second <style> block on the page holds the span.<class>::after rules
hiddenData = str(soup.find_all('style')[1])
hiddenSpan = {}
for group in re.findall(r'span\.(.+?)}', hiddenData):
    class_attr = group.split('span.')[-1].split('::')[0]  # CSS class name
    content = group.split('"')[1]                         # text between the quotes
    hiddenSpan[class_attr] = content

# Replace each empty <span class="..."></span> with the text its CSS rule injects
climate_table = str(soup.find("table", attrs={"class": "medias mensuales numspan"}))
for k, v in hiddenSpan.items():
    climate_table = climate_table.replace('<span class="%s"></span>' % k, v)

# Recent pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(climate_table))[0]
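
To try the substitution step without fetching the page, here is a minimal offline sketch that uses only the standard library. The fill_css_spans helper, the sample stylesheet, and the class-to-character mapping are all assumptions made up for illustration; the real page's rules may be formatted differently.

```python
import re


def fill_css_spans(html: str, css: str) -> str:
    """Replace empty <span class="X"></span> with the text from span.X::after{content:"..."}."""
    # Map each class name to the text its ::after rule injects.
    rules = dict(
        re.findall(r'span\.([\w-]+)::after\s*\{\s*content\s*:\s*"([^"]*)"', css)
    )

    def substitute(match):
        cls = match.group(1)
        # Spans without a matching rule are left untouched.
        return rules.get(cls, match.group(0))

    return re.sub(r'<span class="([\w-]+)"></span>', substitute, html)


# Hypothetical sample data: this class-to-character mapping is invented.
sample_css = (
    'span.nttu::after{content:"2"} '
    'span.ntbb::after{content:"."} '
    'span.ntyc::after{content:"3"}'
)
sample_row = (
    '<td><span class="nttu"></span><span class="ntbb"></span>'
    '<span class="ntyc"></span><span class="nttu"></span></td>'
)

filled_row = fill_css_spans(sample_row, sample_css)
print(filled_row)  # <td>2.32</td>
```

Matching the class name with a regular expression instead of looping over str.replace means each span is visited once, and a span whose class has no rule is simply kept as it was.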

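The page is probably dynamic, so you would need to extract the HTML from the rendered page. Unless you share the URL, nobody can help much further.

@chitown88, I have added the URL of the site; you can see the problem in the fifth row. Thanks.

You had better include your code, otherwise it is hard to see what the problem is ;)

@Thanjayas, are you just trying to pull that table?

@Isma, I have added the code for your reference. Thanks.

I did not understand at first why this was downvoted, but now I see. You used two different URLs in your question. I was looking at the first one, and that is the table I provided. The URL used in your code, the one you actually meant, is the one giving you the problem.

@Thanjaya S, I updated the code to answer your question. With two different URLs referenced it was not clear at first, but on a second look I see what you mean.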

print (df.to_string())
                          Day                          T                         TM                         Tm                        SLP                          H                         PP                         VV                          V                         VM                         VG                         RA                         SN                         TS                         FG
0                           1                       23.4                       30.3                         19                          -                         59                          0                        6.3                        4.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
1                           2                       22.4                       30.3                       16.9                          -                         57                          0                        6.9                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
2                           3                         24                       31.8                       16.9                          -                         51                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
3                           4                       24.2                         32                       17.4                          -                         53                          0                          6                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
4                           5                       23.8                         32                         18                          -                         58                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
5                           6                       23.3                         31                       18.3                          -                         60                          0                        6.9                          5                        9.4                          -                        NaN                        NaN                        NaN                        NaN
6                           7                       22.8                       30.2                       17.6                          -                         55                          0                        7.7                        3.7                        7.6                          -                        NaN                        NaN                        NaN                        NaN
7                           8                       23.1                       30.6                       17.4                          -                         46                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
8                           9                       22.9                       30.6                       17.4                          -                         51                          0                        6.9                        3.5                        3.5                          -                        NaN                        NaN                        NaN                        NaN
9                          10                       22.3                         30                         17                          -                         56                          0                        6.3                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
10                         11                       22.3                       29.4                         17                          -                         53                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
11                         12                       21.8                       29.4                       15.7                          -                         54                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
12                         13                       22.3                       30.1                       15.7                          -                         43                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
13                         14                       21.8                       30.6                       14.8                          -                         41                          0                        6.9                        1.9                        5.4                          -                        NaN                        NaN                        NaN                        NaN
14                         15                       21.6                       30.6                       14.2                          -                         43                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
15                         16                       21.1                       29.9                       15.4                          -                         55                          0                        6.9                        4.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
16                         17                       20.4                       28.1                       15.4                          -                         59                          0                        6.9                          5                       11.1                          -                        NaN                        NaN                        NaN                        NaN
17                         18                       21.2                       28.3                       14.5                          -                         53                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
18                         19                       21.6                       29.6                       16.4                          -                         58                          0                        6.9                        2.2                        3.5                          -                        NaN                        NaN                        NaN                        NaN
19                         20                       21.9                       29.6                       16.6                          -                         58                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
20                         21                       22.3                       29.9                       17.5                          -                         55                          0                        6.9                        3.1                        5.4                          -                        NaN                        NaN                        NaN                        NaN
21                         22                       21.9                       29.9                       15.1                          -                         46                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
22                         23                       21.3                         29                       15.2                          -                         50                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
23                         24                       21.3                       28.8                       14.6                          -                         45                          0                        6.9                          3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
24                         25                       21.6                       29.1                       15.5                          -                         47                          0                        7.7                        4.8                        7.6                          -                        NaN                        NaN                        NaN                        NaN
25                         26                       21.8                       29.2                       14.6                          -                         41                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
26                         27                       22.3                       30.1                       15.6                          -                         40                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
27                         28                       22.4                       30.3                         16                          -                         51                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
28                         29                         23                       30.3                       16.9                          -                         53                          0                        6.6                        2.8                        5.4                          -                        NaN                        NaN                        NaN                          o
29                         30                       23.1                         30                       17.8                          -                         54                          0                        6.9                        5.4                        7.6                          -                        NaN                        NaN                        NaN                        NaN
30                         31                       22.1                       29.8                       17.3                          -                         54                          0                        6.9                        5.2                        9.4                          -                        NaN                        NaN                        NaN                        NaN
31  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:
32                        NaN                       22.3                         30                       16.4                          -                       51.6                          0                        6.9                        3.5                        6.3                        NaN                          0                          0                          0                          1