Handling hex values in a BeautifulSoup response?

I'm using Beautiful Soup to collect some data:

url = "https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447"
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}

response = requests.get(url,headers=heads)
soup = BeautifulSoup(response.text, "lxml")
I then extract one specific piece of information with:

height = soup.find_all("th", string=re.compile("Height:"))[0].findNext("td").text
print(height)
This works as expected and prints

1,74 m
But when I try to parse that string with this function:

def format_height(height_string):
    return int(height_string.split(" ")[0].replace(',',''))
I get the following error:

format_height(height)
Traceback (most recent call last):
  File "get_player_info.py", line 73, in <module>
    player_info = get_player_info(url)
  File "get_player_info.py", line 39, in get_player_info
    format_height(height)
  File "/Users/kompella/Documents/la-segunda/util.py", line 49, in format_height
    return int(height_string.split(" ")[0].replace(',',''))
ValueError: invalid literal for int() with base 10: '174\xa0m'

How should I handle the hex value (\xa0) that shows up in the string?

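Nothing is actually wrong with the data. Just break the string down into its numeric parts, and after that you can do whatever you want with them.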

import requests
import re
from bs4 import BeautifulSoup

url = "https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447"
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}

response = requests.get(url,headers=heads)
soup = BeautifulSoup(response.text, "lxml")

height = soup.find_all("th", string=re.compile("Height:"))[0].findNext("td").text
numerals = [int(s) for s in re.findall(r'\b\d+\b', height)]
print (numerals)
#output: [1, 74]
print ("Height is: " + str(numerals[0]) +"."+ str(numerals[1]) +"m")
#output: Height is: 1.74m
print ("Height is: " + str(numerals[0]) + str(numerals[1]) +"cm")
#output: Height is: 174cm
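
One caveat with the snippet above: converting each digit group to int drops leading zeros, so a height such as "1,05 m" would become [1, 5] and print as 15cm. Below is a minimal sketch that captures the groups as strings and converts to centimetres in one step. It assumes the height always looks like metres, a comma, then exactly two centimetre digits; the helper name height_to_cm is made up for illustration.

```python
import re

def height_to_cm(height_string):
    # Hypothetical helper: capture metres and centimetres as separate groups
    # so that a value like "1,05" is not misread as 1 and 5.
    match = re.search(r"(\d+),(\d{2})", height_string)
    if match is None:
        raise ValueError(f"unrecognised height: {height_string!r}")
    metres, centimetres = match.groups()
    return int(metres) * 100 + int(centimetres)

print(height_to_cm("1,74\xa0m"))  # 174
print(height_to_cm("1,05 m"))     # 105
```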
In any case, this post discusses the same problem. You can take a look:

Use an attribute = value selector to target the height, then use your function as is.

import requests
from bs4 import BeautifulSoup as bs

def format_height(height_string):
    return int(height_string.split(" ")[0].replace(',',''))

r = requests.get('https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content,'lxml')
height_string = soup.select_one('[itemprop=height]').text

print(format_height(height_string))
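
For context on the error itself: the \xa0 in the traceback is not a hex number that needs computing. It is how Python's repr displays U+00A0, the non-breaking space, which is why split(" ") never splits "1,74\xa0m". A minimal sketch of a more tolerant variant of the original function follows. It relies on str.split() with no argument splitting on any Unicode whitespace, including U+00A0; the name format_height_safe is made up for illustration.

```python
def format_height_safe(height_string):
    # split() with no separator treats U+00A0 as whitespace, unlike split(" ").
    number_part = height_string.split()[0]
    # "1,74" -> "174": dropping the decimal comma yields centimetres.
    return int(number_part.replace(",", ""))

print(format_height_safe("1,74\xa0m"))  # 174
print(format_height_safe("1,74 m"))     # 174
```

This version gives the same result whether the page separates the number and unit with a regular space or a non-breaking one.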