Python BeautifulSoup
I can't parse this XML, which doesn't seem to have any class references. My code snippet:
import urllib2
from bs4 import BeautifulSoup

sock = urllib2.urlopen(l)  # l is the URL of the game's XML feed
link = sock.read()
soup = BeautifulSoup(link, "xml")
FirstNameHome = soup.find('home_probable_pitcher', 'first_name')
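I want to find the names for the home and away teams (there are only two instances, so I'm not sure whether I should use findAll).
This is what I get using soup.prettify: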
LookupError: unknown encoding: <?xml version="1.0" encoding="UTF-8"?><!--Copyright 2017 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt-->
<game id="2017/06/02/nyamlb-tormlb-1" venue="Rogers Centre" game_pk="490921"
time="7:07"
time_date="2017/06/02 7:07"
time_date_aw_lg="2017/06/02 7:07"
time_date_hm_lg="2017/06/02 7:07"
time_zone="ET"
ampm="PM"
first_pitch_et=""
away_time="7:07"
away_time_zone="ET"
away_ampm="PM"
home_time="7:07"
home_time_zone="ET"
home_ampm="PM"
game_type="R"
tiebreaker_sw="N"
original_date="2017/06/02"
time_zone_aw_lg="-4"
time_zone_hm_lg="-4"
time_aw_lg="7:07"
aw_lg_ampm="PM"
tz_aw_lg_gen="ET"
time_hm_lg="7:07"
hm_lg_ampm="PM"
tz_hm_lg_gen="ET"
venue_id="14"
scheduled_innings="9"
away_name_abbrev="NYY"
home_name_abbrev="TOR"
away_code="nya"
away_file_code="nyy"
away_team_id="147"
away_team_city="NY Yankees"
away_team_name="Yankees"
away_division="E"
away_league_id="103"
away_sport_code="mlb"
home_code="tor"
home_file_code="tor"
home_team_id="141"
home_team_city="Toronto"
home_team_name="Blue Jays"
home_division="E"
home_league_id="103"
home_sport_code="mlb"
day="FRI"
gameday_sw="P"
double_header_sw="N"
game_nbr="1"
tbd_flag="N"
venue_w_chan_loc="CAXX0504"
location="Toronto, Canada"
gameday_link="2017_06_02_nyamlb_tormlb_1"
away_win="30"
away_loss="20"
home_win="26"
home_loss="27"
game_data_directory="/components/game/mlb/year_2017/month_06/day_02/gid_2017_06_02_nyamlb_tormlb_1"
league="AA"
inning_state=""
note=""
status="Preview"
ind="S"
tv_station="SNET-1, MLBN (out-of-market only)">
<home_probable_pitcher id="434538" first_name="Francisco" first="Francisco" last_name="Liriano"
last="Liriano"
name_display_roster="Liriano"
number="45"
throwinghand="LHP"
wins="2"
losses="2"
era="6.35"
s_wins="2"
s_losses="2"
s_era="6.35"
stats_season="2017"
stats_type="R"/>
<away_probable_pitcher id="501381" first_name="Michael" first="Michael" last_name="Pineda"
last="Pineda"
name_display_roster="Pineda"
number="35"
throwinghand="RHP"
wins="6"
losses="2"
era="3.32"
s_wins="6"
s_losses="2"
s_era="3.32"
stats_season="2017"
stats_type="R"/>
<game_media>
<media type="game" calendar_event_id="14-490921-2017-06-02"
start="2017-06-02T19:07:00-0400"
title="NYY @ TOR"
has_mlbtv="true"
free="NO"
enhanced="N"
media_state="media_off"
thumbnail="http://mediadownloads.mlb.com/mlbam/preview/nyator_490921_th_7_preview.jpg"/>
</game_media>
</game>
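With only two pitcher elements, two separate find() calls are enough, but find_all() also accepts a list of tag names and returns the matches in document order, which settles the findAll question. The team names, on the other hand, are attributes of the root <game> element. A minimal Python 3 sketch, assuming bs4 and lxml are installed (the "xml" parser needs lxml); SAMPLE_XML is a hypothetical, trimmed copy of the feed above:

```python
from bs4 import BeautifulSoup  # the "xml" parser below also needs lxml installed

# Hypothetical trimmed copy of the feed above: only the attributes used here.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<game home_team_name="Blue Jays" away_team_name="Yankees">
  <home_probable_pitcher id="434538" first_name="Francisco" last_name="Liriano"/>
  <away_probable_pitcher id="501381" first_name="Michael" last_name="Pineda"/>
</game>"""

soup = BeautifulSoup(SAMPLE_XML, "xml")

# find_all() accepts a list of tag names and returns matches in document order.
pitchers = soup.find_all(["home_probable_pitcher", "away_probable_pitcher"])

# Attribute values are read by indexing the Tag like a dict;
# the "home"/"away" key is taken from the tag name prefix.
pitcher_first_names = {tag.name.split("_")[0]: tag["first_name"] for tag in pitchers}

# The team names are attributes of the root <game> element.
game = soup.find("game")
team_names = {side: game[side + "_team_name"] for side in ("home", "away")}

print(pitcher_first_names)
print(team_names)
```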
If we write:
# for Python 3
# import urllib.request
import urllib2
from bs4 import BeautifulSoup
l = 'http://gd2.mlb.com/components/game/mlb/year_2017/month_06/day_03/gid_2017_06_03_arimlb_miamlb_1/linescore.xml'
sock = urllib2.urlopen(l)
# for Python 3
# sock = urllib.request.urlopen(l)
link = sock.read()
soup = BeautifulSoup(link, "xml")
FirstNameHome = soup.find('home_probable_pitcher').attrs['first_name']
print(FirstNameHome)
it gives:
Edinson
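Why the call in the question finds nothing: when find() gets a plain string as its second argument, BeautifulSoup treats it as a CSS class to match, so soup.find('home_probable_pitcher', 'first_name') looks for a home_probable_pitcher tag with class="first_name". The attribute has to be read from the tag that find() returns. The sketch below also guards against a missing tag, assuming (not verified against the feed) that a probable pitcher who has not been announced is simply absent; probable_first_name is a hypothetical helper, not part of bs4, and bs4 plus lxml are assumed to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed feed; the away pitcher is left out on purpose
# to imitate an unannounced probable pitcher (an assumption).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<game>
  <home_probable_pitcher id="434538" first_name="Francisco" last_name="Liriano"/>
</game>"""

soup = BeautifulSoup(SAMPLE_XML, "xml")

# A plain string as the second argument of find() is a CSS class filter,
# so this searches for <home_probable_pitcher class="first_name"> and returns None.
wrong = soup.find("home_probable_pitcher", "first_name")


def probable_first_name(soup, side):
    """Return first_name of <side>_probable_pitcher, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    tag = soup.find(side + "_probable_pitcher")
    if tag is None:
        return None
    return tag.get("first_name")  # .get() returns None if the attribute is absent


home = probable_first_name(soup, "home")
away = probable_first_name(soup, "away")
print(wrong, home, away)
```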
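Also, passing the downloaded content to soup.prettify(link) gives LookupError: unknown encoding.
Well, that is not what you need: the arguments of prettify() are an encoding ('utf-8', for example) and a formatter (default 'minimal'), not the raw content, so just write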
pretty = soup.prettify()
In Python 2 this gives
>>> type(pretty)
<type 'unicode'>
and in Python 3
>>> type(pretty)
<class 'str'>
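To see the prettify() arguments in action: with no argument it returns text, and with an encoding such as 'utf-8' it returns encoded bytes. A minimal Python 3 sketch, assuming bs4 and lxml are installed and using a hypothetical one-line document in place of the downloaded feed:

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document standing in for the downloaded feed.
soup = BeautifulSoup(b'<game><home_probable_pitcher first_name="Francisco"/></game>', "xml")

# prettify() takes an output encoding and a formatter, not the document itself.
pretty_text = soup.prettify()          # str on Python 3 (unicode on Python 2)
pretty_bytes = soup.prettify("utf-8")  # bytes, encoded as UTF-8

print(type(pretty_text), type(pretty_bytes))
print(pretty_text)
```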
Please add an example URL (the l object in your example). Note that the URL's output will change after June 3, 2017.
Thanks.. I'm not too worried about the LookupError, as I just need to know how to parse the names. It looks like FirstNameHome=soup.find('home_probable_pitcher').attrs['first_name'] works. I'll double-check soon.
@DannyW: let me know if it can be improved