Python Beautiful Soup


I can't parse this XML, which doesn't seem to have any class references.

My code snippet:

sock = urllib2.urlopen(l)
link = sock.read()

soup = BeautifulSoup(link,"xml")

FirstNameHome=soup.find('home_probable_pitcher','first_name')
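I want to find the names for the home and away teams.

(There are only two instances, so I'm not sure whether I should use findAll.)

This is the output when using soup.prettify: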

 LookupError: unknown encoding: <?xml version="1.0" encoding="UTF-8"?><!--Copyright 2017 MLB Advanced Media, L.P.  Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt-->
<game id="2017/06/02/nyamlb-tormlb-1" venue="Rogers Centre" game_pk="490921"
      time="7:07"
      time_date="2017/06/02 7:07"
      time_date_aw_lg="2017/06/02 7:07"
      time_date_hm_lg="2017/06/02 7:07"
      time_zone="ET"
      ampm="PM"
      first_pitch_et=""
      away_time="7:07"
      away_time_zone="ET"
      away_ampm="PM"
      home_time="7:07"
      home_time_zone="ET"
      home_ampm="PM"
      game_type="R"
      tiebreaker_sw="N"
      original_date="2017/06/02"
      time_zone_aw_lg="-4"
      time_zone_hm_lg="-4"
      time_aw_lg="7:07"
      aw_lg_ampm="PM"
      tz_aw_lg_gen="ET"
      time_hm_lg="7:07"
      hm_lg_ampm="PM"
      tz_hm_lg_gen="ET"
      venue_id="14"
      scheduled_innings="9"
      away_name_abbrev="NYY"
      home_name_abbrev="TOR"
      away_code="nya"
      away_file_code="nyy"
      away_team_id="147"
      away_team_city="NY Yankees"
      away_team_name="Yankees"
      away_division="E"
      away_league_id="103"
      away_sport_code="mlb"
      home_code="tor"
      home_file_code="tor"
      home_team_id="141"
      home_team_city="Toronto"
      home_team_name="Blue Jays"
      home_division="E"
      home_league_id="103"
      home_sport_code="mlb"
      day="FRI"
      gameday_sw="P"
      double_header_sw="N"
      game_nbr="1"
      tbd_flag="N"
      venue_w_chan_loc="CAXX0504"
      location="Toronto, Canada"
      gameday_link="2017_06_02_nyamlb_tormlb_1"
      away_win="30"
      away_loss="20"
      home_win="26"
      home_loss="27"
      game_data_directory="/components/game/mlb/year_2017/month_06/day_02/gid_2017_06_02_nyamlb_tormlb_1"
      league="AA"
      inning_state=""
      note=""
      status="Preview"
      ind="S"
      tv_station="SNET-1, MLBN (out-of-market only)">
   <home_probable_pitcher id="434538" first_name="Francisco" first="Francisco" last_name="Liriano"
                          last="Liriano"
                          name_display_roster="Liriano"
                          number="45"
                          throwinghand="LHP"
                          wins="2"
                          losses="2"
                          era="6.35"
                          s_wins="2"
                          s_losses="2"
                          s_era="6.35"
                          stats_season="2017"
                          stats_type="R"/>
   <away_probable_pitcher id="501381" first_name="Michael" first="Michael" last_name="Pineda"
                          last="Pineda"
                          name_display_roster="Pineda"
                          number="35"
                          throwinghand="RHP"
                          wins="6"
                          losses="2"
                          era="3.32"
                          s_wins="6"
                          s_losses="2"
                          s_era="3.32"
                          stats_season="2017"
                          stats_type="R"/>
   <game_media>
      <media type="game" calendar_event_id="14-490921-2017-06-02"
             start="2017-06-02T19:07:00-0400"
             title="NYY @ TOR"
             has_mlbtv="true"
             free="NO"
             enhanced="N"
             media_state="media_off"
             thumbnail="http://mediadownloads.mlb.com/mlbam/preview/nyator_490921_th_7_preview.jpg"/>
   </game_media>
</game>
If we write

# for Python 3
# import urllib.request

import urllib2

from bs4 import BeautifulSoup

l = 'http://gd2.mlb.com/components/game/mlb/year_2017/month_06/day_03/gid_2017_06_03_arimlb_miamlb_1/linescore.xml'

sock = urllib2.urlopen(l)
# for Python 3
# sock = urllib.request.urlopen(l)
link = sock.read()

soup = BeautifulSoup(link, "xml")

FirstNameHome = soup.find('home_probable_pitcher').attrs['first_name']
print(FirstNameHome)
it gives

Edinson
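
To get both pitchers (and the team names, which are attributes of the root game element), the same attribute lookup can be repeated per tag. The sketch below is only an illustration: it parses a shortened inline copy of the XML from the question instead of downloading it, the helper name pitcher_name is hypothetical, and it assumes bs4 and lxml are installed (the "xml" parser needs lxml). It also shows why the original call returned nothing: the second positional argument of find is treated as a CSS class filter, not as an attribute name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the XML from the question (assumption: same structure as the live feed).
SAMPLE_XML = """
<game venue="Rogers Centre" away_team_name="Yankees" home_team_name="Blue Jays">
   <home_probable_pitcher id="434538" first_name="Francisco" last_name="Liriano"/>
   <away_probable_pitcher id="501381" first_name="Michael" last_name="Pineda"/>
</game>
""".strip()

soup = BeautifulSoup(SAMPLE_XML, "xml")


def pitcher_name(soup, tag_name):
    """Hypothetical helper: return 'first last' for the given tag, or None if the tag is absent."""
    tag = soup.find(tag_name)
    if tag is None:
        return None
    return "{} {}".format(tag.get("first_name"), tag.get("last_name"))


home_pitcher = pitcher_name(soup, "home_probable_pitcher")
away_pitcher = pitcher_name(soup, "away_probable_pitcher")

# find_all accepts a list of tag names and returns matches in document order.
first_names = [
    tag["first_name"]
    for tag in soup.find_all(["home_probable_pitcher", "away_probable_pitcher"])
]

# Team names live on the root <game> element as plain attributes.
game = soup.find("game")
home_team = game["home_team_name"]
away_team = game["away_team_name"]

# The call from the question looks for class="first_name", so it finds nothing.
original_call_result = soup.find("home_probable_pitcher", "first_name")

print(home_pitcher, "/", away_pitcher)
print(first_names)
print(home_team, "vs", away_team)
print(original_call_result)
```

With only two known tags, two find calls are enough; find_all is mainly convenient when the tags should be handled in a loop.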

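As for the LookupError from soup.prettify: passing the raw contents is not what you need, because the arguments prettify accepts are encoding ('utf-8', for example) and formatter (default 'minimal'), not the raw contents. So just write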

pretty = soup.prettify()
The result is a string rather than the raw contents; depending on the Python version, its type is reported as

>>> type(pretty)
<type 'unicode'>

or

>>> type(pretty)
<type 'str'>
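
As a rough illustration of the two prettify arguments, the sketch below calls prettify with and without an encoding on a tiny inline document. It assumes Python 3 with bs4 and lxml installed; the sample XML and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Tiny inline document (assumption: stands in for the downloaded XML).
SAMPLE_XML = '<game venue="Rogers Centre"><home_probable_pitcher first_name="Francisco"/></game>'

soup = BeautifulSoup(SAMPLE_XML, "xml")

# No arguments: returns the pretty-printed document as text.
pretty_text = soup.prettify()

# With an encoding name: returns the same document encoded to bytes.
pretty_bytes = soup.prettify(encoding="utf-8")

print(type(pretty_text))
print(type(pretty_bytes))
print(pretty_text)
```

The point is that the first argument names an encoding; the document to print is always the soup object itself, so nothing needs to be passed for the common case.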

Please add an example URL (the l object in your example). Note that the URL output will change after June 3, 2017.

Thanks. I'm not too worried about the lookup error; I just need to know how to parse the names. It looks like FirstNameHome = soup.find('home_probable_pitcher').attrs['first_name'] works. I'll check again soon.

@DannyW: let me know if it can be improved.