Python: How to convert multiple XPaths into a DataFrame with pandas?
Tags: python, pandas, dataframe, xpath, web-scraping


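I am starting to scrape the 2018 MLB pitching stats. I have various categories that I would like to turn into a DataFrame so that I can export it to Excel. I want to use pandas. Here is my code so far: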

from urllib.request import urlopen
from lxml.html import fromstring

url = "https://www.baseball-reference.com/leagues/MLB/2018-standard-pitching.shtml"

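# Remove HTML comment markup so tables hidden inside comments get parsed.
# Decode the bytes instead of calling str() on them, which would yield "b'...'" with escaped characters.
content = urlopen(url).read().decode("utf-8")
comment = content.replace("-->", "").replace("<!--", "")
tree = fromstring(comment)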

for pitcher_row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = pitcher_row.xpath('.//td[@data-stat="player"]/a')[0].text
    age = pitcher_row.xpath('.//td[@data-stat="age"]/text()')[0]
    w = pitcher_row.xpath('.//td[@data-stat="W"]/text()')[0]
    l = pitcher_row.xpath('.//td[@data-stat="L"]/text()')[0]
    g = pitcher_row.xpath('.//td[@data-stat="G"]/text()')[0]
    gs = pitcher_row.xpath('.//td[@data-stat="GS"]/text()')[0]
    ip = pitcher_row.xpath('.//td[@data-stat="IP"]/text()')[0]
    hits = pitcher_row.xpath('.//td[@data-stat="H"]/text()')[0]
    runs = pitcher_row.xpath('.//td[@data-stat="R"]/text()')[0]
    bb = pitcher_row.xpath('.//td[@data-stat="BB"]/text()')[0]
    so = pitcher_row.xpath('.//td[@data-stat="SO"]/text()')[0]

    # print data
    print(names, age, w, l, g, gs, ip, hits, runs, bb, so)

However, I would like to put the data above into a DataFrame. I am not sure whether I need to append the data.

Thanks

How about instantiating an empty DataFrame and appending your scraped data row by row:

import pandas as pd

columns = ("names", "age", "w", "l", "g", "gs", "ip", "hits", "runs", "bb", "so")
df = pd.DataFrame(columns=columns)

for idx, pitcher_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    tmp = []
    tmp.append(pitcher_row.xpath('.//td[@data-stat="player"]/a')[0].text)
    tmp.append(pitcher_row.xpath('.//td[@data-stat="age"]/text()')[0])
    tmp.append(pitcher_row.xpath('.//td[@data-stat="W"]/text()')[0])
    ...

    df.loc[idx] = tmp

Or, even simpler, if you want to keep most of your code:

columns = ("names", "age", "w", "l", "g", "gs", "ip", "hits", "runs", "bb", "so")
df = pd.DataFrame(columns=columns)

for idx, pitcher_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = pitcher_row.xpath('.//td[@data-stat="player"]/a')[0].text
    age = pitcher_row.xpath('.//td[@data-stat="age"]/text()')[0]
    w = pitcher_row.xpath('.//td[@data-stat="W"]/text()')[0]
    l = pitcher_row.xpath('.//td[@data-stat="L"]/text()')[0]
    g = pitcher_row.xpath('.//td[@data-stat="G"]/text()')[0]
    gs = pitcher_row.xpath('.//td[@data-stat="GS"]/text()')[0]
    ip = pitcher_row.xpath('.//td[@data-stat="IP"]/text()')[0]
    hits = pitcher_row.xpath('.//td[@data-stat="H"]/text()')[0]
    runs = pitcher_row.xpath('.//td[@data-stat="R"]/text()')[0]
    bb = pitcher_row.xpath('.//td[@data-stat="BB"]/text()')[0]
    so = pitcher_row.xpath('.//td[@data-stat="SO"]/text()')[0]

    df.loc[idx] = (names, age, w, l, g, gs, ip, hits, runs, bb, so)
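
As a variation on the loop above, the rows can also be collected in a plain Python list and turned into a DataFrame in one step, which avoids growing the DataFrame one row at a time. This is only a sketch under assumptions: `SAMPLE_HTML` is a made-up stand-in for the downloaded page, and `STATS`, `cell_text` and `rows_to_frame` are hypothetical names that do not appear in the question. The helper returns `None` for a missing or empty cell instead of raising `IndexError` on `[0]`.

```python
from lxml.html import fromstring
import pandas as pd

# Hypothetical stand-in for the downloaded page; the real table has more columns.
SAMPLE_HTML = """
<html><body>
<table class="stats_table">
  <tr class="full_table">
    <td data-stat="player"><a href="#">Pitcher A</a></td>
    <td data-stat="age">25</td>
    <td data-stat="W">10</td>
    <td data-stat="L">5</td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Pitcher B</a></td>
    <td data-stat="age">31</td>
    <td data-stat="W">3</td>
    <td data-stat="L"></td>
  </tr>
</table>
</body></html>
"""

# Map each DataFrame column name to the data-stat attribute of its cell.
STATS = {"names": "player", "age": "age", "w": "W", "l": "L"}

ROW_XPATH = '//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'


def cell_text(row, stat):
    """Return the stripped text of one cell, or None if it is missing or empty."""
    cells = row.xpath('.//td[@data-stat="%s"]' % stat)
    if not cells:
        return None
    text = cells[0].text_content().strip()
    return text or None


def rows_to_frame(tree):
    """Build one dict per table row, then construct the DataFrame once."""
    records = [
        {col: cell_text(row, stat) for col, stat in STATS.items()}
        for row in tree.xpath(ROW_XPATH)
    ]
    df = pd.DataFrame(records, columns=list(STATS))
    # The scraped values are strings; convert every column except the names.
    numeric = [c for c in df.columns if c != "names"]
    df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")
    return df


df = rows_to_frame(fromstring(SAMPLE_HTML))
print(df)
```

With the real page, `tree` from the question would be passed to `rows_to_frame` and `STATS` extended with the remaining categories. Writing the result to Excel should then be a matter of `df.to_excel("pitchers.xlsx", index=False)`, assuming an Excel writer engine such as openpyxl is installed.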

Your solution works perfectly, @petezurich! Thank you very much for your time and effort, sir. Much appreciated =)