Python: Appending XPath values to a list with a for loop in Scrapy
Tags: python, pandas, numpy, scrapy

I'm trying to automate scraping an HTML table with Scrapy. This is what I have so far:

import scrapy
import pandas as pd

class XGSpider(scrapy.Spider):

    name = 'expectedGoals'

    start_urls = [
        'https://fbref.com/en/comps/9/schedule/Premier-League-Scores-and-Fixtures',
    ]

    def parse(self, response):

        matches = []

        for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):

            match = {
                'home': row.xpath('td[4]//text()').extract_first(),
                'homeXg': row.xpath('td[5]//text()').extract_first(),
                'score': row.xpath('td[6]//text()').extract_first(),
                'awayXg': row.xpath('td[7]//text()').extract_first(),
                'away': row.xpath('td[8]//text()').extract_first()
            }

            matches.append(match)

        x = pd.DataFrame(
            matches, columns=['home', 'homeXg', 'score', 'awayXg', 'away'])

        yield x.to_csv("xG.csv", sep=",", index=False)
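It works fine, but as you can see I'm hard-coding the keys (home, homeXg, etc.) for the match object. I'd like to scrape the keys into a list automatically and then initialize the dict with the keys from that list. The problem is that I don't know how to loop over an XPath by index. For example: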

headers = []
for row in response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr'):
    yield {
        'first': row.xpath('th[1]/text()').extract_first(),
        'second': row.xpath('th[2]/text()').extract_first()
    }
Is it possible to put th[1], th[2], th[3], etc. into a for loop, using the number as an index, and append the values to a list? E.g.


row.xpath('th[i]/text()').extract_first()
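Written literally, th[i] does not pick up the Python loop variable: XPath reads i as an element name, so the expression matches th elements that have a child element called i. A minimal sketch of the usual fix is to build the expression string in Python first. The helper name header_xpaths and the column count of 3 are assumptions made for illustration only.

```python
def header_xpaths(n_columns):
    """Build one XPath expression per header cell; XPath positions start at 1."""
    return [f"th[{i}]/text()" for i in range(1, n_columns + 1)]


expressions = header_xpaths(3)
print(expressions)

# Inside a Scrapy callback each expression could then be evaluated, e.g.:
#     headers = [row.xpath(expr).extract_first() for expr in expressions]
```

Keeping the string building in a small function makes the 1-based indexing easy to check separately from the scraping itself.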

Untested, but it should work:

# Map each header name to its 1-based column position (XPath indexes start at 1).
columns = {}
for column_index, column_node in enumerate(
        response.xpath('//*[@id="sched_ks_3260_1"]/thead/tr/th'), start=1):
    column_name = column_node.xpath('./text()').extract_first()
    columns[column_name] = column_index

# Create the result list once, outside the header loop.
matches = []
for row in response.xpath('//*[@id="sched_ks_3232_1"]//tbody/tr'):
    match = {}
    for column_name, index in columns.items():
        match[column_name] = row.xpath(
            './td[{index}]//text()'.format(index=index)).extract_first()
    matches.append(match)
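Another way to avoid index bookkeeping, sketched here under the assumption that the header texts and each row's cell texts have already been extracted into plain lists of equal length, is to pair them with zip. The function name rows_to_dicts and the sample values are made up for illustration and are not taken from the page.

```python
def rows_to_dicts(headers, rows):
    """Pair each row's cell values with the header names, position by position."""
    return [dict(zip(headers, cells)) for cells in rows]


# Hypothetical sample data standing in for values extracted with XPath.
headers = ["home", "homeXg", "score", "awayXg", "away"]
rows = [
    ["Team A", "1.2", "2-1", "0.8", "Team B"],
    ["Team C", "0.5", "0-0", "0.7", "Team D"],
]

matches = rows_to_dicts(headers, rows)
print(matches[0]["home"])
```

Note that zip stops at the shorter sequence, so a row with fewer cells than headers silently ends up with fewer keys; extracting one value per cell, including empty ones, keeps the two lists aligned.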

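I'm not sure I understand the question. Wouldn't an f-string solve your problem? Something like: row.xpath(f'th[{index_var}]/text()')?

Sorry, I'm new to Python and the question may not have been clear. The header keys are currently hard-coded and I'd like to scrape them automatically, but to do that I have to work out how to count the number of columns in the table and then loop over each XPath; I don't know how to do that.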
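The counting step can be sketched as follows. To keep the example runnable without a network request, it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the table markup is invented for illustration, so the real page may differ. In a Scrapy callback the same idea would presumably be len(response.xpath('...thead/tr/th')) followed by a loop over range(1, n + 1).

```python
import xml.etree.ElementTree as ET

# Made-up table for illustration; the real page's markup may differ.
SAMPLE = """
<table id="demo">
  <thead><tr><th>Home</th><th>xG</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>1.2</td><td>2-1</td></tr>
    <tr><td>Team B</td><td>0.4</td><td>0-3</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(SAMPLE)

# Count the columns by selecting every header cell.
header_cells = table.findall("./thead/tr/th")
n_columns = len(header_cells)
headers = [th.text for th in header_cells]

matches = []
for row in table.findall("./tbody/tr"):
    match = {}
    # XPath positions are 1-based, so loop from 1 to n_columns inclusive.
    for i in range(1, n_columns + 1):
        cell = row.find(f"td[{i}]")
        match[headers[i - 1]] = cell.text if cell is not None else None
    matches.append(match)

print(n_columns)
print(matches[0])
```

The header list drives both the number of iterations and the dict keys, so nothing about the column layout is hard-coded.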