Scrapy/Python: How do I add items to a list created inside a class?


I'm trying to add an object to a list that is created inside another object. Here are my classes:

import scrapy

# Auxiliary classes
class job(scrapy.Item): # containing class, it has a List of 'batches'
    job_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    planned = scrapy.Field()
    executed = scrapy.Field()   
    def __init__(self):
        self.batches = [] 

class batch(scrapy.Item): # this class goes inside a job class, and
                          # also stores a list of 'units'
    batch_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    def __init__(self):
        self.units = [] 

class unit(scrapy.Item): # Finally, this class stores a list of data
    unit_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    def __init__(self):
        self.datos = [] 
Here is the code I'm trying to run (unfortunately, it throws errors):

Any help would be greatly appreciated.

Thanks.

According to the official documentation:

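The Field class is just an alias to the built-in dict class and doesn't provide any extra functionality or attributes. In other words, Field objects are plain-old Python dicts. A separate class is used to support the item declaration syntax based on class attributes.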

So you can declare batches, units and datos as Field objects, as follows:

import scrapy

# Auxiliary classes
class job(scrapy.Item): # containing class, it has a List of 'batches'
    job_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    planned = scrapy.Field()
    executed = scrapy.Field()
    batches = scrapy.Field()   


class batch(scrapy.Item): # this class goes inside a job class, and
                          # also stores a list of 'units'
    batch_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    units = scrapy.Field()

class unit(scrapy.Item): # Finally, this class stores a list of data
    unit_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    datos = scrapy.Field()
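Declaring a Field only registers the key; the list itself still has to be created once per item before anything is appended to it. The sketch below uses plain dicts as stand-ins for the job, batch and unit items so it runs without Scrapy (scrapy.Item supports the same item['field'] get/set syntax). The helpers new_job, new_batch and new_unit and the sample names are hypothetical. With the Scrapy classes above, the equivalent construction would be something like job(job_name='ABCD-01', batches=[]).

```python
# Plain dicts stand in for the job/batch/unit items so this runs without Scrapy.

def new_job(job_name):
    """Hypothetical helper: build a job whose 'batches' list is created once."""
    return {"job_name": job_name, "batches": []}


def new_batch(batch_name):
    """Hypothetical helper: build a batch whose 'units' list is created once."""
    return {"batch_name": batch_name, "units": []}


def new_unit(unit_name):
    """Hypothetical helper: build a unit whose 'datos' list is created once."""
    return {"unit_name": unit_name, "datos": []}


job_a = new_job("ABCD-01")
batch_1 = new_batch("ABCD-B1")
unit_1 = new_unit("U1")

unit_1["datos"].append(42)        # data goes into the unit
batch_1["units"].append(unit_1)   # the unit goes into the batch
job_a["batches"].append(batch_1)  # the batch goes into the job

print(job_a["batches"][0]["units"][0]["datos"])  # [42]
```

Each call to a helper creates a fresh list, so two jobs never end up sharing the same batches list by accident.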

In the function, you can change it to the following:

def inicializa_batches(self, lista_batches, jobs):
    # 1 - the param lista_batches is an extract() of a portion of the
    #     response.css() result containing the required data
    # 2 - the param jobs is a list of job() objects created earlier
    for batchname in lista_batches:
        bn = str(batchname.strip())  # better to work with a plain text string
        if len(bn) > 0:
            newbatch = batch()  # declare a new batch object
            newbatch['batch_name'] = bn
            for job in jobs:
                job['batches'] = []
                nom_job = job['job_name']
                if nom_job[0:4] == bn[0:4]:  # match on the first 4 letters
                    job['batches'].append(newbatch)
        self.log(bn)

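Thanks for your answer. But now every time I run the code, the batches get overwritten (a job can have several batches, a batch can have several units, and a unit can have several different data items). This is caused by the job['batches'] = [] inside the "for job in jobs:" loop. What is the expected behavior of the inicializa_batches function? inicializa_batches goes through the whole document looking for specific text in a specific class; it has to create a series of batch objects and append them to the batches field of the job objects iterated in the "for job in jobs" loop. Why do I do it this way? Because the source document is not properly structured: the whole document is a single table with no sectioning at all, no divs or anything. Fortunately, the key data points have specific classes, but there is no way to tell which job each batch belongs to.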
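One way to stop the batches from being overwritten is to create each job's batches list once, before scanning the batch names, and afterwards only append to it. The sketch below is a standalone version of inicializa_batches (no self or self.log) that uses plain dicts in place of the Scrapy items; the sample job and batch names are made up. dict.setdefault only creates the list when the key is missing; scrapy.Item implements the mutable-mapping interface, so the same call should also work on job items, but treat that as an assumption to verify.

```python
def inicializa_batches(lista_batches, jobs):
    """Attach each batch to every job whose name shares its first 4 letters.

    Plain dicts stand in for the job/batch items (assumption: no Scrapy here).
    """
    # Create each job's list once; it is never reset inside the loop below.
    for job in jobs:
        job.setdefault("batches", [])

    for batchname in lista_batches:
        bn = str(batchname).strip()
        if not bn:
            continue  # skip empty strings left over from extract()
        newbatch = {"batch_name": bn, "units": []}
        for job in jobs:
            if job["job_name"][:4] == bn[:4]:  # 4-letter prefix match
                job["batches"].append(newbatch)


# Hypothetical sample data
jobs = [{"job_name": "ABCD-01"}, {"job_name": "WXYZ-02"}]
inicializa_batches(["ABCD-B1", " ABCD-B2 ", "WXYZ-B1", "   "], jobs)

print([b["batch_name"] for b in jobs[0]["batches"]])  # ['ABCD-B1', 'ABCD-B2']
print([b["batch_name"] for b in jobs[1]["batches"]])  # ['WXYZ-B1']
```

Because the list is created outside the loop, a job that matches several batch names keeps all of them, and batches already present on a job are preserved. As in the original, the same batch object is appended to every job whose first four letters match, so two jobs sharing a prefix would share that batch.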