Scrapy/Python: How do I append items to a list created inside a class?
I am trying to add an object to a list that is created inside another object. Here are my classes:
# Helper classes
class job(scrapy.Item):  # containing class, it has a list of 'batches'
    job_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    planned = scrapy.Field()
    executed = scrapy.Field()

    def __init__(self):
        self.batches = []


class batch(scrapy.Item):  # this class goes inside a job class, and
    # also stores a list of 'units'
    batch_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()

    def __init__(self):
        self.units = []


class unit(scrapy.Item):  # finally, this class stores a list of data
    unit_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()

    def __init__(self):
        self.datos = []
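Here is the code I am trying to run (unfortunately, it fails with errors):
Any help would be greatly appreciated.
Thanks.

According to the official documentation, the Field class is just an alias for the built-in dict class and does not provide any extra functionality or attributes. In other words, Field objects are plain old Python dicts. A separate class is used to support the item declaration syntax based on class attributes.
So you can define batches, units and datos as Field objects, as follows: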
# Helper classes
class job(scrapy.Item):  # containing class, it has a list of 'batches'
    job_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    planned = scrapy.Field()
    executed = scrapy.Field()
    batches = scrapy.Field()


class batch(scrapy.Item):  # this class goes inside a job class, and
    # also stores a list of 'units'
    batch_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    units = scrapy.Field()


class unit(scrapy.Item):  # finally, this class stores a list of data
    unit_name = scrapy.Field()
    status = scrapy.Field()
    start = scrapy.Field()
    end = scrapy.Field()
    operator = scrapy.Field()
    recipe = scrapy.Field()
    datos = scrapy.Field()
In the function, you can change it to the following:
def inicializa_batches(self, lista_batches, jobs):
    # 1 - the param lista_batches is an extract() of a portion of the
    #     response.css with the required data
    # 2 - the param jobs is a list of job() objects previously created
    for batchname in lista_batches:
        bn = str(batchname.strip())  # better to work with a plain text string
        if len(bn) > 0:
            newbatch = batch()  # declare a new batch object
            newbatch['batch_name'] = bn
            for job in jobs:
                job['batches'] = []
                nom_job = job['job_name']
                if nom_job[0:4] == bn[0:4]:  # 4 letter match
                    job['batches'].append(newbatch)
            self.log(bn)
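Thanks for your answer. But now the batches get overwritten every time I run the code (a job can have many batches, a batch can have many units, and a unit can have many different data items).

That is caused by the job['batches'] = [] inside the "for job in jobs:" loop. What is the expected behavior of the inicializa_batches function?

inicializa_batches goes through the whole document looking for specific text in a specific class. It has to create a series of batch objects and append them to the batches field of the job objects iterated in the "for job in jobs" loop. Why am I doing it this way? Because the source file is not properly structured: the whole document is one table with no divisions of any kind, no divs or anything. Luckily, the key data points have specific classes, but there is no way to tell which job each batch belongs to.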
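
One possible way to avoid the overwriting is to create each job's list only when it does not exist yet, instead of resetting it on every pass through the loop. The sketch below is only an illustration under some assumptions: it uses plain dicts as stand-ins for the job and batch items so that it runs without Scrapy, and the function name attach_batches and the sample names are hypothetical. Since scrapy.Item exposes a dict-like mutable-mapping interface, the same setdefault call should also work on job() items that declare a batches field. Unlike the code above, it builds a separate batch mapping for each matching job.

```python
from collections.abc import MutableMapping
from typing import Iterable, List


def attach_batches(batch_names: Iterable[str], jobs: List[MutableMapping]) -> None:
    """Append a batch to every job whose name shares its first 4 characters.

    Plain dicts stand in for the Scrapy items here (an assumption made so
    the sketch is self-contained).
    """
    for raw_name in batch_names:
        name = raw_name.strip()
        if not name:
            continue  # skip empty strings left over from the extraction
        for job in jobs:
            if job["job_name"][:4] == name[:4]:  # 4 letter match
                # setdefault creates the list only the first time, so
                # batches appended earlier are not thrown away
                job.setdefault("batches", []).append(
                    {"batch_name": name, "units": []}
                )


# Hypothetical sample data
jobs = [{"job_name": "AB12-mix"}, {"job_name": "CD34-fill"}]
attach_batches([" AB12-001 ", "", "AB12-002", "CD34-001"], jobs)

print([b["batch_name"] for b in jobs[0]["batches"]])  # ['AB12-001', 'AB12-002']
print([b["batch_name"] for b in jobs[1]["batches"]])  # ['CD34-001']
```

The same pattern could be repeated one level down (a units list on each batch, a datos list on each unit) if the nesting needs to be filled in later.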