Scrapy Pipelines - How do I create a variable in an SQL query?
I am trying to store my scraped data (productid, category, name, description, price and timestamp) in two separate tables in a Microsoft SQL database. One table, named products_tb, holds productid, category, name and description. The SQL statement that stores the data in this table also creates a productgroupid. That productgroupid must then be used to store the remaining data, price and timestamp, in a second table named pricefluctuation_tb. The idea behind this is that I have one table containing all unique products and another table that can be updated daily with the price and timestamp of each of those products. The prices and timestamps can then be grouped by productgroupid.

I tried to create a second SQL statement, but I don't know how to create a variable from the SELECT so that I can use its result for the INSERT into the other table.
pipelines.py
    import pymssql


    class KrcPipeline(object):
        def __init__(self):
            self.conn = pymssql.connect(host='DESKTOP-P1TF28R', user='sa', password='123', database='kaercher')
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            # This part works
            sql_statement = f'''
                BEGIN
                    IF NOT EXISTS (SELECT * FROM [kaercher].[dbo].[products_tb]
                                   WHERE productid = {item['productid']})
                    BEGIN
                        INSERT INTO [kaercher].[dbo].[products_tb] (productid, category, name, description)
                        OUTPUT (Inserted.productgroupid)
                        VALUES ({item['productid']}, '{item['category']}', '{item['name']}', '{item['description']}')
                    END
                END
            '''

            # This part doesn't work :(
            sql_statement2 = f'''
                SELECT productgroupid FROM [kaercher].[dbo].[products_tb]
                WHERE productid = {item['productid']}
                INSERT INTO [kaercher].[dbo].[pricefluctuation_tb] (productgroupid, price, timestamp)
                VALUES ( variable for the productgroupid? , {item['price']}, {item['timestamp']})
            '''

            self.cursor.execute(sql_statement)
            self.cursor.execute(sql_statement2)
            self.conn.commit()
            return item
items.py
    import scrapy


    class KrcItem(scrapy.Item):
        productid = scrapy.Field()
        name = scrapy.Field()
        description = scrapy.Field()
        price = scrapy.Field()
        producttype = scrapy.Field()
        timestamp = scrapy.Field()
        category = scrapy.Field()
Database structure in MSSQL:
kaercher.db
products_tb
- productgroupid (bigint)
- productid (int)
- category (nvarchar(100))
- name (nvarchar(350))
- description (nvarchar(1000))
pricefluctuation_tb
- productgroupid (bigint)
- price (float)
- timestamp (int)
Try the following:
    sql_statement2 = f'''
        -- BIGINT matches the type of productgroupid in products_tb
        DECLARE @productgroupid BIGINT;

        SET @productgroupid = (
            SELECT productgroupid
            FROM [kaercher].[dbo].[products_tb]
            WHERE productid = {item['productid']}
        );

        INSERT INTO [kaercher].[dbo].[pricefluctuation_tb] (productgroupid, price, timestamp)
        VALUES ( @productgroupid , {item['price']}, {item['timestamp']})
    '''
This assumes that there is exactly one product group ID per product ID.

This seems to work fine, thank you very much!
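
As a possible refinement, the same DECLARE/SET/INSERT batch could be sent with query parameters instead of f-string interpolation, so that values containing quotes (for example in a product name) do not break the statement or open it to SQL injection. The following is only a sketch under assumptions: the helper name `store_price` is hypothetical, it relies on pymssql's `%s` placeholders in `cursor.execute(sql, params)`, and it declares the variable as BIGINT to match the schema listed in the question.

```python
# Parameterized variant of the answer's batch (a sketch, not the original code).
# The %s markers are filled in by the driver, which handles quoting of values.
PRICE_SQL = """
DECLARE @productgroupid BIGINT;

SET @productgroupid = (
    SELECT productgroupid
    FROM [kaercher].[dbo].[products_tb]
    WHERE productid = %s
);

INSERT INTO [kaercher].[dbo].[pricefluctuation_tb] (productgroupid, price, timestamp)
VALUES (@productgroupid, %s, %s);
"""


def store_price(cursor, item):
    """Insert the price and timestamp of `item` under its productgroupid.

    `cursor` is assumed to be a DB-API cursor such as the one returned by
    pymssql's conn.cursor(); `item` only needs dict-style access to the
    keys 'productid', 'price' and 'timestamp'.
    """
    params = (item["productid"], item["price"], item["timestamp"])
    cursor.execute(PRICE_SQL, params)
```

Inside `process_item`, this would be called as `store_price(self.cursor, item)` before `self.conn.commit()`. Like the answer above, it assumes one productgroupid per productid.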