Storing long strings in MySQL using Python


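This is part of my project. It is large, so I can't post the whole script here, but I will try to make things as clear as I can. The Python version is 3.6.2.

I am trying to store strings containing emoji in a MySQL database. Here is my database schema.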

Create database:

DATABASE = "CREATE DATABASE IF NOT EXISTS testdb DEFAULT CHARACTER SET 'utf8'"
Alter database:

ALTER_DB = "ALTER SCHEMA `testdb`  DEFAULT CHARACTER SET utf8mb4"
Tables replyes, threads, users:

TABLES = {}
TABLES['replyes'] = (
    "CREATE TABLE IF NOT EXISTS `replyes` ("
    "  `reply_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `thread_name` TEXT NOT NULL,"
    "  `reply_text` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,"
    "  `replyer` varchar(30) NOT NULL,"
    "  `reply_reactions` int(5),"
    "  `reply_date` varchar(11) NOT NULL,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`reply_no`)"
    ") ENGINE=InnoDB")


TABLES['threads'] = (
    "CREATE TABLE IF NOT EXISTS `threads` ("
    "  `thread_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `topic_name` varchar(50) NOT NULL,"
    "  `group_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,"
    "  `thread_name` TEXT NOT NULL,"
    "  `thread_text` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,"
    "  `thread_starter` varchar(30) NOT NULL,"
    "  `thread_reactions` int(5),"
    "  `thread_replyes` int(5),"
    "  `thread_date` varchar(11) NOT NULL,"
    "  `thread_url`  varchar(150) NOT NULL,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`thread_no`)"
    ") ENGINE=InnoDB")

TABLES['users'] = (
    "  CREATE TABLE IF NOT EXISTS `users` ("
    "  `user_no` int(11) NOT NULL AUTO_INCREMENT,"
    "  `user_name` varchar(30) NOT NULL,"
    "  `user_posts` int(11),"
    "  `user_comments` int(11),"
    "  `visibility` varchar(8),"
    "  `user_location` varchar(30),"
    "  `user_since` varchar(30),"
    "  `groups` int(3),"
    "  `group_names` LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,"
    "  `group_urls` LONGTEXT,"
    "  `add_date` TIMESTAMP NOT NULL DEFAULT now(),"
    "  PRIMARY KEY (`user_no`)"
    ") ENGINE=InnoDB")
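A note on the schema above: MySQL's `utf8` character set stores at most three bytes per character, while emoji need four, and a column-level `CHARACTER SET utf8` takes precedence over the database default set by ALTER_DB. The following is only a sketch of converting the three tables to `utf8mb4`; the names `TABLE_NAMES` and `CONVERT_TABLES` and the choice of the `utf8mb4_unicode_ci` collation are assumptions, not part of the original script.

```python
# Hypothetical helper statements (not in the original script): convert the
# character columns of each table to utf8mb4 so 4-byte characters fit.
TABLE_NAMES = ("replyes", "threads", "users")

CONVERT_TABLES = [
    "ALTER TABLE `{}` CONVERT TO CHARACTER SET utf8mb4 "
    "COLLATE utf8mb4_unicode_ci".format(name)
    for name in TABLE_NAMES
]

for statement in CONVERT_TABLES:
    print(statement)
```

The connection would probably need a matching 'charset': 'utf8mb4' setting as well, assuming the installed connector version supports it.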
Insert queries:

insert_replyes = """INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("{thread_name}","{reply_text}", "{replyer}", {reply_reactions}, "{reply_date}")"""
insert_thread = """INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("{topic_name}","{group_name}","{thread_name}","{thread_text}", "{thread_starter}",{thread_reactions},{thread_replyes}, "{thread_date}", "{thread_url}")"""
insert_user = """INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("{user_name}", {user_posts}, {user_comments}, "{visibility}", "{user_location}", "{user_since}", {groups}, "{group_names}", "{group_urls}")"""
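I am inserting data containing all kinds of characters, such as emoji and special characters ($, \xc2\xa0~\xc2\xa0\xc2\xa0~\xc2\xa0\xc2\xa0\xc2\xa0, etc.). I have tried many character sets for the database and the tables.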
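For context, byte escapes such as \xc2\xa0 are the UTF-8 encoding of a non-breaking space, and an emoji encodes to four bytes in UTF-8. The sketch below assumes the scraped values arrive as bytes and shows decoding them to str before they reach the query, so that the text itself is stored rather than its b'...' representation. The variable names are illustrative only.

```python
# Assumption: the scraped field is a bytes object holding UTF-8 text.
raw_nbsp = b"\xc2\xa0"
raw_emoji = b"\xf0\x9f\xa4\x9e"

nbsp = raw_nbsp.decode("utf-8")    # U+00A0, non-breaking space
emoji = raw_emoji.decode("utf-8")  # U+1F91E, crossed fingers

# One character each after decoding, but the emoji needs four bytes,
# which is more than MySQL's three-byte utf8 character set can hold.
print(len(nbsp), len(raw_nbsp))    # 1 2
print(len(emoji), len(raw_emoji))  # 1 4
```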

The whole process runs, but some of the data does not get stored in the database.

Working data:

# For replyes
INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Is this positive??!! Plz help!! ","b'I see it in both! Congratulations \n '", "Skymomof4", 1, "2018-04-26")


INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Sticky: Rules, Tools and Helpful Links. Updated with working photos","b'Thank you so much for ALL the info, you are amazing!!! '", "chandresteen", 0, "2018-04-26")

# For users
INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("chandresteen", 7, 605, "Public", "Coconut Creek,FL", "September 2013", 10, "['Big Kids', 'Getting Pregnant - Trying to Conceive ', 'High-Tech Methods for Getting Pregnant - IVF, ICSI, FET', 'May 2015 Birth Club', 'November 2018 Birth Club', 'Preschoolers', 'Soy Isoflavones, Clomid, Vitex & Femara Girls!', 'Toddlers', 'Trying to Conceive Community', 'TTC/Pregnancy South Africa']", "['https://community.babycenter.com/groups/a155/big_kids', 'https://community.babycenter.com/groups/a6720413/getting_pregnant_-_trying_to_conceive', 'https://community.babycenter.com/groups/a696465/high-tech_methods_for_getting_pregnant_-_ivf_icsi_fet', 'https://community.babycenter.com/groups/a6748015/may_2015_birth_club', 'https://community.babycenter.com/groups/a6768388/november-2018-birth-club', 'https://community.babycenter.com/groups/a145/preschoolers', 'https://community.babycenter.com/groups/a6731007/soy_isoflavones_clomid_vitex_femara_girls', 'https://community.babycenter.com/groups/a135/toddlers', 'https://community.babycenter.com/groups/a43905/trying_to_conceive_community', 'https://community.babycenter.com/groups/a6758887/ttcpregnancy_south_africa']")

# For threads
INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("Getting Pregnant","b'Getting Pregnant - Trying to Conceive '","Sticky: Rules, Tools and Helpful Links. Updated with working photos","b'Welcome to the BBC   Group Getting Pregnant! Below are some links to some threads that would help some of you lovely ladies! Good Luck on your TTC   journey. \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 ~ \xc2\xa0 \xc2\xa0 \xc2\xa0 If there is something not here that you would like to see please feel free to message me. Sorry that the list is so short! Babydust ladies!! '", "~Dovah~",59,6, "2018-04-26", "https://community.babycenter.com/post/a60866779/sticky_rules_tools_and_helpful_links._updated_with_working_photos")
The statements above actually work in the database.

Non-working data:

# For threads
INSERT INTO threads(topic_name, group_name, thread_name,thread_text, thread_starter, thread_reactions, thread_replyes, thread_date,thread_url) values("Getting Pregnant","b'Getting Pregnant - Trying to Conceive '","Struggling with TTC Process","b"This has officially turned into more of a supportive post surrounding the TTC   process when its taking longer. It's been sooooo difficult trying to go through this the past 1.5 yrs... waiting to finally get our BFP   while others come and go with their journies. OP:\xc2\xa0 Hey ladies.... any input would be greatly appreciated. Hubby and I have been TTC   going on month 15, I think about 20 cycles as mine are shorter. I've been to an RE   and slowly have been getting everything done to see if anything is wrong.... nothing is so far. All that's left is an HSG test. Even hubby has super sperm... yet we're not pregnant. The Dr's office wants to put me on birth control for a week or so and start me on Clomid then do an injection for ovulation followed by IUI. This all feels so quick and sudden that I'm just all over the place with whether to go ahead and I need to let them know basically tomorrow! (No pressure!) My struggle is that a part of me feels that if God wanted it, it would have happened already.... so is doing something like IUI   going against God? I'm so scared that if i do, the baby will have problems and I don't know if I can handle that. On the other hand, I don't want to wait years and years for another baby! It crushes me every time someone else is pregnant and I'm still longing for mine. Thoughts??? "", "PaoPao820",6,50, "2018-04-26", "https://community.babycenter.com/post/a68579914/struggling-with-ttc-process")

# For replyes
INSERT INTO replyes(thread_name, reply_text, replyer,reply_reactions, reply_date) values("Is this positive??!! Plz help!! ","b'Thank you!! I took another this morning and it looks exactly the same, not any darker so fingers crossed \xf0\x9f\xa4\x9e\xf0\x9f\x8f\xbb\xf0\x9f\xa4\x9e\xf0\x9f\x8f\xbb '", "BeBeNBabY", 0, "2018-04-26")

# For user
INSERT INTO users(user_name, user_posts, user_comments,visibility, user_location, user_since, groups, group_names, group_urls) values("~Dovah~", 544, 12988, "Public", "Whiterun, Skyrim", "February 2012", 3, "['Crocheting Mamas', 'Getting Pregnant - Trying to Conceive ', "Getting Pregnant's GOT PREGNANT!"]", "['https://community.babycenter.com/groups/a90405/crocheting_mamas', 'https://community.babycenter.com/groups/a6720413/getting_pregnant_-_trying_to_conceive', 'https://community.babycenter.com/groups/a6762598/getting_pregnants_got_pregnant']")
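This is how I build the insert queries from Scrapy items. All the code that creates the insert statements and stores the data is written in pipelines.py in the project. Code for storing: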

import mysql.connector as sql

config = {
    'user': 'root',
    'password': 'root',
    'host': '127.0.0.1',
    'charset': "utf8",
    'use_unicode': True,
}
connection = sql.connect(**config)

cursor = connection.cursor()

string = insert_thread.format(
    topic_name=item['topic'],
    group_name=item['group'],
    thread_name=item['name'],
    thread_text=item['text'],
    thread_starter=item['starter'],
    thread_reactions=item['reactions'],
    thread_replyes=item['replyes'],
    thread_date=item['date'],
    thread_url=item['url']
)
cursor.execute(string)
connection.commit()
All of the non-working data produces an error like the one shown below:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 43, in process_item
    self.insert_user(item, spider)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 100, in insert_user
    self.insert(string)
  File "/media/mnthn/work/Office/2018/james_mcallister/babycenterforum/babycenterforum/pipelines.py", line 132, in insert
    self.cursor.execute(string.replace('\n',''))
  File "/usr/lib/python3/dist-packages/mysql/connector/cursor.py", line 566, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 537, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/usr/lib/python3/dist-packages/mysql/connector/connection.py", line 436, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Getting Pregnant's GOT PREGNANT!"]", "['https://community.babycenter.com/groups/' at line 1
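The syntax error can be reproduced without a database. The sketch below uses a simplified, one-column version of the insert_user template (an assumption made for brevity) and two sample lists taken from the data above. Python's repr switches to double quotes for a string that contains an apostrophe, and those extra double quotes end the SQL string literal early.

```python
# Simplified one-column version of the insert_user template (illustrative).
template = 'INSERT INTO users(group_names) values("{group_names}")'

ok_groups = ['Big Kids', 'Toddlers']
bad_groups = ['Crocheting Mamas', "Getting Pregnant's GOT PREGNANT!"]

ok_statement = template.format(group_names=ok_groups)
bad_statement = template.format(group_names=bad_groups)

print(ok_statement)
print(bad_statement)

# A well-formed statement has exactly one pair of double quotes;
# the failing one has an extra pair inside the literal.
print(ok_statement.count('"'), bad_statement.count('"'))  # 2 4
```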

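I have tried many times, I have tried entering the data manually from the Python shell, and I have searched the internet all day. I have been trying to solve this for about four days now, but nothing has actually worked.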

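It clearly comes from "Getting Pregnant's GOT PREGNANT!". As the error says, your quotes are probably not being filled in correctly. Also, you need to escape single quotes in SQL, which is done by doubling them, so it should be Pregnant''s, which will print as Pregnant's. The escape also works inside single-quoted strings, so you can write:
'Getting Pregnant''s GOT PREGNANT!'
I think changing your line to
"['Crocheting Mamas', 'Getting Pregnant - Trying to Conceive ', 'Getting Pregnant''s GOT PREGNANT!']"
should work.

It's a minor detail, but I'm curious why your user names need 4GB.

I did that for testing, thinking it might solve the problem, since I saw the issue with large texts. It was first set as MEDIUMTEXT, and I have changed it back to MEDIUMTEXT.

Thank you. Your solution works to some extent, but it needs more effort and different logic. Still, your solution gave me the hint. If I had checked my own question after posting it, I could have figured this out.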
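As an alternative to doubling quotes by hand, mysql.connector's cursor.execute accepts a statement with %s placeholders plus a tuple of values, and the connector then quotes and escapes each value itself. The following is a minimal sketch rather than the asker's final code: the item keys are the ones used in the question, while the names INSERT_THREAD, thread_params and insert_thread_row are hypothetical.

```python
# Placeholders (%s) replace the "{name}" fields, so no value is ever pasted
# into the SQL text; the connector handles quoting and escaping.
INSERT_THREAD = (
    "INSERT INTO threads (topic_name, group_name, thread_name, thread_text, "
    "thread_starter, thread_reactions, thread_replyes, thread_date, thread_url) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def thread_params(item):
    """Return the values for INSERT_THREAD in column order."""
    return (
        item['topic'], item['group'], item['name'], item['text'],
        item['starter'], item['reactions'], item['replyes'],
        item['date'], item['url'],
    )


def insert_thread_row(connection, item):
    """Hypothetical helper: run the parameterized insert and commit."""
    cursor = connection.cursor()
    cursor.execute(INSERT_THREAD, thread_params(item))
    connection.commit()
    cursor.close()
```

With this approach, a value such as "Getting Pregnant's GOT PREGNANT!" can be passed unchanged, and the same pattern would apply to the replyes and users inserts.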