Python: Optimizing reading a very large CSV and writing it to SQLite

python sqlite csv

I have a 10 GB CSV file containing user IDs and genders; some rows are duplicated.

userID,gender
372,f
37261,m
23,m
4725,f
...

Here is my code for importing the CSV and writing it to a SQLite database:

import sqlite3
import csv


path = 'genders.csv'
user_table = 'Users'

conn = sqlite3.connect('db.sqlite')
cur = conn.cursor()

cur.execute(f'''DROP TABLE IF EXISTS {user_table}''')

cur.execute(f'''CREATE TABLE {user_table} (
            userID INTEGER NOT NULL, 
            gender INTEGER,
            PRIMARY KEY (userID))''')

with open(path) as csvfile:
    datareader = csv.reader(csvfile)
    # skip header        
    next(datareader, None)
    for counter, line in enumerate(datareader):
        # change gender string to integer
        line[1] = 1 if line[1] == 'f' else 0

        cur.execute(f'''INSERT OR IGNORE INTO {user_table} (userID, gender) 
                    VALUES ({int(line[0])}, {int(line[1])})''')

conn.commit()
conn.close()

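Right now it takes 10 seconds to process a 1 MB file (in reality I have more columns and also create more tables).

I don't think I can use pd.to_sql, because I want a primary key.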

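Instead of calling cursor.execute for every row, use cursor.executemany and insert all the data at once.
Store your values in the format _list = [(a, b, c, ...), (a2, b2, c2, ...), (a3, b3, c3, ...)]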

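# f-string so that {user_table} is substituted; the row values are bound through the ? placeholders
cursor.executemany(f'''INSERT OR IGNORE INTO {user_table} (userID, gender,...) 
                    VALUES (?,?,...)''', _list)
conn.commit()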
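
As a minimal sketch of the suggestion above, the following runs executemany against an in-memory database. The table layout follows the question; the `rows` list is hypothetical sample data standing in for `_list`, and the in-memory connection is only an assumption made to keep the example self-contained.

```python
import sqlite3

user_table = 'Users'  # table name taken from the question

# In-memory database, used here only for illustration.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute(f'''CREATE TABLE {user_table} (
            userID INTEGER NOT NULL,
            gender INTEGER,
            PRIMARY KEY (userID))''')

# Hypothetical sample data: (userID, gender) with 1 = 'f', 0 = 'm'.
rows = [(372, 1), (37261, 0), (23, 0), (4725, 1), (372, 1)]

# One call inserts every tuple; the repeated userID 372 is ignored
# because of the primary key and INSERT OR IGNORE.
cur.executemany(
    f'INSERT OR IGNORE INTO {user_table} (userID, gender) VALUES (?, ?)',
    rows)
conn.commit()

inserted = cur.execute(f'SELECT COUNT(*) FROM {user_table}').fetchone()[0]
conn.close()
```

Binding values through `?` placeholders also avoids formatting each row into the SQL string, as the original loop does.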

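Comments:
Won't that list be larger than my RAM?
You can split the list into chunks using a counter and then run several executemany calls.
Thanks. It still runs more than 5 times slower than pd.to_sql.
The right way is to give executemany() an iterator that reads and converts the data from the CSV on the fly.
Could you provide a more detailed answer? Thank you very much.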
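
A more detailed sketch of the iterator approach mentioned in the comments might look like the following. It assumes the two-column file from the question; the function names `read_rows` and `load_csv` are hypothetical and chosen only for illustration. A generator yields one converted row at a time, so the full 10 GB file never has to sit in memory as a list.

```python
import csv
import sqlite3


def read_rows(path):
    """Yield (userID, gender) tuples one at a time from the CSV file."""
    with open(path, newline='') as csvfile:
        datareader = csv.reader(csvfile)
        next(datareader, None)  # skip header
        for line in datareader:
            # same mapping as the question: 'f' -> 1, anything else -> 0
            yield int(line[0]), 1 if line[1] == 'f' else 0


def load_csv(path, db_path, user_table='Users'):
    """Stream the CSV into SQLite and return the number of stored rows."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute(f'DROP TABLE IF EXISTS {user_table}')
        cur.execute(f'''CREATE TABLE {user_table} (
                    userID INTEGER NOT NULL,
                    gender INTEGER,
                    PRIMARY KEY (userID))''')
        # executemany pulls rows from the generator as it needs them
        cur.executemany(
            f'INSERT OR IGNORE INTO {user_table} (userID, gender) VALUES (?, ?)',
            read_rows(path))
        conn.commit()
        return cur.execute(f'SELECT COUNT(*) FROM {user_table}').fetchone()[0]
    finally:
        conn.close()
```

With the names from the question, a call could be `load_csv('genders.csv', 'db.sqlite')`. All inserts run inside a single transaction that is committed once at the end; how much faster this is than the per-row loop would need to be measured on the real data.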






                

                        
						
                        
                                
                                        
                                                
                                                        