Optimizing NetCDF to Python code

python postgresql

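I have a Python-to-SQL script that reads a netCDF file and inserts climate data into a PostgreSQL table, one row at a time. This of course takes forever, and now I would like to figure out how to optimize this code. I have been thinking about building one huge list and then using the COPY command. However, I am unsure how one would go about that. Another way might be to write to a CSV file and then copy that CSV file into the Postgres database using PostgreSQL's COPY command. I guess that would be quicker than inserting one row at a time.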

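If you have any suggestions on how this could be optimized, I would really appreciate it. The netCDF file is available here (registration required). My current script is reproduced at the bottom of this post.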

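Use psycopg2's copy_from. It expects a file-like object, but that can be your own class that reads and processes the input file and returns it on demand via the read() and readline() methods.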
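
As a minimal sketch of the file-like approach (Python 3 is assumed here, unlike the Python 2 script in the question): RowStream and copy_rows are hypothetical names, and the table and column names are taken from the question's precip table. copy_from pulls data through read(), so rows are formatted only when they are requested instead of being collected into one huge list first. The sketch assumes purely numeric values; text containing tabs, backslashes, or NULLs would need escaping.

```python
class RowStream:
    """Read-only file-like object that renders row tuples as tab-separated
    text on demand, so the full data set never has to sit in memory."""

    def __init__(self, rows, sep="\t"):
        # Lazily format each row as one line of COPY text input.
        self._lines = (sep.join(str(v) for v in row) + "\n" for row in rows)
        self._buf = ""  # text already produced but not yet handed out

    def read(self, size=-1):
        """Return up to `size` characters, or everything left if size < 0."""
        chunks = [self._buf]
        total = len(self._buf)
        while size < 0 or total < size:
            line = next(self._lines, None)
            if line is None:  # source exhausted
                break
            chunks.append(line)
            total += len(line)
        data = "".join(chunks)
        if size < 0:
            self._buf = ""
            return data
        self._buf = data[size:]
        return data[:size]

    def readline(self, size=-1):
        """Return the next full line (the size hint is ignored)."""
        if "\n" not in self._buf:
            self._buf += next(self._lines, "")
        head, newline, tail = self._buf.partition("\n")
        self._buf = tail
        return head + newline


def copy_rows(cur, rows):
    """Stream rows into the precip table through an open psycopg2 cursor."""
    cur.copy_from(RowStream(rows), "precip",
                  columns=("year", "month", "lon", "lat", "pre"))
```

Any iterable of (year, month, lon, lat, pre) tuples can be passed as rows, for example a generator that walks the netCDF array one month at a time.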

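If you are not confident doing that, you could (as you say) generate a temporary CSV file and then COPY that file. For the very best performance you would generate the CSV (Python's csv module is useful), copy it to the server, and use a server-side COPY thetable FROM '/local/path/to/file', thus avoiding any network overhead.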
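
A rough sketch of the CSV route, assuming Python 3 and the question's precip table; write_csv and server_side_copy are hypothetical helper names. The server-side form only works when the file is readable by the PostgreSQL server process itself, and the connecting role needs permission to read server files.

```python
import csv


def write_csv(rows, path):
    """Write row tuples to a CSV file and return the number of rows written."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def server_side_copy(cur, path):
    """Ask the server to load the file; `path` must exist on the server host.

    psycopg2 substitutes the parameter on the client side, so the server
    receives an ordinary quoted file name.
    """
    cur.execute(
        "COPY precip (year, month, lon, lat, pre) FROM %s WITH (FORMAT csv)",
        (path,),
    )
```

If the file cannot be placed on the database host, the same CSV can still be sent from the client, which is the COPY ... FROM STDIN variant.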
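
Most of the time it is easier to use COPY ... FROM STDIN via psql's \copy or psycopg2's copy_from, and that is plenty fast enough. This is especially true if you couple it with producer/consumer feeding via Python's multiprocessing module (not as complicated as it sounds), so that your code to parse the input is not stuck waiting while the database writes rows.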
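
One possible shape for the producer/consumer idea, sketched under some assumptions: Python 3, hypothetical function names, and a write_batch callable that you supply (for example one that feeds a batch to copy_from). The producer runs in a child process and the database writes stay in the parent, so the connection is never shared between processes. On platforms that start processes with spawn, rows must be picklable (a list, not a generator) and run_pipeline should be called under an if __name__ == "__main__": guard.

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the stream


def producer(queue, rows, batch_size):
    """Group rows into batches and push them onto the queue."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            queue.put(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        queue.put(batch)
    queue.put(SENTINEL)


def consumer(queue, write_batch):
    """Pull batches until the sentinel arrives; return the row count."""
    total = 0
    while True:
        batch = queue.get()
        if batch is SENTINEL:
            break
        write_batch(batch)
        total += len(batch)
    return total


def run_pipeline(rows, write_batch, batch_size=10000):
    """Parse in a child process while the parent writes to the database."""
    queue = mp.Queue(maxsize=4)  # bounded, so the producer cannot run far ahead
    worker = mp.Process(target=producer, args=(queue, rows, batch_size))
    worker.start()
    total = consumer(queue, write_batch)
    worker.join()
    return total
```

In a real loader the child process would more likely open the netCDF file itself and generate rows there, rather than receive them as an argument.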
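
There is more advice available on speeding up bulk loading, but I can see you are already doing at least some things right, such as creating indexes at the end and batching work into transactions.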
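
I had a similar requirement, and I rewrote the Numpy arrays into the PostgreSQL binary input file format. The main drawback is that all columns of the target table need to be inserted, which gets tricky if you need to encode the geometry as WKB. However, you can load the netCDF file into a temporary unlogged table and then select that data into another table with the proper geometry type.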
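
To make the binary route more concrete, here is a small sketch (Python 3, hypothetical names throughout) that packs rows into PostgreSQL's binary COPY format with the struct module. It assumes a staging table precip_staging with int and float8 columns rather than the question's decimal columns, because the binary encoding of numeric is considerably more involved. The target table precip_geom in MOVE_SQL is likewise only an illustration.

```python
import io
import struct

# Fixed 11-byte signature, then a 32-bit flags field and a 32-bit
# header-extension length, both zero.
HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack(">ii", 0, 0)
TRAILER = struct.pack(">h", -1)  # a field count of -1 ends the stream

# Per row: 16-bit field count, then (32-bit byte length, value) per field.
# Layout: two int4 fields followed by three float8 fields, big-endian.
ROW = struct.Struct(">h ii ii id id id")

STAGING_DDL = (
    "CREATE UNLOGGED TABLE precip_staging "
    "(year int, month int, lon float8, lat float8, pre float8)"
)
MOVE_SQL = (
    "INSERT INTO precip_geom (year, month, pre, geom) "
    "SELECT year, month, pre, ST_SetSRID(ST_MakePoint(lon, lat), 4326) "
    "FROM precip_staging"
)


def build_copy_payload(rows):
    """Encode (year, month, lon, lat, pre) tuples as a binary COPY stream."""
    out = io.BytesIO()
    out.write(HEADER)
    for year, month, lon, lat, pre in rows:
        out.write(ROW.pack(5, 4, year, 4, month, 8, lon, 8, lat, 8, pre))
    out.write(TRAILER)
    return out.getvalue()


def load_staging(cur, rows):
    """Create the staging table and fill it through psycopg2's copy_expert."""
    cur.execute(STAGING_DDL)
    payload = io.BytesIO(build_copy_payload(rows))
    cur.copy_expert("COPY precip_staging FROM STDIN WITH (FORMAT binary)", payload)
```

After load_staging, executing MOVE_SQL would build the geometry on the server, which sidesteps encoding WKB on the client.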
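
In addition to Craig's answer: with a recent PostGIS version you can use the geography type directly in the table definition and load each lon/lat pair into a point on insert. The geography type is also a good fit for any SRID 4326 data because it is geodesy-aware (handy when calculating distances, for example).

Your idx_precip index should be on year and month only; you will never query precipitation values by year, month and location together. If you want to query precipitation values by month, create a separate index for that. In general: define your queries first, then your indexes.

The netCDF file requires a login, but honestly I was just curious and have nothing useful to add to Craig's answer.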
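
The geography-type and index suggestions above could look roughly like the following. This is only a sketch: the column name geog, the index name idx_precip_geog, and the helper insert_params are assumptions, and the statements are plain strings meant to be run through a psycopg2 cursor.

```python
# Table with the point stored directly as a PostGIS geography column.
CREATE_TABLE = (
    "CREATE TABLE precip (gid serial PRIMARY KEY, year int, month int, "
    "pre decimal, geog geography(Point, 4326))"
)

# Build the point on insert from the lon/lat pair.
INSERT_SQL = (
    "INSERT INTO precip (year, month, pre, geog) VALUES "
    "(%s, %s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography)"
)

# Indexes shaped by the expected queries; create them after the bulk load.
CREATE_INDEXES = [
    "CREATE INDEX idx_precip ON precip (year, month)",
    "CREATE INDEX idx_precip_geog ON precip USING gist (geog)",
]


def insert_params(year, month, lon, lat, pre):
    """Order the values to match the placeholders in INSERT_SQL."""
    return (year, month, pre, lon, lat)
```

A single row would then be inserted with cur.execute(INSERT_SQL, insert_params(...)), although for bulk loads the COPY-based approaches remain the faster path.
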
# NetCDF to PostGreSQL database
# CRU-TS 3.21 precipitation and temperature data. From NetCDF to database table
# Requires Python2.6, Postgresql, Psycopg2, Scipy
# Tested using Vista 64bit.

# Import modules
import psycopg2, time, datetime
from scipy.io import netcdf

# Establish connection
db1 = psycopg2.connect("host=192.168.1.162 dbname=dbname user=username password=password")
cur = db1.cursor()
### Create Table
print str(time.ctime())+ " Creating precip table."
cur.execute("DROP TABLE IF EXISTS precip;")
cur.execute("CREATE TABLE precip (gid serial PRIMARY KEY not null, year int, month int, lon decimal, lat decimal, pre decimal);")

### Read netcdf file
f = netcdf.netcdf_file('/home/username/output/project_v2/inputdata/precipitation/cru_ts3.21.1901.2012.pre.dat.nc', 'r')
##
### Create lathash
print str(time.ctime())+ " Looping through lat coords."
temp = f.variables['lat'].data.tolist()
lathash = {}
for entry in temp:
    print str(entry)
    lathash[temp.index(entry)] = entry
##
### Create lonhash
print str(time.ctime())+ " Looping through long coords."
temp = f.variables['lon'].data.tolist()
lonhash = {}
for entry in temp:
    print str(entry)
    lonhash[temp.index(entry)] = entry
##
### Loop through every observation. Set timedimension and lat and long observations.
for _month in xrange(1344):

    if _month < 528:
        print(str(_month))
        print("Not yet")
    else:
        thisyear = int((_month)/12+1901)
        thismonth = ((_month) % 12)+1
        thisdate = datetime.date(thisyear,thismonth, 1)
        print(str(thisdate))
        _time = int(_month)
        for _lon in xrange(720):
            for _lat in xrange(360):
                data = [int(thisyear), int(thismonth), lonhash[_lon], lathash[_lat], f.variables[('pre')].data[_time, _lat, _lon]]
                cur.execute("INSERT INTO precip (year, month, lon, lat, pre) VALUES "+str(tuple(data))+";")


db1.commit()
cur.execute("CREATE INDEX idx_precip ON precip USING btree(year, month, lon, lat, pre);")
cur.execute("ALTER TABLE precip ADD COLUMN geom geometry;")
cur.execute("UPDATE precip SET geom = ST_SetSRID(ST_Point(lon,lat), 4326);")
cur.execute("CREATE INDEX idx_precip_geom ON precip USING gist(geom);")


db1.commit()
cur.close()
db1.close()            
print str(time.ctime())+ " Done!"