How to split datetime objects in a csv file into 30-minute arrays with Python and export each array to a new csv file
Tags: python, parsing, python-2.7, datetime, csv
I have a 10 Hz .csv data file with the date/time stamp in a single column, in the following format: 2014-07-05 12:01:00.0 through 2014-07-06 12:00:59.9, which is one day of data per file. I need to split it into 30-minute blocks over the 24 hours (48 per day): 12:01:00.0-12:29:59.9, 12:30:00.0-12:59:59.9, and so on. Ideally each 30-minute block would be exported to its own text file. I am very new to Python (day 3) and I'm trying my best, but I'm spinning my wheels. I'm on an internship and really need help. I'm not a programmer, just a chemist trying to use Python to get this task done. I tried splitting by rows (rows in 30 minutes = 18000), but because my data starts at 12:00:01 rather than on an even boundary, that throws off my row-split calculation and doesn't give accurate 30-minute splits. I've been told I need a datetime object converted to a string. Any guidance or help would be greatly appreciated. Thanks in advance. Below is what I have so far; I started modifying it for datetime objects, but I really need some guidance:
import csv
import re
import os
import datetime
import numpy as np
filename = 'C:\Users\Jason\Documents\Flux Data Files\HL14_175.csv'
f = open('C:\Users\Jason\Documents\Flux Data Files\Output Flux Split 30 mins Data Files\HL14_175_split0.csv','wb')
writer = csv.writer(f,delimiter = ',')
with open(filename,"r") as datafile:
    r = csv.reader(datafile,delimiter = ",")
    timestamp = datetime.datetime.strptime("2014-07-05", "%Y-%m-%d %H:%M:%S:%f")
    recordnumber = []
    sonic1 = []
    sonic2 = []
    sonic3 = []
    temperature = []
    for row in r:
        timestamp.append((row[0]))
        recordnumber.append(float(row[1]))
        sonic1.append(float(row[2]))
        sonic2.append(float(row[3]))
        sonic3.append(float(row[4]))
        temperature.append(float(row[5]))
timestamp = np.array(timestamp)
recordnumber = np.array(recordnumber)
sonic1 = np.array(sonic1)
sonic2 = np.array(sonic2)
sonic3 = np.array(sonic3)
temperature = np.array(temperature)
datetime.strptime(date_string, format)
#row_count = 863998
row_count = sum(1 for row in csv.reader(open(filename)))
lines = row_count/18001.0
timestamp_split = np.array_split(timestamp,lines)
recordnumber_split = np.array_split(recordnumber,lines)
sonic1_split = np.array_split(sonic1,lines)
sonic2_split = np.array_split(sonic2,lines)
sonic3_split = np.array_split(sonic3,lines)
temperature_split = np.array_split(temperature,lines)
dataout = np.column_stack((timestamp_split[0],recordnumber_split[0],sonic1_split[0],sonic2_split[0],sonic3_split[0],temperature_split[0]))
writer.writerows(dataout)
f.close()
print('Flux Data Split Complete')
Here is a sample data file:
6/24/2014 0:01,3583014,-59,-62,-9,296.51
01:00.1,3583015,-69,-68,16,296.54
01:00.2,3583016,-62,-59,36,296.56
01:00.3,3583017,-77,-45,26,296.56
01:00.4,3583018,-47,-50,36,296.56
01:00.5,3583019,-48,-70,27,296.51
01:00.6,3583020,-71,-60,28,296.54
01:00.7,3583021,-69,-73,24,296.52
01:00.8,3583022,-61,-69,15,296.49
01:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:01,3583024,-65,-42,-5,296.56
01:01.1,3583025,-71,-33,-11,296.56
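This is not a complete solution, because converting the dates correctly is still a problem. I use text data to simulate reading from a csv file. I'm a pandas beginner, so someone can probably do this better.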
import pandas as pd
import StringIO
data = '''6/24/2014 0:01,3583014,-59,-62,-9,296.51
01:00.1,3583015,-69,-68,16,296.54
01:00.2,3583016,-62,-59,36,296.56
01:00.3,3583017,-77,-45,26,296.56
01:00.4,3583018,-47,-50,36,296.56
01:00.5,3583019,-48,-70,27,296.51
01:00.6,3583020,-71,-60,28,296.54
01:00.7,3583021,-69,-73,24,296.52
01:00.8,3583022,-61,-69,15,296.49
01:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:01,3583024,-65,-42,-5,296.56
01:01.1,3583025,-71,-33,-11,296.56
6/24/2014 0:31,3583014,-59,-62,-9,296.51
31:00.1,3583015,-69,-68,16,296.54
31:00.2,3583016,-62,-59,36,296.56
31:00.3,3583017,-77,-45,26,296.56
31:00.4,3583018,-47,-50,36,296.56
31:00.5,3583019,-48,-70,27,296.51
31:00.6,3583020,-71,-60,28,296.54
31:00.7,3583021,-69,-73,24,296.52
31:00.8,3583022,-61,-69,15,296.49
31:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:31,3583024,-65,-42,-5,296.56
31:01.1,3583025,-71,-33,-11,296.56'''
# reading from CSV
df = pd.read_csv(StringIO.StringIO(data), index_col=None, header=None)  # DataFrame.from_csv is deprecated in newer pandas
#print df
# converting "weird" date format - this can still be a problem
date = None
minut = None
second = 59
def change_date(line):
    global date, minut, second
    a = line.split(':')
    if len(a[0]) > 2:
        if a[0] != date or a[1] != minut:
            second = 59
            date = a[0]
            minut = a[1]
        second = (second + 1) % 60
        return "%s:%02d.0" % (line, second)
        #return line
    else:
        return date + ":" + line
df[0] = df[0].map(change_date)
#print df
#print df.dtypes
# converting string with date and time to object datetime
df[0] = pd.DatetimeIndex(df[0])
#print df.dtypes
# group by date (year, month, day, hour) and half hour (minute < 30)
g = df.groupby( df[0].map(lambda t:(t.strftime("%Y_%m_%d_%H_") + ("00" if t.minute<30 else "30") )) )
# print groups
for name, group in g:
    print 'name:', name
    print group
    group.to_csv(name + ".csv") # write groups to files
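On Python 3 with a recent pandas, the same half-hour grouping can probably be written more compactly by flooring each timestamp to 30 minutes. The following is only a sketch, not the code above: it assumes the first column already holds full timestamps such as 2014-07-05 12:01:00.0 (the format described in the question, not the mangled sample), and the names `split_half_hours` and `out_dir` are made up for illustration.

```python
import io
import os

import pandas as pd


def split_half_hours(df, out_dir=None):
    """Group rows by the half hour that their timestamp (column 0) falls in.

    Returns {label: DataFrame}. If out_dir is given, each group is also
    written to <out_dir>/<label>.csv. Hypothetical helper for illustration.
    """
    df = df.copy()
    df[0] = pd.to_datetime(df[0])
    # floor("30min") maps 12:01:00.0 to 12:00 and 12:30:00.0 to 12:30
    keys = df[0].dt.floor("30min").rename("half_hour")
    groups = {}
    for start, group in df.groupby(keys):
        label = start.strftime("%Y_%m_%d_%H_%M")
        groups[label] = group
        if out_dir is not None:
            group.to_csv(os.path.join(out_dir, label + ".csv"),
                         index=False, header=False)
    return groups


# Made-up rows in the timestamp format described in the question
sample = """2014-07-05 12:01:00.0,3583014,-59,-62,-9,296.51
2014-07-05 12:29:59.9,3583015,-69,-68,16,296.54
2014-07-05 12:30:00.0,3583016,-62,-59,36,296.56
"""
df = pd.read_csv(io.StringIO(sample), header=None)
groups = split_half_hours(df)
for label, group in groups.items():
    print(label, len(group))
```

Because the grouping key comes from the timestamp itself, the first block is simply shorter when the file starts at 12:01 instead of 12:00; no row counting is involved.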
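Pardon my ignorance, but what is a 10 Hz CSV file? I'm guessing it records data ten times per second. Please add sample data. Maybe you should read about the pandas module.
Yes, 10 Hz means the data is recorded 10 times per second. I'll add a data file so you can see what I'm referring to.
The first column has a strange format. How do you recognize the date and time in that column?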
name: 2014_01_24_00_00
0 1 2 3 4 5
0 2014-06-24 00:01:01 3583014 -59 -62 -9 296.51
1 2014-06-24 00:01:00.100000 3583015 -69 -68 16 296.54
2 2014-06-24 00:01:00.200000 3583016 -62 -59 36 296.56
3 2014-06-24 00:01:00.300000 3583017 -77 -45 26 296.56
4 2014-06-24 00:01:00.400000 3583018 -47 -50 36 296.56
5 2014-06-24 00:01:00.500000 3583019 -48 -70 27 296.51
6 2014-06-24 00:01:00.600000 3583020 -71 -60 28 296.54
7 2014-06-24 00:01:00.700000 3583021 -69 -73 24 296.52
8 2014-06-24 00:01:00.800000 3583022 -61 -69 15 296.49
9 2014-06-24 00:01:00.900000 3583023 -56 -68 8 296.52
10 2014-06-24 00:01:02 3583024 -65 -42 -5 296.56
11 2014-06-24 00:01:01.100000 3583025 -71 -33 -11 296.56
name: 2014_01_24_00_30
0 1 2 3 4 5
12 2014-06-24 00:31:03 3583014 -59 -62 -9 296.51
13 2014-06-24 00:31:00.100000 3583015 -69 -68 16 296.54
14 2014-06-24 00:31:00.200000 3583016 -62 -59 36 296.56
15 2014-06-24 00:31:00.300000 3583017 -77 -45 26 296.56
16 2014-06-24 00:31:00.400000 3583018 -47 -50 36 296.56
17 2014-06-24 00:31:00.500000 3583019 -48 -70 27 296.51
18 2014-06-24 00:31:00.600000 3583020 -71 -60 28 296.54
19 2014-06-24 00:31:00.700000 3583021 -69 -73 24 296.52
20 2014-06-24 00:31:00.800000 3583022 -61 -69 15 296.49
21 2014-06-24 00:31:00.900000 3583023 -56 -68 8 296.52
22 2014-06-24 00:31:04 3583024 -65 -42 -5 296.56
23 2014-06-24 00:31:01.100000 3583025 -71 -33 -11 296.56
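The question describes full timestamps such as 2014-07-05 12:01:00.0, so the split can also be sketched with only the standard library, without pandas. This is a rough sketch under several assumptions: column 0 holds a timestamp in exactly that format (note the dot, not a colon, before `%f`), the file has no header row, and rows are in time order. The function names, the `fmt` parameter, and the output file naming are invented for the example.

```python
import csv
import os
import tempfile
from datetime import datetime


def half_hour_label(ts):
    """Label for the half hour containing ts, e.g. '2014_07_05_12_30'."""
    return ts.strftime("%Y_%m_%d_%H_") + ("00" if ts.minute < 30 else "30")


def split_csv_by_half_hour(src_path, out_dir, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Copy each row of src_path into <out_dir>/<label>.csv by its timestamp.

    Returns the list of output paths in the order they were first written.
    """
    written = []
    current_label = None
    out_file = None
    writer = None
    try:
        with open(src_path, newline="") as src:
            for row in csv.reader(src):
                if not row:
                    continue  # skip blank lines
                label = half_hour_label(datetime.strptime(row[0], fmt))
                if label != current_label:
                    if out_file is not None:
                        out_file.close()
                    path = os.path.join(out_dir, label + ".csv")
                    # append if this block was already started earlier
                    mode = "a" if path in written else "w"
                    out_file = open(path, mode, newline="")
                    writer = csv.writer(out_file)
                    if path not in written:
                        written.append(path)
                    current_label = label
                writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return written


# Demo with made-up rows written to a temporary directory
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "input.csv")
with open(src_path, "w", newline="") as f:
    f.write("2014-07-05 12:01:00.0,3583014,-59,-62,-9,296.51\n"
            "2014-07-05 12:29:59.9,3583015,-69,-68,16,296.54\n"
            "2014-07-05 12:30:00.0,3583016,-62,-59,36,296.56\n")
files = split_csv_by_half_hour(src_path, work_dir)
print([os.path.basename(p) for p in files])
```

Rows are streamed one at a time, so a full day of 10 Hz data never has to be held in memory; the trade-off is that a header row or a timestamp in a different format will raise a ValueError from `strptime`.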