Grouping a list of files into half-hour intervals in Python
I am using Python 2.7. My folder contains thousands of files with names like these:
20180828-024308.dat
20180828-024434.dat
20180828-030335.dat
20180828-032114.dat
20180828-040041.dat
..........
The names encode the year, month, day, hour, minute and second (YYYYMMDD-HHMMSS).
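Because the names have a fixed YYYYMMDD-HHMMSS layout, each one can be parsed into a `datetime` with the standard library. A minimal sketch, assuming every file name follows exactly that pattern (`parse_file_time` is a hypothetical helper name); it runs on both Python 2.7 and 3:

```python
from datetime import datetime

def parse_file_time(filename):
    """Parse a name like '20180828-024308.dat' into a datetime.

    Assumes every name follows the YYYYMMDD-HHMMSS.dat pattern.
    """
    stem = filename.split('.')[0]  # '20180828-024308'
    return datetime.strptime(stem, '%Y%m%d-%H%M%S')

print(parse_file_time('20180828-024308.dat'))  # 2018-08-28 02:43:08
```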
I want to group all of these files into half-hour intervals (note: the year, month and day do not change).
I want something like this:
1: [20180828-024308.dat,20180828-024434.dat]
2: [20180828-030335.dat,20180828-032114.dat]
3: [20180828-040041.dat,....]
.......
A list would be fine for me, and a DataFrame would also work.
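Thanks for your help.

Answer: I think you can also do this with basic programming. First use the os library to get the list of files in the folder, then group the names in plain Python. Here is a short snippet of what I mean: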
import os

folderPath = '/somepath'
filesInFolder = os.listdir(folderPath)

# Key: 'HH_00' or 'HH_30' (start of the half-hour), value: list of file names.
# The date part is ignored because year, month and day never change.
mapOfSimilarFiles = {}
for fileName in sorted(filesInFolder):
    if not fileName.endswith('.dat'):
        continue  # skip anything that is not a .dat file
    # '20180828-024308.dat' -> '024308'
    timePartOfFile = fileName.split('-')[-1].split('.dat')[0]
    hr = timePartOfFile[0:2]
    minute = int(timePartOfFile[2:4])
    # Round the minutes down to the start of their half-hour.
    key = hr + ('_00' if minute < 30 else '_30')
    mapOfSimilarFiles.setdefault(key, []).append(fileName)

# Print the groups in chronological order, numbered from 1.
for number, key in enumerate(sorted(mapOfSimilarFiles), 1):
    print('%d: %s' % (number, mapOfSimilarFiles[key]))
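The same idea can also be written with `itertools.groupby`. It only merges adjacent items, so the names are sorted first; the zero-padded YYYYMMDD-HHMMSS names sort chronologically as plain strings. A sketch, assuming a bucket is identified by date, hour and whether the minute is below 30. The helper names `half_hour_key` and `group_by_half_hour` are hypothetical:

```python
from itertools import groupby

def half_hour_key(filename):
    """Return (date, hour, half) for a name like '20180828-024308.dat'.

    half is 0 for minutes 00-29 and 1 for minutes 30-59 (assumed bucketing).
    """
    date_part, time_part = filename.split('.')[0].split('-')
    hour = int(time_part[0:2])
    minute = int(time_part[2:4])
    return (date_part, hour, minute // 30)

def group_by_half_hour(filenames):
    """Return a list of lists, one per non-empty half-hour bucket."""
    # groupby only merges *adjacent* items with equal keys, so sort first.
    ordered = sorted(filenames)
    return [list(group) for _, group in groupby(ordered, key=half_hour_key)]

files = ['20180828-024308.dat', '20180828-024434.dat', '20180828-030335.dat',
         '20180828-032114.dat', '20180828-040041.dat']
for number, group in enumerate(group_by_half_hour(files), 1):
    print('%d: %s' % (number, group))
```

For the real folder you would pass the directory listing, e.g. `group_by_half_hour(f for f in os.listdir('/somepath') if f.endswith('.dat'))`, where `/somepath` is the placeholder path from the answer.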
Another answer: first convert the data into a dict, then join the strings back together accordingly.
Code:
d = ['20180828-024308.dat', '20180828-024434.dat', '20180828-030335.dat', '20180828-032114.dat', '20180828-040041.dat']
output = {}
for i in d:
    key = i.split('-')[0]   # date part, e.g. '20180828'
    key1 = i.split('-')[1]  # time part plus extension, e.g. '024308.dat'
    # Note: files are grouped by clock hour (key1[0:2]), not by half-hour.
    if key in output:
        if key1[0:2] in output[key]:
            output[key][key1[0:2]].append(key1[2:])
        else:
            output[key][key1[0:2]] = [key1[2:]]
    else:
        output[key] = {}
        output[key][key1[0:2]] = [key1[2:]]
print(output)

main_output = []
# sorted() keeps the groups in chronological order (dicts are unordered in Python 2.7).
for i in sorted(output.keys()):
    temp = []
    for j in sorted(output[i].keys()):
        # Rebuild the full file names: date + '-' + hour + rest of the name.
        temp.append([i + '-' + j + s for s in output[i][j]])
    main_output.extend(temp)
print(main_output)
Output:
{'20180828': {'02': ['4308.dat', '4434.dat'], '03': ['0335.dat', '2114.dat'], '04': ['0041.dat']}}
[['20180828-024308.dat', '20180828-024434.dat'], ['20180828-030335.dat', '20180828-032114.dat'], ['20180828-040041.dat']]
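The dict in this answer is keyed by clock hour (`key1[0:2]`). It matches the expected output only because each hour in the sample has files in a single half-hour; a file at 02:10 would land in the same group as 02:43. Below is a sketch of the same dict approach keyed by half-hour instead. It assumes bucket labels of the form `YYYYMMDD-HH:00` / `YYYYMMDD-HH:30`, a format chosen here only for illustration, and `bucket_files` is a hypothetical name:

```python
from collections import defaultdict

def bucket_files(filenames):
    """Return a chronological list of (label, files) pairs, one per half-hour."""
    buckets = defaultdict(list)
    for name in filenames:
        # '20180828-024308.dat' -> date_part '20180828', time_part '024308'
        date_part, time_part = name.split('.')[0].split('-')
        # Round the minutes down to the start of the half-hour: 00 or 30.
        half = '00' if int(time_part[2:4]) < 30 else '30'
        label = '%s-%s:%s' % (date_part, time_part[0:2], half)
        buckets[label].append(name)
    # Zero-padded labels sort chronologically as plain strings.
    return [(label, sorted(buckets[label])) for label in sorted(buckets)]

# '20180828-021000.dat' is a hypothetical extra file, not from the question.
d = ['20180828-021000.dat', '20180828-024308.dat', '20180828-024434.dat',
     '20180828-030335.dat', '20180828-040041.dat']
for label, files in bucket_files(d):
    print('%s %s' % (label, files))
```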
Another answer, using pandas. If I understand correctly, and your DataFrame looks like this:
print(df)
files
0 20180828-024308.dat
1 20180828-024434.dat
2 20180828-030335.dat
3 20180828-032114.dat
4 20180828-040041.dat
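import pandas as pd

# Parse '20180828-024308.dat' -> Timestamp('2018-08-28 02:43:08') with an explicit format.
df['file_time'] = pd.to_datetime(df['files'].str.split('.').str[0], format='%Y%m%d-%H%M%S')
# 1800 s = 30-minute bins; collect the file names of each bin into a list.
df.groupby([pd.Grouper(key='file_time', freq='1800s')])['files'].apply(list).reset_index()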
Output:
file_time files
0 2018-08-28 02:30:00 [20180828-024308.dat, 20180828-024434.dat]
1 2018-08-28 03:00:00 [20180828-030335.dat, 20180828-032114.dat]
2 2018-08-28 03:30:00 []
3 2018-08-28 04:00:00 [20180828-040041.dat]
Note: because there are no files between 03:30 and 04:00, the list for that bin is empty.
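`pd.Grouper` with a fixed frequency produces one row per 30-minute bin between the first and the last timestamp, labelled by the bin's left edge, which is why the empty 03:30 row appears. A plain-Python sketch of that binning follows. It uses no pandas, so it also runs on Python 2.7. It assumes the same file-name format and at least one file, and `half_hour_bins` / `floor_half_hour` are hypothetical names:

```python
from datetime import datetime, timedelta

def floor_half_hour(t):
    """Round a datetime down to the previous :00 or :30."""
    return t.replace(minute=t.minute - t.minute % 30, second=0, microsecond=0)

def half_hour_bins(filenames):
    """Return (bin_start, files) for every 30-minute bin from the first to the
    last file, including bins that contain no files (assumes >= 1 file)."""
    buckets = {}
    for name in filenames:
        t = datetime.strptime(name.split('.')[0], '%Y%m%d-%H%M%S')
        buckets.setdefault(floor_half_hour(t), []).append(name)
    step = timedelta(minutes=30)
    current, last = min(buckets), max(buckets)
    bins = []
    while current <= last:
        # Bins without files get an empty list, like the 03:30 row above.
        bins.append((current, sorted(buckets.get(current, []))))
        current += step
    return bins

files = ['20180828-024308.dat', '20180828-024434.dat', '20180828-030335.dat',
         '20180828-032114.dat', '20180828-040041.dat']
for start, members in half_hour_bins(files):
    print('%s %s' % (start, members))
```

To drop the empty bins, filter with `[b for b in bins if b[1]]`. In pandas, grouping by the floored timestamps instead, e.g. `df.groupby(df['file_time'].dt.floor('30min'))['files'].apply(list)`, should only produce groups that actually contain files.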