Python cannot handle a large MD trajectory file

python pandas memory

For an MD simulation, I have to read in a large coordinate file, randomly select a few of its frames (images), and store them:

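import pickle
import itertools

import numpy as np
import pandas as pd

# Read every line after the title line. Note that readlines() still loads
# the whole trajectory into memory at once.
with open('/Users/apple/Downloads/traj-3frames.crd', 'r') as f:
    df = f.readlines()[1:]

# split() with no argument splits on any run of whitespace; split('  ')
# misses single-space gaps and leaves a leading space after three-space gaps.
df = [line.split() for line in df]

# A line with fewer than 4 values is taken to be the box line ending a frame.
# (The last coordinate line of a frame can also hold fewer than 4 values.)
image_ends = [i for i, x in enumerate(df) if len(x) < 4]

no_images = 2
# Pick frame numbers at random, with replacement, as in the original call.
images_selected_indices = np.random.choice(len(image_ends), no_images, replace=True)

for index, image in enumerate(images_selected_indices):
    image_index = image_ends[image]
    if image == 0:
        selected_image = df[0:image_index]
    else:
        previous_image_index = image_ends[image - 1]
        selected_image = df[previous_image_index + 1:image_index]
    # Flatten the frame's lines and regroup the values into (x, y, z) rows.
    flattened_coordinates = list(itertools.chain.from_iterable(selected_image))
    total_rows = len(flattened_coordinates) // 3
    coordinates_df = [flattened_coordinates[i * 3:(i + 1) * 3] for i in range(total_rows)]
    coordinates_df = pd.DataFrame(coordinates_df, columns=['x', 'y', 'z'])

    with open('image_' + str(index) + '.pkl', 'wb') as out:
        pickle.dump(coordinates_df, out)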
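If memory is the bottleneck, one option is to skip readlines() entirely and stream the file one frame at a time, keeping only the frames that end up selected. The sketch below does this with reservoir sampling (Algorithm R), which picks k frames uniformly at random in a single pass without knowing the number of frames in advance. It assumes an AMBER-style ASCII trajectory: one title line, then 3 × n_atoms values per frame written 10 per line, followed by one box line per frame when has_box is true. It also assumes n_atoms is known beforehand (for example from the topology), so frame boundaries come from the atom count rather than from len(x) < 4. The names iter_frames and sample_frames are hypothetical, and unlike the np.random.choice(..., True) call in the question, this samples without replacement.

```python
import itertools
import math
import random

import numpy as np


def iter_frames(path, n_atoms, has_box=True, values_per_line=10):
    """Yield one frame at a time as an (n_atoms, 3) float array.

    Assumes an AMBER-style ASCII trajectory: a title line, then for each
    frame 3 * n_atoms values written `values_per_line` per line, followed
    by a single box line when `has_box` is True.
    """
    lines_per_frame = math.ceil(3 * n_atoms / values_per_line)
    with open(path) as f:
        if next(f, None) is None:  # skip the title line; stop on an empty file
            return
        while True:
            block = list(itertools.islice(f, lines_per_frame))
            if len(block) < lines_per_frame:
                return  # end of file (a truncated last frame is dropped)
            values = np.array([float(v) for v in " ".join(block).split()])
            if has_box:
                next(f, None)  # discard the box line
            # reshape raises ValueError unless the block held 3 * n_atoms values
            yield values.reshape(n_atoms, 3)


def sample_frames(path, n_atoms, k, seed=None, **kwargs):
    """Pick k frames uniformly at random, without replacement, in one pass.

    Reservoir sampling (Algorithm R): at most k frames are kept in memory.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, frame in enumerate(iter_frames(path, n_atoms, **kwargs)):
        if i < k:
            reservoir.append(frame)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = frame
    return reservoir
```

For example, calling sample_frames('traj-3frames.crd', n_atoms, k=2) with n_atoms set to the system's atom count returns two (n_atoms, 3) arrays, which can then be wrapped in a DataFrame and pickled as in the question. Peak memory is roughly k + 1 frames instead of the whole file.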

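Could you post some sample data from the file? That would help, because it looks like you are loading everything and then excluding some of it (len(x) < 4).

The example file is also relatively large, but you can see that I split the frames wherever len(x) < 4: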

 58.783   3.416  63.966  59.191   2.573  63.767  59.211   3.704  64.773   5.431
 16.078   1.087   6.188  16.647   1.226   4.706  16.534   1.514   2.794  15.157
 13.977   2.273  14.423  14.305   2.369  15.933  14.342  10.048  12.837   4.644
 10.828  13.338   4.404   9.627  13.361   5.325
 66.377  65.275  64.865
 32.355  31.829  32.630  33.434  32.598  31.898  32.073  30.808  32.343  32.476
 31.750  33.717  30.835  32.893  32.578  34.323  33.549  32.603  34.745  32.201
 32.260  33.192  32.780  30.852  34.067  33.743  33.644  34.571  34.408  31.982
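The len(x) < 4 test works for the excerpt above, where the last coordinate line of the frame holds 6 values, but it is not safe in general. Assuming 10 values per line, a frame of N atoms has 3N values, so its last coordinate line holds 3N mod 10 values (or 10 when the remainder is 0). Whenever that number is 1, 2 or 3, the last coordinate line cannot be told apart from the 3-value box line by length alone. A small check, where the helper name last_line_count is hypothetical:

```python
def last_line_count(n_atoms, values_per_line=10):
    """Number of values on the last coordinate line of one frame.

    Assumes 3 * n_atoms values per frame, written `values_per_line` per line.
    """
    remainder = (3 * n_atoms) % values_per_line
    return remainder if remainder else values_per_line


# Atom counts (up to 20) whose last coordinate line would look like a box line
ambiguous = [n for n in range(1, 21) if last_line_count(n) < 4]
print(ambiguous)  # [1, 4, 7, 11, 14, 17]
```

For such systems, deriving the frame length from the atom count (ceil(3N / 10) coordinate lines plus one box line) is more reliable than testing line lengths.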