Python cannot handle a large MD trajectory file
For an MD simulation, I have to read in a large coordinate file, randomly select a few of its frames, and store them.
import pickle
import numpy as np
import itertools
import pandas as pd

# Read every line after the title line (this loads the whole file into memory)
with open('/Users/apple/Downloads/traj-3frames.crd', 'r') as f:
    df = f.readlines()[1:]
# Split each line on whitespace
df = [line.split() for line in df]
# A line with fewer than 4 values marks the end of a frame ("image")
image_ends = [i for i, x in enumerate(df) if len(x) < 4]
no_images = 2
# replace=True: the same frame may be drawn more than once
images_selected_indices = np.random.choice(len(image_ends), no_images, replace=True)
for index, image in enumerate(images_selected_indices):
    image_index = image_ends[image]
    if image == 0:
        selected_image = df[0:image_index]
    else:
        previous_image_index = image_ends[image - 1]
        selected_image = df[previous_image_index + 1:image_index]
    # Flatten the frame's lines into one list of values, then regroup as x, y, z
    flattened_coordinates = list(itertools.chain.from_iterable(selected_image))
    total_rows = len(flattened_coordinates) // 3
    coordinates_df = [flattened_coordinates[i*3:(i+1)*3] for i in range(total_rows)]
    coordinates_df = pd.DataFrame(coordinates_df, columns=['x', 'y', 'z'])
    with open('image_' + str(index) + '.pkl', 'wb') as out:
        pickle.dump(coordinates_df, out)
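Can you post some sample data from the file? That would help, because it looks like you are loading everything and only then excluding some lines (len(x) < 4).
The sample file is also relatively large, but as you can see, I split the frames wherever len(x) < 4: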
58.783 3.416 63.966 59.191 2.573 63.767 59.211 3.704 64.773 5.431
16.078 1.087 6.188 16.647 1.226 4.706 16.534 1.514 2.794 15.157
13.977 2.273 14.423 14.305 2.369 15.933 14.342 10.048 12.837 4.644
10.828 13.338 4.404 9.627 13.361 5.325
66.377 65.275 64.865
32.355 31.829 32.630 33.434 32.598 31.898 32.073 30.808 32.343 32.476
31.750 33.717 30.835 32.893 32.578 34.323 33.549 32.603 34.745 32.201
32.260 33.192 32.780 30.852 34.067 33.743 33.644 34.571 34.408 31.982
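
Since the comment above points out that the whole file is loaded before anything is discarded, one possible way to limit memory use is to stream the file and keep only the sampled frames. The following is a minimal sketch, not a tested solution for real trajectories. It assumes the layout shown in the sample: a title line first, and each frame closed by a line with fewer than 4 values. The names `iter_frames`, `sample_frames`, and `seed` are hypothetical. Unlike the `np.random.choice(..., True)` call in the question, it samples without replacement, using reservoir sampling so that a single pass over the file is enough.

```python
import random

import numpy as np


def iter_frames(path):
    """Yield one frame at a time as an (n_atoms, 3) float array."""
    values = []
    with open(path) as handle:
        next(handle, None)  # skip the title line, like readlines()[1:]
        for line in handle:
            tokens = line.split()
            if not tokens:
                continue  # ignore blank lines
            if len(tokens) < 4:
                # Assumed end-of-frame line (fewer than 4 values), as in the question
                if values:
                    # reshape raises if the value count is not a multiple of 3
                    yield np.array(values, dtype=float).reshape(-1, 3)
                values = []
            else:
                values.extend(float(t) for t in tokens)


def sample_frames(path, k, seed=None):
    """Reservoir-sample up to k distinct frames in a single pass."""
    rng = random.Random(seed)
    kept = []  # list of (frame_index, coordinates)
    for i, frame in enumerate(iter_frames(path)):
        if len(kept) < k:
            kept.append((i, frame))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                kept[j] = (i, frame)
    return kept
```

Only the current frame and the `k` kept frames are held in memory at any time. Each kept array could then be wrapped with `pd.DataFrame(frame, columns=['x', 'y', 'z'])` and pickled as in the question.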