Python 3.x: How do I shuffle a groupby object?
I have a DataFrame with an image-name column and several feature columns. An image can appear in multiple rows that share the same image name but have different column values. Here is what the DataFrame looks like:
        image  val1  val2  val3
0  image1.png    12    14    15
1  image1.png    10    15    10
2  image2.png    12    -3     7
3  image2.png    17    21     1
4  image6.png    12    12     2
5  image6.png   112    12    10
Next I need to group the rows by image name, so I use groupby().
Then I need to split the data into a training set and a validation set, so I do the following:
groups = groups.apply(np.array)
training_set = groups[:separation_index]
valid_set = groups[separation_index:]
The problem is that I need to shuffle the data (the groups) before splitting.
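I tried np.random.shuffle(groups), but it does not work: it raises no error, yet the data stays in the same order.

I think you can do this without grouping: take the unique group names (images) as a list, randomly pick the training images from that list, and then index the DataFrame with them.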
import numpy as np
import pandas as pd

df = pd.DataFrame.from_records(
    [
        {"image": "image1.png", "val1": 12, "val2": 14, "val3": 15},
        {"image": "image1.png", "val1": 10, "val2": 15, "val3": 10},
        {"image": "image2.png", "val1": 12, "val2": -3, "val3": 7},
        {"image": "image2.png", "val1": 17, "val2": 21, "val3": 1},
        {"image": "image6.png", "val1": 12, "val2": 12, "val3": 2},
        {"image": "image6.png", "val1": 112, "val2": 12, "val3": 10},
    ]
)

# Pick whole images, not individual rows, for the training set.
images = df["image"].unique()
train_images = np.random.choice(images, size=2, replace=False)

# Boolean mask: True for every row whose image was chosen for training.
train_idxs = df["image"].isin(train_images)
train_df = df[train_idxs]
test_df = df[~train_idxs]

print(train_df)
print()
print(test_df)
Example output (the selection is random, so it can differ between runs):

        image  val1  val2  val3
0  image1.png    12    14    15
1  image1.png    10    15    10
4  image6.png    12    12     2
5  image6.png   112    12    10

        image  val1  val2  val3
2  image2.png    12    -3     7
3  image2.png    17    21     1
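If the split needs to be repeatable, or sized by a fraction rather than a fixed count, the same idea can be sketched with a seeded generator. This is only a minimal sketch and not part of the answer above: the helper name split_by_image, the train_fraction parameter, and the seed value are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def split_by_image(df, train_fraction=0.8, seed=0):
    """Split rows into train/validation sets so that no image spans both.

    Hypothetical helper: the name, train_fraction and seed are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the unique image names, not the rows themselves.
    images = rng.permutation(df["image"].unique().tolist())
    # Number of images (not rows) that go into the training set.
    n_train = int(round(train_fraction * len(images)))
    train_mask = df["image"].isin(images[:n_train])
    return df[train_mask], df[~train_mask]


df = pd.DataFrame(
    {
        "image": ["image1.png", "image1.png", "image2.png",
                  "image2.png", "image6.png", "image6.png"],
        "val1": [12, 10, 12, 17, 12, 112],
        "val2": [14, 15, -3, 21, 12, 12],
        "val3": [15, 10, 7, 1, 2, 10],
    }
)

train_df, valid_df = split_by_image(df, train_fraction=2 / 3, seed=42)
print(train_df)
print()
print(valid_df)
```

Because the shuffle is applied to the image names, all rows of an image always land in the same set. Calling the helper again with the same seed should give the same split.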
You can shuffle the data in pandas:
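groups = df.groupby('image')
# Sum the feature columns within each image (one row per image).
# The string "sum" avoids the FutureWarning that recent pandas emits for np.sum.
grouped_df = groups.aggregate("sum")
# sample(frac=1) returns all rows in random order
grouped_df = grouped_df.sample(frac=1)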
Result:
In [103]: grouped_df
Out[103]:
            val1  val2  val3
image
image2.png    29    18     8
image6.png   124    24    12
image1.png    22    29    25
You can then slice it into the two sets:
grouped_df[:separation_index]
grouped_df[separation_index:]
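Note that aggregating with a sum collapses each image to a single row. If the individual rows have to survive the split, one option is to shuffle the groups themselves instead. The following is a minimal sketch under stated assumptions, not the answerer's method: the seed 0 and the separation_index value of 2 are illustrative choices.

```python
import random

import pandas as pd

df = pd.DataFrame(
    {
        "image": ["image1.png", "image1.png", "image2.png",
                  "image2.png", "image6.png", "image6.png"],
        "val1": [12, 10, 12, 17, 12, 112],
        "val2": [14, 15, -3, 21, 12, 12],
        "val3": [15, 10, 7, 1, 2, 10],
    }
)

# One sub-DataFrame per image; each keeps all of its original rows.
groups = [group for _, group in df.groupby("image")]

# Shuffle the list of groups in place (seeded only to make the sketch repeatable).
random.Random(0).shuffle(groups)

separation_index = 2  # assumed value, for illustration only
training_set = pd.concat(groups[:separation_index])
valid_set = pd.concat(groups[separation_index:])

print(training_set)
print()
print(valid_set)
```

Shuffling a plain Python list of sub-DataFrames keeps the rows of each image together, so the slices at separation_index never split an image across the two sets.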
This is exactly what I did when I ran into the same problem, except that I used train_images = np.random.choice(images, replace=False)
@QuangHoang Yes, that is better.