Python: How do I change the labels in PyTorch's DatasetFolder?
I first load the unlabeled dataset as follows:
    unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
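Now, since I am trying to do semi-supervised learning, I am trying to define the following function. The input `dataset` is the `unlabeled_set` I just loaded.
I want to replace the dataset's labels with the labels I predict instead of the original ones (all original labels are 1). How can I do that?
I tried changing the labels through `dataset.targets`, but it does not work at all.
Here is my function: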
```
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm

def get_pseudo_labels(dataset, model, threshold=0.07):
    # This function generates pseudo-labels of a dataset using the given model.
    # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = []  # indices of the samples whose confidence passes the threshold
    y = []
    # print(dataset.targets[0])
    # Construct a data loader (batch_size is defined elsewhere in the notebook).
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function over the class dimension.
    softmax = nn.Softmax(dim=-1)
    counter = 0  # running index of the current sample in the dataset
    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, _ = batch
        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))
        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)
        count = 0
        # ---------- TODO ----------
        # Filter the data and construct a new dataset.
        dataset.targets = torch.tensor(dataset.targets)
        for p in probs:
            if torch.max(p) >= threshold:
                if not (counter in x):
                    x.append(counter)
                dataset.targets[counter] = torch.argmax(p)
            counter += 1
    # Turn off the eval mode.
    model.train()
    # dat = DataLoader(ImgDataset(x,y), batch_size=batch_size, shuffle=False)
    print(dataset.targets[10])
    new = torch.utils.data.Subset(dataset, x)
    return new
```
PyTorch datasets can return tuples of values, but they have no inherent "features"/"targets" distinction. You can create a modified dataset like this:
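    # `labels` holds one new label per item; the old target returned by the dataset is dropped.
    labeled_data = [(img, label) for (img, _), label in zip(dataset, labels)]
    data_loader = DataLoader(labeled_data, batch_size=batch_size, shuffle=False)
    for imgs, labels in data_loader:  # one batch per iteration
        ...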
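As background on why assigning to `dataset.targets` has no effect: as far as I can tell from the torchvision source, `DatasetFolder.__getitem__` takes the target from `dataset.samples` (a list of `(path, class_index)` pairs), and `targets` is only a copy built at construction time. If you would rather not materialize every image in a list, a thin wrapper is another option. The following is only a minimal sketch: `PseudoLabeledDataset` is a hypothetical helper name, not part of PyTorch or torchvision, and the `base` list is a stand-in for the real `DatasetFolder`. It assumes the base dataset yields `(image, target)` pairs.

```python
class PseudoLabeledDataset:
    """Map-style dataset pairing selected items of a base dataset with new labels.

    Hypothetical helper for illustration; not part of PyTorch or torchvision.
    """

    def __init__(self, base, indices, labels):
        if len(indices) != len(labels):
            raise ValueError("indices and labels must have the same length")
        self.base = base
        self.indices = list(indices)
        self.labels = list(labels)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # The base dataset yields (image, old_target); discard the old target.
        img, _ = self.base[self.indices[i]]
        return img, self.labels[i]


# Stand-in for the unlabeled DatasetFolder: every item carries the dummy target 1.
base = [(f"img{k}", 1) for k in range(5)]
# Pretend the model was confident about items 0, 2 and 3.
pseudo = PseudoLabeledDataset(base, indices=[0, 2, 3], labels=[7, 4, 9])
```

Because the wrapper implements `__len__` and `__getitem__`, it can be passed to `DataLoader` like any other map-style dataset. Unlike the list version, images are loaded and transformed on each access, so random augmentations in `transform` are re-applied every epoch.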
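Thanks for your answer! I fixed the problem with the following code: . Posting it here for others.
@李彥儒 If this helped you solve your problem, would you mind accepting it?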