TensorFlow: understanding true_classes in log_uniform_candidate_sampler

I am using tf.random.log_uniform_candidate_sampler for negative sampling.

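The tutorial sets true_classes to context_class.

My experiments show that the function returns reasonable-looking samples regardless of what I pass as true_classes. In both calls below, the value passed as true_classes even appears among the sampled candidates: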

> tf.random.log_uniform_candidate_sampler( true_classes=[[1]],
                num_true=1, num_sampled=num_ns, 
                unique=True, range_max=vocab_size)
[0, 1, 7, 5]

> tf.random.log_uniform_candidate_sampler( true_classes=[[2]],
                num_true=1, num_sampled=num_ns, 
                unique=True, range_max=vocab_size)
[0, 6, 2, 5]

What does true_classes mean in this function?

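This line from the tutorial:

"You can call the function on one skip-gram's target word and pass the context word as true class to exclude it from being sampled."

is misleading.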

What true_classes means in this function

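The function returns true_expected_count, which is defined in this function.

true_classes appears to be used only for computing true_expected_count. So the function does not exclude the true classes from the sampled candidates: every label, including the one passed as true_classes, has some probability of being sampled.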
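For reference, the TensorFlow documentation for this sampler gives the base distribution as P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1), which does not involve true_classes at all. The following is a minimal pure-Python sketch of that formula; the helper name log_uniform_prob is my own and is not part of TensorFlow. With unique=True the sampler rejects duplicates, so observed frequencies will not match these base probabilities exactly.

```python
import math

def log_uniform_prob(class_id, range_max):
    """Base probability of drawing class_id under the log-uniform (Zipfian) distribution.

    Hypothetical helper for illustration; it mirrors the formula in the
    TensorFlow docs and takes no true_classes argument.
    """
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)

# Same range_max as the experiment in this answer
probs = [log_uniform_prob(c, 8) for c in range(8)]
for c, p in enumerate(probs):
    print("class:", c, "base probability:", round(p, 4))
```

Lower class ids get higher probability, which is why the sampler expects the vocabulary to be sorted by decreasing frequency.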

Here is a copy of sample code you can experiment with (in case the link breaks):

import numpy as np
import pandas as pd
import tensorflow as tf

# Sample 1000 times for each true class in 0..7 (range_max=8)
sample_func = lambda ii: tf.random.log_uniform_candidate_sampler(true_classes=[[ii]], num_true=1, num_sampled=4, unique=True, range_max=8, seed=42)
dd = {ii: np.stack([sample_func(ii)[0].numpy() for jj in range(1000)]) for ii in range(8)}
# Count how often each class is drawn as a negative, per true class
for ii in dd:
    print("true_class:", ii, ", negative value_counts:", pd.Series(dd[ii].flatten()).value_counts().to_dict())
# true_class: 0 , negative value_counts: {0: 871, 1: 722, 2: 584, 3: 466, 4: 402, 5: 329, 7: 319, 6: 307}
# true_class: 1 , negative value_counts: {0: 867, 1: 695, 2: 571, 3: 485, 4: 411, 5: 380, 6: 316, 7: 275}
# true_class: 2 , negative value_counts: {0: 869, 1: 716, 2: 541, 3: 488, 4: 389, 5: 357, 6: 321, 7: 319}
# true_class: 3 , negative value_counts: {0: 877, 1: 715, 2: 582, 3: 482, 4: 394, 5: 355, 6: 318, 7: 277}
# true_class: 4 , negative value_counts: {0: 883, 1: 716, 2: 566, 3: 489, 4: 394, 5: 367, 6: 316, 7: 269}
# true_class: 5 , negative value_counts: {0: 862, 1: 717, 2: 583, 3: 496, 4: 376, 5: 357, 6: 315, 7: 294}
# true_class: 6 , negative value_counts: {0: 859, 1: 725, 2: 575, 3: 482, 4: 413, 5: 356, 6: 302, 7: 288}
# true_class: 7 , negative value_counts: {0: 880, 1: 724, 2: 555, 3: 488, 4: 425, 5: 324, 7: 302, 6: 302}

# Result of `true_expected_count`
print({ii : np.mean([sample_func(ii)[1].numpy() for jj in range(1000)]) for ii in range(8)})
# {0: 0.99967235, 1: 0.7245632, 2: 0.5737029, 3: 0.47004792, 4: 0.3987442, 5: 0.34728608, 6: 0.3084587, 7: 0.27554017}
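Since the sampler itself does not drop the true class, one option is to filter the sampled candidates afterwards. The sketch below is only an illustration in plain Python: drop_true_classes is a hypothetical helper, not a TensorFlow API, and the input reuses the [0, 1, 7, 5] sample from the question with true class 1.

```python
def drop_true_classes(sampled, true_classes):
    """Return the sampled ids with any true class removed (hypothetical helper)."""
    true_set = set(true_classes)
    return [c for c in sampled if c not in true_set]

# Candidates returned in the question for true_classes=[[1]]
filtered = drop_true_classes([0, 1, 7, 5], [1])
print(filtered)  # [0, 7, 5]
```

Filtering can leave fewer than num_sampled negatives, so you may want to sample a few extra candidates first. If you train with tf.nn.sampled_softmax_loss, its remove_accidental_hits argument addresses the same problem inside the loss.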