TensorFlow stratified_sample error

Tags: python, python-3.x, machine-learning, tensorflow

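I am trying to use tf.contrib.training.stratified_sample in TensorFlow to balance classes. I put together the quick example below to test it: it should draw samples from two imbalanced classes in a balanced way and then verify the result, but I get an error.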

import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes

batch_size = 10
data = ['a']*9990+['b']*10
labels = [1]*9990+[0]*10
data_tensor = ops.convert_to_tensor(data, dtype=dtypes.string)
label_tensor = ops.convert_to_tensor(labels)
target_probs = [0.5,0.5]
data_batch, label_batch = tf.contrib.training.stratified_sample(
    data_tensor, label_tensor, target_probs, batch_size,
    queue_capacity=2*batch_size)

with tf.Session() as sess:
    d,l = sess.run(data_batch,label_batch)
print('percentage "a" = %.3f' % (np.sum(l)/len(l)))

The error I get is:

Traceback (most recent call last):
  File "/home/jason/code/scrap.py", line 56, in <module>
    test_stratified_sample()
  File "/home/jason/code/scrap.py", line 47, in test_stratified_sample
    queue_capacity=2*batch_size)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/contrib/training/python/training/sampling_ops.py", line 191, in stratified_sample
    with ops.name_scope(name, 'stratified_sample', tensors + [labels]):
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/math_ops.py", line 829, in binary_op_wrapper
    y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 676, in convert_to_tensor
    as_ref=False)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 741, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/constant_op.py", line 113, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_util.py", line 374, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected string, got list containing Tensors of type '_Message' instead.
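
The error does not explain what I am doing wrong. I also tried passing in the raw data and labels (without converting them to tensors), and I tried using tf.train.slice_input_producer to create an initial queue of the data and label tensors.

Has anyone gotten stratified_sample to work? I could not find any examples.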

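I have modified the code into something that works for me. Summary of the changes:

  • Use enqueue_many=True to enqueue a batch of examples with different labels. Otherwise, it expects a single scalar label tensor (which can be stochastic when evaluated by the queue runners).
  • The first argument should be a list of tensors. It should have a better error message (I think this is what you ran into). Please send a pull request or open an issue on GitHub asking for a better error message.
  • Start the queue runners. Otherwise, code that uses queues will deadlock. Alternatively, use Estimators or MonitoredSession so that you do not have to worry about this.
  • (Edited based on the comments) stratified_sample does not shuffle the data, it only accepts/rejects! So if your data is not in random order and you want it to come out in random order, consider passing it through slice_input_producer (with enqueue_many=False) or shuffle_batch (with enqueue_many=True) before sampling.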
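
The accept/reject behaviour described in the last point can be modelled without TensorFlow. The sketch below is a plain-Python illustration, not the implementation of stratified_sample: it assumes the class frequencies are known up front, and the helper names (acceptance_probs, shuffled_stream, accept_reject) are hypothetical. Each example is kept with a probability proportional to target probability divided by actual frequency, so a rare class is always kept and a common class is mostly rejected.

```python
import itertools
import random


def acceptance_probs(class_probs, target_probs):
    """Scale the target/actual ratios so that the largest one becomes 1."""
    ratios = [t / p for t, p in zip(target_probs, class_probs)]
    top = max(ratios)
    return [r / top for r in ratios]


def shuffled_stream(examples, labels, rng):
    """Endless stream of uniformly drawn (example, label) pairs."""
    while True:
        i = rng.randrange(len(examples))
        yield examples[i], labels[i]


def accept_reject(stream, accept, rng):
    """Pass through only the pairs that survive a per-class coin flip."""
    for example, label in stream:
        if rng.random() < accept[label]:
            yield example, label


rng = random.Random(0)
data = ['a'] * 9990 + ['b'] * 10
labels = [1] * 9990 + [0] * 10

# Label 0 ('b') makes up 0.1% of the data, label 1 ('a') the other 99.9%.
accept = acceptance_probs(class_probs=[0.001, 0.999], target_probs=[0.5, 0.5])

# Collect the first 1000 examples that are accepted from a shuffled stream.
kept = list(itertools.islice(
    accept_reject(shuffled_stream(data, labels, rng), accept, rng), 1000))
fraction_a = sum(label for _, label in kept) / len(kept)

print('acceptance probabilities:', accept)
print('fraction "a" among kept examples = %.3f' % fraction_a)
```

Because the input stream is drawn at random, the kept examples come out close to the 50/50 target. The step itself never reorders anything, which is why the order of the input matters.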
Modified code (improved per Jason's comments):

Output:

percentage "a" = 0.480
percentage "a" = 0.440
percentage "a" = 0.580
percentage "a" = 0.570
percentage "a" = 0.580
percentage "a" = 0.520
percentage "a" = 0.480
percentage "a" = 0.460
percentage "a" = 0.390
percentage "a" = 0.530
Overall: 0.503

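If I change

data = ['a']*9990+['b']*10
labels = [1]*9990+[0]*10

in your code to

data = ['a']*9000+['b']*1000
labels = [1]*9000+[0]*1000

it breaks and produces only class 1 examples ("a"s). Your code does work as posted, but I do not understand why the change above breaks it. It clearly makes the example more realistic, since the batch size is then much smaller than the number of examples in either class. The input is also more balanced, yet the result is completely unbalanced.

Fantastic point, I should have caught this. stratified_sample does not shuffle, it only accepts/rejects. So if the order of the input is non-random, the output will be non-random too. I have added a shuffle step to the example, using a high min_after_dequeue to make sure the data is shuffled before sampling. This was a problem even with the higher imbalance; it was just hidden because so many examples of the majority class were discarded.

That makes sense. Thanks. For completeness, since I prefer single-example outputs (for file loading, augmentation, etc.), I replaced

shuffled_data, shuffled_labels = tf.train.shuffle_batch(...)

with

shuffled_data, shuffled_labels = tf.train.slice_input_producer([data_tensor, label_tensor], shuffle=True, capacity=3*batch_size)

and set enqueue_many to False. This is faster (~9 seconds vs. 120 seconds), because it rejects single examples rather than full batches.

Ha, that is quite a bit faster. I have edited the answer to incorporate your improvement. I am a little surprised, because stratified_sample re-batches the data, so it should not be rejecting entire batches.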
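
The ordering effect discussed in these comments can be reproduced with a small plain-Python model. This is a sketch under assumptions, not TensorFlow code: the acceptance probabilities are worked out by hand for a 10% / 90% class split with a 50/50 target, and the helper names are hypothetical. It runs the same accept/reject step once over the ordered data and once over a shuffled copy, then looks at the first 400 kept examples.

```python
import random


def accept_reject(pairs, accept, rng):
    """Keep each (example, label) pair with its class's acceptance probability."""
    return [(x, y) for x, y in pairs if rng.random() < accept[y]]


def fraction_a(pairs):
    """Share of label-1 ('a') examples in a list of pairs."""
    return sum(y for _, y in pairs) / len(pairs)


rng = random.Random(0)
pairs = [('a', 1)] * 9000 + [('b', 0)] * 1000

# Acceptance probabilities for a 10% / 90% split and a 50/50 target:
# label 0 is always kept, label 1 is kept one time in nine.
accept = [1.0, 1.0 / 9.0]

# Ordered input: every 'a' is seen before the first 'b'.
ordered_head = accept_reject(pairs, accept, rng)[:400]

# Same data, shuffled once before the accept/reject step.
shuffled = list(pairs)
rng.shuffle(shuffled)
shuffled_head = accept_reject(shuffled, accept, rng)[:400]

print('ordered : fraction "a" in first 400 kept = %.3f' % fraction_a(ordered_head))
print('shuffled: fraction "a" in first 400 kept = %.3f' % fraction_a(shuffled_head))
```

With ordered input, the early batches contain only "a"s because no "b" has been seen yet, while the shuffled input yields a mix close to the target from the start. With 9990 "a"s and 10 "b"s the same problem exists, but only about ten "a"s survive before the "b"s arrive, so a full pass looks balanced.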