Memory consumption in threads and thread pools

multithreading memory rust


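I have a memory consumption problem. In a thread-pool scenario I have many tasks (>20000), and the memory consumption is very, very high. The final result is typically larger than 40 MB, but the memory consumption is more than 60 GB.

This puzzles me, because the threads should free their memory once they finish, but they do not seem to.

The actual routine is supposed to filter a data set. The parts are generated and freed within the thread; the filtered data is what gets sent to the channel.

I checked with heaptrack and found that memory grows linearly. The main memory consumption stems from the `get_[left|right]_side()` functions. That is fine.

However, I would expect the threads to free the memory they used once they finish. If they did, a large number of batches (`batch_nb`) would be possible without any problem, but that does not seem to be the case. What am I missing?

The code is as follows: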

    // `set` is the input, a Vec<Vec<LargeDataStructure>>.

    // The `TupleGenerator` produces one vector (a batch) of tuples at a time (via
    // `get_next_batch()`) instead of calculating all the tuples at once. Calculating
    // them all at once would fill the memory very quickly.
    let tuple_gen = Arc::new(Mutex::new(TupleGenerator::new(&set)));

    // arc the set
    let set = Arc::new(set);

    // I divide the `set` into chunks and calculate each chunk in a thread
    let batch_size = calc_batch_size(&final_size, &num_cpus::get(), nb_of_sets);

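    // the resulting vector; only the parent thread writes to it, so no Arc<Mutex<..>> is needed
    let mut result: Vec<Vec<LargeDataStructure>> = vec![];

    // leave one core free, but always keep at least one worker
    let pool = ThreadPool::new(num_cpus::get().saturating_sub(1).max(1));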

    let (tx, rx) = channel();

    for _ in 0..batch_nb + 1 {

        let intermediate_ = intermediate.clone();
        let set_ = set.clone();
        let tuple_gen_ = tuple_gen.clone();
        let tx_ = tx.clone();

        pool.execute(move || {
            let tuples = tuple_gen_.lock().unwrap().get_next_batch();

            let mut tmp_result: Vec<Vec<SumComposition>> = vec![];
            
            // this for-loop consumes most of the memory
            for tuple in tuples {

                // generate a sumcomposition tuple from the index tuple
                let mut tmp = vec![];
                for (j, index) in tuple.iter().enumerate() {
                    tmp.push(set_[j][*index].clone());
                }

                // if CONSTRAINTS are given, check it
                if intermediate_.has_suchthat {
                    let left = get_left_side(&tmp, &intermediate_.terms); // -> LargeDataStructure
                    let right = get_right_side(&tmp, &intermediate_.terms); // -> LargeDataStructure
                    if left == right {
                        tmp_result.push(tmp);
                    }
                } else {
                    tmp_result.push(tmp);
                }
            }
            // everything should be freed here

            // the result is sent to the parent thread
            tx_.send(tmp_result).unwrap();
        });
    }

    for _ in 0..batch_nb + 1 {
        let tmp_result = rx.recv().unwrap();
        result.extend(tmp_result);
    }

    Ok(result)
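
The expectation that rejected candidates are released inside the worker can be checked in isolation. The following is a minimal sketch, not the asker's code: `Part`, `run_batches`, and the even-number filter are hypothetical stand-ins, and plain `std::thread` is used instead of the `threadpool` crate. It counts `Drop` calls to show that values that are not sent over the channel are dropped by the time the workers have finished. It only shows when Rust runs destructors; whether the allocator then hands the freed pages back to the operating system is a separate matter that this sketch does not measure.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for one candidate; bumps a shared counter when dropped.
struct Part {
    value: u64,
    dropped: Arc<AtomicUsize>,
}

impl Drop for Part {
    fn drop(&mut self) {
        self.dropped.fetch_add(1, Ordering::SeqCst);
    }
}

// Spawns one worker per batch. Each worker builds `per_batch` candidates and sends
// only those passing a toy filter (even values). Returns the number of kept
// candidates and the number of candidates dropped before the result is released.
fn run_batches(batches: u64, per_batch: u64) -> (usize, usize) {
    let dropped = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = channel();
    let mut workers = Vec::new();

    for batch in 0..batches {
        let tx = tx.clone();
        let dropped = Arc::clone(&dropped);
        workers.push(thread::spawn(move || {
            let mut kept = Vec::new();
            for i in 0..per_batch {
                let candidate = Part {
                    value: batch * per_batch + i,
                    dropped: Arc::clone(&dropped),
                };
                if candidate.value % 2 == 0 {
                    kept.push(candidate); // moved into the result, stays alive
                }
                // a rejected candidate is dropped here, at the end of the iteration
            }
            tx.send(kept).unwrap();
        }));
    }

    drop(tx); // the channel closes once every worker has dropped its sender
    for worker in workers {
        worker.join().unwrap();
    }

    let result: Vec<Part> = rx.iter().flatten().collect();
    (result.len(), dropped.load(Ordering::SeqCst))
}
```

Calling `run_batches(4, 10)` builds 40 candidates, keeps the 20 even ones, and reports 20 drops, so in this toy setup the only memory still owned after the workers exit is the filtered result.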


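I don't know, but your main `for` loop appears to be "cloning" some data structures when it creates the tasks it submits to the thread pool. How big are those "clones"? Could the cloning of the data be deferred until each task actually begins to execute?

@SolomonSlow The data being cloned lives in an `Arc`, so only the `Arc` itself is cloned (which should be just a pointer). Using `Arc::clone` would make that explicit. However, based on the `tuple` info, there are actual copies made inside the task, and those persist because they are part of the result and are never really collected.

How many threads are being added to the pool? Does using `num_cpus::get_physical()` produce different behavior? I noticed this sentence on the `num_cpus` page: "Sometimes the CPU will exaggerate the number of CPUs it contains, because it can use processor tricks to deliver increased performance when there are more threads."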
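
The distinction drawn in the comments between cloning an `Arc` and cloning the data behind it can be illustrated with a small sketch. The function names and the `Vec<Vec<u64>>` payload are assumptions made for illustration; they are not taken from the question. The first function shows that `Arc::clone` only adds a handle to the same allocation, while the second shows that cloning an element, as `set_[j][*index].clone()` does in the question, produces a separate copy.

```rust
use std::sync::Arc;

// Clones the `Arc` handle `clones` times and returns the resulting strong count
// together with a flag telling whether every handle points at the same allocation.
fn clone_handles(clones: usize) -> (usize, bool) {
    // hypothetical payload standing in for the shared `set`
    let set: Arc<Vec<Vec<u64>>> = Arc::new(vec![vec![0; 1024]; 8]);
    let handles: Vec<Arc<Vec<Vec<u64>>>> =
        (0..clones).map(|_| Arc::clone(&set)).collect();
    let same_allocation = handles.iter().all(|h| Arc::ptr_eq(h, &set));
    (Arc::strong_count(&set), same_allocation)
}

// Clones one element out of the shared data and reports whether the copy lives
// in its own allocation, separate from the original.
fn deep_copy_is_distinct() -> bool {
    let set = Arc::new(vec![vec![1u64, 2, 3]]);
    let copy: Vec<u64> = set[0].clone(); // copies the element's contents
    copy.as_ptr() != set[0].as_ptr()
}
```

With three clones, `clone_handles(3)` reports a strong count of 4 and a single shared allocation, so handing an `Arc` to every task costs one reference-count increment per task. The per-element copies are the part that actually allocates.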