Concurrency: How to perform thread-safe IO and caching to a file in Rust?


Background:

I am writing a web server in which we process different segments. I want to cache these different segments in different files (a segment can be up to 10 MB in size). Something like this:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment)
    }
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file.
    let file_name = &format!("./cache/{}", &segment);
    fs::create(file_name);
    fs::write(file_name, result).expect("Unable to write file");
    result
}
Since multiple threads with different segments can call segment_handler, is fs::write thread-safe? If it is not, we cannot simply use a mutex, because the segment: String can be different for each call and a single mutex would make performance worse. I need something like a mutex, but keyed only on segment: String. What is the solution to this problem?

Environment:

  • Rust: 1.47
  • Web server: warp
  • The code is for: HLS streaming with ffmpeg
  • repo: (the cache is not implemented yet)

The code you posted does not compile, because there is no such thing as fs::create. Luckily, you do not need it: fs::write creates the file for you.

At least on Linux, calling fs::write on the same path concurrently from several different threads will result in the file containing the contents passed to one of the fs::write calls. Note, however, that if you use the existence of the file to decide whether you can read the value from the cache or need to recompute it, you may end up with several threads recomputing the same value and then all of them writing it to the file.
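A related caveat, separate from the question of which write wins: fs::write truncates the file before writing, so a reader that opens the cache file mid-write can observe a partial file. A common way to avoid that is to write to a temporary file and rename it into place, since a rename within one filesystem atomically replaces the target on Unix. A minimal std-only sketch (the file names here are illustrative, not from the original code):

```rust
use std::fs;
use std::path::Path;

/// Write `contents` to `path` via a sibling temp file plus rename, so a
/// concurrent reader sees either the old file or the complete new one.
fn write_atomic(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    // On Unix, renaming over an existing file replaces it atomically.
    fs::rename(&tmp, path)
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("segment-example.cache");
    write_atomic(&target, b"segment data")?;
    assert_eq!(fs::read(&target)?, b"segment data");
    // Overwriting is also safe: readers never see a truncated file.
    write_atomic(&target, b"updated segment data")?;
    assert_eq!(fs::read(&target)?, b"updated segment data");
    fs::remove_file(&target)
}
```

Note that two writers racing on the same segment would still share the temporary name; the deduplication cache described later in this answer avoids that situation entirely by computing each segment at most once.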


You should also be aware that, since you are using async/await, you are not allowed to use the std::fs module, because it blocks the thread. You can use tokio::fs::write like this:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment);
    }
    // Compute the cache path before `segment` is consumed below.
    let file_name = format!("./cache/{}", &segment);
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file; pass a reference so `result`
    // can still be returned afterwards.
    tokio::fs::write(&file_name, &result).await.expect("Unable to write file");
    result
}
Another correct option is to move both the computation and the write onto a dedicated blocking thread with tokio::task::spawn_blocking:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment);
    }
    tokio::task::spawn_blocking(move || {
        // Compute the cache path before `segment` is consumed below.
        let file_name = format!("./cache/{}", &segment);
        // Do some computation to get the result.
        let result = do_some_large_computation(segment);
        // We are on a blocking thread here, so the synchronous
        // std::fs::write is the right tool (you cannot .await inside
        // this non-async closure).
        std::fs::write(&file_name, &result).expect("Unable to write file");
        result
    }).await.expect("Panic in spawn_blocking")
}
You can read more in Tokio's documentation about why this kind of blocking must be handled correctly:

  Tokio is able to run many tasks concurrently on a few threads by repeatedly swapping the currently running task on each thread. However, tasks can only be swapped at .await points, so code that spends a long time without reaching an .await will prevent other tasks from running. To combat this, Tokio provides two kinds of threads: core threads and blocking threads. The core threads are where all asynchronous code runs, and by default Tokio spawns one for each CPU core. The blocking threads are spawned on demand, and can be used to run blocking code that would otherwise prevent other tasks from running.

  To spawn a blocking task, you should use the spawn_blocking function.

Note that I have linked to the documentation for Tokio 0.2, because warp does not support Tokio 0.3 yet.


To prevent computing the value several times when the function is called again before the first call has finished, you can use a technique based on a HashMap stored behind a mutex, like this:

use std::collections::HashMap;
use std::sync::Mutex;
use tokio::sync::broadcast;

pub struct Cache {
    inner: Mutex<Inner>,
}
struct Inner {
    cached: HashMap<String, CachedType>,
    pending: HashMap<String, broadcast::Sender<CachedType>>,
}

pub enum TryCached {
    Exists(CachedType),
    Pending(broadcast::Receiver<CachedType>),
    New(),
}

impl Cache {
    pub fn try_get(&self, key: &str) -> TryCached {
        let mut inner = self.inner.lock().unwrap();
        if let Some(value) = inner.cached.get(key) {
            // To avoid the clone, use HashMap<String, Arc<CachedType>> and clone anyway.
            TryCached::Exists(value.clone())
        } else if let Some(pending) = inner.pending.get(key) {
            TryCached::Pending(pending.subscribe())
        } else {
            let (channel, _) = broadcast::channel(1);
            inner.pending.insert(key.to_string(), channel);
            TryCached::New()
        }
    }
    pub fn put_computed(&self, key: String, value: CachedType) {
        let mut inner = self.inner.lock().unwrap();
        if let Some(chan) = inner.pending.remove(&key) {
            // Ignore the error: it only means there are no pending receivers.
            let _ = chan.send(value.clone());
        }
        inner.cached.insert(key, value);
    }
}
A full example can be found on the playground.

This method is completely thread-safe thanks to the mutex. Note that it uses a synchronous mutex rather than an asynchronous one; to read more about that distinction, see the shared state chapter of the Tokio tutorial.
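To check the logic of this pattern in isolation, here is a hypothetical synchronous analog of the Cache above: std-only, with a Condvar playing the role of the broadcast channel and String standing in for CachedType. It is a sketch of the same idea, not a drop-in replacement for the async version:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// One entry per key: the first caller computes the value, and
// concurrent callers for the same key block until it is ready.
enum Entry {
    Pending,
    Ready(String), // `String` stands in for `CachedType`
}

struct SyncCache {
    state: Mutex<HashMap<String, Entry>>,
    ready: Condvar,
}

impl SyncCache {
    fn new() -> Self {
        SyncCache { state: Mutex::new(HashMap::new()), ready: Condvar::new() }
    }

    // Runs `compute` at most once per key, even under concurrent calls.
    fn get_or_compute(&self, key: &str, compute: impl FnOnce() -> String) -> String {
        let mut state = self.state.lock().unwrap();
        loop {
            match state.get(key) {
                Some(Entry::Ready(v)) => return v.clone(),
                Some(Entry::Pending) => {} // another thread is computing
                None => break,             // we will compute it ourselves
            }
            state = self.ready.wait(state).unwrap();
        }
        state.insert(key.to_string(), Entry::Pending);
        // Run the computation without holding the lock, as in the async version.
        drop(state);
        let value = compute();
        let mut state = self.state.lock().unwrap();
        state.insert(key.to_string(), Entry::Ready(value.clone()));
        self.ready.notify_all();
        value
    }
}

fn main() {
    let cache = Arc::new(SyncCache::new());
    let computations = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let computations = Arc::clone(&computations);
            thread::spawn(move || {
                cache.get_or_compute("segment-1", || {
                    *computations.lock().unwrap() += 1; // counts real computations
                    "computed".to_string()
                })
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "computed");
    }
    // Four concurrent callers, exactly one computation.
    assert_eq!(*computations.lock().unwrap(), 1);
}
```

The important property is the same as in the async Cache: the first caller for a key inserts a Pending marker and computes, while concurrent callers wait and then read the stored value, so the expensive computation runs at most once per key.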


1. I plan to store the cache state in a hashmap after writing to the file, so it will not be based only on the existence of a file. 2. But even with the above implementation and tokio::task::spawn_blocking, concurrent calls for the same segment could still end up trying to write the same file with the same contents. Can this be avoided? I added another section describing how to avoid that. This looks neat. To (over)simplify: the mutex over the two hashmaps helps manage state by allowing a computation/fetch of the cached res