Concurrency: How to perform thread-safe IO and caching to a file in Rust?


Background:

I am writing a web server in which we process different segments. I want to cache these different segments in different files (a segment can be up to 10 MB in size). Something like this:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment)
    }
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file.
    let file_name = &format!("./cache/{}", &segment);
    fs::create(file_name);
    fs::write(file_name, result).expect("Unable to write file");
    result
}
Since multiple threads with different segments can call segment_handler, is fs::write thread-safe? If it is not, we cannot simply use a mutex, because the segment: String can be different for each call and a single mutex would make performance worse. I need something like a mutex, but keyed only on segment: String. What is the solution to this problem?

Environment:

  • Rust: 1.47
  • Web server: warp
  • The code is for: HLS streaming with ffmpeg
  • repo: (the cache is not implemented yet)

The code you posted does not compile, because there is no such thing as fs::create. Luckily, you do not need it: fs::write creates the file for you.

At least on Linux, calling fs::write on the same path concurrently from several different threads will result in the file containing the contents passed to one of the fs::write calls. Note, however, that if you use the existence of the file to decide whether you can read the value from the cache or need to recompute it, you may end up with several threads recomputing the same value and then all of them writing it to the file.
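A related caveat, separate from the question of which write wins: fs::write truncates the file before writing, so a reader that opens the cache file mid-write can observe a partial file. A common way to avoid that is to write to a temporary file and rename it into place, since a rename within one filesystem atomically replaces the target on Unix. A minimal std-only sketch (the file names here are illustrative, not from the original code):

```rust
use std::fs;
use std::path::Path;

/// Write `contents` to `path` via a sibling temp file plus rename, so a
/// concurrent reader sees either the old file or the complete new one.
fn write_atomic(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    // On Unix, renaming over an existing file replaces it atomically.
    fs::rename(&tmp, path)
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("segment-example.cache");
    write_atomic(&target, b"segment data")?;
    assert_eq!(fs::read(&target)?, b"segment data");
    // Overwriting is also safe: readers never see a truncated file.
    write_atomic(&target, b"updated segment data")?;
    assert_eq!(fs::read(&target)?, b"updated segment data");
    fs::remove_file(&target)
}
```

Note that two writers racing on the same segment would still share the temporary name; the deduplication cache described later in this answer avoids that situation entirely by computing each segment at most once.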


You should also be aware that, since you are using async/await, you are not allowed to use the std::fs module, because it blocks the thread. You can use tokio::fs::write like this:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment);
    }
    // Compute the cache path before `segment` is consumed below.
    let file_name = format!("./cache/{}", &segment);
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file; pass a reference so `result`
    // can still be returned afterwards.
    tokio::fs::write(&file_name, &result).await.expect("Unable to write file");
    result
}
Another correct option is to move both the computation and the write onto a dedicated blocking thread with tokio::task::spawn_blocking:

pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment);
    }
    tokio::task::spawn_blocking(move || {
        // Compute the cache path before `segment` is consumed below.
        let file_name = format!("./cache/{}", &segment);
        // Do some computation to get the result.
        let result = do_some_large_computation(segment);
        // We are on a blocking thread here, so the synchronous
        // std::fs::write is the right tool (you cannot .await inside
        // this non-async closure).
        std::fs::write(&file_name, &result).expect("Unable to write file");
        result
    }).await.expect("Panic in spawn_blocking")
}
You can read more in Tokio's documentation about why this kind of blocking must be handled correctly:

  Tokio is able to run many tasks concurrently on a few threads by repeatedly swapping the currently running task on each thread. However, tasks can only be swapped at .await points, so code that spends a long time without reaching an .await will prevent other tasks from running. To combat this, Tokio provides two kinds of threads: core threads and blocking threads. The core threads are where all asynchronous code runs, and by default Tokio spawns one for each CPU core. The blocking threads are spawned on demand, and can be used to run blocking code that would otherwise prevent other tasks from running.

  To spawn a blocking task, you should use the spawn_blocking function.

Note that I have linked to the documentation for Tokio 0.2, because warp does not support Tokio 0.3 yet.


To prevent computing the value several times when the function is called again before the first call has finished, you can use a technique based on a HashMap stored behind a mutex, like this:

use std::collections::HashMap;
use std::sync::Mutex;
use tokio::sync::broadcast;

pub struct Cache {
    inner: Mutex<Inner>,
}
struct Inner {
    cached: HashMap<String, CachedType>,
    pending: HashMap<String, broadcast::Sender<CachedType>>,
}

pub enum TryCached {
    Exists(CachedType),
    Pending(broadcast::Receiver<CachedType>),
    New(),
}

impl Cache {
    pub fn try_get(&self, key: &str) -> TryCached {
        let mut inner = self.inner.lock().unwrap();
        if let Some(value) = inner.cached.get(key) {
            // To avoid the clone, use HashMap<String, Arc<CachedType>> and clone anyway.
            TryCached::Exists(value.clone())
        } else if let Some(pending) = inner.pending.get(key) {
            TryCached::Pending(pending.subscribe())
        } else {
            let (channel, _) = broadcast::channel(1);
            inner.pending.insert(key.to_string(), channel);
            TryCached::New()
        }
    }
    pub fn put_computed(&self, key: String, value: CachedType) {
        let mut inner = self.inner.lock().unwrap();
        if let Some(chan) = inner.pending.remove(&key) {
            // Ignore the error: it only means there are no pending receivers.
            let _ = chan.send(value.clone());
        }
        inner.cached.insert(key, value);
    }
}
A full example can be found on the playground.

This method is completely thread-safe thanks to the mutex. Note that it uses a synchronous mutex rather than an asynchronous one; to read more about that distinction, see the shared state chapter of the Tokio tutorial.
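To check the logic of this pattern in isolation, here is a hypothetical synchronous analog of the Cache above: std-only, with a Condvar playing the role of the broadcast channel and String standing in for CachedType. It is a sketch of the same idea, not a drop-in replacement for the async version:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// One entry per key: the first caller computes the value, and
// concurrent callers for the same key block until it is ready.
enum Entry {
    Pending,
    Ready(String), // `String` stands in for `CachedType`
}

struct SyncCache {
    state: Mutex<HashMap<String, Entry>>,
    ready: Condvar,
}

impl SyncCache {
    fn new() -> Self {
        SyncCache { state: Mutex::new(HashMap::new()), ready: Condvar::new() }
    }

    // Runs `compute` at most once per key, even under concurrent calls.
    fn get_or_compute(&self, key: &str, compute: impl FnOnce() -> String) -> String {
        let mut state = self.state.lock().unwrap();
        loop {
            match state.get(key) {
                Some(Entry::Ready(v)) => return v.clone(),
                Some(Entry::Pending) => {} // another thread is computing
                None => break,             // we will compute it ourselves
            }
            state = self.ready.wait(state).unwrap();
        }
        state.insert(key.to_string(), Entry::Pending);
        // Run the computation without holding the lock, as in the async version.
        drop(state);
        let value = compute();
        let mut state = self.state.lock().unwrap();
        state.insert(key.to_string(), Entry::Ready(value.clone()));
        self.ready.notify_all();
        value
    }
}

fn main() {
    let cache = Arc::new(SyncCache::new());
    let computations = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let computations = Arc::clone(&computations);
            thread::spawn(move || {
                cache.get_or_compute("segment-1", || {
                    *computations.lock().unwrap() += 1; // counts real computations
                    "computed".to_string()
                })
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "computed");
    }
    // Four concurrent callers, exactly one computation.
    assert_eq!(*computations.lock().unwrap(), 1);
}
```

The important property is the same as in the async Cache: the first caller for a key inserts a Pending marker and computes, while concurrent callers wait and then read the stored value, so the expensive computation runs at most once per key.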


1. I plan to store the cache state in a hashmap after writing to the file, so it will not be based only on the existence of a file. 2. But even with the above implementation and tokio::task::spawn_blocking, concurrent calls for the same segment could still end up trying to write the same file with the same contents. Can this be avoided? I added another section describing how to avoid that. This looks neat. To (over)simplify: the mutex over the two hashmaps helps manage state by allowing a computation/fetch of the cached res