Concurrency: how to perform thread-safe IO and caching to a file in Rust?
Background: I am writing a web server in which we process different segments. I want to cache these segments in different files (the segments can be up to 10 MB in size). Something like this:
pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment)
    }
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file.
    let file_name = &format!("./cache/{}", &segment);
    fs::create(file_name);
    fs::write(file_name, result).expect("Unable to write file");
    result
}
Since segment_handler can be called from multiple threads with different segments, is fs::write thread safe? If not, we can't simply use a mutex, because the segment: String may differ between calls, and a single global mutex would make performance worse. I need something like a mutex, but keyed only on segment: String. What is the solution to this?
Environment:
- Rust: 1.47
- Web server: warp
- The code is for: HLS streaming with ffmpeg
- repo: (caching is not implemented yet)
The code you posted does not compile, since there is no such thing as fs::create, but luckily you don't need it: the fs::write function creates the file for you.
At least on Linux, calling fs::write on the same path concurrently from several different threads will result in the file containing the contents passed to one of the fs::write calls. Note that if you use the existence of the file to determine whether you need to read it from the cache or recompute it, this could result in several threads recomputing the same value and then all of them writing it to the file.
You should be aware that since you are using async/await, you are not allowed to use the std::fs module, as it blocks the thread. You can use tokio::fs::write like this:
pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment)
    }
    // Do some computation to get the result.
    let result = do_some_large_computation(segment);
    // Cache this result to a file.
    let file_name = &format!("./cache/{}", &segment);
    tokio::fs::write(file_name, &result).await.expect("Unable to write file");
    result
}
Another correct option is to use tokio::task::spawn_blocking like this:
pub async fn segment_handler(segment: String) {
    if is_cached(&segment) {
        return get_from_cache(segment)
    }
    tokio::task::spawn_blocking(move || {
        // Do some computation to get the result.
        let result = do_some_large_computation(segment);
        // Cache this result to a file.
        let file_name = &format!("./cache/{}", &segment);
        // The closure is synchronous, so use std::fs here.
        std::fs::write(file_name, &result).expect("Unable to write file");
        result
    }).await.expect("Panic in spawn_blocking")
}
You can read more in Tokio's documentation about why this kind of blocking must be handled correctly:
Tokio is able to run many tasks concurrently on a few threads by repeatedly swapping the currently running task on each thread. However, tasks can only be swapped at .await points, so code that spends a long time without reaching an .await will prevent other tasks from running. To combat this, Tokio provides two kinds of threads: core threads and blocking threads. The core threads are where all asynchronous code runs, and Tokio will by default spawn one for each CPU core. The blocking threads are spawned on demand, and can be used to run blocking code that would otherwise prevent other tasks from running.
To spawn a blocking task, you should use the tokio::task::spawn_blocking function.
Note that I have linked to the documentation for Tokio 0.2, since warp does not yet support Tokio 0.3.
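As a rough illustration of the shape spawn_blocking gives you, the same "hand the blocking work to another thread and collect the result later" idea can be written with plain std threads and a channel. This is a conceptual sketch only: run_blocking is a made-up helper, and tokio's real spawn_blocking additionally manages a thread pool and integrates with the async runtime.

```rust
use std::sync::mpsc;
use std::thread;

// Made-up helper: run a blocking computation on a dedicated thread so the
// calling thread stays free, and deliver the result through a channel.
fn run_blocking<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> mpsc::Receiver<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    let rx = run_blocking(|| {
        // Stands in for do_some_large_computation.
        (1..=1000u64).sum::<u64>()
    });
    // The caller is free to do other work here before collecting the result.
    assert_eq!(rx.recv().unwrap(), 500_500);
    println!("ok");
}
```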
To prevent the value from being computed several times if the function is called again before the first call finishes, you can use a technique based on a HashMap stored behind a mutex, like this:
use std::collections::HashMap;
use std::sync::Mutex;
use tokio::sync::broadcast;

pub struct Cache {
    inner: Mutex<Inner>,
}
struct Inner {
    cached: HashMap<String, CachedType>,
    pending: HashMap<String, broadcast::Sender<CachedType>>,
}

pub enum TryCached {
    Exists(CachedType),
    Pending(broadcast::Receiver<CachedType>),
    New(),
}

impl Cache {
    pub fn try_get(&self, key: &str) -> TryCached {
        let mut inner = self.inner.lock().unwrap();
        if let Some(value) = inner.cached.get(key) {
            // To avoid this clone, use HashMap<String, Arc<CachedType>> and clone the Arc instead.
            TryCached::Exists(value.clone())
        } else if let Some(pending) = inner.pending.get(key) {
            TryCached::Pending(pending.subscribe())
        } else {
            let (channel, _) = broadcast::channel(1);
            inner.pending.insert(key.to_string(), channel);
            TryCached::New()
        }
    }
    pub fn put_computed(&self, key: String, value: CachedType) {
        let mut inner = self.inner.lock().unwrap();
        if let Some(chan) = inner.pending.remove(&key) {
            // Ignore the error: it just means there are no pending subscribers.
            let _ = chan.send(value.clone());
        }
        inner.cached.insert(key, value);
    }
}
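The compute-once behaviour this Cache enables can also be sketched in a synchronous, std-only form. This is not the answer's code: OnceCache, get_or_compute, and the Condvar (standing in for tokio::sync::broadcast) are illustrative assumptions, but the idea is the same — a mutex-guarded map where the first caller for a key claims it and later callers wait for the result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Std-only analogue of the Cache above (names are made up for this sketch).
struct OnceCache {
    // None in the map means "someone is computing this key right now".
    state: Mutex<HashMap<String, Option<String>>>,
    ready: Condvar,
}

impl OnceCache {
    fn get_or_compute(&self, key: &str, compute: impl FnOnce() -> String) -> String {
        let mut map = self.state.lock().unwrap();
        loop {
            if let Some(slot) = map.get(key).cloned() {
                match slot {
                    Some(value) => return value,                 // already cached
                    None => map = self.ready.wait(map).unwrap(), // pending: wait
                }
            } else {
                map.insert(key.to_string(), None); // claim the key
                drop(map);                         // compute without holding the lock
                let value = compute();
                let mut map = self.state.lock().unwrap();
                map.insert(key.to_string(), Some(value.clone()));
                self.ready.notify_all();
                return value;
            }
        }
    }
}

fn main() {
    let cache = Arc::new(OnceCache {
        state: Mutex::new(HashMap::new()),
        ready: Condvar::new(),
    });
    let calls = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (cache, calls) = (Arc::clone(&cache), Arc::clone(&calls));
            thread::spawn(move || {
                cache.get_or_compute("segment-1", || {
                    *calls.lock().unwrap() += 1;
                    "computed".to_string()
                })
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "computed");
    }
    // The expensive computation ran exactly once despite 8 concurrent callers.
    assert_eq!(*calls.lock().unwrap(), 1);
    println!("ok");
}
```

Each call either returns the cached value, waits for a pending computation, or performs the computation itself, mirroring the Exists/Pending/New cases of TryCached.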
A full example can be found here.
This approach is fully thread safe thanks to the mutex. Note that it uses a synchronous mutex rather than an asynchronous one; to read more about this choice, see the Tokio tutorial.
1. I plan to store the cache state in a hashmap after writing the file, so it won't be based purely on the existence of a file. 2. But even with the implementation above and tokio::task::spawn_blocking, could concurrent calls for the same segment still end up trying to write the same file with the same contents? Can that be avoided?
I added another section describing how to avoid this.
This looks neat. To (over)simplify: the mutex over a couple of hashmaps helps manage the state by allowing a computed/cached result to be fetched.