Java: how to synchronize file creation in a cache (not writing to the file, just creating it)

java file-io concurrency

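I have a repository that stores old files, like an archive.
Users fetch those files through a simple web application.
I maintain a simple file-system cache on the server that runs the web app.
At least it seemed simple while it was just an idea :)

I need to synchronize file creation in that cache so that only one thread at a time is allowed to fetch a given file from the archive.

All other threads that happen to need that file must wait until the first thread finishes writing it to the cache, and then fetch it from the cache.
At first I used the File.exists() method, but that is no good, because it returns true as soon as the thread (the lock owner) creates an empty file (so that it can start writing to it from the repository stream).

I'm not sure this is the right approach, but I use a static Map (which maps file ID to syncDummyObject) to track which files are currently being fetched.
Then I (try to) synchronize file fetching on that syncDummyObject.

Is this the right way to do it? The code works, but I need to be sure it is sound before I put it into production.

I considered using a staging directory, where I would create the files and move them into the cache once they are complete, but that would open up another set of problems.

I removed logging and the non-relevant parts of error handling for better readability.

Thanks!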

public class RepoFileFetcher{

    private static volatile ConcurrentHashMap<String, Object> syncStrings = new ConcurrentHashMap<String, Object>();    
    private static final Object mapSync = new Object(); // map access sync
    private Boolean isFileBeingCreated = new Boolean(false);
    private Boolean isFileReadyInCache = new Boolean(false);


    public File getFileById(MikFileIdentifier cxfi){        
        File theFile = null; // file I'm going to return in the end

        try{
            Object syncObject = null;

            // sync map access
            synchronized(mapSync){

                if(syncStrings.containsKey(cxfi.getFilePath())){
                    // if the key exists in the map it means that
                    // it's being created by another thread
                    // fetch the object from the map 
                    // and use it to wait until file is created in cache
                    syncObject = syncStrings.get(cxfi.getFilePath());

                    isFileBeingCreated = true;


                }else if(!(new File(cxfi.getFilePath())).exists()){
                    // if it doesn't exist in map nor in cache it means that
                    // I'm the first one that fetches it from repo
                    // create new dummyLockObject and put it in the map
                    syncObject = new Object();
                    syncStrings.put(cxfi.getFilePath(), syncObject);

                }else{
                    // if it's not being created and exists in cache
                    // set flag so I can fetch it from the cache below
                    isFileReadyInCache = true;
                }
            }


            // potential problem that I'm splitting the critical section in half,
            // but I don't know how to avoid locking the whole fetching process
            // I want to lock only on the file that's being fetched, not fetching of all files (which I'd get if the mapSync was still locked)
            // What if, at this very moment, some other thread starts fetching the file and isFileBeingCreated becomes stale? Is it enough to check whether I succeeded renaming it and if not then fetch from cache? 


            if(!isFileBeingCreated && !isFileReadyInCache){

                // skip fetching from repo if another thread is currently fetching it
                // sync only on that file's map object

                synchronized(syncObject){
                    File pFile = new File(cxfi.getFilePath());
                    pFile.createNewFile();

                    // ...
                    // ... the part where I write to pFile from repo stream
                    // ...

                    if(!pFile.renameTo(theFile)){
                        // file is created by someone else 
                        // fetch it from cache
                        theFile = fetchFromCache(cxfi, syncObject);
                    }

                    syncStrings.remove(cxfi.getFilePath());

                    // notify all threads in queue that the file creation is over
                    syncObject.notifyAll();
                }//sync

            }else{

                theFile = fetchFromCache(cxfi, syncObject);
            }

            return theFile;


        }catch(...{
            // removed for better readability
        }finally{
            // remove from the map, otherwise I'll lock that file indefinitely
            syncStrings.remove(cxfi.getFilePath());
        }

        return null;
    }


    /**
     * Fetches the file from cache
     * @param cxfi File identification object
     * @param syncObject Used to obtain lock on file
     * @return File from cache
     * @throws MikFileSynchronizationException
     * @author mbonaci
     */
    private File fetchFromCache(MikFileIdentifier cxfi, Object syncObject)
            throws MikFileSynchronizationException{

        try{
            // wait till lock owner finishes creating the file
            // then fetch it from the cache

            synchronized(syncObject){   

                // wait until lock owner removes dummyObject from the map
                // while(syncStrings.containsKey(cxfi.getFilePath()))
                // syncObject.wait();                   

                File existingFile = new File(cxfi.getFilePath());
                if(existingFile.exists()){
                    return existingFile;
                }else{
                    // this should never happen
                    throw new MikFileSynchronizationException();
                }
            }

        }catch(InterruptedException ie){
            logger.error("Synchronization error", ie);
        }
        return null;
    }
}

May I suggest a couple of tweaks:

1. You are already using a ConcurrentHashMap, so there is no need for an additional lock.
2. I would wrap the "file" in a smarter object that has its own synchronization. So you can do something like:
   - Call putIfAbsent() on the map with the path and a "smart" object that wraps the file.
   - That call returns the existing wrapper if the path was already mapped, or null if it was not (in which case the new wrapper is now in the map), so every thread ends up holding the same wrapper.
   - Keep state in the wrapper that records whether the file has already been cached.
   - Call cache(), which checks whether the file has been cached; if so it does nothing, otherwise it caches the file.
   - Then return the "file" from the wrapper (say, via a getFile() method).

Then make sure the wrapper's public methods all use one lock, which means callers will block while a cache() call is in progress concurrently.
Here is a sketch:

import java.io.File;

class CachedFile
{
  private final File realFile;
  // Initially not cached
  private boolean cached = false;

  // Construct with the file's path in the cache
  CachedFile(String path)
  { this.realFile = new File(path); }

  public synchronized boolean isCached()
  { return cached; }

  public synchronized void cache()
  {
    if (!cached)
    {
      // now load - safe in the knowledge that no one can get the file (or cache())
      // ..
      cached = true; // done
    }
  }

  public synchronized File getFile()
  {
    // return the "file"
    return realFile;
  }
}

Now your code looks something like:

// myCache must be shared by all threads (e.g. a static field)
ConcurrentHashMap<String, CachedFile> myCache = new ConcurrentHashMap<>();
CachedFile newFile = new CachedFile(<path>);
CachedFile file = myCache.putIfAbsent(<path>, newFile);
// putIfAbsent() returns null when there was no entry, so use the new file in that case
if (file == null) file = newFile;
// This will be a no-op if already cached, or will block if someone is caching this file.
file.cache();
// Now return the cached file.
return file.getFile();

Does that make sense?
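To see the pattern from this answer end to end, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: the repoFetches counter, the FileCache class, the fetchConcurrently helper and the file names are all hypothetical, and writing a short string stands in for streaming the file from the archive.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Wrapper with its own lock: one instance per cache path.
class CachedFile {
    // Demo-only counter (an assumption, not part of the answer): counts simulated repo fetches.
    static final AtomicInteger repoFetches = new AtomicInteger();

    private final Path path;
    private boolean cached = false;

    CachedFile(Path path) { this.path = path; }

    // The first caller does the slow fetch; concurrent callers block on the monitor until it is done.
    synchronized void cache() throws IOException {
        if (!cached) {
            repoFetches.incrementAndGet();
            // Stand-in for writing the repository stream into the cache file.
            Files.write(path, "archived content".getBytes(StandardCharsets.UTF_8));
            cached = true;
        }
    }

    synchronized File getFile() { return path.toFile(); }
}

class FileCache {
    private final ConcurrentHashMap<String, CachedFile> entries = new ConcurrentHashMap<>();

    File get(Path path) throws IOException {
        CachedFile fresh = new CachedFile(path);
        // putIfAbsent returns null if our wrapper was inserted, otherwise the wrapper already there.
        CachedFile existing = entries.putIfAbsent(path.toString(), fresh);
        CachedFile entry = (existing == null) ? fresh : existing;
        entry.cache();
        return entry.getFile();
    }

    // Hypothetical helper: requests the same file from several threads at once
    // and returns how many times the "repository" was actually hit.
    static int fetchConcurrently(Path target, int threads) throws Exception {
        CachedFile.repoFetches.set(0);
        FileCache cache = new FileCache();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();          // line all threads up before racing
                return cache.get(target);
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return CachedFile.repoFetches.get();
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempDirectory("cache-demo").resolve("old-report.txt");
        System.out.println("repository fetches: " + fetchConcurrently(target, 8));
    }
}
```

With eight threads racing for the same path, the counter should end at 1: putIfAbsent guarantees a single wrapper per path, and the wrapper's monitor makes the other threads wait until the file is complete. Note that entries are never removed from the map in this sketch, so it does not cover files deleted from the cache by an external script.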

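In the example cache code, you should check the return value of putIfAbsent. It will be null if no entry existed, producing an NPE on the next line (file.cache()). Something like CachedFile file = new CachedFile(<path>); CachedFile fetched = myCache.putIfAbsent(<path>, file); file = fetched == null ? file : fetched; would help.

@Nim Pyranja thank you. So you think I should keep the cached state in the map? As you can see, I only use the map as a temporary means of determining whether a file is currently being fetched. Right after I fetch a file from the repo, I remove its element from the map. How would you then handle a file being deleted from the cache by my (unix) cache maintenance script? It would be hard to use the same algorithm as the script, because that one depends on free space on the FS rather than on expiry time.

@mbonaci, it depends on what you mean by "cache". Traditionally, the point is to load the file into memory and keep it there, so that subsequent calls to get the file don't have to load it from disk again. That works well for small or frequently fetched files. The myCache above is a very simple example of that (assuming the cache() function reads the contents into a String or similar). There are various cache invalidation strategies (e.g. you could set a simple timer: every access resets it, and if it expires, the file is removed from the cache.)... TBH, if the files really are that small, I don't see the point of "caching"; just pull the file off disk and hand it to the user. Any difference in performance is likely to be small...

@Nim As far as map concurrency goes, I know CHM is synchronized at the method-call level, but I used the synchronized block to bind exists() and put() into a single operation. I guess your putIfAbsent() suggestion solves that... I'll try to implement your solution and get back to you as soon as possible, in case I get stuck.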
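The timer-based invalidation mentioned in the comments above (every access resets the timer, and an entry is dropped once the timer expires) could be sketched roughly as follows. This is an assumption-laden illustration, not code from the thread: IdleExpiryIndex, touch and evictExpired are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to check. It does not address the asker's free-space-based cleanup script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the last access time per cache path and reports which paths have gone idle.
class IdleExpiryIndex {
    private final ConcurrentHashMap<String, Long> lastAccessMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleExpiryIndex(long maxIdleMillis) { this.maxIdleMillis = maxIdleMillis; }

    // Record an access; this is what "resets the timer" for the path.
    void touch(String path, long nowMillis) {
        lastAccessMillis.put(path, nowMillis);
    }

    // Remove and return every path that has been idle for longer than maxIdleMillis.
    // The caller would then delete those files from the cache directory.
    List<String> evictExpired(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastAccessMillis.entrySet()) {
            boolean idleTooLong = nowMillis - e.getValue() > maxIdleMillis;
            // remove(key, value) only succeeds if nobody touched the path in the meantime.
            if (idleTooLong && lastAccessMillis.remove(e.getKey(), e.getValue())) {
                evicted.add(e.getKey());
            }
        }
        return evicted;
    }

    boolean isTracked(String path) { return lastAccessMillis.containsKey(path); }
}
```

A scheduled task could call evictExpired(System.currentTimeMillis()) periodically; using the two-argument remove means a path that is accessed again during the sweep keeps its entry.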