Java HDFS: guaranteeing read-after-write for file data
We want to guarantee that a consumer process reads the data created by a producer only after the producer has finished writing the file to HDFS. Below is one approach used in the application, which we are trying to improve.

Producer:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private void produce(String file, int sleepSeconds) throws Exception {
    Configuration conf = new Configuration();
    conf.addResource(new Path(
            "C:\\dev\\software\\hadoop-0.22.0-src\\conf\\core-site.xml"));
    conf.set("fs.defaultFS", "hdfs://XXX:9000");
    FileSystem fileSystem = FileSystem.get(conf);
    Path path = new Path(file);
    if (fileSystem.exists(path)) {
        fileSystem.delete(path, false);
    }
    System.out.println("Creating file");
    FSDataOutputStream out = fileSystem.create(path);
    System.out.println("Writing data");
    out.writeUTF("--data--");
    System.out.println("Sleeping");
    Thread.sleep(sleepSeconds * 1000L);
    System.out.println("Writing data");
    out.writeUTF("--data--");
    System.out.println("Flushing");
    out.flush();
    out.close();
    fileSystem.close();
    System.out.println("Releasing lock on file");
}
Consumer:
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private void consume(String file) throws Exception {
    Configuration conf = new Configuration();
    conf.addResource(new Path(
            "C:\\dev\\software\\hadoop-0.22.0-src\\conf\\core-site.xml"));
    conf.set("fs.defaultFS", "hdfs://XXX:9000");
    FileSystem fileSystem = FileSystem.get(conf);
    Path path = new Path(file);
    if (fileSystem.exists(path)) {
        System.out.println("File exists");
    } else {
        System.out.println("File doesn't exist");
        return;
    }
    // Probe with append(): it throws while the producer still holds the
    // write lease, so keep retrying until the file is no longer open for write.
    FSDataOutputStream fsOut = null;
    while (fsOut == null) {
        try {
            fsOut = fileSystem.append(path);
        } catch (IOException e) {
            Thread.sleep(1000);
        }
    }
    FSDataInputStream in = fileSystem.open(path);
    OutputStream out = new BufferedOutputStream(System.out);
    byte[] b = new byte[1024];
    int numBytes = 0;
    while ((numBytes = in.read(b)) > 0) {
        out.write(b, 0, numBytes);
    }
    in.close();
    out.close();
    if (fsOut != null)
        fsOut.close();
    fileSystem.close();
    System.out.println("Releasing lock on file");
}
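The consumer above spins forever on fileSystem.append() if the producer never closes the file. A self-contained sketch of the same retry pattern with a timeout added (the names here are illustrative, not part of the HDFS API) looks like:

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    // Polls `condition` every `intervalMillis` until it returns true or
    // `timeoutMillis` elapses; returns whether the condition was met.
    public static boolean poll(BooleanSupplier condition,
                               long intervalMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() <= deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMillis);
        }
        return false; // timed out
    }
}
```

In the consumer, the condition would wrap the append() probe (returning false on IOException), so a dead producer makes the consumer fail fast instead of hanging.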
The requirements for how the flow should work are as follows:
Any suggestions on how to improve this code/design using the HDFS Java API, while guaranteeing the reader does not miss data?

One solution is to write the file under a temporary prefix/suffix and rename it once writing is complete. For example, to produce the file file1.txt:
- Write to a file named .file1.txt or file1.txt.tmp
- Close the file when writing is finished
- Rename .file1.txt or file1.txt.tmp to file1.txt
- Meanwhile, the consumer simply waits for file1.txt to become available
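The steps above can be sketched as follows. To keep the example self-contained and runnable, this version uses java.nio on the local filesystem; on HDFS the same protocol would use FileSystem.create on the temporary path and FileSystem.rename to publish, relying on rename being atomic so the consumer never observes a partially written file. Class and method names here are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPublishDemo {

    // Producer: write everything to file1.txt.tmp, then rename to file1.txt.
    // The final name only appears once the content is complete and closed.
    public static void publish(Path dir, String name, String data)
            throws IOException {
        Path tmp = dir.resolve(name + ".tmp"); // temporary, "invisible" name
        Path fin = dir.resolve(name);          // final, published name
        Files.write(tmp, data.getBytes());     // all writes go to the temp file
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
    }

    // Consumer: poll until the final name exists, then read it in one shot.
    public static String awaitAndRead(Path dir, String name, long timeoutMillis)
            throws IOException, InterruptedException {
        Path fin = dir.resolve(name);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(fin)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IOException("timed out waiting for " + fin);
            }
            Thread.sleep(100);
        }
        return new String(Files.readAllBytes(fin));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("publish-demo");
        publish(dir, "file1.txt", "--data--");
        System.out.println(awaitAndRead(dir, "file1.txt", 5000));
    }
}
```

Because the consumer keys off the existence of the final name rather than the producer's write lease, the append()-probe loop in the original consumer is no longer needed.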