Hadoop: writing a file to S3 from a jar on AWS EMR
Is there any way to write a file from a Java jar to an S3 folder, the same S3 folder the reduce output files are written to? I tried something like this:
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream FS = fs.create(new Path("S3 folder output path"+"//Result.txt"));
PrintWriter writer = new PrintWriter(FS);
writer.write(averageDelay.toString());
writer.close();
FS.close();
Here Result.txt is the new file I want to write. Answering my own question: I found the mistake. I should pass the URI of the S3 folder path to the FileSystem object, like this:
FileSystem fileSystem = FileSystem.get(URI.create(otherArgs[1]),conf);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(otherArgs[1]+"//Result.txt"));
PrintWriter writer = new PrintWriter(fsDataOutputStream);
writer.write("\n Average Delay:"+averageDelay);
writer.close();
fsDataOutputStream.close();
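One small note on the block above: `PrintWriter.close()` already closes the underlying stream, so the separate `fsDataOutputStream.close()` afterwards is redundant (harmless, but try-with-resources expresses the intent more safely). A stdlib-only sketch of the same write pattern against an in-memory stream, with a hypothetical stand-in value:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class WriteDemo {
    static String writeResult(String averageDelay) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the writer, which in turn
        // closes the underlying stream - no second close() needed
        try (PrintWriter writer = new PrintWriter(out)) {
            writer.write("\n Average Delay:" + averageDelay);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeResult("12.5"));
    }
}
```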
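A likely reason the first attempt failed: `FileSystem.get(conf)` returns the filesystem for the cluster default (`fs.defaultFS`, typically HDFS on EMR), whereas `FileSystem.get(URI.create(otherArgs[1]), conf)` selects the implementation by the URI's scheme (`s3://`, `hdfs://`, ...). A stdlib-only sketch of that scheme resolution, with hypothetical paths:

```java
import java.net.URI;

public class SchemeDemo {
    // Mimics the first step FileSystem.get(uri, conf) performs:
    // inspect the URI scheme to pick an implementation.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "default (fs.defaultFS)" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("s3://my-bucket/output/Result.txt")); // s3
        System.out.println(schemeOf("/user/hadoop/Result.txt")); // default (fs.defaultFS)
    }
}
```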
That is how I handled the conf variable in the code block above, and it worked like a charm. Here is another approach in Java: call the AWS S3 client's putObject with a string buffer directly from the reducer:
... AmazonS3 s3Client;

public void reduce(Text key, java.lang.Iterable<Text> values,
                   Reducer<Text, Text, Text, Text>.Context context) throws Exception {
    UUID fileUUID = UUID.randomUUID();
    // "yyyy-MM-dd" (the original "yyy" still prints a 4-digit year, but "yyyy" is the intended pattern)
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
    String fileName = String.format("nightly-dump/%s/%s-%s", sdf.format(new Date()), key, fileUUID);
    log.info("Filename = [{}]", fileName);
    String content = "";
    int count = 0;
    for (Text value : values) {
        count++;
        String s3Line = value.toString();
        content += s3Line + "\n";
    }
    log.info("Count = {}, S3Lines = \n{}", count, content);
    PutObjectResult putObjectResult = s3Client.putObject(S3_BUCKETNAME, fileName, content);
    log.info("Put versionId = {}", putObjectResult.getVersionId());
    reduceWriteContext("1", "1");
    context.setStatus("COMPLETED");
}
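One detail worth noting in the reducer above: `content +=` copies the whole accumulated string on every iteration, which becomes quadratic for keys with many values. A `StringBuilder` keeps the accumulation linear. A stdlib-only sketch, where the input list is a hypothetical stand-in for the reducer's Text values:

```java
import java.util.List;

public class ContentBuilder {
    // Joins reducer values into one newline-terminated payload,
    // as the putObject call expects, but in O(n) time.
    static String buildContent(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append(value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildContent(List.of("line1", "line2")));
    }
}
```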
By the way, why not use ? It is just as portable as what you are doing, but may be more useful for long-running jobs.
What is conf in your code? And what is otherArgs[1]?
This code will not help people who find this question later.