Hadoop: Using Amazon S3 for input, output, and intermediate results in an EMR MapReduce job

hadoop amazon-web-services amazon-s3 mapreduce

I am trying to use Amazon S3 storage with EMR. However, when I run my code, I get several errors such as:

java.lang.IllegalArgumentException: This file system object (hdfs://10.254.37.109:9000) does not support access to the request path 's3n://energydata/input/centers_200_10k_norm.csv' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:384)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:429)
at edu.stanford.cs246.hw2.KMeans$CentroidMapper.setup(KMeans.java:112)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

In main, I set the input and output paths as shown below, and I put s3n://energydata/input/centers_200_10k_norm.csv into the configuration so that I can retrieve it in the mapper and reducer:

FileSystem fs = FileSystem.get(conf);
conf.set(CFILE, inPath); //inPath in this case is s3n://energydata/input/centers_200_10k_norm.csv
FileInputFormat.addInputPath(job, new Path(inputDir));
FileOutputFormat.setOutputPath(job, new Path(outputDir));

The error above occurs specifically in the mapper and reducer, when I try to access CFILE (s3n://energydata/input/centers_200_10k_norm.csv). This is how I try to open the path:

FileSystem fs = FileSystem.get(context.getConfiguration());
Path cFile = new Path(context.getConfiguration().get(CFILE));
DataInputStream d = new DataInputStream(fs.open(cFile)); // <-- the error is thrown here

s3n://energydata/input/centers_200_10k_norm.csv is one of the program's input arguments. When I launch the EMR job, I specify the input and output directories as s3n://energydata/input and s3n://energydata/output.

I tried following suggestions given elsewhere, but I still get the error. Any help would be greatly appreciated.

Thanks.

Try this:

Path cFile = new Path(context.getConfiguration().get(CFILE));
FileSystem fs = cFile.getFileSystem(context.getConfiguration());
DataInputStream d = new DataInputStream(fs.open(cFile));

Thanks. Actually, I fixed it using the following code:

String uriStr =  "s3n://energydata/centroid/";
URI uri = URI.create(uriStr);
FileSystem fs = FileSystem.get(uri, context.getConfiguration());    
Path cFile = new Path(context.getConfiguration().get(CFILE));  
DataInputStream d = new DataInputStream(fs.open(cFile));
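
The fix above hard-codes the bucket URI. As a minimal sketch, assuming only java.net.URI, the same root URI could be derived from the configured path string instead, so it always matches the path that will be opened. The class and method names (FsRootUri, rootOf) are hypothetical and not part of Hadoop; the returned URI is the kind of value one would pass to FileSystem.get(uri, conf).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop: derives the file system root URI
// (scheme + authority) from a full path string such as the CFILE value.
public class FsRootUri {

    static URI rootOf(String pathStr) {
        URI full = URI.create(pathStr);
        if (full.getScheme() == null) {
            throw new IllegalArgumentException("Path has no scheme: " + pathStr);
        }
        try {
            // Keep the scheme and authority (the bucket for s3n), drop the object key.
            return new URI(full.getScheme(), full.getAuthority(), "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Path taken from the question.
        System.out.println(rootOf("s3n://energydata/input/centers_200_10k_norm.csv"));
    }
}
```

Path.getFileSystem(conf), as suggested in the other answer, avoids this step altogether because it resolves the file system from the path itself.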

Thanks. Actually, I fixed it by using the following code: String uriStr = "s3n://energydata/output/"; URI uri = URI.create(uriStr); FileSystem fs = FileSystem.get(uri, context.getConfiguration()); Path cFile = new Path(context.getConfiguration().get(CFILE)); DataInputStream d = new DataInputStream(fs.open(cFile));

Yes, that is a similar fix. The main problem in the original post is that the file system handle is the default one, which on the cluster is HDFS. Path.getFileSystem(conf) or FileSystem.get(uri, conf) returns the file system for a specific path.
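
To illustrate why the default handle fails, here is a simplified model of the kind of scheme and authority comparison that leads to the IllegalArgumentException in the question. It is an assumption-laden sketch and not Hadoop's actual checkPath code; the class and method names (SchemeCheck, supports) are made up for illustration.

```java
import java.net.URI;

// Simplified model of a file system's path check, NOT Hadoop's actual code.
public class SchemeCheck {

    // Returns true if a file system rooted at fsUri could serve the given path.
    static boolean supports(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            // Scheme-less paths are resolved against the file system itself.
            return true;
        }
        boolean sameScheme = path.getScheme().equalsIgnoreCase(fsUri.getScheme());
        boolean sameAuthority = path.getAuthority() == null
                || path.getAuthority().equalsIgnoreCase(fsUri.getAuthority());
        return sameScheme && sameAuthority;
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://10.254.37.109:9000"); // from the stack trace
        URI s3Fs = URI.create("s3n://energydata/");
        URI cFile = URI.create("s3n://energydata/input/centers_200_10k_norm.csv");

        // The default (HDFS) handle cannot serve an s3n path; an s3n handle can.
        System.out.println("default fs supports cFile: " + supports(defaultFs, cFile));
        System.out.println("s3n fs supports cFile: " + supports(s3Fs, cFile));
    }
}
```

In this model the HDFS handle rejects the s3n path while the s3n handle accepts it, which mirrors the behavior described above: the handle has to come from the path's own scheme and bucket.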