Hadoop: mapping under-replicated blocks to files in HDFS


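HDFS reports that about 600K blocks on the cluster are under-replicated because of a rack failure. Before HDFS recovers, is there a way to find out which files would be affected if these blocks were lost?
I can't run "fsck /" because the cluster is very large.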

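The NameNode UI lists missing blocks, and the JMX logs list corrupt/missing blocks, but for under-replicated blocks both the UI and JMX show only the count.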

There are two ways to find under-replicated blocks/files: fsck or the WebHDFS API.

Using the WebHDFS REST API

curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
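This returns a response containing a FileStatuses JSON object. Parse it and keep the files whose replication is lower than the configured value. Note that the replication field is the file's target replication factor as recorded by the NameNode, not the number of live replicas, so this filter finds files set to a lower replication factor rather than blocks that lost replicas in a failure.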
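A minimal Python sketch of that parse-and-filter step, assuming the NameNode HTTP address is reachable without authentication (no Kerberos/SPNEGO). The helper names liststatus_url, fetch and files_below_replication are hypothetical, not part of Hadoop, and the filter uses the replication field exactly as described in the answer.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def liststatus_url(host: str, port: int, path: str) -> str:
    """Build the WebHDFS LISTSTATUS URL for one HDFS directory."""
    # Drop the leading "/" so the URL is /webhdfs/v1/<path>; quote() keeps "/" as-is.
    quoted = urllib.parse.quote(path.lstrip("/"))
    return f"http://{host}:{port}/webhdfs/v1/{quoted}?op=LISTSTATUS"


def fetch(url: str) -> str:
    """GET a URL and return the body as text (no Kerberos/SPNEGO handling)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def files_below_replication(body: str, expected: int) -> list[str]:
    """Return the pathSuffix of each FILE entry whose replication < expected."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [
        status["pathSuffix"]
        for status in statuses
        if status["type"] == "FILE" and status["replication"] < expected
    ]
```

For example, files_below_replication(fetch(liststatus_url("nn.example.com", 50070, "/data")), 3) would return the names of files directly under /data whose replication is below 3. The host is a placeholder; 50070 is the default NameNode HTTP port in Hadoop 2 and 9870 in Hadoop 3.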

Here is a sample response returned by the NN:

curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
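LISTSTATUS covers a single directory, as the sample response shows, so checking a whole subtree means descending into every entry whose type is DIRECTORY. The sketch below (hypothetical helper walk_files) does that iteratively. fetch_listing is any function you supply that returns the LISTSTATUS JSON body for an absolute path, for example a urllib GET of http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1<path>?op=LISTSTATUS. It still sends one request per directory to the NameNode, so on a very large namespace it is not free either.

```python
import json
from typing import Callable, Dict, Iterator, Tuple


def walk_files(
    fetch_listing: Callable[[str], str], root: str
) -> Iterator[Tuple[str, Dict]]:
    """Yield (full_path, FileStatus) for every FILE below the directory `root`.

    fetch_listing(path) must return the LISTSTATUS JSON body for `path`.
    An explicit stack replaces recursion, so deep trees cannot hit
    Python's recursion limit.
    """
    stack = [root.rstrip("/") or "/"]
    while stack:
        directory = stack.pop()
        listing = json.loads(fetch_listing(directory))
        for status in listing["FileStatuses"]["FileStatus"]:
            # pathSuffix is relative to the listed directory.
            child = directory.rstrip("/") + "/" + status["pathSuffix"]
            if status["type"] == "DIRECTORY":
                stack.append(child)
            elif status["type"] == "FILE":
                yield child, status
```

Each yielded FileStatus carries the same fields as the sample above, so the replication filter can be applied per file, e.g. [p for p, s in walk_files(fetch_listing, "/data") if s["replication"] < 3].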
If a directory holds many files, you can also page through it with
?op=LISTSTATUS_BATCH&startAfter=
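A minimal paging loop, assuming the LISTSTATUS_BATCH response shape described in the WebHDFS REST API documentation: a DirectoryListing object holding partialListing.FileStatuses.FileStatus plus a remainingEntries count, where startAfter takes the last name returned. The function name iter_batch is hypothetical, and fetch_page is a caller-supplied function that performs the HTTP request for one page.

```python
import json
from typing import Callable, Dict, Iterator, Optional


def iter_batch(fetch_page: Callable[[Optional[str]], str]) -> Iterator[Dict]:
    """Yield every FileStatus of one directory via LISTSTATUS_BATCH paging.

    fetch_page(start_after) must return the JSON body of ?op=LISTSTATUS_BATCH,
    adding &startAfter=<urllib.parse.quote(start_after)> when it is not None.
    """
    start_after = None
    while True:
        listing = json.loads(fetch_page(start_after))["DirectoryListing"]
        statuses = listing["partialListing"]["FileStatuses"]["FileStatus"]
        yield from statuses
        if listing["remainingEntries"] == 0 or not statuses:
            break
        # The next page starts after the last name returned in this one.
        start_after = statuses[-1]["pathSuffix"]
```

Stopping on an empty page as well as on remainingEntries == 0 guards against looping forever if the server ever returns no entries.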



There is a better solution to your problem.

Run

hdfs dfsadmin -metasave <filename>
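The metadata of every under-replicated block, including its file path, is written to that file, which you can then inspect directly. The file is created in the directory set by the hadoop.log.dir property on the NameNode host.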
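To turn the dump into a list of affected files rather than reading it by eye, a small parser can group block IDs by file. This sketch assumes each block entry is written on a line of the form <file path>: blk_<id>_<genstamp> (replicas: ...); that is not a documented interface, so check it against your own dump first. The function name files_from_metasave is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# ASSUMED line shape, not a documented interface; verify against your own dump:
#   /path/to/file: blk_1073741825_1001 (replicas: l: 2 d: 0 c: 0 e: 0) ...
# Block IDs may be negative on clusters created with older Hadoop releases.
BLOCK_LINE = re.compile(r"^(?P<path>/.+?): (?P<block>blk_-?\d+_\d+)")


def files_from_metasave(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the block IDs found in a metasave dump by the file they belong to."""
    affected: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = BLOCK_LINE.match(line)
        if match:
            affected[match.group("path")].append(match.group("block"))
    return dict(affected)
```

For example, opening the dump and passing the file object to files_from_metasave returns a dict from file path to its listed block IDs. It does not track which section of the dump a line came from, and lines for blocks without a file path (such as orphaned blocks) are skipped.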


This looks like a better option to me.

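Shouldn't the NameNode UI show these files?
@cricket_007 The NameNode UI only gives me the count of under-replicated blocks. It lists file names only when all replicas of a block are missing.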