Hadoop: mapping under-replicated HDFS blocks to files
The HDFS file system shows roughly 600K under-replicated blocks on the cluster due to a rack failure. Before HDFS recovers, is there a way to know which files would be affected if those blocks were lost?

I cannot run "fsck /" because the cluster is very large. The NameNode UI lists missing blocks and the JMX logs list corrupt/missing blocks, but the UI and JMX only show the count of under-replicated blocks.

There are two ways to see the under-replicated blocks/files: fsck, or the WebHDFS API.

Using the WebHDFS REST API:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
This returns a response containing a FileStatuses JSON object. Parse the JSON and filter for files whose replication is lower than the configured value.

A sample response from the NameNode is shown below:
curl -i "http://<NN_HOST>:<HTTP_PORT>/webhdfs/v1/<PATH_OF_DIRECTORY>?op=LISTSTATUS"
HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)
{"FileStatuses":{"FileStatus":[
{"accessTime":1489059994224,"blockSize":134217728,"childrenNum":0,"fileId":209158298,"group":"hdfs","length":0,"modificationTime":1489059994227,"owner":"XXX","pathSuffix":"_SUCCESS","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059969939,"blockSize":134217728,"childrenNum":0,"fileId":209158053,"group":"hdfs","length":0,"modificationTime":1489059986846,"owner":"XXX","pathSuffix":"part-m-00000","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059982614,"blockSize":134217728,"childrenNum":0,"fileId":209158225,"group":"hdfs","length":0,"modificationTime":1489059993497,"owner":"XXX","pathSuffix":"part-m-00001","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
{"accessTime":1489059977524,"blockSize":134217728,"childrenNum":0,"fileId":209158188,"group":"hdfs","length":0,"modificationTime":1489059983034,"owner":"XXX","pathSuffix":"part-m-00002","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}]}}
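The "parse and filter" step above can be sketched in Python. This is a minimal sketch, not part of the original answer: the JSON shape follows the sample response, the payload here is abridged made-up data, and the target replication of 3 is an assumed cluster default (dfs.replication).

```python
import json

# Abridged sample payload in the shape returned by op=LISTSTATUS.
response = json.loads("""
{"FileStatuses":{"FileStatus":[
 {"pathSuffix":"_SUCCESS","replication":3,"type":"FILE"},
 {"pathSuffix":"part-m-00000","replication":1,"type":"FILE"},
 {"pathSuffix":"sub","replication":0,"type":"DIRECTORY"}]}}
""")

TARGET_REPLICATION = 3  # assumed cluster default (dfs.replication)

def under_replicated(statuses, target=TARGET_REPLICATION):
    """Return names of plain files whose replication is below the target."""
    return [s["pathSuffix"]
            for s in statuses["FileStatuses"]["FileStatus"]
            if s["type"] == "FILE" and s["replication"] < target]

print(under_replicated(response))  # -> ['part-m-00000']
```

Note that the replication field in a FileStatus is the file's configured replication factor, so this filters by setting rather than by live replica count, as the answer describes.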
If there are many files, you can also paginate with ?op=LISTSTATUS_BATCH&startAfter=
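The pagination loop can be sketched as follows. This is a sketch assuming the DirectoryListing / partialListing / remainingEntries response shape that WebHDFS documents for LISTSTATUS_BATCH; fetch_page is a hypothetical callable that performs the actual HTTP request.

```python
def list_all(fetch_page):
    """Collect every FileStatus from a paginated LISTSTATUS_BATCH listing.

    fetch_page(start_after) is a hypothetical callable returning one parsed
    JSON response; start_after is None for the first page.
    """
    entries, start_after = [], None
    while True:
        listing = fetch_page(start_after)["DirectoryListing"]
        page = listing["partialListing"]["FileStatuses"]["FileStatus"]
        entries.extend(page)
        if listing["remainingEntries"] == 0 or not page:
            break
        # Resume after the last child returned on this page.
        start_after = page[-1]["pathSuffix"]
    return entries
```

Each subsequent request passes the last pathSuffix seen as startAfter, until remainingEntries reaches zero.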
Reference: the WebHDFS REST API documentation.

There is a better solution to your problem. Run:
hdfs dfsadmin -metasave <filename>
The file paths of all under-replicated blocks, along with other metadata, will be written to that file, which you can then inspect directly.
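Extracting the affected file paths from the metasave dump might look like this. This is a sketch under an assumption: the exact metasave format varies by Hadoop version, and the line shape used here ('/path/to/file: blk_<id>_<genstamp> ...') plus the sample lines are hypothetical.

```python
def files_from_metasave(lines):
    """Extract distinct file paths from dfsadmin -metasave output.

    Assumes (hypothetically) that block entries look like
    '/path/to/file: blk_<id>_<genstamp> ...'; header and summary
    lines that do not start with a path are skipped.
    """
    paths = []
    for line in lines:
        if line.startswith("/") and ": blk_" in line:
            path = line.split(": blk_", 1)[0]
            if path not in paths:
                paths.append(path)
    return paths

# Hypothetical sample of metasave output lines.
sample = [
    "Metasave: Blocks waiting for reconstruction: 2",
    "/data/part-m-00000: blk_1073741825_1001 (replicas: ...)",
    "/data/part-m-00001: blk_1073741826_1002 (replicas: ...)",
]
print(files_from_metasave(sample))  # -> ['/data/part-m-00000', '/data/part-m-00001']
```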
This seems like a better option to me. Shouldn't the NameNode UI show these files? @cricket_007 The NameNode UI only gives me the count of under-replicated blocks. It lists file names only when all replicas of a block are missing.