How to extract content from a large text file that editors display as a single line


text sed grep


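I want to extract content from large JSON files that editors display as a single line, so I can't work on them line by line.

For example, is there a way (sed, grep, etc.) to search for the word `000000523573` and print the 100 characters before and after it?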

data.txt:

{"license": 2, "file_name": "COCO_test2014_000000523573.jpg", "coco_url": "http://mscoco.org/images/523573", "height": 500, "width": 423, "date_captured": "2013-11-14 12:21:59", "id": 523573}, {"license": 2, "file_name": "COCO_test2014_000000523574.jpg", "coco_url": "http://mscoco.org/images/523574", "height": 500, "width": 423, "date_captured": "2013-11-14 12:21:59", "id": 523574}

Command:

cat data.txt | sed 's/\},\s{/}\n{/g' | grep "000000523573"

Output:

{"license": 2, "file_name": "COCO_test2014_000000523573.jpg", "coco_url": "http://mscoco.org/images/523573", "height": 500, "width": 423, "date_captured": "2013-11-14 12:21:59", "id": 523573}
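If the records are flat objects like the ones above, a similar result may be possible without splitting the file first, by letting `grep -o` match one brace-delimited object at a time. This is only a sketch: the function name `extract_record` is made up for illustration, and it assumes that objects are not nested and that the search text contains no regex metacharacters.

```shell
# Hypothetical helper: print every flat {...} object that contains the given text.
# $1 = text to search for, $2 = input file.
# Assumption: objects contain no nested braces, and $1 has no regex metacharacters.
extract_record() {
  # [^{}]* cannot cross a brace, so each match stays inside a single object.
  grep -o "{[^{}]*${1}[^{}]*}" "$2"
}

# Example call:
#   extract_record 000000523573 data.txt
```

Because the pattern refuses to cross `{` or `}`, it breaks down as soon as a record contains a nested object; for that case a real JSON parser is the safer choice.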


jq is the tool you want for parsing JSON natively. If the data is in a structured format, don't treat it as arbitrary text.

$ jq . < input.json
{
  "license": 2,
  "file_name": "COCO_test2014_000000523573.jpg",
  "coco_url": "http://mscoco.org/images/523573",
  "height": 500,
  "width": 423,
  "date_captured": "2013-11-14 12:21:59",
  "id": 523573
}
$ jq .height < input.json
500

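The notation here is... well, explaining it would take longer than makes sense for a simple answer. If you're interested in using this tool, be sure to look at the jq query syntax.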
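To connect this back to the question, here is a rough sketch of how jq could pick out the record whose `file_name` contains the search text. It assumes the file holds a single JSON array of objects (`[ {...}, {...} ]`), which may not match the real layout of your file; the function name `find_by_file_name` is made up for illustration.

```shell
# Hypothetical helper: print the records whose file_name contains the given text.
# $1 = text to search for, $2 = input file.
# Assumption: the file is one JSON array of objects, e.g. [ {...}, {...} ].
find_by_file_name() {
  # --arg passes the search text in as the jq variable $needle;
  # -c prints each matching object compactly on one line.
  jq -c --arg needle "$1" '.[] | select(.file_name | contains($needle))' "$2"
}

# Example call:
#   find_by_file_name 000000523573 input.json
```

If the records sit under a key rather than at the top level, the `.[]` part of the filter would need to be adjusted to that path.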


As the jq answer shows, jq is definitely your best option.

As for your exact question ("search for the word `000000523573` and print the 100 characters before and the 200 characters after it"): you can use `grep -o`, like this:

grep -Eo '.{100}000000523573.{200}' infile

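This has a few drawbacks:

- If `000000523573` occurs fewer than 100 characters after the start of the file, or fewer than 200 characters before its end, that occurrence is not matched at all.
- If two occurrences are less than 300 characters apart, the later one is skipped, because `grep -o` does not report overlapping matches.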

Relaxing the requirement to "print at most 100/200 characters before/after the occurrence" mitigates these problems to some extent.
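One way to express the relaxed requirement, as a sketch: use the bounded repetitions `{0,100}` and `{0,200}` instead of exact counts. The function name `context_grep` is made up for illustration, and the search text is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: print up to 100 characters before and up to 200 after
# each occurrence of the given text.
# $1 = text to search for, $2 = input file.
# Assumption: $1 contains no regex metacharacters.
context_grep() {
  # {0,100} and {0,200} also match shorter context, so occurrences near the
  # start or end of the file are no longer dropped.
  grep -Eo ".{0,100}${1}.{0,200}" "$2"
}

# Example call:
#   context_grep 000000523573 infile
```

Matches still cannot overlap, so two occurrences that lie close together end up in a single output line rather than being printed separately.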

However, the right approach is still to use jq.
