Regular expression for a JSON file with multiple pipes

I use the following command to fetch JSON on Unix:

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json
This gives me output in the following format (obviously with different results each time):

where each element of the children array is an object with the following structure:

{
 "kind": "...",
 "data": {
 ...
 }
}
Here is a complete example of the .json output (the body is too long to post directly):

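I need to print the complete data object of every element in the children array. I know I need at least two pipes: first to get children [...], and then to get data {...} from there. This is what I have so far: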

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

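I'm not familiar with regular expressions, so I don't know how to deal with the brackets and braces inside the elements I'm matching. The line above prints nothing in the shell, and I don't know why. Any help is much appreciated.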

If you want to get the children array, try this, though I'm not sure it's what you want:

wget -O - https://www.reddit.com/r/NetflixBestOf/.json | sed -n '/children/,/],/p'
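To see what that sed range does, here is a small sketch run on two hypothetical inputs instead of the live Reddit response: a pretty-printed listing and the same data on a single line. sed -n '/children/,/],/p' prints from the first line containing children through the next line containing ],, and it only looks for the end pattern from the following line onward. So if the JSON arrives on one line, as minified JSON usually does, the whole line is printed. The function name children_range is illustrative.

```shell
#!/bin/sh
# Hypothetical sample data; the real Reddit listing is much larger.
pretty='{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"id": "a"}}
    ],
    "after": null
  }
}'
minified='{"kind":"Listing","data":{"children":[{"kind":"t3","data":{"id":"a"}}],"after":null}}'

# Print the line range from the first line matching "children"
# through the next line matching "],".
children_range() {
  sed -n '/children/,/],/p'
}

printf '%s\n' "$pretty" | children_range
# One-line input: the range opens on that line and the end pattern is only
# searched on later lines, so sed prints the entire single line.
printf '%s\n' "$minified" | children_range
```

On the pretty-printed sample only the three lines of the children array are printed; on the minified sample the output is identical to the input.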

Code

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'
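wget downloads the source (-q suppresses progress messages, -O- writes the page to standard output).
tr deletes every line feed and carriage return, so the whole output is on one line and can be handled by grep, which matches one line at a time.
grep's -o option prints only the matching part of each line.
grep's -P option enables Perl-compatible regular expressions (PCRE).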
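The reason for tr can be checked on a small hypothetical input: grep matches one line at a time, so a pattern that has to cross a newline never matches until the newlines are deleted. The sample below is illustrative, not Reddit's real output, and the function name first_child is made up; it assumes GNU grep built with PCRE support for -P.

```shell
#!/bin/sh
# Hypothetical multi-line JSON fragment.
sample='{
  "children": [
    {"id": 1}
  ]
}'

# -P enables Perl-compatible syntax such as \K; -o prints only the matched text.
first_child() {
  grep -oP '"children"\s*:\s*\[\s*\K({.+?})'
}

# grep works line by line: "[" and "{" sit on different lines, so nothing matches.
printf '%s\n' "$sample" | first_child || echo "no match without tr"

# After tr deletes CR and LF the input is a single line and the pattern matches.
printf '%s\n' "$sample" | tr -d '\r\n' | first_child
```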

So here
grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])'
says:
match starting at "children"
zero or more whitespace characters
:
zero or more whitespace characters
\[ escaped, so it is a literal [ rather than the start of a character class
zero or more whitespace characters
\K the printed match starts from here
( opens a group
{.+?} everything between braces; the braces are included because they come after \K (see greedy vs. non-greedy quantifiers in a regex tutorial to understand how .+? works)
) closes the group
(?=\s*\]) the match ends where zero or more whitespace characters and a literal ] follow, without including them in the match. Because .+? is lazy, this is the first } followed by optional whitespace and ], which can come from a nested array inside a data object before the real end of children.
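Running the same two greps on a small hypothetical listing (two children, no network access) shows both steps of the walkthrough and one weakness: the second grep requires each data object to be followed by },, so the last element of the array, which is followed by } and then the end of the text, is never printed. The sample and the function names children_text and data_objects are illustrative; GNU grep with -P is assumed.

```shell
#!/bin/sh
# Hypothetical miniature of the listing structure: data.children[].data
sample='{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"id": "a"}},
  {"kind": "t3", "data": {"id": "b"}}
]}}'

# Step 1: join lines, then keep the text after "children": [ up to the
# first } that is followed by optional whitespace and ].
children_text() {
  tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])'
}

# Step 2: print each "data" object that is followed by optional whitespace and "},".
data_objects() {
  grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'
}

printf '%s\n' "$sample" | children_text
printf '%s\n' "$sample" | children_text | data_objects   # prints only the first object
```

With the live data, replace printf '%s\n' "$sample" with wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json. Real data objects contain nested objects and arrays, so these lazy boundaries can stop even earlier than in this sample.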
Some information about regex:

* == zero or more times
+ == one or more times
? == zero or one time; directly after another quantifier, as in .+?, it makes that quantifier lazy (match as few characters as possible)
\s == a whitespace character: space, tab, carriage return, newline, vertical tab, or form feed
\w == a word character: A to Z (upper or lower case), 0 to 9, or underscore (_)
\d == any digit from 0 to 9
\r == carriage return
\n == newline character (line feed)
\ == escapes a special character so it is read as a literal character
[...] == character class. Example: [abc] matches a, b, or c
(?=...) == positive lookahead, a zero-width assertion: the match must be followed by whatever is inside the parentheses, but that part is not included in the match
\K == the reported match starts at this position
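A few of these constructs can be tried directly with grep -oP on hypothetical strings: greedy .+ versus lazy .+?, and \K combined with a lookahead. The function names are illustrative; GNU grep with -P is assumed.

```shell
#!/bin/sh
text='[{"a":1},{"b":2}]'

# Greedy: .+ runs to the last "}" it can reach.
greedy() { printf '%s\n' "$text" | grep -oP '{.+}'; }

# Lazy: .+? stops at the first "}", so -o prints each object on its own line.
lazy() { printf '%s\n' "$text" | grep -oP '{.+?}'; }

# \K drops "price:" and the spaces from the printed match;
# (?=\s*USD) requires USD to follow without printing it.
number_before_usd() { printf '%s\n' 'price: 42 USD' | grep -oP 'price:\s*\K\d+(?=\s*USD)'; }

greedy             # {"a":1},{"b":2}
lazy               # {"a":1} and {"b":2} on separate lines
number_before_usd  # 42
```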
Anyway, you can read more about regex here:

Are you open to using a third-party utility? I usually use the jq binary to parse JSON data easily. For your requirement, you can simply pass the JSON data to jq, which has its own query language: cat /tmp/data | jq '.data.children | .[]' (here /tmp/data contains the complete JSON). With such utilities you can get the job done with shorter queries and advanced features such as raw output, queries, etc.

OK, but getting data {} is not the only goal. This time the input happens to be JSON, but I want to learn how to handle any file with regex.

Is regex your only option? In my opinion, regex is not the right tool for this job. Would you consider something like Python with a JSON package?

Thanks for the detailed explanation, it is very helpful. Follow-up question: what difference would it make to use egrep without the Perl regex syntax? See here:
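A sketch of the parser-based approach the comments suggest, run on a hypothetical miniature listing. It uses the jq filter .data.children[].data (a variant of the .data.children | .[] query above that also selects each element's data) and falls back to Python's standard json module when jq is not installed; the fallback and the function name children_data are illustrative additions, not part of the original comments. Unlike the regex pipeline, a JSON parser handles nested braces and brackets and returns every element, including the last one.

```shell
#!/bin/sh
# Hypothetical miniature listing with the same shape as the Reddit response.
sample='{"kind":"Listing","data":{"children":[{"kind":"t3","data":{"id":"a"}},{"kind":"t3","data":{"id":"b"}}]}}'

# Print every element's "data" object, one compact object per line.
# Uses jq if it is installed, otherwise Python's standard json module.
children_data() {
  if command -v jq >/dev/null 2>&1; then
    jq -c '.data.children[].data'
  else
    python3 -c '
import json, sys
listing = json.load(sys.stdin)
for child in listing["data"]["children"]:
    print(json.dumps(child["data"], separators=(",", ":")))
'
  fi
}

printf '%s\n' "$sample" | children_data
```

Against the live endpoint this would be: wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | children_data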