Regex for a JSON file with multiple pipes

I fetch the JSON on Unix with the following command:

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json

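This gives me output in the following format (obviously the results differ each time):

where each element of the children array is an object with the following structure: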

{
 "kind": "...",
 "data": {
 ...
 }
}

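Here is a complete example of the .json response (the body is too long to post directly):

I need to print the complete data object of each element of the children array. I know I need at least two pipes: first to get children [...], and then from there to get data {...}. This is what I have so far: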

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

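I am new to regular expressions, so I don't know how to handle the brackets and braces that appear inside the elements I am extracting. The line above prints nothing in the shell, and I don't know why. Any help would be much appreciated.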

If you want to get the children array, try this, though I'm not sure it is what you want:

wget -O - https://www.reddit.com/r/NetflixBestOf/.json | sed -n '/children/,/],/p'
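One caveat worth checking: sed addresses select whole lines, so the range /children/,/],/ only narrows the output when the JSON is spread over several lines. A minimal sketch on a made-up, pretty-printed sample (the sample data is an assumption, not the real Reddit response):

```shell
# Made-up sample, pretty-printed over several lines (not real Reddit output).
sample='{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "A"}},
      {"kind": "t3", "data": {"title": "B"}}
    ],
    "after": null
  }
}'

# Print from the first line containing "children" to the next line containing "],".
selected=$(printf '%s\n' "$sample" | sed -n '/children/,/],/p')
printf '%s\n' "$selected"

# With everything on one line, that single line matches /children/,
# so sed prints the whole document instead of just the array.
one_line=$(printf '%s\n' "$sample" | tr -d '\n')
whole=$(printf '%s\n' "$one_line" | sed -n '/children/,/],/p')
```

If the response arrives as a single line, the sed range therefore returns everything, and a different approach is needed.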

Code

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

wget downloads the source.
tr removes all line feeds and carriage returns, so the whole output is on one line and can be handled by grep.
grep's -o option prints only the matching part of the line.
grep's -P option enables Perl-compatible regular expressions.
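To make the roles of tr and grep -o concrete, here is a small sketch on a made-up input (the input string is an assumption for illustration only):

```shell
# Made-up input with Windows-style line endings.
flat=$(printf '{\r\n "kind": "t3"\r\n}\r\n' | tr -d '\r\n')
printf '%s\n' "$flat"        # the whole input is now on one line

# -o prints each matching part on its own line instead of the whole line.
matches=$(printf '%s\n' "$flat" | grep -o '"[a-z0-9]*"')
printf '%s\n' "$matches"
```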

So here
grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])'
we have said:
match, starting from "children"
zero or more whitespace characters
:
zero or more whitespace characters
\[ escaped, so it is a literal character and not a special one
zero or more whitespace characters
\K makes the reported match start from here
( open the group
{.+?} everything inside braces (the braces are included because they come after the \K. See greedy vs. non-greedy in a regex tutorial to understand how .+? works)
) close the group
(?=\s*\]) end the match where zero or more whitespace characters followed by a literal ] are found, without including them in the match.
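The walkthrough above can be tried on a small sample. This is only a sketch: the one-line JSON below is made up and far simpler than the real response, and it assumes GNU grep built with -P support.

```shell
# Made-up, already flattened sample (not the real Reddit response).
sample='{"kind":"Listing","data":{"children":[{"kind":"t3","data":{"title":"A"}},{"kind":"t3","data":{"title":"B"}}],"after":null}}'

# Stage 1: everything between "children": [ and the first } that is followed by ].
children=$(printf '%s\n' "$sample" |
  grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])')
printf '%s\n' "$children"

# Stage 2: each "data" object that is followed by "},".
data=$(printf '%s\n' "$children" |
  grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)')
printf '%s\n' "$data"
```

On this sample, stage 2 prints only the data object of the first element: the last element is not followed by "}," so its lookahead fails. On real data with nested objects, the non-greedy .+? also stops at the first } that satisfies the lookahead, which may cut an object short. Treat the pattern as a starting point rather than a general JSON extractor.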

Some information about regex

* == zero or more times
+ == one or more times
? == zero or one time
\s == a space, tab, carriage return, newline, vertical tab, or form feed character
\w == a word character: A to Z (upper or lower case), 0 to 9, or underscore (_)
\d == any digit from 0 to 9
\r == carriage return
\n == newline character (line feed)
\ == escapes special characters so they are read as literal characters
[...] == character class. Example: [abc] matches a or b or c
(?=) == a positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part is not included in the match.
\K == the reported match starts at this position.
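A few of these tokens can be checked directly in the shell. The inputs below are made up for illustration, and GNU grep with -P support is assumed:

```shell
# Greedy .+ takes as much as possible; non-greedy .+? as little as possible.
greedy=$(printf 'a{1}{2}b\n' | grep -oP '\{.+\}')
lazy=$(printf 'a{1}{2}b\n' | grep -oP '\{.+?\}')

# \K drops everything matched so far from the reported match.
after_k=$(printf 'key=value\n' | grep -oP 'key=\K\w+')

# (?=...) requires " USD" to follow, but leaves it out of the match.
digits=$(printf 'price: 42 USD\n' | grep -oP '\d+(?= USD)')

printf '%s\n' "$greedy" "$lazy" "$after_k" "$digits"
```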

Anyway, you can read more about regex here:

Are you open to using a third-party utility? I usually use the jq binary to parse JSON data easily. For your requirement, you just pass the JSON data to jq, which has its own query language: cat /tmp/data | jq '.data.children | .[]' (here /tmp/data contains the complete JSON). With utilities like this you can get the job done with shorter queries and advanced features (such as raw output, queries, etc.).

OK, getting data {} is not the only end goal. This time the input happens to be JSON, but I want to know how to handle any file with regex.

Is regex your only option? In my opinion, regex is not the right tool for this job. Would you consider something like Python with the json package?

Thanks for the detailed explanation, it is very helpful. A follow-up question: what would be the difference if I used egrep without Perl regex syntax?

Take a look here:
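As a rough sketch of the jq suggestion from the comments: the filter below adds .data to the suggested query so that only the data object of each child is printed. The sample JSON is made up, and jq is a third-party tool that must be installed separately, so the sketch checks for it first.

```shell
# Made-up sample (not the real Reddit response).
sample='{"kind":"Listing","data":{"children":[{"kind":"t3","data":{"title":"A"}},{"kind":"t3","data":{"title":"B"}}],"after":null}}'

if command -v jq >/dev/null 2>&1; then
  # -c prints each result as compact JSON on its own line.
  data=$(printf '%s\n' "$sample" | jq -c '.data.children[].data')
  printf '%s\n' "$data"
else
  echo "jq is not installed" >&2
fi
```

Unlike the regex pipeline, this does not depend on whitespace, nesting depth, or whether an element is the last one in the array.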