Regex to match parts of an SSH URL

regex url ssh

Given the following SSH URLs:

git@github.com:james/example
git@github.com:007/example
git@github.com:22/james/example
git@github.com:22/007/example

How can I extract the following:

{user}@{host}:{optional port}{path (user/repo)}

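As you can see in the examples, one of the usernames is a number, not a port. I don't know how to get around this. The port is not always present in the URL.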

My current regex is:

^(?P<user>[^@]+)@(?P<host>[^:\s]+)?:(?:(?P<port>\d{1,5})\/)?(?P<path>[^\\].*)$

I don't know what else to try.
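To make the problem concrete, here is a small check using Python's re module (chosen only because the other code on this page is Python) that runs the question's pattern, unchanged, against the four sample URLs. Because the path group only requires one non-backslash character, nothing forces a user/repo pair to remain after the port, so for git@github.com:007/example the numeric username 007 is captured as the port and the path shrinks to example.

```python
import re

# The pattern from the question, copied unchanged.
QUESTION_PATTERN = re.compile(
    r"^(?P<user>[^@]+)@(?P<host>[^:\s]+)?:(?:(?P<port>\d{1,5})\/)?(?P<path>[^\\].*)$"
)

SAMPLE_URLS = [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]

# Record (port, path) per URL so the misparse of "007" is easy to spot.
results = {}
for url in SAMPLE_URLS:
    match = QUESTION_PATTERN.match(url)
    results[url] = (match.group("port"), match.group("path"))
    print(url, "->", results[url])
```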

Lazy quantifiers to the rescue

This seems to work well and handles the optional port:

^
(?P<user>.*?)@
(?P<host>.*?):
(?:(?P<port>.*?)/)?
(?P<path>.*?/.*?)
$

The newlines are not part of the regex because the /x modifier is enabled. If you are not using /x, remove all of the newlines.

Thanks for your help.
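As a sketch in Python (an assumption; the answer does not name a language), re.VERBOSE is the counterpart of the /x modifier, so the answer's multi-line pattern can be used as written. The block also builds the one-line form with the newlines removed and checks that both behave the same on the sample URLs.

```python
import re

# The answer's pattern in verbose form. re.VERBOSE (re.X) is Python's
# counterpart of /x: unescaped whitespace, including line breaks, is ignored.
VERBOSE_PATTERN = re.compile(
    r"""
    ^
    (?P<user>.*?)@
    (?P<host>.*?):
    (?:(?P<port>.*?)/)?
    (?P<path>.*?/.*?)
    $
    """,
    re.VERBOSE,
)

# The same pattern without /x: all newlines removed.
ONE_LINE_PATTERN = re.compile(
    r"^(?P<user>.*?)@(?P<host>.*?):(?:(?P<port>.*?)/)?(?P<path>.*?/.*?)$"
)

SAMPLE_URLS = [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]

for url in SAMPLE_URLS:
    verbose = VERBOSE_PATTERN.match(url).groupdict()
    one_line = ONE_LINE_PATTERN.match(url).groupdict()
    assert verbose == one_line  # both spellings give identical groups
    print(verbose)
```

Why it works: the optional port group is tried first, but if taking the first segment as the port leaves no slash for the path group's .*?/.*?, the engine backtracks and skips the port, which is what happens for 007/example. Note that the port group is .*?, so any first segment followed by two more segments is reported as the port; for git@github.com:foo/james/example the port comes back as foo.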

If you are using Python, you can write your own parser:

# Requires the third-party parsimonious package: pip install parsimonious
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = """git@github.com:james/example
git@github.com:007/example
git@github.com:22/james/example
git@github.com:22/007/example"""

class GitVisitor(NodeVisitor):
    grammar = Grammar(
        r"""
        expr        = user at domain colon rest

        user        = word+
        domain      = ~"[^:]+"
        rest        = (port path) / path

        path        = word slash word
        port        = digits slash

        slash       = "/"
        colon       = ":"
        at          = "@"
        digits      = ~"\d+"
        word        = ~"\w+"

        """)

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_user(self, node, visited_children):
        return {"user": node.text}

    def visit_domain(self, node, visited_children):
        return {"domain": node.text}

    def visit_rest(self, node, visited_children):
        child = visited_children[0]
        if isinstance(child, list):
            # first branch, port and path
            return {"port": child[0], "path": child[1]}
        else:
            return {"path": child}

    def visit_path(self, node, visited_children):
        return node.text

    def visit_port(self, node, visited_children):
        digits, _ = visited_children
        return digits.text

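    def visit_expr(self, node, visited_children):
        # Merge the dicts from visit_user, visit_domain and visit_rest;
        # the literal "@" and ":" nodes are not dicts and are skipped.
        out = {}
        for child in visited_children:
            if isinstance(child, dict):
                out.update(child)
        return out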

gv = GitVisitor()
for line in data.split("\n"):
    result = gv.parse(line)
    print(result)

This yields:

{'user': 'git', 'domain': 'github.com', 'path': 'james/example'}
{'user': 'git', 'domain': 'github.com', 'path': '007/example'}
{'user': 'git', 'domain': 'github.com', 'port': '22', 'path': 'james/example'}
{'user': 'git', 'domain': 'github.com', 'port': '22', 'path': '007/example'}

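The parser tolerates the ambiguity that is clearly present here: because the alternatives in rest = (port path) / path are tried in order, a leading number is treated as a port only when a complete user/repo path still follows it; otherwise the parser falls back to reading it as part of the path.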
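If installing parsimonious is not an option, the same ordered choice can be imitated with the standard library. The helper names below (parse_rest, parse_ssh_url) are hypothetical and this is only a sketch: it mirrors the rest = (port path) / path rule, while the checks on user and domain are looser than the grammar's. On the four sample URLs it should print the same dictionaries as the parsimonious version.

```python
import re

# Hypothetical, dependency-free imitation of the PEG rule
#   rest = (port path) / path
# Try "port path" first; fall back to "path" alone.
WORD = r"\w+"
PATH = re.compile(rf"{WORD}/{WORD}")  # path = word slash word
PORT = re.compile(r"(\d+)/")          # port = digits slash


def parse_rest(rest: str) -> dict:
    """Parse the part after the colon using ordered choice."""
    port = PORT.match(rest)
    # First alternative: digits and a slash, then a complete path.
    if port and PATH.fullmatch(rest, port.end()):
        return {"port": port.group(1), "path": rest[port.end():]}
    # Second alternative: the whole remainder is the path.
    if PATH.fullmatch(rest):
        return {"path": rest}
    raise ValueError(f"cannot parse {rest!r}")


def parse_ssh_url(url: str) -> dict:
    """Split user@domain:rest, then parse rest."""
    user, at, remainder = url.partition("@")
    domain, colon, rest = remainder.partition(":")
    if not (at and colon and user and domain):
        raise ValueError(f"not a user@domain:path URL: {url!r}")
    return {"user": user, "domain": domain, **parse_rest(rest)}


for url in [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]:
    print(parse_ssh_url(url))
```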

Maybe a small parser?

@Jan Do you mean doing this without a regex?

See my answer below (but use the other, shorter answer).

@ThatGuy343 Correct, … comes after the colon (:).

@ThatGuy343 If … is not a port, which capture group should contain it? Since that entry is "invalid", should it be ignored entirely?

My goal is to create a simple parser, not a validator. 007 is the username and part of the path. The path group looks like this: username/repo

@MonkeyZeus +1, I think you're close, but for git@github.com:22/james... it matches the path as 22/james, and I believe it should be james...

@MonkeyZeus: +1, but with some optimizations: a) use verbose mode, b) the non-capturing groups you are using now are redundant, c) get rid of the forward slash for the port. All in all, see …

You know, I started taking part in the regex tag a little over two months ago to improve my skills, and I feel I've made good progress; not sure I'm an expert, but at least regex doesn't look so cryptic anymore (mostly thanks to regex visualization tools). I never thought I'd get into parsers, but your post so casually says "eh, try this parser" that it might be my next adventure, lol

@MonkeyZeus: Glad to have another traveler here. A good starting point is a general overview and then PEG parsers in Python. It really broadens the possibilities.

@MonkeyZeus: Shameless self-promotion: you can combine regular expressions and parsers nicely to get the best of both, see …
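The pattern linked at the end of that last comment is not preserved on this page. As an independent sketch (not the commenter's version), assuming the port is 1 to 5 digits and the path is exactly two non-empty, slash-free segments, the lazy-quantifier answer can be tightened so that a non-numeric leading segment is rejected instead of being captured as the port. The parse helper is hypothetical.

```python
import re

# Hypothetical tightened pattern: port limited to 1-5 digits,
# path limited to exactly two non-empty, slash-free segments.
STRICT_PATTERN = re.compile(
    r"""
    ^
    (?P<user>[^@]+)@
    (?P<host>[^:]+):
    (?:(?P<port>\d{1,5})/)?
    (?P<path>[^/]+/[^/]+)
    $
    """,
    re.VERBOSE,
)


def parse(url):
    """Return the named groups as a dict, or None if the URL does not fit."""
    match = STRICT_PATTERN.match(url)
    return match.groupdict() if match else None


for url in [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
    "git@github.com:foo/james/example",  # rejected: foo is not numeric
]:
    print(url, "->", parse(url))
```

Like the other versions, this cannot tell a numeric port from a numeric username when three segments are present (22/007/example is read as port 22); that ambiguity is inherent in the format being parsed.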