Regex: Need to automatically determine text that will cause a regular expression to match

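I need to automatically produce text that matches a given regular expression.

For example, given the regular expression:

[Tt]rue

I need to generate

True

or

true

I only need to generate one valid match per regular expression, which I hope makes the problem more tractable given the potentially infinite set of strings a regular expression can match.

I'm not sure this is even possible, given how regular expressions are constructed. I'm also not sure what to search for: the far more common problem of "reverse" matching tends to drown out my question.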

If it matters, I'm using C#. If the solution requires some other technology, that's fine too.

Answer: I was pointed to Xeger, a Java library built for this purpose, which led me to Fare:

It is a C# port of both Xeger and dk.brics.automaton, the library Xeger uses.

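You can do this with Xeger (a Java library for generating random text from regular expressions).

From the documentation:

Think of it as the opposite of a regular expression matcher. This library lets you generate text that is guaranteed to match a regular expression you pass in.

Let's take this regular expression as an example: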

[ab]{4,6}c

With Xeger, you can now generate strings that match this pattern.
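To make the idea concrete without depending on any library, here is a minimal Java sketch that produces one match for a very small regex subset: literal characters, `.`, simple `[...]` classes, and the quantifiers `?`, `*`, `+`, `{n}`, `{n,}`, `{n,m}`. It is not Xeger's implementation or API; the class name `TinyRegexSampler` and the method `sample` are made up for illustration, and groups, alternation, escapes, anchors and negated classes are rejected rather than handled. It always takes the minimum repetition count and the first member of a class, so unbounded quantifiers cannot loop forever.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: emits one match for a tiny regex subset (no groups, alternation, escapes, anchors). */
final class TinyRegexSampler {

    /** Returns the shortest string this sketch can derive for the pattern. */
    static String sample(String regex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            char atom; // the single character emitted for the current atom
            if (c == '[') {
                int close = regex.indexOf(']', i + 1);
                if (close <= i + 1) throw new IllegalArgumentException("bad character class at " + i);
                String body = regex.substring(i + 1, close);
                if (body.charAt(0) == '^' || body.indexOf('\\') >= 0
                        || body.indexOf('[') >= 0 || body.indexOf('&') >= 0) {
                    throw new IllegalArgumentException("unsupported character class at " + i);
                }
                atom = body.charAt(0); // first member (or range start) always belongs to the class
                i = close + 1;
            } else if (c == '.') {
                atom = 'a'; // any single character will do
                i++;
            } else if ("\\()|^$*+?{}".indexOf(c) >= 0) {
                throw new IllegalArgumentException("unsupported syntax '" + c + "' at " + i);
            } else {
                atom = c; // plain literal
                i++;
            }

            int count = 1; // minimum number of repetitions the quantifier allows
            if (i < regex.length()) {
                char q = regex.charAt(i);
                if (q == '?' || q == '*') {
                    count = 0;
                    i++;
                } else if (q == '+') {
                    i++;
                } else if (q == '{') {
                    int close = regex.indexOf('}', i);
                    if (close < 0) throw new IllegalArgumentException("unclosed '{' at " + i);
                    String body = regex.substring(i + 1, close);
                    int comma = body.indexOf(',');
                    count = Integer.parseInt(comma < 0 ? body : body.substring(0, comma));
                    i = close + 1;
                }
            }
            for (int k = 0; k < count; k++) out.append(atom);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"[Tt]rue", "[ab]{4,6}c"}) {
            String s = sample(regex);
            // Cross-check the generated text with the JDK's own matcher.
            System.out.println(regex + " -> " + s + " (matches: " + Pattern.matches(regex, s) + ")");
        }
    }
}
```

For the two patterns in this thread the sketch yields `True` and `aaaac`. A real library such as Xeger works on an automaton instead, which is what lets it cover groups, alternation and random choice among matches.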

The Xeger website also advises checking its limitations. The supported syntax is defined by the following grammar:

regexp  ::=     unionexp                
|                       
unionexp        ::=     interexp | unionexp     (union) 
|       interexp                
interexp        ::=     concatexp & interexp    (intersection)  [OPTIONAL]
|       concatexp               
concatexp       ::=     repeatexp concatexp     (concatenation) 
|       repeatexp               
repeatexp       ::=     repeatexp ?     (zero or one occurrence)        
|       repeatexp *     (zero or more occurrences)      
|       repeatexp +     (one or more occurrences)       
|       repeatexp {n}   (n occurrences) 
|       repeatexp {n,}  (n or more occurrences) 
|       repeatexp {n,m} (n to m occurrences, including both)    
|       complexp                
complexp        ::=     ~ complexp      (complement)    [OPTIONAL]
|       charclassexp            
charclassexp    ::=     [ charclasses ] (character class)       
|       [^ charclasses ]        (negated character class)       
|       simpleexp               
charclasses     ::=     charclass charclasses           
|       charclass               
charclass       ::=     charexp - charexp       (character range, including end-points) 
|       charexp         
simpleexp       ::=     charexp         
|       .       (any single character)  
|       #       (the empty language)    [OPTIONAL]
|       @       (any string)    [OPTIONAL]
|       " <Unicode string without double-quotes> "      (a string)      
|       ( )     (the empty string)      
|       ( unionexp )    (precedence override)   
|       < <identifier> >        (named automaton)       [OPTIONAL]
|       <n-m>   (numerical interval)    [OPTIONAL]
charexp ::=     <Unicode character>     (a single non-reserved character)       
|       \ <Unicode character>   (a single character)
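One practical consequence of this grammar is that characters such as `#`, `@`, `&`, `~`, `<` and `"` are operators here, even though they are ordinary characters in most Perl-style flavors. Since the last rule says a backslash followed by any character denotes that single character, a literal string can be made safe by escaping everything that is not a letter or digit. The sketch below shows that idea; `LiteralEscaper` and `escapeLiteral` are hypothetical names, and the sanity check in `main` uses `java.util.regex` (which treats a backslash before punctuation the same way), not dk.brics.automaton itself.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: backslash-escapes every non-alphanumeric character of a literal. */
final class LiteralEscaper {

    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            // Letters and digits are never operators in the grammar above, so leave them alone.
            if (!Character.isLetterOrDigit(cp)) sb.append('\\');
            sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String literal = "<1-5>#@";
        String escaped = escapeLiteral(literal);
        System.out.println(escaped);
        // Sanity check with the JDK matcher only; this does not exercise dk.brics.automaton.
        System.out.println(Pattern.matches(escaped, literal));
    }
}
```

Note the flip side of the same rule: under this grammar `\d` would mean the letter `d`, so Perl-style shorthand classes have to be spelled out as `[0-9]` and the like.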

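I suggest you test with simple regular expressions first and gradually add more complex features, to find out whether it is suitable for generating your data.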

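How would you handle quantifiers or wildcards? Surely […] would go into an infinite loop.

Actually, I only need to generate one valid match, so any valid match serves my purpose. I'll update my question with this. Thanks.

I've never checked, but is the code behind regex101.com freely available? The issue tracker at least is on GitHub; I don't know about the source itself. If it is, you could use the algorithm that generates the explanation to generate a matching piece of text.

Which extensions to regular expressions need to be supported? Is the basic version enough, or should it support the Perl flavor?

@AdamSmith: Yes, it is indeed impossible. For Perl regular expressions, even deciding whether there are infinitely many matching strings is impossible, because they are Turing-complete. For traditional regular expressions, it is possible.

Thanks for pointing out Xeger. While looking into Xeger, I found Fare, a port of the main automaton library that Xeger uses. Fare also happens to include a C# port of Xeger, which seems to work well and saves me from routing my code through Java. Fare's unit tests show how Xeger is used, and once you've downloaded its NuGet package you can be up and running in one line of code.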
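On the point that a match can always be found for traditional regular expressions, and on the worry about infinite loops: once a pattern is compiled to a finite automaton, a breadth-first search from the start state finds a shortest accepted string and visits each state at most once, so `*` and `+` cannot make it loop. The Java sketch below assumes the DFA is already available; the two automata are hand-built for `[Tt]rue` and `a*b`, and `ShortestMatch` with its methods is a hypothetical illustration, not the API of Xeger, Fare or dk.brics.automaton.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: shortest accepted string of a DFA via breadth-first search. */
final class ShortestMatch {

    /** Adds the transition from --ch--> to. */
    static void edge(Map<Integer, TreeMap<Character, Integer>> delta, int from, char ch, int to) {
        delta.computeIfAbsent(from, k -> new TreeMap<>()).put(ch, to);
    }

    /** Returns a shortest string accepted by the DFA, or null if it accepts nothing. */
    static String shortestAccepted(Map<Integer, TreeMap<Character, Integer>> delta,
                                   int start, Set<Integer> accepting) {
        Map<Integer, String> reached = new HashMap<>(); // state -> a shortest string leading to it
        Deque<Integer> queue = new ArrayDeque<>();
        reached.put(start, "");
        queue.add(start);
        while (!queue.isEmpty()) {
            int state = queue.remove();
            String prefix = reached.get(state);
            if (accepting.contains(state)) return prefix;
            TreeMap<Character, Integer> outgoing = delta.get(state);
            if (outgoing == null) continue; // dead end
            for (Map.Entry<Character, Integer> e : outgoing.entrySet()) {
                if (!reached.containsKey(e.getValue())) { // each state is enqueued once
                    reached.put(e.getValue(), prefix + e.getKey());
                    queue.add(e.getValue());
                }
            }
        }
        return null;
    }

    /** Hand-built DFA for [Tt]rue; start state 0, accepting state 4. */
    static Map<Integer, TreeMap<Character, Integer>> trueDfa() {
        Map<Integer, TreeMap<Character, Integer>> d = new HashMap<>();
        edge(d, 0, 'T', 1);
        edge(d, 0, 't', 1);
        edge(d, 1, 'r', 2);
        edge(d, 2, 'u', 3);
        edge(d, 3, 'e', 4);
        return d;
    }

    /** Hand-built DFA for a*b; start state 0, accepting state 1. */
    static Map<Integer, TreeMap<Character, Integer>> starDfa() {
        Map<Integer, TreeMap<Character, Integer>> d = new HashMap<>();
        edge(d, 0, 'a', 0); // the loop that a naive generator could follow forever
        edge(d, 0, 'b', 1);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(shortestAccepted(trueDfa(), 0, Collections.singleton(4)));
        System.out.println(shortestAccepted(starDfa(), 0, Collections.singleton(1)));
    }
}
```

Because transitions are stored in a `TreeMap`, ties are broken by character order, so the first automaton gives `True` and the second gives `b`. This argument does not carry over to Perl-style features such as backreferences, which is the distinction the comment above draws.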
