Ruby on Rails: regex error on a 3-character non-English string (Java::JavaLang::ArrayIndexOutOfBoundsException: 4)


The following steps reproduce the error. Is there a workaround or fix? Thanks.

This also happens with JRuby 1.5.2/Rails 2.3.9 and JRuby 1.6/Rails 3.0.5.

Steps to reproduce:

d:\myapp>jruby script/rails console
Loading development environment (Rails 3.0.5)
irb(main):001:0> regex = /(aaa|bbb):/
=> /(aaa|bbb):/
irb(main):002:0> str = "\343\202\242:"
=> "péó:"
irb(main):003:0> str =~ regex
Java::JavaLang::ArrayIndexOutOfBoundsException: 4
        from org.jcodings.MultiByteEncoding.safeLengthForUptoFour(MultiByteEncoding.java:5
        from org.jcodings.specific.NonStrictUTF8Encoding.length(NonStrictUTF8Encoding.java
        from org.joni.Matcher.forwardSearchRange(Matcher.java:124)
        from org.joni.Matcher.search(Matcher.java:432)
        from org.jruby.RubyRegexp.search(RubyRegexp.java:1474)
        from org.jruby.RubyRegexp.op_match(RubyRegexp.java:1391)
        from org.jruby.RubyString.op_match(RubyString.java:1557)
        from org.jruby.RubyString$i$1$0$op_match.call(RubyString$i$1$0$op_match.gen:65535)
        from org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:
        from org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:139)
        from org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
        from org.jruby.ast.NewlineNode.interpret(NewlineNode.java:103)
        from org.jruby.ast.RootNode.interpret(RootNode.java:129)
        from org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
        from org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:160)
        from org.jruby.RubyKernel.evalCommon(RubyKernel.java:1134)
... 158 levels...
        from org.jruby.RubyKernel$s$1$0$require.call(RubyKernel$s$1$0$require.gen:65535)
        from org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrNBlock.call(Java
        from org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
        from org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:61)
        from org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:
        from org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:139)
        from script.rails.__file__(script/rails:6)
        from script.rails.load(script/rails)
        from org.jruby.Ruby.runScript(Ruby.java:670)
        from org.jruby.Ruby.runNormally(Ruby.java:574)
        from org.jruby.Ruby.runFromMain(Ruby.java:423)
        from org.jruby.Main.doRunFromMain(Main.java:278)
        from org.jruby.Main.internalRun(Main.java:198)
        from org.jruby.Main.run(Main.java:164)
        from org.jruby.Main.run(Main.java:148)
        from org.jruby.Main.main(Main.java:128)
irb(main):004:0>

Perhaps the regex is not delimited.
How about
str =~ /(aaa|bbb):/

or

regex = '(aaa|bbb):'

str =~ /regex/

I think that, because you are passing a string containing multibyte characters, you need to add the /u flag to the regex so that it is parsed as UTF-8.
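As a rough illustration of what the /u flag changes, the sketch below inspects the encodings involved. It assumes Ruby 1.9 or later, where strings and regexps carry an encoding. The force_encoding call is an addition to make the string's encoding explicit and is not part of the original question.

```ruby
# Assumes Ruby 1.9+ (encoding-aware strings); not taken from the original post.
str = "\343\202\242:".dup.force_encoding(Encoding::UTF_8)

plain = /(aaa|bbb):/   # ASCII-only literal, no fixed encoding
utf8  = /(aaa|bbb):/u  # the u flag fixes the regexp encoding to UTF-8

puts str.bytesize            # 3 bytes for the multibyte character plus 1 for ":"
puts str.valid_encoding?
puts plain.fixed_encoding?
puts utf8.encoding
puts((str =~ utf8).inspect)  # str contains neither "aaa:" nor "bbb:", so no match is expected
```

Note that the string is one multibyte character followed by a colon, so the match should simply fail to find anything rather than raise.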

I just checked this in several versions of Ruby and it only occurs in JRuby, so I think you have found a bug ;)
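When comparing interpreters like this, it helps to print which engine produced each result. A minimal sketch, assuming an interpreter that defines RUBY_ENGINE (Ruby 1.9+ and JRuby do). describe_match is a hypothetical helper name, not something from the original thread.

```ruby
# describe_match is a hypothetical helper, for illustration only.
def describe_match(str, regex)
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"  # "jruby" on JRuby
  result = (str =~ regex)  # the call that raises on the JRuby versions in the question
  "#{engine} #{RUBY_VERSION}: #{result.inspect}"
end

puts describe_match("\343\202\242:", /(aaa|bbb):/)
```

Running the same file under each interpreter gives one labelled line per run, which makes it easy to see where the behaviour differs.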

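If you first call something like "string".to_java_string it appears to work, but that actually converts the string to ISO-8859-1 first, which is not what you want. To keep the encoding, call .to_java on the string and match the regex against that.

I think this is a workable solution: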

regex = /(aaa|bbb):/u
str = "\343\202\242:"
str.to_java =~ regex

With str =~ /regex/, the regular expression is /regex/, not /(aaa|bbb):/, because the name regex inside the slashes is not replaced by the variable's value.
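To illustrate the point above: a bare /regex/ literal matches the five letters "regex", while interpolation or Regexp.new builds the pattern from the variable's value. A short sketch using only core Ruby; the variable names are illustrative.

```ruby
regex = '(aaa|bbb):'          # a String holding the pattern source

literal      = /regex/        # matches the text "regex" and ignores the variable
interpolated = /#{regex}/     # interpolates the variable's value into the pattern
built        = Regexp.new(regex)

p("aaa:" =~ literal)          # nil
p("aaa:" =~ interpolated)     # 0
p("aaa:" =~ built)            # 0
p(literal.source)
```

If the pattern is already available as a Regexp object, as in the question, it can be used directly with str =~ regex and no conversion is needed.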