Java Apache Commons UrlValidator: configure to allow umlaut characters

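I want to validate a long list of URL strings, but some of them contain umlaut characters, for example: ä, á, è, ö, etc.

Is there a way to configure the Apache Commons UrlValidator to accept these characters?

This test fails (note the ã):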

@Test
public void urlValidatorShouldPassWithUmlaut()
{
    // Given
    org.apache.commons.validator.routines.UrlValidator validator;
    validator = new UrlValidator( new String[] { "http", "https" }, UrlValidator.ALLOW_ALL_SCHEMES );

    // When
    String url = "http://dbpedia.org/resource/São_Paulo";

    // Then
    assertThat( validator.isValid( url ), is( true ) );
}

This test passes (ã replaced with a):

@Test
public void urlValidatorShouldPassWithUmlaut()
{
    // Given
    org.apache.commons.validator.routines.UrlValidator validator;
    validator = new UrlValidator( new String[] { "http", "https" }, UrlValidator.ALLOW_ALL_SCHEMES );

    // When
    String url = "http://dbpedia.org/resource/Sao_Paulo";

    // Then
    assertThat( validator.isValid( url ), is( true ) );
}
Software version:

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>1.4.0</version>
</dependency>

Update:

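validator.isValid(IDN.toASCII(url)) also fails, because IDN.toASCII(url) does something I don't yet understand. For example, it converts
http://dbpedia.org/resource/São_Paulo
into
http://dbpedia.xn--org/resource/so_paulo-w1b
which is still invalid according to UrlValidator.

You have to encode the umlaut part before validating it: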

import org.apache.commons.validator.routines.UrlValidator;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UmlautUrlTest {
    public static void main(String[] args) {
        String url = "http://dbpedia.org/resource/";
        String umlautPart = "São_Paulo";
        String[] schemes = {"http", "https"};
        UrlValidator v = new UrlValidator(schemes, UrlValidator.ALLOW_ALL_SCHEMES);
        try {
            // Percent-encode only the part that contains the non-ASCII character
            String encodedPart = URLEncoder.encode(umlautPart, "UTF-8");
            System.out.println(v.isValid(url + encodedPart));
            System.out.println(encodedPart);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            e.printStackTrace();
        }
    }
}
The output is:

true
S%C3%A3o_Paulo
Edit:

You can use this function to encode the whole URL for parsing:

public static String encodeUrl(String url) {
    // Split "scheme://rest"; assumes the URL contains "://"
    String[] parts = url.split("://", 2);
    String protocol = parts[0];
    String restOfUrl = parts[1];

    // Everything before the first "/" is the host, the rest is the path.
    // Ports, query strings and fragments are not handled here.
    int slash = restOfUrl.indexOf('/');
    String hostPart = slash < 0 ? restOfUrl : restOfUrl.substring(0, slash);

    try {
        // Encode each host label separately so the "." separators are kept
        String[] labels = hostPart.split("\\.");
        StringBuilder host = new StringBuilder();
        for (int i = 0; i < labels.length; i++) {
            if (i > 0) {
                host.append('.');
            }
            host.append(URLEncoder.encode(labels[i], "UTF-8"));
        }

        // Encode each path segment separately so the "/" separators are kept
        StringBuilder remainingPart = new StringBuilder();
        if (slash >= 0) {
            for (String segment : restOfUrl.substring(slash + 1).split("/", -1)) {
                remainingPart.append('/').append(URLEncoder.encode(segment, "UTF-8"));
            }
        }
        return protocol + "://" + host + remainingPart;
    } catch (UnsupportedEncodingException e) {
        // UTF-8 is always available, so this should not happen
        throw new IllegalStateException(e);
    }
}

And use it in the test:
validator.isValid(encodeUrl(url))
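
If it is not known in advance which part of the URL contains the non-ASCII characters, one option is to let java.net.URI do the escaping instead of splitting the string by hand. The sketch below is not part of the original answer; the names AsciiUrlSketch and toAsciiUrl are made up for illustration. It relies on URI accepting non-ASCII letters when parsing and on URI.toASCIIString() percent-encoding them as UTF-8. It does not convert internationalized host names to Punycode, and URI.create still throws IllegalArgumentException for characters such as spaces.

```java
import java.net.URI;

final class AsciiUrlSketch {

    /**
     * Returns the URL with every non-ASCII character percent-encoded as UTF-8.
     * Existing percent escapes and the "/" and "." separators are left untouched.
     */
    static String toAsciiUrl(String url) {
        // URI.create tolerates non-ASCII letters; toASCIIString() escapes them.
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        // "\u00e3" is the letter a with tilde, written as an escape so this file stays ASCII.
        System.out.println(toAsciiUrl("http://dbpedia.org/resource/S\u00e3o_Paulo"));
        // prints http://dbpedia.org/resource/S%C3%A3o_Paulo
    }
}
```

The returned string could then be passed to validator.isValid(...) in place of encodeUrl(url).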

While reading a related SO question, I found another partial solution:

    public static boolean removeAccentsAndValidateUrl( String url )
    {
        String normalizedUrl = Normalizer.normalize( url, Normalizer.Form.NFD );
        Pattern accentsPattern = Pattern.compile( "\\p{InCombiningDiacriticalMarks}+" );
        String urlWithoutAccents = accentsPattern.matcher( normalizedUrl ).replaceAll( "" );
        String[] schemes = {"http", "https"};
        long options = UrlValidator.ALLOW_ALL_SCHEMES;
        UrlValidator urlValidator = new UrlValidator( schemes, options );
        return urlValidator.isValid(urlWithoutAccents);
    }
However, it turns out that UrlValidator also fails on (among other characters) the en dash (–).

For example, the following fails validation:

http://dbpedia.org/resource/PENTA_–_Pena_Transportes_Aereos
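
To see why accent removal is only a partial fix, the normalization step can be run on its own. The following is a small sketch using only the JDK; the class name AccentStripSketch and the method stripAccents are hypothetical helpers, not part of UrlValidator. NFD normalization splits "ã" into "a" plus a combining tilde, which the regular expression then removes, whereas the en dash has no such decomposition and passes through unchanged.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class AccentStripSketch {

    // Matches the combining marks that NFD normalization separates from base letters
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    /** Decomposes accented letters and drops the combining marks. */
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "\u00e3" is a with tilde, "\u2013" is the en dash
        System.out.println(stripAccents("S\u00e3o_Paulo"));    // prints Sao_Paulo
        System.out.println(stripAccents("PENTA_\u2013_Pena")); // en dash is still present
    }
}
```

Any character that is not a base letter plus diacritic (dashes, CJK characters, and so on) would still reach the validator unchanged with this approach.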

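Are you using org.apache.commons.validator.routines.UrlValidator or org.apache.commons.validator.UrlValidator?

org.apache.commons.validator.routines.UrlValidator (org.apache.commons.validator.UrlValidator is deprecated).

Have you tried running the validation on IDN.toASCII(url)?

Thanks, I just tried that, but it doesn't work; see the comment below. The string returned by IDN.toASCII(...) is still invalid.

Is there no way to tell UrlValidator (or some other library, I'm open to suggestions) to allow umlauts, rather than performing the encoding? The code should only verify that a number of text elements are formatted as expected. Transforming the text elements seems unnecessary and inefficient, since the transformed text is discarded immediately.

@AlexAverbuch The approach I gave is the standard way of handling special characters in Java URLs. I'm not aware of any such special library. Personally, I always find it best to stick to the more widely accepted way of doing things. By the way, my solution does not convert, it encodes.

@AlexAverbuch The problem you are facing is an open issue in Apache Commons. See this.

With your solution, what would I do if I don't know which part of the URL the umlaut is in?

Thanks. I also edited the answer to fix a bug: the unnecessary encoding of the "." character in the path.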
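
On the IDN.toASCII suggestion above: java.net.IDN is meant for domain names, not whole URLs. A plausible reading of the result quoted in the question is that the input is split into labels at each ".", so everything after "dbpedia." is treated as a single label and Punycode-encoded. The sketch below applies IDN.toASCII to the host only and leaves the rest of the URL alone. The class IdnHostSketch and the method hostToAscii are hypothetical names, and the host extraction is deliberately naive (it assumes no user info or port).

```java
import java.net.IDN;

final class IdnHostSketch {

    /** Converts only the host of "scheme://host/path" to its ASCII (Punycode) form. */
    static String hostToAscii(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("No scheme separator in: " + url);
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = url.indexOf('/', hostStart);
        if (hostEnd < 0) {
            hostEnd = url.length();
        }
        String host = url.substring(hostStart, hostEnd);
        // Only the host goes through IDN; scheme and path are copied verbatim.
        return url.substring(0, hostStart) + IDN.toASCII(host) + url.substring(hostEnd);
    }

    public static void main(String[] args) {
        // "\u00fc" is u with diaeresis
        System.out.println(hostToAscii("http://b\u00fccher.example/page"));
        // prints http://xn--bcher-kva.example/page
    }
}
```

Non-ASCII characters in the path are untouched by this helper and would still need percent-encoding before validation.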