How to search Arabic text in Java?

java android

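I have Arabic text with diacritics in my database. When I type an Arabic string to search for, it has no diacritics, so it never matches the strings in the database. The search works fine on text without diacritics. Is there a way to make it work on text with diacritics?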

Arabic diacritics are characters in their own right, so you can use a LIKE clause such as this:

SELECT * FROM table WHERE name LIKE 'a[cd]*b[cd]*'

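This matches "ab" with any number of c or d characters after each letter. Note that [cd]* is regular-expression syntax rather than standard LIKE syntax, so on most databases it needs a regex operator (for example REGEXP in MySQL).

You can put all the Arabic diacritics in the square brackets after every letter.

You can look them all up by their Unicode code points.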
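To look the marks up by code point, one option is to let Java's own character database do the classification. The following is a minimal sketch, not part of the original answer: the class name ArabicMarks and the choice to scan only the basic Arabic block (U+0600 to U+06FF) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ArabicMarks {

    /**
     * Collects the code points in the basic Arabic block (U+0600..U+06FF)
     * that Java classifies as non-spacing marks (Unicode category Mn).
     */
    public static List<Integer> arabicNonSpacingMarks() {
        List<Integer> marks = new ArrayList<>();
        for (int cp = 0x0600; cp <= 0x06FF; cp++) {
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                marks.add(cp);
            }
        }
        return marks;
    }

    public static void main(String[] args) {
        // Print each mark as U+XXXX so it can be pasted into a character class
        for (int cp : arabicNonSpacingMarks()) {
            System.out.println(String.format("U+%04X", cp));
        }
    }
}
```

The resulting list can be turned into the bracketed character class that follows each letter of the search term.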

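Is there a way to make it work on text with diacritics?

Unfortunately, no. As MIE said:

Arabic diacritics are characters

so as far as I know, it is not possible.

MIE's answer would be hard to implement, and it would be very hard to keep up to date if you change anything in your database.

You could perhaps take a look at this. I'm not sure, but it looks like it could solve your problem.

Alternatively, you would need to remove all the diacritics from your database and then simply use a small Arabic normalizer.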

I found it works better this way. All credit goes to the author named in the code:

import java.text.Normalizer;
import java.text.Normalizer.Form;

/**
 *
 * @author Ibbtek <http://ibbtek.altervista.org/>
 */
public class ArabicDiacritics {

    private String input;
    private final String output;

    /**
     * ArabicDiacritics constructor
     * @param input String
     */
    public ArabicDiacritics(String input){
        this.input=input;
        this.output=normalize();
    }

    /**
     * normalize Method
     * @return String
     */
    private String normalize(){

        input = Normalizer.normalize(input, Form.NFKD)
                .replaceAll("\\p{M}", "");

        return input;
    }

    /**
     * @return the output
     */
    public String getOutput() {
        return output;
    }

    public static void main(String[] args) {
        String test = "كَلَّا لَا تُطِعْهُ وَاسْجُدْ وَاقْتَرِبْ ۩";
        System.out.println("Before: "+test);
        test=new ArabicDiacritics(test).getOutput();
        System.out.println("After: "+test);
    }
}
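The same normalization can be applied to both the stored text and the search term before comparing them. The sketch below assumes the stored string has already been read from the database into a Java String; the class and method names (DiacriticInsensitiveSearch, stripDiacritics, containsIgnoringDiacritics) are hypothetical and only illustrate the idea.

```java
import java.text.Normalizer;

public class DiacriticInsensitiveSearch {

    /** Decomposes the text (NFKD) and removes every combining mark (\p{M}). */
    public static String stripDiacritics(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "");
    }

    /** True if the stripped stored text contains the stripped query. */
    public static boolean containsIgnoringDiacritics(String storedText, String query) {
        return stripDiacritics(storedText).contains(stripDiacritics(query));
    }

    public static void main(String[] args) {
        String stored = "كَلَّا لَا تُطِعْهُ"; // text as it might come from the database
        String query = "تطعه";              // what the user typed, without diacritics
        System.out.println(containsIgnoringDiacritics(stored, query));
    }
}
```

Because NFKD also decomposes letters such as أ into a base letter plus a combining hamza, this approach folds those letter variants together as well, which may or may not be what you want.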

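See the class below that I created for Android; it returns a SpannableString. It is very basic and makes no attempt to limit memory consumption, so feel free to optimize it yourselves.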

http://freshinfresh.com/sample/ABHArabicDiacritics.java

If you want to check whether an Arabic string contains a term, ignoring the diacritics (harakat):

ABHArabicDiacritics objSearchd = new ABHArabicDiacritics();
objSearchd.getDiacriticinsensitive("وَ اَشْهَدُ اَنْ لا اِلهَ اِلاَّ اللَّهُ").contains("اشهد");

If you want to get the string back with the matched part highlighted in red, use the code below:

ABHArabicDiacritics objSearch = new ABHArabicDiacritics("وَ اَشْهَدُ اَنْ لا اِلهَ اِلاَّ اللَّهُ", "اشهد");
SpannableString spoutput = objSearch.getSearchHighlightedSpan();
textView.setText(spoutput);

To get the start and end positions of the searched text, use the following methods:

/** check whether the text contains the search term */
objSearch.isContain();
objSearch.getSearchHighlightedSpan();
objSearch.getSearchTextStartPosition();
objSearch.getSearchTextEndPosition();

Please copy the shared Java class and enjoy.

If you ask for it, I will spend more time developing more features.

Thanks

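I hope I'm not too late to the party. My problem was slightly different from the OP's: I wanted to search Arabic text with diacritics and mark the matched text with a color, so I needed to keep the indexes of the matched text.

The problem is that stripping the diacritics shortens the text, so the indexes of the match in the normalized text differ from those in the original.

So I solved it by using a regex and a SpannableString: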

import android.text.Spannable;
import android.text.SpannableString;
import android.text.style.ForegroundColorSpan;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * input: text containing Arabic diacritics or letter variants that should be ignored while searching
 * searchedWord: the word/text to search for in input
 * color: foreground color applied to every match in the returned Spannable
 */
public static Spannable searchArabicWithIgnoredDiacriticsOrLetters(String input, String searchedWord, int color) {

    Spannable output = new SpannableString(replaceLetters(input));
    StringBuilder sb = new StringBuilder();
    for (char ch : replaceLetters(searchedWord).toCharArray()) {
        sb.append(Pattern.quote(String.valueOf(ch))); // quote so regex metacharacters in the query are matched literally
        sb.append("[\\u0655\\u0654\\u0670\\u065F\\u065E\\u065D\\u065C\\u065B\\u065A\\u0659\\u0658\\u0657\\u0656\\u06EC\\u06EB\\u06EA\\u06E4\\u061A\\u0619\\u0618\\u0617\\u0616\\u0615\\u064B\\u064C\\u064D\\u064E\\u064F\\u0650\\u0651\\u0652\\u0653\\u06DA\\u06D6\\u06D7\\u06D8\\u06D9\\u06DB\\u06DC\\u06DF\\u06E0\\u06E1\\u06E2\\u06E3\\u06E5\\u06E6\\u06E7\\u06E8\\u06EB\\u06EC\\u06ED]*");
    }

    Pattern pattern = Pattern.compile(sb.toString()); // compile the diacritic-tolerant regex
    // Match against the letter-normalized text; replaceLetters() swaps characters one for one, so the indexes line up with output
    Matcher matcher = pattern.matcher(replaceLetters(input));
    while (matcher.find())
        output.setSpan(new ForegroundColorSpan(color),
                matcher.start(), matcher.end(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
    return output;
}

public static String replaceLetters(String input) {
    String output;
    output = input.replaceAll("أ", "ا");
    output = output.replaceAll("إ", "ا");
    output = output.replaceAll("ى", "ي");
    output = output.replaceAll("ة", "ه");
    output = output.replaceAll("آ", "ا");
    output = output.replaceAll("ٱ", "ا");
    return output;
}

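The replaceLetters() helper maps letter variants to a single base letter.

Note: you can refer to the accepted answer for the Unicode notation.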
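The index problem described above can be seen without any Android classes. The following plain-Java sketch is an illustration only: the class MatchIndexDemo and its two methods are hypothetical names, and it uses \p{M} for "any combining mark" instead of the explicit list of code points.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchIndexDemo {

    /** Index of query in the text after all combining marks are removed, or -1. */
    public static int indexInStrippedText(String text, String query) {
        String stripped = Normalizer.normalize(text, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "");
        return stripped.indexOf(query);
    }

    /** Start index of query in the original text, allowing marks after each letter, or -1. */
    public static int indexInOriginalText(String text, String query) {
        StringBuilder regex = new StringBuilder();
        for (char c : query.toCharArray()) {
            // Each query letter may be followed by any number of combining marks
            regex.append(Pattern.quote(String.valueOf(c))).append("\\p{M}*");
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "صِرَاطَ الَّذِينَ";
        String query = "الذين";
        System.out.println("stripped index: " + indexInStrippedText(text, query));
        System.out.println("original index: " + indexInOriginalText(text, query));
    }
}
```

Only the second index can be passed to setSpan() on the diacritized text, which is why the regex is run against the unstripped string.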

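What is the encoding of the text? Is it UTF-8?

Are you saying you want to search for U+0632 and find U+0633? In that case you can write your own search, or use String.replace() with some regexp.

Is searching with a regex a reliable way to do this?

You just need to read the value from the db into a local variable, remove the diacritics locally, and then compare it with the search string. I use this approach and it works well.

Maybe you should add a link to the answer you are referring to.

Thank you, this program is really useful. The test string contains a sajda ayah. Now I need to do a sajdah just for reading a post on Stack Overflow.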

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String targetWord = "الذين";
String text = "صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ الْمَغْضُوبِ عَلَيْهِمْ وَلَا الضَّالِّين";

// Build a regex that allows any number of combining marks (\p{M}) after each letter of the target word
StringBuilder regex = new StringBuilder();
for (char c : targetWord.toCharArray()) {
   regex.append(c);
   regex.append("(\\p{M})*");
}

Pattern searchRegEx = Pattern.compile(regex.toString());
Matcher m = searchRegEx.matcher(text);

if (m.find()) {
   int i = m.start(); // index of the match in the original, diacritized text
   System.out.println("m.group(0):: " + i + " : " + m.group(0));
}

public static String replaceLetters(String input) {
    String output;

    output = input.replaceAll("\\u0623", String.valueOf((char) Integer.parseInt("0627", 16)));  // replace أ with ا
    output = output.replaceAll("\\u0625", String.valueOf((char) Integer.parseInt("0627", 16))); // replace إ with ا
    output = output.replaceAll("\\u0649", String.valueOf((char) Integer.parseInt("064A", 16))); // replace ى with ي
    output = output.replaceAll("\\u0629", String.valueOf((char) Integer.parseInt("0647", 16))); // replace ة with ه
    output = output.replaceAll("\\u0622", String.valueOf((char) Integer.parseInt("0627", 16))); // replace آ with ا
    output = output.replaceAll("\\u0671", String.valueOf((char) Integer.parseInt("0627", 16))); // replace ٱ with ا

    return output;
}