PHP pronounceability algorithm


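I am struggling to find/create an algorithm that can determine the pronounceability of random 5-letter combinations.

The closest thing I have found so far is this 3-year-old StackOverflow thread:


...but it is far from perfect, with some rather odd false positives:

Using this function, all of the following rate as pronounceable (above 7/10):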

  • ZTEDA
  • LLFDA
  • MMGDA
  • THHDA
  • RTHDA
  • XYHDA
  • VQIDA
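Can someone smarter than me tweak this algorithm so that:

  • "MM", "LL", and "TH" are only valid when followed or preceded by a vowel
  • 3 or more consonants in a row is a no-no (except when the first or last one is an "R" or "L")
  • any other refinements you can think of

(I have done a fair amount of research/googling, and this seems to be the main pronounceability function that everyone has been referencing/using for the past 3 years, so I am sure an updated, more refined version would be appreciated by the wider community, not just me!)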
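The two rules requested above can be sketched as a simple filter. This is only an illustration of one possible reading of them, not the function from the linked thread: the name `violates_rules`, the choice of Python, and the decision to treat only A, E, I, O, U as vowels (so Y counts as a consonant) are all assumptions.

```python
VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def violates_rules(word: str) -> bool:
    """Return True if the word breaks either of the two proposed rules."""
    w = word.upper()

    # Rule 1: "MM", "LL" and "TH" need a vowel directly before or after the pair.
    for i in range(len(w) - 1):
        if w[i:i + 2] in ("MM", "LL", "TH"):
            before = i > 0 and w[i - 1] in VOWELS
            after = i + 2 < len(w) and w[i + 2] in VOWELS
            if not (before or after):
                return True

    # Rule 2: a run of 3+ consonants is rejected unless the run's
    # first or last letter is R or L.
    run = ""
    for ch in w + "A":  # the sentinel vowel flushes the final run
        if ch in VOWELS:
            if len(run) >= 3 and run[0] not in "RL" and run[-1] not in "RL":
                return True
            run = ""
        else:
            run += ch
    return False


for candidate in ("LLFDA", "MMGDA", "THHDA", "XYHDA", "HELLO", "STRAP"):
    print(candidate, violates_rules(candidate))
```

A filter like this could be applied before or after the existing scoring function. Hand-written rules of this kind tend to need many exceptions, which is part of the appeal of a statistical approach.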

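Based on the suggestion on the linked question to "use a Markov model on letters":

Use a Markov model (on letters, not words, of course). The probability of a word is a pretty good proxy for ease of pronunciation.

I thought I would try it out, and had some success.

My approach

I copied a list of 5-letter words into a file to serve as my dataset.

Then I use a Hidden Markov Model (based on one-grams, bigrams, and trigrams) to predict how likely a target word is to appear in that dataset.

(Better results could probably be achieved with some sort of phonetic transcription as one of the steps.)

First, I calculate the probabilities of character sequences in the dataset.

For example, if "A" occurs 50 times, and there are only 250 characters in the dataset, then "A" has a probability of 50/250, or 0.2.

Do the same for the bigrams "AB", "AC", and so on.

Do the same for the trigrams "ABC", "ABD", and so on.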
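The counting step might look like the following minimal sketch. The function name `ngram_probabilities` and the two-word `toy_words` list are hypothetical and exist only to make the arithmetic easy to check by hand.

```python
from collections import Counter


def ngram_probabilities(words, n):
    """Relative frequency of every length-n letter sequence found in `words`."""
    counts = Counter(
        word[i:i + n]
        for word in words
        for i in range(len(word) - n + 1)
    )
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


toy_words = ["ABACK", "CABAL"]  # hypothetical two-word dataset
unigrams = ngram_probabilities(toy_words, 1)
bigrams = ngram_probabilities(toy_words, 2)

# "A" occurs 4 times among 10 letters; "AB" occurs 2 times among 8 bigrams.
print(unigrams["A"], bigrams["AB"])
```

Note that this version assigns no probability at all to a sequence it has never seen, so a real scorer needs some form of smoothing, such as starting every count at 1.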

Basically, my score for the word "ABCDE" is made up of:

  • prob('A')
  • prob('B')
  • prob('C')
  • prob('D')
  • prob('E')
  • prob('AB')
  • prob('BC')
  • prob('CD')
  • prob('DE')
  • prob('ABC')
  • prob('BCD')
  • prob('CDE')
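You could multiply all of these values together to get the estimated probability of the target word appearing in the dataset (but that number is very small).

So instead, we take the log of each value and add the logs together.

Now we have a score that estimates how likely our target word is to appear in the dataset.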
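As a rough sketch of the log-sum idea, the following builds 1-, 2- and 3-gram counts and adds up the logs of their relative frequencies. The helper names `build_counts` and `log_score` and the `toy_words` list are hypothetical. Every count starts at 1 so that an unseen sequence never produces log(0). The sketch assumes upper-case A-Z input and applies no extra weighting to any n-gram order.

```python
import math
from collections import Counter
from itertools import product
from string import ascii_uppercase


def build_counts(words, n):
    """Count length-n sequences, starting every possible sequence at 1."""
    counts = Counter({"".join(p): 1 for p in product(ascii_uppercase, repeat=n)})
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def log_score(word, tables):
    """Sum of log relative frequencies over all 1-, 2- and 3-grams of `word`."""
    score = 0.0
    for n, counts in tables.items():
        total = sum(counts.values())
        for i in range(len(word) - n + 1):
            score += math.log(counts[word[i:i + n]] / total)
    return score


toy_words = ["TABLE", "CABLE", "FABLE", "SABLE"]  # hypothetical dataset
tables = {n: build_counts(toy_words, n) for n in (1, 2, 3)}

# A word built from sequences seen in the dataset scores higher (less negative).
print(log_score("GABLE", tables) > log_score("XQZJK", tables))
```

Because every probability is below 1, each log is negative, and the total is a negative number whose magnitude grows as the word becomes less likely. Adding logs ranks words exactly as multiplying the probabilities would, while avoiding very small products.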

My code

I coded this in C#, and found that a score greater than negative 160 is pretty good.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace Pronouncability
{

class Program
{
    public static char[] alphabet = new char[]{ 'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z' };

    public static List<string> wordList = loadWordList(); //Dataset of 5-letter words

    public static Random rand = new Random();

    public const double SCORE_LIMIT = -160.00;

    /// <summary>
    /// Generates random words, until 100 of them are better than
    /// the SCORE_LIMIT based on a statistical score. 
    /// </summary>
    public static void Main(string[] args)
    {
        Dictionary<Tuple<char, char, char>, int> trigramCounts = new Dictionary<Tuple<char, char, char>, int>();

        Dictionary<Tuple<char, char>, int> bigramCounts = new Dictionary<Tuple<char, char>, int>();

        Dictionary<char, int> onegramCounts = new Dictionary<char, int>();

        calculateProbabilities(onegramCounts, bigramCounts, trigramCounts);

        double totalTrigrams = (double)trigramCounts.Values.Sum();
        double totalBigrams = (double)bigramCounts.Values.Sum();
        double totalOnegrams = (double)onegramCounts.Values.Sum();

        SortedList<double, string> randomWordsScores = new SortedList<double, string>();

        while( randomWordsScores.Count < 100 )
        {
            string randStr = getRandomWord();

            if (!randomWordsScores.ContainsValue(randStr))
            {
                double score = getLikelyhood(randStr,trigramCounts, bigramCounts, onegramCounts, totalTrigrams, totalBigrams, totalOnegrams);

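                // SortedList keys must be unique, so skip a word whose exact score is already stored.
                if (score > SCORE_LIMIT && !randomWordsScores.ContainsKey(score))
                {
                    randomWordsScores.Add(score, randStr);
                }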
            }
        }


        //Right now randomWordsScores contains 100 random words which have 
        //a better score than the SCORE_LIMIT, sorted from worst to best.
    }


    /// <summary>
    /// Generates a random 5-letter word
    /// </summary>
    public static string getRandomWord()
    {
        // Random.Next's upper bound is exclusive, so 91 is needed to include 'Z' (90).
        char c0 = (char)rand.Next(65, 91);
        char c1 = (char)rand.Next(65, 91);
        char c2 = (char)rand.Next(65, 91);
        char c3 = (char)rand.Next(65, 91);
        char c4 = (char)rand.Next(65, 91);

        return "" + c0 + c1 + c2 + c3 + c4;
    }

    /// <summary>
    /// Returns a score for how likely a given word is, based on given trigrams, bigrams, and one-grams
    /// </summary>
    public static double getLikelyhood(string wordToScore, Dictionary<Tuple<char, char,char>, int> trigramCounts, Dictionary<Tuple<char, char>, int> bigramCounts, Dictionary<char, int> onegramCounts, double totalTrigrams, double totalBigrams, double totalOnegrams)
    {
        wordToScore = wordToScore.ToUpper();

        char[] letters = wordToScore.ToCharArray();

        Tuple<char, char>[] bigrams = new Tuple<char, char>[]{ 

            new Tuple<char,char>( wordToScore[0], wordToScore[1] ),
            new Tuple<char,char>( wordToScore[1], wordToScore[2] ),
            new Tuple<char,char>( wordToScore[2], wordToScore[3] ),
            new Tuple<char,char>( wordToScore[3], wordToScore[4] )

        };

        Tuple<char, char, char>[] trigrams = new Tuple<char, char, char>[]{ 

            new Tuple<char,char,char>( wordToScore[0], wordToScore[1], wordToScore[2] ),
            new Tuple<char,char,char>( wordToScore[1], wordToScore[2], wordToScore[3] ),
            new Tuple<char,char,char>( wordToScore[2], wordToScore[3], wordToScore[4] ),


        };

        double score = 0;

        foreach (char c in letters)
        {
            score += Math.Log((((double)onegramCounts[c]) / totalOnegrams));
        }

        foreach (Tuple<char, char> pair in bigrams)
        {
            score += Math.Log((((double)bigramCounts[pair]) / totalBigrams));
        }

        foreach (Tuple<char, char, char> trio in trigrams)
        {
            score += 5.0*Math.Log((((double)trigramCounts[trio]) / totalTrigrams));
        }


        return score;
    }

    /// <summary>
    /// Build the probability tables based on the dataset (WordList)
    /// </summary>
    public static void calculateProbabilities(Dictionary<char, int> onegramCounts, Dictionary<Tuple<char, char>, int> bigramCounts, Dictionary<Tuple<char, char, char>, int> trigramCounts)
    {
        foreach (char c1 in alphabet)
        {
            foreach (char c2 in alphabet)
            {
                foreach( char c3 in alphabet)
                {
                    trigramCounts[new Tuple<char, char, char>(c1, c2, c3)] = 1;
                }
            }
        }

        foreach( char c1 in alphabet)
        {
            foreach( char c2 in alphabet)
            {
                bigramCounts[ new Tuple<char,char>(c1,c2) ] = 1;
            }
        }

        foreach (char c1 in alphabet)
        {
            onegramCounts[c1] = 1;
        }


        foreach (string word in wordList)
        {
            for (int pos = 0; pos < 3; pos++)
            {
                trigramCounts[new Tuple<char, char, char>(word[pos], word[pos + 1], word[pos + 2])]++;
            }

            for (int pos = 0; pos < 4; pos++)
            {
                bigramCounts[new Tuple<char, char>(word[pos], word[pos + 1])]++;
            }

            for (int pos = 0; pos < 5; pos++)
            {
                onegramCounts[word[pos]]++;
            }
        }
    }

    /// <summary>
    /// Get the dataset (WordList) from file.
    /// </summary>
    public static List<string> loadWordList()
    {
        string filePath = "WordList.txt";

        string text = File.ReadAllText(filePath);

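        // Split on any whitespace, normalise case, and keep only 5-letter A-Z words
        // so the fixed-length indexing in calculateProbabilities cannot fail.
        List<string> result = text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToUpper())
            .Where(w => w.Length == 5 && w.All(c => c >= 'A' && c <= 'Z'))
            .ToList();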

        return result;
    }
}

}