What is the best way to translate large amounts of text data in .NET?

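I have a lot of text data and want to translate it into different languages.

Possible ways I know of:

  • Google Translate API
  • Bing Translate API

The problem is that all of these services have limits on text length, number of calls, and so on, which makes them inconvenient to use.

What services or approaches would you suggest in this case?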

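I had to solve the same problem when integrating language translation with an XMPP chat server. I split the payload (the text I needed to translate) into smaller subsets of complete sentences. I can't recall the exact number, but with Google's REST-based translation URL I translated a set of complete sentences totaling at most 1024 characters at a time, so one large paragraph resulted in multiple translation service calls.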
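The sentence-packing idea described above might look roughly like the following sketch. The class name `SentenceChunker`, the method `Chunk`, and the regex-based sentence boundary are assumptions made for illustration, not the code the answerer used; the 1024-character default simply mirrors the figure recalled in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class SentenceChunker
{
    // Naive heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    // The lookbehind keeps the punctuation attached to its sentence.
    private static readonly Regex SentenceBoundary = new Regex(@"(?<=[.!?])\s+");

    // Hypothetical helper: packs whole sentences into chunks of at most
    // maxChars characters. A single sentence longer than maxChars is emitted
    // as its own oversized chunk rather than being cut mid-sentence.
    public static List<string> Chunk(string text, int maxChars = 1024)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();

        foreach (string raw in SentenceBoundary.Split(text))
        {
            string sentence = raw.Trim();
            if (sentence.Length == 0) continue;

            // The +1 accounts for the space that joins two sentences.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }

            if (current.Length > 0) current.Append(' ');
            current.Append(sentence);
        }

        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }
}
```

Each returned chunk can then be sent as one translation request. Abbreviations such as "e.g." will be treated as sentence ends by this heuristic, so real text may need a smarter splitter.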

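Break the large text into tokenized strings, then pass each token to the translator in a loop. Store the translated output in an array; once every token has been translated and stored, put them back together and you will have a fully translated document.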
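As a minimal sketch of that loop, assuming the pieces have already been split: the `translate` delegate below stands in for whichever translation service call is used, and `BatchTranslator` and `TranslateAll` are hypothetical names. Keeping the original piece when a call fails is also an assumption, chosen so that no text silently disappears from the output.

```csharp
using System;
using System.Collections.Generic;

public static class BatchTranslator
{
    // Hypothetical helper: translates each piece in order and joins the results.
    // `translate` is a placeholder for the real service call.
    public static string TranslateAll(IEnumerable<string> pieces, Func<string, string> translate)
    {
        var translated = new List<string>();

        foreach (string piece in pieces)
        {
            try
            {
                translated.Add(translate(piece));
            }
            catch (Exception)
            {
                // Assumption: on failure, keep the untranslated piece in place.
                translated.Add(piece);
            }
        }

        // Reassemble the document in the original order.
        return string.Join(" ", translated);
    }
}
```

A call might look like `BatchTranslator.TranslateAll(pieces, text => myClient.Translate(text))`, where `myClient` is whatever client object wraps the chosen service.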

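Edit: 4/25/2010

To prove the point, I threw this together :) It's rough around the edges, but it can handle a large amount of text, and its translation accuracy is as good as Google's because it uses the Google API. I used this code to process Apple's entire 2005 SEC 10-K filing with the click of a button (it took about 45 minutes). The result was basically the same as what you would get by copying and pasting one sentence at a time into Google Translate. It isn't perfect (the ending punctuation is inaccurate, and I didn't write to the text file line by line), but it does prove the concept. With a bit more regular expression work, it could probably get the punctuation right.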

Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1

    Dim file As String = "Translate Me.txt"
    Dim lineCount As Integer = countLines()

    Private Function countLines() As Integer

        If IO.File.Exists(file) Then

            Dim reader As New StreamReader(file)
            Dim lineCount As Integer = Split(reader.ReadToEnd.Trim(), Environment.NewLine).Length
            reader.Close()
            Return lineCount

        Else

            MsgBox(file + " cannot be found anywhere!", 0, "Oops!")

        End If

        Return 1

    End Function

    Private Sub translateText()

        Dim lineLoop As Integer = 0
        Dim currentLine As String
        Dim currentLineSplit() As String
        Dim input1 As New StreamReader(file)
        Dim input2 As New StreamReader(file)
        Dim filePunctuation As Integer = 1
        Dim linePunctuation As Integer = 1

        ' Sentence-ending characters used to split each line.
        Dim delimiters() As Char = {"."c, "!"c, "?"c}

        Dim entireFile As String
        entireFile = (input1.ReadToEnd)

        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "." Then filePunctuation += 1
        Next

        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "!" Then filePunctuation += 1
        Next

        For i = 1 To Len(entireFile)
            If Mid$(entireFile, i, 1) = "?" Then filePunctuation += 1
        Next

        Dim sentenceArraySize = filePunctuation + lineCount

        Dim sentenceArrayCount = 0
        Dim sentence(sentenceArraySize) As String
        Dim sentenceLoop As Integer

        While lineLoop < lineCount

            linePunctuation = 1

            currentLine = (input2.ReadLine)

            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "." Then linePunctuation += 1
            Next

            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "!" Then linePunctuation += 1
            Next

            For i = 1 To Len(currentLine)
                If Mid$(currentLine, i, 1) = "?" Then linePunctuation += 1
            Next

            currentLineSplit = currentLine.Split(delimiters)
            sentenceLoop = 0

            While linePunctuation > 0

                Try

                    Dim trans As New Google.API.Translate.TranslateClient("")
                    sentence(sentenceArrayCount) = trans.Translate(currentLineSplit(sentenceLoop), Google.API.Translate.Language.English, Google.API.Translate.Language.German, Google.API.Translate.TranslateFormat.Text)
                    sentenceLoop += 1
                    linePunctuation -= 1
                    sentenceArrayCount += 1

                Catch ex As Exception

                    sentenceLoop += 1
                    linePunctuation -= 1

                End Try

            End While

            lineLoop += 1

        End While

        Dim newFile As String = "Translated Text.txt"
        Dim outputLoopCount As Integer = 0

        Using output As StreamWriter = New StreamWriter(newFile)

            ' Only write the slots that were actually filled with a translation.
            While outputLoopCount < sentenceArrayCount

                output.Write(sentence(outputLoopCount) + ". ")

                outputLoopCount += 1

            End While

        End Using

        input1.Close()
        input2.Close()

    End Sub

    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click

        translateText()

    End Sub

End Class

Edit: 4/26/2010

Please try it before downvoting; I wouldn't have posted it if it didn't work well.

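This is quite simple, and there are a few ways to do it:

  • Use the API and translate the data in chunks (that fit within the limits)
  • Write your own simple library that uses HttpWebRequest and POSTs some data to it

Here is an example (of the second approach):

Method: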

// Requires: using System; using System.IO; using System.Net; using System.Text;
private String TranslateTextEnglishSpanish(String textToTranslate)
{           
        HttpWebRequest http = WebRequest.Create("http://translate.google.com/") as HttpWebRequest;
        http.Method = "POST";
        http.ContentType = "application/x-www-form-urlencoded";
        http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)";
        http.Referer = "http://translate.google.com/";

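        // URL-encode the text so characters such as '&' or '=' do not corrupt the form body.
        byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(String.Format("js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&text={0}+&file=&sl=en&tl=es", Uri.EscapeDataString(textToTranslate)));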

        http.ContentLength = dataBytes.Length;

        using (Stream postStream = http.GetRequestStream())
        {
            postStream.Write(dataBytes, 0, dataBytes.Length);
        }

        HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
        if (httpResponse != null)
        {
            using (StreamReader reader = new StreamReader(httpResponse.GetResponseStream()))
            {
                //* Return translated Text
                return reader.ReadToEnd();
            }
        }

        return "";
}
Method call:

String translatedText = TranslateTextEnglishSpanish("hello world");

Result:

translatedText==“h