What is the best way to translate a large amount of text data in .NET?
I have a lot of text data that I want to translate into different languages. The options I know of are listed below, but all of these services have limits on text length, number of calls, and so on, which makes them inconvenient to use:
- Google Translate API
- Bing Translate API
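What services or approaches would you suggest in this case?

I had to solve the same problem when integrating language translation with an XMPP chat server. I split the payload (the text I needed to translate) into smaller subsets of complete sentences. I can't remember the exact number, but with Google's REST-based translation URL I translated sets of complete sentences totaling at most 1024 characters, so one large paragraph resulted in multiple translation service calls.

Break the large text into tokenized strings, then pass each token to the translator in a loop. Store the translated output in an array, and once all tokens have been translated and stored, put them back together and you will have a fully translated document.

EDIT: 4/25/2010

To prove the point, I threw this together :) It is rough around the edges, but it handles a large amount of text, and its translation accuracy is as good as Google's because it uses the Google API. I processed Apple's entire 2005 SEC 10-K filing with this code at the click of a button (it took about 45 minutes). The result is basically the same as copying and pasting one sentence at a time into Google Translate. It isn't perfect (the ending punctuation is not accurate, and I didn't write to the text file line by line), but it does prove the concept. With a few more regular expressions it could probably get the punctuation right.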
```vb
Imports System.Collections.Generic
Imports System.IO

Public Class Form1
    Private Const InputFile As String = "Translate Me.txt"
    Private Const OutputFile As String = "Translated Text.txt"

    ' Sentence-ending punctuation used to split each line into sentences.
    Private Shared ReadOnly Delimiters As Char() = {"."c, "!"c, "?"c}

    Private Sub translateText()
        If Not File.Exists(InputFile) Then
            MsgBox(InputFile & " cannot be found anywhere!", MsgBoxStyle.OkOnly, "Oops!")
            Return
        End If

        Dim trans As New Google.API.Translate.TranslateClient("")
        Dim sentences As New List(Of String)

        For Each currentLine As String In File.ReadAllLines(InputFile)
            ' One translation call per sentence keeps every request short.
            For Each part As String In currentLine.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries)
                If part.Trim().Length = 0 Then Continue For
                Try
                    sentences.Add(trans.Translate(part.Trim(), _
                                                  Google.API.Translate.Language.English, _
                                                  Google.API.Translate.Language.German, _
                                                  Google.API.Translate.TranslateFormat.Text))
                Catch ex As Exception
                    ' A failed call skips that sentence and moves on to the next one.
                End Try
            Next
        Next

        ' Split() discards the original punctuation, so every sentence is written with a period.
        Using output As New StreamWriter(OutputFile)
            For Each sentence As String In sentences
                output.Write(sentence & ". ")
            Next
        End Using
    End Sub

    Private Sub translateButton_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles translateButton.Click
        translateText()
    End Sub
End Class
```
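The inaccurate ending punctuation mentioned above comes from splitting on the punctuation characters themselves, which throws them away. A minimal sketch of the regular-expression idea, in C#, is to split on the whitespace that follows a sentence-ending character instead, so each sentence keeps its own `.`, `!` or `?`. The names `SentenceSplitter` and `Split` are hypothetical and not part of the answer's code, and the pattern is an assumption that will mis-split abbreviations such as "e.g. this".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class SentenceSplitter
{
    // Split on whitespace that is preceded by '.', '!' or '?'.
    // The lookbehind does not consume the punctuation, so it stays
    // attached to the sentence it ends.
    private static readonly Regex Boundary = new Regex(@"(?<=[.!?])\s+");

    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (string part in Boundary.Split(text))
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0)
            {
                sentences.Add(trimmed);
            }
        }
        return sentences;
    }
}
```

With sentences split this way, the translated pieces could be joined with a single space rather than a hard-coded ". ", assuming the translation service returns each sentence with its punctuation intact.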
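EDIT: 4/26/2010

Please try it before downvoting; I wouldn't have posted it if it didn't work well.

This is simple, and there are a few ways to do it:
- Use the API and translate the data in chunks (which fit within the limits)
- Write your own simple library that uses HttpWebRequest and POSTs some data to it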
```csharp
// Requires: using System; using System.IO; using System.Net; using System.Text;
private String TranslateTextEnglishSpanish(String textToTranslate)
{
    HttpWebRequest http = (HttpWebRequest)WebRequest.Create("http://translate.google.com/");
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    http.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)";
    http.Referer = "http://translate.google.com/";

    // URL-encode the text so characters such as '&' or '=' cannot break the form body.
    byte[] dataBytes = Encoding.UTF8.GetBytes(String.Format(
        "js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&text={0}+&file=&sl=en&tl=es",
        Uri.EscapeDataString(textToTranslate)));
    http.ContentLength = dataBytes.Length;

    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(dataBytes, 0, dataBytes.Length);
    }

    // Dispose the response and reader once the body has been read.
    using (HttpWebResponse httpResponse = (HttpWebResponse)http.GetResponse())
    using (StreamReader reader = new StreamReader(httpResponse.GetResponseStream()))
    {
        // Return the response body
        return reader.ReadToEnd();
    }
}
```
Method call:

```csharp
String translatedText = TranslateTextEnglishSpanish("hello world");
```
Result:
```csharp
translatedText == "h
```
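The chunking idea that recurs in the answers (sets of complete sentences totaling at most 1024 characters, or translating the data in chunks that fit the limits) could be sketched as below. `SentenceBatcher`, `Batch` and `maxChars` are hypothetical names introduced for illustration, and 1024 is only the figure recalled in the first answer, not a documented limit of any service. The sketch assumes the text has already been split into complete sentences.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any translation API.
public static class SentenceBatcher
{
    // Groups sentences, in order, into batches no longer than maxChars.
    // A single sentence longer than maxChars is emitted as its own
    // oversized batch rather than being cut in the middle.
    public static List<string> Batch(IEnumerable<string> sentences, int maxChars)
    {
        var batches = new List<string>();
        var current = new StringBuilder();

        foreach (string sentence in sentences)
        {
            // The +1 accounts for the space that joins two sentences.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                batches.Add(current.ToString());
                current.Clear();
            }

            if (current.Length > 0)
            {
                current.Append(' ');
            }
            current.Append(sentence);
        }

        if (current.Length > 0)
        {
            batches.Add(current.ToString());
        }
        return batches;
    }
}
```

Each returned batch would then go to the translation service as one call, for example `foreach (string batch in SentenceBatcher.Batch(sentences, 1024)) { ... }`, which keeps the number of calls far lower than one call per sentence. Keeping whole sentences together is a deliberate choice here, since cutting a sentence in the middle tends to hurt translation quality.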