Vb.net 列出Word文档使用的字体(更快的方法)
我正在制定一个验证文档的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体 我有以下代码存根,可以工作:Vb.net 列出Word文档使用的字体(更快的方法),vb.net,ms-word,office-interop,Vb.net,Ms Word,Office Interop,我正在制定一个验证文档的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体 我有以下代码存根,可以工作: Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass() Dim wordDocument As Word.Document = Nothing Dim fontList As New List(Of String)() Try
Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
Dim wordDocument As Word.Document = Nothing
Dim fontList As New List(Of String)()
Try
wordDocument = wordApplication.Documents.Open(FileName:="document Path")
'I've also tried using a for loop with an integer counter, no change in speed'
For Each c As Word.Range In wordDocument.Characters
If Not fontList.Contains(c.Font.Name) Then
fontList.Add(c.Font.Name)
End If
Next
但是这是难以置信的慢!难以置信的慢=2500个字符/分钟(我用秒表计时)。我的大部分文件大约6000字/30000个字符(约25页)。但是有些文件是100页的
有没有更快的方法?我必须支持Office2003格式的文件,所以OpenXMLSDK不是一个选项
--更新--
我尝试将其作为Word宏运行(使用找到的代码@),它运行得更快(不到一分钟)。不幸的是,就我的目的而言,我认为宏不起作用
--更新#2--
我采纳了Chris的建议,动态地将文档转换为OpenXML格式。然后,我使用以下代码查找所有RunFonts对象并读取字体名称:
Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()
fontList.AddRange(runFonts)
End Using
您可能必须支持Office2003,但这并不意味着您必须以那种格式解析它。以Office 2003文档为例,暂时将其转换为DOCX文件,将其作为ZIP文件打开,解析
/word/fontTable.xml
文件,然后删除DOCX。我认为这是错误的做法。我们寻找的是字体包含的事实,而不是该字体的位置。这是一个存在问题,而不是位置问题
迭代字体要快得多。唯一的诀窍是,这个词有时会对空格之类的东西大惊小怪。这对我很管用
Sub FindAllFonts()
Dim lWhichFont As Long, sTempName As String, sBuffer As String
For lWhichFont = 1 To FontNames.Count
sTempName = FontNames(lWhichFont)
If FindThisFont(sTempName) Then
sBuffer = sBuffer & "Found " & sTempName & vbCrLf
Else
If FindThisFont(Replace(sTempName, " ", "")) Then
sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
End If
End If
Next
Documents.Add
Selection.TypeText Text:=sBuffer
End Sub
Function FindThisFont(sName As String) As Boolean
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
With Selection.Find
.Font.Name = sName
.Forward = True
.Format = True
.Execute
If .Found() Then
FindThisFont = True
Else
FindThisFont = False
End If
End With
End Function
它工作得非常快(唯一慢的部分是字体迭代)
(很明显,它在你的系统上找不到字体,但如果你正在准备传输你写的东西,而某个助手程序已经把Helvetica或Minchin女士放进去了,你可以找到它)
好的,人们告诉我这不是每个人都想要的,人们想要找到他们机器上没有的字体。但另一种方法仍然太慢,需要寻找很多不在那里的东西。因此,这里有一个替代方案,它保存为rtf,并处理rtf头
Sub FIndAllFonts2()
Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
Dim objPic As InlineShape, objShp As Shape
' rememer old name for reloading
sOldName = ActiveDocument.Path
'delete image to make file out small
For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
sTempFile = ActiveDocument.Path
ActiveDocument.Close
lPos2 = 1
' we only want the header, but we don't know how long the file is
'I am using a Mac, so filesystemobject not available
' if you end up having a huge header, make 2500 bigger
lStopAt = 2500
Open sTempFile For Input As #1
Do While Not EOF(1) And lPos2 < lStopAt
sBit = Input(1, #1)
sBuffer = sBuffer & sBit
lPos2 = lPos2 + 1
Loop
Close #1
'delete temp file
Kill sTempFile
' loop through header, fonts identified in the table as {\f1\
' if you have more than 100 fonts, make this bigger
' not all numbers are used
lStopAt = 100
For lCounter = 1 To lStopAt
lPos = InStr(sBuffer, "{\f" & lCounter & "\")
If lPos > 0 Then
sBuffer = Mid(sBuffer, lPos)
lPos = InStr(sBuffer, ";")
sBuffer2 = Left(sBuffer, lPos - 1)
'this is where you would look for the alternate name if you want it
lPos = InStr(sBuffer2, "{\*\falt")
If lPos > 0 Then
sBuffer2 = Left(sBuffer2, lPos - 1)
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
Else
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
End If
sOut = sOut & sBuffer2 & vbCrLf
End If
Next
'reopen old file
Documents.Open sOldName
Set newdoc = Documents.Add
sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
Selection.TypeText Text:=sOut
End Sub
子FIndAllFonts2()
Dim sOldName As String、sOldPath As String、sBuffer As String、sBuffer2 As String、sOut As String、sBit
Dim L计数器为长,LPO为长,lPos2为长,lStopAt为长
Dim objPic作为InlineShape,objShp作为Shape
'重新加载时请记住旧名称
sOldName=ActiveDocument.Path
'删除图像使文件变小
对于ActiveDocument.InlineShapes中的每个objPic:objPic.Delete:Next
对于ActiveDocument中的每个objShp。形状:objShp。删除:下一步
ActiveDocument.SaveAs“tempout.rtf”,wdFormatRTF
sTempFile=ActiveDocument.Path
ActiveDocument。关闭
lPos2=1
'我们只需要标题,但不知道文件的长度
'我正在使用Mac,因此filesystemobject不可用
如果你的头球很大,那就再大2500个
lStopAt=2500
打开sTempFile作为#1输入
当EOF(1)和lPos20,则
sBuffer=Mid(sBuffer,LPO)
lPos=仪表(sBuffer,“;”)
sBuffer2=左(sBuffer,lPos-1)
'如果需要,您可以在此处查找备用名称
lPos=InStr(sBuffer2,“{\*\falt”)
如果lPos>0,则
sBuffer2=左(sBuffer2,lPos-1)
sBuffer2=Mid(sBuffer2,InStrRev(sBuffer2,“”)+1)和“*”表示它是较短的ascii名称
其他的
sBuffer2=Mid(sBuffer2,InStrRev(sBuffer2,“}”)+1)
如果结束
sOut=sOut&sBuffer2&vbCrLf
如果结束
下一个
'重新打开旧文件
文档。打开sOldName
Set newdoc=Documents.Add
sOut=“文档中使用的字体”&sOldName&vbCrLf&sOut
Selection.TypeText文本:=sOut
端接头
这篇文章在MacBook Pro上用不到20秒的时间就完成了我350页的论文草稿。因此它足够快,非常有用。如果您想在文档中使用所有字体,只需使用OPEN XML通过一行即可获得所有字体:
using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
{
var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
}
使用(WordprocessingDocument=WordprocessingDocument.Open(filePath,true))
{
var fontlst=doc.MainDocumentPart.FontTablePart.Fonts.Elements();
}
每个字体元素都有其“Name”属性,该属性在文本运行的属性中的元素中引用
提示:你必须考虑每个单词DOC。<强>>不/强>有2个以上的字体表部分,一个在主部分,另一个在词汇表部分,所以如果需要的话也不要忘记考虑词汇表。 您可以从下载OPEN XML SDK,请尝试以下操作:
Sub Word_Get_Document_Fonts()
Dim report As String
Dim J As Integer
Dim font_name As String
report = "Fonts in use in this document:" & vbCr & vbCr
For J = 1 To FontNames.Count
font_name = FontNames(J)
Set myrange = ActiveDocument.Range
myrange.Find.ClearFormatting
myrange.Find.Font.Name = font_name
With myrange.Find
.text = "^?"
.Replacement.text = ""
.Forward = True
.Wrap = wdFindStop
.Format = True
End With
myrange.Find.Execute
If myrange.Find.Found Then
report = report & font_name & vbCr
End If
Next J
MsgBox (report)
End Sub
通过迭代段落可以大大加快速度。只有段落包含混合字体时,才需要逐个字符进行检查。名称、粗体、斜体等属性有一个特殊的“不确定”值,即名称为空字符串,样式属性为9999999 因此,例如,如果Bold=999999,则表示段落包含一些粗体字符和一些非粗体字符 我将包含以下片段以展示总体思路:
For Each P as Paragraph in doc.Paragraphs
Dim R as Range = P.Range
If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
' This paragraph uses mixed fonts, so we need to analyse character by character
AnalyseCharacterByCharacter(R)
Else
' R.Font is used throughout this paragraph
FontHasBeenUsed(R.Font)
End If
Next
我发现的另一种不用编码的方法是:
- 将文档导出为PDF
- 在AdobeReader中打开它
- 在adobe reader中,转到:文件菜单\属性,然后是字体选项卡,其中列出了这些字体的族字体和子字体
using System; using System.Collections.Generic; using System.IO; using System.Linq; using System.Xml.Linq; using Word = NetOffice.WordApi; namespace _5261108 { class Program { static void Main(string[] args) { using (var app = new Word.Application()) { var doc = app.Documents.Open(Path.GetFullPath("test.docx")); foreach (var font in GetFontNames(doc)) { Console.WriteLine(font); } app.Quit(false); } Console.ReadLine(); } private static IEnumerable<string> GetFontNames(Word.Document document) { var xml = document.Content.XML; var doc = XDocument.Parse(xml); var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font"); var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value); return fontNames.Distinct(); } } }
Imports System.Collections.Generic Imports System.IO Imports System.Linq Imports System.Xml.Linq Imports Word = NetOffice.WordApi Namespace _5261108 Class Program Private Shared Sub Main(args As String()) Using app = New Word.Application() Dim doc = app.Documents.Open(Path.GetFullPath("test.docx")) For Each font As var In GetFontNames(doc) Console.WriteLine(font) Next app.Quit(False) End Using Console.ReadLine() End Sub Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String) Dim xml = document.Content.XML Dim doc = XDocument.Parse(xml) Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font") Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value) Return fontNames.Distinct() End Function End Class End Namespace '======================================================= 'Service provided by Telerik (www.telerik.com) 'Conversion powered by NRefactory. 'Twitter: @telerik 'Facebook: facebook.com/telerik '=======================================================