Vb.net 列出Word文档使用的字体(更快的方法)

Vb.net 列出Word文档使用的字体(更快的方法),vb.net,ms-word,office-interop,Vb.net,Ms Word,Office Interop,我正在制定一个验证文档的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体 我有以下代码存根,可以工作: Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass() Dim wordDocument As Word.Document = Nothing Dim fontList As New List(Of String)() Try

我正在制定一个验证文档的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体

我有以下代码存根,可以工作:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next
但是这是难以置信的慢!难以置信的慢=2500个字符/分钟(我用秒表计时)。我的大部分文件大约6000字/30000个字符(约25页)。但是有些文件是100页的

有没有更快的方法?我必须支持Office2003格式的文件,所以OpenXMLSDK不是一个选项

--更新--

我尝试将其作为Word宏运行(使用找到的代码@),它运行得更快(不到一分钟)。不幸的是,就我的目的而言,我认为宏不起作用

--更新#2--

我采纳了Chris的建议,动态地将文档转换为OpenXML格式。然后,我使用以下代码查找所有RunFonts对象并读取字体名称:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using

您可能必须支持Office2003,但这并不意味着您必须以那种格式解析它。以Office 2003文档为例,暂时将其转换为DOCX文件,将其作为ZIP文件打开,解析
/word/fontTable.xml
文件,然后删除DOCX。

我认为这是错误的做法。我们寻找的是字体包含的事实,而不是该字体的位置。这是一个存在问题,而不是位置问题

迭代字体要快得多。唯一的诀窍是,这个词有时会对空格之类的东西大惊小怪。这对我很管用

Sub FindAllFonts()
    Dim lWhichFont As Long, sTempName As String, sBuffer As String
    For lWhichFont = 1 To FontNames.Count
       sTempName = FontNames(lWhichFont)
       If FindThisFont(sTempName) Then
          sBuffer = sBuffer & "Found " & sTempName & vbCrLf
        Else
           If FindThisFont(Replace(sTempName, " ", "")) Then
              sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
           End If
        End If
   Next
   Documents.Add
   Selection.TypeText Text:=sBuffer
End Sub

Function FindThisFont(sName As String) As Boolean
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       .Font.Name = sName
       .Forward = True
       .Format = True
       .Execute
       If .Found() Then
          FindThisFont = True
       Else
          FindThisFont = False
       End If
   End With
End Function
它工作得非常快(唯一慢的部分是字体迭代)

(很明显,它在你的系统上找不到字体,但如果你正在准备传输你写的东西,而某个助手程序已经把Helvetica或Minchin女士放进去了,你可以找到它)


好的,人们告诉我这不是每个人都想要的,人们想要找到他们机器上没有的字体。但另一种方法仍然太慢,需要寻找很多不在那里的东西。因此,这里有一个替代方案,它保存为rtf,并处理rtf头

Sub FIndAllFonts2()
    Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
    Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
    Dim objPic As InlineShape, objShp As Shape
    ' rememer old name for reloading
    sOldName = ActiveDocument.Path
    'delete image to make file out small
    For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
    For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
    ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
    sTempFile = ActiveDocument.Path
    ActiveDocument.Close
    lPos2 = 1
    ' we only want the header, but we don't know how long the file is
    'I am using a Mac, so filesystemobject not available
    ' if you end up having a huge header, make 2500 bigger
    lStopAt = 2500
    Open sTempFile For Input As #1
    Do While Not EOF(1) And lPos2 < lStopAt
        sBit = Input(1, #1)
        sBuffer = sBuffer & sBit
        lPos2 = lPos2 + 1
    Loop
    Close #1
    'delete temp file
    Kill sTempFile
    ' loop through header, fonts identified in the table as {\f1\
    ' if you have more than 100 fonts, make this bigger
    ' not all numbers are used
    lStopAt = 100
    For lCounter = 1 To lStopAt
        lPos = InStr(sBuffer, "{\f" & lCounter & "\")
        If lPos > 0 Then
            sBuffer = Mid(sBuffer, lPos)
            lPos = InStr(sBuffer, ";")
            sBuffer2 = Left(sBuffer, lPos - 1)
            'this is where you would look for the alternate name if you want it
            lPos = InStr(sBuffer2, "{\*\falt")
            If lPos > 0 Then
                sBuffer2 = Left(sBuffer2, lPos - 1)
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
            Else
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
            End If
            sOut = sOut & sBuffer2 & vbCrLf
        End If
    Next
    'reopen old file
    Documents.Open sOldName
    Set newdoc = Documents.Add
    sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
    Selection.TypeText Text:=sOut
End Sub
子FIndAllFonts2()
Dim sOldName As String、sOldPath As String、sBuffer As String、sBuffer2 As String、sOut As String、sBit
Dim L计数器为长,LPO为长,lPos2为长,lStopAt为长
Dim objPic作为InlineShape,objShp作为Shape
'重新加载时请记住旧名称
sOldName=ActiveDocument.Path
'删除图像使文件变小
对于ActiveDocument.InlineShapes中的每个objPic:objPic.Delete:Next
对于ActiveDocument中的每个objShp。形状:objShp。删除:下一步
ActiveDocument.SaveAs“tempout.rtf”,wdFormatRTF
sTempFile=ActiveDocument.Path
ActiveDocument。关闭
lPos2=1
'我们只需要标题,但不知道文件的长度
'我正在使用Mac,因此filesystemobject不可用
如果你的头球很大,那就再大2500个
lStopAt=2500
打开sTempFile作为#1输入
当EOF(1)和lPos20,则
sBuffer=Mid(sBuffer,LPO)
lPos=仪表(sBuffer,“;”)
sBuffer2=左(sBuffer,lPos-1)
'如果需要,您可以在此处查找备用名称
lPos=InStr(sBuffer2,“{\*\falt”)
如果lPos>0,则
sBuffer2=左(sBuffer2,lPos-1)
sBuffer2=Mid(sBuffer2,InStrRev(sBuffer2,“”)+1)和“*”表示它是较短的ascii名称
其他的
sBuffer2=Mid(sBuffer2,InStrRev(sBuffer2,“}”)+1)
如果结束
sOut=sOut&sBuffer2&vbCrLf
如果结束
下一个
'重新打开旧文件
文档。打开sOldName
Set newdoc=Documents.Add
sOut=“文档中使用的字体”&sOldName&vbCrLf&sOut
Selection.TypeText文本:=sOut
端接头

这篇文章在MacBook Pro上用不到20秒的时间就完成了我350页的论文草稿。因此它足够快,非常有用。

如果您想在文档中使用所有字体,只需使用OPEN XML通过一行即可获得所有字体:

 using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
 {
     var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
 }
使用(WordprocessingDocument=WordprocessingDocument.Open(filePath,true))
{
var fontlst=doc.MainDocumentPart.FontTablePart.Fonts.Elements();
}
每个字体元素都有其“Name”属性,该属性在文本运行的属性中的元素中引用

提示:你必须考虑每个单词DOC。<强>>不/强>有2个以上的字体表部分,一个在主部分,另一个在词汇表部分,所以如果需要的话也不要忘记考虑词汇表。 您可以从下载OPEN XML SDK,请尝试以下操作:

Sub Word_Get_Document_Fonts()
  Dim report As String
  Dim J As Integer
  Dim font_name As String
  report = "Fonts in use in this document:" & vbCr & vbCr
  For J = 1 To FontNames.Count
    font_name = FontNames(J)
    Set myrange = ActiveDocument.Range
    myrange.Find.ClearFormatting
    myrange.Find.Font.Name = font_name
    With myrange.Find
      .text = "^?"
      .Replacement.text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
    End With
    myrange.Find.Execute
    If myrange.Find.Found Then
      report = report & font_name & vbCr
    End If
  Next J
  MsgBox (report)
End Sub

通过迭代段落可以大大加快速度。只有段落包含混合字体时,才需要逐个字符进行检查。名称、粗体、斜体等属性有一个特殊的“不确定”值,即名称为空字符串,样式属性为9999999

因此,例如,如果Bold=999999,则表示段落包含一些粗体字符和一些非粗体字符

我将包含以下片段以展示总体思路:

For Each P as Paragraph in doc.Paragraphs
    Dim R as Range = P.Range
    If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
        Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
        ' This paragraph uses mixed fonts, so we need to analyse character by character
        AnalyseCharacterByCharacter(R)
    Else
        ' R.Font is used throughout this paragraph
        FontHasBeenUsed(R.Font)
    End If
 Next

我发现的另一种不用编码的方法是:

  • 将文档导出为PDF
  • 在AdobeReader中打开它
  • 在adobe reader中,转到:文件菜单\属性,然后是字体选项卡,其中列出了这些字体的族字体和子字体
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Xml.Linq;
    using Word = NetOffice.WordApi;
    
    namespace _5261108
    {
        class Program
        {
            static void Main(string[] args)
            {
                using (var app = new Word.Application())
                {
                    var doc = app.Documents.Open(Path.GetFullPath("test.docx"));
    
                    foreach (var font in GetFontNames(doc))
                    {
                        Console.WriteLine(font);
                    }
    
                    app.Quit(false);
                }
    
                Console.ReadLine();
            }
    
            private static IEnumerable<string> GetFontNames(Word.Document document)
            {
                var xml = document.Content.XML;
                var doc = XDocument.Parse(xml);
                var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
                var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
                return fontNames.Distinct();
            }
        }
    }
    
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Linq
    Imports System.Xml.Linq
    Imports Word = NetOffice.WordApi
    
    Namespace _5261108
        Class Program
            Private Shared Sub Main(args As String())
                Using app = New Word.Application()
                    Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))
    
                    For Each font As var In GetFontNames(doc)
                        Console.WriteLine(font)
                    Next
    
                    app.Quit(False)
                End Using
    
                Console.ReadLine()
            End Sub
    
            Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
                Dim xml = document.Content.XML
                Dim doc = XDocument.Parse(xml)
                Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
                Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
                Return fontNames.Distinct()
            End Function
        End Class
    End Namespace
    
    '=======================================================
    'Service provided by Telerik (www.telerik.com)
    'Conversion powered by NRefactory.
    'Twitter: @telerik
    'Facebook: facebook.com/telerik
    '=======================================================