C#: Aspose.Words reads only a small part of my Word document


I am using the following code to try to get the content of my Word file with Aspose.Words for .NET:

Document doc = new Document(@"D:\a.docx");

// Create an object that inherits from the DocumentVisitor class.
MyDocToTxtWriter myConverter = new MyDocToTxtWriter();

// Walk the whole document tree with the visitor.
doc.Accept(myConverter);

System.IO.File.WriteAllText(@"c:/a.txt", myConverter.GetText());
Console.ReadLine();

The class used in the code above is defined as follows:

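using System.Text;          // StringBuilder
using Aspose.Words;         // DocumentVisitor, Run, Paragraph, Body, HeaderFooter
using Aspose.Words.Fields;  // FieldStart, FieldSeparator, FieldEnd

public class MyDocToTxtWriter : DocumentVisitor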
  {
      public MyDocToTxtWriter()
      {
          mIsSkipText = false;
          mBuilder = new StringBuilder();
      }

      /// <summary>
      /// Gets the plain text of the document that was accumulated by the visitor.
      /// </summary>
      public string GetText()
      {
          return mBuilder.ToString();
      }

      /// <summary>
      /// Called when a Run node is encountered in the document.
      /// </summary>
      public override VisitorAction VisitRun(Run run)
      {
          AppendText(run.Text);

          // Let the visitor continue visiting other nodes.
          return VisitorAction.Continue;
      }

      /// <summary>
      /// Called when a FieldStart node is encountered in the document.
      /// </summary>
      public override VisitorAction VisitFieldStart(FieldStart fieldStart)
      {
          // In Microsoft Word, a field code (such as "MERGEFIELD FieldName") follows
          // after a field start character. We want to skip field codes and output field 
          // result only, therefore we use a flag to suspend the output while inside a field code.
          //
          // Note this is a very simplistic implementation and will not work very well
          // if you have nested fields in a document. 
          mIsSkipText = true;

          return VisitorAction.Continue;
      }

      /// <summary>
      /// Called when a FieldSeparator node is encountered in the document.
      /// </summary>
      public override VisitorAction VisitFieldSeparator(FieldSeparator fieldSeparator)
      {
          // Once reached a field separator node, we enable the output because we are
          // now entering the field result nodes.
          mIsSkipText = false;

          return VisitorAction.Continue;
      }

      /// <summary>
      /// Called when a FieldEnd node is encountered in the document.
      /// </summary>
      public override VisitorAction VisitFieldEnd(FieldEnd fieldEnd)
      {
          // Make sure we enable the output when reached a field end because some fields
          // do not have field separator and do not have field result.
          mIsSkipText = false;

          return VisitorAction.Continue;
      }

      /// <summary>
      /// Called when visiting of a Paragraph node is ended in the document.
      /// </summary>
      public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
      {
          // When outputting to plain text we output Cr+Lf characters.
          AppendText(ControlChar.CrLf);

          return VisitorAction.Continue;
      }

      public override VisitorAction VisitBodyStart(Body body)
      {
          // We can detect beginning and end of all composite nodes such as Section, Body, 
          // Table, Paragraph etc and provide custom handling for them.
          mBuilder.Append("*** Body Started ***\r\n");

          return VisitorAction.Continue;
      }

      public override VisitorAction VisitBodyEnd(Body body)
      {
          mBuilder.Append("*** Body Ended ***\r\n");
          return VisitorAction.Continue;
      }

      /// <summary>
      /// Called when a HeaderFooter node is encountered in the document.
      /// </summary>
      public override VisitorAction VisitHeaderFooterStart(HeaderFooter headerFooter)
      {
          // Returning this value from a visitor method causes visiting of this
          // node to stop and move on to visiting the next sibling node.
          // The net effect in this example is that the text of headers and footers
          // is not included in the resulting output.
          return VisitorAction.SkipThisNode;
      }


      /// <summary>
      /// Adds text to the current output. Honours the enabled/disabled output flag.
      /// </summary>
      private void AppendText(string text)
      {
          if (!mIsSkipText)
              mBuilder.Append(text);
      }

      private readonly StringBuilder mBuilder;
      private bool mIsSkipText;
  }

When I run this code, only a small part of the content is extracted, not all of it.
Why?

Try iterating over each paragraph like this:

// Needs "using System.Linq;" and "using System.Collections.Generic;" for ToList() and ForEach().
Document doc = new Document(@"D:\a.docx");
var mBuilder = new StringBuilder();

// Collect every paragraph in the document, including nested ones.
var paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray().ToList();
paragraphs.ForEach(
    x =>
        {
            // Append the text of each run in the paragraph, then end the line.
            ((Paragraph)x).Runs.ToArray().ToList().ForEach(y => mBuilder.Append(y.Text));
            mBuilder.Append(Environment.NewLine);
        }
);
System.IO.File.WriteAllText(@"c:/a.txt", mBuilder.ToString());
Console.ReadLine();

Using the Aspose.Words API, you can easily convert a Word document to TXT format with the following simple code:

Document doc = new Document(MyDir + @"a.docx");
TxtSaveOptions opts = new TxtSaveOptions();
doc.Save(MyDir + @"a.txt", opts);

Another way to get a text representation of the entire Word document is as follows:

Document doc = new Document(MyDir + @"a.docx");
System.IO.File.WriteAllText(MyDir + @"a.txt", doc.ToString(SaveFormat.Text));

I work with Aspose as a Developer Evangelist.

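Where should I put this code, and can you give me more information about .ToList()? I get this error: Error 1 'Aspose.Words.NodeCollection' does not contain a definition for 'tolist' and no extension method 'tolist' accepting a first argument of type 'Aspose.Words.NodeCollection' could be found (are you missing a using directive or an assembly reference?) C:\Users\ehsan\documents\visual studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\Program.cs 51 28 ConsoleApplication1

Add System.Collections.Generic and System.Linq to your usings.

They were already in my usings.

This happens because you are using the trial version of Aspose. Use a valid license file, or ask Aspose to give you a 30-day full license here =>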
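
Following up on the trial-version comment: a minimal sketch of applying a license before the document is loaded, so that the library is not running in evaluation mode when the text is extracted. The license path D:\Aspose.Words.lic and the output path are hypothetical and not taken from the original post; adjust them to your setup.

```csharp
using System.IO;
using Aspose.Words;

class Program
{
    static void Main()
    {
        // Assumption: a valid license file exists at this hypothetical path.
        License license = new License();
        license.SetLicense(@"D:\Aspose.Words.lic");

        // Load the document only after the license has been applied.
        Document doc = new Document(@"D:\a.docx");

        // Export the whole document as plain text (hypothetical output path).
        File.WriteAllText(@"D:\a.txt", doc.ToString(SaveFormat.Text));
    }
}
```

If the output is still cut short with a license applied, the cause is probably something other than the trial limitation.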