C#: How to read or copy text from a .docx/.odt/.doc file


c# .net


In my application I want to read a document file (.doc, .odt or .docx) and store its text in a string. For this I use the following code:

string text;     
using (var streamReader = new StreamReader(@"D:\Sample\Demo.docx", System.Text.Encoding.UTF8))
{
    text = streamReader.ReadToEnd();
}

But I can't read or copy the correct text; what I get looks like this:

PK��!��x%��E��[Content_Types].xml�(�� … (binary data continues)

How can I read or copy the text from a document file?

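The Microsoft DocX format is a container and does not hold simple plain-text data (which is what your StreamReader tries to read).
You should consider using a third-party library such as:
Note that you will need different libraries for the different formats.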
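The garbled output in the question starts with "PK", the signature of a ZIP archive: .docx and .odt files are ZIP packages of XML parts, while legacy .doc files use Microsoft's binary Compound File (OLE2) format. That is why StreamReader produces garbage and why each format needs its own reader. As a minimal sketch (the DocumentSniffer class and its "zip"/"ole2" labels are hypothetical, not part of any library), you can check a file's leading bytes before choosing a reader:

```csharp
using System;
using System.IO;

// Hypothetical helper: identifies the container format from the file's leading "magic" bytes.
public static class DocumentSniffer
{
    // ZIP local file header signature "PK\x03\x04" (used by .docx and .odt).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE2 / Compound File Binary signature (used by legacy .doc).
    private static readonly byte[] OleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns "zip", "ole2" or "unknown" for the given leading bytes.
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, ZipMagic)) return "zip";   // .docx / .odt
        if (StartsWith(header, OleMagic)) return "ole2";  // .doc
        return "unknown";
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    public static string DetectFile(string path)
    {
        var header = new byte[8];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            Array.Resize(ref header, read); // shorter files keep only the bytes actually read
        }
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, DocumentSniffer.DetectFile(@"D:\Sample\Demo.docx") should return "zip" for a genuine .docx file.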
Example of reading data from a Word document with Microsoft.Office.Interop.Word:
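using System;
using Microsoft.Office.Interop.Word;

class Program
{
    static void Main()
    {
        // Start Word (Microsoft Word must be installed on the machine).
        Application application = new Application();
        try
        {
            // Open a doc file.
            Document document = application.Documents.Open("C:\\word.doc");

            // Loop through all words in the document (the collection is 1-based).
            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                // Write the word.
                string text = document.Words[i].Text;
                Console.WriteLine("Word {0} = {1}", i, text);
            }

            // Close the document without saving it.
            document.Close(WdSaveOptions.wdDoNotSaveChanges);
        }
        finally
        {
            // Quit Word even if an exception occurred, so no WINWORD.EXE process is left running.
            application.Quit();
        }
    }
}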

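Microsoft.Office.Interop.Word is very slow for large documents, so I recommend using OpenXml instead. To use OpenXml:
1. Install the package from the Package Manager Console:
Install-Package DocumentFormat.OpenXml -Version 2.8.1
2. Use the OpenWordprocessingDocumentReadonly function: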
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace Readdocx
{
    class Program
    {
        static void Main(string[] args)
        {
            string mytext = OpenWordprocessingDocumentReadonly("mytext.docx");
        }

        public static string OpenWordprocessingDocumentReadonly(string filepath)
        {
            // Open a WordprocessingDocument read-only (isEditable: false) based on a filepath.
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Open(filepath, false))
            {
                // Assign a reference to the existing document body.
                Body body = wordDocument.MainDocumentPart.Document.Body;
                // Text of the .docx file. InnerText concatenates all text nodes,
                // so paragraphs are NOT separated by line breaks.
                return body.InnerText;
            }
        }
    }
}

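You have an MS Word document, not a text document. It is not stored as text (technically it is a ZIP file), so you need to use an API to access the document.

Quick 'n dirty version: open it as a ZIP, extract word/document.xml, throw away all the markup, and voilà.

Possible duplicate of a question about reading OpenOffice documents.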
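The quick 'n dirty approach can be sketched with built-in .NET types only, without Word or the OpenXml SDK. This is a minimal sketch under stated assumptions, not a full parser: the DocxText class is hypothetical, it assumes .NET Core / .NET 5+ (on .NET Framework, add a reference to System.IO.Compression), and it only collects the text of w:t elements, emitting one line per w:p paragraph, so tabs, manual line breaks, headers, footers and footnotes are ignored. An .odt file is also a ZIP archive, but its text lives in content.xml with different element names, so it needs a separate variant.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: "open as ZIP, extract word/document.xml, throw away the markup".
public static class DocxText
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extracts plain text from a seekable .docx stream, one line per paragraph.
    public static string Extract(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            XDocument xml;
            using (Stream xmlStream = entry.Open())
            {
                xml = XDocument.Load(xmlStream);
            }

            // Each <w:p> is a paragraph; its visible text is split across <w:t> runs.
            var paragraphs = xml.Descendants(W + "p")
                .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
            return string.Join(Environment.NewLine, paragraphs);
        }
    }

    // Convenience overload that opens the file from disk.
    public static string ExtractFile(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return Extract(stream);
        }
    }
}
```

Usage, with the path from the question: string text = DocxText.ExtractFile(@"D:\Sample\Demo.docx");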