C#: Extract data from a Word document and insert it into a SQL database

Word document example

A 1. Name of House: Aasleagh Lodge
Townland: Srahatloe
Near: Killary Harbour, Leenane
Status/Public Access: maintained, private fishing lodge
Date Built: 1838-1850, burnt 1923, rebuilt 1928
Description: Large Victorian country house. Original house 6-bay, 2-storey, 3-bay section on right is higher; after fire house was reduced in size giving current three parallel- hipped roof bays. 
Associated Families: Lord Sligo; rented - Hon David Plunkett ; Capt W.E. and Constance Mary Phillips; James Leslie Wanklyn M.P. for Bradford; Walter H. Maudslay; Ernest Richard Hartley; Alice Marsh, Lord and Lady Brabourne; Western Fisheries Board; Inland Fisheries Ireland.
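
Is there a way to insert the data that follows a heading? For example, where the Word document says "Townland", I want to insert the data after that heading ("Srahatloe" in this case) into a column in the database. I want to extract all of this data from the Word document. It is the data for a website I am building. All the information is stored in the Word document, but I need to get the text into the database without copying and pasting, because the document is very large (over 70,000 words). Is there a script I can use for this?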

Source code

var wordApp = new Microsoft.Office.Interop.Word.Application();
var wordDoc = wordApp.Documents.Open(@"C:\Users\mhoban\Documents\Book.docx");
var txt = wordDoc.Content.Text;

// Release the document and the Word process once the text has been read
wordDoc.Close();
wordApp.Quit();

var regex = new Regex(@"(Townland\: )(.+?)[\r\n]");
var allMatches = regex.Matches(txt);

// Open the connection once and reuse one parameterized command for every match
using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["ConnectionString"].ToString()))
using (var com = new SqlCommand("INSERT INTO Houses (Townland) VALUES (@town)", con))
{
    var townParam = com.Parameters.Add("@town", SqlDbType.NVarChar);
    con.Open();

    foreach (Match match in allMatches)
    {
        // Groups[2] holds the text after "Townland: "
        townParam.Value = match.Groups[2].Value;

        // Insert the value into the database
        com.ExecuteNonQuery();
    }
}

This is screaming for RegEx. Something like this should get you going:

var wordApp = new Microsoft.Office.Interop.Word.Application();
var wordDoc = wordApp.Documents.Open(pathToYourDocument);
var txt = wordDoc.Content.Text;
var regex = new Regex(@"(Townland\: )(.+?)[\r\n]");
var allMatches = regex.Matches(txt);
foreach (Match match in allMatches)
{
    var townValue = match.Groups[2].Value;
    //townValue now holds "Srahatloe"
    //do your magic
}
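
As a rough illustration of how the groups in that pattern are numbered, here is a small self-contained sketch that runs without Word. The sample string, the TownlandDemo class and the ExtractTownlands helper are made up for this example. The sample imitates wordDoc.Content.Text, where paragraphs are separated by '\r'. The helper uses a named group as one possible alternative to counting parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TownlandDemo
{
    // Hypothetical helper: returns the text following "Townland: "
    // for every occurrence in the input.
    public static string[] ExtractTownlands(string text)
    {
        // A named group avoids having to count parentheses.
        var regex = new Regex(@"Townland: (?<value>[^\r\n]+)");
        MatchCollection matches = regex.Matches(text);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Groups["value"].Value.Trim();
        }
        return values;
    }

    public static void Main()
    {
        // Made-up sample imitating wordDoc.Content.Text ('\r' ends each paragraph)
        string sample = "A 1. Name of House: Aasleagh Lodge\rTownland: Srahatloe\rNear: Killary Harbour, Leenane\r";

        Match m = Regex.Match(sample, @"(Townland\: )(.+?)[\r\n]");
        Console.WriteLine("[0] " + m.Groups[0].Value.Trim()); // the whole match
        Console.WriteLine("[1] " + m.Groups[1].Value.Trim()); // the label "Townland:"
        Console.WriteLine("[2] " + m.Groups[2].Value);        // the value after the label

        foreach (string town in ExtractTownlands(sample))
        {
            Console.WriteLine(town);
        }
    }
}
```

Groups[0] is always the entire match, so the first pair of parentheses is Groups[1] and the value ends up in Groups[2].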

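Below is the code I used to extract specific text from a Word document.

I ended up using regular expressions, which were faster, but I no longer have that code. In any case, here is how to extract text from Word and write it to a CSV file.

Please note: you need the Office PIAs (Primary Interop Assemblies) installed on your development PC for Office automation.

To add a reference to Microsoft.Office.Interop.Word, go to Visual Studio --> right-click References --> COM --> Microsoft.Word 14.0 (sorry, I don't have access to my work PC, so I can't attach a screenshot).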


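Try this.

Have you tried using regular expressions?

@Pradnya bolli, as far as I understand, he doesn't want to save the whole document, only some of the text in it. If I'm right, you should parse the Word document somehow, for example with the Open XML SDK.

That is not a sample of the Word document, just two lines of text. Is there other text around it? Is it in a table or in plain paragraphs? Is it always formatted this way, i.e. "Name: ... Age: ..."? Soft return or hard return?

@LocEngineer There is a line break in between. I have updated the question with the exact Word document format; all the information is laid out this way.

How do I access the Word document? Do I need to import it into Visual Studio or set its path?

Set a reference to Microsoft.Word.Interop (Microsoft.Office.Interop.Word). Then create and set the variables like the ones I have added to the code above.

Yes, as I said, you need to set a reference to Word.Interop; the full COM assembly name shown in the list is "Microsoft Word 14.0 Object Library".

var townValue = match.Groups[1].Value; When I put a breakpoint on this line, it tells me townValue holds the value "Townland:" rather than "Srahatloe", so "Townland:" is inserted into the database instead of the value after the heading.

Argh! Sorry. Forgot it is 1-based. match.Groups[2].Value then.
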
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.Excel;
using System.IO;

namespace ConsoleApplication2
{
class Program
{
    static void Main(string[] args)
    {
        string month = "July2014";
        string delimiter = ",";
        string[] files = Directory.GetFiles("C:\\temp\\"+ month);
        string[][] csvoutput = new string[][] { };
        csvoutput = new string[][] { new string[]{"School Name","Student Name","Id","ReportDate"}};
        StringBuilder sb = new StringBuilder();
        sb.AppendLine(string.Join(delimiter, csvoutput[0]));
        File.AppendAllText("C:\\Temp\\"+month+".csv", sb.ToString());

        foreach (var file in files)
        {
            var id = string.Empty;
            var studentName = string.Empty;
            var school = string.Empty;
            var reportDate = string.Empty;

            if (file.ToLower().EndsWith(".doc"))
            {
                var word = new Microsoft.Office.Interop.Word.Application();
                var sourceFile = new FileInfo(file);
                var doc = word.Documents.Open(sourceFile.FullName);
                Console.WriteLine("Processing :-{ " + file.ToLower());

                for (int i = 0; i < doc.Paragraphs.Count; i++)
                {

                    try
                    {
                        if (doc.Paragraphs[i + 1].Range.Text.StartsWith("School:"))
                        {
                            school = doc.Paragraphs[i + 1].Range.Text.ToString().Replace("\r\a","").Replace("School: ","").Trim();

                        }
                        if (doc.Paragraphs[i + 1].Range.Text.StartsWith("Student Names:"))
                        {
                            studentName = doc.Paragraphs[i + 1].Range.Text.ToString().Replace("\r\a", "").Replace("Student Names:","").Trim();

                        }
                        if (doc.Paragraphs[i + 1].Range.Text.StartsWith("xx Id:"))
                        {
                            id = doc.Paragraphs[i + 1].Range.Text.ToString().Replace("\r\a", "").Replace("xx Id:", "").Trim();

                        }

                        if (doc.Paragraphs[i + 1].Range.Text.StartsWith("Date of Report:"))
                        {
                            reportDate = doc.Paragraphs[i + 1].Range.Text.ToString().Replace("\r\a", "").Replace("Date of Report:","").Trim();

                        }
                    }
                    catch (Exception)
                    {
                        Console.WriteLine("Error occurred" + file.ToLower());
                    }
                }
                // Append only this file's row. Re-appending the whole StringBuilder
                // would write the header and every earlier row again.
                var row = new string[] { school, studentName, id, reportDate };
                File.AppendAllText("C:\\Temp\\" + month + ".csv", string.Join(delimiter, row) + Environment.NewLine);
                word.ActiveDocument.Close();
                word.Quit();
            }
        }
        Console.WriteLine("Finished");
        Console.ReadLine();
    }
}
}
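
The same label-prefix idea could be applied to the house records from the question. The following is only a sketch under a few assumptions: the label list is copied from the single sample record, every record is assumed to begin with a "Name of House:" line, and the text is assumed to have been read from Word already (for example via Content.Text). The HouseRecordParser class and its Parse method are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class HouseRecordParser
{
    // Labels copied from the sample record in the question (an assumption:
    // the real document may contain others). The first one starts a new record.
    private static readonly string[] Labels =
    {
        "Name of House:", "Townland:", "Near:", "Status/Public Access:",
        "Date Built:", "Description:", "Associated Families:"
    };

    // Splits the text into paragraphs and collects "label -> value" pairs per record.
    public static List<Dictionary<string, string>> Parse(string text)
    {
        var records = new List<Dictionary<string, string>>();
        Dictionary<string, string> current = null;

        var lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string rawLine in lines)
        {
            string line = rawLine.Trim();
            foreach (string label in Labels)
            {
                int pos = line.IndexOf(label, StringComparison.Ordinal);
                bool startsRecord = label == Labels[0];

                // "Name of House:" may follow a prefix such as "A 1.";
                // every other label must be at the start of the paragraph.
                if (pos < 0 || (!startsRecord && pos != 0)) continue;

                if (startsRecord)
                {
                    current = new Dictionary<string, string>();
                    records.Add(current);
                }
                if (current != null)
                {
                    current[label.TrimEnd(':')] = line.Substring(pos + label.Length).Trim();
                }
                break;
            }
        }
        return records;
    }

    public static void Main()
    {
        string sample = "A 1. Name of House: Aasleagh Lodge\rTownland: Srahatloe\r" +
                        "Near: Killary Harbour, Leenane\r";
        foreach (var record in Parse(sample))
        {
            foreach (var field in record)
            {
                Console.WriteLine(field.Key + " = " + field.Value);
            }
        }
    }
}
```

Each dictionary then corresponds to one row, so its values could be bound to the parameters of a single parameterized INSERT with one column per label.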