How do I parse a text file with C#?

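By "text formatting" I mean something more complicated.

Instead of asking this question, I first started adding the 5000 lines from the text file to my project by hand.

The text file has 5000 lines of varying length. For example: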

1   1   ITEM_ETC_GOLD_01    골드(소)   xxx xxx xxx_TT_DESC 0   0   3   3   5   0   180000  3   0   1   0   0   255 1   1   0   0   0   0   0   0   0   0   0   0   -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_money_small.bsr    xxx xxx xxx 0   2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1   표현할 골드의 양(param1이상) -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   4   ITEM_ETC_HP_POTION_01   HP 회복 약초    xxx SN_ITEM_ETC_HP_POTION_01    SN_ITEM_ETC_HP_POTION_01_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   60  0   0   0   1   21  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_01.ddj   xxx xxx 50  2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP회복양   0   HP회복양(%)    0   MP회복양   0   MP회복양(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   5   ITEM_ETC_HP_POTION_02   HP 회복약 (소)  xxx SN_ITEM_ETC_HP_POTION_02    SN_ITEM_ETC_HP_POTION_02_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   110 0   0   0   2   39  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_02.ddj   xxx xxx 50  2   0   0   2   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP회복양   0   HP회복양(%)    0   MP회복양   0   MP회복양(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

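What separates the first value (1) from the second value (1/4/5) is not a space but a tab. There are no spaces used as separators in that text file.

What I want:

I want to get the second integer (in the three lines above, the second integers are 1, 4 and 5) and the string in the middle of each line that denotes a path (it starts with "item\" and ends with the file extension ".ddj").

My problem:

When I google "text formatting C#", all I get is how to open a text file and how to write a text file in C#. I don't know how to search for text inside a text file. I also can't search for the first integer, because if it is a small integer, as in the three lines I posted above, I won't be able to find the right position: "1", for example, may occur elsewhere in the line.

My question:

It would be best if I wrote a program that strips out everything except the parts I need.

Another approach, as I see it, is to search directly inside the file, but as I mentioned earlier, if the second integer is too small I may end up at the wrong position.

Please give me some suggestions; I can't format all of this by hand.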

You can do something like this:

using (TextReader rdr = OpenYourFile()) {
    string line;
    while ((line = rdr.ReadLine()) != null) {
        string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
        int theInt = Convert.ToInt32(fields[1]);
    }
}

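The reason you found nothing relevant when searching for "formatting" is that what you are doing is called "parsing".

Try using regular expressions. You can find specific patterns in the text and replace them with whatever you need. I can't give you the exact code right now, but you can try out your expressions in a regular-expression tester.

OK, here is what we do: open the file, read it line by line, and split each line on tabs. Then grab the second integer and loop over the remaining fields to find the path.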

// Requires: using System.IO;
using (StreamReader reader = File.OpenText("filename.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] items = line.Split('\t');
        int myInteger = int.Parse(items[1]);   // Here's your integer.

        // Now let's find the path.
        string path = null;
        foreach (string item in items)
        {
            if (item.StartsWith("item\\") && item.EndsWith(".ddj"))
                path = item;
        }

        // At this point, `myInteger` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like.
    }
}

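You can open the file and read it line by line with StreamReader.ReadLine. You can then use String.Split to break each line into pieces (using \t as the delimiter) and extract the second number.

Because the number of items varies from line to line, you need to search the fields for the pattern "item\*.ddj".

To remove items, you could (for example) keep the whole contents of the file in memory and write out a new file when the user clicks "Save".

Another solution, this time using regular expressions: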
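The split-based approach can be made a little more defensive against blank or short lines, and extended to write out only the two wanted values. What follows is a minimal sketch rather than any answerer's code: the class name `ItemExtractor`, the method names, and the output format (integer, tab, path) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; names and output format are illustrative only.
public static class ItemExtractor
{
    // Returns true and sets id/path when the second tab-separated field is an
    // integer and some field has the form item\....ddj.
    public static bool TryParseLine(string line, out int id, out string path)
    {
        id = 0;
        path = null;

        string[] fields = line.Split('\t');
        // A blank line splits into a single empty field, so check the count first.
        if (fields.Length < 2 || !int.TryParse(fields[1], out id))
            return false;

        foreach (string field in fields)
        {
            if (field.StartsWith("item\\", StringComparison.Ordinal) &&
                field.EndsWith(".ddj", StringComparison.Ordinal))
            {
                path = field;
                return true;
            }
        }
        return false; // this line has no .ddj path
    }

    // Writes one "id<TAB>path" line for every input line that has both values.
    public static void Extract(string inputFile, string outputFile)
    {
        var results = new List<string>();
        foreach (string line in File.ReadLines(inputFile))
        {
            if (TryParseLine(line, out int id, out string path))
                results.Add(id + "\t" + path);
        }
        File.WriteAllLines(outputFile, results);
    }
}
```

Calling `ItemExtractor.Extract("filename.txt", "filtered.txt")` (both file names are placeholders) would leave the original file untouched and produce a much smaller file holding only the integer and the path. Lines without a `.ddj` entry, such as the gold item in the question, are skipped under this sketch.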

using System.Text.RegularExpressions;

...

Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

StreamReader reader = File.OpenText("filename.txt");  // File.OpenText is static; FileInfo.OpenText is not
string line;
while ((line = reader.ReadLine()) != null) {
    Match match = parts.Match(line);
    if (match.Success) {
        int number = int.Parse(match.Groups[1].Value);
        string path = match.Groups[2].Value;

        // At this point, `number` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like.
    }
}

The expression is a bit complicated, so here is a breakdown:

^        Start of string
\d+      "\d" means "digit" - 0-9. The "+" means "one or more."
         So this means "one or more digits."
\t       This matches a tab.
(\d+)    This also matches one or more digits. This time, though, we capture it
         using parentheses. This means we can access it through the Groups property.
\t       Another tab.
.+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
         This is to stop it grabbing everything in sight - it'll only grab as much
         as it needs to for the regex to work.
\t       Another tab.

(item\\[^\t]+\.ddj)
    Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"
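
To see the pattern at work, it can be run against a single line. This is only a sketch: the class and method names are made up for illustration, and the sample line in the usage note is shortened from the question's data.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern itself comes from the answer.
public static class RegexDemo
{
    private static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Returns "number|path" for a matching line, or null when nothing matches.
    public static string Describe(string line)
    {
        Match match = Parts.Match(line);
        if (!match.Success)
            return null;
        // Groups[0] is the whole match; Groups[1] and Groups[2] are the captures.
        return match.Groups[1].Value + "|" + match.Groups[2].Value;
    }
}
```

For the shortened line `"1\t4\tITEM_ETC_HP_POTION_01\txxx\titem\\etc\\drop_ch_bag.bsr\titem\\etc\\hp_potion_01.ddj"`, `Describe` returns `4|item\etc\hp_potion_01.ddj`: the lazy `.+?` first tries the `.bsr` field, fails on `\.ddj`, and expands until it reaches the `.ddj` field.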


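As already mentioned, I strongly recommend using regular expressions (in System.Text.RegularExpressions) for this kind of job.

Combined with a solid regex tool, you can handle any complex text-record parsing situation and get results quickly.

Hope this helps.

One approach I have found very useful in this kind of situation is the old-school one: use the Jet OLEDB provider, together with a schema.ini file, to read large tab-delimited files through ADO.NET. Obviously, this approach is only really useful if you know the format of the file you are importing.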

public void ImportCsvFile(string filename)
{
    FileInfo file = new FileInfo(filename);

    using (OleDbConnection con =
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
            file.DirectoryName +
            "\";Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format
                                  ("SELECT * FROM [{0}]", file.Name), con))
        {
            con.Open();

            // Using a DataReader to process the data
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Process the current reader entry...
                }
            }

            // Using a DataTable to process the data
            using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
            {
                DataTable tbl = new DataTable("MyTable");
                adp.Fill(tbl);

                foreach (DataRow row in tbl.Rows)
                {
                    // Process the current row...
                }
            }
        }
    }
}

Once the data is in a form such as a DataTable, filtering out the data you need becomes very simple.

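"There are no white spaces in that text file." FYI: a tab is whitespace. You mean "there are no spaces in that text file".

Mine works like this: parse a line and insert commas to build a CSV string.

This does not get "the string in each line that denotes a path" (taken straight from the question).

You may need to use line.Split("\t".ToCharArray()), depending on your version (IIRC).

Be careful. If you want to access the fifteenth item on a line, but the line you are processing contains only 12 items (for example), you will get an exception. Avoid that wherever possible. Also, blank lines will trip you up (no pun intended), because line.Split('\t') returns an array containing a single empty element.

I don't know which answer to accept; both are great.

I prefer this one, because you explained why. I had never seen this before!

If you like regular expressions, I suggest using something like Perl the next time you have to process a file like this. It is designed aro…

Samir Talwar: I think you should become a teacher of regular expressions. The way you explain everything is outstanding. I have never had a teacher go into this much detail! +1

I don't understand at what point you actually parse "line". It looks like you are not matching the regex against anything. As it stands, it looks like you have to do this: Regex r = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)"); Match m = r.Match(line);

Great. I don't have a C# compiler on this machine, so I had to wing it. Glad to hear it worked out of the box.

It looks like in .NET 4 you need to instantiate a FileInfo object and then call OpenText() on that object, i.e.:

FileInfo fi = new FileInfo("filename.txt"); StreamReader reader = fi.OpenText();

I think the definition of items should be: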