Extract email addresses and names from a text file in C#


c# email text


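I'll try my best to explain the problem. I have a text file that contains email addresses and names. It looks like this:

Barb Beney "de.mariof@vienna.aa", "Beny Beney" bet@catering.at

etc., all on the same line. This is just an example; I have thousands of entries like this in one big text file. I want to extract the emails and names so that I end up with something like:

Beny Beney bet@catering.at

with the name and address next to each other, each pair on its own line, without quotes. Finally, all duplicate addresses should be removed from the file.

I wrote code that extracts the email addresses, and it works, but I don't know how to do the rest: how to extract the names, put each one on a line with its address, and eliminate the duplicates. I hope I've described this well enough that you know what I'm trying to do. Here is my code: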

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.IO;

namespace Email
{
    class Program
    {
        static void Main(string[] args)
        {
            ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
        }

        // Writes every email address found in inFilePath to outFilePath, one per line
        public static void ExtractEmails(string inFilePath, string outFilePath)
        {
            string data = File.ReadAllText(inFilePath);

            Regex emailRegex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                RegexOptions.IgnoreCase);

            MatchCollection emailMatches = emailRegex.Matches(data);

            StringBuilder sb = new StringBuilder();
            foreach (Match emailMatch in emailMatches)
            {
                sb.AppendLine(emailMatch.Value);
            }

            File.WriteAllText(outFilePath, sb.ToString());
        }
    }
}

You can use this code. It processes the file by creating a new file that contains every email address exactly once:

    static void Main(string[] args)
    {
        ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");

        // The using blocks make sure both files are closed when we are done
        using (TextReader r = File.OpenText(@"C:\Users\drake\Desktop\Email.txt"))
        using (TextWriter w = File.CreateText(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt"))
        {
            RemovingAllDupes(r, w);
        }
    }

    public static void RemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        HashSet<string> previousLines = new HashSet<string>();

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns true if it was actually added,
            // false if it was already there
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
        writer.Close();
    }


For the new format you want, you can do something like this:

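private static string[] parseEmails(string bigStringIn)
{
    // Remove all double quotes from the input
    string bigString = bigStringIn.Replace("\"", "");

    // Split on commas and trim the whitespace left around each entry (needs System.Linq)
    return bigString.Split(',').Select(s => s.Trim()).ToArray();
}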

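It takes the string containing the mail addresses, removes the quotes, and then splits the string on commas into an array of strings of the form:

name lastname email@some.com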

To remove duplicate entries, a nested for loop should work: check for matching strings (possibly after the .Split()).
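A nested loop compares every pair of entries, which grows quadratically with thousands of entries. The sketch below swaps in a different technique: a `HashSet<string>` keyed on the address, so each entry is checked once. It assumes every entry produced by the split has the form "name ... email", i.e. the address is the last space-separated token, and it treats addresses that differ only in letter case as duplicates (also an assumption). `EntryDeduplicator` and `RemoveDuplicateEmails` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class EntryDeduplicator
{
    // Keeps the first entry seen for each email address.
    // Assumption: the address is the last space-separated token of an entry.
    public static List<string> RemoveDuplicateEmails(IEnumerable<string> entries)
    {
        // Assumption: addresses differing only in case count as duplicates
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string raw in entries)
        {
            string entry = raw.Trim();
            if (entry.Length == 0)
                continue; // skip empty pieces, e.g. from a trailing comma

            // Everything after the last space is taken to be the address;
            // with no space, LastIndexOf returns -1 and the whole entry is used
            string email = entry.Substring(entry.LastIndexOf(' ') + 1);

            // HashSet.Add returns false when the address was already seen
            if (seen.Add(email))
                result.Add(entry);
        }
        return result;
    }
}
```

Because the key is only the address, "Beny Beney bet@catering.at" and "B. Beney bet@catering.at" collapse to the first one, which matches "remove all duplicate addresses" rather than duplicate lines.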

You can also use this code for large files:

    static void Main(string[] args)
    {
        ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
        var sr = new StreamReader(File.OpenRead(@"C:\Users\drake\Desktop\Email.txt"));
        // File.Create truncates an existing output file; File.OpenWrite would not
        var sw = new StreamWriter(File.Create(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt"));
        RemovingAllDupes(sr, sw);
    }

    public static void RemovingAllDupes(StreamReader str, StreamWriter stw)
    {
        // Only hash codes are stored, to save memory on large files. Two different
        // lines with the same hash code are treated as duplicates, so a distinct
        // line can occasionally be dropped.
        var lines = new HashSet<int>();
        while (!str.EndOfStream)
        {
            string line = str.ReadLine();
            int hc = line.GetHashCode();
            if (lines.Contains(hc))
                continue;

            lines.Add(hc);
            stw.WriteLine(line);
        }
        stw.Flush();
        stw.Close();
        str.Close();
    }


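What separates the email addresses and names?

The original file looks like "FirstName LastName", second, third... They are separated by spaces. The end result should be: FirstName LastName emailaddress, on one line without quotes, then the second entry on the next line, etc.

I think it would be best to split all the data with string.Split(',') and put the result into an array, so that each name/email entry is at its own index. Then use string.Split(' ') (split on spaces), and you have the first name, last name and email address at specific indexes.

So are the emails enclosed in quotes, or the names? Or both? Just post one exact entry instead of trying to explain the format. "It will work the right way": what is the right way?

Thanks. I just need to figure out how to extract the name and email onto one line and remove the quotes and parentheses. I can't use Split because a name can be 2-5 words. If I have a name like first name "second name" third name, the result should be the name without quotes and parentheses, saved together with the email on one line in a file...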
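Since a name can be 2-5 words, splitting on spaces cannot tell where a name ends. A minimal sketch that avoids Split: it reuses the question's email regex and, assuming the name belonging to an address is the text between the previous address (or the start of the input) and this one, uses each `Match.Index` to cut that text out. It then removes quotes, parentheses and commas, collapses whitespace, and skips addresses it has already written. `NameEmailExtractor` and `Extract` are hypothetical names, and treating addresses that differ only in case as duplicates is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameEmailExtractor
{
    // Same email pattern as in the question's ExtractEmails method
    private static readonly Regex EmailRegex =
        new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", RegexOptions.IgnoreCase);

    // Returns "name email" lines, one per unique address.
    // Assumption: the name for an address is the text between the previous
    // address (or the start of the input) and this address.
    public static List<string> Extract(string data)
    {
        var result = new List<string>();
        // Assumption: addresses differing only in case count as duplicates
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        int previousEnd = 0;

        foreach (Match m in EmailRegex.Matches(data))
        {
            string rawName = data.Substring(previousEnd, m.Index - previousEnd);
            previousEnd = m.Index + m.Length;

            // Replace quotes, parentheses and commas with spaces, then collapse whitespace
            string name = Regex.Replace(rawName, "[\"(),]", " ");
            name = Regex.Replace(name, @"\s+", " ").Trim();

            // HashSet.Add returns false for an address that was already written
            if (seen.Add(m.Value))
                result.Add(name.Length > 0 ? name + " " + m.Value : m.Value);
        }
        return result;
    }
}
```

For the sample line from the question, `Barb Beney "de.mariof@vienna.aa", "Beny Beney" bet@catering.at`, this yields `Barb Beney de.mariof@vienna.aa` and `Beny Beney bet@catering.at`. Writing the result to a file could look like `File.WriteAllLines(outFilePath, NameEmailExtractor.Extract(File.ReadAllText(inFilePath)));`. If some entries put the name after the address instead of before it, this assumption breaks and the names will be paired with the wrong addresses.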