Processing a comma-delimited CSV file into multiple files using ASP.NET
Can anyone point me in the right direction? Thanks in advance.
I am looking for a small application that processes a CSV file and writes its rows to multiple CSV files, based on the list of distinct values in one column (say Header1), but I don't know where to start. FYI: the list of values in Header1 will always change.
I have been able to read the file into an array using the following code:
[Read From Comma-Delimited Text Files in Visual Basic][1]
Now I want to process the data based on the first column. For example:
Input:
input.csv
"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"
"pear","pie","soda","beer"
"pear","pie","soda","beer"
"orange","pie","soda","beer"
"orange","pie","soda","beer"
Output:
output1.csv
"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"
output2.csv
"Header1","Header2","Header3","Header4"
"pear","pie","soda","beer"
"pear","pie","soda","beer"
output3.csv
"Header1","Header2","Header3","Header4"
"orange","pie","soda","beer"
"orange","pie","soda","beer"
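What you can do is:
Read the key column into a list q.
Create a list dist of the distinct keys.
Compare each value in q with dist and write the row to the file chosen by its index in dist.
Example: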
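Imports System.IO
Imports System.Linq

Module Module1
    Sub Main()
        Dim lines As String() = File.ReadAllLines("input.csv")
        ' Key column of every line, header included, so q(i) lines up with lines(i).
        Dim q = (From line In lines
                 Let x = line.Split(","c)
                 Select x(0)).ToList()
        ' dist(0) is the header's own key ("Header1"); the real keys start at index 1.
        Dim dist = q.Distinct().ToList()
        ' Create one file per key. FileMode.Create truncates output left by an earlier run.
        For j As Integer = 1 To dist.Count - 1
            Using sw As New StreamWriter(File.Open("output" & j & ".csv", FileMode.Create))
                sw.WriteLine(lines(0))
            End Using
        Next
        ' Append every data row to the file that matches its key.
        For i As Integer = 1 To q.Count - 1
            Using sw As New StreamWriter(File.Open("output" & dist.IndexOf(q(i)) & ".csv", FileMode.Append))
                sw.WriteLine(lines(i))
            End Using
        Next
    End Sub
End Module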
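The same three steps can also be done in a single pass that keeps one open writer per key instead of reopening a file for every row. This is only a sketch: SplitByKeyColumn, destDir and keyIndex are names made up for illustration, and it keeps the naive comma split, so it assumes that no field contains a comma.

```vb
Imports System.Collections.Generic
Imports System.IO

Module SplitSketch
    ' Hypothetical helper: writes each data row of srcFile to "output<N>.csv" in destDir,
    ' where N is the 1-based order in which the row's key was first seen.
    ' Returns the number of files written.
    Function SplitByKeyColumn(srcFile As String, destDir As String, keyIndex As Integer) As Integer
        Dim lines As String() = File.ReadAllLines(srcFile)
        If lines.Length = 0 Then Return 0

        Dim writers As New Dictionary(Of String, StreamWriter)
        Try
            For i As Integer = 1 To lines.Length - 1
                If lines(i).Length = 0 Then Continue For
                ' Naive split: assumes no field contains a comma.
                Dim key As String = lines(i).Split(","c)(keyIndex)
                Dim sw As StreamWriter = Nothing
                If Not writers.TryGetValue(key, sw) Then
                    ' First time this key is seen: create its file and write the header.
                    Dim fileName As String = "output" & (writers.Count + 1).ToString() & ".csv"
                    sw = New StreamWriter(Path.Combine(destDir, fileName), False)
                    sw.WriteLine(lines(0))
                    writers.Add(key, sw)
                End If
                sw.WriteLine(lines(i))
            Next
        Finally
            ' Close every writer, even if a row caused an exception.
            For Each w As StreamWriter In writers.Values
                w.Dispose()
            Next
        End Try
        Return writers.Count
    End Function
End Module
```

Calling SplitByKeyColumn("input.csv", ".", 0) on the sample input should produce the three output files shown in the question. Keeping many writers open at once is a trade-off that only makes sense when the number of distinct keys is modest.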
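If the key column is not the first one, change its index in x(0).
A more suitable data structure for holding the data (rather than an array) would be a Dictionary. It makes it easy to check whether you already have an entry for a particular category, such as apple or pear. Then you either add a new entry to the dictionary or add to the existing entry.
To create the output files, iterate over the entries in the dictionary, one file per entry, and then iterate over the values of each entry to get the lines for that file.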
Option Infer On
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports Microsoft.VisualBasic.FileIO

Module Module1

    Sub SeparateCsvToFiles(srcFile As String)
        ' Maps each category (key column value) to the remaining fields of its rows.
        Dim d As New Dictionary(Of String, List(Of String))
        Dim headers As String()

        Using tfp As New TextFieldParser(srcFile)
            tfp.HasFieldsEnclosedInQuotes = True
            tfp.SetDelimiters(",")
            Dim currentRow As String()

            ' Get the headers
            Try
                headers = tfp.ReadFields()
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                Throw New FormatException(String.Format("Could not read header line in ""{0}"".", srcFile))
            End Try

            ' Read the data
            Dim lineNumber As Integer = 1
            While Not tfp.EndOfData
                Try
                    currentRow = tfp.ReadFields()
                    'TODO: Possibly handle the wrong number of entries more gracefully.
                    If currentRow.Count = headers.Count Then
                        ' assume column to sort on is the zeroth one
                        Dim category = currentRow(0)
                        Dim values = String.Join(",", currentRow.Skip(1).Select(Function(s) """" & s & """"))
                        If d.ContainsKey(category) Then
                            d(category).Add(values)
                        Else
                            Dim valuesList As New List(Of String)
                            valuesList.Add(values)
                            d.Add(category, valuesList)
                        End If
                    Else
                        Throw New FormatException(String.Format("Wrong number of entries in line {0} in ""{1}"".", lineNumber, srcFile))
                    End If
                Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                    Throw New FormatException(String.Format("Could not read data line {0} in ""{1}"".", lineNumber, srcFile))
                End Try
                lineNumber += 1
            End While
        End Using

        ' Output the data
        'TODO: Write code to output files to a different directory.
        Dim destDir = Path.GetDirectoryName(srcFile)
        Dim fileNumber As Integer = 1
        Dim headerLine = String.Join(",", headers.Select(Function(s) """" & s & """"))

        'TODO: think up more meaningful names instead of x and y.
        For Each x In d
            Dim destFile = Path.Combine(destDir, "output" & fileNumber.ToString() & ".csv")
            Using sr As New StreamWriter(destFile)
                sr.WriteLine(headerLine)
                ' x.Key is the category; each y is the rest of one row, already quoted.
                For Each y In x.Value
                    sr.WriteLine(String.Format("""{0}"",{1}", x.Key, y))
                Next
            End Using
            fileNumber += 1
        Next
    End Sub

    Sub Main()
        SeparateCsvToFiles("C:\temp\input.csv")
        Console.WriteLine("Done.")
        Console.ReadLine()
    End Sub

End Module
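Thank you very much for the quick reply! This works great so far. I was wondering whether there is a way to modify the sr.WriteLine part in case I want to selectively write only certain columns, so that the output would contain only Header1, Header3 and/or Header4.
The line to modify would be Dim values = String.Join(",", currentRow.Skip(1).Select(Function(s) """" & s & """")), where it builds the part of the output line that follows the category. I'm sure you can do it yourself with a For..Next loop.
@user3599851 Sorry, I forgot to add the @ in my previous comment to make sure you get notified.
@user3599851 If my suggested answer solved the problem, you may want to mark it as the accepted answer, or at least as useful, so that others can easily see that it helped if they come to this thread in the future.
Yes, you have been very helpful. I just kept struggling with it. I couldn't get the For..Next loop to work, but I did get it working the way I wanted by adding something like this: [Dim category = currentRow(1) Dim values = ... & currentRow(3) & "," & currentRow(4) & "," & currentRow(6) & ...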
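For anyone who gets stuck on the same For..Next loop, the column selection discussed in these comments might look roughly like the sketch below. BuildLine and columnIndexes are hypothetical names that do not appear in the answer, and unlike the answer's code the helper also doubles embedded quotes.

```vb
Imports System.Collections.Generic

Module ColumnSelectionSketch
    ' Hypothetical helper: builds one quoted, comma-separated line
    ' from only the columns listed in columnIndexes, in that order.
    Function BuildLine(row As String(), columnIndexes As Integer()) As String
        Dim parts As New List(Of String)
        For i As Integer = 0 To columnIndexes.Length - 1
            Dim field As String = row(columnIndexes(i))
            ' Double any embedded quotes so the field stays valid CSV.
            parts.Add("""" & field.Replace("""", """""") & """")
        Next
        Return String.Join(",", parts)
    End Function
End Module
```

With the sample data, BuildLine(currentRow, {0, 2, 3}) keeps Header1, Header3 and Header4. If it is used inside SeparateCsvToFiles, the header line would need the same selection, for example BuildLine(headers, {0, 2, 3}), so that the header and data columns stay aligned.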