C#: Loading a CSV with a non-standard format using SSIS


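I have been tasked with loading accounting transactions from a CSV file. The file has a single header row that applies to the whole file, but for some reason the data is grouped by account number. The account appears above its transactions, in the same column as the ID: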

"ID","Name","Date","Debit","Credit","Balance"
,,,,,
"1150 - Cash in Bank",,,,,
"Starting Balance",,,,,"59,612.78"
615892,"Account Name 1","5/5/2018","2,100.00",,"61,712.78"
645761,"Account Name 2","5/7/2018",,7,"61,705.78"
615892,"Account Name 3","5/8/2018",,"2,144.33","59,561.45"
713300,"Account Name 4","5/8/2018","2,144.33",,"61,705.78"
713300,"Account Name 5","5/8/2018",,"2,144.33","59,561.45"
693615,"Account Name 6","5/9/2018",,"1,650.00","57,911.45"
"Net Change",,,,,"-1,701.33"
,,,"4,244.33","5,945.66","57,911.45"
"3150 - Owner Contribution",,,,,
"Starting Balance",,,,,0
713300,"Account Name 4","5/8/2018",,"2,144.33","-2,144.33"
"Net Change",,,,,"-2,144.33"
,,,0,"2,144.33","-2,144.33"

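Can anyone give me a hint on how to handle this? I know how I would do it logically with a couple of variables and row-by-row processing, but I'm not a C# or front-end developer at all. My biggest problem seems to be that you can't write a piece and test it the way you can in SQL. In SQL I can query a table, look at the data and keep building on it, but with C# I seem to need a complete script that works as a whole. How can I start with a small chunk and expand from there? Even something like reading the first account name into a variable and showing it as a variable in the Data Flow Task. Is there somewhere I can paste code and get feedback on it? Every script I find online seems to have compile errors that I don't yet know how to fix.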
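On the "start small" point: one way to get SQL-like quick feedback is to prototype the parsing in a plain console app first and only move it into SSIS once it works. The sketch below assumes .NET 6 or later (top-level statements). The `FirstAccountName` helper, the regex for account header rows (a quoted first field such as "1150 - Cash in Bank") and the file path are illustrative assumptions based on the sample data, not anything SSIS provides.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first account header value, i.e. a row whose
// first field is quoted and looks like "1150 - Cash in Bank"; null if none is found.
static string? FirstAccountName(IEnumerable<string> lines)
{
    // Assumed pattern: opening quote, digits, " - ", name, closing quote, comma.
    var accountHeader = new Regex("^\"(\\d+ - [^\"]+)\",");
    foreach (var line in lines)
    {
        var match = accountHeader.Match(line);
        if (match.Success)
            return match.Groups[1].Value; // the text between the quotes
    }
    return null;
}

// Assumed path to a small test copy of the file.
var path = @"C:\Temp\test.csv";
if (File.Exists(path))
    Console.WriteLine(FirstAccountName(File.ReadLines(path)) ?? "(no account row found)");
```

Once this prints the expected account, the next small step could be tracking the "Starting Balance" row the same way, then the transaction rows.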

This should get all of that information into a DataTable, which you can then assign or use however you need. If you need a different kind of end object, let me know.

        // Needs: using System; using System.Data; using System.IO;
        //        using System.Linq; using System.Text.RegularExpressions;
        var data = string.Empty; //String var to hold file
        var tbl = new DataTable("MyData"); //Tmp dataTable object
        using (var fs = new StreamReader(@"C:\Temp\test.csv")) //Open file
            data = fs.ReadToEnd(); //Read entirely into data variable

        var rows = data.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries); //Split into lines. RemoveEmptyEntries drops blank lines at the end of the file.

        //Split only on commas outside quotes, so values like "2,100.00" stay in one cell
        var splitter = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

        var cnt = 0; //Counter to know header
        foreach (var row in rows) //Iterate rows
        {
            var cells = splitter.Split(row).Select(c => c.Trim('"')).ToArray(); //Split row into cells and strip the quotes. Empty cells are kept.
            if (cnt == 0) //If is header row add columns
            {
                foreach (var cell in cells)
                    tbl.Columns.Add(new DataColumn(cell));
            }
            else //Else data row
            {
                var dataRow = tbl.NewRow(); //New row
                dataRow.ItemArray = cells; //Assign cell values
                tbl.Rows.Add(dataRow); //Add row to table.
            }
            cnt++;
        }
EDIT: Cleaned up with `using` and added comments

EDIT2: If the file is too large to read into memory at once, here is a streaming version:

        // Needs: using System.Data; using System.IO; using System.Linq; using System.Text.RegularExpressions;
        var splitter = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"); //Split only on commas outside quotes
        var cnt = 0; //Row counter
        var tbl = new DataTable("MyData"); //Tmp dataTable object
        using (var fs = new StreamReader(@"C:\Temp\test.csv")) //Load file
        {
            string row;
            while ((row = fs.ReadLine()) != null) //Read one line at a time until end of file
            {
                if (row.Length == 0) continue; //Skip blank lines
                var cells = splitter.Split(row).Select(c => c.Trim('"')).ToArray(); //Split into cells and strip quotes
                if (cnt == 0) //If is header row
                {
                    foreach (var cell in cells) //For each header
                        tbl.Columns.Add(new DataColumn(cell)); //Add Column
                } else { //Not header row
                    var dataRow = tbl.NewRow(); //Create new row based on tmp table
                    dataRow.ItemArray = cells; //Assign cell values
                    tbl.Rows.Add(dataRow); //Add the DataRow (not the raw line) to the table
                }
                cnt++;
            }
        }
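As an alternative to splitting lines by hand (quoted values such as "2,100.00" contain commas), .NET's Microsoft.VisualBasic.FileIO.TextFieldParser can do the quote-aware parsing. A sketch assuming .NET Core 3.0+ (or .NET Framework with a reference to Microsoft.VisualBasic); `LoadCsv` is a hypothetical helper name, and like the code above it keeps every row, including the account and total rows, as strings.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: reads CSV text into a DataTable of strings, using the first
// record as column names. TextFieldParser handles quoted fields like "2,100.00".
static DataTable LoadCsv(TextReader reader)
{
    var tbl = new DataTable("MyData");
    using var parser = new TextFieldParser(reader)
    {
        TextFieldType = FieldType.Delimited,
        HasFieldsEnclosedInQuotes = true
    };
    parser.SetDelimiters(",");

    var header = parser.ReadFields();
    if (header == null) return tbl; // empty input

    foreach (var name in header)
        tbl.Columns.Add(name); // string columns by default

    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        if (fields != null)
            tbl.Rows.Add(fields); // assumes every row has as many fields as the header
    }
    return tbl;
}

// Assumed path, as in the answer above.
var path = @"C:\Temp\test.csv";
if (File.Exists(path))
{
    using var file = new StreamReader(path);
    Console.WriteLine($"Loaded {LoadCsv(file).Rows.Count} rows");
}
```

Taking a TextReader rather than a path keeps the helper easy to try against a small in-memory string before pointing it at the real file.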
Solution overview

I am giving my answer in VB.NET because it may be easier to understand, especially since you are not a C# developer.

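  • In the Data Flow Task, add a Script Component after the Flat File Source
  • In the Script Component, mark all columns as input columns and add 8 output columns
  • In Input0_ProcessInputRow, skip rows whose ID column is empty; if the ID contains an integer, create an output row; if it contains an account number or "Starting Balance", store that value in a variable; otherwise ignore the row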
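Detailed solution
  • Add a Flat File Connection Manager and select the text file
  • Change the Text qualifier to " (double quote)
  • Add a Data Flow Task
  • In the Data Flow Task, add a Flat File Source, a Script Component and an OLE DB Destination
  • In the Script Component, select all columns as input columns
  • Add 8 output columns (the 6 main columns + Account + StartingBalance), all of type DT_STR
  • Change the SynchronousInputID property of the output buffer (Output 0) to None
  • Select Visual Basic as the script language
  • Write the following script in the script editor:
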
    Private AccountName As String = ""
    Private StartingBalance As String = ""

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        If Not Row.ID_IsNull AndAlso
           Not String.IsNullOrEmpty(Row.ID.Trim) Then

            'Skip bad rows
            If Row.ID = "" Then Exit Sub

            If Integer.TryParse(Row.ID, New Integer) Then
                'Transaction row: output it together with the current account
                Output0Buffer.AddRow()
                Output0Buffer.ID = Row.ID
                Output0Buffer.Name = Row.Name
                Output0Buffer.Date = Row.Date
                Output0Buffer.Debit = Row.Debit
                Output0Buffer.Credit = Row.Credit
                Output0Buffer.Balance = Row.Balance
                Output0Buffer.Account = AccountName
                Output0Buffer.StartingBalance = StartingBalance
            ElseIf Row.ID.Contains("Starting Balance") Then
                StartingBalance = Row.Balance
            ElseIf Row.ID.Contains("-") Then
                AccountName = Row.ID
            Else
                'Ignore row
                Exit Sub
            End If

        End If

    End Sub

  • Map the output columns to the destination columns
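Since the question is tagged C#, here is a rough C# translation of the same row-classification logic, written as a standalone function so it can be tried in a console app (.NET 6+, top-level statements) before moving it into a Script Component. It is a sketch: `Flatten` is a hypothetical name, each row is assumed to be already split into a 6-element string array (ID, Name, Date, Debit, Credit, Balance), and in a real C# Script Component the loop body would live in Input0_ProcessInputRow and write to Output0Buffer instead of a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical C# port of the VB Input0_ProcessInputRow logic above.
// Input rows: 6 fields (ID, Name, Date, Debit, Credit, Balance).
// Output rows: the same 6 fields + Account + StartingBalance.
static List<string[]> Flatten(IEnumerable<string[]> rows)
{
    var output = new List<string[]>();
    var accountName = "";      // like the VB Private AccountName field
    var startingBalance = "";  // like the VB Private StartingBalance field

    foreach (var row in rows)
    {
        var id = (row[0] ?? "").Trim();
        if (id.Length == 0) continue; // blank separator rows and total rows

        if (int.TryParse(id, out _))
        {
            // Transaction row: copy it and attach the current account context
            var outRow = new string[8];
            Array.Copy(row, outRow, 6);
            outRow[6] = accountName;
            outRow[7] = startingBalance;
            output.Add(outRow);
        }
        else if (id.Contains("Starting Balance"))
            startingBalance = row[5];
        else if (id.Contains("-"))
            accountName = id; // e.g. "1150 - Cash in Bank"
        // anything else (e.g. "Net Change") is ignored
    }
    return output;
}
```

Each returned array holds the six original fields followed by the account and its starting balance, matching the 8 output columns described in the steps above. As in the VB version, the integer check comes first, so an ID that parses as a number is always treated as a transaction.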

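  • I just came across this post. I went through a very similar situation a day ago, and I suggest running the macro below. It can be run from an Excel workbook or from the CSV opened in Excel, but if you save your changes with the .csv extension, the macro code will not be saved.

    Hope this solution works for you. It definitely worked for me.

    ' Add reference to Microsoft ActiveX Data Objects 2.8 Library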
    
    Sub testexportsql()
        Dim Cn As ADODB.Connection
        Dim ServerName As String
        Dim DatabaseName As String
        Dim TableName As String
        Dim UserID As String
        Dim Password As String
        Dim rs As ADODB.Recordset
        Dim RowCounter As Long
    
        Dim NoOfFields As Integer
        Dim StartRow As Long
        Dim EndRow As Long
    
        Dim ColCounter As Integer
    
    
        Set rs = New ADODB.Recordset
    
    
        ServerName = "server_name" ' Enter your server name here
        DatabaseName = "db_name" ' Enter your  database name here
        TableName = "table_name" ' Enter your Table name here
        UserID = "" ' Enter your user ID here
         ' (Leave ID and Password blank if using Windows Authentication)
        Password = "" ' Enter your password here
        NoOfFields = 10 ' Enter number of fields to update (eg. columns in your worksheet)
        StartRow = 2 ' Enter row in sheet to start reading  records
        EndRow = 100 ' Enter row of last record in sheet
    
         '  CHANGES
        Dim shtSheetToWork As Worksheet
        Set shtSheetToWork = ActiveWorkbook.Worksheets("sheet_name")
         '********
    
        Set Cn = New ADODB.Connection
        Cn.Open "Driver={SQL Server};Server=" & ServerName & ";Database=" & DatabaseName & _
        ";Uid=" & UserID & ";Pwd=" & Password & ";"
    
        rs.Open TableName, Cn, adOpenKeyset, adLockOptimistic
    
         'EndRow = shtSheetToWork.Cells(Rows.Count, 1).End(xlUp).Row
        For RowCounter = StartRow To EndRow
            rs.AddNew
            For ColCounter = 1 To NoOfFields
                rs(ColCounter - 1) = shtSheetToWork.Cells(RowCounter, ColCounter)
            Next ColCounter
            Debug.Print RowCounter
        Next RowCounter
        rs.UpdateBatch
    
         ' Tidy up
        rs.Close
        Set rs = Nothing
        Cn.Close
        Set Cn = Nothing
    
    End Sub