Inconsistent extraction from MySQL to JSON

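I have a MySQL database with 6 tables and about 2 million rows in total.

I want to migrate all of the data into MongoDB.

I decided to convert the SQL tables to JSON and import that into MongoDB.

I wrote a program in Go to extract the data and output it as JSON.

This is the program's main function: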

func main() {
    // Open a database connection
    var err error
    db, err = sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/employees")
    checkErr(err)
    // Check if reachable
    if err = db.Ping(); err != nil {
        log.Fatal("Database is unreachable:", err)
    }
    // Populate variables with data
    err = populateVars()
    checkErr(err)
    // Marshal variables into JSON
    binaryJSON, err := json.Marshal(collection)
    checkErr(err)
    // Write JSON to a file
    err = writeStringToFile("/home/user01/Temporary/sql2data.json", string(binaryJSON))
    checkErr(err)
}
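The problem is that the output is inconsistent.

Every time I run the program, the resulting file has a different size, and some random fields are missing.

What could be causing this?

It does not look like a problem with the program's logic, because everything executes without errors and most fields are populated correctly.

Am I reading the data too fast, so that some of it occasionally gets lost?

Or am I missing something else?

Edit:

Most of the work happens in the populateVars() function call.

It has several code blocks, each of which executes a given SQL query and populates the struct variables according to the schema.

Here is one such block: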

rows, err = db.Query("SELECT emp_no, dept_emp.dept_no, dept_name, from_date, to_date FROM dept_emp JOIN departments ON departments.dept_no = dept_emp.dept_no;")
checkErr(err)
i := 0
for rows.Next() {
    var id int
    var depNumber string
    var depName string
    var fromDate string
    var toDate string
    var position = "Employee"
    err = rows.Scan(&id, &depNumber, &depName, &fromDate, &toDate,)
    // For debugging purposes:
    fmt.Println(id, depNumber, depName, fromDate, toDate, position, i)
    if err != nil {
        return err
    }
    for i := range collection {
        if collection[i].ID == id {
            collection[i].Departments = append(collection[i].Departments, Department{DepartmentNumber: depNumber, DepartmentName: depName, FromDate: fromDate, ToDate: toDate, Position: position})
            // For debugging purposes:
            fmt.Println(collection[i].Departments)
        }
    }
    i++
}
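
A note on the block above: for every row returned, the inner loop scans the whole collection slice to find the matching employee, and the debug fmt.Println calls add further per-row cost. With result sets of this size, that makes the program consume rows slowly. The following is a minimal sketch of an index-based lookup, not the author's code. It assumes employee IDs are unique; the Employee type, buildIndex and attachDepartment are hypothetical names introduced for illustration, and the struct fields are copied from the snippet above.

```go
package main

import "fmt"

// Department mirrors the fields used in the question's snippet.
type Department struct {
	DepartmentNumber string
	DepartmentName   string
	FromDate         string
	ToDate           string
	Position         string
}

// Employee is a stand-in for the element type of collection (an assumption).
type Employee struct {
	ID          int
	Departments []Department
}

// buildIndex maps each employee ID to its position in the slice,
// so a row can be attached without scanning the whole slice.
func buildIndex(collection []Employee) map[int]int {
	index := make(map[int]int, len(collection))
	for i := range collection {
		index[collection[i].ID] = i
	}
	return index
}

// attachDepartment appends dep to the employee with the given ID.
// It reports false when no such employee exists.
func attachDepartment(collection []Employee, index map[int]int, id int, dep Department) bool {
	i, ok := index[id]
	if !ok {
		return false
	}
	collection[i].Departments = append(collection[i].Departments, dep)
	return true
}

func main() {
	collection := []Employee{{ID: 10001}, {ID: 10002}}
	index := buildIndex(collection)
	attachDepartment(collection, index, 10002, Department{
		DepartmentNumber: "d005",
		DepartmentName:   "Development",
		Position:         "Employee",
	})
	// prints "0 1"
	fmt.Println(len(collection[0].Departments), len(collection[1].Departments))
}
```

The index is built once after the employees are loaded; each subsequent row then costs one map lookup instead of a pass over the entire slice.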
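Here is a GitHub link to the whole program:

Edit 2:

The problem seems to be related to a query timeout.

Each query takes about 10 minutes to execute, but after about 6 minutes I get this error and the program stops executing the query: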

[mysql] 2017/04/29 17:35:16 packets.go:66: unexpected EOF
[mysql] 2017/04/29 17:35:16 packets.go:412: busy buffer
2017/04/29 17:35:16 driver: bad connection
The MySQL log file shows:

2017-04-29T16:28:49.975805Z 102 [Note] Aborted connection 102 to db: 'employees' user: 'root' host: 'localhost' (Got timeout writing communication packets)
So far I have tried setting MySQL variables to disable any timeout that might apply, but with no luck.

I think the problem may lie in Go's mysql driver.
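
One thing that may explain why the truncation goes unnoticed: in database/sql, rows.Next() returns false both when the result set is exhausted and when iteration fails, and only rows.Err() tells the two apart. The loop shown in the question does not call rows.Err(), so a connection dropped mid-query could end the loop as if the data had simply run out. The sketch below illustrates the check under stated assumptions: rowIterator is a hypothetical interface covering the methods of *sql.Rows that are used, and fakeRows is a stand-in so the example runs without a database.

```go
package main

import (
	"errors"
	"fmt"
)

// rowIterator is a hypothetical interface matching the subset of
// *sql.Rows used here, so the loop can be exercised without a database.
type rowIterator interface {
	Next() bool
	Scan(dest ...interface{}) error
	Err() error
	Close() error
}

// collectIDs reads one int column from every row. It returns an error
// when iteration stopped early, instead of silently returning a partial result.
func collectIDs(rows rowIterator) ([]int, error) {
	defer rows.Close()
	var ids []int
	for rows.Next() {
		var id int
		if err := rows.Scan(&id); err != nil {
			return ids, err
		}
		ids = append(ids, id)
	}
	// Next returns false at the end of the data and on failure alike;
	// Err distinguishes the two cases.
	if err := rows.Err(); err != nil {
		return ids, fmt.Errorf("result set truncated after %d rows: %w", len(ids), err)
	}
	return ids, nil
}

// fakeRows simulates a result set that fails once failAfter rows
// have been read (a negative failAfter means it never fails).
type fakeRows struct {
	data      []int
	pos       int
	failAfter int
	err       error
}

func (f *fakeRows) Next() bool {
	if f.failAfter >= 0 && f.pos == f.failAfter {
		f.err = errors.New("driver: bad connection")
		return false
	}
	if f.pos >= len(f.data) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeRows) Scan(dest ...interface{}) error {
	if len(dest) != 1 {
		return errors.New("expected exactly one destination")
	}
	p, ok := dest[0].(*int)
	if !ok {
		return errors.New("unsupported destination type")
	}
	*p = f.data[f.pos-1]
	return nil
}

func (f *fakeRows) Err() error   { return f.err }
func (f *fakeRows) Close() error { return nil }

func main() {
	// Three rows are available, but the simulated connection drops after two.
	ids, err := collectIDs(&fakeRows{data: []int{1, 2, 3}, failAfter: 2})
	fmt.Println(ids, err)
}
```

With a real query, the same pattern would be a `rows.Err()` check placed right after the `for rows.Next()` loop, returning the error from populateVars() so that main can stop before writing an incomplete file.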

Consider using MySQL's SELECT ... INTO OUTFILE and mongoimport with CSV input.

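The only thing the program does is embed one-to-many and many-to-many documents, which can easily be done with the aggregation framework.

A step-by-step example:

  • Export CSV from MySQL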

    SELECT * from employees INTO OUTFILE '/tmp/employees.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * from salaries INTO OUTFILE '/tmp/salaries.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * from titles INTO OUTFILE '/tmp/titles.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * from departments INTO OUTFILE '/tmp/departments.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * from dept_emp INTO OUTFILE '/tmp/dept_emp.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * from dept_manager INTO OUTFILE '/tmp/dept_manager.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    
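  • Import the CSV files into Mongo (define the "field spec" according to your schema; see the example field spec for employees)

  • Aggregate, then drop the imported collections, from the mongo shell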

    db.tmp_employees.aggregate([
        // 1-to-many joins
        {$lookup: {
            from: 'tmp_salaries',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'salaries'
        }},
        {$lookup: {
            from: 'tmp_titles',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'titles'
        }},
        // many-to-many joins
        {$lookup: {
            from: 'tmp_dept_emp',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_emp'
        }},
        {$lookup: {
            from: 'tmp_dept_manager',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_manager'
        }},
        {$unwind: { path: '$dept_emp', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_emp.dept_no',
            foreignField: 'dept_no',
            as: 'dept_emp_deps'
        }},    
        {$unwind: { path: '$dept_emp_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$$ROOT'},
            dept_manager: {$first: '$dept_manager'},
            departments_emp: {$push: {
                department_number: '$dept_emp.dept_no',
                department_name: '$dept_emp_deps.dept_name',
                from_date: '$dept_emp.from_date',
                to_date: '$dept_emp.to_date',
                position: '$dept_emp.position'
            }},
        }},
        {$unwind: { path: '$dept_manager', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_manager.dept_no',
            foreignField: 'dept_no',
            as: 'dept_manager_deps'
        }},    
        {$unwind: { path: '$dept_manager_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$root'},
            departments_emp: {$first: '$departments_emp'},
            departments_manager: {$push: {
                department_number: '$dept_manager.dept_no',
                department_name: '$dept_manager_deps.dept_name',
                from_date: '$dept_manager.from_date',
                to_date: '$dept_manager.to_date',
                position: '$dept_manager.position'
            }},
        }},
        // combine departments to a single array
        {$project: {
            root: 1,
            departments_all: {$concatArrays: [ "$departments_emp", "$departments_manager" ] }
        }},
        //final reshape
        {$project: {
            id: '$root.id',
            birth_date: '$root.birth_date',
            first_name: '$root.first_name',
            last_name: '$root.last_name',
            gender: '$root.gender',
            hire_date: '$root.hire_date',
            salaries: '$root.salaries',
            titles: '$root.titles',
            departments: {$filter: {
                input: "$departments_all",
                as: "departments",
                cond: { $ne: [ "$$departments", {} ] }}}
        }},
        { $out : "employees" }
    ])
    
    db.tmp_employees.drop();
    db.tmp_salaries.drop();
    db.tmp_titles.drop();
    db.tmp_departments.drop();
    db.tmp_dept_emp.drop();
    db.tmp_dept_manager.drop();
    

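  • You know you can use CSV for this?

  • It's not that easy. I need to restructure the schema, not just dump all the tables.

  • I usually use aggregation for that. If you are doing something very complex, I think the inconsistency may come from there. The code in the question is not enough to answer that part.

  • Thanks, I added a longer explanation. As I read each row, I usually just append a struct variable to a slice of structs.

  • Is the size of collection the same as the number of rows returned by the query? That is, did you preallocate collection for a known number of returned rows? Do all the queries (whose results are used to populate the same collection slice) return the same number of rows?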