Javascript 将大型JSON文件保存到MySQL时性能更佳_Javascript_Mysql

Javascript 将大型JSON文件保存到MySQL时性能更佳

javascript mysql

Javascript 将大型JSON文件保存到MySQL时性能更佳,javascript,mysql,Javascript,Mysql,我有个问题所以，我的故事是：我有一个30 GB的大文件（JSON），包含特定时间段内所有reddit帖子。我不会将每个帖子的所有值都插入表中我遵循了，他用Python编写了我要做的事情。我试着跟随（在NodeJS中），但是当我测试它时，它太慢了。它每5秒插入一行。还有500000多个reddit帖子，这确实需要几年的时间这里有一个例子，说明我在做什么 var readStream = fs.createReadStream(location) oboe(readStream)

我有个问题

所以，我的故事是：

我有一个30 GB的大文件（JSON），包含特定时间段内所有reddit帖子。我不会将每个帖子的所有值都插入表中

我遵循了，他用Python编写了我要做的事情。我试着跟随（在NodeJS中），但是当我测试它时，它太慢了。它每5秒插入一行。还有500000多个reddit帖子，这确实需要几年的时间

这里有一个例子，说明我在做什么

var readStream = fs.createReadStream(location)
oboe(readStream)
    .done(async function(post) {
        let { parent_id, body, created_utc, score, subreddit } = data;
        let comment_id = data.name;

        // Checks if there is a comment with the comment id of this post's parent id in the table
        getParent(parent_id, function(parent_data) {
            // Checks if there is a comment with the same parent id, and then checks which one has higher score
            getExistingCommentScore(parent_id, function(existingScore) {

                // other code above but it isn't relevant for my question

                // this function adds the query I made to a table
                addToTransaction()

            })
        })
})

基本上，它的作用是启动一个读取流，然后将其传递给一个名为的模块

然后我得到JSON作为回报。然后，它检查数据库中是否已经保存了父项，然后检查是否存在具有相同父项id的现有注释

我需要使用这两个函数来获取所需的数据（仅获取“最佳”注释）

这就是

addToTransaction

看起来的样子：

function addToTransaction(query) {
    // adds the query to a table, then checks if the length of that table is 1000 or more

    if (length >= 1000) {
        connection.beginTransaction(function(err) {
            if (err) throw new Error(err);

            for (var n=0; n<transactions.length;n++) {
                let thisQuery = transactions[n];
                connection.query(thisQuery, function(err) {
                    if (err) throw new Error(err);
                })
            }

            connection.commit();
        })
    }
}

函数addToTransaction（查询）{
//将查询添加到表中，然后检查该表的长度是否为1000或更多
如果（长度>=1000）{
connection.beginTransaction（函数（错误）{
如果（错误）抛出新错误（错误）；
对于（var n=0；nummm…..您需要一个数据库和一些数据筛选，以便只查询给定时间所需的内容time@charlietfl我可以得到所有的数据，但问题是内存。这就是为什么你想使用一个数据库并查询你需要的东西。你测量了所有独立的步骤吗？如果是这样，有一种非常快速的方法在mysql中加载数据。请参阅“LOAD DATA Inflie”，它应该在几分钟内加载一个这样的集合（假设以GB为单位的数据量合理）。JSON文件只是一组mysql查询，即插入到

等中？还是您试图传输到数据库中的实际的JSON格式的reddit数据？