JavaScript: loading a large dataset into crossfilter/dc.js

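I built a crossfilter with several dimensions and groups to display the data visually using dc.js. The data being visualized is bike trip data, and every trip gets loaded in. Right now there are over 750,000 records. The JSON file I'm using is 70 MB, and it will only grow as I receive more data over the coming months.

So my question is: how can I make the data leaner so that it scales well? Right now it takes about 15 seconds to load on my internet connection, and I'm worried that it will take too long once I have more data. I have also tried, without success, to display a progress bar/spinner while the data loads.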

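The columns I need are start date, start time, user type, gender, tripduration, meters, and age. I have shortened these fields in my JSON to start_date, start_time, u, g, dur, m, age so that the file is smaller. At the top of the crossfilter there is a line chart showing the total number of trips per day. Below that are row charts for the day of the week (calculated from the data) and the month (also calculated), plus pie charts for user type, gender, and age. Below those are two bar charts for the start time (rounded down to the hour) and the duration (rounded up to the minute).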

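The project is on GitHub (the dataset is in data2.json). I tried to create a JSFiddle, but it isn't working (probably because of the data, even when taking only 1,000 rows and loading them into the HTML with tags).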

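Ideally, it would work so that only the data for the top chart gets loaded at first: that would load quickly, since it is just a count of records per day. Once the user gets down into the other charts, though, progressively more data is needed to drill into the finer details. Any ideas on how to get this to work?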

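I'd recommend shortening all of the field names in your JSON to 1 character (including "start_date" and "start_time"). That should help a little. Also, make sure that compression is turned on on your server. That way the data sent to the browser is compressed automatically in transit, which should speed things up a lot if it isn't turned on already.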

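For better responsiveness, I'd also recommend first setting up your crossfilter (empty), all of your dimensions and groups, and all of your dc.js charts, and then using crossfilter's add() to add more data to the crossfilter in chunks. The easiest way to do this is to split the data into bite-sized chunks (a few MB each) and load them serially. So if you are using d3.json, start the next file load in the callback of the previous file load. This results in a pile of nested callbacks, which is a bit nasty, but it should keep the user interface responsive while the data is loading.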
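As a rough illustration of the chunked approach, here is a minimal sketch that loads chunk files one after another without nesting callbacks. It assumes a promise-returning loader (for example d3.json in d3 v5 or later, or a fetch wrapper); loadChunksSerially, the chunk file names, and the callback shape are hypothetical names chosen for this example.

```javascript
// Load chunk files strictly one after another and hand each chunk to a callback.
// `loadJson(url)` is assumed to return a Promise that resolves to an array of records.
async function loadChunksSerially(urls, loadJson, onChunk) {
  let total = 0;
  for (const url of urls) {
    const records = await loadJson(url); // wait for this chunk before requesting the next
    total += records.length;
    onChunk(records, total);             // e.g. ndx.add(records); dc.redrawAll();
  }
  return total;
}

// Possible wiring (hypothetical file names, assumes crossfilter and dc are loaded):
//   const ndx = crossfilter([]);
//   ...define dimensions, groups and charts on ndx, then dc.renderAll()...
//   loadChunksSerially(["trips-0.json", "trips-1.json"], (url) => d3.json(url),
//     (records) => { ndx.add(records); dc.redrawAll(); });
```

Because each chunk is awaited separately, the browser gets a chance to repaint between chunks, and the running total passed to the callback can drive a simple progress indicator.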


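Lastly, with this much data I believe you will start running into performance issues in the browser, not just while loading the data. I suspect you are already seeing this, and that at least part of the 15-second pause happens in the browser. You can check by profiling in your browser's developer tools. To address it, profile, identify the performance bottlenecks, and then try to optimize those. Also, if part of your audience uses slower computers, be sure to test on those machines.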

Consider my class design. It doesn't match yours, but it illustrates my points.

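public class MyDataModel
{
    public List<MyDatum> Data { get; set; }
}

public class MyDatum
{
    public long StartDate { get; set; }
    public long EndDate { get; set; }
    public int Duration { get; set; }
    public string Title { get; set; }
}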
StartDate and EndDate are Unix timestamps; Duration is in seconds.

Serializes as: {"Data": [{"StartDate":1441256019,"EndDate":1441257181,"Duration":451,"Title":"Rad is a cool word."},...]}

One row of data is 92 characters.

Let's start compressing! Convert the dates and times to base-60 strings, and store each row as an array of strings.

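var datacnt = 0;

// Refresh the on-screen record counter a few times per second while data streams in.
var timerId = setInterval(function () {
    d3.select("#count-data-current").text(datacnt);
    // update visualization should go here, something like dc.redrawAll()...
}, 300);

oboe("relative-or-absolute path to your data(ajax)")
    // "CNT" comes first in the payload and says how many records to expect.
    .node('CNT', function (count, path) {
        d3.select("#count-data-all").text("Expecting " + count + " records");
        return oboe.drop;
    })
    // Fires once per row of the "data" array; returning oboe.drop discards the row after use.
    .node('data.*', function (record, path) {
        datacnt++;
        return oboe.drop;
    })
    // "done" is the last key in the payload, so all rows have arrived by now.
    .node('done', function (item, path) {
        d3.select("#progress-data").text("all data loaded");
        clearInterval(timerId); // the timer was created with setInterval
        d3.select("#count-data-current").text(datacnt);
    });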
public class MyDataModel
{
    public List<List<string>> Data { get; set; }
}

Serializes as: {"Data":[["1pCSrd","1pCTD1","7V","Rad is a cool word."],...]}

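One row of data is now 47 characters. There is a good JavaScript library for working with dates and times, and it has built-in functionality to unpack the base-60 format.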
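To make the packing step concrete, here is a minimal sketch of base-60 encoding and decoding in plain JavaScript. The digit alphabet (0-9, A-Z, a-x) is an assumption inferred from the sample values in this answer; a library's packed format may use a different alphabet. toBase60 and fromBase60 are illustrative names, not a library API.

```javascript
// Assumed digit alphabet: 10 digits + 26 uppercase + 24 lowercase letters = 60 symbols.
const DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx";

// Encode a non-negative integer (e.g. a Unix timestamp in seconds) as a base-60 string.
function toBase60(n) {
  if (!Number.isInteger(n) || n < 0) throw new RangeError("expected a non-negative integer");
  if (n === 0) return "0";
  let out = "";
  while (n > 0) {
    out = DIGITS[n % 60] + out; // prepend the least significant remaining digit
    n = Math.floor(n / 60);
  }
  return out;
}

// Decode a base-60 string back into a number.
function fromBase60(s) {
  let n = 0;
  for (const ch of s) {
    const d = DIGITS.indexOf(ch);
    if (d < 0) throw new RangeError("invalid base-60 digit: " + ch);
    n = n * 60 + d;
  }
  return n;
}

console.log(toBase60(1441256019)); // "1pCSrd"
console.log(fromBase60("7V"));     // 451
```

With this alphabet the three numeric fields of the sample row shrink from 23 digits to 14 characters, which is where most of the per-row saving on the numbers comes from.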

Using an array of arrays makes your code less readable, so add comments to document the code.

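Load only the most recent 90 days and zoom to 30 days. When the user drags the brush on the range chart to the left, start fetching more data in 90-day chunks until the user stops dragging. Add the data to the existing crossfilter with the add method.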
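A small sketch of the window arithmetic behind that idea: given the earliest instant already loaded, compute the next older 90-day range to request. previousWindow is a hypothetical helper, and how the range is turned into a server request is left open because it depends on an API this thread does not describe.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the window that ends where the loaded data begins and spans `days` days.
// Both bounds are milliseconds since the Unix epoch (UTC).
function previousWindow(earliestLoadedMs, days = 90) {
  return { from: earliestLoadedMs - days * DAY_MS, to: earliestLoadedMs };
}

// While the brush keeps moving left, request one window at a time, for example:
//   const w = previousWindow(earliestLoadedMs);
//   ...fetch records with w.from <= start < w.to, then ndx.add(records)...
//   earliestLoadedMs = w.from;
```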

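As you add more and more data, you will notice your charts becoming less and less responsive. That is because you are rendering hundreds or even thousands of elements in the SVG, and the browser is getting crushed. Use the d3 quantize function to group the data points into buckets, and reduce the displayed data to 50 buckets.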
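The bucketing idea can be sketched without any library. The function below maps each value in a known range onto one of a fixed number of equal-width buckets and counts the members, which is roughly the mapping a d3 quantize scale (d3.scale.quantize in v3, d3.scaleQuantize in v4 and later) performs on a continuous domain. bucketCounts is a hypothetical name, and the default of 50 buckets follows the suggestion above.

```javascript
// Count how many values fall into each of `bucketCount` equal-width buckets over [min, max].
// Values outside the range are ignored; a value equal to max lands in the last bucket.
function bucketCounts(values, min, max, bucketCount = 50) {
  if (!(max > min)) throw new RangeError("max must be greater than min");
  const counts = new Array(bucketCount).fill(0);
  const width = (max - min) / bucketCount;
  for (const v of values) {
    if (v < min || v > max) continue;
    const i = Math.min(bucketCount - 1, Math.floor((v - min) / width));
    counts[i] += 1;
  }
  return counts;
}

console.log(bucketCounts([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0, 10, 5)); // [2, 2, 2, 2, 3]
```

A chart drawn from these counts renders at most bucketCount elements, however many raw records are loaded.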

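Quantizing is worth the effort, and it is the only way to create a scalable chart over a continuously growing dataset.

Your other option is to abandon the range chart and group the data by month, by day, and by hour. Then add a date range picker. Since your data would be grouped by month, day, and hour, you will find that even if you rode your bike every hour of every day, your result set would never exceed 8766 rows (365.25 days × 24 hours).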
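A minimal sketch of that kind of pre-aggregation, assuming each trip is represented by its start time as a Unix timestamp in seconds. countByHour is a hypothetical name; the same reduction could equally be done on the server before the data is sent.

```javascript
const HOUR_S = 3600;

// Collapse individual trip start times into one count per clock hour (UTC).
// Returns a Map keyed by the Unix timestamp at which each hour starts.
function countByHour(startTimestamps) {
  const counts = new Map();
  for (const ts of startTimestamps) {
    const hourStart = Math.floor(ts / HOUR_S) * HOUR_S;
    counts.set(hourStart, (counts.get(hourStart) || 0) + 1);
  }
  return counts;
}

const hourly = countByHour([0, 10, 3599, 3600, 7300]);
console.log(hourly.get(0), hourly.get(3600), hourly.get(7200)); // 3 1 1
```

The number of rows then depends on how many hours the data spans, not on how many trips were taken.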

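I have seen similar issues with data (working at an enterprise company), and I found a couple of ideas worth trying.

  • Your data has a regular structure, so you can put the keys in the first row and only the data in the following rows, imitating CSV (header first, data next).
  • Date-times can be changed to epoch numbers (and you can move the start of the epoch to January 1, 2015 and convert back when the data is received).
  • Use oboe.js when getting the r…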
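The shifted-epoch idea from the list above can be sketched as follows. The constant and function names are hypothetical, and the sketch assumes timestamps in whole seconds, as in the sample data.

```javascript
// Seconds between the Unix epoch and 2015-01-01T00:00:00Z.
const EPOCH_2015 = Date.UTC(2015, 0, 1) / 1000; // 1420070400

// Server side: store timestamps relative to 2015-01-01 so the numbers are shorter.
function toShifted(unixSeconds) {
  return unixSeconds - EPOCH_2015;
}

// Client side: turn a shifted value back into a Date when the data is received.
function fromShifted(shiftedSeconds) {
  return new Date((shiftedSeconds + EPOCH_2015) * 1000);
}

console.log(toShifted(1449655898));          // 29585498
console.log(fromShifted(0).toISOString());   // "2015-01-01T00:00:00.000Z"
```

For the sample value this saves two characters per timestamp; the saving is larger the closer the data is to the new epoch.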
    {"CNT":107498, 
     "keys": ["DATACENTER","FQDN","VALUE","CONSISTENCY_RESULT","FIRST_REC_DATE","LAST_REC_DATE","ACTIVE","OBJECT_ID","OBJECT_TYPE","CONSISTENCY_MESSAGE","ID_PARAMETER"], 
     "data": [[22,202,"4.9.416.2",0,1449655898,1453867824,-1,"","",0,45],[22,570,"4.9.416.2",0,1449655912,1453867884,-1,"","",0,45],[14,377,"2.102.453.0",-1,1449654863,1468208273,-1,"","",0,45],[14,406,"2.102.453.0",-1,1449654943,1468208477,-1,"","",0,45],[22,202,"10.2.293.0",0,1449655898,1453867824,-1,"","",0,8],[22,381,"10.2.293.0",0,1449655906,1453867875,-1,"","",0,8],[22,570,"10.2.293.0",0,1449655912,1453867884,-1,"","",0,8],[22,381,"1.80",0,1449655906,1453867875,-1,"","",0,41],[22,570,"1.80",0,1449655912,1453867885,-1,"","",0,41],[22,202,"4",0,1449655898,1453867824,-1,"","",0,60],[22,381,"4",0,1449655906,1453867875,-1,"","",0,60],[22,570,"4",0,1449655913,1453867885,-1,"","",0,60],[22,202,"A20",0,1449655898,1453867824,-1,"","",0,52],[22,381,"A20",0,1449655906,1453867875,-1,"","",0,52],[22,570,"A20",0,1449655912,1453867884,-1,"","",0,52],[22,202,"20140201",2,1449655898,1453867824,-1,"","",0,40],[22,381,"20140201",2,1449655906,1453867875,-1,"","",0,40],[22,570,"20140201",2,1449655912,1453867884,-1,"","",0,40],[22,202,"16",-4,1449655898,1453867824,-1,"","",0,58],[22,381,"16",-4,1449655906,1453867875,-1,"","",0,58],[22,570,"16",-4,1449655913,1453867885,-1,"","",0,58],[22,202,"512",0,1449655898,1453867824,-1,"","",0,57],[22,381,"512",0,1449655906,1453867875,-1,"","",0,57],[22,570,"512",0,1449655913,1453867885,-1,"","",0,57],[22,930,"I32",0,1449656143,1461122271,-1,"","",0,66],[22,930,"20140803",-4,1449656143,1461122271,-1,"","",0,64],[14,1359,"10.2.340.19",0,1449655203,1468209257,-1,"","",0,131],[14,567,"10.2.340.19",0,1449655185,1468209111,-1,"","",0,131],[22,930,"4.9.416.0",-1,1449656143,1461122271,-1,"","",0,131],[14,1359,"10.2.293.0",0,1449655203,1468209258,-1,"","",0,13],[14,567,"10.2.293.0",0,1449655185,1468209112,-1,"","",0,13],[22,930,"4.9.288.0",-1,1449656143,1461122271,-1,"","",0,13],[22,930,"4",0,1449656143,1461122271,-1,"","",0,76],[22,930,"96",0,1449656143,1461122271,-1,"","",0,77],[22,930,"4",0,1449656143,1461122271,-1,"","",0,74],[22,930,"VMware ESXi 
5.1.0 build-2323236",0,1449656143,1461122271,-1,"","",0,17],[21,616,"A20",0,1449073850,1449073850,-1,"","",0,135],[21,616,"4",0,1449073850,1449073850,-1,"","",0,139],[21,616,"12",0,1449073850,1449073850,-1,"","",0,138],[21,616,"4",0,1449073850,1449073850,-1,"","",0,140],[21,616,"2",0,1449073850,1449073850,-1,"","",0,136],[21,616,"512",0,1449073850,1449073850,-1,"","",0,141],[21,616,"Microsoft Windows Server 2012 R2 Datacenter",0,1449073850,1449073850,-1,"","",0,109],[21,616,"4.4.5.100",0,1449073850,1449073850,-1,"","",0,97],[21,616,"3.2.7895.0",-1,1449073850,1449073850,-1,"","",0,56],[9,2029,"10.7.220.6",-4,1470362743,1478315637,1,"vmnic0","",1,8],[9,1918,"10.7.220.6",-4,1470362728,1478315616,1,"vmnic3","",1,8],[9,1918,"10.7.220.6",-4,1470362727,1478315616,1,"vmnic2","",1,8],[9,1918,"10.7.220.6",-4,1470362727,1478315615,1,"vmnic1","",1,8],[9,1918,"10.7.220.6",-4,1470362727,1478315615,1,"vmnic0","",1,8],[14,205,"934.5.45.0-1vmw",-50,1465996556,1468209226,-1,"","",0,47],[14,1155,"934.5.45.0-1vmw",-50,1465996090,1468208653,-1,"","",0,14],[14,963,"934.5.45.0-1vmw",-50,1465995972,1468208526,-1,"","",0,14],
     "done" : true}
    
        //function to convert main data to array of objects
        function convertToArrayOfObjects(data) {
            var keys = data.shift(),
                i = 0, k = 0,
                obj = null,
                output = [];
    
            for (i = 0; i < data.length; i++) {
                obj = {};
    
                for (k = 0; k < keys.length; k++) {
                    obj[keys[k]] = data[i][k];
                }
    
                output.push(obj);
            }
    
            return output;
        }
    
       [["ID1","ID2","TEXT1","STATE1","DATE1","DATE2","STATE2","TEXT2","TEXT3","ID3"],
        [14,377,"2.102.453.0",-1,1449654863,1468208273,-1,"","",0,45],
        [14,406,"2.102.453.0",-1,1449654943,1468208477,-1,"","",0,45],
        [22,202,"10.2.293.0",0,1449655898,1453867824,-1,"","",0,8],
        [22,381,"10.2.293.0",0,1449655906,1453867875,-1,"","",0,8],
        [22,570,"10.2.293.0",0,1449655912,1453867884,-1,"","",0,8],
        [22,381,"1.80",0,1449655906,1453867875,-1,"","",0,41],
        [22,570,"1.80",0,1449655912,1453867885,-1,"","",0,41],
        [22,202,"4",0,1449655898,1453867824,-1,"","",0,60],
        [22,381,"4",0,1449655906,1453867875,-1,"","",0,60],
        [22,570,"4",0,1449655913,1453867885,-1,"","",0,60],
        [22,202,"A20",0,1449655898,1453867824,-1,"","",0,52]]
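One thing to keep in mind with convertToArrayOfObjects is that data.shift() removes the header row from the caller's array. If the original table is needed afterwards, a non-mutating variant along these lines may be preferable; rowsToObjects is a hypothetical name, and like the original it only reads as many values per row as there are keys.

```javascript
// Convert [headerRow, ...dataRows] into an array of objects without modifying the input.
function rowsToObjects(table) {
  const [keys, ...rows] = table;
  return rows.map((row) => {
    const obj = {};
    keys.forEach((key, i) => {
      obj[key] = row[i];
    });
    return obj;
  });
}

const table = [["a", "b"], [1, 2], [3, 4]];
console.log(rowsToObjects(table)); // [ { a: 1, b: 2 }, { a: 3, b: 4 } ]
console.log(table.length);         // 3, the header row is still in place
```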