Performance problem with Groovy lists/maps


groovy


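I have code that compares two lists and finds the differences. So far so good: it works fine for small lists. Now I am testing with huge lists, each containing more than 300,000 maps, and processing takes more than 5 hours. Is that normal? How can I reduce the processing time?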

def list1 = [
    [cuInfo:"T12",service:"3",startDate:"14-01-16 13:22",appId:"G12355"],
    [cuInfo:"T13",service:"3",startDate:"12-02-16 13:00",appId:"G12356"],
    [cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12300"],
    [cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]

def list2 = [
    [name:"testname1",cuInfo:"T12",service:"3",startDate:"14-02-16 10:00",appId:"G12351"],
    [name:"testname1",cuInfo:"T13",service:"3",startDate:"14-01-16 13:00",appId:"G12352"],
    [name:"testname1",cuInfo:"T16",service:"3",startDate:"14-01-16 13:00",appId:"G12353"],
    [name:"testname2",cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12301"],
    [name:"testname3",cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"],
    [name:"testname3",cuInfo:"T18",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]

def m1 = [:]
def m2 = [:]

def rows = list1.collect { me ->
    [me, list2.find { it.cuInfo == me.cuInfo && it.service == me.service }]
}.findAll {
    it[1]
}.findAll {
    /*
     * This is where the differences are identified.
     * The 'name' attribute is excluded from the comparison,
     * by including only the desired attributes.
     */
    it[0] != it[1].subMap(['cuInfo', 'service', 'startDate', 'appId'])
}.collect {
    /*
     * At this point the list only contains the row pairs
     * which are different. This step identifies which columns
     * are different using asterisks.
     */
    (m1, m2) = it
    m1.keySet().each { key ->
        if(m1[key] != m2[key]) {
            m1[key] = "*${m1[key]}*"
            m2[key] = "*${m2[key]}*"
        }
    }
    [m1, m2]
}.collect {
    [it[0].values(), it[1].values()].flatten() as String[]
}

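Maybe this will help a little. I have not had time to test it, but your code has a lot of collect and findAll calls, which could be causing the performance problem.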

def results = []

list1.each{ lst1 ->
    def list1WithDifferences = []
    def list2WithDifferences = []
    def add = false

    def match = list2.find{ lst2 -> lst2.cuInfo == lst1.cuInfo && lst2.service == lst1.service }

    match.each{k, v ->
        if(k != 'name'){
            if(v != lst1[k]){
                add = true
                list1WithDifferences << "*${lst1[k]}*"
                list2WithDifferences << "*${v}*"
            }else{
                list1WithDifferences << v
                list2WithDifferences << v
            }
        }else{
            list2WithDifferences << v
        }
    }

    if(add){
        results << list1WithDifferences + list2WithDifferences
    }
}

println(results)
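The loop above still calls list2.find once for every element of list1, and each call is a linear scan of list2. One way to avoid the repeated scans, sketched here as an assumption rather than something tested on the 300,000-row data, is to build a hash index over list2 once and look each row up by its (cuInfo, service) pair. The names index and fields and the shortened sample rows are hypothetical.

```groovy
// Hypothetical sample data, shaped like the lists in the question.
def list1 = [
    [cuInfo: "T12", service: "3",  startDate: "14-01-16 13:22", appId: "G12355"],
    [cuInfo: "T15", service: "10", startDate: "26-02-16 10:20", appId: "G12999"],
    [cuInfo: "T99", service: "1",  startDate: "01-01-16 00:00", appId: "G00000"]
]
def list2 = [
    [name: "testname1", cuInfo: "T12", service: "3",  startDate: "14-02-16 10:00", appId: "G12351"],
    [name: "testname3", cuInfo: "T15", service: "10", startDate: "26-02-16 10:20", appId: "G12999"]
]

// Build the index in a single pass over list2.
// The key is the list [cuInfo, service]; lists compare and hash by value.
def index = new HashMap()
list2.each { row ->
    def key = [row.cuInfo, row.service]
    // Keep only the first row per key, which is what find() would have returned.
    if (!index.containsKey(key)) {
        index.put(key, row)
    }
}

// Only these attributes are compared; 'name' is ignored as in the question.
def fields = ['cuInfo', 'service', 'startDate', 'appId']

def results = []
list1.each { row1 ->
    def row2 = index.get([row1.cuInfo, row1.service])
    if (row2 == null) return                               // no counterpart in list2
    if (fields.every { row1[it] == row2[it] }) return      // rows are identical

    // Mark differing values with asterisks on both sides.
    def left  = fields.collect { row1[it] == row2[it] ? row1[it] : "*${row1[it]}*".toString() }
    def right = fields.collect { row1[it] == row2[it] ? row2[it] : "*${row2[it]}*".toString() }
    results << (left + right)
}

println(results)
```

With this layout the work is roughly one pass over each list instead of one scan of list2 per row of list1. Unlike the code above, the output here leaves out the name value, and the source maps are not modified.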

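[...] do some profiling to get actual data on the performance bottleneck, but you should consider using real data objects instead of maps, perhaps with some @Compile... annotation or actual Java code. Also, most Groovy constructs carry a performance overhead, since Groovy does a lot under the covers.

[...] settings, -Xmx3G? The first collect on the large lists is, in the worst case, 90B iterations to return 300K new lists... Run it while watching memory usage in jvisualvm or jmc to see whether you are filling up the RAM and forcing a lot of GC.

@tim_yates I know the bottleneck is the first collect. I tried the code on two different machines and it gave the same result.

Do both machines have the same RAM and default JVM settings? If memory is the only issue, the results would be the same.

@tim_yates, the machines are different: one is Linux, the other is Windows, and the JVMs are different too. In any case, for data processing such as finding differences I will try stored procedures and use Groovy for the rest.

Thanks, this helps a little, but the processing time is still long.

I think you may have some other problem. I tested this procedure in the Groovy console with 300000 maps and it took 6000 ms, whereas the original one could not even finish and froze my console.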
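As a rough illustration of the suggestion to use data objects instead of maps, the sketch below declares a small typed class and a statically compiled comparison helper. The class names Row and RowDiff and the method changedFields are hypothetical, and whether this gives a measurable speedup on the real data has not been checked here.

```groovy
import groovy.transform.CompileStatic
import groovy.transform.EqualsAndHashCode

// Hypothetical typed record standing in for the map rows from the question.
@CompileStatic
@EqualsAndHashCode
class Row {
    String cuInfo
    String service
    String startDate
    String appId
}

@CompileStatic
class RowDiff {
    // Returns the names of the fields whose values differ between the two rows.
    static List<String> changedFields(Row a, Row b) {
        List<String> changed = []
        if (a.cuInfo != b.cuInfo) changed << 'cuInfo'
        if (a.service != b.service) changed << 'service'
        if (a.startDate != b.startDate) changed << 'startDate'
        if (a.appId != b.appId) changed << 'appId'
        return changed
    }
}

def a = new Row(cuInfo: 'T12', service: '3', startDate: '14-01-16 13:22', appId: 'G12355')
def b = new Row(cuInfo: 'T12', service: '3', startDate: '14-02-16 10:00', appId: 'G12351')
def c = new Row(cuInfo: 'T12', service: '3', startDate: '14-01-16 13:22', appId: 'G12355')

println(RowDiff.changedFields(a, b))
println(a == c)
```

Because of @EqualsAndHashCode, two Row instances with the same field values compare equal, so whole-row equality can be checked before looking at individual fields.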