Groovy list/map performance issue
I have code that compares two lists and finds the differences. So far it works fine for small lists, but now I'm testing with large lists: two lists containing more than 300,000 maps. Processing them takes more than 5 hours. Is that normal? How can I reduce the processing time?
def list1 = [
    [cuInfo:"T12",service:"3",startDate:"14-01-16 13:22",appId:"G12355"],
    [cuInfo:"T13",service:"3",startDate:"12-02-16 13:00",appId:"G12356"],
    [cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12300"],
    [cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
def list2 = [
    [name:"testname1",cuInfo:"T12",service:"3",startDate:"14-02-16 10:00",appId:"G12351"],
    [name:"testname1",cuInfo:"T13",service:"3",startDate:"14-01-16 13:00",appId:"G12352"],
    [name:"testname1",cuInfo:"T16",service:"3",startDate:"14-01-16 13:00",appId:"G12353"],
    [name:"testname2",cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12301"],
    [name:"testname3",cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"],
    [name:"testname3",cuInfo:"T18",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]
def m1 = [:]
def m2 = [:]
def rows = list1.collect { me ->
    [me, list2.find { it.cuInfo == me.cuInfo && it.service == me.service }]
}.findAll {
    it[1]
}.findAll {
    /*
     * This is where the differences are identified.
     * The 'name' attribute is excluded from the comparison,
     * by including only the desired attributes.
     */
    it[0] != it[1].subMap(['cuInfo', 'service', 'startDate', 'appId'])
}.collect {
    /*
     * At this point the list only contains the row pairs
     * which are different. This step identifies which columns
     * are different using asterisks.
     */
    (m1, m2) = it
    m1.keySet().each { key ->
        if(m1[key] != m2[key]) {
            m1[key] = "*${m1[key]}*"
            m2[key] = "*${m2[key]}*"
        }
    }
    [m1, m2]
}.collect {
    [it[0].values(), it[1].values()].flatten() as String[]
}
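Maybe this will help a bit. I haven't had time to test it, but your code has a lot of collect and findAll calls, which could be causing the performance problems.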
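def results = []
list1.each { lst1 ->
    def list1WithDifferences = []
    def list2WithDifferences = []
    def add = false
    // Find the matching row in list2 by cuInfo and service
    def match = list2.find { lst2 -> lst2.cuInfo == lst1.cuInfo && lst2.service == lst1.service }
    if (!match) return   // no counterpart in list2: nothing to compare
    // Walk the matched row once, marking differing values with asterisks;
    // 'name' only exists in list2, so it is copied but not compared
    match.each { k, v ->
        if (k != 'name') {
            if (v != lst1[k]) {
                add = true
                list1WithDifferences << "*${lst1[k]}*"
                list2WithDifferences << "*${v}*"
            } else {
                list1WithDifferences << v
                list2WithDifferences << v
            }
        } else {
            list2WithDifferences << v
        }
    }
    // Keep only pairs that differ in at least one compared column
    if (add) {
        results << list1WithDifferences + list2WithDifferences
    }
}
println(results)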
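The loop above avoids the intermediate lists, but it still calls `list2.find` inside `list1.each`, so the lookup cost still grows with the product of the two list sizes. A rough way to check how the lookup step scales, assuming synthetic data, is to time it with and without a hash index. `makeLists`, `countWithFind`, `countWithIndex` and `timeMs` are hypothetical helpers, and the absolute timings will depend on the machine and JVM.

```groovy
// Hypothetical synthetic data: n maps per list; every list1 row has exactly one
// counterpart in list2 with the same [cuInfo, service] key.
List<List<Map>> makeLists(int n) {
    List<Map> l1 = (0..<n).collect { i -> [cuInfo: 'T' + i, service: '3', appId: 'G' + i] }
    List<Map> l2 = l1.collect { m -> [name: 'testname', cuInfo: m.cuInfo, service: m.service, appId: m.appId] }
    [l1, l2]
}

// Linear scan per row, as with list2.find: O(n * m) comparisons overall
int countWithFind(List<Map> l1, List<Map> l2) {
    l1.count { a -> l2.find { b -> b.cuInfo == a.cuInfo && b.service == a.service } }
}

// One pass to build a hash index on [cuInfo, service], then an average O(1) lookup per row
int countWithIndex(List<Map> l1, List<Map> l2) {
    Map index = [:]
    l2.each { b -> index.putIfAbsent([b.cuInfo, b.service], b) }
    l1.count { a -> index.get([a.cuInfo, a.service]) }
}

// Wall-clock milliseconds for one call of body (no JIT warm-up, so only a rough guide)
long timeMs(Closure body) {
    long start = System.nanoTime()
    body.call()
    (System.nanoTime() - start).intdiv(1000000)
}

def (rows1, rows2) = makeLists(2000)
long findMs = timeMs { countWithFind(rows1, rows2) }
long indexMs = timeMs { countWithIndex(rows1, rows2) }
println "find: ${findMs} ms, index: ${indexMs} ms"
```

Re-running with a larger `n` should show the `find` timing growing roughly with the square of `n`, while the index timing grows roughly linearly.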
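Do some profiling to get real data on the performance bottleneck, but you should consider using real data objects instead of maps, maybe with some @CompileStatic, or actual Java code. Also, most Groovy constructs carry more performance overhead, since Groovy does a lot under the covers.
What are your JVM settings? -Xmx3G?
The first collect over the big list is, in the worst case, 90B iterations to return 300K new lists… Run it while watching memory usage in jvisualvm or jmc to see whether you are maxing out RAM and forcing a lot of GC.
@tim_yates I know the bottleneck is the first collect. I tried the code on two different machines and it gave the same result.
Do both machines have the same RAM and default JVM settings? If memory is the only issue, the results would be the same.
@tim_yates, the machines are different: one runs Linux, the other Windows, and the JVMs are different too. But anyway, for data processing such as finding the differences I will try using a stored procedure, and use Groovy for the rest.
Thanks, this helped a bit, but the processing time is still long.
I think you may have some other problem. I tested this process in the Groovy console with 300,000 maps and it took 6000 ms, while the original one couldn't even finish and froze my console.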
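One of the comments suggests real data objects instead of maps, combined with @CompileStatic. A minimal sketch of that idea, assuming the five fields that appear in the question's maps; `Row` and `RowDiff` are hypothetical class names. Because rows are matched on `cuInfo` and `service`, only `startDate` and `appId` can differ within a matched pair, so only those two are compared.

```groovy
import groovy.transform.CompileStatic

// Hypothetical typed row replacing the maps; property names mirror the map keys.
// 'name' is only populated for rows coming from list2.
@CompileStatic
class Row {
    String name
    String cuInfo
    String service
    String startDate
    String appId
}

@CompileStatic
class RowDiff {
    // Returns one String[] per matched pair whose startDate or appId differ,
    // with differing values wrapped in asterisks as in the question.
    static List<String[]> diff(List<Row> list1, List<Row> list2) {
        // Hash index on the join columns, built once: O(n + m) instead of O(n * m)
        Map<List<String>, Row> index = new HashMap<List<String>, Row>()
        for (Row r : list2) {
            index.putIfAbsent([r.cuInfo, r.service], r)   // keep the first match, like find
        }
        List<String[]> out = new ArrayList<String[]>()
        for (Row a : list1) {
            Row b = index.get([a.cuInfo, a.service])
            if (b == null) continue
            boolean dateDiffers = a.startDate != b.startDate
            boolean appDiffers = a.appId != b.appId
            if (!dateDiffers && !appDiffers) continue
            List<String> row = [a.cuInfo, a.service, mark(a.startDate, dateDiffers), mark(a.appId, appDiffers),
                                b.name, b.cuInfo, b.service, mark(b.startDate, dateDiffers), mark(b.appId, appDiffers)]
            out.add(row.toArray(new String[0]))
        }
        out
    }

    // Wraps a value in asterisks when it differs from its counterpart
    private static String mark(String value, boolean differs) {
        differs ? '*' + value + '*' : value
    }
}

// Usage with one pair taken from the question's data
def diffs = RowDiff.diff(
    [new Row(cuInfo: 'T14', service: '9', startDate: '10-01-16 11:20', appId: 'G12300')],
    [new Row(name: 'testname2', cuInfo: 'T14', service: '9', startDate: '10-01-16 11:20', appId: 'G12301')])
diffs.each { println it.toList() }
// prints [T14, 9, 10-01-16 11:20, *G12300*, testname2, T14, 9, 10-01-16 11:20, *G12301*]
```

Statically compiled classes with typed fields avoid dynamic dispatch and per-entry map lookups, but whether that matters next to the indexing change would need profiling on the real data.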