Performance issue with Groovy lists/maps

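I have code that compares two lists and finds the differences. So far so good: it works on small lists. Now I am testing with very large lists, two of them, each containing more than 300,000 maps. Processing takes more than 5 hours. Is that normal? How can I reduce the processing time?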

def list1 = [
    [cuInfo:"T12",service:"3",startDate:"14-01-16 13:22",appId:"G12355"],
    [cuInfo:"T13",service:"3",startDate:"12-02-16 13:00",appId:"G12356"],
    [cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12300"], 
    [cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]
]

def list2 = [
    [name:"testname1",cuInfo:"T12",service:"3",startDate:"14-02-16 10:00",appId:"G12351"],
    [name:"testname1",cuInfo:"T13",service:"3",startDate:"14-01-16 13:00",appId:"G12352"],
    [name:"testname1",cuInfo:"T16",service:"3",startDate:"14-01-16 13:00",appId:"G12353"],
    [name:"testname2",cuInfo:"T14",service:"9",startDate:"10-01-16 11:20",appId:"G12301"], 
    [name:"testname3",cuInfo:"T15",service:"10",startDate:"26-02-16 10:20",appId:"G12999"],
    [name:"testname3",cuInfo:"T18",service:"10",startDate:"26-02-16 10:20",appId:"G12999"]  
]

def m1 = [:]
def m2 = [:]
def rows = list1.collect { me -> 
    [me, list2.find { it.cuInfo == me.cuInfo && it.service == me.service }] 
}.findAll {
    it[1]
}.findAll { 
    /*
     * This is where the differences are identified.
     * The 'name' attribute is excluded from the comparison,
     * by including only the desired attributes.
     */
    it[0] != it[1].subMap(['cuInfo', 'service', 'startDate', 'appId'])
}.collect {
    /*
     * At this point the list only contains the row pairs
     * which are different. This step identifies which columns
     * are different using asterisks.
     */
    (m1, m2) = it
    m1.keySet().each { key ->
        if(m1[key] != m2[key]) {
            m1[key] = "*${m1[key]}*"
            m2[key] = "*${m2[key]}*"
        }
    }

    [m1, m2]
}.collect {
    [it[0].values(), it[1].values()].flatten() as String[]
}

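Maybe this will help a bit. I have not had time to test it, but your code has a lot of collect and findAll calls, which may be causing the performance problem.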

def results = []
list1.each{ lst1 ->

    def list1WithDifferences = []
    def list2WithDifferences = []
    def add = false

    def match = list2.find{ lst2 -> lst2.cuInfo == lst1.cuInfo && lst2.service == lst1.service }    

    match.each{k, v -> 
        if(k != 'name'){
            if(v != lst1[k]){
                add = true
                list1WithDifferences << "*${lst1[k]}*"
                list2WithDifferences << "*${v}*"
            }else{
                list1WithDifferences << v
                list2WithDifferences << v
            }
        }else{
            list2WithDifferences << v
        }
    }

    if(add){
        results << list1WithDifferences + list2WithDifferences
    }
}
println(results)

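Do some profiling to get actual data on where the performance bottleneck is, but you should consider using real data objects instead of maps, and perhaps some @CompileStatic or actual Java code. Also, most Groovy constructs carry a performance overhead, because Groovy does a lot under the covers.

... settings -Xmx3G?

The first collect over the big list is, in the worst case, 90 billion iterations to return 300K new lists... Run it while watching memory usage in jvisualvm or jmc to see whether you are filling up RAM and forcing a lot of GC.

@tim_yates I know the bottleneck is the first collect. I tried the code on two different machines and it gave the same result.

Do the two machines have the same RAM and default JVM settings? If memory is the only issue, the result would be the same.

@tim_yates, the machines are different: one is Linux, the other is Windows, and the JVMs differ too. Anyway, for data processing such as finding differences, I will try a stored procedure and use Groovy for the rest.

Thanks, this helped a bit, but the processing time is still long.

I think you may have some other problem. I tested this in the Groovy console with 300,000 maps and it took 6000 ms, while the original could not even finish and froze my console.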
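
The 90-billion-iteration worst case mentioned in the comments comes from calling list2.find once for every element of list1. One possible way around it, which nobody in the thread actually posted, is to index list2 by (cuInfo, service) once and then do a hash lookup per row. The sketch below is untested at the 300,000-row scale; diffRows, index and mark are hypothetical names, and it assumes the field values are plain Strings. It returns plain lists rather than String[] and leaves the input maps unmodified.

```groovy
// Sketch only: diffRows, index and mark are illustrative names, not from the thread.
List diffRows(List<Map> list1, List<Map> list2) {
    def compared = ['cuInfo', 'service', 'startDate', 'appId']

    // Build the index once, in a single pass over list2. putIfAbsent keeps the
    // first row per key, which is the row list2.find { ... } would have returned.
    def index = new HashMap()
    list2.each { row -> index.putIfAbsent([row.cuInfo, row.service], row) }

    def rows = []
    list1.each { left ->
        def right = index.get([left.cuInfo, left.service])  // hash lookup, no scan
        if (right == null) return                            // no counterpart in list2

        def changed = compared.findAll { left[it] != right[it] }
        if (changed.isEmpty()) return                        // identical apart from 'name'

        // Wrap the changed columns in asterisks without mutating the input maps.
        def mark = { Map row ->
            row.collect { k, v -> (k in changed) ? "*${v}*".toString() : v }
        }
        rows << (mark(left) + mark(right))
    }
    rows
}

// Small usage example with one row per list.
def a = [[cuInfo: 'T14', service: '9', startDate: '10-01-16 11:20', appId: 'G12300']]
def b = [[name: 'testname2', cuInfo: 'T14', service: '9', startDate: '10-01-16 11:20', appId: 'G12301']]
println diffRows(a, b)
```

With the index in place, the work is one pass over each list instead of one scan of list2 per row of list1, so the cost grows with the sum of the list sizes, not their product.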