Ruby on Rails: matching instance records across datasets by specific values
I have a solution that works fine, but it isn't performant and takes a while to run. Let's start with what the two initial queries (both double joins) return.

The first set of data looks like this; we'll call it `line_items`. As you'll see, `line_items` has no `dh_first_name` key/value:
[
[
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "101.0",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "11.62",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133590.0",
pbbname: "CUSTOMER",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269780,
ops_order_id: 133590,
ops_driver1: 104,
ops_delivered_time: null
},
{
pb_id: "133220.0",
pbbname: "CUSTOMER",
opl_amount: "625.0",
ops_type: "D",
ops_stop_id: 269011,
ops_order_id: 133220,
ops_driver1: 62,
ops_delivered_time: "2021-04-01T12:35:00.000-05:00"
},
{
pb_id: "133357.0",
pbbname: "CUSTOMER",
opl_amount: "550.0",
ops_type: "D",
ops_stop_id: 269290,
ops_order_id: 133357,
ops_driver1: 92,
ops_delivered_time: "2021-04-01T09:38:00.000-05:00"
},
{
pb_id: "133219.0",
pbbname: "CUSTOMER",
opl_amount: "1267.06",
ops_type: "P",
ops_stop_id: 269008,
ops_order_id: 133219,
ops_driver1: 43,
ops_delivered_time: null
},
{
pb_id: "133577.0",
pbbname: "CUSTOMER",
opl_amount: "150.0",
ops_type: "P",
ops_stop_id: 269754,
ops_order_id: 133577,
ops_driver1: 94,
ops_delivered_time: null
},
{
pb_id: "133503.0",
pbbname: "CUSTOMER",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269592,
ops_order_id: 133503,
ops_driver1: 104,
ops_delivered_time: null
},
{
pb_id: "133643.0",
pbbname: "HALLMARK CARDS BERMAN BLAKE",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269895,
ops_order_id: 133643,
ops_driver1: 104,
ops_delivered_time: null
}
]
]
Now let's look at the next set of data, from the second double join: `line_stops`. It looks like this:
[
{
pb_id: "133633.0",
pbbname: "CUSTOMER",
pb_net_rev: "250.0",
ops_driver1: 59,
ops_stop_id: 269869,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-02T13:07:00.000-05:00"
},
{
pb_id: "133127.0",
pbbname: "CUSTOMER",
pb_net_rev: "1147.0",
ops_driver1: 102,
ops_stop_id: 268801,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: null
},
{
pb_id: "133144.0",
pbbname: "CUSTOMER",
pb_net_rev: "650.0",
ops_driver1: 71,
ops_stop_id: 268836,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-01T14:38:00.000-05:00"
},
{
pb_id: "133144.0",
pbbname: "CUSTOMER",
pb_net_rev: "650.0",
ops_driver1: 71,
ops_stop_id: 268837,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: null
},
{
pb_id: "133188.0",
pbbname: "CUSTOMER",
pb_net_rev: "700.0",
ops_driver1: 71,
ops_stop_id: 268924,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-01T08:04:00.000-05:00"
},
]
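What I'm currently doing is looping through both variables and matching them on these values: `ops_stop_id`, `ops_driver1`, `pb_id`.
If all three match, I need to group the records under the particular driver's name, which can only come from the instance that has `dh_first_name`. When finished, the data structure looks like this: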
{
FIRST LAST: [
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "101.0",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "11.62",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133536.0",
pbbname: "CUSTOMER",
opl_amount: "45.0",
ops_type: "P",
ops_stop_id: 269665,
ops_order_id: 133536,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133536.0",
pbbname: "CUSTOMER",
opl_amount: "5.18",
ops_type: "P",
ops_stop_id: 269665,
ops_order_id: 133536,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133522.0",
pbbname: "CUSTOMER",
opl_amount: "150.0",
ops_type: "P",
ops_stop_id: 269637,
ops_order_id: 133522,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133619.0",
pbbname: "CUSTOMER",
pb_net_rev: "550.0",
ops_driver1: 11,
ops_stop_id: 269841,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-02T11:41:00.000-05:00"
}
],
You'll see a mix of both kinds of records, with the matching ones organized correctly under the driver's name.
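To make the matching rule concrete, here is a minimal sketch in plain Ruby. It assumes hypothetical `LineItem`/`LineStop` structs standing in for the ActiveRecord rows (only the fields the rule needs) and hypothetical helpers `match_key` and `driver_name`; the stop is made up so that it shares the three fields with the first sample line item. Treating the three values as one Array lets them be compared, or used as a Hash key, as a single composite key.

```ruby
# Hypothetical stand-ins for the ActiveRecord rows (only the fields the rule uses).
LineItem = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :opl_amount, keyword_init: true)
LineStop = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :dh_first_name, :dh_last_name,
                      keyword_init: true)

# Hypothetical helper: the three fields that must all be equal for a match.
# Arrays compare element by element, so the triple acts as one composite key.
def match_key(record)
  [record.ops_stop_id, record.ops_driver1, record.pb_id]
end

# Hypothetical helper: only stops carry dh_first_name / dh_last_name.
def driver_name(stop)
  "#{stop.dh_first_name} #{stop.dh_last_name}"
end

item = LineItem.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11, opl_amount: "101.0")
# Made-up stop sharing the item's three fields.
stop = LineStop.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11,
                    dh_first_name: "FIRST", dh_last_name: "LAST")

matched = match_key(item) == match_key(stop)
# On a match, the line item is filed under the stop's driver name.
grouped = matched ? { driver_name(stop) => [item] } : {}
p grouped.keys # prints ["FIRST LAST"]
```

Because the name lives only on the stop side, a line item can only be grouped once it has been paired with a stop.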
This is how I'm currently solving it:
merger = {}
stops_arr = [] # stops that matched at least one line item

line_items.each do |lines|
  line_stops.each do |stops|
    if lines.ops_stop_id == stops.ops_stop_id &&
       lines.ops_driver1 == stops.ops_driver1 &&
       lines.pb_id == stops.pb_id
      stops_arr.push(stops)
      driver_name = "#{stops.dh_first_name} #{stops.dh_last_name}"
      (merger[driver_name] ||= []) << lines
    end
  end
end

# Then add each stop that did not match any line item, once
line_stops.each do |stops|
  unless stops_arr.include?(stops)
    stops_arr.push(stops)
    driver_name = "#{stops.dh_first_name} #{stops.dh_last_name}"
    (merger[driver_name] ||= []) << stops
  end
end
Your code's time complexity is O(lines.size * stops.size).

Here is my proposal; its time complexity is roughly O(lines.size + stops.size):
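def merge_key(stops)
  stops.dh_first_name + ' ' + stops.dh_last_name
end

# Note that the hash key code below may not be good enough
def hash_key(lines)
  "#{lines.ops_stop_id}#{lines.ops_driver1}#{lines.pb_id}"
end

merge = Hash.new { |hash, key| hash[key] = [] }
stops_hash = Hash.new

# O(line_stops.size)
line_stops.each do |stops|
  merge_key = merge_key(stops)
  next if merge.has_key?(merge_key) # because dup stops are not added in your code, right?
  merge[merge_key]

If you are pulling this data from the database, you should probably do this there rather than in Ruby.

I was afraid that would be the answer; I'd just prefer to write a script to do it... but I don't think there's any real way to optimize this other than through the query. I don't even know where to start implementing this logic as a SQL query.

I'd give it a try, and if it doesn't go well, ask a question with the schema, models, sample data, and expected output.

Could you provide a runnable example for the JSON variables? It can be abstract; it doesn't need all those attributes that don't tell us much (obviously they matter to your business, but I don't care about pb_net_rev :-)). Or provide the schema and the expected result, so we can help you do this in SQL.

I guess stops_arr is an array, right? If so, then I'd guess the culprit is the check !stops_arr.include?(stops). How about changing stops_arr to a hash?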
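Combining the O(lines.size + stops.size) idea with the suggestion to replace the linear `stops_arr.include?` scan, here is a minimal sketch. It is not the poster's code: `LineItem`/`LineStop` are hypothetical structs standing in for the ActiveRecord rows, and `match_key`, `driver_name` and `merge_by_driver` are hypothetical helpers. It indexes `line_stops` once by the three matching fields, does one Hash lookup per line item, and tracks matched stops in a `Set`. It uses Array keys instead of concatenated strings, since concatenation without a separator lets different triples collide (stop 1 with driver 23 and stop 12 with driver 3 both start with "123").

```ruby
require "set"

# Hypothetical stand-ins for the ActiveRecord rows (only the fields used here).
LineItem = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :opl_amount, keyword_init: true)
LineStop = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :dh_first_name, :dh_last_name,
                      keyword_init: true)

# Composite key as an Array: [1, 23] and [12, 3] stay distinct,
# whereas "#{1}#{23}" and "#{12}#{3}" are both "123".
def match_key(record)
  [record.ops_stop_id, record.ops_driver1, record.pb_id]
end

def driver_name(stop)
  "#{stop.dh_first_name} #{stop.dh_last_name}"
end

# Hypothetical helper mirroring the question's two loops in O(items + stops):
# matched line items are grouped under their stop's driver name, then every
# stop that matched no line item is appended once.
def merge_by_driver(line_items, line_stops)
  # One pass over the stops: composite key => stops sharing that key.
  stops_by_key = line_stops.group_by { |stop| match_key(stop) }
  merger = Hash.new { |hash, key| hash[key] = [] }
  matched = Set.new # constant-time membership instead of Array#include?

  # One pass over the line items: a hash lookup replaces the inner loop.
  line_items.each do |item|
    stops_by_key.fetch(match_key(item), []).each do |stop|
      matched << stop
      merger[driver_name(stop)] << item
    end
  end

  # Set#add? returns nil when the stop is already present, so matched
  # (or duplicate) stops are skipped.
  line_stops.each do |stop|
    merger[driver_name(stop)] << stop if matched.add?(stop)
  end

  merger
end
```

Usage would look like `merge_by_driver(line_items, line_stops)`. With ActiveRecord objects, `Set` membership relies on `eql?`/`hash`, which ActiveRecord bases on the record's class and primary key, so it should treat records the same way the original `include?` check does; that is worth confirming for these particular models.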