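Algorithm: I have a collection in which every element has a value, and I want to distribute the elements by value as evenly as possible across N buckets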

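I have a collection in which every element has a value, and I want to distribute the elements by value as evenly as possible across N buckets.

I tried to do this in Scala, but I never found a good approach, and my attempts easily ended up in an infinite loop.

For example, given: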

Map("table1"->100,"table2"->500,"table3"->20000,"table4"->10,"table5"->1000,"table6"->200,"table7"->10000) 
for 4 buckets I would like a result that is as even as possible:

Map("table7"->10000)
Map("table3"->20000)
Map("table1"->100,"table2"->500,"table4"->10,"table5"->1000,"table6"->200)
Map()
Or, better still:

Map("table7"->10000)
Map("table3"->20000)
Map("table1"->100,"table2"->500,"table4"->10,"table6"->200)
Map("table5"->1000)
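
To make "as even as possible" concrete, one option is to compare the bucket sums of the two candidate splits. The sketch below is only an illustration and not part of the original question: the object name `Evenness` and the helpers `bucketSums` and `spread` are hypothetical, and "spread" (largest sum minus smallest sum) is just one possible measure of evenness.

```scala
object Evenness {
  // Sum of the values in each bucket (an empty bucket sums to 0).
  def bucketSums(buckets: Seq[Map[String, Int]]): Seq[Int] =
    buckets.map(_.values.sum)

  // Largest bucket sum minus smallest bucket sum; smaller means more even.
  def spread(buckets: Seq[Map[String, Int]]): Int = {
    val sums = bucketSums(buckets)
    sums.max - sums.min
  }

  // The first split from the question (one bucket left empty).
  val first: Seq[Map[String, Int]] = Seq(
    Map("table7" -> 10000),
    Map("table3" -> 20000),
    Map("table1" -> 100, "table2" -> 500, "table4" -> 10, "table5" -> 1000, "table6" -> 200),
    Map.empty[String, Int]
  )

  // The second split from the question (table5 moved to its own bucket).
  val second: Seq[Map[String, Int]] = Seq(
    Map("table7" -> 10000),
    Map("table3" -> 20000),
    Map("table1" -> 100, "table2" -> 500, "table4" -> 10, "table6" -> 200),
    Map("table5" -> 1000)
  )

  def main(args: Array[String]): Unit = {
    println(spread(first))  // 20000
    println(spread(second)) // 19190
  }
}
```

Under this measure the second split is slightly more even than the first, because no bucket is left empty.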

I used @'s code with 10 buckets, and the result contains duplicates. The actual data is:

val testMap=Map("test_7d_all_qr "->1,"test_daily_advertise_position "->1,"test_7d_province "->94,"test_city_statistics "->34916,"test_30days_page_view "->11,"test_30days_ad_anomaly_auth_analyze "->26,"tt_user_grouping "->1,"oauth_refresh_tokens "->42,"tt_daily_share "->4476,"test_ad_anomaly_ip_analyze "->148,"test_7d_ad_anomaly_time_analyze "->15,"share_tool_template "->1,"test_30days_visit_duration "->70,"tt_qr_group_01 "->1,"tt_seven_user_tmp "->35890,"test_qr_code_statistics_temporary "->1,"tt_cms_scene "->1,"test_30days_terminal_analysis "->1406,"test_7d_terminal_analysis "->765,"test_7d_single_scene "->51,"test_30days_single_scene "->77,"test_monitor_daily_new_user "->1,"tt_ad_authorize "->1,"tt_sdk_version_appcount "->1,"test_ad_anomaly_auth_analyze "->428,"tt_attribute_relationship "->1,"test_7d_page_view "->9,"tt_ad_user_info "->1,"phone_model "->1,"share_tool_var "->1,"share_tool_type_template "->1,"tt_stay_logs_link "->8,"test_advertise_home_hour "->4,"test_monitor_hourly "->1,"test_7d_phonebrand "->71,"tt_cms_system "->1,"test_7d_city_statistics "->297,"test_link_cheat_protect "->1,"test_daily_advertise_link "->1,"test_30days_trend_analysis "->17,"tt_cms_log "->1,"test_hourly_advertise_position "->6,"push_join_info "->1,"test_monitor_daily_app "->5,"test_7d_event "->491,"cms_boss_words "->1,"test_daily_error "->13903,"test_user_activity "->569,"tt_event "->127,"test_link_summary_hour "->290,"test_7d_single_qr_group "->1,"tt_sdk_log "->1,"tt_code "->42,"test_page_view "->11689,"tt_7d_device_statistics "->709,"test_7d_ad_anomaly_ip_analyze "->15,"cms_wx_data "->1,"tt_sdk_version_apps "->311,"test_qr_code_statistics "->138,"test_daily_link "->709,"test_30days_event "->974,"test_7d_error "->1,"test_30days_city_statistics "->527,"tt_link_trace "->8,"test_7d_single_position "->1,"test_30days_single_link "->5,"test_daily_event "->33480,"tt_share_page "->53,"tt_30days_hierarchy_share "->1,"tt_stay_logs "->17149,"test_dailyshare_gender 
"->282,"test_hourly_advertise_media "->6,"test_monitor_daily_link "->4,"tt_device_statistics "->66629,"oauth_authorization_codes "->73,"test_ad_anomaly_all_analyze "->5488,"test_30days_userShare_top10 "->943,"test_daily_funnel "->782,"oauth_scopes "->1,"top100070922 "->0,"test_7d_userShare_top10 "->639,"tt_hierarchy_share "->148,"oauth_clients "->1,"user_phone "->0,"tt_sms_queue "->2,"tt_cms_article "->1,"test_monitor_daily "->1,"cms_source_type "->1,"test_30days_entrance_page "->11,"test_30days_ad_anomaly_all_analyze "->636,"oauth_jwt "->1,"tt_cms_user "->1,"tt_ci_sessions "->38,"test_7d_link_monitor "->1,"test_daily_link_monitor "->2717,"test_hourly_qr_group "->239,"tt_7d_hierarchy_share "->1,"test_30days_event_paras "->1,"tt_event_link_funnel "->2,"test_blackIP_setting "->1,"test_visit_duration "->2476,"tt_qr_group "->19,"test_hourly_scene "->41057,"test_30days_ad_anomaly_time_analyze "->26,"test_hourly_scene_group "->24908,"tt_smartLink_dict_name "->1,"test_scene_statistics "->7868,"test_30days_single_entrance_page "->59,"test_online_status "->1,"convert_path "->1,"user_group_funnel_daily "->7,"test_daily_media "->238,"test_grey_ak "->1,"cms_market "->1,"test_dailyshare_top10 "->10329,"test_hourly_qr "->334,"test_monitor_daily_trend "->8,"test_dailyshare_source "->224,"test_ad_anomaly_time_analyze "->158,"test_30days_scene_group "->34,"test_hourly_share_summary "->2895,"test_30days_single_page_view "->107,"service_user_view "->1,"tt_funnel "->1,"test_30days_single_qr_group "->1,"test_monitor_hourly_link "->12,"test_30days_error "->1,"test_daily_scene_group "->3702,"test_smartLink_day_analysis "->5770,"tt_prevent_cheat "->1,"test_hourly_advertise_link "->8,"tt_wechat_user_bind "->440049,"user_apps "->110,"test_dailyshare_page "->1394,"test_link_summary "->57,"test_smartLink_summary_analysis "->352,"share_tool_code "->1,"test_7d_trend_analysis "->15,"test_7d_event_paras "->1,"tt_sdk_history "->1,"tt_mini_code_authorize "->1,"test_visit_depth 
"->1852,"test_daily_entrance_page "->1254,"tt_private_construct_info "->1,"test_monitor_hourly_trend "->14,"tt_user_etl "->12914,"test_7d_scene_group "->26,"test_30days_link_summary "->1,"test_7d_visit_depth "->43,"tt_task "->1,"tt_code_share "->29,"tt_authorization "->1,"test_daily_advertise_media "->1,"share_tool_user_template "->1,"tt_30days_device_statistics "->1301,"tt_code_tool "->21,"test_30days_single_media "->2,"tt_scene_link "->1,"tt_code_sem "->2,"tt_stay_logs_media "->4,"tt_mini_radio "->1,"tt_sdk_notice "->1,"test_entrance_page "->5209,"test_terminal_analysis "->91350,"test_7d_single_entrance_page "->39,"test_7d_visit_duration "->57,"cms_analy_bind "->1,"test_30days_link_monitor "->1,"test_hourly_trend_analysis_debug "->1,"test_advertise_home "->1,"tt_stay_logs_position "->4,"test_30days_single_position "->1,"tt_smartLink_dict "->114,"test_ad_anomaly_all_analyze_hour "->1,"test_monitor_hourly_qr "->1,"test_smartLink_hour_analysis "->39287,"test_hourly_position "->843,"tt_cms_scene_type "->1,"test_dailyshare_city "->2915,"test_operation_log "->1,"oauth_access_tokens "->42,"share_tool_type "->1,"test_hourly_media "->1432,"tt_mini_routine_push "->6,"test_daily_position "->92,"test_app_summary "->1,"test_trend_analysis_debug "->1,"test_daily_phonebrand "->6565,"test_daily_share_summary "->371,"test_third_setup "->1,"user_app_relations "->20,"test_7d_single_link "->2,"tt_field_mapping "->1,"test_event_paras "->684271,"test_7d_entrance_page "->9,"test_dailyshare_user "->548969,"test_region_statistics "->11658,"test_7d_ad_anomaly_all_analyze "->396,"test_30days_visit_depth "->53,"tt_authorize "->6,"test_30days_province "->148,"tt_ad_activity_hourly "->714,"test_7d_link_summary "->1,"oauth_users "->1,"test_monitor_daily_qr "->1,"test_30days_phonebrand "->117,"test_7d_single_media "->1,"tt_db_split "->1,"test_30days_ad_anomaly_ip_analyze "->25,"service_user "->1,"test_monitor_hourly_share "->34,"test_hourly_trend_analysis "->16825,"test_30days_all_qr 
"->1,"test_trend_analysis "->2196,"aladdin_user "->4,"test_30days_single_qr "->3,"test_daily_page_view "->1295,"tt_stay_logs_bak "->250,"test_hourly_link "->3199,"tt_media "->3,"test_7d_ad_anomaly_auth_analyze "->15,"test_7d_single_qr "->1,"test_daily_qr_group "->37,"test_7d_single_page_view "->70,"user_feedback "->1,"test_user_activity_details "->2486,"test_monitor_daily_share "->1)
I got this result:

Map(test_event_paras  -> 684271)
Map(test_dailyshare_user  -> 548969)
Map(tt_wechat_user_bind  -> 440049)
Map(test_terminal_analysis  -> 91350)
Map(test_daily_qr_group  -> 37, tt_stay_logs_bak  -> 250, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, test_monitor_hourly_trend  -> 14, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, tt_smartLink_dict  -> 114, test_trend_analysis  -> 2196, test_30days_trend_analysis  -> 17, test_7d_terminal_analysis  -> 765)
Map(aladdin_user  -> 4, test_30days_ad_anomaly_ip_analyze  -> 25, tt_stay_logs_bak  -> 250, test_30days_scene_group  -> 34, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, tt_smartLink_dict  -> 114, test_advertise_home_hour  -> 4, test_trend_analysis  -> 2196, test_7d_terminal_analysis  -> 765)
Map(aladdin_user  -> 4, test_30days_ad_anomaly_ip_analyze  -> 25, tt_stay_logs_bak  -> 250, test_30days_scene_group  -> 34, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, tt_smartLink_dict  -> 114, test_trend_analysis  -> 2196, test_7d_terminal_analysis  -> 765, tt_media  -> 3)
Map(aladdin_user  -> 4, test_30days_ad_anomaly_ip_analyze  -> 25, tt_stay_logs_bak  -> 250, test_30days_scene_group  -> 34, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, tt_smartLink_dict  -> 114, test_trend_analysis  -> 2196, test_7d_terminal_analysis  -> 765, tt_event_link_funnel  -> 2)
Map(aladdin_user  -> 4, test_30days_ad_anomaly_ip_analyze  -> 25, tt_stay_logs_bak  -> 250, test_30days_scene_group  -> 34, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, tt_smartLink_dict  -> 114, test_trend_analysis  -> 2196, test_7d_terminal_analysis  -> 765, tt_sdk_history  -> 1)
Map(aladdin_user  -> 4, test_30days_ad_anomaly_ip_analyze  -> 25, tt_stay_logs_bak  -> 250, test_30days_scene_group  -> 34, test_hourly_link  -> 3199, test_7d_single_page_view  -> 70, tt_device_statistics  -> 66629, user_phone  -> 0, test_smartLink_summary_analysis  -> 352, test_7d_single_scene  -> 51, test_user_activity  -> 569, tt_hierarchy_share  -> 148, test_daily_page_view  -> 1295, top100070922  -> 0, tt_smartLink_dict  -> 114, test_trend_analysis  -> 2196, test_7d_terminal_analysis  -> 765)
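
A quick way to confirm duplication like this is to count how often each key occurs across the buckets and to look for input keys that ended up in no bucket. The sketch below is not from the original post: `BucketCheck`, `duplicatedKeys` and `missingKeys` are hypothetical names, and the sample buckets are a heavily abridged excerpt of the output above (trailing spaces in the keys omitted).

```scala
object BucketCheck {
  // Keys that occur in more than one bucket.
  def duplicatedKeys[V](buckets: Seq[Map[String, V]]): Set[String] =
    buckets
      .flatMap(_.keys)
      .groupBy(identity)
      .collect { case (key, occurrences) if occurrences.size > 1 => key }
      .toSet

  // Input keys that do not appear in any bucket.
  def missingKeys[V](input: Map[String, V], buckets: Seq[Map[String, V]]): Set[String] =
    input.keySet -- buckets.flatMap(_.keys)

  // Abridged sample: two buckets that both contain tt_stay_logs_bak.
  val sample: Seq[Map[String, Int]] = Seq(
    Map("test_daily_qr_group" -> 37, "tt_stay_logs_bak" -> 250),
    Map("aladdin_user" -> 4, "tt_stay_logs_bak" -> 250)
  )

  def main(args: Array[String]): Unit =
    println(duplicatedKeys(sample)) // Set(tt_stay_logs_bak)
}
```

A correct distribution should give an empty set from both helpers.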

I rewrote the method along the same lines, and the result is correct:

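import scala.collection.mutable

def distributeMy(elements: Map[String, Long], bucketCount: Long) = {
  // Sort buckets by descending sum so that the bucket with the smallest sum is last.
  // Ties are broken on the smallest key: with the sum alone, the TreeSet would treat
  // two buckets with equal sums as the same element and silently drop one of them.
  implicit val ordering: Ordering[(Long, Map[String, Long])] =
    Ordering.by[(Long, Map[String, Long]), (Long, String)](b => (-b._1, b._2.keys.min))
  // Process the elements from the largest value to the smallest.
  val sorted = elements.toList.sortBy(-_._2)
  val buckets = new mutable.TreeSet[(Long, Map[String, Long])]

  sorted.foreach { case (k, v) =>
    if (buckets.size < bucketCount) {
      // Seed a new bucket until bucketCount buckets exist.
      buckets.add((v, Map(k -> v)))
    } else {
      // Move the element into the bucket with the smallest sum.
      val smallest = buckets.last
      buckets.remove(smallest)
      buckets.add((smallest._1 + v, smallest._2 + (k -> v)))
    }
  }
  buckets.toSet
}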

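It could be done like this:

  • Sort the map's entries from highest to lowest by value.
  • Create a sorted set of buckets, each holding two values (the sum of its values and its entries), ordered from the highest sum to the lowest.
  • Take the last bucket (the one with the smallest sum), add the next element to it, and update its sum.
  • Continue until all elements have been processed.
  • Implementation: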

    import scala.collection.immutable._
    import scala.annotation.tailrec
    
    def distribute(elements: Map[String, Int], bucketCount: Int) = {
    
      //we use sorted set to make sure bucket with least sum is at the end
      implicit val ordering: Ordering[(Int, Map[String, Int])] = Ordering.by(-_._1)
    
      @tailrec
      def go(
          elements: List[(String, Int)],
          acc: SortedSet[(Int, Map[String, Int])]
      ): List[Map[String, Int]] = {
        elements match {
          case (x @ (_, value)) :: xs =>
            go(
              xs,
              //We take a bucket with the least sum of values and append a new element to it
              //with the sum updated, then we append new bucket to list and truncate it to 
              //desired elements count. 
              acc.last match {
                case (sum, bucket) => (acc + ((sum + value, bucket + x))).take(bucketCount)
              }
            )
          case Nil =>
            acc.toList.map(_._2) //at the end, we just need to drop sums and take only maps
        }
      }
    
      go(elements.toList.sortBy(-_._2), SortedSet((0 -> Map.empty[String, Int])))
    }
    
    distribute(
      Map(
        "table1" -> 100,
        "table2" -> 500,
        "table3" -> 20000,
        "table4" -> 10,
        "table5" -> 1000,
        "table6" -> 200,
        "table7" -> 10000,
        "table8" -> 1000
      ),
      4
    ).foreach(println)
    
    It prints:

    Map(table3 -> 20000)
    Map(table7 -> 10000)
    Map(table5 -> 1000)
    Map(table2 -> 500, table6 -> 200, table1 -> 100, table4 -> 10)
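
Note that `table8` does not appear in the printed buckets. A likely explanation, offered as an observation rather than something stated in the answer, is that the ordering compares buckets by their sum only, so the sorted set treats two buckets with the same sum as the same element; `table8 -> 1000` then collides with the bucket that already holds `table5 -> 1000`. One way around this is a container that allows equal priorities. The following is a minimal sketch under that assumption, using `scala.collection.mutable.PriorityQueue`; the object name `GreedyBuckets` is hypothetical, and the result is still a greedy heuristic, not a guaranteed optimal split.

```scala
import scala.collection.mutable

object GreedyBuckets {
  // Greedy sketch: visit elements from largest to smallest and always add the
  // next one to the bucket whose running sum is currently the smallest.
  def distribute(elements: Map[String, Long], bucketCount: Int): List[Map[String, Long]] = {
    require(bucketCount > 0, "bucketCount must be positive")

    // PriorityQueue dequeues its maximum, so order by the negated sum to get
    // the smallest sum first. Unlike a sorted set, it keeps equal-sum entries.
    implicit val bySmallestSum: Ordering[(Long, Map[String, Long])] =
      Ordering.by[(Long, Map[String, Long]), Long](b => -b._1)

    val queue = mutable.PriorityQueue.empty[(Long, Map[String, Long])]
    // Start with bucketCount empty buckets.
    (1 to bucketCount).foreach(_ => queue.enqueue((0L, Map.empty[String, Long])))

    elements.toList.sortBy { case (_, value) => -value }.foreach { case (key, value) =>
      val (sum, bucket) = queue.dequeue()
      queue.enqueue((sum + value, bucket + (key -> value)))
    }

    // Return the buckets with the largest sum first, dropping the sums.
    queue.toList.sortBy { case (sum, _) => -sum }.map(_._2)
  }
}
```

With the eight-table input above and four buckets, this sketch yields bucket sums of 20000, 10000, 1500 and 1310, and every table appears in exactly one bucket. Which of the two 1000-valued tables lands in which bucket depends on how the ties are resolved.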


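Your description/requirements are a bit vague. You say you want to "distribute the elements as evenly as possible", but your "perfect" solution leaves a difference of 19190 between the largest bucket (20000) and the smallest (810). If you distributed them into just two sets, the difference between the maximum (20000) and everything else (11810) would be only 8190. Isn't that a more even distribution?

Thank you very much for the question. If splitting into two parts is more reasonable, that is fine: the number of parts is a variable that can be changed at any time. I only gave the four-part case as an example of a more ideal solution.

I tested different numbers of parts and it looks perfect. Thank you very much.

I ran a real test and found that the evenly distributed elements contain duplicates, so there is also an error in the code.

Maybe try to isolate a test case that reproduces this unexpected behavior and add it to the question.

I have added the data; you can run a test.

It looks like it works. Can you try to reproduce it there?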