Python: counting matches between two lists, given a split of the original list


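I have a list of floats with some hidden "level" information encoded in the magnitude of each float. I can split the floats into their "levels" like this: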

import math
import numpy as np

all_scores = [1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]

easy, med, hard = [], [], []

for i in all_scores:
    if i > math.exp(50):
        easy.append(i)
    elif i > math.exp(10):
        med.append(i)
    else:
        hard.append(i)

print ([easy, med, hard])
[out]:

[[1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23, 6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23, 1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24], [9603539.08653573, 17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801, 31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014], [4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]]
I also have another list that corresponds, position by position, to the all_scores list:

input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]
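I need to check how many of the easy, med and hard scores match all_scores. I can get a boolean for each position of the flattened all_scores list, telling me whether it matches, like this: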

matches = [i == j for i, j in zip(input_scores, all_scores)]
print ([i == j for i, j in zip(input_scores, all_scores)])
[out]:

[False, True, False, True, False, False, True, False, True, False, False, True, True, False, True, False, True, True, False, True, True, True, True, True, False, True, False, False, False, True]
Is there a way to find out how many easy/med/hard scores are among the matches, and the sum of the matched scores for each level?

I have tried this, and it works:

matches = [int(i == j) for i, j in zip(input_scores, all_scores)]

print(sum(matches[:len(easy)]) , len(easy), sum(np.array(easy) * matches[:len(easy)]) )
print(sum(matches[len(easy):len(easy)+len(med)]), len(med), sum(np.array(med) * matches[len(easy):len(easy)+len(med)]) )
print (sum(matches[len(easy)+len(med):]) , len(hard), sum(np.array(hard) * matches[len(easy)+len(med):]) )
[out]:

4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0

But there must be a less verbose way to get the same output.

You can use a dict:

k = ('easy', 'medium', 'hard')
outlist = []
# each level occupies a block of 10 consecutive entries in matches
for index, i in enumerate(range(0, len(matches), 10)):
    count = {k[index]: sum(matches[i:i + 10])}
    outlist.append(count)

print(outlist)
[{'easy': 4}, {'medium': 6}, {'hard': 6}]
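
The block size of 10 in the loop above is hard-coded. As a rough sketch only (the helper name `count_by_level` and the demo data are made up for illustration, not part of the answer), the slice bounds could instead be derived from the level lengths:

```python
from itertools import accumulate

def count_by_level(matches, level_sizes):
    """Count truthy entries of `matches` in consecutive chunks.

    `level_sizes` maps a level name to the number of items in that level,
    in the same order in which the levels appear in `matches`.
    """
    ends = list(accumulate(level_sizes.values()))   # running end index of each chunk
    starts = [0] + ends[:-1]                        # each chunk starts where the previous ended
    return {name: sum(matches[s:e])
            for name, s, e in zip(level_sizes, starts, ends)}

# Made-up demo data: 3 easy, 2 medium and 4 hard items.
demo_matches = [True, False, True, False, False, True, True, True, False]
demo_counts = count_by_level(demo_matches, {'easy': 3, 'medium': 2, 'hard': 4})
print(demo_counts)  # {'easy': 2, 'medium': 0, 'hard': 3}
```

With the question's data this would be called as `count_by_level(matches, {'easy': len(easy), 'medium': len(med), 'hard': len(hard)})`, which relies on dicts preserving insertion order (Python 3.7+).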

I'm not sure whether this approach is any less verbose, but I would use np.in1d to match the scores:

# we need numpy arrays
easy = np.array(easy)
med = np.array(med)
hard = np.array(hard)

for level in [easy, med, hard]:
    matches = level[np.where(np.in1d(level, input_scores))]
    print(len(matches), len(level), np.sum(matches))
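This code does not produce the same output as yours, but I think the data you provided may be corrupted. For example, the hard array contains two copies each of 7474.0 and 4354.0. Is that expected? The easy array also contains two copies of 6.7401171871631936e+22.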

Given the current data, my method outputs:

5 10 3.58781622578e+24
6 10 143744715.777
8 10 53435.0
Also, I'm not entirely sure how you compute the sums, so I simply summed all the matched scores (which is why our values differ).
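
To illustrate why the counts differ when values are duplicated, here is a small sketch with made-up data. It uses `np.isin`, the newer name for the same membership test that `np.in1d` performs on 1-D input; the arrays `level` and `inputs` are hypothetical stand-ins for one level and the matching slice of input_scores:

```python
import numpy as np

level = np.array([4354.0, 7474.0, 4354.0, 4030.0])
inputs = np.array([4354.0, 0.0, 0.0, 4030.0])

# Positional comparison: element i is compared with element i only.
positional = level == inputs

# Membership test: "does this value occur anywhere in inputs?"
# The second 4354.0 counts as a hit even though inputs has 0.0 at that position.
membership = np.isin(level, inputs)

print(positional.sum(), membership.sum())  # 2 3
```

The membership test counts every copy of a duplicated value, whereas the question's zip-based comparison only counts positions where both lists agree.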


Edit: match against input_scores instead of all_scores. The only thing that changes is that we have to apply np.in1d twice:

input_scores = np.array(input_scores)  # indexing with np.where needs an array, not a list
scores = input_scores[np.where(np.in1d(input_scores, all_scores))]
for level in [easy, med, hard]:
    matches = scores[np.where(np.in1d(scores, level))]
    print(len(matches), len(level), np.sum(matches))
This gets rid of the earlier problem with duplicates. Output:

4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0

Edit 2: I realized that my use of np.where was redundant, so it can be removed entirely:

scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))
This produces the same output as the first edit.
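
A quick way to convince yourself that dropping np.where changes nothing: indexing with a boolean mask selects the same elements as indexing with the positions np.where returns for that mask. A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([3.0, 0.0, 5.0, 0.0, 7.0])
mask = values > 0                    # boolean array, one entry per element

via_where = values[np.where(mask)]   # index by the positions of the True entries
via_mask = values[mask]              # index by the boolean mask directly

print(via_where, via_mask)
print(np.array_equal(via_where, via_mask))  # True
```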


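Edit 3: I put everything together in one program. The split into easy/med/hard scores can also be done conveniently with numpy. It could probably be made more efficient, but this is fairly readable: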

import math
import numpy as np

all_scores = np.array([1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0])

input_scores = np.array([0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0])

easy = all_scores[math.exp(50) < all_scores]
med = all_scores[(math.exp(10) < all_scores)*(all_scores < math.exp(50))] # * is boolean `and`
hard = all_scores[all_scores < math.exp(10)]

scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))
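This sounds like a job for Counter.

In case you haven't come across it, a Counter is similar to a dict, but in methods such as .update() the new values are added to the existing ones rather than replacing them. So: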

from collections import Counter

counter = Counter({'a': 2})
counter.update({'a': 3})
counter['a']
> 5
So you can get the result above with the following code:

import math
from collections import Counter

matches, counts, scores = [
    Counter({'easy': 0, 'med': 0, 'hard': 0}) for _ in range(3)
]

for score, inp in zip(all_scores, input_scores):
    category = (
        'easy' if score > math.exp(50) else
        'med' if score > math.exp(10) else
        'hard'
    )
    matches.update({category: score == inp})
    counts.update({category: 1})
    scores.update({category: score if score == inp else 0})

for cat in ('easy', 'med', 'hard'):
    print(matches[cat], counts[cat], scores[cat])

You can use a couple of dicts as lookup tables:

import math
from collections import defaultdict

scores = defaultdict(list)  # keeps track of which category each number belongs to
values = defaultdict(int)   # counts the numbers seen per category
for i in all_scores:
    if i > math.exp(50):
        values['easy'] += 1
        scores[i] = 'easy'
    elif i > math.exp(10):
        values['med'] += 1
        scores[i] = 'med'
    else:
        values['hard'] += 1
        scores[i] = 'hard'

# input_scores is the list given in the question

# find the category of each input score
r = [(scores[i], i) for i in input_scores if i in scores]
# group by category to get the counts
res = defaultdict(list)
for k, v in r:
    res[k].append(v)
for k, v in res.items():
    print(k, len(v), values[k], sum(v))

# output as originally posted (dict order may differ):
med 6 10 143744715.777
hard 6 10 37326.0
easy 4 10 3.52041505391e+24

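Here is a numpy solution that uses np.digitize to create the categories and np.bincount to count and sum the matches. As a free bonus, the same statistics are also produced for the remaining (unmatched) items.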

categories = 'hard', 'med', 'easy'

# get group membership by splitting at e^10 and e^50
# the 'right' keyword tells digitize to include right boundaries
cat_map = np.digitize(all_scores, np.exp((10, 50)), right=True)
# cat_map has a zero in all the 'hard' places of all_scores
# a one in the 'med' places and a two in the 'easy' places

# add a fourth group to mark all non-matches
# we have to force at least one np.array for element-by-element
# comparison to work
cat_map[np.asanyarray(all_scores) != input_scores] = 3

# count
numbers = np.bincount(cat_map)
# count again, this time using all_scores as weights
sums = np.bincount(cat_map, all_scores)

# print
for c, n, s in zip(categories + ('unmatched',), numbers, sums):
    print('{:12}  {:2d}  {:6.4g}'.format(c, n, s))

# output:
#
# hard           6  3.733e+04
# med            6  1.437e+08
# easy           4  3.52e+24
# unmatched     14  5.159e+24
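
For readers who have not used these two functions before, here is a small self-contained sketch of the same technique on made-up numbers (the arrays `values` and `edges` are illustrative only, not the question's data):

```python
import numpy as np

values = np.array([5.0, 150.0, 20.0, 10.0, 100.0, 3.0])
edges = np.array([10.0, 100.0])

# With right=True the bins are:
#   0: x <= 10,   1: 10 < x <= 100,   2: x > 100
bins = np.digitize(values, edges, right=True)

counts = np.bincount(bins)                   # how many values fall in each bin
totals = np.bincount(bins, weights=values)   # sum of the values in each bin

print(bins)    # [0 2 1 0 1 0]
print(counts)  # [3 2 1]
print(totals)  # [ 18. 120. 150.]
```

Values that sit exactly on an edge (10.0 and 100.0 here) go into the lower bin, which mirrors the strict `>` comparisons in the question's loop.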
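Although your question has already been answered, I wanted to give it a try anyway (for practice). The function gives the expected output, but Paul Panzer's solution is the best one so far. :)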


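The values in all_scores and input_scores are not unique. The only things that constrain them are their order and whether their values match.

Cool, I hadn't heard of np.digitize!! By the way, what is "unmatched"? Why would there be unmatched cases?

@alvas I mean the cases where input_scores and all_scores do not match. They have to be moved into an extra group so that they are not counted together with any of the other three groups.

Ah, that makes sense. Thanks for the explanation!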