Counting strings in a list, then filtering & matching, in Python
I have a list of words, and I am using Python 3 to count the number of differing letters between every combination of two words (using a `diff_letters` function). This prints:
AAHS AALS
AAHS DAHS
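The loop that produced the listing above is not shown. A minimal sketch of what it might look like, assuming the `diff_letters` helper and the word list `w` that appear in the answers (both are assumptions as far as the question itself is concerned):

```python
import itertools

def diff_letters(a, b):
    # Count the positions at which two equal-length words differ.
    return sum(a[i] != b[i] for i in range(len(a)))

# Assumed word list, borrowed from the answers.
w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

# Keep every unordered pair of words that differs in exactly one letter.
pairs = [(x, y) for x, y in itertools.combinations(w, 2)
         if diff_letters(x, y) == 1]

for x, y in pairs:
    print(x, y)
```

With this word list, `pairs` holds the two tuples printed above; 'XYZA' has no near match and never appears.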
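My question: how can I count and record that the strings "DAHS" and "AALS" have only one partner, while "AAHS" has two? I will then filter the directional combinations down to those where each target_string has exactly one near_matching_word, so that my final data (as JSON) looks like this: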
[
  {
    "target_word": "DAHS",
    "near_matching_word": "AAHS"
  },
  {
    "target_word": "AALS",
    "near_matching_word": "AAHS"
  }
]
(Note that AAHS does not appear as a target word.)
I have a version that prints:
AALS
DAHS
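But then I have to go back to my big list pairs to find the near matching word. Of course, in my final version the list pairs will be much larger, and target_word can be either the first or the second item in the tuple (x, y).

You can use a dictionary instead of a list of pairs: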
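import itertools

def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))

w = ['AAHS', 'AALS', 'DAHS', 'XYZA']  # word list as in the other answers

pairs = {}
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.setdefault(x, []).append(y)
        pairs.setdefault(y, []).append(x)

# Keep only the words with exactly one partner (an empty tail).
result = [{"target_word": key, "near_matching_word": head}
          for key, (head, *tail) in pairs.items()
          if not tail]
print(result)

Output: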
[{'target_word': 'AALS', 'near_matching_word': 'AAHS'}, {'target_word': 'DAHS', 'near_matching_word': 'AAHS'}]
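In the pairs dictionary, the keys are the target words and the values are lists of their near matching words. The list comprehension then filters out every word that has more than one near match.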
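To make the intermediate state visible, here is a small sketch that starts from the dictionary as it would stand after the loop for the four-word list used in this thread, then serializes the filtered result with the standard `json` module. The hard-coded dictionary is written out by hand for illustration.

```python
import json

# State of the dictionary after the loop, for w = ['AAHS', 'AALS', 'DAHS', 'XYZA'].
pairs = {
    'AAHS': ['AALS', 'DAHS'],  # two partners: filtered out below
    'AALS': ['AAHS'],          # exactly one partner: kept
    'DAHS': ['AAHS'],          # exactly one partner: kept
}

# Same filter as the list comprehension, written with an explicit length check.
result = [{"target_word": key, "near_matching_word": matches[0]}
          for key, matches in pairs.items()
          if len(matches) == 1]

print(json.dumps(result, indent=2))
```

'XYZA' never becomes a key because it has no near match, so it needs no special handling.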
import itertools
import functools
import operator

def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))

w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

pairs = []
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.append((x, y))

# Flatten the pairs into one tuple; the initial () avoids an error when pairs is empty.
full_list = functools.reduce(operator.add, pairs, ())

result = []
for x in set(full_list):
    # A word that occurs exactly once has exactly one partner.
    if full_list.count(x) == 1:
        pair = next(i for i in pairs if x in i)
        match = [i for i in pair if i != x][0]
        result.append({
            "target_word": x,
            "near_matching_word": match
        })
print(result)

Output (the order may vary, because the loop iterates over a set):
[{'target_word': 'DAHS', 'near_matching_word': 'AAHS'}, {'target_word': 'AALS', 'near_matching_word': 'AAHS'}]
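The `full_list.count(x)` call rescans the flattened tuple for every word, which may become slow for a large list. One possible variation, sketched here with `collections.Counter` from the standard library, tallies the partners in a single pass. The hard-coded `pairs` list stands in for the result of the loop above.

```python
import collections

# Pairs found by the loop above for the four-word example list.
pairs = [('AAHS', 'AALS'), ('AAHS', 'DAHS')]

# Count how many pairs each word takes part in.
counts = collections.Counter(word for pair in pairs for word in pair)

result = []
for x, y in pairs:
    # Either side of the pair can be the target word.
    if counts[x] == 1:
        result.append({"target_word": x, "near_matching_word": y})
    if counts[y] == 1:
        result.append({"target_word": y, "near_matching_word": x})

print(result)
```

Because the result is built by walking `pairs` in order, the output order is deterministic, unlike iteration over a set.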
The other answers keep all pairs, even when more than one is found for a word. Since those are not needed, this seems to waste memory. This answer keeps at most one pair per string:

import collections
import itertools

def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))

w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

# Marker for pairs that have not been found yet.
NOT_FOUND = object()

# Collection of found pairs x => y. Each item is in one of three states:
# - y is NOT_FOUND if x has not been seen yet
# - y is a string if it is the only accepted pair for x
# - y is None if there is more than one accepted pair for x
pairs = collections.defaultdict(lambda: NOT_FOUND)

for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        if pairs[x] is NOT_FOUND:
            pairs[x] = y
        else:
            pairs[x] = None
        if pairs[y] is NOT_FOUND:
            pairs[y] = x
        else:
            pairs[y] = None

# Remove None's and change into normal dict.
pairs = {x: y for x, y in pairs.items() if y is not None}

for x, y in pairs.items():
    print("Target = {}, Only near matching word = {}".format(x, y))

Output:
Target = AALS, Only near matching word = AAHS
Target = DAHS, Only near matching word = AAHS
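The same three-state idea can be wrapped in a function that returns the list of dictionaries the question asks for. This is only a sketch: the name `single_partners` is made up for illustration, a plain dict with `dict.get` replaces the `defaultdict`, and the length check (words of different lengths are never near matches) is an added assumption.

```python
import itertools

def diff_letters(a, b):
    # Count differing positions; zip stops at the shorter word.
    return sum(ca != cb for ca, cb in zip(a, b))

def single_partners(words):
    """Hypothetical helper: list the words that have exactly one near match."""
    not_found = object()  # sentinel meaning "no partner recorded yet"
    partner = {}
    for x, y in itertools.combinations(words, 2):
        # Assumption: only equal-length words can be near matches.
        if len(x) == len(y) and diff_letters(x, y) == 1:
            for target, other in ((x, y), (y, x)):
                # The first partner is stored; any later one collapses the entry to None.
                if partner.get(target, not_found) is not_found:
                    partner[target] = other
                else:
                    partner[target] = None
    return [{"target_word": t, "near_matching_word": m}
            for t, m in partner.items() if m is not None]

print(single_partners(['AAHS', 'AALS', 'DAHS', 'XYZA']))
```

As in the answer above, each word stores at most one partner, so memory use stays proportional to the number of distinct words that have any near match.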
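Clever use of pairs: x and y both serve as keys, with y appended under x and x appended under y.
This may not be the most concise approach, but it works. Care to explain the downvote?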