Find the common substring from a list of strings

Tags: string, list, python-2.7

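How can I extract only the common prefix from a list of strings? Note that I do not know the prefix in advance; this function is the only way I can find out what it is.

For example: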
string_list = ["test11", "test12", "test13"]
# Prefix is test1
string_list = ["test-a", "test-b", "test-c"]
# Prefix is test-
string_list = ["test1", "test1a", "test12"]
# Prefix is test1
string_list = ["testa-1", "testb-1", "testc-1"]
# Prefix is test
If the strings in the list have nothing in common, the result should be an empty string.

Do it like this:

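def get_large_subset(lis):
    # Take the longest string and build every prefix of it, shortest first.
    k = max(lis, key=len) or lis[0]
    j = [k[:i] for i in range(len(k) + 1)]
    # Keep the prefixes found in every string; the last one is the longest.
    # Note: `y in w` is a substring test, so a match need not start at index 0.
    return [y for y in j if all(y in w for w in lis) ][-1]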

>>> print get_large_subset(["test11", "test12", "test13"])
test1
>>> print get_large_subset(["test-a", "test-b", "test-c"])
test-
>>> print get_large_subset(["test1", "test1a", "test12"])
test1
>>> print get_large_subset(["testa-1", "testb-1", "testc-1"])
test
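
One caveat with the function above: `y in w` checks whether `y` occurs anywhere in `w`, not whether `w` starts with `y`. A minimal sketch of a prefix-strict variant follows, assuming a true prefix is what is wanted. The name get_common_prefix is hypothetical and does not come from the answer.

```python
def get_common_prefix(lis):
    """Return the longest string that every item in lis starts with."""
    if not lis:
        return ''
    # The common prefix cannot be longer than the shortest string.
    shortest = min(lis, key=len)
    # Try candidates from longest to shortest and stop at the first match.
    for i in range(len(shortest), 0, -1):
        candidate = shortest[:i]
        if all(w.startswith(candidate) for w in lis):
            return candidate
    return ''
```

For ["ab", "ba"] this returns an empty string, whereas the substring test accepts "a" because "a" occurs inside "ba".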
Solution

This function works:

def find_prefix(string_list):
    prefix = []
    for chars in zip(*string_list):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return ''.join(prefix)
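
For comparison, the standard library's os.path.commonprefix also compares strings character by character, despite its path-oriented name. The sketch below is offered as an alternative, not as part of the original answer, and the wrapper name common_prefix is hypothetical.

```python
import os.path


def common_prefix(strings):
    # os.path.commonprefix works on arbitrary strings, not only paths,
    # and returns '' for an empty list.
    return os.path.commonprefix(list(strings))


print(common_prefix(["test11", "test12", "test13"]))
print(common_prefix(["testa-1", "testb-1", "testc-1"]))
```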
Test

Output:

['test11', 'test12', 'test13']
test1
['test-a', 'test-b', 'test-c']
test-
['test1', 'test1a', 'test12']
test1
['testa-1', 'testb-1', 'testc-1']
test
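
The test code that produced this output is not shown. A loop along the following lines would print each list followed by its prefix. The variable name test_lists is an assumption, and find_prefix is restated so that the block runs on its own.

```python
def find_prefix(string_list):
    prefix = []
    # zip(*string_list) yields one tuple of characters per position and
    # stops at the end of the shortest string.
    for chars in zip(*string_list):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return ''.join(prefix)


# Hypothetical name; the lists are the four examples from the question.
test_lists = [
    ["test11", "test12", "test13"],
    ["test-a", "test-b", "test-c"],
    ["test1", "test1a", "test12"],
    ["testa-1", "testb-1", "testc-1"],
]

for string_list in test_lists:
    print(string_list)
    print(find_prefix(string_list))
```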
Speed

Timing is always fun:

string_list = ["test11", "test12", "test13"]

%timeit get_large_subset(string_list)
100000 loops, best of 3: 14.3 µs per loop

%timeit find_prefix(string_list)
100000 loops, best of 3: 6.19 µs per loop

long_string_list = ['test{}'.format(x) for x in range(int(1e4))]

%timeit get_large_subset(long_string_list)
100 loops, best of 3: 7.44 ms per loop

%timeit find_prefix(long_string_list)
100 loops, best of 3: 2.38 ms per loop

very_long_string_list = ['test{}'.format(x) for x in range(int(1e6))]

%timeit get_large_subset(very_long_string_list)
1 loops, best of 3: 761 ms per loop

%timeit find_prefix(very_long_string_list)
1 loops, best of 3: 354 ms per loop
Conclusion: using sets this way is fast.
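
The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The repetition count and the variable names here are assumptions, and the figures will differ from those quoted above depending on the machine and Python version.

```python
import timeit


def find_prefix(string_list):
    prefix = []
    for chars in zip(*string_list):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return ''.join(prefix)


string_list = ["test11", "test12", "test13"]

# Assumed repetition count; raise it for more stable figures.
number = 10000
elapsed = timeit.timeit(lambda: find_prefix(string_list), number=number)
print("find_prefix: {:.2f} us per call".format(elapsed / number * 1e6))
```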

One-liner (using itertools imported as it):

''.join(x[0] for x in it.takewhile(lambda x: len(set(x)) == 1, zip(*string_list)))

This joins the list of all initial letters common to all members of string_list.

It throws a "list index out of range" error if the list is smaller than the maximum length of the strings in it, for example get_large_subset(["testt1", "testt2"]).

Sorry for the confusion. This works fine for me too :-) Using a set to check whether the elements of a (character) list are all the same is a nice approach.
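
The itertools one-liner can be made runnable on its own by adding the import and wrapping it in a function. This is a minimal sketch, and the function name find_prefix_oneliner is hypothetical.

```python
import itertools as it


def find_prefix_oneliner(string_list):
    # takewhile keeps the per-position character tuples only while every
    # string has the same character at that position.
    same = it.takewhile(lambda chars: len(set(chars)) == 1, zip(*string_list))
    return ''.join(chars[0] for chars in same)


print(find_prefix_oneliner(["test11", "test12", "test13"]))
```

An empty list yields an empty string, because zip() with no arguments produces nothing to join.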