Python: how to get a list of integers from a list of strings?


I want to create a list of integers from a list of strings named a. The list I want to create is shown as the list of integers named b:

a = ["ab","ac","ad","ae","af","ab","ab"]
b = [1,2,3,4,5,1,1]

I have tried the solution below, but it takes a long time when processing thousands of items:

a = ["ab","ac","ad","ae","af","ab","ab"]
b = list(set(a))
for i in range(0,len(a)):
    if a[i] in b:
        a[i] = b.index(a[i])+1
print(a)
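A likely reason the loop above is slow: a[i] in b and b.index(a[i]) each scan the list b from the start, so every element of a costs time proportional to the number of distinct strings. Also, list(set(a)) does not keep first-appearance order, so the numbers this loop assigns will not necessarily match the expected output [1,2,3,4,5,1,1]. A minimal sketch that keeps the same set-based idea but builds a string-to-number dictionary once (the name index_of is illustrative, not from the question):

```python
a = ["ab", "ac", "ad", "ae", "af", "ab", "ab"]

# Same idea as the loop above: number the distinct strings via list(set(a)).
# Set iteration order is arbitrary, so which string gets which number may vary.
b = list(set(a))

# Build the string -> number mapping once; each lookup is then an average O(1)
# hash lookup instead of a linear scan of b with `in` and .index().
index_of = {value: i + 1 for i, value in enumerate(b)}
a = [index_of[value] for value in a]
print(a)
```

Equal strings still get equal numbers; only the assignment order depends on the set.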

Thanks

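Based only on the values given in the question, you can collect the unique values into a list and then number each value by its index in that unique list, which gives the expected output: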

a = ["ab", "ac", "ad", "ae", "af", "ab", "ab"]
unique_list = []
b = []

for value in a:
    if value not in unique_list:
        unique_list.append(value)

for value in a:
    for ndex, unique_value in enumerate(unique_list):
        if value == unique_value:
            b.append(ndex+1)
            break

print(b)
Result:

[1, 2, 3, 4, 5, 1, 1]
That said, without more context in the question, there is no way to know what rule should map the string values to integers.
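The two loops above can be collapsed, assuming Python 3.7 or later, where dictionaries preserve insertion order: dict.fromkeys(a) keeps each string once, in the order it first appears, so enumerating it yields the same numbering as unique_list, and each lookup becomes a hash lookup instead of a nested scan. This is a sketch of that substitution rather than the answer's own code; number_by_first_appearance is a hypothetical helper name.

```python
def number_by_first_appearance(values):
    # dict.fromkeys drops duplicates and keeps first-appearance order (Python 3.7+).
    ids = {value: i for i, value in enumerate(dict.fromkeys(values), start=1)}
    # One average-O(1) dictionary lookup per element instead of scanning a list.
    return [ids[value] for value in values]


a = ["ab", "ac", "ad", "ae", "af", "ab", "ab"]
print(number_by_first_appearance(a))  # [1, 2, 3, 4, 5, 1, 1]
```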

Update:

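Following the question in the comments, I also tested this with a set, running the set() version and the list version (plus a dict version for comparison) 10 times each on 40,000 random values. Unless I'm doing something wrong here, the list version seems to run faster than the set version.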

import time
import random

times_for_set = []
times_for_list = []
times_for_dict = []

def run_comparison_set():
    a = []
    for _ in range(40000):
        x = random.choice('abcdefghijklmnopqrstuvwxyz')
        y = random.choice('abcdefghijklmnopqrstuvwxyz')
        a.append('{}{}'.format(x, y))

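    # Note: a set has no defined order, and enumerate() below still scans it
    # linearly, so this version does not benefit from set hash lookups.
    unique_list = set(a)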
    b = []

    start_time = time.time()
    for value in a:
        for ndex, unique_value in enumerate(unique_list):
            if value == unique_value:
                b.append(ndex+1)
                break
    times_for_set.append(time.time() - start_time)


def run_comparison_list():
    a = []
    for _ in range(40000):
        x = random.choice('abcdefghijklmnopqrstuvwxyz')
        y = random.choice('abcdefghijklmnopqrstuvwxyz')
        a.append('{}{}'.format(x, y))

    unique_list = []
    b = []
    for value in a:
        if value not in unique_list:
            unique_list.append(value)
    start_time = time.time()
    for value in a:
        for ndex, unique_value in enumerate(unique_list):
            if value == unique_value:
                b.append(ndex + 1)
                break
    times_for_list.append(time.time() - start_time)


def run_comparison_dict():
    a = []
    for _ in range(40000):
        x = random.choice('abcdefghijklmnopqrstuvwxyz')
        y = random.choice('abcdefghijklmnopqrstuvwxyz')
        a.append('{}{}'.format(x, y))

    counter = 0
    d = {}
    b = []
    start_time = time.time()
    for item in a:
        if item not in d:
            counter += 1
            d[item] = counter
        b.append(d[item])
    times_for_dict.append(time.time() - start_time)


for i in range(10):
    run_comparison_set()
    run_comparison_list()
    run_comparison_dict()

print('Average time for set: ', sum(times_for_set) / len(times_for_set))
print('Average time for list: ', sum(times_for_list) / len(times_for_list))
print('Average time for dict: ', sum(times_for_dict) / len(times_for_dict))
Result:

Average time for set:  0.8128192901611329
Average time for list:  0.6368690490722656
Average time for dict:  0.00530548095703125
So the list and set versions seem to be much slower than the dict version.
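One caveat about the set timing above: run_comparison_set still walks the set with enumerate, which is a linear scan just like the list version, so it never uses the hash lookup that makes sets fast. The O(1) versus O(n) difference applies to the membership test itself (value in container). A small sketch using the standard-library timeit module to compare that operation directly; the sizes and the time_membership helper are arbitrary choices for illustration, and no particular timings are implied:

```python
import random
import string
import timeit


def time_membership(n_unique=676, n_lookups=40000, repeat=3):
    # 676 = 26 * 26 possible two-letter strings, mirroring the benchmark above.
    letters = string.ascii_lowercase
    uniques = [x + y for x in letters for y in letters][:n_unique]
    as_list = list(uniques)
    as_set = set(uniques)
    lookups = [random.choice(uniques) for _ in range(n_lookups)]

    # Both timings run the same membership checks against a different container.
    list_time = min(timeit.repeat(lambda: [v in as_list for v in lookups],
                                  number=1, repeat=repeat))
    set_time = min(timeit.repeat(lambda: [v in as_set for v in lookups],
                                 number=1, repeat=repeat))
    return list_time, set_time


list_time, set_time = time_membership()
print(f"list membership: {list_time:.4f}s, set membership: {set_time:.4f}s")
```

Taking the minimum of several repeats is a common way to reduce noise from other activity on the machine.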

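from itertools import count

a = ["ab","ac","ad","ae","af","ab","ab"]
counter = count(1)
a_dict = dict()
b = []
for elem in a:
    # Only draw a new number for strings not seen yet; a_dict.get(elem, next(counter))
    # would call next() on every item and skip numbers after a repeat
    if elem not in a_dict:
        a_dict[elem] = next(counter)
    b.append(a_dict[elem])

print(b)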
Output: [1, 2, 3, 4, 5, 1, 1]


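Just keep track of which strings you have seen and the number assigned to each. When a string has not been seen before, it takes the next number from the counter.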
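The same "take the next number only for unseen strings" idea can also be written with collections.defaultdict, whose default factory is called only when a key is missing. Passing count(1).__next__ as the factory means each new string draws the next number and repeated strings reuse theirs. This is an alternative formulation, not the answer's code:

```python
from collections import defaultdict
from itertools import count

a = ["ab", "ac", "ad", "ae", "af", "ab", "ab"]

# The factory runs only for keys not yet in ids, so numbers are never skipped.
ids = defaultdict(count(1).__next__)
b = [ids[elem] for elem in a]
print(b)  # [1, 2, 3, 4, 5, 1, 1]
```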

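What is the rule for converting the string values to integers? Are there only 4 possible string values, or more? What have you tried so far, and what were the results?

The program reads from a txt file, which is why there may be thousands. I plan to import the values I receive; in this case the string values are not available. @mike

So you just want to assign a unique number to each distinct string? Create a dictionary with the strings as keys. When you get a string that is not in the dictionary, increment a counter and add the string to the dictionary with that value. Then append the dictionary value to b.

Wouldn't using a dictionary be faster?

@HarshalParekh Not sure. A dictionary is a more complex data structure than a list and uses more resources, so probably not.

For membership checks you should definitely prefer a set or a dict over a list. For a tiny example it may not make much difference, but if your unique list gets large you will see O(n) versus O(1) lookups.

@user3483203 I just tested with 4,000 random values and it ran in 0.16 seconds, which is fast enough for me. I also tested 40,000 values and it took only 1.44 seconds.

Did you time it with a set instead of a list? What is the difference?

This is quite similar to my approach, which I think takes a long time; I'll try it and let you know.

Our solutions are very different. Given that b has already been converted to a list, if the a[i] in b call is expensive, your .index call is expensive too.

Yes, I see that with your code it takes less time than mine, thanks.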