How can I efficiently match a specific number against a set of numbers in Python?


I have a set of 2,375,013 unique numbers in a txt file. The data looks like this:

    11009 900221 2 3 4930568 293 102

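I want to match a number taken from each line of another data file against this set of numbers, so that I can extract the data I need. So I coded it like this: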

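    import re

    def get_US_users_IDs(filepath, mode):
        IDs = []
        with open(filepath, mode) as f:
            for line in f:
                # split() yields whole numbers; iterating over the bare
                # string would yield single characters instead
                sp = line.strip().split()
                for id in sp:
                    IDs.append(id.lower())
            return IDs


    # Excerpt from the code that handles each tweet; user_id, tweet and
    # number_of_US_user are defined elsewhere.
    IDs = "|".join(get_US_users_IDs('/nas/USAuserlist.txt', 'r'))
    matcher = re.compile(IDs)
    if matcher.match(user_id):
        number_of_US_user += 1
        text = tweet.split('\t')[3]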

But it takes a long time to run. Is there any way to reduce the running time?

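As I understand it, you have a large number of IDs in a file, and you want to know whether a particular user ID is in that file.

You can use a Python set: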

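    # Build the set once, before processing any tweets.
    with open(filepath, mode) as fd:
        IDs = set(int(line) for line in fd if line.strip())
    ...
    # The set holds ints, so convert user_id before the lookup.
    if int(user_id) in IDs:
        number_of_US_user += 1
        ...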
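
A fuller version of this idea might look like the sketch below. It assumes the file holds whitespace-separated integer IDs (one per line also works); the helper name `load_ids`, the temporary file, and the sample values are made up for illustration. The set is not there to remove duplicates. It is there because `x in some_set` is a hash lookup that takes roughly constant time on average, whereas `x in some_list` or a very large `a|b|c` regex has to work through the candidates.

```python
import os
import tempfile


def load_ids(filepath):
    """Read whitespace-separated integer IDs from a text file into a set."""
    ids = set()
    with open(filepath, "r") as f:
        for line in f:
            # split() handles one ID per line as well as several per line
            for token in line.split():
                ids.add(int(token))
    return ids


# Hypothetical sample data standing in for the real ID file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("11009\n900221\n4930568\n293\n102\n")
    sample_path = tmp.name

us_ids = load_ids(sample_path)  # build the set once, outside any loop
os.remove(sample_path)

number_of_us_users = 0
for user_id in ["293", "42", "900221"]:  # IDs as strings, e.g. from a split line
    if int(user_id) in us_ids:           # convert so the type matches the set
        number_of_us_users += 1

print(number_of_us_users)  # 2
```

Loading the file and building the set happens once; after that, each per-tweet check is a single lookup.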


Do I need to use "set" in this case? The number set already contains only unique numbers. I just tried your suggestion, but it also takes a long time. Hmm.

@MINSUPARK: if it solved your problem, you should accept this answer by clicking the check mark on the left.

Change the way you store the numbers. Put them in an sqlite database and index them. Then query with SQL.

sqlite is indeed a good solution if your data does not fit in memory. That is not the case here (2 million integers take less than 20 MB).
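
For comparison, the SQLite suggestion from the comments could be sketched roughly as follows with Python's standard `sqlite3` module. The table name `us_users`, the column name `id`, the helper `is_us_user`, and the sample values are assumptions made for illustration. An in-memory database keeps the sketch self-contained; a real setup would pass a file path to `sqlite3.connect`. Declaring the column as `INTEGER PRIMARY KEY` gives indexed lookups without a separate `CREATE INDEX` statement.

```python
import sqlite3

# In-memory database for the sketch; pass a filename to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE us_users (id INTEGER PRIMARY KEY)")

# Hypothetical sample IDs standing in for the real list.
sample_ids = [11009, 900221, 4930568, 293, 102]
conn.executemany(
    "INSERT INTO us_users (id) VALUES (?)", [(i,) for i in sample_ids]
)
conn.commit()


def is_us_user(connection, user_id):
    """Return True if user_id is present in the us_users table."""
    row = connection.execute(
        "SELECT 1 FROM us_users WHERE id = ? LIMIT 1", (int(user_id),)
    ).fetchone()
    return row is not None


print(is_us_user(conn, "293"))  # True
print(is_us_user(conn, 42))     # False
```

As the last comment notes, a list of this size fits in memory, so an in-memory set is probably the simpler choice here; the database route matters more once the data outgrows RAM.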