Working with large lists in Python


How can I manage a huge list of more than 100 million strings? How do I even begin to work with a list that large?

Example of the large list:

cards = [
    "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As",
    "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah",
    "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad",
    "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac",
]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print(str(len(hands)) + " hand combinations in texas holdem poker")

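With lots and lots of memory. Python lists and strings are actually fairly efficient, so as long as you have the memory it shouldn't be a problem.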
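To get a feel for how much memory that means, here is a rough estimate, assuming CPython and a list of 7-tuples as produced by `list(combinations(cards, 7))`. The variable names are made up for this sketch, and the figure it prints depends on the interpreter and platform, so treat it as an order of magnitude only.

```python
import sys
from math import comb  # available in Python 3.8+

n_hands = comb(52, 7)  # number of 7-card hands from a 52-card deck

# The card strings are shared by every tuple, so only the tuple objects
# and the list's array of pointers grow with the number of hands.
sample_hand = ("2s", "3s", "4s", "5s", "6s", "7s", "8s")
bytes_per_tuple = sys.getsizeof(sample_hand)
bytes_per_list_slot = sys.getsizeof([None]) - sys.getsizeof([])  # one pointer

estimate = n_hands * (bytes_per_tuple + bytes_per_list_slot)
print(n_hands, "hands, roughly", round(estimate / 2**30, 1), "GiB as a list of tuples")
```

The estimate ignores allocator overhead and the temporary growth of the list while it is being built, so the real peak is likely higher.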

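That said, if what you are storing is specifically poker cards, you can certainly come up with a more compact representation. For example, you can encode each card in one byte, which means a whole seven-card hand fits in a single 64-bit int. You can then store the hands in a NumPy array, which is considerably more efficient than a Python list.

For example: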

>>> import itertools
>>> import numpy as np
>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> # one row per hand, one int8 per card; 133784560 == C(52, 7)
>>> hands = np.zeros((133784560, 7), dtype=np.int8)
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]

To speed up the last line of the loop:

hands[num] = list(map(cards_to_bytes.__getitem__, hand))


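This takes only 7 * 133784560 bytes, a little under 1 GB of memory, and less if you pack the cards more tightly: a card needs only 6 bits, so four cards fit into three bytes (I don't know the syntax for doing that offhand...).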
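Tighter packing can be sketched with the standard library alone. A card index from 0 to 51 fits in 6 bits, so seven cards need 42 bits and can be stored in 6 bytes instead of 7, which would bring the full table down to about 0.8 GB. The helper names `pack_hand` and `unpack_hand` and the `bytearray` store are assumptions made for this illustration, and only the first 1000 hands are packed to keep the demo fast.

```python
from itertools import combinations, islice

BITS_PER_CARD = 6    # 2**6 = 64 >= 52, so 6 bits can identify any card
CARDS_PER_HAND = 7
BYTES_PER_HAND = 6   # 7 cards * 6 bits = 42 bits, rounded up to 48 bits

def pack_hand(card_ids):
    """Pack card indices (each 0..51) into 6 bytes, 6 bits per card."""
    packed = 0
    for card_id in card_ids:
        packed = (packed << BITS_PER_CARD) | card_id
    return packed.to_bytes(BYTES_PER_HAND, "big")

def unpack_hand(store, index):
    """Recover the card indices of the hand stored at position `index`."""
    start = index * BYTES_PER_HAND
    packed = int.from_bytes(store[start:start + BYTES_PER_HAND], "big")
    ids = []
    for _ in range(CARDS_PER_HAND):
        ids.append(packed & 0b111111)   # lowest 6 bits hold the last card
        packed >>= BITS_PER_CARD
    return ids[::-1]                    # restore the original order

# Demo on the first 1000 hands only; the full run has C(52, 7) of them.
sample = list(islice(combinations(range(52), CARDS_PER_HAND), 1000))
store = bytearray()
for hand in sample:
    store += pack_hand(hand)

print(len(store), "bytes for", len(sample), "hands")
print(unpack_hand(store, 999) == list(sample[999]))
```

The price of the smaller footprint is that every lookup has to unpack the bits again, so this trades speed for memory.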

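There is usually a trade-off between coding time and run time. If you just want to get something done quickly and don't expect to run it often, the approach you suggest is fine. If you don't have enough RAM, your system will thrash virtual memory, but you will probably still get your answer sooner than you would by learning how to write a more sophisticated solution.

But if this is a system you expect to use regularly, you should find some way other than holding everything in RAM. An SQL database is probably what you want. Databases can be very complex, but because they are nearly ubiquitous there are plenty of excellent tutorials.

You could use a well-documented framework such as Django, which simplifies database access through an ORM layer.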
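As a rough illustration of the database route, the sketch below uses Python's built-in `sqlite3` module rather than Django. The table name, the column names, and the choice to store each hand as a 52-bit mask (one bit per card) are assumptions made for this example, and only the first 5000 hands are inserted to keep it quick.

```python
import sqlite3
from itertools import combinations, islice

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
cards = [rank + suit for suit in "shdc" for rank in ranks]  # 52 cards
bit = {card: 1 << i for i, card in enumerate(cards)}        # one bit per card

def hand_mask(hand):
    """Encode a hand as an integer with one bit set for each card it holds."""
    mask = 0
    for card in hand:
        mask |= bit[card]
    return mask

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist on disk
conn.execute("CREATE TABLE hands (id INTEGER PRIMARY KEY, mask INTEGER NOT NULL)")

# Feed rows from a generator so the hands are never all held in a Python list.
sample = islice(combinations(cards, 7), 5000)
conn.executemany("INSERT INTO hands (mask) VALUES (?)",
                 ((hand_mask(hand),) for hand in sample))
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM hands").fetchone()[0]
with_7s = conn.execute("SELECT COUNT(*) FROM hands WHERE (mask & ?) != 0",
                       (bit["7s"],)).fetchone()[0]
print(total, "hands stored,", with_7s, "of them contain 7s")
```

A bitwise test like this cannot use an ordinary index, so for frequent "contains this card" queries a different schema might serve better.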

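If you just want to loop over all possible hands to count them, or to find the ones with a particular property, there is no need to store them all in memory.

You can simply use the iterator without converting it to a list: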

from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

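print(n, "hand combinations in texas holdem poker.")

With all 52 cards in the list, this prints:

133784560 hand combinations in texas holdem poker.

(If the rows of the list are not separated by commas, Python joins neighbouring literals such as "As" "2h" into a single string, the list has only 49 entries, and the count comes out as C(49, 7) = 85900584.)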


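Another option that needs almost no memory is to use generators, which let you create a stream of data and process it however you like. For example:

Count the total number of hands:

sum(1 for x in combinations(cards, 7))

Count the hands that contain the ace of clubs:

sum(1 for x in combinations(cards, 7) if 'Ac' in x)


My public domain library (imported as onejoker below) has some combinatorial functions that would come in handy here. It has an Iterator class that can give you information about a set of combinations without storing them, or even running through them. For example: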

  import onejoker as oj
  deck = oj.Sequence(52)
  deck.fill()

  hands = oj.Iterator(deck, 5)    # I want combinations of 5 cards out of that deck

  t = hands.total                 # How many are there?
  r = hands.rank("AcKsThAd3c")    # At what position will this hand appear?
  h = hands.hand_at(1000)         # What will the 1000th hand be?

  for h in hands.all():           # Do something with all of them
     dosomething(h)               
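
You could use the Iterator.rank() function to reduce each hand to a single int, store those ints in a compact array, and then use Iterator.hand_at() to produce the hands on demand.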
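The rank and unrank idea can also be sketched without any library. The functions below are hypothetical stand-ins, not OneJoker's implementation, and the ordering they use (that of `itertools.combinations` over card indices 0 to 51) is an assumption that may differ from the library's own ordering.

```python
from itertools import combinations, islice
from math import comb  # available in Python 3.8+

def rank(combo, n):
    """Position of a sorted index tuple within combinations(range(n), len(combo))."""
    k = len(combo)
    position = 0
    previous = -1
    for i, chosen in enumerate(combo):
        # Skip every combination that has a smaller index in slot i.
        for skipped in range(previous + 1, chosen):
            position += comb(n - 1 - skipped, k - 1 - i)
        previous = chosen
    return position

def unrank(position, n, k):
    """Inverse of rank: the combination found at `position` (0-based)."""
    combo = []
    candidate = 0
    for i in range(k):
        while True:
            block = comb(n - 1 - candidate, k - 1 - i)
            if position < block:
                break          # `candidate` is the index for slot i
            position -= block  # skip all hands with `candidate` in slot i
            candidate += 1
        combo.append(candidate)
        candidate += 1
    return tuple(combo)

# Check both directions against itertools on the first few thousand hands.
for index, hand in enumerate(islice(combinations(range(52), 7), 3000)):
    assert rank(hand, 52) == index
    assert unrank(index, 52, 7) == hand

print(unrank(1000, 52, 7))
```

With a pair of functions like this, a hand never needs to be stored at all: its position is enough to rebuild it.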

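The best way to handle 100 million strings is to put them in a database. But here you can avoid enumerating 100 million strings in the first place.

Do you really need to store all the hands? Or would an iterator work?

I would suggest not using strings. It looks like you are using a 2-byte string to represent each card. You can represent each card far more efficiently as an integer.

I was thinking of converting them to MD5 and storing them in a hash table for quick lookup, but I'm a bit new to Python. I want to do a lot of things with them. Would storing them in a data file on disk work, or do I have to resort to an SQL db?

Hmm, iterators sound good. I'll read up on them some more, thanks.

Ah yes, good idea TJD, I could use an array, which is better on memory!!!

I have 8 GB of RAM. Is that enough to do a len on the example above?

Did you find it ran correctly for you? May I ask how much memory you have?

That's a good point about arrays, but I had hoped to stay within Python's standard library. Maybe I'll have to. Would using an array be much faster than an SQL db? Thanks for the example.

Much, much faster. If all you need is indexed lookups (e.g. "give me hand 10"), absolutely nothing is faster than one big array. That said, if that really is all you need, I believe there is an algorithm that simply returns a particular combination from a set of combinations... I don't know it off the top of my head, but I'm sure you can find it or ask about it.

Thanks kindly, that makes sense. I hadn't even considered that, thanks a lot.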