How to find a number range with a regular expression in Python


I have a field named "size".

Its values are formatted like this:

135.0 MB
75MB
2687 MB

The unit is always "MB".

In my Python code there are two variables named minSize and maxSize:

query = {}

minSize = '0'
maxSize = '888'
if minSize is not None:
    minSize='0'
if maxSize is not None:
    maxSize = '999'
query['size'] = {'$gte': str(minSize), '$lt': str(maxSize)}

print(query)
article = mongo.db.Article.find_one(query, {'_id': 0})
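
How should I build the query so that it finds a number range?
Can I strip the "MB" string and then use "$gte" and "$lt"?
Or should I use a regular expression to find the number range?

Here is a sample of my DB objects: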

{ 
    "_id" : ObjectId("59c3522b57bd432a6ccaea41"), 
    "title" : "IBW-267 video name", 
    "torrent" : "https://sukebei.pantsu.cat/download/", 
    "ImagePath" : [
        {
            "url" : "https://www.pixsense.net/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/site/v/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }
    ], 
    "articlelink" : "https://sukebei.pantsu.cat/view/", 
    "pubDate" : "2017-09-21 04:42:00", 
    "size" : "1373.2 MB"
}
{ 
    "_id" : ObjectId("59c3522b57bd432a6ccaea42"), 
    "title" : "IBW-261 video name", 
    "torrent" : "https://sukebei.pantsu.cat/download/", 
    "ImagePath" : [
        {
            "url" : "http://imageteam.org/", 
            "domain" : "imageteam.org"
        }, 
        {
            "url" : "http://imageteam.org/upload/small/2017/09/21", 
            "domain" : "imageteam.org"
        }, 
        {
            "url" : "http://imagedecode.com/", 
            "domain" : "imagedecode.com"
        }, 
        {
            "url" : "http://imagedecode.com/upload/small/2017/09/21", 
            "domain" : "imagedecode.com"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }
    ], 
    "articlelink" : "https://sukebei.pantsu.cat/view/", 
    "pubDate" : "2017-09-21 04:40:00", 
    "size" : "900.0 MB"
}

JSON structure:

title
torrent
ImagePath
  |_url
  |_domain
articlelink
pubDate
size

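I think you are asking how to convert "1373.2 MB" into a number. If so, there are many ways to do it. In the example code below, I use split to break "1373.2 MB" into ["1373.2", "MB"] and then convert "1373.2" to a float, e.g. float("1373.2 MB".split()[0]). Note that this relies on the space between the number and the unit: a value such as "75MB" has no space, so split() would return it whole and float() would raise a ValueError.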
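Because the question lists "75MB" without a space, a regular expression may be a more forgiving way to isolate the number. The following is only a sketch and not the answerer's code: the helper name parse_size_mb and the exact pattern are assumptions made for illustration, and it assumes the unit is always "MB", as the question states.

```python
import re

# An integer or decimal number, optional whitespace, then "MB" (any case).
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*MB\s*$", re.IGNORECASE)


def parse_size_mb(text):
    """Return the numeric part of a size string such as '135.0 MB' as a float.

    Raises ValueError if the string is not of the form '<number> MB'.
    """
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError("unrecognised size string: {!r}".format(text))
    return float(match.group(1))


# The three formats shown in the question.
for raw in ["135.0 MB", "75MB", "2687 MB"]:
    print(raw, "->", parse_size_mb(raw))
```

Raising on unexpected input, rather than returning a default, makes malformed records visible instead of silently filtering them out; whether that is the right trade-off depends on how clean the data is.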


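I would suggest adding another, numeric version of this field that you can use for these kinds of queries. Its actual values would be 135000000, 75000000, 2687000000. A range could then be queried with something like {size: {$gte: 75, $lte: 2687}}, with the bounds expressed in the same unit as the stored numbers. With a regular expression, you can search with {"size": /.*75.*/i}.

Maybe using a numeric field is the easiest way. Thanks for the suggestion, JohnnyHK.

@Deano Maybe you could add your suggestion as an answer.

Can you post the entire document you are working with? I will try to provide a working example :)

Thanks. I never thought of that idea.

There are many ways to do this; I just showed you one. Basically, you want to isolate the numeric part of the string and then convert it to a float.
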
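The numeric-field suggestion from the comments could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: the field name size_mb and both helper functions are hypothetical, and the documents are plain dicts so the example runs without a database.

```python
def with_numeric_size(doc, field="size", numeric_field="size_mb"):
    """Return a copy of doc with a float version of its size string added."""
    updated = dict(doc)
    # Drop the unit; strip() copes with both "75MB" and "75 MB".
    number_text = doc[field].upper().replace("MB", "").strip()
    updated[numeric_field] = float(number_text)
    return updated


def size_range_query(min_size=None, max_size=None, numeric_field="size_mb"):
    """Build a MongoDB-style filter; a bound of None leaves that side open."""
    bounds = {}
    if min_size is not None:
        bounds["$gte"] = float(min_size)
    if max_size is not None:
        bounds["$lt"] = float(max_size)
    return {numeric_field: bounds} if bounds else {}


# Comparing the original strings is lexicographic, which is why the
# query in the question cannot work as intended:
print("2687 MB" < "888")  # True, because "2" sorts before "8"

doc = with_numeric_size({"title": "IBW-261 video name", "size": "900.0 MB"})
print(doc["size_mb"])
print(size_range_query(0, 999))

# With pymongo, the filter would be passed to find(), for example:
#   mongo.db.Article.find(size_range_query(0, 999), {"_id": 0})
```

Storing the number once at write time keeps the range query simple and lets the database index the field, which a regular expression over the string cannot offer for numeric ranges.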
# This is an example record.
record = { 
    "_id" : "59c3522b57bd432a6ccaea41", 
    "title" : "IBW-267 video name", 
    "torrent" : "https://sukebei.pantsu.cat/download/", 
    "ImagePath" : [
        {
            "url" : "https://www.pixsense.net/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/site/v/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/", 
            "domain" : "www.pixsense.net"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }, 
        {
            "url" : "https://sukebei.nyaa.si/view/", 
            "domain" : "sukebei.nyaa.si"
        }
    ], 
    "articlelink" : "https://sukebei.pantsu.cat/view/", 
    "pubDate" : "2017-09-21 04:42:00", 
    "size" : "1373.2 MB"
}

import copy
import uuid

# Build a mock in-memory "database" from the example record above.
db = []
for size in range(0, 1001, 100):
    rec = copy.copy(record)
    rec['size'] = "{} MB".format(size)  # change the size
    rec['id'] = uuid.uuid4().hex        # generate a new ID
    db.append(rec)

# Now that we have an example database, filter it for records
# that meet the size criteria.
def filterDb(db, minSize, maxSize):
    for record in db:
        # This is where the string is converted to a float.
        # split() breaks "800 MB" into ["800", "MB"]; the first
        # element is the numeric portion.
        size = float(record['size'].split()[0])
        if minSize <= size <= maxSize:
            yield record

# Run the filter.
results = list(filterDb(db, 0, 800))

print("here are the filtered results")
for record in results:
    print("    ", record['id'], record['size'])

Output:

here are the filtered results
     89f9425d0d7c40338b8c9d65e71b7fc4 0 MB
     2af8621ab6b9470aa619ba1fd903eaf1 100 MB
     892ed6914e634725a347e0374a45a0a7 200 MB
     e63d1fc2fcad462a8ce7ad5141b22e71 300 MB
     0ef1dc4ad1f54b3d8bfdb79bcf9c86fc 400 MB
     69e12da5a0e74ab1a597e47ff9a296bd 500 MB
     b964c9f70c6a49bb84dd997f7a7166ed 600 MB
     7532958524ac4f99bd8666678529d418 700 MB
     06c500e2a3864b7ea9d1243b07342134 800 MB