Python: How to find a numeric range with a regular expression
Tags: python, regex, mongodb
I have a field named "size" whose values look like this:
135.0 MB
75MB
2687 MB
The unit is always "MB". In my Python code there are two variables named minSize and maxSize:
query = {}
minSize = '0'
maxSize = '888'
if minSize is not None:
    minSize='0'
if maxSize is not None:
    maxSize = '999'
query['size'] = {'$gte': str(minSize), '$lt': str(maxSize)}
print(query)
article = mongo.db.Article.find_one(query, {'_id': 0})
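How do I build the query so that it finds a numeric range? Can I strip the "MB" string and then use "$gte" and "$lt"?
Or should I use a regular expression to find the numeric range? Here are sample objects from my DB: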
{
"_id" : ObjectId("59c3522b57bd432a6ccaea41"),
"title" : "IBW-267 video name",
"torrent" : "https://sukebei.pantsu.cat/download/",
"ImagePath" : [
{
"url" : "https://www.pixsense.net/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/site/v/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
}
],
"articlelink" : "https://sukebei.pantsu.cat/view/",
"pubDate" : "2017-09-21 04:42:00",
"size" : "1373.2 MB"
}
{
"_id" : ObjectId("59c3522b57bd432a6ccaea42"),
"title" : "IBW-261 video name",
"torrent" : "https://sukebei.pantsu.cat/download/",
"ImagePath" : [
{
"url" : "http://imageteam.org/",
"domain" : "imageteam.org"
},
{
"url" : "http://imageteam.org/upload/small/2017/09/21",
"domain" : "imageteam.org"
},
{
"url" : "http://imagedecode.com/",
"domain" : "imagedecode.com"
},
{
"url" : "http://imagedecode.com/upload/small/2017/09/21",
"domain" : "imagedecode.com"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
}
],
"articlelink" : "https://sukebei.pantsu.cat/view/",
"pubDate" : "2017-09-21 04:40:00",
"size" : "900.0 MB"
}
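JSON structure:
title
torrent
ImagePath
|_url
|_domain
articlelink
pubDate
size
I think you're asking how to convert "1373.2 MB" into a number? If so, there are many ways to do that. In the example code below, I use split to break "1373.2 MB" into ["1373.2", "MB"], then convert "1373.2" to a float (e.g. float("1373.2 MB".split()[0])).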
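I would suggest adding another, numeric version of this field that you can use for these kinds of queries, with actual values such as 135000000, 75000000, 2687000000. The range can then be found with $gte and $lte on that numeric field, along the lines of {size: {$gte: 75, $lte: 2687}}, with the bounds in the same unit as the stored numbers. For a regular expression, you can search with {"size": /.*75.*/i}.
Maybe using a numeric field is the easiest way. Thanks for the suggestion, JohnnyHK.
@Deano Maybe you could add your suggestion as an answer.
Can you post the whole document you are working with? I will try to provide a working example :)
Thanks. I never thought of that idea.
There are many ways to do this; I just showed you one. But basically, you want to isolate the numeric part of the string and then convert it to a float.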
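One of the sample values in the question, "75MB", has no space before the unit, so splitting on whitespace would not isolate its number. As a minimal sketch of the "isolate the numeric part" idea, a regular expression can extract the number whether or not the space is present. The helper name parse_size_mb is hypothetical, and the pattern assumes the unit is always MB, as the question states.

```python
import re

# Hypothetical helper, not part of the original answer.
# Matches strings such as "135.0 MB", "75MB" or "2687 MB".
# Assumption: the unit is always MB.
_SIZE_RE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*MB\s*$', re.IGNORECASE)

def parse_size_mb(text):
    """Return the numeric part of a size string as a float."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError("unrecognized size: {!r}".format(text))
    return float(match.group(1))

for sample in ("135.0 MB", "75MB", "2687 MB"):
    print(sample, "->", parse_size_mb(sample))
```

Raising ValueError on unexpected input is one possible design choice; returning None instead would let a filter silently skip malformed records.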
# This is an example record.
record = {
"_id" : "59c3522b57bd432a6ccaea41",
"title" : "IBW-267 video name",
"torrent" : "https://sukebei.pantsu.cat/download/",
"ImagePath" : [
{
"url" : "https://www.pixsense.net/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/site/v/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://www.pixsense.net/themes/latest/ssd/small/1069/",
"domain" : "www.pixsense.net"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
},
{
"url" : "https://sukebei.nyaa.si/view/",
"domain" : "sukebei.nyaa.si"
}
],
"articlelink" : "https://sukebei.pantsu.cat/view/",
"pubDate" : "2017-09-21 04:42:00",
"size" : "1373.2 MB"
}
import copy
import uuid

# Create a bogus database based on the record example above.
db = []
for size in range(0, 1001, 100):
    rec = copy.copy(record)
    rec['size'] = "{} MB".format(size)  # change the size
    rec['id'] = uuid.uuid4().hex  # generate a new ID
    db.append(rec)

# Now that we have an example database we can filter the database for records
# that meet the size criteria.
def filterDb(db, minSize, maxSize):
    for record in db:
        # This is where you do your conversion from string to float.
        # split() breaks the string up into two strings. The first will be the
        # numerical portion.
        size = float(record['size'].split()[0])
        if minSize <= size <= maxSize:
            yield record

# Do the filter
results = list(filterDb(db, 0, 800))
print("here are the filtered results")
for record in results:
    print(" ", record['id'], record['size'])
here are the filtered results
89f9425d0d7c40338b8c9d65e71b7fc4 0 MB
2af8621ab6b9470aa619ba1fd903eaf1 100 MB
892ed6914e634725a347e0374a45a0a7 200 MB
e63d1fc2fcad462a8ce7ad5141b22e71 300 MB
0ef1dc4ad1f54b3d8bfdb79bcf9c86fc 400 MB
69e12da5a0e74ab1a597e47ff9a296bd 500 MB
b964c9f70c6a49bb84dd997f7a7166ed 600 MB
7532958524ac4f99bd8666678529d418 700 MB
06c500e2a3864b7ea9d1243b07342134 800 MB
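The filter above runs in Python after the records have been loaded. The query in the question compares strings, and strings are compared character by character, so "1373.2 MB" sorts before "900.0 MB". The suggestion from the comments, storing a numeric copy of the field and querying it with $gte and $lt, might look roughly like the sketch below. The field name size_mb and both helper functions are assumptions for illustration only. The filter is built as a plain dict so the example runs without a database.

```python
# String comparison is lexicographic, which is why the original query fails.
print("1373.2 MB" < "900.0 MB")  # True, because '1' sorts before '9'

# Hypothetical field name for the numeric copy of "size".
NUMERIC_FIELD = "size_mb"

def with_numeric_size(doc):
    """Return a copy of doc with a numeric size added (assumes 'NNN MB')."""
    updated = dict(doc)
    number = doc["size"].upper().replace("MB", "").strip()
    updated[NUMERIC_FIELD] = float(number)
    return updated

def build_size_query(min_size=None, max_size=None):
    """Build a MongoDB-style range filter; None means no bound on that side."""
    bounds = {}
    if min_size is not None:
        bounds["$gte"] = float(min_size)
    if max_size is not None:
        bounds["$lt"] = float(max_size)
    return {NUMERIC_FIELD: bounds} if bounds else {}

doc = with_numeric_size({"title": "IBW-261 video name", "size": "900.0 MB"})
query = build_size_query(0, 999)
print(doc[NUMERIC_FIELD])
print(query)

# Assuming the same collection as in the question, the filter would be
# passed to find_one in the same way:
# article = mongo.db.Article.find_one(query, {'_id': 0})
```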