How to search fast with regex in MongoDB using djongo in Django
Tags: regex, django, mongodb, search, djongo

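I have a DB table with nearly 20 million records, and I want to search it with regular expressions. Everything was fine when the table held around 100 thousand records, but now a search takes a long time and sometimes even ends in a timeout. Do I need to migrate to a SQL database such as postgresql, or to something like Elasticsearch? The number of records in this table is expected to grow beyond 20 billion. Is there a way to make the search efficient while keeping the same setup, in which I use djongo to connect django to mongodb, or do I have to use another database for fast searching?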

My model schema is:

from djongo import models as model
class User(model.Model):
    email = model.CharField(max_length=50, default='')
    source = model.CharField(default='unknown',max_length=150)
    username = model.CharField(max_length=150, default='')
    hash = model.CharField(max_length=255, default='')
    salt = model.CharField(max_length=255, default='')
    ipaddress = model.CharField(max_length=50,default='')
    lastipaddress = model.CharField(max_length=50,default='')
    name = model.CharField(max_length=150, default='')
    dateofbirth = model.CharField(max_length=100, default='')
    phonenumber = model.CharField(max_length=100, default='')
    firstname = model.CharField(max_length=150, default='')
    lastname = model.CharField(max_length=150, default='')
    address = model.CharField(max_length=255, default='')
    objects = model.DjongoManager()
This method is called when a POST request is sent to django:

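import concurrent.futures

from rest_framework import authentication, permissions, status
from rest_framework.decorators import (
    api_view, authentication_classes, permission_classes)
from rest_framework.response import Response


@api_view(['POST'])
@authentication_classes([authentication.TokenAuthentication])
@permission_classes([permissions.IsAdminUser])
def search(request):
    # Optional filters; they are read here but not used further in this snippet.
    if 'username' in request.data:
        username = request.data['username']

    if 'email' in request.data:
        useremail = request.data['email']

    if 'userid' in request.data:
        userid = request.data['userid']

    if 'query' in request.data:
        query = request.data['query']
    else:
        # `status` must be passed by keyword; the first positional argument is the body.
        return Response(status=status.HTTP_400_BAD_REQUEST)

    # NOTE: the original snippet never assigns `type`, `wildcard` or `regex`;
    # reading them from the request body is an assumption.
    type = request.data.get('type')
    wildcard = request.data.get('wildcard', 'false')
    regex = request.data.get('regex', 'false')

    obj = {}
    obj['query'] = query
    obj['type'] = type
    obj['wildcard'] = wildcard
    obj['regex'] = regex
    if not (type in ['email', 'domain', 'username'] and wildcard == 'false' and regex == 'false'):
        obj['request'] = request
    final = []
    print('wildcard', wildcard)
    print('regex', regex)
    print('type', type)
    if wildcard == 'true' or regex == 'true':
        # Run the database lookup in a worker thread and wait for its result.
        with concurrent.futures.ThreadPoolExecutor() as executor:
            t1 = executor.submit(getRecordsFromDB, obj)
            final = t1.result()

    return final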
It is called by the method above, and this is where the regex query is executed:

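import re

from rest_framework import status
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.response import Response


def getRecordsFromDB(obj):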
    max_limit = 10000
    if obj['wildcard'] == "false" and obj['regex'] == "true":
        print("yes regex thing")
        if obj['type'] == 'domain':
            obj['query'] = r'.+@{1}' + obj['query']
            obj['type'] = 'email'

        try:
            pagination_class = LimitOffsetPagination
            paginator = pagination_class()
            queryset = User.objects.mongo_find({
                obj['type']: {'$regex': obj['query']}
            }).count()
            if queryset > max_limit:
                return Response(status=status.HTTP_507_INSUFFICIENT_STORAGE)
            else:
                queryset = User.objects.mongo_find({
                    obj['type']: {'$regex': obj['query']}
                })
            page = paginator.paginate_queryset(queryset, obj['request'])
            serializer = UserSerializer(page, many=True)
            return paginator.get_paginated_response(serializer.data)
        except Exception as err:
            print(f'Other error occurred: {err}')
            return Response(status=status.HTTP_422_UNPROCESSABLE_ENTITY)

    elif obj['wildcard'] == "true" and obj['regex'] == "false":
        print("yes wildcard thing")
        #obj['query'] = obj['query'].replace('.', r'\.')
        obj['query'] = re.escape(obj['query'])
        # re.escape turned '*' into '\*'; expand it into the wildcard character class.
        obj['query'] = obj['query'].replace(r'\*', r'[a-zA-Z0-9-_.]*')
        print('below is the respective regex for the given query')
        print(obj['query'])
        if obj['query'][0] != r'*' and obj['type'] != 'domain':
            print('yes here where it should not be')
            obj['query'] = r'^' + obj['query']
        if len(obj['query']) > 1:
            if obj['query'][-1] != r'*':
                obj['query'] = obj['query'] + r'$'

        print('final regex ', obj['query'])
        if obj['type'] == 'domain':
            obj['query'] = r'.+@{1}' + obj['query']
            obj['type'] = 'email'
            print('very final regex ', obj['query'])
        try:
            pagination_class = LimitOffsetPagination
            paginator = pagination_class()
            queryset = User.objects.mongo_find({
                obj['type']: {'$regex': obj['query']}
            }).count()
            if queryset > max_limit:
                return Response(status=status.HTTP_507_INSUFFICIENT_STORAGE)
            else:
                queryset = User.objects.mongo_find({
                    obj['type']: {'$regex': obj['query']}
                })
            page = paginator.paginate_queryset(queryset, obj['request'])
            serializer = UserSerializer(page, many=True)
            return paginator.get_paginated_response(serializer.data)
        except Exception as err:
            print(f'Other error occurred: {err}')
            return Response(status=status.HTTP_422_UNPROCESSABLE_ENTITY)

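    # Neither branch matched. The original returned an undefined name `records`;
    # reporting a bad request instead is an assumption about the intent.
    return Response(status=status.HTTP_400_BAD_REQUEST)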
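
The wildcard branch above checks for a leading or trailing `*` only after the `*` has already been rewritten into a character class, so those checks can no longer see it. A minimal sketch of one way to order the steps, assuming the intent is "anchor every side that has no wildcard", is shown here; `wildcard_to_regex` and `WILDCARD_CLASS` are hypothetical names, not part of the original code.

```python
import re

# Characters a '*' may stand for (the same class used in the code above).
WILDCARD_CLASS = r'[a-zA-Z0-9-_.]*'


def wildcard_to_regex(pattern: str) -> str:
    """Translate a '*' wildcard pattern into a regex string."""
    # Decide on anchors from the raw pattern, before '*' is rewritten.
    starts_open = pattern.startswith('*')
    ends_open = pattern.endswith('*')
    # Escape each literal piece on its own, so '*' never has to be un-escaped.
    pieces = pattern.strip('*').split('*')
    body = WILDCARD_CLASS.join(re.escape(piece) for piece in pieces)
    # Anchor only the sides that did not start or end with a wildcard.
    prefix = '' if starts_open else '^'
    suffix = '' if ends_open else '$'
    return prefix + body + suffix
```

With this ordering, a pattern such as `john*` becomes a `^`-anchored expression, while `*@apple.com` is anchored only at the end.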

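So basically the query
queryset = User.objects.mongo_find({obj['type']: {'$regex': obj['query']}})
takes a long time? Can you give us an example of obj['type'] and obj['query'] where the query takes a long time?

Yes, it is this query:
queryset = User.objects.mongo_find({"email": {'$regex': ".+@{1}@apple\.com$"}})

Did you index email? I understand that indexing is part of the professional djongo package, so it is not free, but at this scale it is crucial.

I did not. It is a bit expensive, something like $50 per month.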
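
On the indexing point: an index on `email` alone may not help much with a pattern like `.+@{1}...$`, because MongoDB can only narrow an index scan for a case-sensitive regex that is anchored at the start with `^` (a prefix expression). A rough sketch of two query shapes that can benefit from an index follows. It assumes a hypothetical extra field `email_domain` that is filled in when a record is written; the field and all function names below are illustrative and not part of the original model.

```python
import re


def split_email_domain(email: str) -> str:
    """Return the lower-cased domain part of an e-mail address ('' if none)."""
    _, sep, domain = email.rpartition('@')
    return domain.lower() if sep else ''


def domain_query(domain: str) -> dict:
    """Exact-match filter on the hypothetical `email_domain` field."""
    return {'email_domain': domain.lower()}


def prefix_query(field: str, prefix: str) -> dict:
    """Case-sensitive, '^'-anchored regex filter (a prefix expression)."""
    return {field: {'$regex': '^' + re.escape(prefix)}}
```

Either dictionary could be passed to `User.objects.mongo_find(...)` in place of the unanchored regex. The field being queried would still need an index; with direct access to the collection, pymongo's `Collection.create_index` is one way to create it.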