MongoDB query optimization for a large collection with large documents

I am using PyMongo with Flask and would like to know how to optimize a query that filters a large collection (2947 documents) of large documents.

This is the structure of the collection documents:

[Image: document structure]

As you can see, each document has 4 properties: simulationID, simulationPartID, timePass and status (which stores many arrays). The collection is 1.6 GB in size.

Basically, I’m trying to find the document that has a specific timePass and a simulationPartID of 7:

@app.route('/get-node-data/<timePass>', methods=['GET'])
def get_node_data(timePass):
    if (request.method == "GET"):
        print("Before filt")
        node_data = filt('node_data', 'timePass', timePass, 'simulationPartID', 7)
        print("After filt", node_data)
        return json.dumps(node_data, default=json_util.default)
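
One thing worth checking in the route above: Flask passes a plain `<timePass>` path segment to the view as a string. The question does not show how timePass is stored, so this is only an assumption, but if it is stored as a number, an equality test against the string will never match. Flask's built-in `<int:timePass>` or `<float:timePass>` converters are one option; another is a small helper like the hypothetical `coerce_time_pass` sketched here.

```python
def coerce_time_pass(raw):
    """Convert a URL path segment to int or float when possible.

    Hypothetical helper, not part of the original code. It assumes timePass
    may be stored as a number; if it is stored as a string, skip this step.
    """
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            # Not parseable with this type; try the next one.
            continue
    # Neither int nor float: keep the original string unchanged.
    return raw
```

In the view, the converted value would then be used in place of the raw one, for example `timePass = coerce_time_pass(timePass)` before calling `filt`.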

Below are the filtering methods I’m using:

def filt(collection_name, filter1_name, filter1, filter2_name=None, filter2=None):
    Collections = list(db[collection_name].find({}))
    Collections = run_filter(Collections, filter1_name, filter1)
    if filter2 and filter2_name:
        Collections = run_filter(Collections, filter2_name, filter2)

    if Collections == []:
        Collections = None
    return Collections

def run_filter(Collections, filter_name, filter):
    newCollections = []
    for i in range(len(Collections)):
        if Collections[i][filter_name] == filter:
            newCollections.append(Collections[i])
    return newCollections

But, as I mentioned, the collection is very large. When I tested it, the query hung: I started it over an hour ago and it is still filtering. I need the filtering to be as fast as possible (instant, if possible).

Is there a way to optimize my filtering methods so that they run faster?

Thank you very much!

  • Just to be clear: you’re calling find with no parameters, fetching everything in the collection collection_name, and then filtering in Python? If so, why not pass a filter expression, or use an aggregation with $match, so that MongoDB itself returns only the data you want?

  • Isn’t this what you’re looking for? db[collection_name].find({'timePass': valor, 'simulationPartID': 7}). Since you didn’t give the specific value for timePass, I just wrote valor. If you only want the first match, you can use find_one().
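
A minimal sketch of what the two comments suggest, assuming the field names from the question (timePass, simulationPartID). Here `filt` takes the collection object itself (for example `db['node_data']`) instead of its name, and `build_query` and `ensure_lookup_index` are hypothetical helpers, not part of the original code. The filter is sent to MongoDB, so only matching documents are transferred instead of the whole 1.6 GB collection. The compound index is an optional extra that may let the server avoid scanning every large document.

```python
def build_query(filter1_name, filter1, filter2_name=None, filter2=None):
    """Build a MongoDB filter document from one or two field/value pairs."""
    query = {filter1_name: filter1}
    if filter2_name is not None and filter2 is not None:
        query[filter2_name] = filter2
    return query


def filt(collection, filter1_name, filter1, filter2_name=None, filter2=None):
    """Return the matching documents as a list, or None if nothing matches.

    `collection` is assumed to be a PyMongo collection (e.g. db['node_data']).
    """
    query = build_query(filter1_name, filter1, filter2_name, filter2)
    # The server applies the filter; Python no longer loops over every document.
    docs = list(collection.find(query))
    return docs or None


def ensure_lookup_index(collection):
    """Create a compound index on the two filtered fields (assumed names)."""
    # 1 means ascending order; creating an index that already exists is a no-op.
    return collection.create_index([("simulationPartID", 1), ("timePass", 1)])
```

With a real collection this might be called as `filt(db['node_data'], 'timePass', timePass, 'simulationPartID', 7)`. If only one document is expected, `collection.find_one(query)` returns it directly, or None when nothing matches. The same filter document can also serve as the first stage of an aggregation pipeline, `[{'$match': query}]`.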

No answers
