I am using PyMongo with Flask and would like to know how to optimize a query that filters a large collection (2947 documents) whose documents are themselves large.
This is the structure of the collection documents:
As you can see, each document has 4 properties (simulationID, simulationPartID, timePass and status, which stores many arrays). The collection has a size of 1.6 GB.
Basically, I'm trying to find the document that has a specific timePass and a simulationPartID of 7:
@app.route('/get-node-data/<timePass>', methods=['GET'])
def get_node_data(timePass):
    if (request.method == "GET"):
        print("Before filt")
        node_data = filt('node_data', 'timePass', timePass, 'simulationPartID', 7)
        print("After filt", node_data)
        return json.dumps(node_data, default=json_util.default)
Below are the filtering methods I’m using:
def filt(collection_name, filter1_name, filter1, filter2_name=None, filter2=None):
    # find({}) loads every document in the collection into a Python list
    Collections = list(db[collection_name].find({}))
    Collections = run_filter(Collections, filter1_name, filter1)
    if filter2 and filter2_name:
        Collections = run_filter(Collections, filter2_name, filter2)
    if Collections == []:
        Collections = None
    return Collections

def run_filter(Collections, filter_name, filter):
    newCollections = []
    for i in range(len(Collections)):
        if Collections[i][filter_name] == filter:
            newCollections.append(Collections[i])
    return newCollections
But, as I mentioned, the collection is very large. When I tested it, the query got stuck: I started it over an hour ago and it is still filtering. I need the filtering to be as fast as possible (instant, if possible).
I wonder if there is a way to optimize my filtering methods to make filtering faster and increase its performance.
Thank you very much!
Just to be clear, you're calling find() with no parameters, fetching everything in the collection collection_name, and then filtering in Python? If so, why not use a query expression, or an aggregation with $match, so that MongoDB itself finds the data you want? – Rfroes87
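A rough sketch of what this comment suggests: the two conditions go into a $match stage so the server does the matching. The helper names build_match_pipeline and find_by_aggregation are hypothetical, the field names are taken from the question, and collection is assumed to be a PyMongo Collection such as db['node_data'].

```python
def build_match_pipeline(time_pass, simulation_part_id=7):
    """Build an aggregation pipeline whose only stage is a $match filter."""
    return [
        {"$match": {"timePass": time_pass, "simulationPartID": simulation_part_id}},
    ]


def find_by_aggregation(collection, time_pass, simulation_part_id=7):
    """Run the pipeline on the server and return the matching documents.

    Assumption: `collection` is a pymongo Collection, e.g. db["node_data"].
    """
    pipeline = build_match_pipeline(time_pass, simulation_part_id)
    # aggregate() returns a cursor; list() pulls only the matched documents.
    return list(collection.aggregate(pipeline))
```

With this approach only the matching documents are sent to the application, instead of the whole collection.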
Wouldn't this be what you're looking for?
db[collection_name].find({'timePass': valor, 'simulationPartID': 7})
Since you didn't give the specific value for timePass, I just wrote valor as a placeholder. If you only want the first match, you can use find_one().
– Paulo Marques
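One way the question's filt helper might be rewritten along these lines is sketched below. It is not the asker's code: build_query and filt_one are hypothetical names, and filt is assumed here to receive the collection object (for example db['node_data']) rather than its name. The second condition is applied only when both its name and value are given, which differs slightly from the truthiness check in the question.

```python
def build_query(filter1_name, filter1, filter2_name=None, filter2=None):
    """Combine one or two equality conditions into a MongoDB filter document."""
    query = {filter1_name: filter1}
    # Assumption: the second condition applies only when both parts are given.
    if filter2_name is not None and filter2 is not None:
        query[filter2_name] = filter2
    return query


def filt(collection, filter1_name, filter1, filter2_name=None, filter2=None):
    """Return all matching documents, or None when nothing matches."""
    query = build_query(filter1_name, filter1, filter2_name, filter2)
    documents = list(collection.find(query))  # filtering happens in MongoDB
    return documents or None


def filt_one(collection, filter1_name, filter1, filter2_name=None, filter2=None):
    """Return the first matching document, or None when nothing matches."""
    query = build_query(filter1_name, filter1, filter2_name, filter2)
    return collection.find_one(query)


# Hypothetical call from the Flask route in the question:
# node_data = filt_one(db["node_data"], "timePass", timePass, "simulationPartID", 7)
```

One thing to check: Flask passes timePass from the URL as a string by default, so if the field is stored as a number the value would need converting before it goes into the query, otherwise nothing will match.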