Threading would work in this case, since the longest delay is the latency and response time of the Twitter API.
In almost every other scenario, threading works poorly in Python, for a number of reasons.
Even for this scenario, the ideal would be to use asyncio rather than threading, but the gain is very small, and making this work with asyncio requires a way of thinking that departs a lot from conventional programming.
Now, threads also have their pitfalls, and not just a few. The recommendation here is to use concurrent.futures, a package from the Python standard library. You split your work into functions, and it creates a fixed number of threads and runs those tasks in them (without having to create one thread per task). It also has mechanisms for signaling errors, etc.
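A minimal sketch of the "fixed number of threads" idea, not tested against your code. The function slow_task and the numbers are made up for illustration. Ten tasks go to a pool of three threads, and the set of thread names that ran them never grows beyond three.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    # Hypothetical stand-in for a slow network call.
    time.sleep(0.05)
    # Return the value plus the name of the worker thread that ran it.
    return n * 2, threading.current_thread().name


with ThreadPoolExecutor(max_workers=3) as executor:
    # map() schedules all ten tasks, but at most three run at once.
    results = list(executor.map(slow_task, range(10)))

doubled = [value for value, _ in results]
thread_names = {name for _, name in results}
print(doubled)
print(len(thread_names), "threads handled", len(results), "tasks")
```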
In this case, what will be slow is the call to the Twitter API. (I just noticed that you use an except with nothing after it. That is lousy: if some other error occurs, say the network fails while accessing the API, you delete your local user even though it still exists on Twitter. You need to check exactly which exception is raised when the API responds but the user does not exist, and catch only THAT exception to run the block that deletes the user. I took a quick look at the Python twitter library here, and it looks like you can do from twitter.error import TwitterError and use except TwitterError, which is already an improvement.)
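To make the difference concrete, here is a small sketch. The names UserNotFound and fetch_profile and the sample usernames are invented for illustration and are not part of any Twitter library. Only the "user does not exist" exception leads to a deletion; a network failure is reported and the user is kept.

```python
class UserNotFound(Exception):
    """Hypothetical error: the API answered, but the user does not exist."""


def fetch_profile(username):
    # Hypothetical stand-in for the real API call.
    if username == "ghost":
        raise UserNotFound(username)
    if username == "offline":
        raise ConnectionError("network unreachable")
    return {"id_str": "42"}


def users_to_delete(usernames):
    to_delete = []
    for username in usernames:
        try:
            fetch_profile(username)
        except UserNotFound:
            # Only this specific failure means the account is gone.
            to_delete.append(username)
        except ConnectionError as error:
            # Any other failure: keep the user and report the problem.
            print(f"Could not check {username!r}: {error}")
    return to_delete


print(users_to_delete(["alice", "ghost", "offline"]))
```

With a bare except, "offline" would have ended up in the deletion list as well.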
Anyway, going back to concurrent.futures: when you have several time-consuming tasks, you create tasks, which are called "Futures". Python then runs these tasks in the various threads, and by calling concurrent.futures.as_completed you get the result of each task, even if that result is an exception. Since the local db is not a bottleneck in this case, you don't need to call the db in separate threads and risk errors from "cursor" objects being changed by more than one thread at the same time.
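As a rough illustration of getting either a value or an exception back from each task (the function check and its inputs are hypothetical), calling result() on a finished Future returns the value, or re-raises in the main thread whatever the task raised in its worker thread:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check(n):
    # Hypothetical task: fails for negative input.
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return n * n


outcomes = {}
with ThreadPoolExecutor(max_workers=4) as executor:
    # Remember which input each Future belongs to.
    futures = {executor.submit(check, n): n for n in (3, -1, 5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            # result() returns the value, or re-raises the task's exception.
            outcomes[n] = future.result()
        except ValueError as error:
            outcomes[n] = f"failed: {error}"

print(outcomes)
```

The order in which the entries arrive depends on which task finishes first, so only the content of the dictionary is predictable, not its order.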
That being said, your code might look something like (I won’t test it here, so it might need some adjustment):
from concurrent.futures import ThreadPoolExecutor, as_completed

from database import Database
from twitter import TwitterProfile
from twitter.error import TwitterError

database = Database()
profile = TwitterProfile()


def select_user(database):
    query = "SELECT * FROM users WHERE status = 0 AND premium = 0"
    return database.select(query)


def check_user(username):
    # Runs in a worker thread; this is the slow call to the API.
    return profile.get(username)['id_str']


def delete_user(database):
    with ThreadPoolExecutor(max_workers=20) as executor:
        # Map each Future to its username so it can be recovered later.
        tasks = {
            executor.submit(check_user, user["username"]): user["username"]
            for user in select_user(database)
        }
        for task in as_completed(tasks):
            username = tasks[task]
            try:
                # Re-raises here whatever check_user raised in its thread.
                twitter_id = task.result()
            except (TwitterError, KeyError) as error:
                # Only these errors are taken to mean the user is gone.
                database.delete('users', {
                    'username': username
                })
                print(f"{username!r} is now deleted")
            except Exception as error:
                # Any other failure: keep the user and report the problem.
                print(f"Problem accessing user profile {username!r}:\n", error)
    return True


deleted = delete_user(database)
print(deleted)
It is worth noting that the code that creates the tasks is ordinary Python code: simply a call to the .submit method of the executor object. I mention this because I used a dict comprehension to make these calls, tasks = {executor.submit(check_user, user["username"]): user["username"] for user in select_user(database)}, so it may seem that this "different" syntax is necessary to create the tasks. In fact, this syntax only builds a dictionary, tying each task (an object called a "Future") to a string, the "username", that can be used in the next step. The submit method call alone already creates the Future, and for the as_completed call that follows it, any Python iterable that yields objects of type "Future" will do.
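For comparison, the same pattern can be sketched with a plain for loop and a list instead of a dict comprehension. The function shout and the sample words are hypothetical, standing in for the API call and the usernames.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def shout(word):
    # Hypothetical task, standing in for the API call.
    return word.upper()


with ThreadPoolExecutor(max_workers=2) as executor:
    futures = []
    for word in ["alpha", "beta", "gamma"]:
        # submit() alone schedules the call and returns a Future.
        futures.append(executor.submit(shout, word))

    # as_completed() accepts any iterable of Future objects.
    # Sorting makes the output independent of completion order.
    results = sorted(future.result() for future in as_completed(futures))

print(results)
```

The drawback of the plain list is that there is no link from a Future back to its input, which is the only reason the dictionary is handy.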
(I changed a few more things there. For example, it doesn't make sense in Python to call the for variable i. That name is short for index and comes from languages where there is only the numeric for, and the value of the for variable is used as an index into the sequence that holds your data. In Python, for already traverses the elements of the sequence, so it's best to give the variable a name that makes sense.)
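A tiny side-by-side sketch of the two styles, with made-up sample data:

```python
users = [{"username": "alice"}, {"username": "bob"}]

# Index-based style carried over from other languages.
names_by_index = []
for i in range(len(users)):
    names_by_index.append(users[i]["username"])

# Idiomatic Python: iterate over the elements directly.
names_direct = []
for user in users:
    names_direct.append(user["username"])

print(names_by_index == names_direct)
```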
I just noticed, while running this, that task.result() doesn't exist, so it falls into the except and deletes all users. That line, except (TwitterError, KeyError) as error:, won't work for me, because when I said I connect to the Twitter API, that was for login; to get the user data I used the requests lib. Is there some way to fix it? – user148010
Just swap TwitterError for the exception that comes from your profile.get. – jsbueno
Oops, now I get it: KeyError and IndexError. Either it's an account whose user changed their username, or it's an account that has been suspended. Those two exceptions are the ones I need to delete on. Thank you very much, man, your answers are all to be congratulated!!! – user148010
The "20" for max_workers was a guess; if it works well, maybe try increasing it. – jsbueno