Implement Thread in Python

Viewed 762 times

3

Following the theory of this question and answer

I understood that I can run more than one program, right?

I have a database with more than 3,000 users to check. The check works as follows: it connects to the Twitter API and checks whether a user ID exists; if it does not, the user is deleted from my database.

The CRUD functions are working fine for me.

It just takes a really, really long time to check everyone, so I'd like to try implementing threading.

But I don't understand the logic behind it all.

My code at the moment is this one:

from database import Database
from twitter import TwitterProfile

database = Database()

profile = TwitterProfile()

def select_user(database):
    query = "SELECT * FROM users WHERE status = 0 AND premium = 0"
    select_user = database.select(query)

    return select_user

def delete_user(database):
    for i in select_user(database):
        try:
            twitter_id = profile.get(i['username'])['id_str']
        except:
            username = i['username']
            database.delete('users', {
                'username': username
            })

            print(username, 'Is now deleted')

    return True

deleted = delete_user(database)

print(deleted)

Note that I haven't included an attempt at a threaded implementation, simply because I don't know how to do it.

I would appreciate some help. I think I've understood the concept (?), maybe :(

1 answer

4


Threading would work in this case, since the longest delay is the latency and response time of the Twitter API.

In almost no other scenario does threading work well in Python, for a number of other reasons.

Even for this scenario, the ideal would be to use asyncio rather than threading, but the gain is very small, and the way of thinking needed to make asyncio work differs a lot from conventional programming.
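Just to illustrate how different the asyncio style looks, here is a minimal sketch. The check_user coroutine is a hypothetical stand-in that only sleeps; a real version would need an async HTTP client, which the code in the question does not use.

```python
import asyncio

async def check_user(username):
    # Hypothetical stand-in for an async request to the API:
    # it just waits a little and pretends some users are gone.
    await asyncio.sleep(0.01)
    if username.startswith("gone"):
        raise KeyError(username)
    return username

async def find_missing(usernames):
    # Start all checks at once; exceptions are returned instead of raised.
    results = await asyncio.gather(
        *(check_user(name) for name in usernames),
        return_exceptions=True,
    )
    # gather keeps the input order, so results line up with usernames.
    return [name for name, result in zip(usernames, results)
            if isinstance(result, KeyError)]

if __name__ == "__main__":
    print(asyncio.run(find_missing(["alice", "gone_bob", "carol"])))
```

Every function on the call path has to become async def and be awaited, which is the change in thinking mentioned above.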

Now, threads also have their pitfalls, and not just a few. The recommended approach is to use concurrent.futures: a package from the Python standard library which, given tasks that you split into functions, creates a fixed number of threads and runs the tasks in them (without having to create a thread for each task), and also has mechanisms for signaling errors, and so on.

In this case, what is slow is the call to the Twitter API. (I just noticed that you use a bare except there. That is bad: if some other error occurs, for example the network fails to reach the API, you delete your local user even though it still exists on Twitter. You need to check which exact exception is raised when the API responds but the user does not exist, and catch only THAT exception to run the block that deletes the user. I took a quick look at Python twitter here, and it looks like you can do from twitter.error import TwitterError and use except TwitterError, which is already an improvement.)
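To make the point about the bare except concrete, here is a small sketch. UserNotFound and fetch_profile are hypothetical names invented for the illustration; in your code they would be whatever exception and call your profile.get really uses.

```python
class UserNotFound(Exception):
    """Hypothetical: the API answered, but the user does not exist."""

def fetch_profile(username):
    # Hypothetical stand-in for profile.get(username).
    if username == "ghost":
        raise UserNotFound(username)
    if username == "flaky":
        raise ConnectionError("network unreachable")
    return {"id_str": "123"}

def should_delete(username):
    try:
        fetch_profile(username)
    except UserNotFound:
        # Only this exception means the account is really gone.
        return True
    except Exception as error:
        # Any other failure (network, timeout...) must NOT delete the user.
        print(f"Could not check {username!r}: {error}")
        return False
    return False

if __name__ == "__main__":
    for name in ("alice", "ghost", "flaky"):
        print(name, should_delete(name))
```

With a bare except, the "flaky" case would be deleted as well, even though nothing says that account is gone.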

Anyway, going back to concurrent.futures: when you have several time-consuming tasks, you create tasks, which are called "Futures". Python then runs these tasks in the various threads, and by calling concurrent.futures.as_completed you get each task as it finishes and can read its result, even if that result is an exception. Since the local database is not a bottleneck in this case, you don't have to call it from separate threads and risk errors because "cursor" objects can be changed by more than one thread at the same time.

That being said, your code might look something like this (I haven't tested it here, so it might need some adjustment):
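A minimal, self-contained sketch of that flow, with nothing from Twitter in it: slow_lookup is a made-up function that sleeps to simulate a slow API and raises KeyError for some names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_lookup(name):
    # Made-up slow task: sleeps to simulate network latency.
    time.sleep(0.05)
    if name.startswith("missing"):
        raise KeyError(name)
    return name.upper()

def run_lookups(names, max_workers=4):
    found, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each Future back to the name it was created for.
        futures = {executor.submit(slow_lookup, n): n for n in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                # result() returns the value, or re-raises the task's exception.
                found[name] = future.result()
            except KeyError as error:
                failed[name] = error
    return found, failed

if __name__ == "__main__":
    print(run_lookups(["alice", "missing_bob", "carol"]))
```

Note that the exception raised inside the thread only shows up in the main thread when result() is called, which is why the try/except wraps that call.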

from database import Database
from twitter import TwitterProfile
from twitter.error import TwitterError
from concurrent.futures import ThreadPoolExecutor, as_completed

database = Database()

profile = TwitterProfile()

def select_user(database):
    query = "SELECT * FROM users WHERE status = 0 AND premium = 0"
    select_user = database.select(query)

    return select_user

def check_user(username):
    return profile.get(username)['id_str']

def delete_user(database):
    with ThreadPoolExecutor(max_workers=20) as executor:
        tasks = {executor.submit(check_user, user["username"]): user["username"] for user in select_user(database)}
        for task in as_completed(tasks):
            try:
                twitter_id = task.result()
            except (TwitterError, KeyError) as error:
                username = tasks[task]
                database.delete('users', {
                    'username': username
                })

                print(f"{username!r} is now deleted")
            except Exception as error:
                print(f"Problem accessing user profile {tasks[task]!r}:\n", error)

    return True

deleted = delete_user(database)

print(deleted)

It is worth noting that the code that creates the tasks is ordinary Python code: simply a call to the .submit method of the executor object. I mention this because I used a "dict comprehension" to make these calls, tasks = {executor.submit(check_user, user["username"]): user["username"] for user in select_user(database)}, and it may seem that this "different" syntax is necessary to create the tasks. The syntax actually only builds a dictionary, tying each task (an object called a "Future") to a string, the "username", that can be used in the next step. The call to submit alone already creates the Future, and for the as_completed call below, any Python iterable that yields "Future" objects will do.

(I changed a few more things there. For example, it doesn't make sense in Python to call the for variable i: that name is short for index and comes from languages where only the numeric for exists, and the value of the for variable is used as an index into the sequence where your data is. In Python, for already iterates over the elements of the sequence, so it's best to give the variable a name that makes sense.)
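To show that nothing special is going on, here is a sketch of the same pattern written with a plain for loop, and another passing an ordinary list to as_completed. The square function is just a placeholder task made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    # Placeholder task, only for illustration.
    return n * n

def squares_with_loop(numbers):
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        tasks = {}
        for n in numbers:
            future = executor.submit(square, n)  # this call alone schedules the task
            tasks[future] = n                    # the dict only remembers the input
        for future in as_completed(tasks):
            results[tasks[future]] = future.result()
    return results

def squares_from_list(numbers):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # A plain list of Futures works for as_completed too.
        futures = [executor.submit(square, n) for n in numbers]
        return sorted(f.result() for f in as_completed(futures))

if __name__ == "__main__":
    print(squares_with_loop([1, 2, 3]))
    print(squares_from_list([3, 1, 2]))
```

The dictionary is only needed when you want to know which input each Future belongs to; if the result alone is enough, the list version is simpler.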

  • I just noticed that, with this running, task.result() doesn't exist, so it falls into the except and deletes all users. The line except (TwitterError, KeyError) as error: will not work for me, because when I said I connect to the Twitter API, that was for login; to get the user data I used the requests lib. Is there some way to fix it?

  • 1

    Just swap TwitterError for the exception that comes from your profile.get.

  • Oops, now I've got it: KeyError and IndexError. Either it's an account whose user changed their username, or it's an account that has been suspended. Those two exceptions are the ones I need to delete on. Thank you very much, man, your answers all deserve congratulations!!!

  • 1

    The "20" for max_workers was a guess; if it went well, maybe try increasing it.
