How to limit the number of threads running at the same time?

I’m trying to create a proxy checker using multithreading. This is my code:

import requests
import threading
from time import sleep
from colorama import Fore
url = 'http://google.com'

proxyList = list(open('proxies.txt').read().splitlines())

timeout = 2

def checarProxy(x):
    proxy = {
        'http': 'http://' + x
    }
    try:
        a = requests.get(url, proxies=proxy, timeout=timeout)
        print(' ' + Fore.GREEN + x)
        open('workingProxy.txt', 'a+').write(f'{x}\n')
    except:
        print(' ' + Fore.RED + x)


threads = []

for proxy in proxyList:
    threads.append(threading.Thread(target=checarProxy, args=(proxy,)))

for i in threads:
    sleep(0.025)
    i.start()

The only way I found to throttle the threads was by using sleep. I wonder if I can limit them by the number of threads instead, something like this:

with maxThreads(10):
    for i in threads:
        i.start()

Is there any way to do this? Or am I using multithreading wrong?

  • That sleep is kind of "suspicious" there. What do you need it for? Also, if you only want 10 threads, create a range of 10 instead of iterating over all of them, or write a loop that checks how many are running and starts one more whenever there are fewer than 10.

  • But my thread list has over 20,000 threads, so I need to know if there’s any way to do this

  • There are countless ways, and each depends on the goal. But given what you said about 20,000 threads, either you are doing something VERY different, or you are really misusing them (I would bet on the second option, but without more details I can't be sure). Most likely you should create a smaller thread pool and distribute the tasks across it with a state machine or something (see the sketch after these comments).

  • I’ll put in my full code then

  • Try to reduce it to a [mcve] and explain the purpose of the code better; that helps a lot.

  • When working with threads, it's inevitable that you'll need to understand what the Global Interpreter Lock (GIL) is.

  • I edited it, I hope it’s clearer now

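As a reference for the "smaller thread pool" suggestion above, here is a minimal sketch using the standard library's concurrent.futures.ThreadPoolExecutor (not mentioned in the thread); it reuses the question's proxy-check logic but drops colorama and the output file for brevity:

from concurrent.futures import ThreadPoolExecutor
import requests

url = 'http://google.com'

def check(x):
    # Return (proxy, worked?) instead of printing inside the worker
    try:
        requests.get(url, proxies={'http': 'http://' + x}, timeout=2)
        return x, True
    except requests.RequestException:
        return x, False

proxies = open('proxies.txt').read().splitlines()

# max_workers caps how many checks run at the same time
with ThreadPoolExecutor(max_workers=10) as pool:
    for proxy, ok in pool.map(check, proxies):
        print('working' if ok else 'failed', proxy)

max_workers is what caps the number of threads; the executor queues the remaining proxies and feeds them to the 10 worker threads as they become free.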

1 answer

The most direct solution is to use a threading.Semaphore to control how many threads are running at the same time:

s = threading.Semaphore(10)
for i in threads:
    s.acquire()
    i.start()

Then release the semaphore only after the task has run, at the end of the function:

def checarProxy(x):
    ...
    s.release()

Semaphore.acquire() blocks until one of the running threads finishes and releases the semaphore, so there are only ever 10 running at a time. Unlike time.sleep, it always waits exactly as long as needed.
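Putting the two snippets together, here is a minimal runnable sketch of the whole script; the try/finally (so the slot is freed even when the request raises) and the final join loop are additions not shown in the snippets above, and colorama and the output file are dropped for brevity:

import threading
import requests

url = 'http://google.com'
timeout = 2
s = threading.Semaphore(10)   # at most 10 checks in flight at once

def checarProxy(x):
    proxy = {'http': 'http://' + x}
    try:
        requests.get(url, proxies=proxy, timeout=timeout)
        print('working:', x)
    except requests.RequestException:
        print('failed:', x)
    finally:
        s.release()           # free the slot even if the request raised

threads = [threading.Thread(target=checarProxy, args=(p,))
           for p in open('proxies.txt').read().splitlines()]

for t in threads:
    s.acquire()               # blocks while 10 threads are already running
    t.start()

for t in threads:
    t.join()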


With your problem solved, here's a tip: your code uses threads to make requests.get calls, which are I/O operations. That means your processor creates all of these threads just to wait around; they don't process anything, they simply sit waiting for the I/O response from a remote site. You are spending your machine's resources on nothing but waiting.

I suggest taking a look at asynchronous programming. With asynchronous programming you use network operations that do not block script execution, so you don't need to wait for one request to finish before making another, and you do it without threads: everything runs in the same thread.
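For illustration, here is a minimal sketch of that asynchronous approach, assuming the third-party aiohttp package (which the answer does not name) and the same proxies.txt file; asyncio.Semaphore plays the same role as the threading.Semaphore above, but everything runs in a single thread:

import asyncio
import aiohttp

url = 'http://google.com'

async def check(session, sem, x):
    async with sem:   # no more than 10 requests in flight at once
        try:
            async with session.get(url,
                                   proxy='http://' + x,
                                   timeout=aiohttp.ClientTimeout(total=2)):
                print('working:', x)
        except Exception:
            print('failed:', x)

async def main():
    proxies = open('proxies.txt').read().splitlines()
    sem = asyncio.Semaphore(10)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(check(session, sem, p) for p in proxies))

asyncio.run(main())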

  • I will try to do it with asynchronous programming; would asyncio be the most "user friendly" module for that?
