Refactoring Python code

Asked

Viewed 161 times

2

How do I improve this code?

import random
import string
import requests
from django.core.management.base import BaseCommand, CommandError
from django.core.exceptions import ValidationError
from optparse import make_option
from core.models import Movie


class Command(BaseCommand):
    help = 'Faz o crawler numa api de filmes e retorna os dados.\n \
    Uso: ./manage.py initdata\n \
    ou: ./manage.py initdata -m 20\n \
    ou: ./manage.py initdata -m 20 -y 2015'
    option_list = BaseCommand.option_list + (
        make_option('--movies', '-m',
                    dest='movies',
                    default=10,
                    help='Define a quantidade de filmes a ser inserido.'),
        make_option('--year', '-y',
                    dest='year',
                    default=None,
                    help='Define o ano de lançamento do filme.'),
    )

    def get_html(self, year):
        '''
        Le os dados na api http://www.omdbapi.com/ de forma aleatoria
        e escolhe um filme buscando por 2 letras
        '''
        # escolhe uma letra aleatoriamente
        c = ''.join(random.choice(string.ascii_lowercase)for _ in range(2))
        # se não for definido o ano, então escolhe um randomicamente
        if year is None and isinstance(year, int):
            year = str(random.randint(1950, 2015))
        url = 'http://www.omdbapi.com/?t=' + c + \
            '*&y=' + str(year) + '&plot=short&r=json'
        return requests.get(url).json()

    def save(self, title, year, released, director, actors, poster, imdbRating, imdbVotes, imdbID):
        ''' transforma "imdbVotes" em numero inteiro '''
        if imdbVotes == "'N/A'":
            imdbVotes = 0
        else:
            imdbVotes = imdbVotes.replace(',', '')
        try:
            ''' Salva os dados '''
            Movie.objects.create(
                title=title,
                year=year,
                released=released,
                director=director,
                actors=actors,
                poster=poster,
                imdbRating=imdbRating,
                imdbVotes=imdbVotes,
                imdbID=imdbID,
            )
        except ValidationError:
            print('O objeto não foi salvo.')

    def handle(self, movies, year, **options):
        ''' se "movies" não for nulo, transforma em inteiro '''
        if movies is not None:
            movies = int(movies)
        ''' busca os filmes n vezes, a partir da variavel "movies" '''
        for i in range(movies):
            h = self.get_html(year)
            j = 1  # contador
            ''' Se a resposta for falsa, então busca outro filme '''
            while h['Response'] == 'False' and j < 100:
                h = self.get_html(year)
                print('Tentanto %d vezes' % j)
                j += 1
            ''' se "year" não for nulo, transforma em inteiro '''
            if "–" in h['Year']:
                h['Year'] = year
            # salva
            self.save(h['Title'], h['Year'], h['Released'], h['Director'], h['Actors'],
                      h['Poster'], h['imdbRating'], h['imdbVotes'], h['imdbID'])
            print(i + 1, h['Year'], h['Title'], h['Released'], h['Director'],
                  h['Actors'], h['Poster'], h['imdbRating'], h['imdbVotes'], h['imdbID'])

        # print('Foram salvos %d filmes' % movies)

For example,

def save(self, title, year, released, director, actors, poster, imdbRating, imdbVotes, imdbID):

Enter only a dictionary of arguments.

title=title,
year=year,
released=released,
director=director,
actors=actors,
poster=poster,
imdbRating=imdbRating,
imdbVotes=imdbVotes,
imdbID=imdbID,

Can you turn it into something smaller, a dictionary maybe.

self.save(h['Title'], h['Year'], h['Released'], h['Director'], h['Actors'],
          h['Poster'], h['imdbRating'], h['imdbVotes'], h['imdbID'])

Could also be optimized?

  • I strongly recommend reading the "Clean Code": http://www.buscape.com.br/codigo-limpo-robert-cecil-martin-8576082675.html . And avoid using "Try except" in your codes, even more so if you will just say: "print('The object has not been saved.')". You put a Try except because you expect an error. What’s the point of making a piece of code that causes an error? Python’s Zen says: "errors should not be silenced unless they are explicitly silenced". If you have an except, you must somehow receive a traceback email or some notification that such a problem has happened.

  • 1

    It’s worth your search for bulk_create as well. Instead of making 100 Movie.objects.create(), you would make 1 Movie.objects.bulk_create(list). It would optimise this process ai. https://docs.djangoproject.com/en/1.8/ref/models/querysets/#Bulk-create

  • @Puamdias I agree with almost everything you say, except in this matter of try...except. The ideal would be to inform the reason for the error, of course, but in this case it is obvious that it is a validation error, and if the intention of AP is that the program does not stop after finding an invalid data (i.e. that it ignores the invalid and continues processing the others) then there’s not much out but to use try...except. Likewise, bulk_create would be ideal in an "all or nothing" scenario, but in the same way would not be able to use (I think) if you wanted to process valid ones and ignore invalid ones.

  • 1

    @mgibsonbr, what I meant in this part of Try except, is that many people find it "normal" to put this in any part of the code. In this case it is fair to have a Try except because it is importing something from external agent the application of it (it seems right), but it needs to have a notification of the reason of traceback. If the imported data is from himself, I think "nonsense" develop something I expect error from that and have to put Try except in the code.

  • I put trybecause I didn’t handle all errors from the api data.

1 answer

3


If the problem is the input of arguments, just use the operator **, without tampering with your code:

h = {
    "title":"foo",
    "year":2015,
    ...
    "imdbID":12345,
}

self.save(**h)
# é o mesmo que
self.save(h["title"], h["year"], ..., h["imdbID"])

For you to use this way, it is important that the dictionary h has the same number and names of all parameters. If you want some parameters to be optional, you can declare them with a default value, and so the absence of it will not bring problems:

def save(self, title, year, released=False, director, actors, poster, imdbRating, imdbVotes, imdbID):
    ...

self.save(**h) # Mesmo que o anterior
del h["released"]
self.save(**h) # Ainda funciona; released é recebido como False

Conversely, if you want to simplify receiving the parameters, and still allow the function to be called with positional arguments, you can use the operators * and ** the definition of the function (usually called *args and **kwargs). But this you already know how to use, I see you already have this in your code.

Finally, you can use both: receive the parameters in a dictionary, using **, then passes on this dictionary to Movie.objects.create also using **:

def save(self, **kwargs):
    ''' transforma "imdbVotes" em numero inteiro '''
    if kwargs["imdbVotes"] == "'N/A'":
        kwargs["imdbVotes"] = 0
    else:
        kwargs["imdbVotes"] = kwargs["imdbVotes"].replace(',', '')
    try:
        ''' Salva os dados '''
        Movie.objects.create(**kwargs)

And in the handle:

# salva
self.save(**h)

P.S. If the parameters of your function are positional and named, as in your example, you can also call them using a list or tuple:

h = ("foo", 2015, ..., 12345)
self.save(*h) # title recebe "foo", year recebe 2015, etc

Browser other questions tagged

You are not signed in. Login or sign up in order to post.