Refactoring generation of random datetime (Python)

1

Suggestions to improve this code are welcome.

https://github.com/rg3915/django-orm/blob/master/fixtures/gen_random_values.py#L34-L45

import random
import datetime

def gen_timestamp(min_year=1915, max_year=1996):
    # generates a datetime in the format yyyy-mm-dd hh:mm:ss.000000
    year = random.randint(min_year, max_year)
    month = random.randint(11, 12)
    day = random.randint(1, 28)
    hour = random.randint(1, 23)
    minute = random.randint(1, 59)
    second = random.randint(1, 59)
    microsecond = random.randint(1, 999999)
    date = datetime.datetime(
        year, month, day, hour, minute, second, microsecond).isoformat(" ")
    return date

PRs accepted.

https://github.com/rg3915/django-orm/issues/1

  • Is the month meant to be only November and December, or was that a typo (i.e. did you mean month = random.randint(1, 12))? And what is "PR"?

  • PR is Pull Request.

2 answers

3


As I do not know the purpose of your code, in principle it looks fine to me, except that it never draws days 29, 30 and 31. If that date (which I see is "naive", i.e. it carries no time zone) represents a date in UTC, then it also never draws leap seconds, although Python's datetime does not support them anyway.
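As a quick check of the leap-second remark, here is a minimal sketch (the helper name datetime_accepts_second_60 is made up for illustration). It tries to build the instant 2016-12-31 23:59:60, which was a real leap second in UTC, and reports whether datetime accepts it:

```python
from datetime import datetime

def datetime_accepts_second_60():
    """Return True if datetime can represent a second value of 60."""
    try:
        # 2016-12-31 23:59:60 UTC was an actual leap second.
        datetime(2016, 12, 31, 23, 59, 60)
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False
    return True

print(datetime_accepts_second_60())
```

So even a generator that tried to draw second 60 could not store the result in a datetime.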

Including these missing values brings an additional complication: the probability of a random date falling in a 31-day month is slightly higher than that of falling in a 30-day month (likewise for 29 versus 28 days), and the same goes for a leap year versus an ordinary year. So if the goal is a uniform distribution, drawing field by field would become excessively laborious, long and error-prone.
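To make the imbalance concrete, here is a small sketch (the helper name days_by_month_length is hypothetical) that counts how many days of the default range fall in months of each length. A draw that is uniform over days has to reproduce these proportions, which a uniform draw of the month field alone does not:

```python
import calendar
from collections import Counter

def days_by_month_length(min_year=1915, max_year=1996):
    """Count the days in the range, grouped by the length of their month."""
    counts = Counter()
    for year in range(min_year, max_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of first day, number of days)
            length = calendar.monthrange(year, month)[1]
            counts[length] += length
    return counts

counts = days_by_month_length()
total = sum(counts.values())
for length in sorted(counts):
    print(length, counts[length], counts[length] / total)
```

With the default years, 31-day months hold a little over 59% of all days, whereas picking the month uniformly gives them 7/12 (about 58.3%) of the draws.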

An alternative is to draw a delta: take (max_year+1)-01-01 00:00:00.000000 minus min_year-01-01 00:00:00.000000 (i.e. the total seconds of a timedelta, as a float), draw a number of seconds between zero and that delta, then convert back to a date:

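from datetime import datetime, timedelta
from random import random

def gen_timestamp(min_year=1915, max_year=1996):
    min_date = datetime(min_year, 1, 1)
    max_date = datetime(max_year + 1, 1, 1)  # exclusive upper bound
    # random() is in [0, 1), so delta is in [0, total seconds in range)
    delta = random() * (max_date - min_date).total_seconds()
    return (min_date + timedelta(seconds=delta)).isoformat(" ")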

This way any date in the range can be drawn, and the draw is uniform. For example, here it is drawing a February 29:

>>> i, d = 0, gen_timestamp()
>>> while d[5:10] != '02-29' and i < 100000:
...   i, d = i+1, gen_timestamp()
...
>>> i,d
(770, '1960-02-29 21:28:40.688135')

Note: according to the documentation, if the interval between the latest and the earliest date is very large (more than 270 years on most platforms), this method loses microsecond precision.
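If that precision loss matters, one possible workaround is to stay in integers: count the microseconds in the interval and draw an integer offset instead of a float. This is only a sketch of that idea, not part of the answer above, and the name gen_timestamp_exact is made up for illustration:

```python
import random
from datetime import datetime, timedelta

def gen_timestamp_exact(min_year=1915, max_year=1996):
    """Draw a uniform timestamp using integer microseconds only."""
    min_date = datetime(min_year, 1, 1)
    max_date = datetime(max_year + 1, 1, 1)  # exclusive upper bound
    # Floor-dividing a timedelta by a timedelta yields an int.
    total_us = (max_date - min_date) // timedelta(microseconds=1)
    # randrange(n) returns an integer in [0, n), so max_date is never hit.
    offset = random.randrange(total_us)
    return (min_date + timedelta(microseconds=offset)).isoformat(" ")

print(gen_timestamp_exact())
```

Because no float is involved, every microsecond in the interval is equally likely regardless of how wide the interval is.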

1

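import datetime
import random

def random_datetime(start, end):
    assert isinstance(start, datetime.datetime)
    assert isinstance(end, datetime.datetime)
    epoch = datetime.datetime(1970, 1, 1)
    # whole seconds since the epoch; randint requires integer bounds
    start = int((start - epoch).total_seconds())
    end = int((end - epoch).total_seconds())
    # add the offset to the epoch rather than calling fromtimestamp,
    # which would shift the result to local time
    return epoch + datetime.timedelta(seconds=random.randint(start, end))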
