Program to simulate the anniversary paradox

Asked

Viewed 872 times

11

In probability theory, the anniversary paradox states that given a group of 23 (or more) randomly chosen people, the chance that two people will have the same birthday date is more than 50%. For 57 or more people, the probability is greater than 99%, however, it cannot be exactly 100% except that there are at least 367 people.

Write a program that receives an n number of people and an x number of repetitions and draw lists with birthday dates, and check if there is any matching date. For each loop one must re-draw the lists, and for each list where there are coincident cases one must add 1 to the number of favorable cases. After rounds the x loops of a percentage of times when there were coincidental birthdays.

Detail: Use list compression to generate birthday dates

Attempt to resolve:

import random

datas =[x for x in range(1,366)]
#print(datas)


n = int(input("Digite o número de pessoas: "))
x = int(input("Digite o número de repetições: "))
datas_sorteadas = []
favoraveis = 0
for i in range(x):
    for i in range(n):
        datas_sorteadas.append(random.choice(datas))
        print(datas_sorteadas)

    for data in datas_sorteadas:
        if datas_sorteadas.count(data)>=2:
            favoraveis +=1
    datas_sorteadas = []
    datas_sorteadas.append(random.choice(datas))
    print(datas_sorteadas)

print("Casos Favoráveis: ", favoraveis)

print("n*x",n*x)
print("Percentual: ", (favoraveis/(n*x)))
#print(datas_sorteadas)

The program is running without errors but I suspect it is not correct. The results do not match the theory. Any idea how to correct?

  • Write a battery of unit tests that reflect the expected behavior, then it is easy to identify which cases are not correct.

2 answers

20


From the description of the problem, I believe you’re doing some unnecessary things, it’s simpler than it seems.

What is the birthday paradox/problem?
Video explanatory PT

To simulate this we can do so:

  1. Receive as input the number of people (num_p) and number of tests/repetitions (num_loops) to make;
  2. In each test we generate a list with num_p (number of people) random integers, each may have a value between 1 and 365 to simulate each person’s birthday. Ex: in a test with 6 people we can get [4, 144, 233, 98, 144, 43];
  3. Check if in the generated list there is any value that occurs more than once in the same list, in this case I use any() and count(). Ex: [4, 144, 233, 98, 144, 43] (birthdays of 6 people), in this case 144 is repeated, ie two people have the same birthday;
  4. If the point above (3) is checked is because we have at least 1 value that repeats, then we increment our variable favoraveis, and at the end of the tests calculate the percentage of favoraveis in num_loops (number of tests).

Code:

import random

num_p = int(input("Digite o número de pessoas: "))
num_loops = int(input("Digite o número de repetições: ")) # num de testes
favoraveis = 0
for _ in range(num_loops):
    ani_dates = [random.randint(1, 366) for _ in range(num_p)] # sortear nova lista de datas (dias) de aniversario
    if(any(ani_dates.count(i) > 1 for i in ani_dates)): # verificar se existe a mesma data (valor) mais do que uma vez na lista
        favoraveis += 1

probs_perc = (favoraveis/num_loops)*100
print('Em {} pessoas e {} testes deram-se {} vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: {}%'.format(num_p, num_loops, favoraveis, probs_perc))
# Em 23 pessoas e 1000 testes deram-se 504 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 50.4%
# Em 57 pessoas e 1000 testes deram-se 993 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 99.3%

DEMONSTRATION

Editing

After reviewing this response and performing some tests I noticed that we achieved a better performance if we make use of set() and its characteristic of not storing duplicate values, so we can simply check whether the set is different (smaller) than the number of people (num_p), if it’s because there were duplicate values, and we had at least two people having their birthday on the same day (favoraveis += 1):

import random

num_p = int(input("Digite o número de pessoas: "))
num_loops = int(input("Digite o número de repetições: ")) # num de testes
favoraveis = 0
for _ in range(num_loops):
    ani_dates = {random.randint(1, 366) for _ in range(num_p)} # sortear novo set de datas (dias) de aniversario
    if(len(ani_dates) != num_p): # se o comprimento do set for diferente do num de pessoas
        favoraveis += 1

probs_perc = (favoraveis/num_loops)*100
print('Em {} pessoas e {} testes deram-se {} vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: {}%'.format(num_p, num_loops, favoraveis, probs_perc))
# Em 23 pessoas e 1000 testes deram-se 506 vezes em que pelo menos duas pessoas fazem anos no mesmo dia, percentagem: 50.6%

DEMONSTRATION

The exponential probability increase equation may be seen here, and I while testing this same equation in python also made the graph.

It has nothing to do with the question but here I leave the code to make this graph:

import matplotlib.pyplot as plt

def birthday_probs(x):
    p = (1.0/365)**x
    for i in range((366-x),366):
        p *= i
    return 1-p

plt.plot([birthday_probs(i)*100 for i in range(366)])
plt.xlabel("Num de pessoas")
plt.ylabel("Probabilidades de partilha de dia de aniversário")
plt.ylim(ymin=0)
plt.xlim(xmin=0, xmax=80)
plt.show()

inserir a descrição da imagem aqui

  • What do any? I don’t understand!

  • @Eds I’m on the phone, I won’t be able to explain it properly but basically it is any(condicao for j in...) and returns True as soon as the condition has been established, otherwise False, in this case if any of the values occurs more than once in that list. https://docs.python.org/3/library/functions.html#any

4

This link can help you: Wikipedia: Birthday paradox

Basically the probability code is:

def birthday(x):
    p = (1.0/365)**x
    for i in range((366-x),366):
        p *= i
    return 1-p

Browser other questions tagged

You are not signed in. Login or sign up in order to post.