PYTHON - Difficulties fixing the COH-PIAH code

Good evening. I'm taking a Python course and this is the course's last exercise, but I'm having a lot of trouble getting it to pass; when I submit it, the grader gives the following error:

"***** [0.5 points]: Testing evaluation of texts (Texts = ['Ancient navigators had a glorious phrase:"To navigate is necessary; to live is not necessary."I want for myself the spirit [d]this phrase,transformed the way to marry her as I am: To live is not necessary; what is necessary is to create.I do not expect to enjoy my life; nor in enjoying itI just want to make it big, even if for that it has to be my body and (my soul) the wood of that fire.I just want to make it of all humanity;Yet for that I have to lose it as my own.More and more I think so.More and more I put of the soul essence of my blood impersonal purpose to magnify the fatherland and contribute to the evolution of humanity.It is the form that in me took the mysticism of our Race.' 'I turned to her; Capitu had his eyes on the ground. He lifted them up quickly, slowly, and we looked at each other... Confession of children, you were well worth two or three pages, but I want to be spared. Truly, we do not speak anything; the wall spoke for us. We did not move, The hands spread out little by little, all four of them, grabbing, clutching, merging. I did not mark the exact time of that gesture. I should have marked it; I miss a note written that same night, and that I would put here with the spelling errors that brought, but it would not bring any, such was the difference between the student and the teenager. I knew the rules of writing, without suspecting those of loving; I had Latin orgies and was a virgin of women.', 'OUR joy before a metaphysical system, our satisfaction in the presence of a construction of thought, in which the spiritual organization of the world shows itself in a logical, coherent and harmonic set, always depend eminently on aesthetics; they have the same origin as pleasure, that high satisfaction, always serene after all, that artistic activity provides us when creating order and form allows us to cover with sight the chaos of life, giving it transparency.'] , Subscription = [4.79, 0.72, 0.56, 80.5, 2.5, 31.6] ) - Failed ****** Assertionerror: Expected: 2; received: 1"

That’s the code I used:

import re


def le_assinatura():
    """
    A função lê os valores dos traços linguísticos do modelo e devolve uma
    assinatura a ser comparada com os textos fornecidos.
    """
    print("Bem-vindo ao detector automático de COH-PIAH.")

    tam_m_pal = float(input("Entre o tamanho medio de palavra: "))
    type_token = float(input("Entre a relação Type-Token: "))
    h_lego = float(input("Entre a Razão Hapax Legomana: "))
    tam_m_sent = float(input("Entre o tamanho médio de sentença: "))
    compx_med = float(input("Entre a complexidade média da sentença: "))
    tam_m_frase = float(input("Entre o tamanho medio de frase: "))

    return [tam_m_pal, type_token, h_lego, tam_m_sent, compx_med, tam_m_frase]


def le_textos():
    i = 1
    textos = []
    texto = input("Digite o texto: " + str(i) + "(aperte enter para sair):")
    while texto:
        textos.append(texto)
        i += 1
        texto = input("Digite o texto: " + str(i) + "(aperte enter para sair):")
    return textos


def calcula_assinatura(texto):
    """
    Essa função recebe um texto e deve devolver a assinatura
    do texto.
    """
    if type(texto) != list:
        aux = texto
        texto = []
        texto.append(aux)
    for i in texto:
        sentencas = []
        sentencas = separa_sentencas(str(i))  
        frases = []
        num_tot_sentencas = 0
        soma_cat_sentencas = 0
        for i in range(len(sentencas)):
            frase_i = separa_frases(str(sentencas[i]))
            frases.append(frase_i)  
            num_tot_sentencas += 1
            soma_cat_sentencas = soma_cat_sentencas + len(sentencas[i])
        palavras = []
        num_tot_frases = 0
        soma_cat_frases = 0
        for lin in range(len(frases)):
            for col in range(len(frases[lin])):
                palavra_i = separa_palavras(str(frases[lin][col]))
                palavras.append(palavra_i)  
                num_tot_frases += 1
                soma_cat_frases = soma_cat_frases + len(str(frases[lin][col]))
        mtrx_para_lista = []  
        for lin in range(len(palavras)):
            for col in range(len(palavras[lin])):
                mtrx_para_lista.append(palavras[lin][col])
        palavras = mtrx_para_lista[:]
        soma_comp_palavras = 0
        num_tot_palavras = 0
        for lin in range(len(palavras)):
            for col in range(len(palavras[lin])):
                soma_comp_palavras = soma_comp_palavras + len(str(palavras[lin][col]))
            num_tot_palavras += 1
        matriz_ass_input = []
        matriz_ass_input.append(tam_m_pal(soma_comp_palavras, num_tot_palavras))
        matriz_ass_input.append(type_token(palavras, num_tot_palavras))
        matriz_ass_input.append(h_lego(palavras, num_tot_palavras))
        matriz_ass_input.append(tam_m_sent(soma_cat_sentencas, num_tot_sentencas))
        matriz_ass_input.append(compx_med(num_tot_frases, num_tot_sentencas))
        matriz_ass_input.append(tam_m_frase(soma_cat_frases, num_tot_frases))
    return matriz_ass_input  


def tam_m_pal(soma_comp_palavras, num_tot_palavras):
    if num_tot_palavras != 0:
        tam_m_pal = soma_comp_palavras / num_tot_palavras
    else:
        tam_m_pal = 0
    return tam_m_pal


def type_token(lista_palavras, num_tot_palavras):
    num_pal_dif = n_palavras_diferentes(lista_palavras)
    if num_tot_palavras != 0:
        type_token = num_pal_dif / num_tot_palavras
    else:
        type_token = 0
    return type_token


def h_lego(lista_palavras, num_tot_palavras):
    num_pal_uni = n_palavras_unicas(lista_palavras)
    if num_tot_palavras != 0:
        h_lego = num_pal_uni / num_tot_palavras
    else:
        h_lego = 0
    return h_lego


def tam_m_sent(soma_num_cat, num_sent):
    if num_sent != 0:
        tam_m_sent = soma_num_cat / num_sent
    else:
        tam_m_sent = 0
    return tam_m_sent


def compx_med(num_tot_frases, num_tot_sentencas):
    if num_tot_sentencas != 0:
        compx_med = num_tot_frases / num_tot_sentencas
    else:
        compx_med = 0
    return compx_med


def tam_m_frase(soma_cat_frases, num_tot_frases):
    if num_tot_frases != 0:
        tam_m_frase = soma_cat_frases / num_tot_frases
    else:
        tam_m_frase = 0
    return tam_m_frase


def separa_sentencas(texto):
    """
    A função recebe um texto e devolve uma lista das sentenças dentro
    do texto.
    """
    sentencas = re.split(r'[.!?]+', texto)
    if sentencas[-1] == '':
        del sentencas[-1]
    return sentencas


def separa_frases(sentenca):
    """
    A função recebe uma sentença e devolve uma lista das frases dentro
    da sentença.
    """
    return re.split(r'[,:;]+', sentenca)


def separa_palavras(frase):
    """
    A função recebe uma frase e devolve uma lista das palavras dentro
    da frase.
    """
    return frase.split()


def n_palavras_unicas(lista_palavras):
    """
    Essa função recebe uma lista de palavras e devolve o numero de palavras
    que aparecem uma única vez.
    """
    freq = dict()
    unicas = 0
    for palavra in lista_palavras:
        p = palavra.lower()
        if p in freq:
            if freq[p] == 1:
                unicas -= 1
            freq[p] += 1
        else:
            freq[p] = 1
            unicas += 1

    return unicas


def n_palavras_diferentes(lista_palavras):
    """
    Essa função recebe uma lista de palavras e devolve o numero de palavras
    diferentes utilizadas.
    """
    freq = dict()
    for palavra in lista_palavras:
        p = palavra.lower()
        if p in freq:
            freq[p] += 1
        else:
            freq[p] = 1

    return len(freq)


def compara_assinatura(ass_main, matriz_ass_input):
    """
    Essa função recebe duas assinaturas de texto e deve devolver o grau de
    similaridade nas assinaturas.
    """
    lista_Sab = []
    soma_mod = 0
    if type(matriz_ass_input[0]) is list:
        for lin in range(len(matriz_ass_input)):
            for col in range(len(matriz_ass_input[lin])):
                soma_mod += abs(ass_main[col] - matriz_ass_input[lin][col])
            Sab = soma_mod / 6
            lista_Sab.append(Sab)
        return lista_Sab
    else:
        for i in range(len(matriz_ass_input)):
            soma_mod += abs(ass_main[i] - matriz_ass_input[i])
        Sab = soma_mod / 6
        return Sab


def avalia_textos(textos_main, ass_comparadas):
    """
    Essa função recebe uma lista de textos e deve devolver o numero (0 a n-1)
    do texto com maior probabilidade de ter sido infectado por COH-PIAH.
    """
    aux_ass_com = (ass_comparadas[:])
    aux_ass_com.sort()
    for indice in range(len(ass_comparadas)):
        if aux_ass_com[0] == ass_comparadas[indice]:
            copiah = indice
    return copiah -1
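
For reference, this is roughly how I have been testing the functions locally (just a sketch: the signature values come from the grader's message above, the two texts are short placeholders, and the way the functions are chained together is my own assumption about what the exercise expects):

# Hypothetical local driver, not part of the graded functions.
ass_cp = [4.79, 0.72, 0.56, 80.5, 2.5, 31.6]    # signature of a COH-PIAH carrier (from the error message)
textos = [
    "O gato caçou o rato. O rato fugiu.",       # placeholder text 1
    "Navegar é preciso; viver não é preciso.",  # placeholder text 2
]

similaridades = []
for texto in textos:
    ass_texto = calcula_assinatura(texto)               # six linguistic traits of this text
    similaridades.append(compara_assinatura(ass_cp, ass_texto))

# avalia_textos should point at the text closest to the infected signature
print(avalia_textos(textos, similaridades))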

If someone can shed some light on this, I'm pasting below how the program is supposed to work, to make things a little clearer.

"Several studies have been compiled and today we know precisely the signature of a carrier of COH-PIAH. Your program should receive several texts and calculate the values of the different linguistic traits as follows:

Average word size is the sum of the word lengths divided by the total number of words.

Type-Token ratio is the number of different words divided by the total number of words. For example, in the phrase "The cat hunted the mouse" we have 5 words in total (the, cat, hunted, the, mouse) but only 4 different ones (the, cat, hunted, mouse), so the Type-Token ratio is 4/5 = 0.8.

Hapax Legomana ratio is the number of words that appear only once divided by the total number of words. In the same phrase "The cat hunted the mouse" there are 5 words in total but only 3 that appear exactly once (cat, hunted, mouse), so the Hapax Legomana ratio is 3/5 = 0.6.

Average sentence size is the sum of the number of characters in all sentences divided by the number of sentences (the characters separating one sentence from the next should not be counted as part of the sentence).

Sentence complexity is the total number of phrases divided by the number of sentences.

Average phrase size is the sum of the number of characters in each phrase divided by the number of phrases in the text (the characters separating one phrase from the next should not be counted as part of the phrase).

After calculating these values for each text, you should compare them with the signature provided for students infected by COH-PIAH. The degree of similarity between two texts, a and b, is given by the formula:

$$S_{ab} = \frac{\sum_{i=1}^{6} |f_{i,a} - f_{i,b}|}{6}$$

Where:

S_ab is the degree of similarity between texts a and b; f_{i,a} is the value of linguistic trait i in text a; and f_{i,b} is the value of linguistic trait i in text b. Note that the more similar a and b are, the smaller S_ab will be. For each text, you should calculate the degree of similarity with the signature of the COH-PIAH carrier and, at the end, display which text was most likely written by an infected student."
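
Just to check my own understanding of the statement (independently of the solution above), here is a small standalone sketch that computes the six traits for the statement's example phrase and the degree of similarity against a given signature; the function names traits and similarity are only illustrative:

import re

# Standalone sanity check of the statement's definitions (illustrative only).
def traits(text):
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    phrases = [p for s in sentences for p in re.split(r'[,:;]+', s)]
    words = [w.lower() for p in phrases for w in p.split()]
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return [
        sum(len(w) for w in words) / len(words),               # average word size
        len(freq) / len(words),                                # Type-Token ratio
        sum(1 for c in freq.values() if c == 1) / len(words),  # Hapax Legomana ratio
        sum(len(s) for s in sentences) / len(sentences),       # average sentence size
        len(phrases) / len(sentences),                         # sentence complexity
        sum(len(p) for p in phrases) / len(phrases),           # average phrase size
    ]

def similarity(sig_a, sig_b):
    # S_ab = sum over the 6 traits of |f_i,a - f_i,b|, divided by 6
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / 6

sig = traits("The cat hunted the mouse.")
print(sig[1])  # Type-Token ratio: 4 different words / 5 words = 0.8
print(sig[2])  # Hapax Legomana: 3 words appearing once / 5 words = 0.6
print(similarity(sig, [4.79, 0.72, 0.56, 80.5, 2.5, 31.6]))  # signature from the grader's message

Since a smaller value means the text is closer to the COH-PIAH signature, the most likely infected text should be the one with the smallest similarity value.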
