How to access the index correctly for data in the same file?

I have a text file in FASTA format:

>gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human
GTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAG
GTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCC
GGATTGTCAATATGCTAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGC
>gb:HM045792|Organism:Chikungunya virus|Strain Name:'Vereeniging'|Segment:null|Host:Human
ACGTAGCCTACCAGTTTCTTACTGCTCTACTCTGCAAAGCAAGAGATTAAGAACCCATCATGGATCCTGT
GTACGTGGACATAGACGCTGACAGCGCCTTTTTGAAGGCCCTGCAACGTGCGTACCCCATGTTTGAGGTG
GAACCTAGGCAGGTCACACCGAATGACCATGCTAATGCTAGAGCGTTCTCGCATCTAGCTATAAAACTAA
TAGAGCAGGAAATTGATCCCGACTCAACCATCCTGGATATCGGTAGTGCGCCAGCAAGGAGGATGATGTC
>gb:KY474305|Organism:Dengue virus 1|Strain Name:00099-S|Segment:null|Subtype:1|Host:Human
CGAATCGGAAGCTTGCTTAACGTAGTTCTAGCAGTTTTTTATTAGAGAGCAGATCTCTGATGAACAACCA
ACGGAAAAAGACGGGTCGACCGTCTTTCAATATGCTGAAACGCGCGAGAAACCGCGTGTCAACTGGTTCA
CAGTTGGCGAAGAGATTCTCAAAAGGATTGCTTTCAGGCCAAGGACCCATGAAATTGGTGATGGCTTTCA
TAGCATTTCTAAGATTTCTAGCCATACCCCCAACAGCAGGAATTTTGGCTAGATGGAGCTCATTCAAGAA
GAATGGAGCGATCAAAGTGTTACG

I store this information in a list and later in a database table, but the index of the Host field differs from one record to another.

Code:

from Bio import SeqIO

for item in SeqIO.parse('arquivo.txt', 'fasta'):
    dado = item.description.replace('|', '\n').splitlines()
    print(dado)
    resumo = []

    for i in dado:
        d = i.replace(':', '\n').splitlines()
        resumo.append(d[1])

    id_name = resumo[0]
    organism = resumo[1]
    strain_name = resumo[2]
    segment = resumo[3]
    host = resumo[4]
    seq = item.seq


Output:
 id_name    host     organism         seq           strain_name   segment
 KX262887   Asian   Zika virus      GTTGTTGATCG     103451        null 
 KY474305    1      Dengue virus 1  CGAATCTTACG     00099-S       null
 ...

Expected output:

 id_name    host     organism         seq           strain_name   segment
 KX262887   Human   Zika virus      GTTGTTGATCG     103451        null 
 KY474305   Human   Dengue virus 1  CGAATCTTACG     00099-S       null
...

How can I access the indexes correctly and store them in the table?

  • Have you considered using pandas?

  • Yes. If it solves the problem, I would consider using any Python library. @Miguel

  • Are the column names fixed, or will they be dynamic?

  • Fixed. It is a MySQL table.

1 answer


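The problem is that not every header has the same number of fields: some records include a Subtype field and others do not. Accessing the values by position therefore gives unexpected results, because Host is not always at the same index.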
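
To make the mismatch concrete, here is a small sketch that splits two of the headers from the question and looks at position 4 in each. The header strings are copied from the file above; the variable names are only illustrative.

```python
# Two headers copied from the question's file (without the leading '>').
zika = "gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human"
chik = "gb:HM045792|Organism:Chikungunya virus|Strain Name:'Vereeniging'|Segment:null|Host:Human"

zika_fields = zika.split("|")
chik_fields = chik.split("|")

# The Zika header carries an extra Subtype field, so Host moves from index 4 to index 5.
print(len(zika_fields), zika_fields[4])  # 6 Subtype:Asian
print(len(chik_fields), chik_fields[4])  # 5 Host:Human
```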

Here is a simpler way to access the data by name instead of by position, without any additional library. Each header is split into its fields with split(), and the resulting key/value pairs are grouped into a dictionary with dict():

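from Bio import SeqIO

for item in SeqIO.parse('arquivo.txt', 'fasta'):
    # Build {field name: value} from the "key:value" pairs separated by "|".
    # Splitting on the first ":" only keeps any ":" inside a value intact.
    resumo = dict(x.split(":", 1) for x in item.description.split("|"))
    id_name = resumo["gb"]
    organism = resumo["Organism"]
    strain_name = resumo["Strain Name"]
    segment = resumo["Segment"]
    host = resumo["Host"]  # looked up by name, so an extra Subtype field does not matter
    seq = item.seq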
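
The same idea can be tried without Biopython or the input file. The sketch below is only an illustration: `parse_header`, `to_row` and `COLUMNS` are hypothetical names, not part of any library, and the header strings are copied from the question. It uses `dict.get`, so a field that is absent from a header gives `None` instead of raising a `KeyError`.

```python
# Fixed column order for the table; the names match the keys used in the headers.
COLUMNS = ("gb", "Organism", "Strain Name", "Segment", "Host")


def parse_header(description):
    """Turn 'key:value|key:value|...' into a dict, splitting on the first ':' only."""
    return dict(field.split(":", 1) for field in description.split("|"))


def to_row(description):
    """Return the values in COLUMNS order; a missing field becomes None."""
    fields = parse_header(description)
    return tuple(fields.get(name) for name in COLUMNS)


headers = [
    "gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human",
    "gb:HM045792|Organism:Chikungunya virus|Strain Name:'Vereeniging'|Segment:null|Host:Human",
]

rows = [to_row(h) for h in headers]
for row in rows:
    print(row)
```

Each tuple in `rows` has the same length and order regardless of whether the header contained a Subtype field, so it could be passed as the parameters of a parameterized INSERT for the fixed-column table, assuming the table's columns follow the order in `COLUMNS`.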
