Calculate k-mers of DNA sequences in a FASTA file in Python

2

I want to count the k-mers in a DNA sequence. However, my file contains several sequences, each with its own identifier.

Note: k-mers are the overlapping substrings of length k in a sequence. For example, for the sequence "AAC", k=1 gives A=2, C=1; k=2 gives AA=1, AC=1; and k=3 (the maximum for that sequence) gives AAC=1.
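The counts in the note above can be reproduced with a short sliding-window sketch. The helper name `count_kmers` is hypothetical (it is not part of Biopython), and the code uses only the standard library:

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every overlapping substring of length k in sequence."""
    # A sequence of length n has n - k + 1 windows of length k (none if k > n).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

for k in (1, 2, 3):
    print(k, dict(count_kmers("AAC", k)))
```

For "AAC" this reports A=2, C=1 for k=1, AA=1, AC=1 for k=2, and AAC=1 for k=3.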

For example:

>m54200_170907_19495
GGGTTACTGACATGTCTTGCATAATACTTAACTTCTTAGCTGGGACGTAGTCTATACTCG
TTTTCAACCTCCAGTTTTCCTTTCTTTTTCTTTCTCTTTTCTTTTTCTTTTGTTTTCCTC
TTGTTTTTTGTTTGGAGAGGGCACCCTTAGTACGAAGAACTGACTTTAAGCGGTTTATTGCTGCCGGACATAA
>m53000_170907_194957
TTTAGCAGCCCAAAAAAAAGATAGAAATATTTATAAATAAGAAAGAAAAATGATATGTAA
TGTCTAAAACAGGTTTACATTATCGTGATTTTGTTATATTTATAGAGTTTTAAATATCAG
CGTATGTCACATATAGGATTTATGCATTGATGAATTTAGAAGATAACTTACACACCAATT
TTAGTAGGGCTGAAATCTCTATTAGTAGAGAATTATATAATTTAAC

I can count the k-mers over the whole file, but that also counts k-mers that include the IDs. I would like to count k-mers from the sequences only, ignoring the identifiers.

# Import the SeqIO module
from Bio import SeqIO

# Read the FASTA file into a list of records
seq_records = list(SeqIO.parse("filee.fasta", "fasta"))

# Count k-mers across all records, using only the sequences
# (the identifier lives in record.id, not in record.seq)
def kmers(seq_records, k):
    kfreq = {}
    for record in seq_records:
        seq = str(record.seq)
        # Slide a window of length k along the sequence
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in kfreq:
                kfreq[kmer] += 1
            else:
                kfreq[kmer] = 1

    return kfreq

# Call the function with k = 2
rf = kmers(seq_records, 2)
print(rf)

1 answer

3

Try this:

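from Bio import SeqIO

def build_kmers(sequence, ksize):
    """Return all overlapping k-mers of the sequence and how many there are."""
    kmers = []
    # Clamp at zero so sequences shorter than ksize report 0 k-mers
    n_kmers = max(len(sequence) - ksize + 1, 0)

    for i in range(n_kmers):
        kmer = sequence[i:i + ksize]
        kmers.append(kmer)

    return kmers, n_kmers

# SeqIO.parse stores the identifier in record.id, so record.seq holds only the bases
seq_records = list(SeqIO.parse("sequence.fasta", "fasta"))

for record in seq_records:
    kmers, n_kmers = build_kmers(str(record.seq), 2)
    # Print the number of k-mers found in this record
    print(n_kmers)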

You can read more in "An introduction to k-mers for genome comparison and analysis".
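If per-k-mer frequencies are wanted rather than just the number of k-mers, one possible sketch is to tally each record separately and then sum the tallies. The function name `kmer_counts_by_record` and the toy records are hypothetical; the records are plain (identifier, sequence) pairs so the example runs without Biopython:

```python
from collections import Counter

def kmer_counts_by_record(records, k):
    """Map each record ID to a Counter of the k-mers in its sequence.

    `records` is any iterable of (identifier, sequence) pairs; the
    identifier is used only as a dictionary key and is never scanned.
    """
    counts = {}
    for record_id, sequence in records:
        counts[record_id] = Counter(
            sequence[i:i + k] for i in range(len(sequence) - k + 1)
        )
    return counts

# Toy records standing in for parsed FASTA entries (hypothetical data).
records = [("seq1", "AAC"), ("seq2", "ACAC")]
per_record = kmer_counts_by_record(records, 2)

# Sum the per-record Counters to get totals for the whole file.
total = sum(per_record.values(), Counter())
print(per_record["seq2"])
print(total)
```

With Biopython, the pairs could be produced from the parsed file, for example `((rec.id, str(rec.seq)) for rec in SeqIO.parse("sequence.fasta", "fasta"))`.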

  • Lucas Pick, thanks for the help.
