I want to count the k-mers in a DNA sequence. However, my file contains several sequences along with their identifiers.
Note: k-mers are the overlapping substrings of length k in a sequence. For example, for the sequence "AAC", k=1 gives A=2, C=1; k=2 gives AA=1, AC=1; and k=3 (the maximum for that sequence) gives AAC=1.
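To make the definition concrete, here is a minimal sketch that reproduces the "AAC" counts above. The helper name `count_kmers` is only illustrative and is not part of any library; it assumes the sequence is already a plain string with no identifier in it.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a plain string."""
    # A sequence of length n has n - k + 1 windows of length k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

print(count_kmers("AAC", 1))  # Counter({'A': 2, 'C': 1})
print(count_kmers("AAC", 2))  # Counter({'AA': 1, 'AC': 1})
print(count_kmers("AAC", 3))  # Counter({'AAC': 1})
```

If k is larger than the sequence length, the range is empty and the result is an empty Counter.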
For example:
m54200_170907_19495 GGGTTACTGACATGTCTTGCATAATACTTAACTTCTTAGCTGGGACGTAGTCTATACTCG TTTTCAACCTCCAGTTTTCCTTTCTTTTTCTTTCTCTTTTCTTTTTCTTTTGTTTTCCTC TTGTTTTTTGTTTGGAGAGGGCACCCTTAGTACGAAGAACTGACTTTAAGCGGTTTATTGCTGCCGGACATAA
m53000_170907_194957 TTTAGCAGCCCAAAAAAAAGATAGAAATATTTATAAATAAGAAAGAAAAATGATATGTAA TGTCTAAAACAGGTTTACATTATCGTGATTTTGTTATATTTATAGAGTTTTAAATATCAG CGTATGTCACATATAGGATTTATGCATTGATGAATTTAGAAGATAACTTACACACCAATT TTAGTAGGGCTGAAATCTCTATTAGTAGAGAATTATATAATTTAAC
I can calculate the k-mers for the whole file, but then the k-mers that contain the ID are counted as well. I would like to count them from the sequences only, ignoring the identifiers.
    # Import the SeqIO module from Biopython
    from Bio import SeqIO

    # Read the FASTA file into a list of records
    seq_records = list(SeqIO.parse("filee.fasta", "fasta"))

    # Count the k-mers of every sequence; record.seq holds only the
    # sequence, so the identifiers are never part of a k-mer
    def kmers(seq_records, k):
        kfreq = {}
        for record in seq_records:
            seq = str(record.seq)
            # Slide a window of length k over the sequence
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if kmer in kfreq:
                    kfreq[kmer] += 1
                else:
                    kfreq[kmer] = 1
        return kfreq

    # Call the function with k = 2
    rf = kmers(seq_records, 2)
    print(rf)
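As an alternative, here is a rough sketch that keeps the identifiers out of the counts without Biopython and reports one table of k-mers per sequence. It assumes the file is standard FASTA, where each identifier sits on its own line starting with ">" and the sequence may span several lines. The names `read_fasta` and `kmers_per_record` are hypothetical helpers written for this example, and the two short records are made-up test data, not the sequences from the question.

```python
from collections import Counter

def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    identifier, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split()
            identifier, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)

def kmers_per_record(lines, k):
    """Map each identifier to the k-mer counts of its own sequence."""
    result = {}
    for identifier, seq in read_fasta(lines):
        result[identifier] = Counter(
            seq[i:i + k] for i in range(len(seq) - k + 1)
        )
    return result

# Made-up input: the second sequence is split over two lines.
example = """>seq1
AAC
>seq2
ACG
T
""".splitlines()

for name, counts in kmers_per_record(example, 2).items():
    print(name, dict(counts))
```

Because the functions accept any iterable of lines, an open file handle can be passed in place of `example`. Joining the sequence lines before counting matters: k-mers that straddle a line break would otherwise be missed.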
Maybe that can help.
– Lacobus