How to split a file with multiple FASTA records into separate variables


I want to know how to put each FASTA record from a single file into its own variable, or else put them all in one array and retrieve each one by its index.

Each FASTA record starts with the > symbol, as in the example:



This approach is fine as long as the file is not too large, since it keeps every sequence in memory:

sub ler_fasta {
    my ($arq) = @_;              # path to the FASTA file
    my %seqs;
    my $header;
    my $seq;
    open(my $in, '<', $arq) or die "failed to open file $arq: $!\n";
    while (<$in>) {
        if (/^>/) {
            if (defined $seq) {
                $seqs{$header} = $seq;   # store the previous record
            }
            $header = $_;
            $header =~ s/^>//;   # remove the ">"
            $header =~ s/\s+$//; # remove trailing spaces/tabs/newline
            $seq = "";           # clear the old sequence
        } elsif (defined $header) {
            s/\s+//g;            # strip spaces etc.
            $seq .= $_;          # append to the current sequence
        }
    }
    close $in;

    if (defined $seq) {          # the last sequence
        $seqs{$header} = $seq;
    }

    return \%seqs;               # return a reference to the hash of sequences
}
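A hash keyed by header does not keep the records in file order, so on its own it does not cover the question's other wish: retrieving each record by number. A minimal variation of the same loop pushes each record onto an array instead. This is a sketch, not the answer's code: the sub name read_fasta_array and the sample data are hypothetical, and it reads from an in-memory filehandle so it runs without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle into an
# array, preserving file order. Each element is { header => ..., seq => ... }.
sub read_fasta_array {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            $line =~ s/^>//;              # remove the ">"
            $line =~ s/\s+$//;            # remove trailing whitespace/newline
            push @records, { header => $line, seq => '' };
        } elsif (@records) {              # ignore lines before the first header
            $line =~ s/\s+//g;            # strip whitespace inside sequence lines
            $records[-1]{seq} .= $line;   # append to the current record
        }
    }
    return @records;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = ">Pvivax_1\nAAGG\nTTT\n>Pvivax_2\nTTGGCCC\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
my @fasta = read_fasta_array($fh);
close $fh;

print "$fasta[0]{header}: $fasta[0]{seq}\n";   # first record, by index
print "$fasta[1]{header}: $fasta[1]{seq}\n";   # second record, by index
```

To read a real file, open it with open(my $fh, '<', $path) and pass $fh instead of the in-memory handle.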



Since FASTA files consist of headers and their respective contents, it may be useful to keep the two in separate arrays and retrieve what you need from each:

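use strict;
use warnings;

my $file = 'arquivo.fasta';
open(my $info, '<', $file) or die "Could not open file $file: $!";
my @cabecalho = ();   # headers
my @conteudo = ();    # contents; $conteudo[$i] is the sequence for $cabecalho[$i]
while (my $linha = <$info>) {
    if ($linha =~ /^>/) {
        push(@cabecalho, $linha);   # new header (newline kept)
        push(@conteudo, '');        # start an empty sequence for it
    } else {
        chomp $linha;
        $conteudo[-1] .= $linha if @conteudo;   # append to the current sequence
    }
}
close $info;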

Then, to retrieve the first header:

print $cabecalho[0];  
  • (We lack a more formal specification of the format and of what is intended, but) how does the header relate to the content?
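In this approach the header and its content are linked only by position: $cabecalho[$i] is the header of $conteudo[$i]. A minimal sketch of walking both arrays with one index and turning the pairing into a hash keyed by identifier; the array contents are hypothetical sample values (headers still carry their ">" and newline), and %by_id is a name introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical contents of the two arrays after reading a two-record file.
my @cabecalho = (">Pvivax_1\n", ">Pvivax_2\n");
my @conteudo  = ('AAGGTTT', 'TTGGCCC');

# Identifiers: header text without the leading ">" and trailing whitespace.
my @ids = map { (my $h = $_) =~ s/^>|\s+$//g; $h } @cabecalho;

# Header $i and content $i belong together, so one index walks both arrays.
for my $i (0 .. $#ids) {
    print "$i: $ids[$i] => $conteudo[$i]\n";
}

# The same pairing as a hash keyed by identifier (hash slice assignment).
my %by_id;
@by_id{@ids} = @conteudo;
print "$by_id{Pvivax_2}\n";
```

The parallel arrays keep file order; the hash gives lookup by name. Which one fits depends on whether records are retrieved by number or by identifier.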


(I know this question is old, but questions about something other than HTML and JS are so rare here that I couldn't resist...)

use strict;
use warnings;

sub ler_fasta { my $file = shift;
  local $/ = ">";        # record separator = '>'
  my %val;

  open(my $fasta, '<', $file) or die "Could not open file $file: $!";
  while (<$fasta>) { chomp;       # chomp removes the trailing '>'
      if (/(.+)\n(.+)/) { $val{$1} = $2 }   # header => sequence (one sequence line per record)
  }
  close $fasta;
  return \%val
}

This way each sequence is associated with its identifier (e.g. my $val = ler_fasta("fasta.txt"); print $val->{Pvivax_1};)

use Data::Dumper;
print Dumper(ler_fasta("fasta.txt"));


$VAR1 = { 'Pvivax_2' => 'TTGGCCC',
          'Pvivax_1' => 'AAGGTTT'
        };
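Setting $/ to ">" makes each read return everything up to and including the next ">", so chomp strips that ">" rather than a newline, and the first chunk is empty because the file starts with ">". The sketch below makes those chunks visible, using hypothetical sample data read from an in-memory filehandle, and also joins multi-line sequences, which the (.+)\n(.+) pattern in the answer would cut at the first sequence line.

```perl
use strict;
use warnings;

# Hypothetical sample; the second sequence spans two lines.
my $sample = ">Pvivax_1\nAAGGTTT\n>Pvivax_2\nTTGG\nCCC\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";

my @chunks;
{
    local $/ = ">";          # each read stops right after a ">"
    while (my $rec = <$fh>) {
        chomp $rec;          # removes the trailing ">" (the value of $/), not "\n"
        push @chunks, $rec;
    }
}
close $fh;

# $chunks[0] is empty because the data starts with ">".
# The other chunks look like "header\nsequence lines...".
my %val;
for my $rec (@chunks) {
    next unless $rec =~ /^([^\n]+)\n(.*)$/s;   # header line, then the rest
    my ($id, $seq) = ($1, $2);
    $seq =~ s/\s+//g;        # join multi-line sequences into one string
    $val{$id} = $seq;
}
print "$_ => $val{$_}\n" for sort keys %val;
```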
  • The problem with doing this for FASTA files is that most of them (>80%) have irregular headers, because the format lets you write anything there. Still, I think your answer is worth keeping, since it adds more Perl examples in Portuguese.

  • @Kyllopardiun, thanks for the comment; what I'm missing is a definition of the FASTA format! I looked at Wikipedia, and the example given in our question does not exactly follow the format described there...
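On irregular headers: one common convention (an assumption here, not a rule the FASTA format enforces) is to take the text up to the first whitespace as the identifier and keep the rest as a free-form description. A minimal sketch; split_header and the sample headers are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header into (id, description),
# assuming the identifier is everything up to the first whitespace.
sub split_header {
    my ($header) = @_;
    $header =~ s/^>//;           # drop the leading ">"
    $header =~ s/\s+$//;         # drop trailing whitespace/newline
    my ($id, $desc) = split /\s+/, $header, 2;
    return ($id, defined $desc ? $desc : '');
}

# Hypothetical headers of different shapes.
for my $h (">Pvivax_1\n", ">gene_7 putative kinase, partial\n") {
    my ($id, $desc) = split_header($h);
    print "id=[$id] desc=[$desc]\n";
}
```

Keying the hash by $id instead of the whole header keeps lookups stable even when the description part varies.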

