How to split a file with multiple FASTA records into separate variables

I would like to know how to put each FASTA record, all of which are in the same file, into a separate variable. Alternatively, I could put them all in one array and retrieve each one by its number.

Each FASTA record starts with the > symbol, as in this example:

'>'Pvivax_1
AAGGTTT

'>'Pvivax_2
TTGGCCC

3 answers

This method is fine as long as your file is not too large, since it keeps every sequence in memory:

use strict;
use warnings;

sub ler_fasta {
    my ($arq) = @_;   # path of the FASTA file
    my %seqs;
    my ($header, $seq);
    open(my $in, '<', $arq) or die "failed to open file $arq: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^>/) {
            # store the previous record before starting a new one
            $seqs{$header} = $seq if defined $header;

            $header = $line;
            $header =~ s/^>//;    # remove the ">"
            $header =~ s/\s+$//;  # remove trailing spaces / tabs

            $seq = "";            # reset the sequence for the new record
        } else {
            $line =~ s/\s+//g;    # strip spaces etc.
            $seq .= $line;        # append this line to the current sequence
        }
    }
    close $in;

    # the last sequence
    $seqs{$header} = $seq if defined $header;

    return \%seqs;    # reference to a hash: header => sequence
}

Reference.
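Because the method above keeps every sequence in memory, a streaming variant may suit large files better. The following is a minimal sketch, assuming standard FASTA (headers start with a bare > at the beginning of a line). The helper name each_fasta_record is hypothetical, and the example reads from an in-memory string instead of a real file so it can run on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: calls $callback->($header, $seq) once per record,
# so only one sequence is held in memory at a time.
sub each_fasta_record {
    my ($fh, $callback) = @_;
    my ($header, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;            # tolerate Windows line endings
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ /^>(.*)/) {
            my $new_header = $1;     # capture before calling the callback
            $callback->($header, $seq) if defined $header;
            ($header, $seq) = ($new_header, '');
            $header =~ s/\s+$//;
        } else {
            $line =~ s/\s+//g;
            $seq .= $line;           # multi-line sequences are concatenated
        }
    }
    $callback->($header, $seq) if defined $header;   # last record
}

# Usage with an in-memory file (standard FASTA, no quotes around '>'):
my $data = ">Pvivax_1\nAAGGTTT\n\n>Pvivax_2\nTTGGCCC\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @seen;
each_fasta_record($fh, sub {
    my ($header, $seq) = @_;
    push @seen, "$header=$seq";
    print "$header: ", length($seq), " bases\n";
});
close $fh;
```

With a real file, replace the in-memory handle with open(my $fh, '<', $path).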

Since FASTA files contain both headers and sequence contents, it may be useful to keep them separate and retrieve what you need from each:

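#!/usr/bin/perl
use strict;
use warnings;

my $file = 'arquivo.fasta';
open(my $info, '<', $file) or die "Could not open file $file: $!";
my @cabecalho = ();   # headers
my @conteudo  = ();   # sequences; $conteudo[$i] belongs to $cabecalho[$i]
while (my $linha = <$info>) {
    chomp $linha;
    next if $linha =~ /^\s*$/;       # skip blank lines
    if ($linha =~ /^>/) {
        push(@cabecalho, $linha);
        push(@conteudo, '');         # start an empty sequence for this header
    } elsif (@cabecalho) {
        $conteudo[-1] .= $linha;     # multi-line sequences are concatenated
    }
}

close $info;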

Then, to retrieve the first header:

print $cabecalho[0];  
  • (We lack a more formal specification of the format and of the intended goal, but) how is each header related to its content?
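One way to keep each header tied to its sequence, and still retrieve records by number as the question asks, is to store each record as a hash reference in a single array. This is a sketch under the assumption of standard FASTA with bare > headers; the function name read_fasta_array and the header/seq keys are hypothetical, and an in-memory string stands in for the file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of { header => ..., seq => ... }
# in file order, so $records[0] is the first record, $records[1] the second...
sub read_fasta_array {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                 # skip blank lines
        if ($line =~ /^>\s*(.*?)\s*$/) {          # header line, trimmed
            push @records, { header => $1, seq => '' };
        } elsif (@records) {                      # ignore text before the first header
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;           # append to the current record
        }
    }
    return @records;
}

my $data = ">Pvivax_1\nAAGGTTT\n\n>Pvivax_2\nTTGGCCC\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @records = read_fasta_array($fh);
close $fh;

print "$records[1]{header}\n";   # prints Pvivax_2
print "$records[1]{seq}\n";      # prints TTGGCCC
```

Unlike a hash keyed by header, the array preserves file order and also tolerates duplicate headers.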

(I know this question is old, but questions about something other than HTML and JS are so rare here that I could not resist...)

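#!/usr/bin/perl
use strict;
use warnings;

sub ler_fasta { my $file = shift;
  local $/ = "'>'";      # record separator = '>' (with the quotes, as in the question)
  my %val;

  open(my $fasta, '<', $file) or die "Could not open file $file: $!";
  while (<$fasta>) { chomp;                 # remove the trailing separator
      if (/(.+)\n(.+)/) { $val{$1} = $2 }   # identifier => first sequence line
  }
  close $fasta;
  return \%val;
}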

This way the values are associated with their identifier (e.g. my $val = ler_fasta("fasta.txt"); print $val->{Pvivax_1};).

use Data::Dumper;
print Dumper(ler_fasta("fasta.txt"));

gives

$VAR1 = {
          'Pvivax_2' => 'TTGGCCC',
          'Pvivax_1' => 'AAGGTTT'
        };
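The record separator above is the literal '>' with quotes, matching the question's example. For standard FASTA, where each header line begins with a bare >, the same record-separator idea can be adapted by splitting on "\n>". This is a hedged sketch (the function name read_fasta_by_separator is hypothetical); unlike the version above, it joins multi-line sequences instead of keeping only the first line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads one record per "\n>" separator and
# returns a hash reference of header => full sequence.
sub read_fasta_by_separator {
    my ($fh) = @_;
    my %seqs;
    local $/ = "\n>";                 # read one record at a time
    while (my $record = <$fh>) {
        chomp $record;                # strips the trailing "\n>" separator
        $record =~ s/^>//;            # only the first record still has its '>'
        my ($header, @lines) = split /\n/, $record;
        next unless defined $header && length $header;
        $header =~ s/\s+$//;
        my $seq = join '', @lines;    # multi-line sequences are joined
        $seq =~ s/\s+//g;
        $seqs{$header} = $seq;
    }
    return \%seqs;
}

# The first sequence is deliberately split over two lines.
my $data = ">Pvivax_1\nAAGG\nTTT\n>Pvivax_2\nTTGGCCC\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $seqs = read_fasta_by_separator($fh);
close $fh;

print "$_ => $seqs->{$_}\n" for sort keys %$seqs;
# Pvivax_1 => AAGGTTT
# Pvivax_2 => TTGGCCC
```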
  • The problem with doing this on FASTA files is that most of them (>80%) have irregular headers since, by the nature of these files, anything can be written there. Still, I think it is worth keeping your answer, as it adds another Perl example in Portuguese.

  • @Kyllopardiun, thanks for the comment; a FASTA format definition is exactly what I was missing! I looked at Wikipedia, and the example in our "question" does not quite follow the format described there...
