Perl script - how to calculate dipeptide frequencies based on the length of the sequences?

I have this Perl code to calculate the dipeptide counts in my sequences (there are 400 combinations, e.g. AA, AC, AD, AE, ...). Now I want the frequency of these counts. For that I only need to divide each count by the size (length) of its sequence, but I tried several ways and could not get the calculation right. My script:

use strict;
use warnings;
use Bio::SeqIO;

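# The 20 standard amino acids
my @amino=qw/A C D E F G H I K L M N P Q R S T V W Y/;
my @comb=();

# Build all 400 ordered dipeptides (AA, AC, ..., YY).
# $first/$second instead of $a/$b, which are reserved for sort.
foreach my $first (@amino){
    foreach my $second (@amino){
        push (@comb,$first.$second);
    }
}
my $in  = Bio::SeqIO->new(-file => "myfile.fasta" , '-format' => 'Fasta');
while ( my $seq= $in->next_seq ) {
    # Overlapping pairs: a sequence of length n yields n-1 dipeptides
    my @dipeps=($seq->seq()=~/(?=(.{2}))/g);
    my %di_count=();
    $di_count{$_}++ for @dipeps;
    print $seq->id();
    # Print the count of every combination (0 if absent), in sorted order
    map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb;
    print "\n";
}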
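For reference, the lookahead regex in the script returns overlapping pairs, so a sequence of length n produces n - 1 dipeptides, which matters when choosing the denominator. A minimal standalone sketch, assuming a made-up sequence ACDAC in place of the FASTA input (no BioPerl needed):

```perl
use strict;
use warnings;

# Hypothetical example sequence (not from the original data).
my $sequence = 'ACDAC';

# The zero-width lookahead (?=...) lets the match advance one character at a
# time, so consecutive pairs overlap: AC, CD, DA, AC.
my @dipeps = ($sequence =~ /(?=(.{2}))/g);

my %di_count;
$di_count{$_}++ for @dipeps;

printf "length=%d pairs=%d AC=%d\n",
    length($sequence), scalar(@dipeps), $di_count{AC};
```

This prints length=5 pairs=4 AC=2.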

I tried:

map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb/length;

map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0/length}sort @comb;

Neither worked as expected. Do I need to calculate the length of each sequence first and store it before printing? Any suggestions?

  • Note that length with no argument returns the length of $_, which in your case is the combination itself (always 2).
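To illustrate that comment: inside map, $_ is aliased to the current combination, so a bare length returns 2 for every key. A small sketch, assuming a hypothetical sequence and a three-element subset of the 400 combinations:

```perl
use strict;
use warnings;

my $sequence = 'ACDAC';         # hypothetical example sequence
my @comb     = qw/AA AC CD/;    # small subset of the 400 combinations

# Inside map, $_ is the current combination, so `length` with no
# argument (equivalent to `length $_`) measures the two-letter key,
# not the protein sequence.
my @wrong = map { length $_ } @comb;

# Measure the sequence explicitly, once, before looping.
my $seq_len = length $sequence;

print "bare length: @wrong; sequence length: $seq_len\n";
```

It prints "bare length: 2 2 2; sequence length: 5". Note also that in the second attempt the division sits only in the 0 branch of the ternary, so the existing counts are never divided at all. Computing the sequence length once, before the map, and dividing in both branches avoids both problems.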

1 answer

Instead of the map (which is a bit confusing here), I would suggest:

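my $len = scalar @dipeps;     # number of overlapping dipeptides in this sequence
for (sort keys %di_count) {   # only the dipeptides that actually occur
    print " ", $di_count{$_} / $len;
}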

(untested)

  • It did not work; the results are incorrect.
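One likely reason the answer's output looks wrong: looping over sort keys %di_count prints only the dipeptides present in that sequence, so each row has a different number of columns that no longer line up with the 400 combinations. The sketch below keeps all 400 columns in a fixed order and divides by a length computed before the loop. To stay self-contained it replaces Bio::SeqIO with a hardcoded hash of made-up sequences (an assumption; in the real script you would use $seq->id and $seq->seq as before), and the helper name dipeptide_frequencies is hypothetical. Dividing by the sequence length follows the question; dividing by length - 1 (the number of dipeptides) instead makes the frequencies sum to 1.

```perl
use strict;
use warnings;

# All 400 ordered dipeptides; @amino is alphabetical, so @comb is already sorted.
my @amino = qw/A C D E F G H I K L M N P Q R S T V W Y/;
my @comb;
for my $first (@amino) {
    for my $second (@amino) {
        push @comb, $first . $second;
    }
}

# Hypothetical stand-in for Bio::SeqIO: id => sequence.
my %sequences = (
    seq1 => 'ACDAC',
    seq2 => 'WY',
);

# Hypothetical helper: one frequency per combination, in the order of @$comb.
sub dipeptide_frequencies {
    my ($sequence, $comb) = @_;
    my %di_count;
    $di_count{$_}++ for $sequence =~ /(?=(.{2}))/g;    # overlapping pairs

    # Divide by the sequence length, as asked in the question.
    # Use (length($sequence) - 1) instead for frequencies that sum to 1.
    my $len = length $sequence;

    # Absent dipeptides count as 0; an empty sequence yields all zeros.
    return map { $len ? ($di_count{$_} // 0) / $len : 0 } @$comb;
}

for my $id (sort keys %sequences) {
    my @freq = dipeptide_frequencies($sequences{$id}, \@comb);
    print join(' ', $id, @freq), "\n";
}
```

Each output row then has the id followed by exactly 400 values, so the columns stay aligned across sequences.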