I have this Perl code to count the dipeptides in my sequences (there are 400 combinations, for example AA, AC, AD, AE, ...). What I actually want is the frequency of each dipeptide, which means dividing each count by the length of the sequence it came from. I have tried several ways but could not get the correct calculation. My script:
use strict;
use warnings;
use Bio::SeqIO;

# Build all 400 dipeptide combinations (AA, AC, AD, ...).
my @amino=qw/A C D E F G H I K L M N P Q R S T V W Y/;
my @comb=();
foreach my $a (@amino){
    foreach my $b (@amino){
        push (@comb,$a.$b);
    }
}

my $in = Bio::SeqIO->new(-file => "myfile.fasta" , '-format' => 'Fasta');
while ( my $seq= $in->next_seq ) {
    # Overlapping two-character windows, captured inside a lookahead.
    my @dipeps=($seq->seq()=~/(?=(.{2}))/g);
    my %di_count=();
    $di_count{$_}++ for @dipeps;
    print $seq->id();
    # Print the count for every combination, 0 if it never occurs.
    map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb;
    print "\n";
}
I tried both of these:
map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb/length;
map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0/length}sort @comb;
But neither worked as expected. Do I need to compute the length of each sequence beforehand and store it in a variable? Any suggestions?
Take care that length, called without an argument, gives the length of $_, which in your case is the combination length (= 2). – JJoao
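Following up on that comment, here is a minimal sketch of one way to do the division, assuming the frequency is meant to be count divided by sequence length as described in the question. The helper name dipeptide_freq is hypothetical, and it takes a plain string so the example runs without Bio::SeqIO. The key point is that the sequence length is taken once with an explicit argument, rather than relying on a bare length inside the map, where $_ is the two-letter combination.

```perl
use strict;
use warnings;

# All 400 dipeptide combinations (AA, AC, AD, ...).
my @amino = qw/A C D E F G H I K L M N P Q R S T V W Y/;
my @comb;
for my $first (@amino) {
    push @comb, map { $first . $_ } @amino;
}

# Hypothetical helper (not part of the original script):
# returns a hash ref mapping each dipeptide to count / sequence length.
sub dipeptide_freq {
    my ($sequence) = @_;
    my $len = length $sequence;    # explicit argument, so this is NOT length($_)
    return {} unless $len;         # avoid dividing by zero on an empty sequence

    # Count overlapping two-character windows, as in the question.
    my %count;
    $count{$_}++ for $sequence =~ /(?=(.{2}))/g;

    # Divide every count (0 if the dipeptide is absent) by the sequence length.
    my %freq;
    $freq{$_} = ($count{$_} // 0) / $len for @comb;
    return \%freq;
}

my $freq = dipeptide_freq('AACA');
printf "%s %.2f\n", $_, $freq->{$_} for qw/AA AC CA AD/;
```

In the original loop the same idea would presumably be something like storing length($seq->seq()) in a variable before printing and dividing each count by it. If the frequency should instead be relative to the number of dipeptides, the divisor would be the length minus one; that is a different definition from the one in the question.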