Two-way encryption with fixed length

Does anyone know a way to encrypt arbitrary data two-way, so that I can decrypt it later? I like MD5 and the way it fixes the length of the hash; the problem is that there is no way to decrypt it (it is unidirectional/one-way).

Base64 is no good for me because it usually increases the size of the input, roughly 33% larger (as described in the manual), and my whole intention here is to make it smaller...

The text to be encrypted varies from 10 to 100 alphanumeric characters, and the field where I want to store the result does not accept more than 32.

  • Could you give a bit more detail?

  • Base64 is not an encryption algorithm. Are you sure that what you want is not a compression algorithm? Cryptography does not shrink data, it hides data; what reduces the size of data is compression. Fitting something up to 100 characters long into a field that only accepts 32 will be quite complicated, unless your data has very specific characteristics that allow a great deal of compression. And even then, I don't know of algorithms that work with so little overhead. Please describe your problem in more detail.

  • @Wineusgobboa.deOliveira, so Base64 is an encoding rather than encryption? I had confused the concepts... Yes, compression might help me: the idea is that a text varying from 10 to 100 characters should occupy at most 32. It is hard to explain why I have to do it this way, but it really has to be like this. I saw something about gzcompress, but its output seems to contain garbled characters; will that cause problems when saving to the database?
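
A note on the gzcompress question above: its output is raw binary (zlib format) and may contain bytes that a plain text column will not accept. Below is a minimal sketch, assuming PHP and a TEXT/VARCHAR column, of one way to store it safely; note that hex-encoding doubles the size, so it works against the 32-character goal (a BLOB/VARBINARY column would avoid the extra encoding entirely).

```php
<?php
// Minimal sketch (not from the original thread): round-tripping gzcompress
// output through a text-safe encoding so binary bytes don't break a text column.
$original = 'some alphanumeric text between 10 and 100 characters';

$binary   = gzcompress($original, 9); // binary output, may contain NUL bytes
$storable = bin2hex($binary);         // safe for a TEXT/VARCHAR column, but twice as long

$restored = gzuncompress(hex2bin($storable));
var_dump($restored === $original);    // bool(true)
```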

1 answer

What you are looking for is not encryption (a technique to hide data) but compression (a technique to store data using fewer bits than its "raw" format). Base64 is neither one nor the other (it is only a form of serialization: encoding arbitrary binary data using a set of 64 characters).

To represent up to 100 characters in a field of 32 you face a number of problems:

  • No lossless compression algorithm has positive performance in the worst case (pigeonhole principle). This means there is a lower bound on the space those 100 alphanumeric characters will occupy, which is the following (a small sketch of this arithmetic follows after the list):

    • 7 bits to store the sequence length (100 − 10 = 90 < 128), plus
    • Without any alignment:
      • 517 bits for [a-z0-9] (log2(36^100))
      • 596 bits for [a-zA-Z0-9] (log2(62^100))
      • 676 bits if we also consider accented characters in all the variations used in Portuguese.
    • With character-by-character alignment:
      • 600 bits for [a-z0-9] or [a-zA-Z0-9] (100 × 6)
      • 700 bits if we also consider accented characters.

    As you can see, even in the simplest case it takes at least 66 bytes to represent every alphanumeric sequence of length up to 100. Unless there is some peculiarity in your data (say, the possible sequences are only a subset of the total, e.g. only those that form words) you will not be able to losslessly compress all of these sequences.

  • Even if you could "squeeze" your data into 32 bytes, if the field is of a text type (i.e. it holds up to 32 characters) it will not necessarily accept every possible value per character:

    • If the field uses UTF-8 encoding, each character can occupy more than one byte (up to 5 or 6 if I am not mistaken). Depending on the form of compression you might make your data larger instead of smaller...
    • If the field uses UTF-16 encoding (unlikely), not every pair of bytes can occur in isolation: some are surrogate pairs, which need to occur together (complicating the encoding).
    • If the field uses UTF-32 encoding (even more unlikely), then you are in luck: there are at least 19 bits available in each character, for a total of 608 (still insufficient for the accented case, but enough for the others).

    Now, if the field's size limit is expressed in bytes, then it is hopeless (256 bits at most, much smaller than the 524 you need as a minimum).

  • Converting to and from this "optimized" format will take work, not only in terms of coding a solution (or finding a ready-made one) but also in the time your PHP script will spend doing the conversion.
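
The bit counts above can be reproduced with a few lines of PHP. Here is a minimal sketch (hypothetical, not part of the original answer) of the arithmetic, using a 7-bit length prefix and log2 of the alphabet size per character:

```php
<?php
// Hypothetical sketch of the arithmetic above (not part of the original answer).
// Minimum bits needed to distinguish every string of length $length over an
// alphabet of $size symbols.
function minBits(int $size, int $length): float {
    return $length * log($size, 2);
}

$lengthPrefix = 7; // enough to encode a length between 10 and 100 (90 < 2^7)

printf("[a-z0-9]    : %d bits\n", ceil(minBits(36, 100)));    // 517
printf("[a-zA-Z0-9] : %d bits\n", ceil(minBits(62, 100)));    // 596
printf("aligned, 6 bits per character: %d bits\n", 100 * 6);  // 600

$total = ceil(minBits(36, 100)) + $lengthPrefix;              // 524 bits in the simplest case
printf("with length prefix: %d bits = %d bytes\n", $total, ceil($total / 8)); // 66 bytes
```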

In conclusion, there is nothing that can be done in the general case. In specific cases, on the other hand, there may be a solution. In the example I gave of words, the frequency of each letter differs, so you can usually compress a bit with Huffman coding. But even then nothing is guaranteed (there may always be words that end up larger than the desired 32 characters if their letters do not follow the typical frequency of the language in question).
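
As a rough illustration of that last point, here is a minimal sketch (hypothetical, not from the original answer) of how far general-purpose DEFLATE compression, which uses Huffman coding internally, stays from the 32-byte target on a typical phrase of roughly 100 characters:

```php
<?php
// Hypothetical illustration: even DEFLATE does not get a ~100-character
// phrase anywhere near 32 bytes, because short text has little redundancy.
$text = 'the quick brown fox jumps over the lazy dog while the cat watches from the old wooden fence nearby';

printf("original:   %d bytes\n", strlen($text));                // just under 100
printf("gzcompress: %d bytes\n", strlen(gzcompress($text, 9))); // well above 32
printf("gzdeflate:  %d bytes\n", strlen(gzdeflate($text, 9)));  // raw DEFLATE, slightly smaller, still well above 32
```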

  • Great answer, congratulations! I think I'll settle for solving it another way, rather than compressing.
