How to encode an array of bytes (string) in another base so that the result uses only the characters A-Z and 0-9 in Delphi?

5

I need to reduce the size of a string, but keep it in a predetermined character range.

I use an encryption routine whose result is returned as a hexadecimal string in which each pair of hex digits represents the ASCII code of one character of the text.
That is, at the end of its processing, the result still goes through a conversion to hexadecimal:

for ch in valResult do  // ch: one character of the encrypted result
    result := result + IntToHex(Ord(ch), 2);  // two hex digits per character

The result returned by this routine is what I need to compact.

However, we will soon replace this routine with strong, asymmetric encryption, and then it will be necessary to reduce the size of (compact) the result returned by that kind of routine.

So, how can I compress a string so that the result contains only a predetermined set of characters, in my case A-Z and 0-9?

To make it easier to understand, it would be good to have an example in Delphi, or even in another language that could later be converted.

  • @Bacco, A-Z and 0-9 only. Uppercase letters only. Thanks!

  • Before I try anything, you are aware that anything restricted to these characters will be plain base 36, right? That is, it increases by about 40% in size...

  • @Bacco, sorry, I have no idea!

  • This is tough! =) I need a smaller string and I need it to use only these characters to solve my problem. We cannot give up one or the other. However, as long as there is hope, there is a fight (lol). I will keep insisting a little longer to try to find a solution to this problem.

1 answer

2


As already mentioned, in the general case this problem is impossible: no lossless compression algorithm shrinks its input on average over all possible inputs (the pigeonhole principle), and by restricting which characters are allowed in the output you have fewer bits per character to represent the same information, so the space required goes up. However, in specific cases there may be a solution, just not an easy one. I don't know Delphi, so I won't risk producing an example, but I will "put the cards on the table" so that you can evaluate your possibilities.
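A rough sketch of the "fewer bits per character" point, written in Python because the question accepts another language that can be converted later. The function name `chars_per_byte` is made up for illustration, and the figures ignore rounding at the ends of the string.

```python
import math

def chars_per_byte(alphabet_size: int) -> float:
    """Output characters needed per input byte for a given alphabet size."""
    # One character of an N-symbol alphabet carries log2(N) bits.
    return 8 / math.log2(alphabet_size)

for name, size in [
    ("hex (0-9, A-F)", 16),
    ("base 32", 32),
    ("base 36 (0-9, A-Z)", 36),
    ("raw bytes", 256),
]:
    print(f"{name:20s} {chars_per_byte(size):.3f} characters per byte")
```

Hex costs exactly 2 characters per byte, base 32 costs 1.6, and base 36 costs about 1.547, so any text-safe alphabet is larger than the raw bytes it encodes.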

  1. If your string is random (e.g., a randomly generated cryptographic key), give up. It is theoretically impossible, so there is no point in waiting for a better answer (unless someone is willing to explain why it is impossible, and you are interested in hearing it).

  2. If your strings are low entropy (e.g., "male" and "female"), then it should be possible to compress them in some way (e.g., "m" and "f"). Natural language texts (e.g., English) usually satisfy this requirement, as does data that takes up more space than the minimum necessary to represent the information it carries (as in the gender example above). Problem: this compression has to shrink the original string to at most about 70% of its original size just to break even (assuming a 40% increase when encoding in base 36, as per Bacco's comment; for raw bytes the exact overhead is closer to 55%, which moves the break-even point to roughly 65%) - and even further if you want the result to get smaller.

    • Ideally, this is done using an existing, ready-made compression algorithm - less work for you, and probably a better result (leave the complicated details to the professionals). This is more feasible if the strings are long. A crude example would be to create a file with the contents of the string, zip it, and convert the resulting binary file to base 36 (in uppercase).

    • If you have a few (2 or 4) different compression methods that give better or worse results depending on the string, it may be advantageous to use 1 or 2 bits to say which method is being used, followed by the compressed string (Warning: 99% of the time this is unnecessary, not least because many compression algorithms already do it for you).

    • If a ready-made method is not available and you need to implement one by hand, I suggest studying (and if necessary adapting) one of the classic compression algorithms, such as Huffman coding. It's simpler than it looks at first glance, and it still leaves room for optimizations if you can move some of the information out of the string (see the next item).

  3. Any and all information that is the same for every string must be removed from the string. One problem with compression algorithms is that - in addition to the compressed string itself - they also have to store the dictionary that maps bits/bytes to strings. If you can use a single dictionary for all strings (and this dictionary is good enough to compress them all), then the problem becomes much more doable.

    • Example: if every one of your strings starts with AVE, aB1 or 0*(, without exception, then you can replace the first three letters with the binary prefix 0, 10 or 11, and then compress from the fourth letter onward.

    • One way to apply this would be to take a large set of strings (all that are available), concatenate them, and apply Huffman coding to the result. Then you take the resulting symbol table and hardcode it in your source code. From then on, every string to be compressed can use the same table, so it is virtually guaranteed that they will get smaller by going through this process (unless some of them are substantially different from the initial set - see item 1 again).
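The shared-table idea from item 3 can be sketched as follows, again in Python as a stand-in to be ported. This is only an illustration under assumptions: the sample strings are invented, the names `build_code`, `encode` and `decode` are hypothetical, and the bits are kept as a text string of "0"/"1" for readability rather than packed into bytes.

```python
import heapq
from collections import Counter

def build_code(corpus: str) -> dict:
    """Build a Huffman table (symbol -> bit string) from a sample corpus."""
    freq = Counter(corpus)
    if not freq:
        raise ValueError("corpus must not be empty")
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, partial table of that subtree).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(text: str, table: dict) -> str:
    # Raises KeyError for a symbol that never appeared in the corpus.
    return "".join(table[ch] for ch in text)

def decode(bits: str, table: dict) -> str:
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes: first match is the symbol
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

# Hypothetical sample data; in practice the table is built once and hardcoded.
SAMPLES = ["ORDER-0001-PAID", "ORDER-0002-OPEN", "ORDER-0003-PAID"]
TABLE = build_code("".join(SAMPLES))

message = "ORDER-0021-OPEN"
bits = encode(message, TABLE)
print(len(bits), "bits instead of", 8 * len(message))
```

Because the table lives in the source code, nothing but the bits has to be stored per string. The price is the `KeyError` on unseen symbols, which is the "substantially different from the initial set" case in practice.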

Notes:

  • Normally I would suggest using base 32 instead of 36, which makes the conversion much simpler. But since in your case every fraction of a bit counts, a good way out may be to use an "arbitrary precision integer" (Bigint) or "bit set" (Bitset) library. I found one such library, but I can't judge whether it serves that purpose. The general idea is: 1) create a Bigint from the binary data produced by the compression phase; 2) convert it to a string in base 36, using a function of the library itself. To get the original string back, just do the reverse: create a Bigint from the base 36 data, extract the binary data and undo the compression.

  • I explained the possibilities in general terms, without going into too much detail so the answer would not get too long. If you can describe your problem further (i.e. give more context) and explain what you intend to do, why, what your limitations are, what types of data you are dealing with, etc., then it may be possible for us to find an unexpected way out. In compression, exploiting particular cases is a must. As the question is currently phrased, in general terms, I fear that the first answer received (i.e. "it is impossible") is in fact the correct one...
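The Bigint round trip from the first note can be sketched like this. Python is used because its built-in `int` is already arbitrary precision; a Delphi version would need a big-integer library and a zlib binding, which I am only assuming exist. zlib stands in here for "the compression phase", and the names `bytes_to_base36`, `base36_to_bytes`, `pack` and `unpack` are hypothetical.

```python
import zlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def bytes_to_base36(data: bytes) -> str:
    # A leading 0x01 sentinel byte keeps leading zero bytes of the data
    # from being lost when the bytes are read as one big integer.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def base36_to_bytes(text: str) -> bytes:
    n = int(text, 36)  # the built-in int() parses bases 2 to 36
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if raw[:1] != b"\x01":
        raise ValueError("missing sentinel byte: not produced by bytes_to_base36")
    return raw[1:]

def pack(text: str) -> str:
    """Compress, then express the compressed bytes with 0-9 and A-Z only."""
    return bytes_to_base36(zlib.compress(text.encode("utf-8"), 9))

def unpack(code: str) -> str:
    """Reverse of pack: base 36 -> bytes -> decompress -> text."""
    return zlib.decompress(base36_to_bytes(code)).decode("utf-8")

sample = "male,female," * 50  # deliberately repetitive, low-entropy input
code = pack(sample)
print(len(sample), "characters in,", len(code), "characters out")
```

This only pays off for redundant input such as the repetitive sample above. zlib adds a header and a checksum, so short or random-looking strings come out longer than they went in, which is item 1 again.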

  • I changed the text of the question, and I believe it falls right into the first point of your answer, e.g., a randomly generated cryptographic key.

  • @Tiago From what I understand of your description, it is not a "cryptographic key" that you want to compress, but the "ciphertext". If so, there is hope! I will update the answer, but first tell me: 1) What types of data are being encrypted? Can you compress them before encrypting? 2) What is the average length of these strings? Strong encryption adds a few extra bytes to the string (IV or nonce, and maybe a MAC), but otherwise does not change its length much. P.S. Have you already decided which encryption algorithm will be used?

  • @Tiago Oops, now that I've read "asymmetric encryption", there is in fact a key to be transmitted... In this case, the overhead will be more than "a few bytes". :( This seems to me a very complicated problem, so the more context you can give, the better. Read this post on Meta to better understand why this context is important. Particularly in this case, where a general solution is impossible, we need to cling to the particular details.

  • Thank you for the whole explanation. Our strategy for this problem has been redirected. Compression and cryptography are subjects I want to study in more depth. The answers will not always be what we expect. Thanks!
