How to check whether text is Base64-encoded?


4

I’m building an email application (Hapijs) and found that some emails have their text encoded in Base64, while others do not.

This application will need to receive emails from all services (Gmail, Hotmail, ...), so I need a method that checks whether the text is in Base64 and then either forwards it for decoding or sends it straight to the client.

I’ve searched a lot and so far I haven’t been able to find anything that works 100% as I need, and since I’m new to programming, I still don’t have enough knowledge to figure out how to do it myself...

Code I’m using to try to verify:

let base64 = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

let isBase64Valid = base64.test(mail.text); // mail.text is the string being tested

if (isBase64Valid) {
    // the text matches the Base64 format
    console.log('base64');
} else {
    // the text does not match the Base64 format
    console.log('String');
}
  • Does that help? https://stackoverflow.com/questions/8571501/how-to-check-whether-the-string-is-base64-encoded-or-not

  • It helped a lot, clarified a lot, but it still hasn’t solved the problem...

  • So maybe it’s a good time to [Edit] the question and expand it. You must have already tested some code, so put it in the question and describe the errors or difficulties you ran into.

  • I used some of the code from the link you gave me. It runs, but not correctly: even emails that are in Base64 are identified as if they were not. I will edit the question and add the code to make the problem clearer.

  • Is the whole e-mail in Base64, or only part of it?

  • The text is in Base64, and any image it contains is also converted to Base64.


2 answers

1

In JavaScript on the front end, the most reliable way to check whether a given string is Base64-encoded is to wrap a call to atob() in a try/catch block and compare the re-encoded result with the input itself, since the browser's JavaScript engine already throws an exception when decoding fails.

Some examples from the Stack Overflow community (Portuguese and English) claim that the following approach is the most correct:

function isBase64(str) {
    try {
        return atob(str) ? true : false
    } catch(e) {
        return false
    }
}

However, this approach is incorrect, since the following input yields a false positive:

isBase64('jgjhgj hg') // true

What atob() actually returns for the input above is garbage:

console.log(atob('jgjhgj hg')) // "á8`"
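
The reason this input slips through can be sketched as follows. It assumes an environment with global atob() and btoa() (a browser, or Node.js 16 and later); the variable names are illustrative only. atob() discards ASCII whitespace before decoding, so the space never causes an exception:

```javascript
// Assumption: global atob()/btoa() are available (browser or Node.js 16+).
const withSpace = atob('jgjhgj hg');
const withoutSpace = atob('jgjhgjhg');

// The space is dropped before decoding, so both inputs decode to the same 6 bytes.
console.log(withSpace === withoutSpace); // true

// Re-encoding produces the compact form, which differs from the original input.
console.log(btoa(withSpace)); // "jgjhgjhg"
```

This difference between the re-encoded text and the input is what a comparison against the input can detect.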

The most correct front-end approach

The correct approach is to re-encode the decoded result and compare it with the input:

function isBase64(str) {
    try {
        return btoa(atob(str)) === str ? true : false
    } catch(e) {
        return false
    }
}

This rules out that false positive:

isBase64('jgjhgj hg') // false

On the back end (Node.js)

Older versions of Node.js have no native btoa() or atob() (both became available as globals, marked legacy, in Node.js 16), so it is very common to use third-party modules or Buffer to achieve the same result.

It is important to note that not all third-party libraries throw exceptions or compare the result against the input, so false positives can easily slip through.

The following example uses Buffer to encode and decode, and also checks the result against the input:

function atob(str) {
    // Buffer.from() replaces the deprecated new Buffer() constructor
    return Buffer.from(str, 'base64').toString('binary');
}

function btoa(str) {
    let buffer;
    if (str instanceof Buffer) {
        buffer = str;
    } else {
        buffer = Buffer.from(str.toString(), 'binary');
    }
    return buffer.toString('base64');
}

function isBase64(str) {
    try {
        return btoa(atob(str)) === str;
    } catch (ex) {
        // the original omitted "return" here, so the function returned undefined
        return false;
    }
}

Testing shows that it does not report this false positive:

console.log(isBase64('SGVsbG8gV29ybGQh')) // true

console.log(isBase64('jgjhgj hg')) // false
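
One possible reason for genuine Base64 email bodies failing this check (an assumption, since the question does not show a sample) is that MIME wraps Base64 text into lines of at most 76 characters, so the input contains line breaks that the re-encoded string lacks. A minimal sketch that removes only line breaks before the comparison is shown below; the name isBase64IgnoringLineBreaks is hypothetical:

```javascript
// Hypothetical helper: tolerates CR/LF line wrapping but nothing else.
function isBase64IgnoringLineBreaks(str) {
    if (typeof str !== 'string') {
        return false;
    }
    // Remove only line breaks; spaces and other characters still fail the check.
    const compact = str.replace(/[\r\n]+/g, '');
    if (compact.length === 0) {
        return false;
    }
    // Round trip: decode, re-encode, and compare with the unwrapped input.
    return Buffer.from(compact, 'base64').toString('base64') === compact;
}

console.log(isBase64IgnoringLineBreaks('SGVsbG8g\r\nV29ybGQh')); // true
console.log(isBase64IgnoringLineBreaks('jgjhgj hg')); // false
```

Stripping all whitespace instead of only line breaks would bring back the 'jgjhgj hg' false positive, which is why the sketch is deliberately narrow.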

Using RegExp (a matter of opinion)

If the origin of the string cannot be trusted (which is precisely why verification is needed), a RegExp should not always be regarded as "the best option". The following example illustrates the problem:

function isBase64(str) {
    return /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/.test(str)
}

isBase64('SGVsbG8gV29ybGQh') // true

isBase64('jgjhgj hg') // false

isBase64("regexnaofunciona") // true

isBase64("hoje") // true

isBase64("errado+tanto+faz") // true

The above expression is flawed because it accepts any string made up of Base64 alphabet characters whose length is a multiple of 4, whether or not it was ever Base64-encoded.

It is worth noting that if you cannot be sure the input string was in fact encoded in Base64, there is no guarantee that the RegExp above will not accept it anyway, producing a false positive.
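
As a hedged aside, the round-trip comparison is subject to the same limitation for these particular inputs: any four characters from the Base64 alphabet decode to exactly three bytes and re-encode to the same four characters. The sketch below compares both checks on the samples used above, using Buffer in Node.js; the helper names matchesPattern and roundTrips are hypothetical:

```javascript
const base64Pattern = /^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$/;

// Hypothetical helper: the RegExp check from the example above.
function matchesPattern(str) {
    return base64Pattern.test(str);
}

// Hypothetical helper: decode, re-encode, and compare with the input.
function roundTrips(str) {
    return Buffer.from(str, 'base64').toString('base64') === str;
}

for (const sample of ['SGVsbG8gV29ybGQh', 'jgjhgj hg', 'hoje', 'errado+tanto+faz']) {
    console.log(sample, matchesPattern(sample), roundTrips(sample));
}
// SGVsbG8gV29ybGQh true true
// jgjhgj hg false false
// hoje true true
// errado+tanto+faz true true
```

Neither check can tell an ordinary word such as "hoje" from encoded data, because such a word is itself well-formed Base64. Both only confirm that the input could have been produced by a Base64 encoder.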

  • Excellent answer. It deals with the false positives, which come up quite often, especially in JavaScript.

0


Base64 strings contain only the characters a-z, A-Z, 0-9, '+', '/' and '='. If a string contains any other character, for example a space ' ', it is not in Base64. This is the test the regex must perform.

Try changing the Regex to this one: ^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$

To encode and decode, use btoa() and atob() respectively. Reference for these functions: JS Base64 Encode and Decode

  • Thank you very much Alexandre...

  • 1

    Sorry, but I don’t think this regex is correct, because it accepts any string whose length is a multiple of four, e.g. "regexnaofunciona", "hoje", "errado+tanto+faz"... The correct approach would be to wrap the call to atob() in a try/catch, passing it the possibly encoded string. That would actually determine whether the string was encoded in Base64.
