Validation of string size

I was creating a function to validate large text fields (description, observation, ...), which in SQL will be saved as TEXT, so I basically did this:

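function validarTamanho($valor, $min = 0, $max = 65000) { // illustrative name
    // strlen() counts bytes, not characters
    $tamanho = strlen($valor);
    // both bounds must hold, so && rather than ||
    return $tamanho >= $min && $tamanho <= $max;
}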
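Worth checking before reaching for anything else: in PHP, strlen() returns the number of bytes in a string, while mb_strlen() returns the number of characters in a given encoding, so a strlen() check like the one above already measures bytes. A quick sketch, assuming PHP 7+ with the mbstring extension; the \u{...} escapes make the result independent of the encoding the script file is saved in:

```php
<?php
// "ação" written with Unicode escapes; PHP stores it as UTF-8 bytes
$texto = "a\u{e7}\u{e3}o";

// strlen() counts bytes: 'a' (1) + 'ç' (2) + 'ã' (2) + 'o' (1)
echo strlen($texto), PHP_EOL;             // 6

// mb_strlen() counts characters in the encoding given
echo mb_strlen($texto, 'UTF-8'), PHP_EOL; // 4
```

For a limit stated in bytes, such as a TEXT column's, strlen() is the relevant measure; mb_strlen() matters when the limit is stated in characters.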

When I looked up the maximum size of TEXT fields, I found a question on SOen (the English-language Stack Overflow) that answers this, and it shows that how much text fits depends on the characters in the string. TEXT types do not actually have a character limit but a byte limit, so, two questions:

  • How do I validate a text entry against the maximum number of bytes the string may contain?

  • Is this validation only required for TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT fields?

  • I didn’t add the mysql tag because I believe the question applies to more than one database

  • I don’t understand this logic: if $valor is 0 (zero), should it return true?

  • @Leocaracciolo It should return true if the size of the value is between the minimum and the maximum

  • https://ideone.com/YOph1P

  • @Leocaracciolo $valor is the text

  • I actually meant $valor = "";

  • The tag is essential, because each DB handles this differently. MySQL, for example, measures CHAR and VARCHAR in characters, not bytes, and this can affect the details of the answers. The encoding is also essential, because your concern only matters with multibyte encodings (which, unfortunately, most developers use for no reason; programmers who know what they are doing are a minority, the rest just follow fashion).

  • For example, if you don’t need emoji support (which is usually pointless in most applications) and you don’t use Eastern languages, Latin-1 has only advantages: it is faster, takes less space, covers the characters of the Latin languages, etc.

  • @Bacco In this case I use UTF-8

  • And do you need it? Another thing: why use TEXT if your limit is 65000 bytes? If you need more, use LONGTEXT. All these variable-length formats, however much you abuse them, spend at most 4 extra bytes. 65000 bytes fits in a VARCHAR, which gives more than 15000 UTF-8 characters, even of the most complex kind.

  • In other words: if you use Latin-1 (it can be for that field only; MySQL allows it) you get 4x the space limit; or, if you use UTF-8 with a 15000-character limit, it fits in a VARCHAR; or, if you use LONGTEXT, you can limit it to 1/4 of the maximum value without worry. And no ifs needed at all.

  • @Bacco Thanks for the explanation. In the end I will use VARCHAR, and I’ll study encodings some more
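The figures in the comments above (at most 4 bytes per UTF-8 character, over 15000 characters in 65000 bytes, 4x the room with Latin-1) can be checked directly. A small sketch, assuming PHP 7.2+ with mbstring; the sample characters are arbitrary:

```php
<?php
// One character from each UTF-8 length class: ASCII, accented letter, euro sign, emoji
$amostras = ["a", "\u{e9}", "\u{20ac}", "\u{1F600}"];

foreach ($amostras as $c) {
    // strlen() reports how many bytes the character takes in UTF-8
    printf("U+%04X: %d byte(s)\n", mb_ord($c, 'UTF-8'), strlen($c));
}

// Worst case is 4 bytes per character, so a 65000-byte budget
// always holds at least this many UTF-8 characters
echo intdiv(65000, 4), PHP_EOL; // 16250

// The same accented letter converted to Latin-1 takes a single byte
$latin1 = mb_convert_encoding("\u{e9}", 'ISO-8859-1', 'UTF-8');
echo strlen($latin1), PHP_EOL;  // 1
```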

1 answer


Use mb_strlen().

But it’s not that simple. This can be a usability problem: the user shouldn’t be bothered with implementation details. How are you going to tell them what to do about something they can’t see? Either you write an extremely complex algorithm or you put them through trial and error. That’s just one of the reasons I consider using multibyte encodings a mistake.
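One way to soften the usability problem described above is to check the character limit (with mb_strlen()) and the byte limit (with strlen()) separately, and say which one was exceeded. A minimal sketch, assuming PHP 7.1+ with mbstring and UTF-8 input; the function name validarCampo, its parameters and the messages are hypothetical:

```php
<?php
/**
 * Hypothetical validator: returns null when the text is acceptable,
 * or a message saying which limit was broken. Assumes UTF-8 input.
 */
function validarCampo(string $valor, int $maxCaracteres, int $maxBytes): ?string
{
    // Reject byte sequences that are not valid UTF-8 before counting
    if (!mb_check_encoding($valor, 'UTF-8')) {
        return 'Invalid characters in text';
    }

    // The limit the user can see: number of characters
    $caracteres = mb_strlen($valor, 'UTF-8');
    if ($caracteres > $maxCaracteres) {
        return "Text has $caracteres characters; the maximum is $maxCaracteres";
    }

    // The storage limit: number of bytes the column can hold
    if (strlen($valor) > $maxBytes) {
        return 'Text uses too many special characters to fit';
    }

    return null;
}

var_dump(validarCampo("a\u{e7}\u{e3}o", 10, 65000)); // NULL
```

The byte message still leaves the user guessing in the rare case it fires, which is the answer's point; its main job is keeping oversize data away from the database.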

Our industry more or less settled on UTF-8 as the default; I prefer Latin-1 or something like it. Any single-byte encoding does better in space usage and processing.

This is one of the reasons to prefer VARCHAR, which does not have this problem. If you think you can’t use it, go with an open-ended LONGTEXT and be happy. But TEXT is almost always a mistake.

One solution is to allocate more space. In general, with UTF-8, if you reserve twice as many bytes as the character limit you are already well covered, unless you expect names in Chinese, Klingon or something like that. Then the bytes don’t matter. I would go that way.
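The "reserve double" rule of thumb can be checked with plain arithmetic. A sketch assuming a 500-character limit (the figure mentioned in the comments) and therefore a 1000-byte budget; the constant names are hypothetical:

```php
<?php
// Hypothetical limits: 500 characters allowed, byte budget set to twice that
const LIMITE_CARACTERES = 500;
const LIMITE_BYTES = 2 * LIMITE_CARACTERES; // 1000

// Worst case for accented Latin letters: 2 bytes each in UTF-8
$portugues = str_repeat("\u{e7}", LIMITE_CARACTERES);
echo strlen($portugues), PHP_EOL; // 1000, fits exactly

// CJK characters take 3 bytes each in UTF-8, so the same count overflows
$chines = str_repeat("\u{4e2d}", LIMITE_CARACTERES);
echo strlen($chines), PHP_EOL;    // 1500, exceeds the budget

var_dump(strlen($portugues) <= LIMITE_BYTES); // bool(true)
var_dump(strlen($chines) <= LIMITE_BYTES);    // bool(false)
```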

See more in:

  • I believe a simple "Maximum size exceeded" message is enough

  • Which encoding should I pass to it?

  • That’s what I said: you leave the user guessing how much they can actually use, with no indication of what’s really wrong, just that something is over the limit. That’s why most software is bad to use nowadays.

  • In this case the software is an internal system; if someone has a question they can ask support. Besides, the input the system receives will hardly exceed 500 characters. It’s just to make sure nothing that shouldn’t be there gets passed to the database

  • Then it’s OK; that’s exactly what support is for.

  • Thanks for the answer. Looking at the 1st link in the list, I’ll stick with VARCHAR. I will leave the question open for now in case someone answers the second question

  • I’ve read the links you posted and, along with some other things I’ve read, I think I understand more or less how they work and the differences between some of them. Do you have any more complete, trusted sources (with more options than those in the links: differences in size, performance, compatibility, flexibility, etc.)? Another question: will the difference in performance and storage between something more complex like UTF-8 and something simpler like ANSI only be noticeable when handling large files? Or is the difference considerable even in simple fields (name, address, description, etc.)?

  • No, no, and it depends.

  • Would it be worth opening a question about the second part (storage and performance differences)?

  • I don’t know; you need to check that it hasn’t already been answered in other questions.
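On the question in the comments about which encoding to pass: mb_strlen() takes the encoding as an optional second argument and, when it is omitted, falls back to mb_internal_encoding(). Passing the encoding the data is actually in (assumed here to be UTF-8, as stated in the question's comments) keeps the count independent of configuration. A sketch showing how the answer changes with that argument, assuming PHP 7+ with mbstring:

```php
<?php
$texto = "a\u{e7}\u{e3}o"; // "ação" stored as UTF-8 bytes

// Right encoding: the bytes are decoded as UTF-8, giving 4 characters
echo mb_strlen($texto, 'UTF-8'), PHP_EOL;      // 4

// Wrong encoding: read as Latin-1, each of the 6 bytes counts as a character
echo mb_strlen($texto, 'ISO-8859-1'), PHP_EOL; // 6

// '8bit' deliberately counts raw bytes, the same as strlen()
echo mb_strlen($texto, '8bit'), PHP_EOL;       // 6
```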
