How does the int function handle the \n character?


9

I created a list:

usuarios = ['123\n','123\n4']

I tried to convert the element at index 0 to an integer using int():

int(usuarios[0])

Result:

123

But when I tried to do the same with index 1:

int(usuarios[1])

Result:

ValueError: invalid literal for int() with base 10: '123\n4'

I would like to know, if possible, all the rules int() follows, because I can't find them documented, at least not in Portuguese.

  • 1

    When you have doubts like this, I suggest using Python's interactive mode, running a few experiments, and looking at the results. In fact, I always suggest keeping an interactive prompt open and testing almost everything there. IDE autocomplete reduces the need to try things "for real" and "live" in the code, but it is not remotely as didactic.

2 answers

11


The rule is simple: the argument must be a string holding a valid integer value, that is, with no characters that prevent its value from being understood correctly, not even a decimal point. A few characters are accepted before and after the number because they are considered neutral (usually whitespace: spaces, tabs, line breaks, etc.).

When the string cannot be interpreted unambiguously as an integer, a ValueError exception is raised.

These examples work:

print(int('12\n'))
print(int('\n123'))
print(int('1234 '))
print(int(' 1235'))

These do not:

print(int('1236c'))
print(int('a1237'))
print(int('123 8'))
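To try all of these cases in one run without the script stopping at the first failure, a small wrapper can help. This is only a sketch; the helper name to_int_or_none is made up for the illustration and simply wraps int() in a try/except.

```python
def to_int_or_none(text):
    """Return int(text), or None when int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


samples = ['12\n', '\n123', '1234 ', ' 1235', '1236c', 'a1237', '123 8', '12.5']
results = {text: to_int_or_none(text) for text in samples}

for text, value in results.items():
    # repr() makes the whitespace characters visible in the output
    print(repr(text), '->', value)
```

The strings that int() rejects come back as None, including '12.5', since a decimal point is not part of an integer literal.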

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

8

According to the official documentation:

class int([x])

[...] If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base. Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace.

As the excerpt says, the value of the parameter can be surrounded by whitespace. For practical purposes, the string is trimmed before being converted to an integer, so whitespace at the beginning and at the end is ignored.
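A quick way to check both parts of the quoted rule (surrounding whitespace is fine, a space between the sign and the digits is not). This is just a sketch; the helper name accepts is made up for the illustration.

```python
def accepts(text):
    """Return True when int() can parse the string, False otherwise."""
    try:
        int(text)
    except ValueError:
        return False
    return True


raw = '  \n+123\t'
trimmed_equal = int(raw) == int(raw.strip())  # same value with or without strip()

print(trimmed_equal)
print(int(' -42 '))     # sign directly before the digits is accepted
print(accepts('- 42'))  # a space between sign and digits is rejected
```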

TL;DR

The information below is based on the official Python implementation, known as CPython.

To confirm this information, you can analyze the Python implementation in C:

/* Parses an int from a bytestring. Leading and trailing whitespace will be
 * ignored.
 *
 * If successful, a PyLong object will be returned and 'pend' will be pointing
 * to the first unused byte unless it's NULL.
 *
 * If unsuccessful, NULL will be returned.
 */
PyObject *
PyLong_FromString(const char *str, char **pend, int base);

The value you pass to int ends up as the pointer str. Looking at the body of the function, you will see early on (line 2226):

while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) {
    str++;
}

That is, it walks the string and, while the current character is whitespace, increments the pointer, so that character is ignored in the later steps. This applies to any character for which Py_ISSPACE returns true.

#define Py_ISSPACE(c)  (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_SPACE)

// pyctype.c

PY_CTF_SPACE, /* 0x9 '\t' */
PY_CTF_SPACE, /* 0xa '\n' */
PY_CTF_SPACE, /* 0xb '\v' */
PY_CTF_SPACE, /* 0xc '\f' */
PY_CTF_SPACE, /* 0xd '\r' */

That is, the characters \t, \n, \v, \f and \r are skipped, as is the ordinary space (0x20), which is also flagged PY_CTF_SPACE in the same table.

>>> int('\t1')
1
>>> int('\n2')
2
>>> int('\v3')
3
>>> int('\f4')
4
>>> int('\r5')
5
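The same check can be run over every ASCII whitespace character at once. This sketch uses string.whitespace from the standard library, which holds the five characters above plus the ordinary space, and tries each one before and after a digit.

```python
import string

accepted = []
for ch in string.whitespace:
    # Each whitespace character is tried as a prefix and as a suffix.
    if int(ch + '7') == 7 and int('7' + ch) == 7:
        accepted.append(ch)

print([repr(ch) for ch in accepted])
```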

Continuing through the body of the function, we reach this excerpt (line 2399):

scan = str;

/* ... */

while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {
    /* ... */
}

It assigns the input pointer str to scan and advances it while the character is a valid digit (that is, its value is less than the given base) or the character _. The scan stops at the first character that meets neither condition, and unless everything from that point on is trailing whitespace, the function reaches goto onError and ends with an error. Therefore, within the number only _ is allowed; any other character, including whitespace, results in an error.

>>> int('1_000')
1000
>>> int('1\n000')
...
ValueError: invalid literal for int() with base 10: '1\n000'
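The "digit less than the base" condition and the underscore rule can both be observed from Python. A minimal sketch, where try_int is a made-up helper that returns None instead of raising:

```python
def try_int(text, base=10):
    """Return int(text, base), or None when the literal is rejected."""
    try:
        return int(text, base)
    except ValueError:
        return None


print(try_int('ff', 16))   # 'f' has digit value 15, which is below base 16
print(try_int('101', 2))   # only 0 and 1 are below base 2
print(try_int('12', 2))    # '2' is not below base 2, so this is rejected
print(try_int('1_000'))    # a single underscore between digits is accepted
print(try_int('1__000'))   # doubled underscore is rejected
print(try_int('_1000'))    # leading underscore is rejected
print(try_int('1000_'))    # trailing underscore is rejected
```

So the underscore is accepted only as a single separator between two digits, not at either edge of the number and not doubled.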

Finally, further down in the body of the function, we see the same pattern again (line 2535):

while (*str && Py_ISSPACE(Py_CHARMASK(*str))) {
    str++;
}

if (*str != '\0') {
    goto onError;
}

Just like the earlier loop that skips whitespace at the beginning of the string, this one advances the pointer past any whitespace at the end. The check for \0 then ensures that nothing but whitespace came after the number.
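The three steps above can be mirrored in plain Python. This is only a sketch for base 10 and ASCII input, written to illustrate the order of the checks; parse_decimal is a hypothetical name, not CPython code, and the sign and underscore handling are assumptions modeled on how int() behaves.

```python
ASCII_WHITESPACE = " \t\n\v\f\r"


def parse_decimal(text):
    """Illustrative base-10 parser following the same three steps."""
    def fail():
        return ValueError(f"invalid literal for parse_decimal(): {text!r}")

    i, n = 0, len(text)

    # Step 1: skip leading whitespace.
    while i < n and text[i] in ASCII_WHITESPACE:
        i += 1

    # Optional sign, which must be followed directly by a digit.
    negative = False
    if i < n and text[i] in "+-":
        negative = text[i] == "-"
        i += 1

    # Step 2: consume digits; a single "_" is allowed only between two digits.
    value = 0
    previous = None
    while i < n and (text[i] in "0123456789" or text[i] == "_"):
        if text[i] == "_":
            if previous is None or previous == "_":
                raise fail()
        else:
            value = value * 10 + (ord(text[i]) - ord("0"))
        previous = text[i]
        i += 1
    if previous is None or previous == "_":
        raise fail()  # no digits at all, or a trailing underscore

    # Step 3: skip trailing whitespace; anything left over is an error.
    while i < n and text[i] in ASCII_WHITESPACE:
        i += 1
    if i != n:
        raise fail()

    return -value if negative else value


print(parse_decimal('123\n'))
try:
    parse_decimal('123\n4')
except ValueError as exc:
    print(exc)
```

With the strings from the question, '123\n' parses because the newline is consumed in step 3, while '123\n4' fails because the 4 is still left over after the trailing whitespace has been skipped.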

In short,

  • Any whitespace at the start is ignored (' ', '\t', '\n', '\v', '\f', '\r');
  • Within the number, any character other than a digit or _ raises an error;
  • Any whitespace at the end is ignored;
  • Any character that is not a digit or _ raises an error, except in the cases above (leading and trailing whitespace).
  • 2

    Amazing answer! Your final summary was missing that "any other character at the end of the string will raise an error."

  • @Márioferoldi good point, thank you.
