Does removing accents and special characters from a URL affect the user experience?

I wonder if there is a reason to remove special characters from a URL. Take, for example, the URL below (from SOpt itself):

pt.stackoverflow.com/questions/27177/o-que-é-callback

It clearly contains a special character (é). By "special character" I mean any character that is not part of the ASCII table; these are the characters usually removed from the part of the URL that some call the "slug".

Unlike Stack Overflow, some sites seem to prefer to strip the accent or to remove the character altogether, turning the address above into something like:

  • pt.stackoverflow.com/questions/27177/o-que-e-callback
  • pt.stackoverflow.com/questions/27177/o-callback

I would like to know if there is any motivation to remove these characters. Is it related to user experience? What impact does removing them, or keeping them, have?

It seems to me that this is mostly a user experience question, because I see many sites (like SO itself) keep accents and special characters in the URL, while others opt for removal. Since some do and others do not, I assume there is no "technological limitation" or real problem in using them.

If there is any "technological limitation" that motivates the removal of these characters (going beyond the field of user experience), I would also like to know.

  • I'm new to the field, but I believe that in the past there were problems naming files with special characters. Although the browser displays the special character, the URL is encoded and "%C3%A9" is sent instead of "é". Even today, if you copy the URL you get the encoded form. You can also run into problems if the charset is not configured correctly, the same problem that makes a question mark (?) appear in place of accented letters: instead of saving a file named "é-[...]", it is saved as "?-[...]".

  • The decision to replace á with a, or to remove it, probably comes down to encoding: á is encoded as %E1 in a single-byte encoding such as Latin-1 and as %C3%A1 in UTF-8, so you would have to handle that at every step. Bear in mind that part of the URL may be used to look up a document or a database record, and with a database the result depends on its encoding. An equivalent example is ß and ss, which could conflict with other things on an "international" website (one that uses more than one language) ...

  • .... not working with accents is always easier. It is not just a matter of UX (for multi-language websites); it also makes things technically simpler inside the site's own system (provided, of course, that you know what you are doing). There is also the fad of saying that we should always use UTF-8 because it is better (a myth spread by some people); on a site that does not need emoji and only needs accents, latin1 (or an equivalent) works very well.

1 answer

The question has two parts: an objective one, whether there is a technical impediment, and a more subjective one, the impact on user experience.

Starting with the simpler one: is there any technical impediment?

Not today. As I mentioned in my comment, I have heard that in the past it was not possible (or at least not practical) to use special characters in filenames (a more experienced user can confirm this), but that is an old problem. Some characters are still reserved, for example / on Linux, but accents and even emoji are allowed.

It is not a limitation, but it does call for care: if the encoding is not configured correctly, you may end up receiving or saving files with garbled characters (such as ? or Ã© in place of é), which can lead to bugs and is certainly not good for the user.

Even though the browser's address bar displays special characters (some, not all), under the hood they are encoded before the request is made. Strictly speaking, a URL like pt.stackoverflow.com/questions/27177/o-que-é-callback is not valid, so the browser encodes it as pt.stackoverflow.com/questions/27177/o-que-%C3%A9-callback. Browsers do this conversion for you (although IE has some problems with it), and in some cases the URL shown to the user is the encoded one (IE again). Even among the major browsers there are differences in what the address bar shows: for example, Firefox shows a space character ( ), while Chrome shows it encoded (%20).
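To make the encoding step concrete, here is a minimal sketch using Python's standard `urllib.parse`. The variable names are only illustrative; the slug is the one from the question. It shows the UTF-8 form the browser sends, the different result under Latin-1, and what happens when the encoded text is decoded with the wrong charset.

```python
from urllib.parse import quote, unquote

slug = "o-que-é-callback"

# Non-ASCII characters are percent-encoded byte by byte; UTF-8 is the default.
encoded_utf8 = quote(slug)
# Under Latin-1 the same character is a single, different byte.
encoded_latin1 = quote(slug, encoding="latin-1")

print(encoded_utf8)    # o-que-%C3%A9-callback
print(encoded_latin1)  # o-que-%E9-callback

# Decoding with the matching charset restores the original text...
print(unquote(encoded_utf8))                      # o-que-é-callback
# ...while decoding UTF-8 bytes as Latin-1 produces garbled output.
print(unquote(encoded_utf8, encoding="latin-1"))  # o-que-Ã©-callback
```

The last line is the kind of mismatch that a misconfigured charset produces, which is one practical reason some sites keep slugs ASCII-only.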

Some parsers, like Stack Overflow's, which detect that a piece of text is a URL and render it as a link (for example, https://[...] becomes <a href="https://[...]" [...]>https://[...]</a>), may treat certain special characters as the end of the URL, so the link is not displayed as intended, as in pt.stackoverflow.com/questions/27177/o-que-é-callback (here I am forcing the link with markdown).

So even though there is no hard limitation, these caveats can make removing the characters a sensible choice.

Moving on to the more subjective part: is there an impact on user experience?

Removing the accent while keeping the unaccented letter has no impact, or very little. We are intelligent beings (or at least we should be), so someone who reads "o-que-e-callback" understands that the "e" there is actually an "é", since the context helps.

But removing the character completely is bad, especially in short words. Even with context, many people may struggle, or fail altogether, to understand that in the URL filmes.com/categorias/ao, "ao" is actually "ação" (action).
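The difference between the two strategies can be sketched in a few lines of Python. `fold_accents` and `slugify` are hypothetical helpers written for this illustration, not functions from any particular framework; the folding relies on Unicode decomposition (NFKD) from the standard `unicodedata` module.

```python
import re
import unicodedata

def fold_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def slugify(text: str, fold: bool = True) -> str:
    """Hypothetical helper: build an ASCII slug, folding or dropping accents."""
    # With fold=False the text is kept composed so accented letters vanish whole.
    text = fold_accents(text) if fold else unicodedata.normalize("NFC", text)
    # Whatever is still outside ASCII is dropped here.
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Collapse runs of non-alphanumeric characters into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

print(slugify("O que é callback"))              # o-que-e-callback
print(slugify("O que é callback", fold=False))  # o-que-callback
print(slugify("Ação"))                          # acao
print(slugify("Ação", fold=False))              # ao
```

Folding keeps the word recognizable ("acao"), while dropping the characters leaves something like "ao" that a reader has to guess at.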

Depending on your users it may make no difference: people who cannot read (for example, preschool children) will obviously not read the link; they will just click to find out (or not) what it is about.

In some specific cases it can cause confusion, much like a missing or misplaced comma. For example, noticias.com/pais-sem-dinheiro can be read as "pais sem dinheiro" (parents without money) or "país sem dinheiro" (country without money). Other cases may require you to keep the accent: an online dictionary will have the URLs dicionario.com/esta and dicionario.com/está, which refer to completely different words.
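A site that strips accents could check for this kind of ambiguity ahead of time. The sketch below is only an illustration under that assumption: it groups a small, made-up word list by its accent-folded form and reports the entries that would end up sharing one URL.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Replace accented letters with their unaccented base letter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Illustrative word list taken from the examples above.
words = ["esta", "está", "pais", "país"]

# Group the original words by the slug they would produce.
by_slug = {}
for word in words:
    by_slug.setdefault(fold_accents(word), []).append(word)

# Any slug shared by more than one word is a collision.
collisions = {slug: ws for slug, ws in by_slug.items() if len(ws) > 1}
print(collisions)  # {'esta': ['esta', 'está'], 'pais': ['pais', 'país']}
```

When collisions like these matter, as in the dictionary case, keeping the accent in the URL (or adding a distinguishing ID) avoids them.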

  • I tried to find references to back up the subjective part. I did find material that helped me write the answer, but nothing directly related; at most there were discussions of SEO, of the problems that can arise, and of the use of non-Latin characters.
