Should people’s names be stored in two or just one column?

Viewed 2,649 times

37

In business systems, especially international ERPs, and even on websites, registration forms ask for the surname (last name) and the given name (first name). Some even ask for a middle name.

In Brazil I don’t remember seeing this anywhere. A single name column is always used, with no separation of the parts.

What are the advantages and disadvantages of each from the perspective of modeling and user experience?

Bonus point (not the main focus of the question, and I do not know whether it can be answered authoritatively):

Why does this happen here? Is it a historical or cultural issue?

Note that this is a matter of both data modeling and user experience. The two do not always match directly.

One reference on the subject.

  • 9

    Some websites even ask which form of address the user prefers (Mr., Mrs., Miss, Ms., Sir). Maybe that’s why they separate the given name from the surname: Sir Oliveira, Mr. Silva.

  • 7

    @emanuelsn In fact this is a common reason to keep the surname separate. In formal English, the title "Mr." is used only with the surname. Another reason is how the name is presented: when you take a flight, the airline’s American-style system looks you up by your last name, not your first name. The same goes for American scheduling systems and CRMs in general. Finally, in the USA, in everyday social life, people address each other by surname when they are not close (in Brazil we call friends by their surname if it sounds nice). Cool but off-topic.

  • 2

    Like any other information (data), the choice of how it is stored should be considered according to how it will be used.

  • 7

    Unfortunately this question is being discussed on meta. By this point the site should not need this anymore: http://meta.pt.stackoverflow.com/q/2412/101

  • 1

    Imagining the most general case possible (including the references in the comments), which would be an international ERP as you said, I think the minimum requirement would be title, first name, middle name, last name. This should suit all the cultures cited here, from showing "Hello Bruno Silva" to "Hello Silva, Bruno" or "Welcome Mr. Silva". But I could not be precise in an answer because I do not know all cultures. I prefer to leave it to someone better informed to answer :o) Good luck!

  • I never know how to fill in these fields. I have two given names and two surnames. Which one is the "middle"? And even if the site doesn’t ask for that, when I fill in "Last name" do they really want both surnames or just the last one? Am I supposed to know the American/English/German/French/Portuguese/Chinese culture of the site owner??

  • @But there’s another problem :)

  • @But I always have this question when filling out an English form.

7 answers

25

In theory, knowing the naming pattern of the culture most of your users belong to makes things much easier, since the form will look closer to what they are used to. But this can also be a disadvantage in several cases:

  • If you need to centralize user data from multiple locales (cultures) into a single database.
  • Even within a single country, people form personal names in different ways. For example, there may be foreigners in the country, or different regions of the country may follow different cultures.
  • Not every culture forms names as Given name + Surname or Given name + Family name. In many cultures a name may be a single word, sometimes only 2 or 3 characters long (for example, An).

Okay, but should I split it into 2 fields or not?

Ask yourself whether you really need separate fields for given name and surname. A single field is obviously simpler for the user, which is why it is recommended that your system be as flexible as possible.

Professional systems generally use 2 fields for given name and surname so they can address the user formally and accurately, that is, without relying on logic that extracts the surname from the full name (which, however well built, is prone to failure).

For that reason it is increasingly common to see forms that ask for your full name in one field and then, right below or at some later point, ask "Hi, how would you like to be called?". Many users give their first name, others their last name, and some a more informal nickname.

To summarize the advantages and disadvantages of each approach simply:

  • Given name and surname in 2 fields: lets the system address the user more formally and exactly.
  • Given name and surname in 1 field: lets users enter their name however it is actually written, making things simpler and less laborious.

What I think is a good way to store the full name and still address the user more personally:

  • Full name [__________________________]
  • How would you like to be called? [________________] (this can be shown to the user later, not necessarily at registration time).
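The two fields above could be modeled roughly as follows. This is only a minimal sketch: the class name `UserName` and the attributes `full_name` and `preferred_name` are hypothetical names chosen for illustration, not something the answer prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserName:
    # Hypothetical model: the full name is stored exactly as the user typed it.
    full_name: str
    # May stay empty at registration and be filled in at a later time.
    preferred_name: Optional[str] = None

    def greeting(self) -> str:
        # Fall back to the full name until the user says how to be addressed.
        return f"Hi, {self.preferred_name or self.full_name}!"


if __name__ == "__main__":
    print(UserName("João Vasconcelos Neto da Silva").greeting())
    print(UserName("João Vasconcelos Neto da Silva", "João").greeting())
```

With this shape the system never has to guess which part of the full name is the surname; it only uses what the user explicitly provided.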

Recommended readings:

Falsehoods Programmers Believe About Names

Personal Names Around the World

What’s First Name and Last Name Supposed to Mean?

25


Modeling

From a modeling point of view, this choice does not seem to make much difference. The space taken by one or the other (whether in memory, in attributes of an object, or in a database table) is practically the same. There is also no advantage in using given name or surname individually as identifiers, since even the full name is not unique (homonyms are very common). Even the extra processing needed to concatenate given name and surname for display can be disregarded. The difficulty in extracting the surname from a name stored as a single string lies less in the technique than in knowing for sure which "whole word" forms the surname. In China, for example, the family name usually comes before the given name.

It may be useful to keep the given name and/or surname separate to create indexes that make searching or grouping in reports easier, or so that the system can address the user directly in a formal way. But this need depends heavily on the application domain. For example, a family tree management system would most likely need separate surnames, since the surname is an entity in its own right that the system manipulates. A ticket sales system would also need this separation, not for any internal processing but because the name must be printed correctly on the travel ticket for attendants to use with passengers.

User Experience

So the question of whether to keep given name and surname separate seems to come more from the user experience side. This information is usually requested through a form with text fields for data entry. Asking the user to fill in two (or more) of these fields can be more tiresome than asking for just one.

Consider, for example, an entertainment app for listening to music. The mere fact that the application asks users to enter their name may already hurt the experience, because the user may be bothered by:

  • the use the application will make of the information (privacy concerns), since the usefulness of this information is not apparent ("why do I need to give my name to listen to music?")
  • (especially) the interruption of their usage flow ("Oh, I wanted to listen to a song, not answer useless questions")

In games, for example, the name may have a very apparent use for the user, such as maintaining scoreboards (high scores). Still, it is completely unnecessary and disruptive to ask for a first and a last name separately. That’s why, in these cases, users commonly decide how they want to be called (as other answers have already pointed out).

Concluding

Whether or not to capture (and keep) separate first and last names depends on the application domain, mainly on the use that will be made of the separate information (usability criterion) and on how the information will be requested from and presented to the user (user experience criterion).

I have heard many arguments like "keep it separate because, although we don’t need it today, we may need it tomorrow". My experience indicates that this is nonsense. First, because systems are made for people to use, this kind of need is usually very visible to designers early in development. Second, because it is not that hard to automatically extract surname suggestions from a database of full names, provided of course that you consider the application's target audience (for example, is it used in Brazil or in China?). That is, if the need arises, migrating a system so that names originally kept in a single string are split into surnames is not necessarily difficult, whereas asking for separate surnames unnecessarily is probably bad for the user experience.

  • 2

    +1. I think you could highlight the excerpt "depends on the domain of the application".

  • @wryel Done. Good suggestion, thank you. :)

  • I disagree a little, because depending on the application and its size this is very important. Facebook, for example, searches by name constantly and certainly at very high volume. If each part of the name is stored once in its own table, with an n->m table structure and a composite key, there is no need to store every "John". You would need to assemble a canonical tree to test this, but I’m sure there would be a considerable performance gain in that case. If a company spends $100,000 on searches alone, a 5% improvement is already excellent.

  • 1

    @Cleitonoliveira I don’t know if I understand your comment, colleague. What is the "this" that you disagree with and that you think is very important? In addition, your example seems to consider the separate name in tables, but the question (and the answer) is about separation in columns. Actually, separating into tables seems to make little sense...

  • @Luiz the text as a whole gives the impression that this is unnecessary and a simple matter of taste, and that depending on the application domain there is perhaps some use for it. It is not a "perhaps": with a large volume, in an application where searching by name is intense, you need a deeper evaluation, building use cases and optimizing the queries for each case to know their computational cost. But of course, yes, in the vast majority of cases it is a matter of taste and practicality.

  • 1

    @Cleitonoliveira At no point does the text say that it is "a matter of taste". That is your interpretation (and a mistaken one, I would say). The text essentially says that from a modeling point of view it makes little difference, and from a UX point of view it may make some, but that needs to be evaluated in each scenario. The "maybe" that appears concerns a possible specific need to search by last name. Also, I still do not understand your point about search performance, since the name can be searched efficiently whether or not it is split into columns of the same table.

12

It depends on the local business model and culture.

In short, there is no single right answer, only the one that is most suitable. The modeling decision depends strongly on the following factors:
1. Culture
2. Grammar (pronunciation, phonetics)
3. Locality
4. Business model
5. Adaptation to globalization

The list above is intentionally in priority order.

As already mentioned in other answers, the format of names has specific patterns according to locality/culture and language.

Many in Brazil choose the full name in a single field because it is more practical. But even in Brazil, 3-column modeling is common (first name, surname and middle name).

In a country like Brazil this is even more confusing, because people can have more than 3 names, not counting names with particles (da, do, de) and other variants that indicate the name is inherited from an ancestor (Filho "son", Neto "grandson"). Example:

Maria Conceição da Silva
João Vasconcelos Neto

In the case of João, the family name is Vasconcelos. There is no family named "Neto".

In this case, how do you decide which part is the surname and which is the middle name in a 3-column model?

It gets more complicated when names mix these elements, as in "João Vasconcelos Neto da Silva".

Usually the last word is taken as the surname, but this also depends on the company's business logic.
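To make the pitfall concrete, here is a small sketch of the "last word is the surname" rule and why it fails for names like the ones above. The function names and the `GENERATION_SUFFIXES` set are assumptions made up for this illustration; a real rule set would depend on the company's business logic and would still be incomplete.

```python
# Hypothetical, deliberately incomplete list of words that mark descent
# rather than a family name (an assumption for this sketch only).
GENERATION_SUFFIXES = {"filho", "neto", "júnior", "junior"}


def naive_surname(full_name: str) -> str:
    # The usual shortcut: whatever comes last is treated as the surname.
    return full_name.split()[-1]


def surname_skipping_suffix(full_name: str) -> str:
    # Slightly better heuristic: ignore trailing generation suffixes first.
    tokens = full_name.split()
    while len(tokens) > 1 and tokens[-1].lower() in GENERATION_SUFFIXES:
        tokens.pop()
    return tokens[-1]


if __name__ == "__main__":
    for name in ("Maria Conceição da Silva", "João Vasconcelos Neto"):
        print(name, "->", naive_surname(name), "/", surname_skipping_suffix(name))
```

Even the second heuristic only covers suffixes at the very end and ignores particles such as "da", which is one reason automatic extraction remains error-prone.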

For those who want no complications, it is more practical to opt for a single field and be done with it. However, the system's model will then be tied to a local rule.

Such a system will hardly be reusable somewhere else, such as Asia.

To clarify the list of priority factors, read on:

Globalization

In Japan people have only 2 names, although this has been changing due to the presence of foreigners. In general, a native Japanese person has only a given name and a surname, and culturally the surname comes before the given name.

There is also a peculiarity: forms commonly ask for the furigana of the name. This is because the ideograms can be hard to read. Even a native Japanese speaker may get confused or read a particular kanji differently.

Example: 大道 reads "Daidou", but many Japanese may read it as "Oomichi".
This example is not a person's name. I used it to show simply why furigana matters in forms.

Therefore, to avoid confusion, especially mispronouncing a customer's name when serving them, there are 2 more columns where the furigana version is stored.

Example

Kanji: 山田太郎
Furigana (katakana): ヤマダタロウ
(Yamada Tarou)

Furigana is used to assist in the correct pronunciation of the original Kanji. It can be represented in hiragana or Roman alphabet. But it’s usually in katakana.

Note that Japanese/Chinese script has no spacing between words. This makes it hard to tell given names and surnames apart in a 1-column model.

Since culturally they have no more than 2 names, it is easy to adopt the 2-column pattern (surname + given name).

When a foreigner moves to Japan, especially from Western countries where people have more than 2 names, adapting long and "non-standard" names is a huge complication for the government and for companies. At that point it comes down to each company's choice. Some drop parts of the person's name and register only the first and last name, and others lump all the remaining names together.

The name João Vasconcelos Neto da Silva becomes:

Surname: Da Silva
Name: Joao Vasconcelos Neto (without accent)

Obviously the vast majority of them cannot tell which part is the surname (family name), so it sometimes ends up like this:

Surname: Silva
Name: Joao Vasconcelos Neto Da

Surname: Neto da Silva
Name: Joao Vasconcelos

This gets weird when the name is printed on a document, letter, etc., producing something like "Da SilvaJoao Vasconcelos Neto". Depending on the combination it can even form an offensive word or, depending on the person's culture, something not very acceptable. With this in mind, some create a column that serves as a flag marking the name as foreign. In that case the name is printed with spacing and even a comma: "Da Silva, Joao Vasconcelos Neto". Others are more sophisticated and print the given name first: "Joao Vasconcelos Neto da Silva". This depends a lot on the company's mindset. More modern companies with a globalized mindset tend to take care of details like this.
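The foreign-name flag described above might be used at print time along these lines. This is a hedged sketch: the function `printed_name` and its `is_foreign` parameter are hypothetical, and real systems will have their own layout rules.

```python
def printed_name(surname: str, given: str, is_foreign: bool) -> str:
    # Native names are printed surname-first with no separator, as is
    # customary for names written in kanji.
    if not is_foreign:
        return f"{surname}{given}"
    # Flagged foreign names get a comma and a space so that the parts do
    # not run together into something unreadable.
    return f"{surname}, {given}"


if __name__ == "__main__":
    print(printed_name("山田", "太郎", is_foreign=False))
    print(printed_name("Da Silva", "Joao Vasconcelos Neto", is_foreign=True))
```

The flag only decides the layout; it does not solve the earlier problem of choosing which words belong in the surname column.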

Document conflicts

It is common for the same person to have their name "disfigured" differently in different documents. In the passport, international standards forbid dropping any part of the original name, so the name remains complete. But this same person opens an account at a bank that follows a rigid 2-name standard and does not accept foreign names. As a rule, though, banks must register the name exactly as it appears in the foreigner registration, which follows the same pattern as the passport. That is where the "lambança" (mess) begins.

The bank will register the name more or less this way:

Surname: Silva
Name: Joao Vasconcelos Neto Da

And they had to adapt, widening the database columns because names with so many letters would not fit.

To make it more complicated, don’t forget the furigana, because they need to know how to pronounce the name.

If you get a credit card, the pattern for names on the card is always 2 names with no spacing inside each. This is an inflexible rule at most card issuers.

João’s example would be printed on a credit card as:
"SILVA JOAOVASCONCELOSNETODA"


There’s no accent in their language, so I removed the accents

The conflict arises when that person needs to register for services where the name is verified against certain documents. The person fills in the name as it appears in the passport and the name as it appears on the credit card. The two do not match, and the person is even prevented from completing a simple registration. This situation is common in models that are not prepared for globalization.

This is a very broad topic and I have only scratched the surface here. Since the focus is not specific to Japan, I think this is enough to explain the different situations.

I could say more about taking care not to get the pronunciation of a person's name wrong, but that would make the answer long and very localized.

Cultural changes

In China there are also certain peculiarities. Contrary to what many imagine, people do not have only 1 or 2 names. Current generations in China often adopt an English name, which is officially registered on their birth certificate and identity card.

As in Japan, they have only a given name and a surname, but they add a third and even a fourth name in the Roman alphabet, usually in English.
Example: Chen Lee / Alfred Osbourne

The first is the Chinese name, "Chen Lee", and the second is an English name, which is not a translation of the Chinese name. It is merely a second given name and surname in English.

It is not hard to find people in China with names like Marcelo, Sheila, Sarah, Alex, Candy, etc., but obviously incorporated into the name in Chinese ideograms, as in the example above. Adopting an English name is optional at birth registration and makes things easier when traveling abroad, because Chinese names are hard for people from other countries to read and pronounce.

Forms on websites and systems in general usually do not ask a Chinese person for the English name, nor even show such an option. It is used mostly in official documents.

I cited Japan and China because it is interesting to see that "homogeneous" countries (without much mixing of race or culture) are more globalized than a country like Brazil, which is one of the most mixed/heterogeneous in the world. In theory, Brazil should be one of the most globalized countries in the world.

  • 2

    Excellent point regarding localized formats. + 1!

11

Different cultures have different concepts of what a name is and how names are written. In some cultures there is no last name at all! Unless there is some important requirement to store the surname separately, you can avoid a lot of headaches by using a single, very permissive field for the name. This way users have more freedom to choose how to write their name, and the system does not presuppose any rules.

If the only reason for splitting the name into parts is presentation (for example, writing Mr. Silva instead of the full name), one alternative that works in some cases is a separate field where users fill in how they prefer to be called.

  • 1

    The link is very useful and the addendum about forms of address is also welcome, but I am not sure the answer itself is enough to point out the advantages and disadvantages. It seems to be just saying "keep everything together" without explaining why.

  • 1

    I made an edit emphasizing that it is a matter of giving the user more freedom.

3

Short answer: in 2 or more columns. Short justification: to meet standards.


A little history helps to give context and to understand the relevance of standards, and how well they fit each kind of design decision.

In Brazil it can never be repeated enough: "Standards exist!". Unlike Portugal, which has Europe to force the Portuguese to respect standards, Brazilians are reluctant to standardize: the exploitative colonization of the past, the dictatorship of recent history, and the widespread and still current corruption and bureaucracy have created in the Brazilian collective imagination a huge bias against standards... I respect anyone who holds that grudge, but this answer requires us to set it aside.

Timeline of the standards

  • ~1987 - the ISO 8859-1 standard (with our alphabet and accents!) was consolidated.

  • ~1993 - the UTF-8 standard was created.

  • 1996 - the vCard standard was consolidated with its version 2.1, on the initiative of a consortium of large companies in the communication and computer industries, including Apple, AT&T, IBM and Siemens.
    PS: vCard is still the strongest reference for the storage and exchange of personal data.

  • 1999 - first version of RDF standard, supported by IBM, Microsoft, Netscape, Nokia, Reuters and several universities.

  • ~2001 - the Semantic Web was launched as a proposal, along with RDF and other standards.

  • ~2003 - "last call" of the standardization committees to definitively replace ISO 8859-1 (and others) by UTF-8.

  • 2004 - Standards of Interoperability in Electronic Government (e-PING) in Brazil.

  • ~2010 - the UTF-8 has become "de facto standard", having surpassed ASCII and ISO 8859-1 together in the web pages.

  • 2011 - Schema.org emerges as a "continuous standard", supported by the consortium of Google, Yahoo, Microsoft, and Yandex, to meet practical needs of the Semantic Web and end the "Tower of Babel" of basic RDF vocabularies.

  • ~2013 - the Semantic Web "took off".

  • 2014 - RDF 1.1 consolidated the Semantic Web.

  • ... Today Schema.org is a de facto standard. We can use it directly or indirectly with RDF, RDFa, Microdata or JSON-LD.
    That’s where vCard comes in. The proper names of individuals can be expressed with the properties name, givenName, familyName, additionalName, honorificPrefix and honorificSuffix of Person.
    The property name means the full name, but it is somewhat informal and can always be built by concatenating the others, as a rule without the affixes.
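As an illustration of the Person properties just listed, the sketch below builds a JSON-LD document and derives `name` by concatenating the other parts. The helper `person_jsonld` is hypothetical, and the way "Maria Conceição da Silva" is split into parts is only an assumption for the example; the property names themselves are the Schema.org ones mentioned above.

```python
import json
from typing import Optional


def person_jsonld(given: str, family: str, additional: Optional[str] = None) -> dict:
    # Hypothetical helper: the full name is derived from the parts, in
    # Western order, without honorific prefix or suffix.
    full = " ".join(part for part in (given, additional, family) if part)
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": full,
        "givenName": given,
        "familyName": family,
    }
    if additional:
        doc["additionalName"] = additional
    return doc


if __name__ == "__main__":
    doc = person_jsonld("Maria", "da Silva", additional="Conceição")
    print(json.dumps(doc, ensure_ascii=False, indent=2))
```

Note that the concatenation order here is surname-last; for cultures that write the family name first, `name` would have to be stored or assembled differently.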

Suggestion of "minimum respect"

Store name directly, or make sure it can be derived from what is stored, and remember that the encoding also matters (it cannot be ASCII, for example). Ideally, also have the other fields defined for Person... But there is a suggested priority order too.

  1. UTF-8: Let’s respect e-PING and the rest of the universe... the alphabet of the Portuguese dictionary is only a subset, and proper names may go beyond it (e.g. Ångström, as in the unit of length).

  2. Always have a way to express the full name, that is, Person / name.
    It does not matter whether it was stored that way or whether there is a function (e.g. a SQL VIEW) that guarantees 100% recovery of the full name.

  3. Never delete accents: if you need an accent-free form, store unaccent(name) in a cache or compute it on the fly. It makes no sense to "spoil forever" something as precious as our name.
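A rough idea of item 3, keeping the original and deriving the accent-free form on demand, using Python's standard `unicodedata` module. The function name `unaccent` simply mirrors the one mentioned above; this is a sketch, not the database function of the same name.

```python
import unicodedata


def unaccent(text: str) -> str:
    # Decompose each character into base letter + combining marks (NFD),
    # then drop the combining marks. The original string is left untouched.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    stored = "João Conceição"      # what the database keeps
    search_key = unaccent(stored)  # derived value, e.g. for a search index
    print(stored, "->", search_key)
```

Because the accent-free value is derived, it can be recomputed or dropped at any time without losing the spelling the person actually uses.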

Other tips:

  • Have a way to recover or normalize the "proper spelling" of the record, with upper and lower case. See the reminder below.

  • Use at least givenName to separate out the more standardized parts (there is no ISO standard, but 6.4% of Brazil's inhabitants are named Maria), which also allows, e.g., applying spelling correction.
    The honorificSuffix is also relevant, since it calls for a standardized spelling (e.g. "Junior" is preferable to "Jr.") and is not the surname.

  • Next, the most important property when splitting the name is familyName.


Reminder 1 - Internal name normalization

The conventions are more or less agreed upon for Portuguese and can be expressed more universally as regular expressions.
Step-by-step normalization of a full name (property name) of a natural person (class Person) provided in block capitals:

  1. Capital initials, known in languages and frameworks as ToTitleCase(name), InitCap(name) and others.

  2. For safety, a trim(), or better still a trim after replacing all multiple or exotic spaces with the single standard space, i.e. replace regex /\s+/gu with a space.

  3. Let us not forget the junior, which strictly speaking must also be written with a capital initial and in full. Therefore: replace / (?:jr\.?|j[uú]nior)$/i with " Júnior".

  4. The lone "e": replace every " E " with " e ".

  5. The other prepositions: replace / (d)([eao]|[ao]s) /gi with " d\2 ".
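The five steps above can be sketched in Python as follows. This is an illustrative reading of the rules, not a definitive implementation: the function name `normalize_full_name` is made up, whitespace is collapsed before capitalizing for convenience, and step 5 uses lookarounds instead of matching the surrounding spaces so that two particles in a row are both handled.

```python
import re


def normalize_full_name(raw: str) -> str:
    # Step 2: collapse multiple or exotic whitespace into one space, then trim.
    name = re.sub(r"\s+", " ", raw).strip()
    # Step 1: capital initial for every word, the rest in lower case.
    name = " ".join(word.capitalize() for word in name.split(" "))
    # Step 3: write the trailing "Jr", "Jr." or "Junior" in full, with accent.
    name = re.sub(r" (?:jr\.?|j[uú]nior)$", " Júnior", name, flags=re.IGNORECASE)
    # Step 4: the lone conjunction "e" stays in lower case.
    name = name.replace(" E ", " e ")
    # Step 5: particles de/da/do/das/dos between words go to lower case.
    name = re.sub(
        r"(?<= )d([eao]|[ao]s)(?= )",
        lambda m: "d" + m.group(1).lower(),
        name,
        flags=re.IGNORECASE,
    )
    return name


if __name__ == "__main__":
    print(normalize_full_name("  JOÃO   PEDRO DA SILVA E SOUZA JR. "))
```

`str.capitalize()` lowercases everything after the first letter, so names such as "D'Ávila" or hyphenated names would need extra rules that are outside these basic steps.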

 

PS: these are the basic, "consensual" rules. Unfortunately it is impossible to dream of "embracing the world": surnames such as "van der Waals" may end up as "Van Der Waals", etc. in English.
Each project or normalization context decides whether or not to risk extending the rules to "di", "dello", "della", "dalla", "dal", "del", "der", "em", "na", "no", "nas", "nos", "van", "von", "y", and perhaps others. Normalizing Roman numeral suffixes (I, II, III, etc.) is also interesting but has its risks.

Reminder 2 - Advantages

The fields standardized by the Person class of Schema.org are not an invention "out of nowhere"; behind them is a history that comes from vCard.

The Semantic Web (e.g. complementing your HTML5 with Microdata) provides open data and semantic granularity in information retrieval. Not only Google but any other tool, even a simple one (without millions of dollars invested in statistics), will better understand your name, your page, your data...

In terms of record keeping, storing complete and standardized information does not guarantee that your application works right away, but it does ensure long-term preservation (see for example the databases of CRM systems, or official data) and interoperability with other systems.

2

Here is a possibly simpler explanation.

Many people have compound given names, for example: João Pedro Da Silva Machado. In this case the person's given name is João Pedro and not just João. Splitting into Name + Surname makes it easy to identify the person's exact given name. This is not the only way around the problem: as previous answers said, some registration forms also offer the option of choosing how you want to be called. Anyway, my understanding is that when the application wants to address the user in a more cordial and personal way, storing only the full name makes things a little harder.

-3

It depends a lot on what your project needs, but I still prefer a single name field. It works better for searching, because you can match on the column by prefix or suffix, for example to find a surname. It would help in the search and in the code, and it would make the form cleaner, easier and more attractive on mobile devices. That’s what I think; correct me if I am wrong.
