REGEX - Capitalized words in the middle of the sentence

6

Is there a regex/replace that converts capitalized words in the middle of a sentence to lowercase?

(Yes, I could just lowercase everything.) But there is a catch: the rule should not apply to a word that comes right after a period (.).

Example:

  • User Not Authenticated. Contact ADM.

to:

  • User not authenticated. Contact ADM.

3 answers

8


You can use this regex:

/^[^]([^.]*)/

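It matches from the beginning of the string up to the last character before the first ., skipping the first character ([^], which in JavaScript matches any character) and storing the rest in group 1. You then lowercase that group with .toLowerCase() and put it back with replace:

var string = "UsuÁrio Não AUtenticadO. Contate o ADM.";

// Group 1: everything after the first character, up to the first period.
var res = string.match(/^[^]([^.]*)/)[1];
// Swap that stretch for its lowercase version.
string = string.replace(res,res.toLowerCase());

console.log(string); // Usuário não autenticado. Contate o ADM.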

Or you can take everything up to the word "Contate" ("Contact"), escaping the period so that it matches a literal .:

/[^](.+?(?=\.\sContate))/

var string = "UsuÁrio Não AUtenticadO. Contate o ADM.";

var res = string.match(/[^](.+?(?=\.\sContate))/)[1];
string = string.replace(res,res.toLowerCase());

console.log(string);

EDIT

If there are periods in the middle of the string, the regex below skips the first uppercase letter after each period. Since the result is an array with more than one match, you need to loop over the array, ignoring the last match (Contate o ADM):

/([^.\sA-Z][^.]*)/g

var string = "Usuário Não Autenticado. Contate o ADM. Em Caso De Ou O A. Vou À Bahia. UsuÁrio Não AUtenTicadO. Ok Vamos testa. Contate o ADM.";

var regex = /([^.\sA-Z][^.]*)/g
var res = string.match(regex);

for(let x=0; x<res.length-1; x++){
   string = string.replace(res[x],res[x].toLowerCase());
}

console.log(string);
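
One thing to keep in mind with the loop above: replace with a string pattern only changes the first occurrence it finds, so repeated sentences can be affected in unexpected places. As a rough sketch of an alternative (not the code from the original EDIT), the same regex can be passed to replace together with a callback, so each match is lowercased exactly where it was found. The helper name lowerSentenceTails and its keep parameter are made up for this illustration:

```javascript
// Same pattern as in the EDIT: skip the sentence's leading uppercase letter,
// then take everything up to the next period.
const sentenceTail = /([^.\sA-Z][^.]*)/g;

// Hypothetical helper: lowercases every sentence tail in place,
// except the tails listed in `keep` (e.g. the "Contate o ADM" sentence).
function lowerSentenceTails(text, keep = []) {
  return text.replace(sentenceTail, (tail) =>
    keep.includes(tail) ? tail : tail.toLowerCase()
  );
}

const input = "UsuÁrio Não AUtenTicadO. Ok Vamos testa. Contate o ADM.";
console.log(lowerSentenceTails(input, ["ontate o ADM"]));
// Usuário não autenticado. Ok vamos testa. Contate o ADM.
```

Note that the entry in keep has to be written without the first letter ("ontate o ADM"), because the regex never includes the sentence's leading capital in the match.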

  • var string = "User Not Authenticated. Ok Let’s test. Contact ADM."; I made this change and it didn’t work.

  • @Marconciliosouza True. I reread the question, and it seems that when there is a period in the middle, the first word after it should keep its capital letter. OK, I’ll try to fix that in the answer.

  • 1

    @Marconciliosouza I got it working with a simpler regex. I added an edit to the answer.

  • 1

    @danieltakeshi I’ve updated the edit in the answer. I think you’re right. Thanks!

1

You can use this Regex:

(?<!\.)(?:\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)|\s(A|O|À)(?=\s|\.))

or

(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))

And the demo on Regex101.

However, proper names are a problem: they get matched like any other capitalized word. If your text does not contain them, this regex captures what you want.

This regex works on text in general, not only on the example sentence. For future questions, I suggest posting more examples, including ones that cover the edge cases.

Explanation

1st alternative

(?<!\.)\s([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*)

  • (?<!\.) - Negative lookbehind - If there is a . immediately before the whitespace (that is, the word starts a new sentence), the word is not matched.
  • \s - Matches any whitespace character (equal to [\r\n\t\f\v ]).
  • ([A-Z\u00C0-\u00dd][A-Z\u00C0-\u00dd]*[a-z\u00E0-\u00ff][a-zA-Z\u00C0-\u00ffA-Z]*) - Capture group () - Captures words that start with a capital letter but are not completely uppercase.
    • [A-Z\u00C0-\u00dd] - First capital letter - Matches one letter in A-Z or in Unicode code points 192 to 221 (À-Ý).
    • [A-Z\u00C0-\u00dd]* - Further capital letters, if any - Matches zero or more letters in A-Z or in Unicode code points 192 to 221.
    • [a-z\u00E0-\u00ff] - A lowercase letter in the word - Matches one letter in a-z or in Unicode code points 224 to 255 (à-ÿ).
    • [a-zA-Z\u00C0-\u00ffA-Z]* - The rest of the word, lowercase or uppercase - Matches zero or more letters in a-z, A-Z, or Unicode code points 192 to 255.

Fully uppercase words are not captured, since they may be acronyms.

Or

|

2nd Alternative

Handles the capitalized one-letter words A, O, and À (a with crase, the grave accent), which are letters standing "alone".

\s(A|O|À)(?=\s|\.)

  • \s - Matches any whitespace character (equal to [\r\n\t\f\v ]).
  • (A|O|À) - Capture group - Matches literally A, O, or À.
  • (?=\s|\.) - Positive lookahead - After the capture group, there must be either whitespace (\s) or a period (\.).
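
The answer gives only the pattern, so here is a minimal sketch of how it might be applied in JavaScript. It assumes a runtime with lookbehind support (ES2018 or later), adds the g flag so every word is processed, and lowercases the whole match through a replace callback. The function name lowerMidSentence is just an illustration:

```javascript
// Pattern from the answer (literal-character version), plus the `g` flag.
const midSentenceWord =
  /(?<!\.)(?:\s([A-ZÀ-Ý][A-ZÀ-Ý]*[a-zà-ÿ][a-zA-ZÀ-ÿ]*)|\s(A|O|À)(?=\s|\.))/g;

// Hypothetical helper: each match is "whitespace + word", so lowercasing
// the whole match leaves the whitespace untouched and lowercases the word.
function lowerMidSentence(text) {
  return text.replace(midSentenceWord, (match) => match.toLowerCase());
}

console.log(lowerMidSentence("Usuário Não Autenticado. Contate o ADM."));
// Usuário não autenticado. Contate o ADM.

console.log(lowerMidSentence("Vou À Bahia."));
// Vou à bahia.
```

The second call shows the proper-name problem mentioned above: Bahia is lowercased along with everything else. Also note that the very first word of the string is never matched, because the pattern requires whitespace before the word.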

1

If you want a clean, regex-only solution, use this expression for the match:

(^.)|(ADM)|((?<!\. )[A-zÀ-ÿ ])

And that expression for substitution:

$1$2\L$3

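The \L modifier only works in tools with PCRE-style substitution. JavaScript's replace does not support it, so in your JS code do the lowercasing in a callback and add the g flag so the whole string is processed:

str.replace(/(^.)|(ADM)|((?<!\. )[A-zÀ-ÿ ])/g,
  (match, first, adm, rest) => rest === undefined ? match : rest.toLowerCase())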

You can test these expressions on this link


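Explanation of the capture groups

(^.) - Captures the first character, to prevent it from being lowercased.
| - OR
(ADM) - Captures exactly ADM
| - OR
(?<!\. ) - Negative lookbehind, prevents the following character from being captured if it is preceded by a period and a space.
[A-zÀ-ÿ ] - Matches a single letter (lowercase or uppercase, accented or not) or a space. Note that the A-z range also covers the punctuation characters that sit between Z and a in ASCII.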

Replacement explanation

$1 - capture group 1 ((^.))
$2 - capture group 2 ((ADM))
\L$3 - capture group 3, converted to lowercase ((?<!\. )[A-zÀ-ÿ ])
