Regex to pick up capitalized words in the middle of a sentence

I need to pick up words that start with a capital letter in the middle of a sentence.

Example: "Facebook and Twitter are interesting. You can register anytime".

In the example above, I need to get Facebook and Twitter, but not "You".

My idea was to take every capitalized word that is not immediately preceded by a period and a space. I've tried a few things, but nothing has worked.

1 answer

The solution is to use a negative lookbehind. Your regex would then look like this:

'(?<!\.\s)[A-Z][a-z]+'

Complete code:

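// Match a capital letter followed by lowercase letters, unless the capital
// is immediately preceded by a period and one whitespace character.
const regexp = /(?<!\.\s)[A-Z][a-z]+/g;
const text = "Facebook and Twitter are interesting. You can register anytime";

const result_array = text.match(regexp);
console.log(result_array); // ["Facebook", "Twitter"]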
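The pattern above only recognizes unaccented ASCII letters and a period followed by exactly one whitespace character, so in Portuguese text a word like "Você" is cut to "Voc" whenever it is not preceded by ". ". A hedged variant, assuming you also want accented capitals and sentences that end in "!" or "?", could use Unicode property escapes (the `u` flag) together with a variable-length lookbehind, both supported by modern JavaScript engines. The names `capitalizedMidSentence` and `findCapitalized` are hypothetical.

```javascript
// Assumption: sentences end in ".", "!" or "?" followed by one or more whitespace characters.
// \p{Lu} matches any uppercase letter and \p{Ll} any lowercase letter (requires the "u" flag),
// so accented words such as "Ângela" or "Érica" are matched whole.
const capitalizedMidSentence = /(?<![.!?]\s+)\p{Lu}\p{Ll}+/gu;

function findCapitalized(text) {
  // match() returns null when nothing matches; normalize that to an empty array.
  return text.match(capitalizedMidSentence) || [];
}

console.log(findCapitalized("I met Ângela on Facebook! You can call Érica tomorrow."));
// → ["Ângela", "Facebook", "Érica"]
```

Like the original regex, this still matches a capitalized word at the very start of the string (there is nothing to its left for the lookbehind to reject), which is what the question's expected output for "Facebook" requires.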

What are lookbehinds

Lookbehinds are a specific case of lookarounds. Here is the definition from the book by Jeffrey Friedl:

Lookaround constructs are similar to word-boundary metacharacters like \b, or the anchors ^ and $, in that they don't match text, but rather positions within the text. However, lookaround is a much more general construct than the special-case word boundaries and anchors.

One type of lookaround, called lookahead, peeks ahead in the text (to the right) to see whether its subexpression can match, and succeeds as a regex component if it can. Positive lookahead is written with the special sequence (?=...), as in (?=\d), which succeeds at positions where a digit comes next. Another type of lookaround is lookbehind, which looks back (to the left). It is written with the special sequence (?<=...), as in (?<=\d), which succeeds at positions with a digit to the left (that is, at positions right after a digit). (Free translation.)
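Because lookarounds match positions rather than characters, splitting a string on them shows exactly where they succeed. A small sketch using the quote's own (?=\d) and (?<=\d) on an arbitrary sample string:

```javascript
const sample = "abc123def";

// (?=\d) succeeds at every position immediately followed by a digit.
console.log(sample.split(/(?=\d)/)); // → ["abc", "1", "2", "3def"]

// (?<=\d) succeeds at every position immediately preceded by a digit.
console.log(sample.split(/(?<=\d)/)); // → ["abc1", "2", "3", "def"]

// Neither lookaround consumes characters, so joining the pieces restores the input.
console.log(sample.split(/(?=\d)/).join("") === sample); // → true
```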

To complement: (?<=...) is the positive lookbehind, while in your case we use (?<!...), the negative lookbehind, which succeeds only at positions where its subexpression cannot match to the left. I recommend reading this section of the book for further clarification (pages 59-67).
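To see the difference between the two on the question's example sentence, here is a minimal comparison (the variable names are illustrative):

```javascript
const sentence = "Facebook and Twitter are interesting. You can register anytime";

// Positive lookbehind: keep capitalized words that DO follow ". "
const afterPeriod = sentence.match(/(?<=\.\s)[A-Z][a-z]+/g);

// Negative lookbehind: keep capitalized words that do NOT follow ". "
const notAfterPeriod = sentence.match(/(?<!\.\s)[A-Z][a-z]+/g);

console.log(afterPeriod);    // → ["You"]
console.log(notAfterPeriod); // → ["Facebook", "Twitter"]
```

The two patterns partition the capitalized words: every match lands in exactly one of the two results.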

  • Yes, that way it picks them up, but then I couldn't modify the original string, because the spaces were taken out. Maybe I didn't explain myself well. What I'm trying to do is identify these words and then put a "#" before each one. It's for a Twitter bot. But your idea already gives me a sense of the approach!

  • Right, @Marciosouza. I edited the answer to apply the regex directly to the string.

  • Thanks @Lucas, it worked. I didn't know about the concept of lookaround. Very interesting.

  • Complementing the solution with the addition of hashtags:

    const regexp = /(?<!\.\s)[A-Z][a-z]+/g;
    const text = "Facebook and Twitter are interesting. You can register anytime";

    // Prefix every match with "#" in a single pass, so a word that appears
    // twice is not prefixed twice and the original string is left untouched.
    const hashtagged = text.replace(regexp, (word) => `#${word}`);
    console.log(hashtagged); // "#Facebook and #Twitter are interesting. You can register anytime"
