How do I make egrep print only the list of matches from a file?

1

I’m reading a book on regular expressions in which the author uses egrep to show some examples. When I try to replicate the examples on my computer, my output is different from the one shown in the book. In my case, the output of egrep is the full text, with the regex matches highlighted in red. The author, who uses the same command, gets only the list of matches as output.

Example:

egrep '\bS[a-z]+' bezos.txt

Output:

Bezos was born Jeffrey Preston Jorgensen in Albuquerque, New Mexico, on January 12, 1964, the son of Jacklyn (née Gise) and Ted Jorgensen.[9] At the time of his birth, his mother was a 17-year-old high school student and his father was a bike shop owner.[10] After his parents divorced, his mother married Cuban immigrant Miguel "Mike" Bezos in April 1968.[11] Shortly after the wedding, Mike adopted four-year-old Jorgensen, whose surname was then changed to Bezos.[12] The family moved to Houston, Texas, where Mike worked as an engineer for Exxon after he received a degree from the University of New Mexico.[13] Bezos attended River Oaks Elementary School in Houston from fourth to sixth grade.[14] Bezos's maternal grandfather was Lawrence Preston Gise, a regional director of the U.S. Atomic Energy Commission (AEC) in Albuquerque.[15] Gise retired early to his family's ranch near Cotulla, Texas, where Bezos would spend many summers in his youth.[13] Bezos would later purchase this ranch and expand it from 25,000 acres (10,117 ha) to 300,000 acres (121,406 ha).[16][17] His maternal grandmother was Mattie Louise Gise (née Strait), through whom he is a cousin of country singer George Strait.[18]

In this output, the words Shortly, School and Strait appear in red in the terminal.

The same command in the book would have the following output:

Shortly
School
Strait

How do I print only the list of matches, as in the book, and not the entire text?

P.S.: In the book, the author uses a file with the extension list. Maybe this helps explain the difference in output, but I’ve never seen this file extension before, so I was wondering whether it is a real extension or just part of the sample file’s name.

1 answer

2


According to the documentation, just use the option -o or --only-matching:

egrep -o '\bS[a-z]+' bezos.txt
egrep --only-matching '\bS[a-z]+' bezos.txt

This option is described as:

-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

That is, it prints only the part of the line that corresponds to the match (instead of the entire line), with each match on its own line.
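
To make the difference concrete, here is a minimal sketch that runs the pattern from the question over a shortened sample. The sample text and the variable names are assumptions for illustration, not the book's file. It uses grep -E, which is equivalent to egrep, and assumes a grep that supports \b and -o, such as GNU grep.

```shell
# Shortened sample text; an assumption for illustration, not the book's file.
sample='Shortly after the wedding, Mike adopted Jorgensen.
Bezos attended River Oaks Elementary School in Houston.'

# Without -o, grep prints every line that contains a match, in full.
full_lines=$(printf '%s\n' "$sample" | grep -E '\bS[a-z]+')

# With -o, grep prints only the matched parts, one per output line.
only_matches=$(printf '%s\n' "$sample" | grep -E -o '\bS[a-z]+')

printf '%s\n' "$only_matches"
```

With this sample, the first command returns both lines unchanged, while the second returns just the matched words.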


One detail: when I tested with the text from the question, Strait was returned twice, since this word occurs twice in the text.
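
If each distinct match should appear only once, one possible approach (not something the book is known to do) is to pipe the result through sort -u. A minimal sketch, using a made-up line loosely based on the last sentence of the question's text:

```shell
# Hypothetical one-line sample in which "Strait" occurs twice.
line='Mattie Louise Gise (nee Strait) is related to country singer George Strait.'

# -o reports every occurrence, so "Strait" is printed twice.
all_matches=$(printf '%s\n' "$line" | grep -E -o '\bS[a-z]+')

# sort -u keeps one copy of each distinct match.
# Note that the result is sorted, not in order of appearance.
unique_matches=$(printf '%s\n' "$line" | grep -E -o '\bS[a-z]+' | sort -u)

printf '%s\n' "$unique_matches"
```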

My guess is that the book probably set the variable GREP_OPTIONS, in which you can put default options to be used by grep/egrep. For example, if I do:

export GREP_OPTIONS="-o"

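I can run egrep without the -o option and the command will still behave as if it were given, since that is what the variable specifies. (Note that newer versions of GNU grep treat GREP_OPTIONS as obsolete, so this may not work everywhere.)

Or else the book defined an alias: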

alias egrep="egrep -o"
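
To check whether either of these is in effect in your own shell, something like the following sketch may help. The variable names alias_status and options_status are just illustrative.

```shell
# Show the alias definition for egrep, if one exists in the current shell.
alias_status=$(alias egrep 2>/dev/null || echo "no egrep alias defined")
echo "$alias_status"

# Show the default-options variable; "(unset)" is printed when it is not set.
options_status="GREP_OPTIONS=${GREP_OPTIONS-(unset)}"
echo "$options_status"
```

If either one turns out to be set and you do not want it, unalias egrep and unset GREP_OPTIONS remove them for the current session.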
  • 1

    Oh yes. Your hypothesis about the alias is very interesting. That must be exactly what happened. Thank you

  • 1

    @Lucas The alias or GREP_OPTIONS are the two ways I can imagine for this to happen. Maybe the author did it in some previous chapter, I don't know...
