How to group substrings in condition and return them with BASH_REMATCH

0

I need to split a filename string into 5 parts, as in the example below:

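#!/usr/bin/env bash

str="python-zope-proxy-4.3.5-1-x86_64.chi.zst"
pkg_re='(.+)-[^-]+-[0-9]+-([^.]+)\.chi.zst*'

# Braces keep both assignments under the match condition;
# without them only the first assignment is guarded by &&.
[[ $str =~ $pkg_re ]] && {
   pkg_base=${BASH_REMATCH[1]}
   pkg_arch=${BASH_REMATCH[2]}
}

echo "$pkg_base"
echo "$pkg_arch"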

The blocks need to be as follows:

block 1 - python-zope-proxy # pkg_base
block 2 - 4.3.5-1           # pkg_version_build
block 3 - 4.3.5             # pkg_version
block 4 - 1                 # pkg_build
block 5 - x86_64            # pkg_arch

With the regex above, I only get block 1 and block 5.

The expression must also handle filename strings such as the following:

str="python-4.3.5-1-x86_64.chi.zst"
str="python-zope-4.3.5-1-x86_64.chi.zst"
str="python-zope-proxy-4.3.5-1-x86_64.chi.zst"

Anyone with a little extra time to help? haha

Regards

Vilmar

  • Try the expression ([a-zA-Z0-9]+(-[a-zA-Z0-9]+){,2})-(([0-9]+(\.[0-9]+){,2})((\-)([0-9]+))?)-([a-zA-Z0-9]+_[a-zA-Z0-9]+).* with indices 1, 3, 4, 8 and 9 of the BASH_REMATCH array, respectively.

  • @Rfroes87 Your code worked perfectly, thank you

  • Great! When possible, please check both answers (mine and @Marcelovismari's) and accept the one that best suits what you need.

  • 1

    In cases like this it is important to say what you tried (preferably a [mcve]) and where you got stuck. To get the most out of the site and to understand and avoid closures and downvotes, it is worth reading What is the Stack Overflow and then the Stack Overflow Survival Guide (summarized) in Portuguese.

  • @Bacco, I disagree with the closure, which I think was mistaken in claiming a lack of examples; both the question and the answers were clear, so much so that the question was understood right away and answered, examples included.

  • 2

    @vcatafesta it was not a mistake; if it were, the community would have voted to reopen by now. The system is designed for this. Maybe after the edit (the community has access to the complete history) the community will reconsider, but you still need to explain what you tried and where it failed. Anyway, if you study the links I posted, you may better understand the purpose of the site and the reason for the closure.

  • 1

    The title could also be improved.

  • 1

    E.g. How to group substrings in condition and return them with BASH_REMATCH, or something like that.

  • 1

    @Bacco thank you for the explanation, sincerely. Since this is my first or perhaps second post, I will take care to follow the rules in the future.

  • 1

    @Rfroes87, I will accept the suggestion regarding the title. Anyway, thank you all.

2 answers

2


As an alternative to the solution provided by the user @Marcelovismari, here is a regular expression compatible with bash 4.2+:

str="python-zope-proxy-4.3.5-1-x86_64.chi.zst"

if [[ $str =~ ([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*)-(([0-9]+(\.[0-9]+)*)(-([0-9]+))?)-([^.]+).* ]]; then
  for group_num in "${!BASH_REMATCH[@]}"; do
    echo "group ${group_num}: ${BASH_REMATCH[$group_num]}"
  done
fi

Its output is:

group 0: python-zope-proxy-4.3.5-1-x86_64.chi.zst
group 1: python-zope-proxy
group 2: -proxy
group 3: 4.3.5-1
group 4: 4.3.5
group 5: .5
group 6: -1
group 7: 1
group 8: x86_64

The groups of interest in this example are 1, 3, 4, 7 and 8.
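
For use in a script, those groups can be copied into named variables. The following is only a minimal sketch: the variable names are the ones from the question, while parse_pkg is a hypothetical helper name introduced here for illustration. The regex is kept in a variable and expanded unquoted after =~ so that bash treats it as a pattern.

```bash
#!/usr/bin/env bash

# Same regex as in the answer, stored in a variable.
pkg_re='([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*)-(([0-9]+(\.[0-9]+)*)(-([0-9]+))?)-([^.]+).*'

# parse_pkg is a hypothetical helper (not part of the original answer).
# It returns non-zero when the filename does not match.
parse_pkg() {
  [[ $1 =~ $pkg_re ]] || return 1
  pkg_base=${BASH_REMATCH[1]}
  pkg_version_build=${BASH_REMATCH[3]}
  pkg_version=${BASH_REMATCH[4]}
  pkg_build=${BASH_REMATCH[7]}
  pkg_arch=${BASH_REMATCH[8]}
}

# The three filenames listed in the question.
for str in python-4.3.5-1-x86_64.chi.zst \
           python-zope-4.3.5-1-x86_64.chi.zst \
           python-zope-proxy-4.3.5-1-x86_64.chi.zst; do
  if parse_pkg "$str"; then
    echo "$pkg_base | $pkg_version_build | $pkg_version | $pkg_build | $pkg_arch"
  fi
done
```

Returning early on a failed match avoids reading stale values from BASH_REMATCH, which bash overwrites on every =~ test.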

EDIT 1: Changed the quantifier + (1 or more occurrences) to * (0 or more occurrences) on the suffix of pkg_base (everything from the first - after python, in the example) and of pkg_version/pkg_version_build (everything from the first . after 4, in the example).

EDIT 2: Breaking down the proposed regex:

  • ([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*) - Top-level capture group that starts with a string of 1 or more characters ([a-zA-Z0-9]+) and ends with an optional, repeatable subgroup ((-[a-zA-Z0-9]+)*) made of a hyphen followed by a string of 1 or more characters.
    This should match python-zope-proxy, python-zope-proxy-lorem-ipsum-dolor... (any number of segments) or just python.

  • (([0-9]+(\.[0-9]+)*)(-([0-9]+))?) - Top-level capture group that starts with a subgroup containing an integer of 1 or more digits ([0-9]+) and an optional nested subgroup ((\.[0-9]+)*) made of a dot (\.) followed by an integer of 1 or more digits; once the first subgroup closes, a new optional subgroup begins ((-([0-9]+))?, where the quantifier ? means 0 or 1 occurrences), made of a hyphen followed by a nested subgroup containing an integer of 1 or more digits.
    This should match 4.3.5, 4.3.5-1, 4 or 4-1. Taking 4.3.5-1 as an example, this part of the regular expression captures 4.3.5-1 (the top-level group), 4.3.5 (its first subgroup) and 1 (the group nested in its second subgroup) in dedicated groups.

  • ([^.]+) - Top-level capture group made of a negated character class, capturing 1 or more characters of any kind other than a dot.
    This should match characters of any type, in any number, until it reaches a . (or the end of the string); e.g. x86_64 (x86_64.chi.zst), x86 (x86.zst), x64 (x64.chi), x (x), etc.
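
The version/build part can be checked on its own against the four inputs mentioned above. This is just a sketch under stated assumptions: the ^ and $ anchors are added only for this isolated test (the full regex does not use them), and split_version, ver_build, version and build are hypothetical names.

```bash
#!/usr/bin/env bash

# Version/build sub-pattern from the answer, anchored for this isolated test only.
ver_re='^(([0-9]+(\.[0-9]+)*)(-([0-9]+))?)$'

# split_version is a hypothetical helper; it sets ver_build, version and build.
split_version() {
  [[ $1 =~ $ver_re ]] || return 1
  ver_build=${BASH_REMATCH[1]}   # whole version-build string
  version=${BASH_REMATCH[2]}     # digits and dots only
  build=${BASH_REMATCH[5]}       # empty when there is no -N suffix
}

for v in 4.3.5 4.3.5-1 4 4-1; do
  split_version "$v" &&
    echo "input=$v version_build=$ver_build version=$version build=${build:-<none>}"
done
```

Note that the group numbers here (1, 2 and 5) differ from those in the full regex (3, 4 and 7) because the pkg_base groups are not present.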

EDIT 3: Added the output of running the bash code presented in the solution.

EDIT 4: As @hkotsubo pointed out in a comment, both the * and + quantifiers are classified as greedy; greedy and non-greedy quantifiers, along with other information relevant to quantifiers in general, are described in more detail in @hkotsubo's answers to the questions Difference between the non-greedy quantifiers ?? and *? and Regular Expressions: Lazy quantifier function "?".
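
The greedy behavior is easy to observe in bash itself. A small sketch follows, with illustrative variable names. One caveat: [[ =~ ]] uses POSIX extended regular expressions, which do not define lazy forms such as +? or *?, so the sketch uses a negated character class to get the "stop at the first hyphen" effect instead.

```bash
#!/usr/bin/env bash

str="python-zope-proxy-4.3.5-1-x86_64"

# Greedy .+ keeps as much as it can while the rest still matches,
# so the split lands on the LAST hyphen.
greedy_re='^(.+)-(.*)$'
[[ $str =~ $greedy_re ]] && greedy_head=${BASH_REMATCH[1]}

# A negated class cannot cross a hyphen, so the split lands on the FIRST one.
first_re='^([^-]+)-(.*)$'
[[ $str =~ $first_re ]] && first_head=${BASH_REMATCH[1]}

echo "greedy .+ -> $greedy_head"   # greedy .+ -> python-zope-proxy-4.3.5-1
echo "[^-]+     -> $first_head"    # [^-]+     -> python
```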

  • The expression above matched in the following cases: str="python-zope-4.3.5-1-x86_64.chi.zst" str="python-zope-proxy-4.3.5-1-x86_64.chi.zst" but it fails in the following one: str="python-4.3.5-1-x86_64.chi.zst". However, this expression works in all cases: ([a-zA-Z0-9]+(-[a-zA-Z0-9]+){,2})-(([0-9]+(\.[0-9]+){,2})((\-)([0-9]+))?)-([a-zA-Z0-9]+_[a-zA-Z0-9]+).*

  • @vcatafesta I updated the answer. Test again and give me feedback if you have any more problems.

  • Works great now! Thank you

  • I updated the answer to contain more details about the composition of the regular expression that I proposed.

  • 2

    In fact +, * and ? are all greedy; the only thing that changes is the quantity: + is "1 or more times", * is "zero or more times" and ? is "zero or 1 time" (another way of saying "optional"), but all of them are greedy (they try to match as much as possible, within their limits). The non-greedy forms would be +?, *? and ??. See the difference here and here

  • 1

    @hkotsubo Noted, thanks for the correction! For a while I have been using non-greedy quantifiers in my regular expressions (mainly *?) and I ended up mixing up the nomenclature in my head because * makes the pattern optional. I will edit the answer according to your observations.

0

To satisfy the most critical readers, here is the conversion of the JS example to bash:

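str="python-zope-proxy-4.3.5-1-x86_64.chi.zst"
# Dots are matched literally ([0-9.] and \.); the build may have more than one digit.
pkg_re='^([a-z-]+)(-)([0-9.]+)(-)([0-9]+)(-)(.*)(\.chi\.zst)$'

# Braces keep all four assignments under the match condition;
# without them only the first assignment is guarded by &&.
[[ $str =~ $pkg_re ]] && {
    pkg_base=${BASH_REMATCH[1]}
    pkg_version=${BASH_REMATCH[3]}
    pkg_build=${BASH_REMATCH[5]}
    pkg_arch=${BASH_REMATCH[7]}
}

echo "$pkg_base"
echo "$pkg_version"
echo "$pkg_build"
echo "$pkg_arch"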

  • Your expression also worked smoothly in all cases: str="python2-4.3.5-1-x86_64.chi.zst" str="python-zope-4.3.5-1-x86_64.chi.zst" str="python-zope-proxy-4.3.5-1-x86_64.chi.zst". Only the version-build block is missing, but concatenating is no problem. Thank you.
