Question about a regular expression

Good evening, everyone.

I would like to know a regular expression for the following information:

Numbering and title of a chapter.

Example:

  1. INTRODUCTION

number = 1.

title = INTRODUCTION

1.1. GENERAL OBJECTIVE

number = 1.1.

title = GENERAL OBJECTIVE

1.2. SPECIFIC OBJECTIVE

number = 1.2.

title = SPECIFIC OBJECTIVE

I need this to generate a summary in the following format:

  1. | INTRODUCTION | PAGE

1.1. | GENERAL OBJECTIVE | PAGE

1.2. | SPECIFIC OBJECTIVE | PAGE

That is, the regular expression should be able to recognize numbers followed by dots in the following generic format:

Primary Title => x.

Secondary Title => x.x.

Tertiary Title => x.x.x.

Quaternary Title => x.x.x.x.

And so on and so forth.

Thanks for your attention. Best regards to all.

  • Have you tried anything yet? Also, depending on the input, you could use a DOM parser...

  • I tried this: @"[ d]+[. ] .

  • I forgot to mention, I'm using the C# language.

  • I used this site: http://rubular.com/r/v5TNAzCQKa It works there, but when I use it in my application it gives the wrong result...

  • By the way, what is a DOM parser? I searched here and did not find anything...

  • Peri, here it appears...

  • But in what format is your text? You know that the usual approach is the opposite, right? Special markup in the text identifies what is a title, a chapter, etc., and the numbers (and then the titles) are generated from that markup.

  • The posted answer solves your problem. To test expressions, use this online tool: http://www.regexr.com/

1 answer

This regular expression solves it:

"^\s*?(?P<numero>(\d\.)+)\s*(?P<titulo>.*)$" 

You did not say which tool you will use to apply the regular expression. This one may contain syntax specific to Python regular expressions, which is where I tested it. The documentation is at: https://docs.python.org/3/library/re.html
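To make the pieces of the expression easier to follow, here is a minimal sketch of the same pattern written with Python's re.VERBOSE flag, which lets each part carry a comment. The name HEADING and the sample line are only illustrative and are not part of the original answer.

```python
import re

# Same pattern as above, split into commented parts.
# HEADING is a hypothetical name chosen for this sketch.
HEADING = re.compile(r"""
    ^\s*?                 # optional leading whitespace at the start of a line
    (?P<numero>(\d\.)+)   # one or more "digit + dot" pairs, e.g. 1. or 1.2.1.
    \s*                   # optional whitespace between number and title
    (?P<titulo>.*)$       # the rest of the line is the title
""", re.VERBOSE | re.MULTILINE)

m = HEADING.search("1.1. GENERAL OBJECTIVE")
print(m.group("numero"), "|", m.group("titulo"))
```

Note that \d matches a single digit, so as written the pattern expects each level of the numbering to be one digit long.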

An interactive Python 3.5 prompt:

In [47]: import re

In [48]: a = """
    ...: 1.INTRODUÇÃO
    ...: número = 1.
    ...: 
    ...: título = INTRODUÇÃO
    ...: 
    ...: 1.1. OBJETIVO GERAL
    ...: 
    ...: número = 1.1.
    ...: título = OBJETIVO GERAL
    ...: 
    ...: 1.2. OBJETIVO ESPECÍFICO
    ...: 
    ...: 1.2.1. Detalhamento
    ...: 2. Outro Capítulo
    ...: """

In [49]: [(m.group('numero'), m.group('titulo')) for m in re.finditer(r"^\s*?(?P<numero>(\d\.)+)\s*(?P<titulo>.*)$", a, re.MULTILINE) ]
Out[49]: 
[('1.', 'INTRODUÇÃO'),
 ('1.1.', 'OBJETIVO GERAL'),
 ('1.2.', 'OBJETIVO ESPECÍFICO'),
 ('1.2.1.', 'Detalhamento'),
 ('2.', 'Outro Capítulo')]

(The function re.finditer returns an iterator of match objects; each of these has a group method that can be called with the desired group name. The group name, in turn, is defined inside the regular expression itself, using the (?P<name>...) syntax. This named-group syntax should be the only thing that changes if the regexp tool you are going to use is not Python's.)
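As a rough sketch of how the matches could be turned into the summary format asked for in the question, the snippet below joins each number and title with " | ". It rests on a few assumptions that are not in the original answer: the function name build_summary is made up, "PAGE" is kept as a literal placeholder because the text carries no page information, \d+ is used instead of \d so that numbers like 10. also match, and [ \t]* replaces \s* so that a match never spills over onto the next line.

```python
import re

# Assumption: \d+ (not \d) so multi-digit numbers such as "10." match,
# and [ \t]* (not \s*) so whitespace matching stays on one line.
HEADING = re.compile(
    r"^[ \t]*(?P<numero>(?:\d+\.)+)[ \t]*(?P<titulo>.*)$",
    re.MULTILINE,
)

def build_summary(text, page="PAGE"):
    """Return one 'number | title | page' line per heading found in text."""
    lines = []
    for m in HEADING.finditer(text):
        numero = m.group("numero")
        titulo = m.group("titulo").strip()
        lines.append("{} | {} | {}".format(numero, titulo, page))
    return lines

sample = """\
1. INTRODUCTION
number = 1.
1.1. GENERAL OBJECTIVE
10.2.1. Details
"""

for line in build_summary(sample):
    print(line)
```

The heading level (primary, secondary, and so on) can be read off as numero.count("."). Since the question mentions C#: .NET regular expressions write named groups as (?<numero>...) rather than (?P<numero>...), and a verbatim string (@"...") avoids having to double the backslashes.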

  • From this example I was able to derive the following regular expression, which met my needs: \s*((\d\.)+) With it I could extract the chapter numbers. Thank you very much, jsbueno.
