Block of txt with Regex

Asked

Viewed 193 times

1

Guys how do I pick up block by block of a txt with this format ? I have to get it by the last date. Sometimes there are two dates in each block

AAAAAAAAAA AAAAAAAAAAA - AAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAAA em: 20/04/2017

AAAAAAAAAA AAAAAAAAAAA - AAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAAA em: 20/04/2017
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAA em: 21/06/2016

AAAAAAAAAA AAAAAAAAAAA - AAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAA em: 20/04/2017


AAAAAAAAAA AAAAAAAAAAA - AAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA
AAAAAA em: 20/04/2017
  • What exactly are the values you want to pick up from the text? All dates? If yes, they are always in this form dd/mm/YYYY?

  • Each date is formed by consecutive lines? Separating blocks with blank line?

  • A block may have more than one date.

  • And in fact there is no space between the blocks. I put here with a view to be more visible.

2 answers

0

As Flavor was not specified, Excel VBA was used, that uses javascript.

Regex

Enable Regex in Excel

  1. Regex needs to be enabled, Enable the Developer mode
  2. In the 'Developer' tab, click 'Visual Basic' and the VBA window will open.
  3. Go to 'Tools' -> 'References...' and a window will open.
  4. Search for 'Microsoft Vbscript Regular Expressions 5.5', as in the image below. And enable this option.

Janela Referências

Standard

A simple Regex that identifies if it has numbers:

Demo in Regex101 and the standard: \d{2}\/\d{2}\/\d{4}

Code in Excel VBA

Validate Dados

This pattern only takes a string in 2numbers/2numbers/4numbers format and the validation of Data, to verify if it is valid, is performed by Excel with the IsDate().

Isdate can cause some errors (as it tries to check whether it is date in universal formats, from various countries...), refer to this answer for another way to check if it is date.

Sub ValidarDados()

    Dim objStream As Object
    Dim strData As String
    Dim fileName As String, textData As String, fileNo As Integer
    sFilename = "temp.txt"
    sFilepath = ThisWorkbook.Path & "\" & sFilename
    fileNo = FreeFile                            'Get first free file number


Inicio:
    If Dir(sFilepath) <> "" Then
        Open sFilepath For Input As #fileNo
        strData = Input$(LOF(fileNo), fileNo)

        Dim objMatches As Object, objRegExp As Object
        Set objRegExp = CreateObject("VBScript.RegExp")
        'Regex
        objRegExp.Pattern = "\d{2}\/\d{2}\/\d{4}"
        objRegExp.Global = True

        Set objMatches = objRegExp.Execute(strData)
        If objMatches.Count <> 0 Then
            For Each m In objMatches
                If IsDate(m) Then MsgBox m 'Debug.Print m
            Next m
        End If
    Else
        MsgBox "O arquivo não pôde ser carregado - Escolha o arquivo."
        strPath = EscolherCaminho
        GoTo Inicio
    End If
    'Close
    Close #fileNo

End Sub

Pick Path

If you do not find the path referred in the code, a window will open to choose the file on the system.

Public Function EscolherCaminho() As String
    'Créditos: http://software-solutions-online.com/excel-vba-open-file-dialog/
    Dim intChoice As Long
    Dim strPath As String

    'only allow the user to select one file
    Application.FileDialog(msoFileDialogOpen).AllowMultiSelect = False
    'make the file dialog visible to the user
    intChoice = Application.FileDialog(msoFileDialogOpen).Show
    'determine what choice the user made
    If intChoice <> 0 Then
        'get the file path selected by the user
        strPath = Application.FileDialog( _
                  msoFileDialogOpen).SelectedItems(1)
        'print the file path to sheet 1
        EscolherCaminho = strPath
    End If
End Function

Results

With this data found in a txt test:

  • 20/04/2017
  • 20/04/2017
  • 21/06/2016
  • 20/04/2017
  • 20/04/2017
  • 32/04/2017
  • 30/02/2017
  • 20/13/2017

After validation with IsDate(), are these results:

  • 20/04/2017
  • 20/04/2017
  • 21/06/2016
  • 20/04/2017
  • 20/04/2017

0

regex: (((0[1-9]|[12][0-9]|3[01])[- /.](0[13578]|1[02])|(0[1-9]|[12][0-9]|30)[- /.](0[469]|11)|(0[1-9]|1\d|2[0-8])[- /.]02)[- /.]\d{4}|29[- /.]02[- /.](\d{2}(0[48]|[2468][048]|[13579][26])|([02468][048]|[1359][26])00))

to test: https://regexr.com/

Browser other questions tagged

You are not signed in. Login or sign up in order to post.