To solve the problem I started studying regex (regular expressions), but while still learning I could not build an expression complex enough to solve it. So I asked for an answer on the global (English-language) Stack Overflow. Credits: Wiktor Stribiżew
Step 1: Enable Regex in Excel
- Regex has to be enabled first. Start by enabling the 'Developer' tab.
- In the 'Developer' tab, click 'Visual Basic' and the VBA window will open.
- Go to 'Tools' -> 'References...' and a window will open.
- Find 'Microsoft VBScript Regular Expressions 5.5' in the list and tick its checkbox.
Step 2: Creating the regex
A demonstration of the regex uses the expression: \d+(?:\.\d+)*[\s\S]*?[\D]+\.(?=\s*(?:\d+|$))
The developer's explanation of the expression is as follows:
Details
- \d+ : 1 or more digits
- (?:\.\d+)* : zero or more sequences of:
  - \. : a dot
  - \d+ : 1 or more digits
- [\s\S]*? : any 0 or more characters, as few as possible, up to the first...
- \w+\. : 1 or more word characters followed by a dot (this ending was later changed to a non-digit, \D; see the Note further down)
- (?=\s*(?:\d+|$)) : a positive lookahead that requires zero or more whitespace characters (\s*), followed by one or more digits (\d+) or the end of the string ($), immediately to the right of the current location.
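To make the role of the lookahead concrete, here is a minimal sketch. The Sub name and the sample string are made up for illustration, and it uses late binding (CreateObject) so that it does not depend on the library reference from Step 1. It uses the \w+\. ending described in the breakdown above.

```vba
Sub LookaheadDemo() ' hypothetical name, for illustration only
    Dim re As Object
    Dim m As Object

    ' Late binding: no reference to the regex library is needed.
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\.(?=\s*(?:\d+|$))"

    ' The dot after "a" is followed by a letter, so the lookahead rejects it
    ' and the match keeps growing until a dot followed by a digit or the end.
    For Each m In re.Execute("1 Item a. still item one. 2 Item two.")
        Debug.Print m.Value
    Next m
    ' Prints:
    '   1 Item a. still item one.
    '   2 Item two.
End Sub
```

Without the lookahead, the first match would stop at "1 Item a." and the rest of item 1 would be lost.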
Step 3: VBA code
Sub ExtractListItems() ' Sub name is illustrative
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    Dim m As Object
    str = "1 Estruturas lógicas. 2 Lógica de argumentação: analogias, inferências, deduções e conclusões. 3 Lógica sentencial (ou proposicional). 3.1 Proposições simples e compostas. 3.2 Tabelasverdade. 3.3 Equivalências. 3.4 Leis de De Morgan. 3.5 Diagramas lógicos. 4 Lógica de primeira ordem. 5 Princípios de contagem e probabilidade. 6 Operações com conjuntos. 7 Raciocínio lógico envolvendo problemas aritméticos, geométricos e matriciais."
    Set objRegExp = New regexp ' CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    objRegExp.Global = True ' find every match, not only the first
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            Debug.Print m.Value ' one list item per line in the Immediate window
        Next
    End If
End Sub
Output of Debug.Print: each numbered item is printed on its own line in the Immediate window.
Note
The regex originally used any letter ([a-zA-Z]) followed by a dot to detect the end of a list item. An item ending in ). was therefore not recognised, so "any letter" was changed to "any non-digit", \D.
Note that \D matches any non-digit symbol. You may want to use \d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$)). As for ^, it can match the start of the string (with RegExp.Multiline = False) or the start of a line (with RegExp.Multiline = True). - Wiktor Stribiżew 5 hours ago
And another tip: if there are no line breaks in the string, [\s\S] can be replaced by a plain dot (.). - Wiktor Stribiżew 4 hours ago
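The tip about replacing [\s\S] with a dot can be checked with a small sketch. The Sub name and both sample strings are assumptions made up for this comparison. It relies on the fact that in VBScript regex the dot matches any character except a line feed.

```vba
Sub CompareDotAndAnyChar() ' hypothetical name, for illustration only
    Dim re As Object
    Dim oneLine As String
    Dim twoLines As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    oneLine = "1 First item. 2 Second item."
    ' Same text, but item 1 is split by a line feed.
    twoLines = "1 First" & vbLf & "item. 2 Second item."

    ' [\s\S] crosses line breaks, so both strings yield the same count.
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    Debug.Print "[\s\S]:", re.Execute(oneLine).Count, re.Execute(twoLines).Count

    ' The dot stops at the line feed, so the item spanning two lines is lost.
    re.Pattern = "\d+(?:\.\d+)*.*?\D\.(?=\s*(?:\d|$))"
    Debug.Print "dot:   ", re.Execute(oneLine).Count, re.Execute(twoLines).Count
End Sub
```

So the dot version is only a safe simplification when the input is known to be a single line, as the tip says.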
A new test was run with the new expression, and the code in this answer has been edited to use it:
objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
If the error "Compile error: User-defined type not defined" occurs and this line is highlighted: Set objRegExp = New regexp ' CreateObject("VBScript.RegExp")
uncomment the CreateObject call and delete New regexp, like this: Set objRegExp = CreateObject("VBScript.RegExp")
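The difference between the two lines is early binding versus late binding. A minimal sketch of the late-bound variant, wrapped in a helper whose name (NewListItemRegex) is hypothetical and not part of the original answer:

```vba
' Hypothetical helper: returns a configured regex object via late binding.
' Because it is declared As Object and created with CreateObject, it compiles
' even when 'Microsoft VBScript Regular Expressions 5.5' is not referenced.
Function NewListItemRegex() As Object
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    re.Global = True
    Set NewListItemRegex = re
End Function

Sub UseLateBoundRegex() ' hypothetical name, for illustration only
    Dim m As Object
    For Each m In NewListItemRegex().Execute("1 First item. 2 Second item.")
        Debug.Print m.Value
    Next m
End Sub
```

The trade-off is that late binding gives up autocomplete and compile-time checking in the VBA editor, while New RegExp keeps them but only compiles when the reference from Step 1 is enabled.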
Implementing in cells
A brief piece of code to use in an Excel spreadsheet. To place the matches in cells, just work with m.Value.
It assumes the string is in cell A1; the values are written to column B, starting at B1, one match per row.
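Sub WriteListItemsToColumnB() ' Sub name is illustrative
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    Dim m As Object
    Dim i As Long
    i = 1 ' first output row
    str = Cells(1, 1).Value ' source string in A1
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    objRegExp.Global = True
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            Cells(i, 2).Value = m.Value ' write each match to column B
            i = i + 1
        Next
    End If
End Sub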
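As a possible alternative to running a macro, the same matching can be sketched as a worksheet function. The function name (ListItem) and its parameters are hypothetical and were not part of the original answer. It returns the n-th list item, or an empty string when there is no such item.

```vba
' Hypothetical worksheet function: returns the item at the given 1-based
' position in the numbered list contained in source.
Function ListItem(ByVal source As String, ByVal position As Long) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    re.Global = True
    Set matches = re.Execute(source)

    ' The match collection is zero-based, hence position - 1.
    If position >= 1 And position <= matches.Count Then
        ListItem = matches(position - 1).Value
    End If
End Function
```

Placed in a standard module, it could be called from a cell as =ListItem(A1, 2), for example, to get the second item of the string in A1.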