How to split a syllabus string into cells in Excel


I need to split this kind of string (example):

1 Estruturas lógicas. 2 Lógica de argumentação: analogias, inferências, deduções e conclusões. 3 Lógica sentencial (ou proposicional). 3.1 Proposições simples e compostas. 3.2 Tabelasverdade. 3.3 Equivalências. 3.4 Leis de De Morgan. 3.5 Diagramas lógicos. 4 Lógica de primeira ordem. 5 Princípios de contagem e probabilidade. 6 Operações com conjuntos. 7 Raciocínio lógico envolvendo problemas aritméticos, geométricos e matriciais.

I want the result to look like this, with each item in its own cell, one below the other:

1 Estruturas lógicas.  

2 Lógica de argumentação: analogias, inferências, deduções e conclusões.  

3 Lógica sentencial (ou proposicional). 

3.1 Proposições simples e compostas. 

3.2 Tabelas-verdade.

3.3 Equivalências.

And so on. However, I could not find a rule or function to do this in Excel VBA.

1 answer



To solve the problem I started studying regex (regular expressions), but while I was still learning I could not write an expression complex enough to solve it. So I asked on the main (English-language) Stack Overflow. Credits: Wiktor Stribiżew

Step 1: Enable Regex in Excel

  1. Regex support comes from a library reference that must be enabled. First, enable the Developer tab.
  2. On the 'Developer' tab, click 'Visual Basic'; the VBA editor opens.
  3. Go to 'Tools' -> 'References...'; a dialog opens.
  4. Find 'Microsoft VBScript Regular Expressions 5.5', as shown in the image below, and check it.

(Image: the References dialog)
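Once the reference is checked, the RegExp type can be declared directly (early binding), which also gives IntelliSense in the editor. The sketch below assumes the reference is set; HasDigits is a hypothetical name used only for illustration, and VBScript_RegExp_55 is the library name the reference exposes. Without the reference, this code fails to compile with "User-defined type not defined".

```vba
' Compiles only after "Microsoft VBScript Regular Expressions 5.5" is checked
' under Tools > References (early binding).
Function HasDigits(ByVal source As String) As Boolean
    Dim re As VBScript_RegExp_55.RegExp   ' typed declaration from the reference
    Set re = New VBScript_RegExp_55.RegExp
    re.Pattern = "\d"                     ' any single digit
    HasDigits = re.Test(source)           ' True if at least one digit is present
End Function
```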

Step 2: Create the regex

The expression used in the original demonstration is: \d+(?:\.\d+)*[\s\S]*?[\D]+\.(?=\s*(?:\d+|$))

The author of the expression explains it as follows:

  • \d+ - 1 or more digits
  • (?:\.\d+)* - zero or more sequences of:
    • \. - a literal dot
    • \d+ - 1 or more digits
  • [\s\S]*? - any 0 or more characters, as few as possible (lazy), up to the first...
  • [\D]+\. - 1 or more non-digit characters followed by a literal dot
  • (?=\s*(?:\d+|$)) - a positive lookahead that requires zero or more whitespace characters (\s*) followed by one or more digits (\d+) or the end of the string ($) immediately to the right of the current position.
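The lazy [\s\S]*? is what keeps the items apart. The sketch below compares the pattern from the code with a lazy and with a greedy quantifier on a short made-up sample; MatchCount and LazyVersusGreedy are hypothetical names for illustration. With the greedy [\s\S]* the engine runs to the last possible item ending, so the whole sample becomes a single match.

```vba
' Counts the matches of a pattern in a string (late binding, no reference needed).
Function MatchCount(ByVal pat As String, ByVal source As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.Global = True                      ' find every match, not only the first
    MatchCount = re.Execute(source).Count
End Function

Sub LazyVersusGreedy()
    Const sample As String = "1 Item A. 2 Item B. 2.1 Item C."
    ' Lazy: stops at the first "non-digit + dot" that is followed by a number or the end
    Debug.Print MatchCount("\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))", sample)  ' 3
    ' Greedy: runs to the last such ending, swallowing all three items
    Debug.Print MatchCount("\d+(?:\.\d+)*[\s\S]*\D\.(?=\s*(?:\d|$))", sample)   ' 1
End Sub
```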

Step 3: VBA code

Sub ListSyllabusItems()
    ' New RegExp requires the "Microsoft VBScript Regular Expressions 5.5" reference (Step 1)
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    Dim m As Object
    str = "1 Estruturas lógicas. 2 Lógica de argumentação: analogias, inferências, deduções e conclusões. 3 Lógica sentencial (ou proposicional). 3.1 Proposições simples e compostas. 3.2 Tabelasverdade. 3.3 Equivalências. 3.4 Leis de De Morgan. 3.5 Diagramas lógicos. 4 Lógica de primeira ordem. 5 Princípios de contagem e probabilidade. 6 Operações com conjuntos. 7 Raciocínio lógico envolvendo problemas aritméticos, geométricos e matriciais."
    Set objRegExp = New RegExp ' CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    objRegExp.Global = True ' return all matches, not just the first one
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            Debug.Print m.Value ' one list item per line in the Immediate window
        Next m
    End If
End Sub

Output of Debug.Print:

(Image: output in the Immediate window)
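To reuse the pattern elsewhere, the matching can be wrapped in a function that returns a String array, similar to VBA's built-in Split. This is a sketch, not part of the original answer: SplitSyllabus is a hypothetical name, and it uses late binding so it does not need the reference from Step 1.

```vba
' Hypothetical helper: returns the list items as a zero-based String array,
' or Empty if the pattern finds nothing.
Function SplitSyllabus(ByVal source As String) As Variant
    Dim re As Object, matches As Object
    Dim items() As String
    Dim k As Long
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    re.Global = True
    Set matches = re.Execute(source)
    If matches.Count = 0 Then Exit Function   ' result stays Empty
    ReDim items(0 To matches.Count - 1)
    For k = 0 To matches.Count - 1
        items(k) = matches(k).Value             ' MatchCollection is zero-based
    Next k
    SplitSyllabus = items
End Function
```

For example, Debug.Print Join(SplitSyllabus(Range("A1").Value), vbLf) prints one item per line, as long as A1 contains at least one item.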

Note

The original regex used any letter [a-zA-Z] followed by a dot to detect the end of a list item. However, an item ending in ")." was not recognized, so the letter class was replaced with \D (any non-digit).
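The difference can be checked on a short sample that contains the ")." case. In this sketch (ItemCount and LetterVersusNonDigit are hypothetical names, and the sample is made up), the letter-based ending merges two items into one match, while \D splits them.

```vba
' Counts the matches of a pattern in a string (late binding).
Function ItemCount(ByVal pat As String, ByVal source As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.Global = True
    ItemCount = re.Execute(source).Count
End Function

Sub LetterVersusNonDigit()
    Const sample As String = "1 Logica (ou proposicional). 2 Outra."
    ' ")." is not letter + dot, so the first item runs on into the second
    Debug.Print ItemCount("\d+(?:\.\d+)*[\s\S]*?[a-zA-Z]\.(?=\s*(?:\d|$))", sample)  ' 1
    ' \D also accepts ")", so both items are found
    Debug.Print ItemCount("\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))", sample)        ' 2
End Sub
```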

Note that \D matches any non-digit character. You may want to use \d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$)). As for ^, it matches the start of the string (with RegExp.Multiline = False) or the start of a line (with RegExp.Multiline = True). - Wiktor Stribiżew

Another tip: if there are no line breaks in the string, [\s\S] can be replaced with . (dot). - Wiktor Stribiżew
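The reason for the tip is that in VBScript regular expressions the dot matches any character except a line feed, while [\s\S] matches everything. The sketch below (CountItemsIn and DotVersusAnyChar are hypothetical names; the two-line sample is made up) shows an item that spans a line break being lost with the dot version.

```vba
' Counts the matches of a pattern in a string (late binding).
Function CountItemsIn(ByVal pat As String, ByVal source As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.Global = True
    CountItemsIn = re.Execute(source).Count
End Function

Sub DotVersusAnyChar()
    Dim sample As String
    sample = "1 First" & vbLf & "line. 2 Second."   ' item 1 spans two lines
    ' [\s\S] crosses the line feed, so both items are found
    Debug.Print CountItemsIn("\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))", sample)  ' 2
    ' "." stops at the line feed, so only item 2 is found
    Debug.Print CountItemsIn("\d+(?:\.\d+)*.*?\D\.(?=\s*(?:\d|$))", sample)       ' 1
End Sub
```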

  • A new test was run with the updated expression, and the pattern in the code was updated accordingly in an edit of this answer:

    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    
  • If you get the compile error "User-defined type not defined" with this line highlighted: Set objRegExp = New RegExp ' CreateObject("VBScript.RegExp")

    Then uncomment the CreateObject call and remove New RegExp, like this: Set objRegExp = CreateObject("VBScript.RegExp"). This late-bound version does not need the reference from Step 1.
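If the same module has to work both with and without the reference, one option (a common VBA technique, not part of the original answer) is conditional compilation. In this sketch EARLY_BINDING and CreateRegExp are hypothetical names; the #Const line goes at the top of the module. With EARLY_BINDING = False the early-bound line is never compiled, so the "User-defined type not defined" error cannot occur.

```vba
' Set to True only after checking the reference from Step 1.
#Const EARLY_BINDING = False

' Returns a RegExp object using whichever binding is enabled above.
Function CreateRegExp() As Object
#If EARLY_BINDING Then
    Set CreateRegExp = New VBScript_RegExp_55.RegExp   ' needs the reference
#Else
    Set CreateRegExp = CreateObject("VBScript.RegExp") ' works without it
#End If
End Function
```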

Writing the results to cells

Here is a short version for use on a worksheet: to place the values in cells, write each m.Value to a cell.

It assumes the string is in cell A1 and writes the items to column B, starting at B1 and going down one row per item.

Sub SyllabusToColumnB()
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    Dim m As Object
    Dim i As Long: i = 1
    str = Cells(1, 1).Value ' source string in A1 of the active sheet
    Set objRegExp = CreateObject("VBScript.RegExp") ' late binding, no reference needed
    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    objRegExp.Global = True
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            Cells(i, 2).Value = m.Value ' B1, B2, B3, ...
            i = i + 1
        Next m
    End If
End Sub
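As an alternative to a macro, the same pattern can be exposed as a worksheet function, so the items update when A1 changes. This is a sketch under assumptions: SyllabusItem is a hypothetical name, and it must be placed in a standard module to be callable from a cell.

```vba
' Hypothetical worksheet function: returns the n-th item (1-based) of the list,
' or an empty string once n is past the last item.
Public Function SyllabusItem(ByVal source As String, ByVal n As Long) As Variant
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\D\.(?=\s*(?:\d|$))"
    re.Global = True
    Set matches = re.Execute(source)
    If n >= 1 And n <= matches.Count Then
        SyllabusItem = matches(n - 1).Value   ' MatchCollection is zero-based
    Else
        SyllabusItem = ""                     ' leave the cell blank
    End If
End Function
```

For example, entering =SyllabusItem($A$1, ROW()) in B1 and filling down lists the items in column B.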
