Extract text with VBA

0

I need to create a macro in Excel that reads a cell containing text and returns part of that text.

The problem is that the text follows no pattern. I only know that somewhere in it there will be something like NOME: and E-MAIL:, and I need to extract everything between those two markers into another cell. For example:

Bla bla bla bla bla NOME: JOSH IDNUMBER: 098766 E-MAIL: [email protected] bla bla bla.

So I need a macro that finds NOME:, takes everything up to E-MAIL:, and pastes it into another cell, like this: NOME: JOSH IDNUMBER: 098766

  • Are you aware that this is a Portuguese-speaking forum? How is your data laid out in the spreadsheet? Is it all in the same column? In principle, you could get the desired result with a combination of Excel's own formulas, without resorting to a macro.

  • You did not notice that we speak Portuguese here. Please translate your question into Portuguese.

3 answers

1

Suppose all your entries are in column A, starting at row 1.

Put the following formula in cell B1:

=TRIM(MID(A1;FIND("NOME:";A1)+5;FIND("E-MAIL:";A1)-FIND("NOME:";A1)-5))

Copy the formula down to the remaining cells in column B and you will not need a macro. The semicolons are the argument separator used with Portuguese regional settings; with English settings, use commas instead.
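If the formula has to be placed by code rather than copied by hand, something like the following could be used. This is only a sketch: the Sub name FillExtractionFormula is made up, the markers are assumed to be NOME: and E-MAIL: as in the sample text, and it overwrites whatever is in column B of the active sheet.

```vba
' Hypothetical helper (not from the answer above): writes the extraction
' formula into column B for every used row of column A on the active sheet.
Sub FillExtractionFormula()
    Dim ws As Worksheet
    Dim lastRow As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Range.Formula expects English function names and comma separators,
    ' whatever the regional settings are. The relative reference A1 is
    ' adjusted for each row of the target range.
    ws.Range("B1:B" & lastRow).Formula = _
        "=TRIM(MID(A1,FIND(""NOME:"",A1)+5,FIND(""E-MAIL:"",A1)-FIND(""NOME:"",A1)-5))"
End Sub
```

Note that FIND is case-sensitive; SEARCH takes the same arguments and ignores case, which may suit free-form text better.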

1

Vbobcat, thanks for the help, but unfortunately VBA did not recognize the FIND function. I found an alternative, shown below:

Public LastLine As Long
Public i As Long
Public TestString As String
Public TestUpperCase As String
Public TestString2 As Long
Public TestString3 As Long

Sub Click()
    Sheets("Plan2").Select
    LastLine = Range("B" & Rows.Count).End(xlUp).Row

    For i = 1 To LastLine
        TestString = Cells(i, 2).Value
        TestUpperCase = UCase(TestString)
        ' Positions of the two markers (0 when a marker is missing)
        TestString2 = InStr(TestUpperCase, "NOME:")
        TestString3 = InStr(TestUpperCase, "E-MAIL:")
        ' Copy the text only when both markers exist, in this order
        If TestString2 > 0 And TestString3 > TestString2 Then
            Cells(i, 3).Value = Trim(Mid(TestUpperCase, TestString2, TestString3 - TestString2))
        End If
    Next i
End Sub
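The same InStr idea can also be wrapped in a reusable function. What follows is only a sketch: the name ExtractBetween and its parameters are assumptions, not part of the macro above. It searches case-insensitively with vbTextCompare instead of upper-casing the text, so the original casing is kept, and it returns an empty string when either marker is missing.

```vba
' Hypothetical helper (not from the original answer): returns the text that
' starts at startMarker and ends just before endMarker, or "" if not found.
Function ExtractBetween(ByVal source As String, _
                        ByVal startMarker As String, _
                        ByVal endMarker As String) As String
    Dim startPos As Long, endPos As Long

    ' vbTextCompare makes the search case-insensitive
    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Look for the end marker only after the start marker
    endPos = InStr(startPos + Len(startMarker), source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function

    ExtractBetween = Trim$(Mid$(source, startPos, endPos - startPos))
End Function
```

Inside the loop it could be called as Cells(i, 3).Value = ExtractBetween(Cells(i, 2).Value, "NOME:", "E-MAIL:").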

0

A Regular Expression can be used.

Code

Sub ExtrairTexto()
    Dim texto As String
    Dim objCorresp As Object, objExpReg As Object, c As Object
    Set objExpReg = CreateObject("VBScript.RegExp")
    'Regular expression
    With objExpReg
        .Pattern = "nome:[\s\S]+?(?=e-mail)"
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
    End With
    texto = "Bla bla bla bla bla NOME: JOSH IDNUMBER: 098766 E-MAIL: [email protected] bla bla bla."
    Set objCorresp = objExpReg.Execute(texto)
    If objCorresp.Count <> 0 Then
        'Print every match found in the text
        For Each c In objCorresp
            Debug.Print Trim(c.Value)
        Next c
    End If
End Sub

Result

For the example in the code, the result is: NOME: JOSH IDNUMBER: 098766

For a string with multiple lines:

texto = "Bla bla bla bla bla NOME: JOSH IDNUMBER: 098766 E-MAIL: [email protected] bla bla bla." & vbNewLine & _
"Bla bla bla bla bla NOME: JOAO IDNUMBER: 098766 E-MAIL: [email protected] bla bla bla."

The result is:

NOME: JOSH IDNUMBER: 098766
NOME: JOAO IDNUMBER: 098766

Regular Expression

The following expression may be used: nome:[\s\S]+?(?=e-mail)

This expression matches text that starts with nome:, followed by any characters (line breaks included) taken lazily by [\s\S]+?, and stops right before e-mail because of the lookahead (?=e-mail).

And the demo can be seen in this link
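To see why the lazy quantifier matters, the lazy and greedy versions of the pattern can be compared on a text holding two records. This is a small sketch; FirstMatch and CompareLazyAndGreedy are hypothetical names used only for illustration.

```vba
' Hypothetical helper: returns the first match of a pattern, or "" if none.
Function FirstMatch(ByVal pat As String, ByVal source As String) As String
    Dim re As Object, found As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pat
    re.IgnoreCase = True
    re.Global = False

    Set found = re.Execute(source)
    If found.Count > 0 Then FirstMatch = found(0).Value
End Function

Sub CompareLazyAndGreedy()
    Const SAMPLE As String = "NOME: A E-MAIL: x NOME: B E-MAIL: y"

    ' Lazy +? stops before the first "e-mail" after "nome:"
    Debug.Print "[" & FirstMatch("nome:[\s\S]+?(?=e-mail)", SAMPLE) & "]"
    ' Greedy + runs on until the last "e-mail" in the text
    Debug.Print "[" & FirstMatch("nome:[\s\S]+(?=e-mail)", SAMPLE) & "]"
End Sub
```

With the lazy form each record is matched separately; with the greedy form the first match swallows everything up to the last e-mail.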

Enable Regex in Excel

  1. Regex needs to be enabled. First, enable the Developer tab.
  2. In the 'Developer' tab, click 'Visual Basic' and the VBA window will open.
  3. Go to 'Tools' -> 'References...' and a window will open.
  4. Find 'Microsoft VBScript Regular Expressions 5.5' in the list and tick it.
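Once that reference is ticked, the RegExp class can be declared directly (early binding), which gives IntelliSense and compile-time checks. As far as I know, code that goes through CreateObject("VBScript.RegExp"), as in this answer, is late bound and should run even without the reference. A minimal early-bound sketch, with a hypothetical function name:

```vba
' Requires the "Microsoft VBScript Regular Expressions 5.5" reference.
' Hypothetical example: returns the first NOME ... E-MAIL match, or "".
Function EarlyBoundExtract(ByVal source As String) As String
    Dim re As VBScript_RegExp_55.RegExp
    Dim found As VBScript_RegExp_55.MatchCollection

    Set re = New VBScript_RegExp_55.RegExp
    re.Pattern = "nome:[\s\S]+?(?=e-mail)"
    re.IgnoreCase = True
    re.Global = False

    Set found = re.Execute(source)
    If found.Count > 0 Then EarlyBoundExtract = Trim$(found(0).Value)
End Function
```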


User-Defined Function

A UDF (user-defined function) can be created with the following code:

Function extrair_texto_entre(inicio As String, fim As String, texto) As String
    Dim objCorresp As Object, objExpReg As Object, c As Object
    Set objExpReg = CreateObject("VBScript.RegExp")
    'Regular expression built from the start and end strings
    With objExpReg
        .Pattern = inicio & "[\s\S]+?(?=" & fim & ")"
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
    End With
    Set objCorresp = objExpReg.Execute(texto)
    If objCorresp.Count <> 0 Then
        'If there are several matches, the last one is returned
        For Each c In objCorresp
            extrair_texto_entre = Trim(c.Value)
        Next c
    End If
End Function
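One thing to keep in mind: inicio and fim are concatenated straight into the pattern, so characters such as . ? ( or + in them are read as regex syntax rather than literal text. If the markers should always be taken literally, they could be escaped first. The helper below is a sketch with a made-up name (EscapeRegex); it is not part of the answer's code.

```vba
' Hypothetical helper: puts a backslash in front of every regex
' metacharacter so the string is matched literally inside a pattern.
Function EscapeRegex(ByVal s As String) As String
    Const META As String = "\^$.|?*+()[]{}"
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If InStr(1, META, ch, vbBinaryCompare) > 0 Then result = result & "\"
        result = result & ch
    Next i

    EscapeRegex = result
End Function
```

The pattern line would then become .Pattern = EscapeRegex(inicio) & "[\s\S]+?(?=" & EscapeRegex(fim) & ")".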

Result

This function can be used in the Worksheet as follows:

=extrair_texto_entre("start string"; "end string"; cell or "string")

The third argument can be either a cell reference or a literal string.

String

With a literal string, the function is used as follows:

=extrair_texto_entre("nome:";"e-mail";"Bla bla bla bla bla NOME: JOSH IDNUMBER: 098766 E-MAIL: [email protected] bla bla bla.")

And get the following result: NOME: JOSH IDNUMBER: 098766

Cell reference

With a cell reference, the function is entered in the worksheet as follows:

=extrair_texto_entre("nome:";"e-mail";B2)

And get the result:

NOME: JOSH IDNUMBER: 098766

  • I have been looking at this code and it helps me with something important! But suppose I want to extract only the middle characters. For example, with 11 22 333, I want to display only the "22" that sits between the others, without the "11". As written, the code returns "11 22". What do I change in the code to extract only the middle part?

  • @Elienayjunior You need to change the regular expression. Study regular expressions and run some tests on https://regex101.com/. If in doubt, open a question about regular expressions.

  • I don't want to dive fully into VBA and its functions yet. I'm still studying Excel's own advanced formulas and how to nest them. I just wanted a small adjustment to the code so that I wouldn't need an auxiliary formula in Excel to remove the "11" and leave only the "22".

  • You need to change the regular expression pattern here: .Pattern = inicio & "[\s\S]+?(?=" & fim & ")". It takes the start string and the end string and captures everything from the start string up to, but not including, the end string.

  • Yes, I did change that part, but all I managed to do was add the remaining characters that were hidden to the right of the sequence: when I removed the (?=) and left the ( ", "11 22 33" was extracted. These characters look strange to me. In Excel formulas, (?) is the wildcard for a single character, and ( " ) opens and closes text content.
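For the case discussed in these comments, one possible adjustment is to put the middle part in a capture group and return only that group. This is a sketch, not the answer author's code; the function name ExtractMiddleOnly is made up.

```vba
' Hypothetical variant of extrair_texto_entre: returns only the text between
' the two markers, leaving out the start marker itself.
Function ExtractMiddleOnly(inicio As String, fim As String, texto As String) As String
    Dim objExpReg As Object, objCorresp As Object

    Set objExpReg = CreateObject("VBScript.RegExp")
    With objExpReg
        ' The plain parentheses form capture group 1 around the middle part;
        ' (?= ... ) is a lookahead and does not consume the end marker.
        .Pattern = inicio & "([\s\S]+?)(?=" & fim & ")"
        .Global = False
        .IgnoreCase = True
    End With

    Set objCorresp = objExpReg.Execute(texto)
    If objCorresp.Count > 0 Then
        ' SubMatches(0) holds the text of capture group 1
        ExtractMiddleOnly = Trim(objCorresp(0).SubMatches(0))
    End If
End Function
```

In the worksheet, =ExtractMiddleOnly("11";"333";A1) with 11 22 333 in A1 should then give 22.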
