Split text


2

I need to extract a specific piece of a .txt file: the part that is between <tag> and </tag>.
I need every occurrence in the file, not just the first one.

Example:

<tag1>titulo</tag1>
<tag2>subtitulo</tag2> 

texto... 

<tag2>subtitulo</tag2> 
texto... 

<tag1>titulo</tag1> 
<tag2>subtitulo</tag2> 
texto...

I want to extract the text between these tags and save it.

  • 1

    You want to take the text between the <tag1> and <tag2> tags, is that it? If you can post the application code, it would be easier to help...

  • That’s it! My system needs to do this with this txt.

  • 2

    I don’t understand whether you want to capture titulo and subtitulo, or what is outside the tags (texto)...

  • And why is <tag2> not closed?

  • I want to take what’s outside the tags.

  • I forgot to close it.


1 answer

0


Try this code:

var pattern = /<tag.*?>(.*?)<\/tag.*?>/g; // Pattern the regex will match
var reader = new FileReader(); // Reader for the txt file
var output = ""; // Text variable that receives the content of the txt file

// Logs every regex match found in the given text
function displayContents(text) {
    var m;
    pattern.lastIndex = 0; // restart the global regex from the beginning of the text
    while ((m = pattern.exec(text)) !== null) { // run the regex and store the match in m
        console.log(m[1]); // show capture group 1 in the browser console
    }
}

reader.onload = function (e) { // Runs once the reader has finished loading the file
    output = e.target.result;
    displayContents(output); // match here: readAsText is asynchronous, so output is empty until this point
}; // End of the function
reader.readAsText(filePath.files[0]); // filePath: the file input element used to choose your txt

Explanation of the code:

  • The code loads the txt file through a FileReader.

  • It passes the file's content to the variable output.

  • It prints the content that matches the regex to the browser console.
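To try the matching step without a file input, it can be isolated into a function that takes the text as a string. This is only a minimal sketch: extractTagContents and the sample string are hypothetical names introduced for illustration, and it assumes an environment that supports String.prototype.matchAll.

```javascript
// Hypothetical helper: returns the inner text of every <tag...>...</tag...> pair.
function extractTagContents(text) {
    const pattern = /<tag.*?>(.*?)<\/tag.*?>/g; // same pattern as in the answer
    // matchAll yields one match object per occurrence; index 1 is capture group 1
    return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Sample shaped like the one in the question
const sample = [
    "<tag1>titulo</tag1>",
    "<tag2>subtitulo</tag2>",
    "texto...",
].join("\n");

console.log(extractTagContents(sample)); // [ 'titulo', 'subtitulo' ]
```

Returning an array instead of logging inside the loop makes it straightforward to save the results afterwards, which is what the question asks for.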

Explanation of the regex:

/<tag.*?>(.*?)<\/tag.*?>/g;
  • <tag Matches the literal sequence <tag
  • .*?> Advances (without capturing) over as few characters as possible, up to the first occurrence of >.
  • (.*?)<\/tag Captures everything up to the first occurrence of </tag into capture group 1.
  • .*?> Advances (without capturing) over as few characters as possible, up to the first occurrence of >.
  • /g Is a regex flag (modifier): it tells the engine to keep looking for further matches until the end of the text instead of stopping at the first one.
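In the comments the asker clarifies that the goal is the text outside the tags. One possible way to get that, sketched here under assumptions, is to use the same pattern in reverse: remove every tagged span and keep the non-empty lines that remain. The name extractTextOutsideTags and the sample text are hypothetical, and the sketch assumes each tag pair opens and closes on the same line, since . does not match line breaks by default.

```javascript
// Hypothetical helper: returns the non-empty lines left once every tagged span is removed.
function extractTextOutsideTags(text) {
    const tagged = /<tag.*?>.*?<\/tag.*?>/g; // the answer's pattern, without the capture group
    return text
        .replace(tagged, "")            // drop every <tag...>...</tag...> span
        .split(/\r?\n/)                 // split what is left into lines
        .map((line) => line.trim())     // strip surrounding whitespace
        .filter((line) => line !== ""); // discard lines that are now empty
}

const sampleText = [
    "<tag1>titulo</tag1>",
    "<tag2>subtitulo</tag2>",
    "",
    "texto A",
    "",
    "<tag2>subtitulo</tag2>",
    "texto B",
].join("\n");

console.log(extractTextOutsideTags(sampleText)); // [ 'texto A', 'texto B' ]
```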
