Assign id according to text content

Viewed 246 times

2

I don't know whether this is best done with JavaScript, PHP, or even Sublime Text itself, but here is what I have done so far.

I copied the text of the CF and pasted it into a .txt file (to get rid of those weird tags).

Then, in Sublime Text, I:

  • selected everything with Ctrl+A
  • pressed Ctrl+Shift+L to get one cursor per line
  • pressed Alt+Shift+W to wrap each line in an HTML tag
  • typed li, which creates the opening and closing tags for each line

Then I had it select every occurrence of:

</li>

<li>TÍTULO

and replace with:

      </li>
  </ul>

  <ul id="titulo" class="titulo">
      <li>TÍTULO

I then repeated this step for each level: titulo > capítulo > seção > subseção > artigo

I managed to reach this result: cf_step_1.txt

I would like an algorithm that goes through each ul and sets the value of its id according to the content of that ul.

For example, it currently looks like this:

<ul id="titulo" class="titulo">
   <li>
      TÍTULO I ...

<ul id="artigo" class="artigo">
   <li>
      Art.  1º A ...

I would like it to end up like this:

<ul id="titulo1" class="titulo">
   <li>
      TÍTULO I ...

<ul id="artigo1" class="artigo">
   <li>
      Art.  1º A ...
  • 1

    Only the link to the result was missing: http://preliminarte.com.br/cf_passo_1.txt Thanks

  • From what I understand, you want to parse the text and extract some information (type, title number, article number, etc.), possibly transformed (for example, Roman numerals converted to Arabic). Right? If the content is static, and you are already making "manual" transformations, I suggest experimenting with regular expressions first; maybe that is enough for what you want. I don't know Sublime Text, but it probably has this option in find/replace...

1 answer

1


Since the content looks very uniform, a set of regular-expression substitutions should be enough to achieve your goal. These replacements can be made in any of the three ways you mentioned: in the text editor itself (using Find/Replace with regular expressions enabled) or in a programming language such as PHP or JavaScript. The regex syntax will be similar in all cases (Perl-style, as in PCRE); the way you apply the expressions differs more.

By way of example, the conversion of the articles would be as follows:

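Search:

<ul id="artigo" class="artigo">\n   <li>\n      Art\.  (\d+)

Replace:

<ul id="artigo$1" class="artigo">\n   <li>\n      Art.  $1

Here \d+ captures article numbers of any length, and the pattern stops right after the number, so whatever follows it (such as the º) is left untouched.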
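In JavaScript, the same kind of substitution could be sketched as below. The function name numberArticles and the sample string are made up for illustration, and the pattern assumes the exact indentation and the two spaces after "Art." shown in the question. It uses \d+ so that multi-digit article numbers are captured too.

```javascript
// Hypothetical helper: adds the article number to the id of each "artigo" list.
// Assumes the exact whitespace layout shown in the question.
function numberArticles(text) {
    var regex = /<ul id="artigo" class="artigo">\n   <li>\n      Art\.  (\d+)/g;
    // $1 is the captured article number; it is reused in the id and in the text.
    return text.replace(regex,
        '<ul id="artigo$1" class="artigo">\n   <li>\n      Art.  $1');
}

// Made-up sample input in the same shape as the question's example.
var sample = '<ul id="artigo" class="artigo">\n   <li>\n      Art.  1º A ...';
console.log(numberArticles(sample));
```

Anything after the number is outside the match, so it stays exactly as it was in the input.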

The other levels (which use Roman numerals) have the added complication of converting the numerals, if converting them really matters to you. Here a programming language helps, because you can replace each match with the return value of a function. Example in JavaScript:

// str holds the full text to convert
var regex = /<ul id="titulo" class="titulo">\n   <li>\n      TÍTULO ([^ ]+) /g;
var convertido = str.replace(regex, function(match, romano) {
    // romano is the captured Roman numeral, e.g. "IV"
    var arabico = deromanize(romano);
    return '<ul id="titulo' +
           arabico +
           '" class="titulo">\n   <li>\n      TÍTULO ' +
           romano +
           ' ';
});

This relies on a separate function to convert Roman numerals to Arabic ones, such as the deromanize function described in that article:

function deromanize (str) {
    var str = str.toUpperCase(),
        validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/,
        token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g,
        key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1},
        num = 0, m;
    if (!(str && validator.test(str)))
        return false;
    while (m = token.exec(str))
        num += key[m[0]];
    return num;
}
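
To cover the remaining levels without writing one replacement per level, the idea could be generalized roughly as follows. This is only a sketch: numberLevel and romanToInt are hypothetical names, romanToInt is a simplified stand-in for deromanize (included so the snippet runs on its own, without input validation), and the markup layout is assumed to be exactly the one shown in the question.

```javascript
// Compact stand-in for deromanize, so that this sketch runs on its own.
// It does not validate its input; malformed numerals give meaningless results.
function romanToInt(roman) {
    var values = {I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000};
    var total = 0;
    for (var i = 0; i < roman.length; i++) {
        var current = values[roman[i]];
        var next = values[roman[i + 1]] || 0;
        // A smaller value before a larger one is subtracted (IV = 4, IX = 9).
        total += current < next ? -current : current;
    }
    return total;
}

// Hypothetical helper: numbers the id of every <ul> of one class.
// cls is the class/id used in the markup, label is the word used in the text.
function numberLevel(text, cls, label) {
    var regex = new RegExp(
        '<ul id="' + cls + '" class="' + cls + '">\\n   <li>\\n      ' +
        label + ' ([IVXLCDM]+) ', 'g');
    return text.replace(regex, function (match, romano) {
        // Only the id changes; the rest of the match is kept as it was.
        return match.replace('id="' + cls + '"',
                             'id="' + cls + romanToInt(romano) + '"');
    });
}

// Made-up sample input in the same shape as the question's example.
var sample = '<ul id="titulo" class="titulo">\n   <li>\n      TÍTULO IV ...';
console.log(numberLevel(sample, 'titulo', 'TÍTULO'));
```

The same call could then be repeated with, for instance, ('capitulo', 'CAPÍTULO'). If chapter or section numbers restart inside each title, this alone would produce repeated ids, which HTML does not allow; including the parent's number in the id is one possible way around that.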
