Regex for chord sites such as Cifraclub

6

  • 4

    I don’t understand. Do you want to capture the chords from this site with JavaScript and regex? Please edit your question to make it clearer.

  • I have a (free) project that does this. You may want to look at the following class, which extracts the title, artist, chords, etc.: https://github.com/colares/touke-flow/blob/master/Packages/Application/Apimenti.Translator/Classes/Apimenti/Translator/Service/SongParserService.php P.S.: the project is called Touke: http://touke.dreamers.com Full code: https://github.com/colares/touke-flow

4 answers

9


Regex is not the right tool for this problem. Inspecting the page’s source code, I see that each chord is in its own b element, with a class of _i0, _i1, _i2, and so on (the classes are irrelevant in this particular case). The ideal approach, then, is to search the DOM for these elements:

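// Collect every <b> element (one per chord) inside the chord container.
var cifras = document.getElementById("ct_cifra")
                     .getElementsByTagName("b");
// Join the chord names into a single space-separated string.
var texto = "";
for ( var i = 0 ; i < cifras.length ; i++ )
    texto += cifras[i].innerHTML + " ";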

If you want to extract these chords directly from the site, you can use a bookmarklet and display the resulting text with alert, console.log, etc. (anywhere you can copy it from).

javascript:(function(){var cifras=document.getElementById("ct_cifra").getElementsByTagName("b");var texto='';for(var i=0;i<cifras.length;i++)texto+=cifras[i].innerHTML+' ';alert(texto);})();

Note: the code was tested successfully in Firefox and Chrome, but in Chrome I could not select the text in the alert dialog box to copy it.

  • 1

    I understand, but the problem is this (I think I didn’t express myself well): I’m taking the content of the div that holds the chord sheet on Cifraclub and putting it into my application. I then need to separate what is a chord from what is the song’s lyrics, so that later I can change the key, etc.

  • @Douglasaraújo How are you getting the contents of this div? Manually or programmatically? The ideal would be to take advantage of this information (the markup) while it still exists; discarding it (i.e. keeping only the text) makes things much more difficult later...

  • Yes, today (in the old application) I fetch it with PHP and strip the tags. But now I really understand your point of view. It would be better to keep the tags than to strip them and then have to work out what is a chord and what is lyrics. But the application currently works with PHP, and I was trying to do it all in JS. From what has been said, though, I gather it won’t be possible to fetch the chords from Cifraclub with JS, right?

  • 1

    @Douglasaraújo In the browser, no, for security reasons. But as Pagliuca pointed out, you can do this on the server side, using V8 for example (or full Node.js). My alternative (the bookmarklet) works, but the user has to navigate to the page and click the bookmarklet (you could then send the captured text to the server instead of displaying it on screen). Besides being laborious, I don’t recommend it, because bookmarklets are in general a very unsafe technique (the JavaScript runs in the context of the page, so malicious code could do anything with it).

  • But my goal in rewriting the application was to avoid any server-side language... Node.js also depends on a running server, doesn’t it?

  • 1

    @Douglasaraújo Yes, it does... Unfortunately, what you want is not possible; security takes priority in the browser. If a site could read arbitrary content from other sites (without their authorization; see CORS), then nothing important or private could be done in a browser.
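
The suggestion in the comments above to use the markup while it still exists can be sketched as follows. This is only a minimal sketch under assumptions, not code from the site or from the answer: it assumes the chords are direct b children of the container and everything else is plain text nodes, and splitChordsAndLyrics is a hypothetical helper name.

```javascript
// Node type constants as defined by the DOM standard.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: walks the direct children of the chord container
// and labels each piece as a chord (<b> element) or as plain text
// (lyrics and spacing). Other elements are ignored.
// Assumption: chords are direct <b> children of the container.
function splitChordsAndLyrics(container) {
  var parts = [];
  var nodes = container.childNodes;
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.nodeType === ELEMENT_NODE && node.nodeName.toUpperCase() === "B") {
      parts.push({ type: "chord", text: node.textContent });
    } else if (node.nodeType === TEXT_NODE) {
      parts.push({ type: "text", text: node.textContent });
    }
  }
  return parts;
}
```

On the page itself this could be called as splitChordsAndLyrics(document.getElementById("ct_cifra")). Because each chord stays labeled, a later step such as changing the key would only need to touch the "chord" entries.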

2

The following XPath expression returns all the chords on the page.

//pre[@id='ct_cifra']/b/text()

Just adapt it to the language you are using.

To test it, run the following expression in your browser’s JS console:

$x("//pre[@id='ct_cifra']/b/text()")
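
Note that $x is a console-only helper, so scripts running in the page cannot call it. Inside a page, the standard document.evaluate method can run the same expression. A minimal sketch, assuming the page structure described above; chordsByXPath is a hypothetical helper name:

```javascript
// XPathResult.ORDERED_NODE_SNAPSHOT_TYPE has the numeric value 7;
// the literal is used so the function does not depend on that global.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: evaluates the XPath expression against `doc`
// and returns the text of each matching node as an array of strings.
function chordsByXPath(doc) {
  var result = doc.evaluate(
    "//pre[@id='ct_cifra']/b/text()",
    doc,                        // context node: search the whole document
    null,                       // no namespace resolver is needed here
    ORDERED_NODE_SNAPSHOT_TYPE,
    null                        // do not reuse an existing result object
  );
  var chords = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    chords.push(result.snapshotItem(i).nodeValue);
  }
  return chords;
}
```

In the browser it would be called as chordsByXPath(document).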

1

I’m assuming you mean client-side JavaScript (in the browser), and not a server application written in Node.js, for example.

Stack Overflow in English has a similar question: https://stackoverflow.com/questions/597907/open-webpage-and-parse-it-using-javascript

The top-voted answer says that this is impossible using JavaScript alone because, for security reasons, Ajax requests are only allowed to load pages from the same domain as your own.

To load content from another domain, as in your case, you will need a server-side script (in PHP, Python, Ruby, etc.).
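
To make the restriction concrete: browsers compare origins, meaning scheme, host, and port together, rather than just the domain name. Here is a small sketch using the standard URL class. The URLs are placeholders chosen for illustration, and isSameOrigin is a hypothetical helper:

```javascript
// Hypothetical helper: two URLs are same-origin when scheme, host,
// and port all match. The URL class exposes that triple as `origin`.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Placeholder URLs, for illustration only.
console.log(isSameOrigin("https://example.com/app", "https://example.com/data"));   // true
console.log(isSameOrigin("https://example.com/app", "https://other.example/page")); // false
```

A page can read a cross-origin response only if the other site opts in through CORS headers; otherwise the fetching has to happen in a server-side script.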

0

<?php

/* The $cifra variable holds the song lyrics with their chords */

$cifra = "E7     Amaj7
Antes de eu falar
             A9
Tu cantavas sobre mim
C#m               B4
    Tu tens sido tão
         A9
Tão bom     pra mim

C#m                   B4
    Antes de eu respirar
              A9
Sopraste Tua vida em mim
C#m               B4
    Tu tens sido tão
        A9
Tão bom    pra mim";

/* This is the pattern (regex) I used to recognize all the chords in the chord sheet stored in the variable above */

$pattern = "/((?!(?<=[ÇçáéíóúÁÉÍÓÚ]))(((\b([A-G]{1,1}((?!([\scefghiklnopqrtuvxwyz]))((add|ADD)|#m|#|º|b)?([2-9]|m(\B|$)|maj|sus|dim|\(|º)?(\/[A-G][2-9](#|º)?(b{1,1})?)?(\+)?(\/([A-G]|[2-9]))?(º|#|b|\([2-9]\))?([2-9]|m)?(\+|(?<=[2-9])M)?(11|13)?(\/)?([A-G])?)([Eb|Bb])?(\([0-9]?[0-9]?(\/)?[0-9]?[0-9]?)?(\/1[1-9])?\b(\(((2|4|5|6|7|9|11|13)|[bB][2-9])\))?(-|\/[A-G])?(\/[2-9])?([2-9]|\)|\((b13|b11|b5)\)|\([2-9]\)|-|\/)?(\+)?)(m|b|º|#|13|add[2-9]\))?)|\b(?(?<=(?<=\s)[A-G](?=\s)))[A-G](?!(\s)?[a-zH-Z])\b)(º)?)((\(([0-9]{1,2})?(\-)?(\+)?\))|(?!((?:[çáéíóúÇÁÉÍÓÚ])|((\t|\s)[a-zA-Z][A-Z]|\s[A-G][a-z]\w|\s[A-G]\s\w[H-Z])\w?)))|(?(?<=(?<=\s)[A-G]M|[A-G]m|[A-G]m(?=\s)))([A-G]m|[A-G]M(?!\s))(#|b)?([2-9]|maj|sus|dim|º|º)?(\/[A-G][2-9](#)?(b{1,1})?)?(\+)?(\/([A-G]|[2-9]))?(#|b|\([2-9]\))?([2-9])?(\+)?(\([A-G][2-9]\))?(\([1-9][1-9]?\))?(11|13)?(-|\/[A-G])?(\/[2-9])?([2-9]|-|\/)?(\+)?(b|#|13)?(?!(\s)?[a-zH-Z])(?!(?:[,çáéíóúÇÁÉÍÓÚ]))\W)/";

/* Below I apply preg_replace, which wraps every match (chord) found in <b></b> tags */

$cifra = preg_replace($pattern, "<b>$1</b>", $cifra);

echo "<pre>". $cifra ."</pre>";
  • 1

    Please add an explanation of how this code works.
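
Since the pattern above is hard to follow, here is a much simpler alternative for comparison, written in JavaScript like the earlier answers. It is not a translation of the PHP regex: it swaps in a line-based heuristic that treats a line as a chord line only when every whitespace-separated token looks like a chord. The names CHORD, isChordLine, and wrapChords are hypothetical, and the pattern is an assumption that covers only common chord spellings.

```javascript
// Simplified chord pattern (an assumption, far less complete than the PHP regex):
// root A-G, optional accidental, optional quality, optional digits,
// optional "M" (as in 7M), optional parenthesized extension,
// optional bass note after "/".
var CHORD = /^[A-G][#b]?(maj|min|dim|aug|sus|add|m)?\d*M?(\([^)]*\))?(\/[A-G][#b]?)?$/;

// A line counts as a chord line when it has tokens and all of them match CHORD.
function isChordLine(line) {
  var tokens = line.trim().split(/\s+/);
  if (tokens.length === 1 && tokens[0] === "") return false; // blank line
  return tokens.every(function (t) { return CHORD.test(t); });
}

// Wraps each chord in <b></b>, leaving lyric lines and spacing untouched.
function wrapChords(sheet) {
  return sheet.split("\n").map(function (line) {
    if (!isChordLine(line)) return line;
    return line.replace(/\S+/g, function (t) { return "<b>" + t + "</b>"; });
  }).join("\n");
}
```

The trade-off is that a lyric line consisting only of chord-like words (a lone "A" or "Em", for instance) would be misclassified, which is the kind of case the long pattern tries to handle with its lookarounds.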
