I'm learning the concurrency part of the Go language. As an exercise, I'm using the Generator pattern to get a channel that delivers the title of a URL, read inside a goroutine.
Inside this goroutine, the page is fetched with an HTTP GET and the title is extracted into a string with a regex. Initially, the code panicked with an index out of range error (panic: runtime error: index out of range). I found that the cause was a line break between the <title> tags: my regular expression, which uses (.*?), did not match across this line break, because the dot (.) does not match line break characters. With no match, FindStringSubmatch returns nil, so indexing element [1] panics.
I discovered this by using view-source on several sites and noticing that not every title sits on the same line as its <title> tags; it can also be split across lines, for example:
<title>
meusite
</title>
instead of <title>meusite</title>
So far, so good.
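To illustrate the behavior described above: in Go's regexp syntax, the dot does not match a newline unless the s flag is set, for example with a leading (?s). The following is only a minimal sketch under stated assumptions: firstGroup is a hypothetical helper introduced for illustration, and the input is an inline sample string rather than a real page.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroup is a hypothetical helper: it returns capture group 1 of the
// first match, and false when the pattern does not match at all.
// Checking for nil avoids the "index out of range" panic.
func firstGroup(pattern, input string) (string, bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample input (an assumption): a title split across lines.
	page := "<title>\nmeusite\n</title>"

	// Without the s flag, "." stops at "\n", so the pattern cannot match.
	_, ok := firstGroup(`<title>(.*?)</title>`, page)
	fmt.Println("without (?s), matched:", ok)

	// With (?s), "." also matches "\n", so the group spans the line breaks.
	title, ok := firstGroup(`(?s)<title>(.*?)</title>`, page)
	fmt.Printf("with (?s), matched: %v, group: %q\n", ok, title)
}
```

The captured group still contains the surrounding line breaks, so a call such as strings.TrimSpace would be needed to get a clean title.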
With this in mind, I tried to improve my regex so that it matches titles on a single line as well as titles split across lines. Unfortunately I did not succeed: the code still did not return the titles the way I wanted.
Below is my source code:
// Concurrency patterns - Generator
// For more information on concurrency patterns, see the talk
// Google I/O 2012 - Go Concurrency Patterns
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"regexp"
)

// tituloURL starts one goroutine per URL and returns a channel that
// delivers the page titles as they are extracted.
func tituloURL(urls ...string) <-chan string {
	ch := make(chan string)
	for _, url := range urls {
		go func(url string) {
			resp, _ := http.Get(url)
			html, _ := ioutil.ReadAll(resp.Body)
			// Patterns tried so far (only one is active at a time):
			//r, _ := regexp.Compile("<title>(.*?)<\\/title>")
			// r, _ := regexp.Compile("<title>(.|\n)*?<\\/title>")
			r, _ := regexp.Compile("<title>(.*?)|([^\\d])*?<\\/title>")
			//r, _ := regexp.Compile("<title>([\\s\\S]*?)<\\/title>")
			// r, _ := regexp.Compile("<title>(.|[\\s\\S])*?<\\/title>")
			ch <- r.FindStringSubmatch(string(html))[1]
		}(url)
	}
	return ch
}

func main() {
	t1 := tituloURL("https://www.github.com", "https://www.linkedin.com")
	t2 := tituloURL("https://www.instagram.com", "https://www.youtube.com")
	fmt.Println("First titles:", <-t1, "|", <-t2)
	fmt.Println("Second titles:", <-t1, "|", <-t2)
}
As you can see, I tried several regex patterns. Regexpal reported a match for them, but the code did not return the expected result.
Does anyone have a suggestion for another regex that fixes this?
I'm counting on your help!
I look forward to your replies.
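For reference, three of the patterns from the code above can be compared on a small inline sample. This is a hedged sketch rather than a definitive answer: group1 is a hypothetical helper, the sample string is an assumption, and real pages may behave differently. It looks at capture group 1 specifically, since a tester that only highlights the overall match can report success even when group 1 does not hold the title.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// group1 is a hypothetical helper: it returns capture group 1 of the first
// match, and false when the pattern does not match (no [1] panic).
func group1(pattern, input string) (string, bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample input (an assumption): a title split across lines.
	page := "<title>\nmeusite\n</title>"

	patterns := []string{
		// "|" has the lowest precedence, so this reads as
		// `<title>(.*?)` OR `([^\d])*?</title>`. On this sample the first
		// branch matches at "<title>" and its lazy group captures nothing.
		`<title>(.*?)|([^\d])*?</title>`,
		// The repetition is outside the group, so the group keeps only
		// its last iteration, a single character.
		`<title>(.|\n)*?</title>`,
		// The repetition is inside the group, so the whole text between
		// the tags is captured, line breaks included.
		`<title>([\s\S]*?)</title>`,
	}
	for _, p := range patterns {
		got, ok := group1(p, page)
		fmt.Printf("%-34s matched=%v group=%q trimmed=%q\n",
			p, ok, got, strings.TrimSpace(got))
	}
}
```

On this sample, only the pattern with the repetition inside the capture group yields the full title, and it still needs trimming.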
You can use that to parse HTML. Using regexps for this is not recommended.
– Ainar-G