Import from txt, change content and download with R


Hi!

I am a beginner in programming, specifically R. I have read quite a lot but am still very confused by the libraries and everything, and I still don't know what its limitations are, so I'm asking whether the following is possible with R:

I put a list of YouTube videos in a *.txt file, import the list into R, which puts it in a data.frame, change each link to the "http://ssyoutube.com/..." form, and then download the entire list automatically.

What I have so far can only download one video at a time, and I need to click a button.

Can all of this be automated with R, or do I need to learn another programming language?

I'm back, but things are still very messed up, because I really am a beginner :(

# import_lists
# I did it this way so I can import several lists if necessary,
# but I've already noticed it fails if there is a temporary *.txt
# or a subdirectory inside that folder.
# That's also why I left the *.txt names without spaces.
# Any tips?

# get the file names in the "youtube" directory
nome_ficheiros <- list.files(path = "youtube")

[1] "motivacao_desportiva.txt"  "natureza_linda.txt"       
[3] "pegadinhas_divertidas.txt"

# paste directory + *.txt file name
ficheiro_t <- paste("youtube", nome_ficheiros, sep = "/")

[1] "youtube/motivacao_desportiva.txt"      "youtube/natureza_linda.txt"       
[3] "youtube/pegadinhas_divertidas.txt"

# create the list of links from each *.txt
lista_ficheiros <- lapply(ficheiro_t, read.table, stringsAsFactors = FALSE)

[[1]]
                                           V1
1 https://www.youtube.com/watch?v=C-Lq6FTwRWI
2 https://www.youtube.com/watch?v=0fVdAsTp0EU
3 https://www.youtube.com/watch?v=kYSdIQ16UxY
4 https://www.youtube.com/watch?v=gHaP_oJa_l0

[[2]] ...

# create a table with all the links, one per row
db_yt <- do.call(rbind, lista_ficheiros)
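On the tip asked for above: `list.files()` can filter to just the `*.txt` files and return the full paths directly, which sidesteps both the temporary-file and subdirectory problems and makes the `paste()` step unnecessary. A minimal sketch, keeping the "youtube" folder name from the question:

```r
# list only *.txt files, with full paths; subdirectories and
# temporary files that don't end in ".txt" are skipped
ficheiro_t <- list.files(path = "youtube", pattern = "\\.txt$",
                         full.names = TRUE)
```

The `pattern` argument is a regular expression, so the dot must be escaped (`\\.txt$`) to match a literal ".txt" at the end of the name.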

This is where the confusion began: since I had no success with the download, I created several "urls":

# with all the variations I found

for(i in seq(nrow(db_yt))) {

  # turns "youtube" into "ssyoutube.com"
  db_yt[i,2] = gsub(".*www.", "http://ss", db_yt[i,1])

  # turns it into http://en.savefrom.net/#url=
  db_yt[i,3] = gsub(".*www.", "http://en.savefrom.net/#url=", db_yt[i,1])

  # turns it into the full download-site link
  db_yt[i,4] = paste("http://en.savefrom.net/#url=", db_yt[i,1], "&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com", sep = "")
}
                                       V1
 1 https://www.youtube.com/watch?v=C-Lq6FTwRWI
 2 https://www.youtube.com/watch?v=0fVdAsTp0EU
 ...
                                    V2
 1 http://ssyoutube.com/watch?v=C-Lq6FTwRWI
 2 http://ssyoutube.com/watch?v=0fVdAsTp0EU
 ...
                                                       V3
 1 http://en.savefrom.net/#url=youtube.com/watch?v=C-Lq6FTwRWI
 2 http://en.savefrom.net/#url=youtube.com/watch?v=0fVdAsTp0EU
                                                                                                                        V4
 1 http://en.savefrom.net/#url=https://www.youtube.com/watch?v=C-Lq6FTwRWI&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com
 2 http://en.savefrom.net/#url=https://www.youtube.com/watch?v=0fVdAsTp0EU&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com
 ...

# I also created this vector to use the original link from the *.txt
save_url <-c("http://en.savefrom.net/#url=", "&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com")
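As an aside, the loop above isn't needed: `gsub()` and `paste0()` are vectorized, so they can build all three variants over the whole column at once. A sketch using the same column layout as above (note the escaped dot in the pattern, so it matches a literal "www."):

```r
# same three URL variants as the loop, built column-wise
db_yt$V2 <- gsub(".*www\\.", "http://ss", db_yt$V1)
db_yt$V3 <- gsub(".*www\\.", "http://en.savefrom.net/#url=", db_yt$V1)
db_yt$V4 <- paste0("http://en.savefrom.net/#url=", db_yt$V1,
                   "&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com")
```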


# import libraries
library(xml2)
library(rvest)
library(httr)

# browser user-agent string
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

# download site (the 1st "full" link)
url <- "http://en.savefrom.net/#url=https://www.youtube.com/watch?v=PWkIoMOBiKE&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com"

# session to use
site <- html_session(url, user_agent(uastring))

# find the form
form <- site %>% html_form()

# there is only one, so
form <- form[[1]]

Here is the big trouble: the form seems to be inside an iframe and I cannot reach it.

# name the file to save
# if I inspect the HTML and search for the video title, it simply isn't there :(
nome <- site %>% html_nodes(xpath='//*[@id="answer-28575576"]/table/tbody/tr[1]/td[2]/div/pre') %>% html_text()

# submit the data
dados <- set_values(form = form,
                    sf_url = "http://youtube.com/watch?v=Q84boOmiPW8"
)

download <- submit_form(site, dados)
# note: destfile/method/mode are arguments to download.file(), not paste0()
download.file(url, destfile = "video.mp4", method = "wget", mode = "wb")

# if everything were working
for(i in seq(nrow(db_yt))) {
  # download site - the 1st link
  url <- db_yt[i,4]

  # session to use
  site <- html_session(url, user_agent(uastring))

  # the page takes a while to load
  Sys.sleep(20)

  # submit the data
  dados <- set_values(form = form,
                      sf_url = db_yt[i,1]
  )
  download <- submit_form(site, dados)
  # one file per row (the filename scheme is just a placeholder)
  download.file(url, destfile = paste0("video_", i, ".mp4"), method = "wget", mode = "wb")

  # another pause
  Sys.sleep(40)
}

The error happens right after submit_form, but I'd appreciate any suggestions for improvement, even for the parts that don't throw errors for me.
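A note on why the submit step fails: rvest only parses static HTML, and savefrom.net builds its download links with JavaScript inside an iframe, so the form submission can never see them. One common workaround is to drive a command-line downloader from R instead of scraping the intermediary site at all. This is a sketch under the assumption that the youtube-dl tool (not part of the question) is installed and on the PATH; it reads the original YouTube links, so no URL rewriting is needed:

```r
# loop over the original links and hand each one to youtube-dl;
# shQuote() protects the URL (its "?" and "&") from the shell
for (link in db_yt$V1) {
  system2("youtube-dl", args = shQuote(link))
  Sys.sleep(5)  # be polite between downloads
}
```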
