Posts by Daniel Falbel • 12,504 points
268 posts
-
1 vote
1 answer
1199 views
A: Problem with arrays in Python 3.6
The line tf.square(self.Y, self.y_data) should be tf.square(self.Y - self.y_data). You want to compute the mean squared error, not pass self.y_data as the name of the operation.…
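To illustrate why the difference has to be squared, here is a minimal sketch in plain Python rather than TensorFlow. The lists y_pred and y_true are made-up values standing in for self.Y and self.y_data.

```python
# Mean squared error: square the element-wise difference, then average.
# y_pred and y_true are made-up stand-ins for self.Y and self.y_data.
y_pred = [2.0, 4.0, 6.0]
y_true = [1.0, 4.0, 8.0]

squared_errors = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
mse = sum(squared_errors) / len(squared_errors)
print(mse)
```

In TensorFlow the same quantity is typically written as tf.reduce_mean(tf.square(self.Y - self.y_data)).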
-
0 votes
2 answers
48 views
A: Multivariate regression in Keras
It makes sense to think that the more information you put into the model, the more accurate it will be. But this is not always true when we are talking about prediction error. It may be that this…
-
2 votes
1 answer
146 views
A: Bar graph in ggplot2, geom_bar()
Just add scale_y_continuous(expand = c(0,0)) to the plot. Example: library(ggplot2) ggplot(iris, aes(x = Species)) + geom_bar() + scale_y_continuous(expand = c(0,0))…
-
6 votes
3 answers
166 views
A: Find an expression in several elements of a list
I’d do it this way: library(purrr) buscar_nome <- function(lista, nome) { map_lgl(lista, ~any(nome %in% .x[[1]])) %>% which() } # > buscar_nome(lista, "Maria da Silva") # [1] 1 # >…
Tags: r
-
5 votes
3 answers
1376 views
A: Fill column of a data frame with data from another data frame in R
I would use the dplyr package. With dplyr you combine simple operations until you get the result you want. First, the data sets: dados1 <- data.frame( ITEM =…
-
5 votes
4 answers
7118 views
A: Changing the name of a variable in an R data frame
With dplyr you can do it like this: library(dplyr) df <- df %>% rename(Pais = Country)
Tags: r
-
4 votes
1 answer
95 views
A: How to use dplyr within a function?
dplyr has a very nice syntax for interactive use; the trade-off is that programming with it inside functions is harder. The best place to understand how it all works is this…
-
1 vote
1 answer
671 views
A: R in Jupyter Notebook
Basically, what you need to do is install IRkernel, which is the R equivalent of IPython. To install it, just run in R: install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools',…
-
3 votes
2 answers
114 views
A: How do you get the previous month in R?
With lubridate, you can do it like this: library(lubridate) month(today() %m-% months(1))
-
2 votes
3 answers
1583 views
A: Function in R that also returns its own execution time
The precision of microbenchmark comes from running the functions several times, which avoids being influenced by occasional hiccups on the computer that could affect the running time. When you run the…
-
2 votes
1 answer
119 views
A: How to use strings as parameters in R
The best place to understand how to use strings instead of variable names is the document Programming with dplyr. You need to do something very similar to the parse and eval you mentioned, but…
-
5 votes
1 answer
72 views
A: What does the following line in R do?
The syntax as written is a little odd. As described in the documentation, the first argument can be either a vector from which you want to draw a sample or an integer. If it’s an integer,…
Tags: r
-
0 votes
2 answers
948 views
A: How to get random results in SQL with different DBMSs?
In SQL Server, if you don’t need to select all the rows, but only a sample of the rows and performance is important, use the following: SELECT * FROM table WHERE (ABS(CAST( (BINARY_CHECKSUM(*) *…
-
0 votes
1 answer
53 views
A: How to calculate the number of connections in a deep network?
In my view, if you write this as matrix multiplication, it is easier to calculate the number of parameters (or connections, if you prefer) of your network. Suppose your input X is a matrix…
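As a rough sketch of the matrix-multiplication view, in Python for illustration: a fully connected layer mapping n_in inputs to n_out outputs has an n_in x n_out weight matrix plus, if biases are used, n_out bias terms. The function name and the layer sizes below are hypothetical, and the sketch assumes a plain dense network, which the truncated answer may or may not be describing.

```python
def count_parameters(layer_sizes, with_bias=True):
    """Count the parameters of a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # entries of the weight matrix
        if with_bias:
            total += n_out         # one bias per output unit
    return total

# Hypothetical network: 10 inputs, hidden layers of 32 and 16 units, 1 output.
sizes = [10, 32, 16, 1]
print(count_parameters(sizes))
print(count_parameters(sizes, with_bias=False))
```

Counting only the weight matrices gives the number of connections; adding the biases gives the number of trainable parameters.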
-
5 votes
2 answers
349 views
A: How to insert missing dates into a data frame?
One possible way is to create a data.frame with all possible dates: library(lubridate) library(dplyr) all_dates <- data_frame( date = seq(from = ymd("1968-01-01"), to = ymd("2018-01-01"), by = "1…
Tags: r
-
9 votes
1 answer
213 views
A: Computational efficiency in R - lists or vectors
To evaluate code speed, it is very important to fully isolate the problems. In your case, you are measuring the time of two operations: Creating the matrix of random values with 1000 rows and 200…
-
3 votes
1 answer
1864 views
A: Optimal separation of a data set into training, validation and testing
In general we randomly set aside 70% for training, 15% for validation and 15% for testing. But this varies a lot and depends on the problem; for example, when there is a time component, we cannot randomly…
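The random 70/15/15 split can be sketched as follows, in Python for illustration. The function name, the fixed seed and the toy data set are assumptions made for the example; only the proportions come from the answer.

```python
import random

def split_dataset(rows, train=0.70, valid=0.15, seed=42):
    """Shuffle rows and split them into train, validation and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])   # whatever is left is the test set

# Toy data set of 100 row indices.
train_set, valid_set, test_set = split_dataset(range(100))
print(len(train_set), len(valid_set), len(test_set))
```

This covers only the purely random case; data with a time component, as the answer notes, cannot be split this way.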
Tags: machine-learning
-
5 votes
2 answers
331 views
A: Filter different texts at different positions in R
I would do it like this: library(stringr) library(dplyr) dados %>% filter(str_extract(NOME, "\\d{1,}") %in% 1001:1019) The function str_extract extracts a pattern from a string using regex. In this case,…
-
5 votes
2 answers
530 views
A: Filter different texts in R
The %in% operator is very useful in these situations because it saves you from writing several comparisons joined with | (or). Example using dplyr: library(dplyr) dados <- dados %>% filter(NOME %in%…
-
1 vote
3 answers
2670 views
A: Know the frequency of words
The tokenizers package makes this very easy! Example: library(tokenizers) library(magrittr) tokenize_words(linhas, lowercase = FALSE) %>% unlist() %>% table() %>% sort(decreasing = TRUE) The…
Tags: r
-
1 vote
1 answer
184 views
A: Recommendation system based on previously registered data
Are you sure your problem is a recommendation problem? Recommendation problems are those where the number of items that can be recommended is very large, so they cannot be solved in a…
Tags: machine-learning
-
1 vote
2 answers
508 views
A: Most important attributes in a Random Forest classifier
This paper proposes a methodology for analyzing the predictions of this type of algorithm. Fortunately, there is a Python project that implements it. This link has a usage tutorial…
-
6 votes
3 answers
3012 views
A: Aggregate function in R
I suggest you use the dplyr package for this kind of operation. Here’s a usage example that would solve your problem: library(dplyr) x <- mtcars %>% group_by(cyl, vs, am) %>%…
-
3 votes
2 answers
899 views
A: Renaming the levels of a factor based on a data frame
I would do a left_join and then drop the variable. For example: > library(dplyr) > flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), + Nome=c("Flor 1", "Flor 2",…
-
3 votes
3 answers
5016 views
A: Replace NA in R language
dplyr has a function called coalesce that does exactly this. In your case, you could use: library(dplyr) dat$TIPO <- coalesce(dat$TIPO, "TESTE")…
Tags: r
-
3 votes
1 answer
436 views
A: In R, how to transform a tibble into a data frame
Use the unnest function from tidyr. Example: > library(dplyr) > library(purrr) > library(tidyr) > > df <- tibble(x = 1:10, y = 1:10) %>% + mutate(z = map2(x, y, ~data.frame(a = .x…
-
6 votes
1 answer
230 views
A: In R, a function that reads only a few columns of a data frame in Rda format
A good solution is to use the fst package. Note that it is not ideal for long-term storage, since it is still under heavy development. According to the README, it compresses as well as saveRDS, is faster…
-
2 votes
2 answers
195 views
A: Create a calendar dimension up to the month before Sys.Date.
You can do it like this: library(lubridate) library(dplyr) data_frame( data = seq(ymd("2015-01-01"), (today() - day(today()) + 1 - months(1)), by = "month"), ano = year(data), mes = month(data) )…
-
1 vote
1 answer
67 views
A: How do I eliminate the first variable for testing with the t-1 variable?
Try the lag function from the dplyr package. It has a default argument that can be used to fill in the leading values. In this case I used NA. library(dplyr) df <- data.frame( dia = seq(from =…
Tags: r
-
2 votes
2 answers
830 views
A: Index searches on an R vector
One way is to take advantage of recycling, which happens automatically in R. Example: x <- 1:15 x[c(TRUE, FALSE)] # returns the odd ones x[c(FALSE, TRUE)] # returns the even ones Remember that…
-
1 vote
1 answer
67 views
A: What to do after preparing a model?
There are a few ways; I would say the following two are the most used. 1) API: There are several examples of APIs that serve model predictions (see: https://cloud.google.com/vision/). Create an…
-
1 vote
3 answers
94 views
A: Find exponent of the data-fit equation, R
I would solve this with the optim function, which performs general-purpose optimization given a loss function of some parameters. Here is an example:…
Tags: r
-
2 votes
1 answer
162 views
A: Different results using rpart and caret
By default, caret tunes some hyperparameters of each model. It tries to do this in a clever way, but that is not always the right one for your problem. rpart, on the other hand, fits the…
-
3 votes
2 answers
1047 views
A: Presentation of disproportionate rmarkdown chart
You control this through the rmarkdown chunk options: ```{r fig.width = 7, fig.height = 7} # code for your plot ``` 7 is the default for both height and width; adjust it until it looks good.…
-
3 votes
1 answer
197 views
A: Integration between R and HTML
You can make good dashboards with the package flexdashboard. I’ll put an example below, but of course you can do much more complex things, for example: https://gallery.shinyapps.io/cran-gauge/…
-
2 votes
2 answers
85 views
A: Multiple Gather with 4 resulting "joint" columns
Here is a solution: library(tidyverse) x <- data.frame( NOME = c("batata", "maça"), A1 = c(6, 9), A2 = c(4, 4), A3 = c(7, 8), B1 = c(2, 1), B2 = c(1, 2), B3 = c(1, 0) ) x %>% gather(keyA,…
Tags: r
-
15 votes
1 answer
1699 views
A: How does a CAPTCHA work?
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. In general, CAPTCHAs are made so…
Tags: captcha
-
4 votes
1 answer
158 views
A: How to use deep learning to parse forms with addresses?
This is a cool question, but a full answer would almost be a large project in itself. Deep learning can solve your problem. As for JavaScript, I don't know. I will give an…
-
11 votes
2 answers
390 views
A: What is deep learning?
Deep learning refers to artificial neural networks with many layers. Cleverly combined, these layers have produced great advances in artificial intelligence. Usually in machine learning…
-
4 votes
2 answers
291 views
A: Selecting part of a data frame and saving in loop
I would do something like this, using as an example the mtcars data frame, which is already available in R. walk is a kind of loop that returns no result. With this code I am asking that…
Tags: r
-
2 votes
1 answer
119 views
Q: Difference between maxiter and maxfun parameters in function fmin_l_bfgs_b
From the function's help: maxfun : int Maximum number of function evaluations. maxiter : int Maximum number of iterations. But what is the maximum number of iterations and the number of function…
-
5 votes
1 answer
1978 views
A: Problem with forecast in R
The error message states the problem: the variables t and t2 have a different length from the response variable. See the error that appears in this very simple code where the variables have different…
-
1 vote
2 answers
102 views
A: Converting a factor column to Date
I would do it like this: library(lubridate) dmy(as.character(x$V1)) Note that because the V1 column was loaded as a factor in R, I needed to convert it to character first. Ideally you would have read your data with the…
Tags: r
-
3 votes
2 answers
1348 views
A: Compare vector elements in R of different sizes
Another way is to use the operator %in%: > a %in% b [1] FALSE FALSE FALSE FALSE TRUE The operator %in% returns a logical vector of the same length as the left-hand vector. TRUE indicates that the…
-
1 vote
3 answers
202 views
A: How to assign NA as value?
You can also use ifelse: enem$TP_COR_RACA <- ifelse(enem$TP_COR_RACA == "Nao", NA, enem$TP_COR_RACA) ifelse takes three arguments: A comparison: enem$TP_COR_RACA == "Nao" Result if…
Tags: r
-
4 votes
1 answer
51 views
A: How to return cells based on identifier in R
A relatively simple way to do this is with dplyr: tab <- tab %>% group_by(id) %>% summarise(clas = paste0(clas, collapse = "")) Result: > tab # A tibble: 4 × 2 id clas…
Tags: r
-
2 votes
2 answers
800 views
A: How to convert numbers into categories in R
dplyr has the recode function. Using the same example as @Carlos Cinelli: set.seed(10) exemplo_numero <- sample(0:5, 10, replace = TRUE) library(dplyr) recode(exemplo_numero, `0` = "branco",…
Tags: r
-
5 votes
1 answer
708 views
A: In R, when does a vector become "Too long"?
This has to do with the R source code. Note that the : function is defined in C here. There you can see that the error is raised under this condition: double r = fabs(n2 - n1); if(r >= R_XLEN_T_MAX)…
-
1 vote
1 answer
222 views
A: Improve performance for predictive model creation
There may be several reasons for the slowness: Slow algorithm. randomForest is not the fastest package: try ranger or Rborist. Source. xgboost is also extremely fast, and making some…
-
3 votes
1 answer
375 views
A: How to set initial guesses for the nls function in a power regression model?
This Cross Validated answer seems like a good solution. The suggestion is to take the log of both sides and fit a linear model. That would look something like this: y = b*(x^a) log(y) = log(b) +…
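The log-transform idea can be sketched as follows, in Python for illustration: taking logs of y = b*(x^a) gives the straight line log(y) = log(b) + a*log(x), so an ordinary least-squares fit on the logged data yields the slope a and the intercept log(b). The function name and the data are made up, and the sketch assumes all x and y values are strictly positive.

```python
import math

def power_law_start_values(xs, ys):
    """Fit log(y) = log(b) + a*log(x) by least squares and return (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    a = sxy / sxx                          # slope of the log-log line
    b = math.exp(mean_y - a * mean_x)      # back-transform the intercept
    return a, b

# Made-up data generated from y = 3 * x^2, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]
a, b = power_law_start_values(xs, ys)
print(round(a, 6), round(b, 6))
```

In R, the equivalent fit is lm(log(y) ~ log(x)); the slope and the exponentiated intercept can then be passed to nls as starting values.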
Tags: r