Using data frame values in ifelse

I have the following scenario:

Data frame - Resources (recursos):

recursos <- data.frame(Server = c("server01", "server02", "server03"),
                       stringsAsFactors = FALSE)

Data frame - Events (eventos):

eventos <- data.frame(Eventos = c("falha no server01", "erro no server01",
                                  "falha no server02", "erro no server03"),
                      stringsAsFactors = FALSE)
eventos <- dplyr::mutate(eventos, Server = "")

My goal is to populate the Server column of the eventos data frame using the servers listed in the recursos data frame. With the solution below I can fill it in for a single, hard-coded value, but that's still not what I need.

eventos$Server <- ifelse(grepl("server01", eventos$Eventos), "server01", "")

I thought about looping with for (server in recursos$Server) and replacing the literal pattern in grepl with the loop variable, but I couldn't get it to work that way. Could someone help me?

Thanks in advance.
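
The for loop idea from the question can work if the pattern passed to grepl is the loop variable and only the matching rows are overwritten. A minimal sketch, rebuilding the example data with character columns; fixed = TRUE is an assumption not in the question, used so each server name is matched literally rather than as a regular expression:

```r
# Example data from the question, with character (not factor) columns
recursos <- data.frame(Server = c("server01", "server02", "server03"),
                       stringsAsFactors = FALSE)
eventos <- data.frame(Eventos = c("falha no server01", "erro no server01",
                                  "falha no server02", "erro no server03"),
                      stringsAsFactors = FALSE)
eventos$Server <- ""

for (server in recursos$Server) {
  # TRUE for every event whose text contains this server name literally
  hit <- grepl(server, eventos$Eventos, fixed = TRUE)
  # Fill only the matching rows; all other rows keep their current value
  eventos$Server[hit] <- server
}

print(eventos)
```

If one server name is contained in another (for example, hypothetical servers server1 and server10), a later match overwrites an earlier one, so the order of recursos$Server matters in that case.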

2 answers

If the events always contain the server name in a predictable format, as in the given example, you can simply use the function regmatches together with a suitable regular expression to extract the substring of interest.

eventos <- dplyr::mutate(eventos, Server = regmatches(Eventos, regexpr('server[0-9]+', Eventos)))
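
Note that regmatches returns only the successful matches, so if any event has no server the result is shorter than the data frame and mutate raises an error. A base R sketch that keeps the same regular expression but writes NA for events without a match (the fourth event below is made up for illustration):

```r
eventos <- data.frame(Eventos = c("falha no server01", "erro no server01",
                                  "falha no server02", "evento sem servidor"),
                      stringsAsFactors = FALSE)

# Position of the first match in each event, or -1 when there is none
m <- regexpr("server[0-9]+", eventos$Eventos)

# Start with NA everywhere, then fill only the rows that matched;
# regmatches() returns the matched substrings in row order
eventos$Server <- NA_character_
eventos$Server[m != -1] <- regmatches(eventos$Eventos, m)

print(eventos)
```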

Alternatively, although it is more expensive computationally, you can compare each word of every message in Eventos against the list in recursos$Server, for example:

eventos <- dplyr::mutate(eventos,
                         Server = unlist(lapply(
                           # split each message into words
                           strsplit(as.character(Eventos), '\\s'),
                           # keep only the words that are known server names
                           function(x) x[which(x %in% recursos$Server)])))

This assumes, of course, that the server names in Eventos are written exactly as in recursos$Server and that each entry mentions exactly one server.
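
The unlist(lapply(...)) version has the same weakness as regmatches: an event with no known server yields character(0), which unlist silently drops, so the column length no longer matches. A sketch using vapply so that every event contributes exactly one value (the first matching word, or NA); the event "high server consumption" is an illustrative row, not from the question:

```r
recursos <- data.frame(Server = c("server01", "server02", "server03"),
                       stringsAsFactors = FALSE)
eventos <- data.frame(Eventos = c("falha no server01", "erro no server03",
                                  "high server consumption"),
                      stringsAsFactors = FALSE)

eventos$Server <- vapply(
  strsplit(eventos$Eventos, "\\s+"),   # split each message into words
  function(words) {
    hit <- words[words %in% recursos$Server]
    # Exactly one value per event: first known server, or NA if none
    if (length(hit) > 0) hit[1] else NA_character_
  },
  character(1))

print(eventos)
```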

  • Erikson, could I replace the regular expression with a column containing multiple servers? I ask because my server names are arbitrary: some start with s, others with x, others with d. There are many of them, and they are all in recursos$Server.

  • Vitor, if the last word in the Eventos column is always the server name, you can use this adaptation of Erikson's solution: dplyr::mutate(eventos, Server = regmatches(Eventos, regexpr('\\w+$', Eventos)))

  • Note that neither solution checks whether the server actually belongs to recursos. Adding that check would cost extra processing, which you can avoid if you don't need it.

  • Daniel, the problem is that I have 200 different server names, some starting with X, others with Y, others with B. That's why I stored the full server list in the Server column of the recursos data frame. Erikson's solution is very good and I'll keep the code for future needs, but in my scenario I just need to use this list of servers inside regmatches.

  • Unfortunately, in this case it is only possible to use a regular expression. My suggestion would be to write a more generic regular expression that covers all cases, or to use the function strsplit to compare the terms against the list in the data frame. I'll edit my answer to add that suggestion.

  • Erikson, suppose these are real entries from the data frames: eventos[3,] → "THE MEMORY HEALTH OF HOST D715WS074 IS NOT OK" and recursos[7,] → "D715WS074". Would you need to change anything in your code?

  • You don't need to change anything. If some entries in Eventos might not mention any server from recursos, you need to provide a default case.
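
On the question of using the column instead of a fixed regular expression: one option is to build the pattern from recursos$Server itself by joining the names with |. A minimal sketch, assuming the server names contain only letters and digits (no regex metacharacters) and appear as whole words in the messages; the \\b word boundaries keep a name from matching inside a longer word, and padrao is a hypothetical variable name. The data reuses names mentioned in this thread, and events that match no server get NA as the default case:

```r
recursos <- data.frame(Server = c("server01", "server02", "server03", "D715WS074"),
                       stringsAsFactors = FALSE)
eventos <- data.frame(Eventos = c("falha no server01", "erro no server02",
                                  "THE MEMORY HEALTH OF HOST D715WS074 IS NOT OK",
                                  "high server consumption"),
                      stringsAsFactors = FALSE)

# Alternation of every known server: \b(server01|server02|...)\b
padrao <- paste0("\\b(", paste(recursos$Server, collapse = "|"), ")\\b")

m <- regexpr(padrao, eventos$Eventos, perl = TRUE)
eventos$Server <- NA_character_          # default for events with no known server
eventos$Server[m != -1] <- regmatches(eventos$Eventos, m)

print(eventos)
```

If a server name could contain characters such as . or -, it would need escaping before being pasted into the pattern.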

Updating the solution:

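# One logical column per server: TRUE where the event text contains that server name
procura <- as.data.frame(
  sapply(as.character(recursos$Server),
         function(x) grepl(x, eventos$Eventos, fixed = TRUE)),
  stringsAsFactors = FALSE)

# Add a "Missing" column for events that match no server in `recursos`
procura$Missing <- FALSE
procura$Missing[rowSums(procura) == 0] <- TRUE

for (i in seq_len(nrow(eventos))) {
  # Take the first matching column: a server name, or "Missing"
  eventos$Server[i] <- colnames(procura)[which(procura[i, ] == TRUE)[1]]
}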

This works regardless of how many servers are listed in recursos$Server.
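
The same match matrix can also be reduced without the for loop. A sketch that takes, for each event, the first server whose name appears in the text and leaves NA (rather than "Missing") when none does; the apply/which combination and the name primeiro are illustrative choices, not part of the answer above:

```r
recursos <- data.frame(Server = c("server01", "server02", "server03"),
                       stringsAsFactors = FALSE)
eventos <- data.frame(Eventos = c("falha no server01", "erro no server01",
                                  "falha no server02", "high server consumption"),
                      stringsAsFactors = FALSE)

# Rows = events, columns = servers; TRUE where the event mentions the server
procura <- sapply(recursos$Server,
                  function(s) grepl(s, eventos$Eventos, fixed = TRUE))

# Index of the first TRUE in each row; which() on an all-FALSE row gives
# integer(0), and [1] turns that into NA
primeiro <- apply(procura, 1, function(linha) which(linha)[1])

eventos$Server <- recursos$Server[primeiro]   # an NA index gives an NA server
print(eventos)
```

This sketch assumes eventos has at least two rows; with a single event, sapply returns a plain vector instead of a matrix.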

  • Thebiro, thanks for the help. The only problem is that your solution fits my example perfectly, but in my real scenario there are around 200 servers. Is there a way to include a for loop in your solution?

  • Got it, I'll rewrite the solution then. Can I assume that recursos contains every server needed, each listed once?

  • Yes, Thebiro, that's right. Imagine recursos$Server has 200 rows with 200 different servers, with arbitrary names.

  • Basically, for each row of eventos$Eventos, I look through recursos$Server and check whether the row contains one of the 200 servers. If it does, I fill the eventos$Server column with the identified server name.

  • I've updated it; run the tests... if recursos$Server has the correct entries, it will work.

  • Thebiro, it returned an error: Error in eventos$Server[i] <- colnames(procura)[which(procura[i, ] == T)] : replacement has length zero

  • Strange, I ran it on your example without any error.

  • The difference is that the eventos data frame contains the actual events from my environment: a list of 200 miscellaneous strings such as "server failure", "server error", "high server consumption". And in the recursos data frame, I expanded the list with all the servers I own.

  • But for it to work, the events in eventos must contain the strings from recursos exactly as written.

  • Yes, and they do. A real example: eventos[3,] → "THE MEMORY HEALTH OF HOST D715WS074 IS NOT OK" and recursos[7,] → "D715WS074".

  • I think what's happening is that there are values in recursos that don't appear in eventos, but it's easy to fix... I'll update again.

  • Thebiro, that can certainly happen. My server list includes all the servers I own, and not all of them generate events.

  • It's pretty ugly, but now I think it will work!

  • Thebiro, you're the man, my friend! It worked perfectly!!! Thank you very much!!!
