How to create a filter from words/phrases of interest to select particular vacancies from a "List"?

In my example I have two classes, SetorInteresse and Vaga. Their structure is shown below:

SetorInteresse class:

public class SetorInteresse {

    private List<String> setores;

    public SetorInteresse(List<String> setores) {
        this.setores = setores;
    }

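    // Initializes the list so addPalavra does not throw a NullPointerException
    public SetorInteresse() { this.setores = new ArrayList<>(); }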

    public void addPalavra(String palavra) {  setores.add(palavra); }

    public void removePalavra(String palavra) { setores.remove(palavra); }

    public List<String> getSetores() { return setores; }
}

Vaga class:

public class Vaga {
    private String tituloVaga;
    private String setor;
    private String funcao;    

    public Vaga(String tituloVaga, String setor, String funcao) {
        this.tituloVaga = tituloVaga;
        this.setor = setor;
        this.funcao = funcao;        
    }

    public Vaga() { }   

    public String getDescricaoVaga() {
        return tituloVaga;
    }

    public void setDescricaoVaga(String tituloVaga) { this.tituloVaga = tituloVaga; }

    public String getSetor() { return setor; }

    public void setSetor(String setor) { this.setor = setor; }

    public String getFuncao() { return funcao; }

    public void setFuncao(String funcao) { this.funcao = funcao; }   
}

Below I have two methods: one populates the variable vagas, of type List<Vaga>, and the other populates the attribute setores of the SetorInteresse object:

Method populating the variable vagas:

List<Vaga> vagas = criaVagas();
...
static List<Vaga> criaVagas() { 
    List<Vaga> vagas = new ArrayList<>();
    vagas.add(new Vaga("Desenvolvedor Java", "Tecnologia da Informação", "Desenvolvedor"));       
    vagas.add(new Vaga("Desenvolvedor C# e Web", "Tecnologia da Informação", "Desenvolvedor"));
    vagas.add(new Vaga("Motorista Carreteiro", "Logistica", "Motorista"));       
    vagas.add(new Vaga("Gerente de Sistemas", "Tecnologia da Informação", "Desenvolvedor"));
    vagas.add(new Vaga("Estágiario Tecnologia da Informação", "Tecnologia da Informação", "Estágiario"));       
    vagas.add(new Vaga("Analista de Sistemas", "Tecnologia da Informação", "Analista"));              
    vagas.add(new Vaga("Suporte Técnico", "Suporte", "Suporte"));              
    vagas.add(new Vaga("Gerente Comercial", "Departamento Administrativo", "Gerente"));       
    vagas.add(new Vaga("Assistente de Recursos Humanos", "Recursos Humanos RH", "Assistente"));
    return vagas;
}

Method populating the attribute setores:

SetorInteresse setorInteresse = criaSetorInteresse();
...
static SetorInteresse criaSetorInteresse() { 
    SetorInteresse setorInteresse = new SetorInteresse();
    setorInteresse.addPalavra("Desenvolvimento de programas");        
    setorInteresse.addPalavra("Tecnologia da informação e serviços");
    setorInteresse.addPalavra("Análise de sistemas");        
    return setorInteresse;
}

Given the data in the two variables vagas and setorInteresse, is there a way to create a filter that returns only the objects in the vagas list whose setor attribute relates to any of the words or phrases in the setores attribute? Or is there an alternative to this?

Example: if I have the following value in my setores attribute:

Desenvolvimento de programas

I would get all objects of type Vaga whose setor attribute is related to Desenvolvimento de programas. In this case the vacancies I would receive would be:

Desenvolvedor Java
Desenvolvedor C# e Web
Gerente de Sistemas
Estágiario Tecnologia da Informação
Analista de Sistemas
Suporte Técnico

That way, the vacancies displayed would match the interests defined in the setores attribute.

Is there a way to create a filter that gives me those results, or is there a library that does this for me? I would also like to know what criteria I should use to relate the words/phrases, and how to define them, if necessary.

  • To search intelligently you will probably have to query by similarity, like Google does when it shows the message "Did you mean ...". First you would have to break the phrase into words with split and drop the words of 2 or 3 characters (de, com, do, em), then use an algorithm such as Levenshtein or Hamming distance to compare word by word. Every word in the sector of interest would need a similar word in the vacancy, though the vacancy may have extra words, I believe.

  • Would the filter receive free text with the sector to be considered, or would the sector be selected from predefined options? I ask because, if it is the second alternative, there is a much more elegant way of solving your problem.

  • @Giulianabezerra the intention is to analyze the text and feed the result of that analysis into a filter. However, if that analysis is too complex, I could define a few words for the filter... I'm looking for alternatives :)

  • Are you using Java 8? :P

  • @Giulianabezerra yes, I forgot to mention.

3 answers

6

Since you are using Java 8, you can use lambda expressions with the Stream API. An example of how to use them:

List<Vaga> vagas = criaVagas();
List<Vaga> vagasFiltradas = vagas.stream()
        .filter(vaga -> vaga.getSetor().contains("Tecnologia"))
        .collect(Collectors.toList());

The snippet above applies a filter, which you can define however you want, to every element of the list. The filter I chose as an example takes each vaga object from the vagas collection and checks whether the sector of that vacancy matches the sector you are filtering by. The check looks for the word "Tecnologia" anywhere in the setor string of the vacancy. I used "Tecnologia" just to illustrate; in this case the result would be:

Desenvolvedor Java
Desenvolvedor C# e Web
Gerente de Sistemas
Estágiario Tecnologia da Informação
Analista de Sistemas

This filter can be made more sophisticated, for example by taking other vacancy information into account to narrow the results further.
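As a rough sketch of such a filter, the hard-coded "Tecnologia" could be replaced by a list of keywords combined with anyMatch. The names KeywordFilter and filterBySetor, the case-insensitive comparison, and the trimmed-down Vaga class are assumptions made only to keep the example self-contained; they are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class KeywordFilter {

    // Trimmed-down stand-in for the Vaga class from the question.
    static class Vaga {
        private final String tituloVaga;
        private final String setor;

        Vaga(String tituloVaga, String setor) {
            this.tituloVaga = tituloVaga;
            this.setor = setor;
        }

        String getTituloVaga() { return tituloVaga; }
        String getSetor() { return setor; }
    }

    // Keeps the vacancies whose sector contains at least one keyword (case-insensitive).
    static List<Vaga> filterBySetor(List<Vaga> vagas, List<String> keywords) {
        return vagas.stream()
                .filter(vaga -> keywords.stream()
                        .anyMatch(k -> vaga.getSetor().toLowerCase(Locale.ROOT)
                                .contains(k.toLowerCase(Locale.ROOT))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Vaga> vagas = Arrays.asList(
                new Vaga("Desenvolvedor Java", "Tecnologia da Informação"),
                new Vaga("Motorista Carreteiro", "Logistica"),
                new Vaga("Suporte Técnico", "Suporte"));

        // Keywords are hypothetical; in the question they would come from SetorInteresse.getSetores().
        List<Vaga> filtradas = filterBySetor(vagas, Arrays.asList("tecnologia", "suporte"));
        filtradas.forEach(v -> System.out.println(v.getTituloVaga()));
    }
}
```

With these three sample vacancies, the two keywords select "Desenvolvedor Java" and "Suporte Técnico" and leave out "Motorista Carreteiro". Substring matching only works when the keyword literally appears in the sector, so whole phrases such as "Desenvolvimento de programas" would match nothing here.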

3

My suggestion is to split your code further by creating 3 classes:

public class Setor {
    private int codigoSetor;
    private String descricaoSetor;

    // ... getters and setters
}

public class Funcao {
    private int codigoFuncao;
    private String descricaoFuncao;

    // ... getters and setters
}

public class Vaga {
    private String tituloVaga;
    private int codigoSetor;
    private int codigoFuncao;

    public Vaga(String tituloVaga, int setor, int funcao) {
        this.tituloVaga = tituloVaga;
        this.codigoSetor = setor;
        this.codigoFuncao = funcao;
    }

    // ... getters and setters
}

Create the sectors, functions and vacancies with identifier codes:

SetorInteresse setorInteresse = criaSetorInteresse();

// save(int, String) is assumed to store a Setor with the given code and description
static SetorInteresse criaSetorInteresse() { 
    SetorInteresse setorInteresse = new SetorInteresse();
    setorInteresse.save(1, "Desenvolvimento de programas");        
    setorInteresse.save(2, "Tecnologia da informação e serviços");
    setorInteresse.save(3, "Logistica");
    ...

    return setorInteresse;
}

// Returns the description of the sector with the given code, or null if there is none
static String getSetorInteresse(int codigo) { 
    SetorInteresse setorInteresse = criaSetorInteresse();
    // getSetores() is assumed to return List<Setor> in this design
    for (Setor setor : setorInteresse.getSetores()) {
        if (setor.getCodigoSetor() == codigo) {
            return setor.getDescricaoSetor(); 
        }
    }
    return null;
}


FuncaoInteresse funcaoInteresse = criaFuncaoInteresse();

static FuncaoInteresse criaFuncaoInteresse() { 
    FuncaoInteresse funcaoInteresse = new FuncaoInteresse();
    funcaoInteresse.save(1, "Desenvolvedor");        
    funcaoInteresse.save(2, "Estágiario");
    funcaoInteresse.save(3, "Motorista");    
    ...

    return funcaoInteresse;
}

// Returns the description of the function with the given code, or null if there is none
static String getFuncao(int codigo) { 
    FuncaoInteresse funcaoInteresse = criaFuncaoInteresse();
    for (Funcao funcao : funcaoInteresse.getFuncoes()) {
        if (funcao.getCodigoFuncao() == codigo) {
            return funcao.getDescricaoFuncao(); 
        }
    }
    return null;
}

List<Vaga> vagas = criaVagas();
...
static List<Vaga> criaVagas() { 
    List<Vaga> vagas = new ArrayList<>();
    vagas.add(new Vaga("Desenvolvedor Java", 2, 1));       
    vagas.add(new Vaga("Desenvolvedor C# e Web", 2, 1));
    vagas.add(new Vaga("Motorista Carreteiro", 3, 3));
    ...

    return vagas;
}

When you need the descriptions, call the methods getSetorInteresse and getFuncao with the codes that were associated with the inserted vacancies.
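One possible variation on the lookup above, shown only as a sketch: keep the code-to-description table in a Map and filter the vacancies by sector code. The names CodigoLookup, criaSetores and titulosDoSetor are hypothetical, and the reduced Vaga class exists only to make the example compile on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CodigoLookup {

    // Stand-in for the code-based Vaga of this answer, reduced to what the example needs.
    static class Vaga {
        final String tituloVaga;
        final int codigoSetor;

        Vaga(String tituloVaga, int codigoSetor) {
            this.tituloVaga = tituloVaga;
            this.codigoSetor = codigoSetor;
        }
    }

    // Code -> description table, used instead of a linear search over Setor objects.
    static Map<Integer, String> criaSetores() {
        Map<Integer, String> setores = new HashMap<>();
        setores.put(1, "Desenvolvimento de programas");
        setores.put(2, "Tecnologia da informação e serviços");
        setores.put(3, "Logistica");
        return setores;
    }

    // Titles of the vacancies that belong to the given sector code.
    static List<String> titulosDoSetor(List<Vaga> vagas, int codigoSetor) {
        return vagas.stream()
                .filter(v -> v.codigoSetor == codigoSetor)
                .map(v -> v.tituloVaga)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Vaga> vagas = Arrays.asList(
                new Vaga("Desenvolvedor Java", 2),
                new Vaga("Desenvolvedor C# e Web", 2),
                new Vaga("Motorista Carreteiro", 3));

        System.out.println(criaSetores().get(2) + ": " + titulosDoSetor(vagas, 2));
    }
}
```

Comparing integer codes makes the filter exact, but it only helps when every vacancy already carries a sector code.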

  • I liked the answer. However, if I have to associate codes I will run into a problem in the implementation: the data comes from an XML file, so I would have to keep checking which code is associated with a sector or function whenever I add something new. I would really like a way to analyze the words/phrases and see which one matches which.

1


The big problem is that the relationship between the vacancy and the sector is essentially a semantic one, something computers are not very good at. However, you can get some satisfactory results with symbolic manipulation alone, something computers are excellent at.

Assuming the description of a SetorInteresse is at least symbolically similar to the sector of a vacancy, we could use the Levenshtein distance algorithm to calculate the similarity between the vacancy's sector and the sector of interest. We could then define a maximum acceptable distance and filter the vacancies using this criterion.
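For reference, the Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The class below is a minimal standalone sketch of the usual dynamic-programming formulation; the name Levenshtein and the method distance are illustrative and are not the library call used in this answer.

```java
class Levenshtein {

    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b (two-row dynamic programming).
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            // The row just computed becomes the previous row of the next iteration.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(distance("Suporte", "Suporte")); // prints 0
    }
}
```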

A practical example:

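// StringUtils comes from Apache Commons Lang (org.apache.commons.lang3.StringUtils)
String department = "Desenvolvimento de programas"; // sector of interest
final int DISTANCE_THRESHOLD = 20;                  // maximum accepted distance

List<Vaga> vagas = criaVagas();

List<Vaga> filtered = vagas.stream().filter(
        v -> StringUtils.getLevenshteinDistance(department, v.getSetor()) <= DISTANCE_THRESHOLD
).collect(Collectors.toList());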

If the sector of interest is "Desenvolvimento de programas", the following vacancies are returned with a tolerance limit of 20:

1. Desenvolvedor Java
2. Desenvolvedor C# e Web
3. Gerente de Sistemas
4. Estagiário Tecnologia da Informação 
5. Analista de Sistemas 
6. Gerente Comercial

The drawback of solving this only at the syntactic level is that some strings may have a small distance even when their meaning has nothing to do with what was expected, as was the case with the Gerente Comercial vacancy, whose sector had the smallest distance of all.

Also, a single universal tolerance will not be very effective. Most likely you will have to define an individual tolerance for each department. A possible heuristic is to treat the tolerance as a function of the length of the compared strings. For example, the Levenshtein distance must stay below 80 or 90 percent of the length of the longer string.
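A minimal sketch of that length-relative heuristic follows. It assumes the tolerance is expressed as a ratio (maxRatio) of the longer string's length; the class RelativeTolerance, its method names and the self-contained distance implementation are all illustrative choices, not part of the library mentioned in this answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class RelativeTolerance {

    // Plain Levenshtein distance (full-matrix dynamic programming).
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // True when the distance is at most maxRatio times the length of the longer string.
    static boolean isSimilar(String a, String b, double maxRatio) {
        int longest = Math.max(a.length(), b.length());
        return distance(a, b) <= maxRatio * longest;
    }

    // Keeps the sectors that are similar enough to the sector of interest.
    static List<String> filter(List<String> setores, String interesse, double maxRatio) {
        return setores.stream()
                .filter(s -> isSimilar(interesse, s, maxRatio))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> setores = Arrays.asList("Tecnologia da Informação", "Logistica", "Suporte");
        // 0.8 mirrors the "80 percent" mentioned above; the right value has to be tuned on real data.
        System.out.println(filter(setores, "Desenvolvimento de programas", 0.8));
    }
}
```

Scaling the limit by string length avoids the situation where a fixed threshold such as 20 accepts almost any short sector name.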


The method that calculates the Levenshtein distance, along with other string distance methods, is available in the Apache Commons Lang library, which can be added as a Gradle dependency:

compile 'org.apache.commons:commons-lang3:3.4'
