How to check whether an ArrayList contains duplicate items and remove them?

Question

Asked 10 years, 9 months ago

Viewed 1,071 times

I am tokenizing a TXT file.

I need the code to hold all the tokens in an ArrayList, but the list must not contain duplicate tokens.

I would like to know how to remove duplicate tokens, or how to check whether a token already exists and, in that case, not add it.

My current code:

for (org.cogroo.text.Token token : sentence.getTokens()) { // list of tokens

    token.getStart(); token.getEnd(); // characters where the token starts and ends
    token.getLexeme(); // the token's text (the word it splits off, e.g. "clinico")
    token.getLemmas(); // an array with the possible lemmas for the lexeme+postag pair
    token.getPOSTag(); // part-of-speech class according to the context (e.g. prp, adj, n (noun))
    token.getFeatures(); // gender, number, tense, etc.
    contadorTokens++;
    System.out.println(expandirAcronimos(token.getLexeme()) + "_" + token.getPOSTag() + "_" + token.getFeatures()); // prints the word with its tag
    gravarArq.println(token.getLexeme() + "_" + token.getPOSTag() + "_" + token.getFeatures()); // writes each tokenized word to the txt file
    gravarArquivo.println(token.getPOSTag() + "_" + token.getFeatures()); // writes each token to the "Tokens.txt" file

    listaTokens.add(token.getPOSTag()); // ADDS the tags to a list

    for (int s = 0; s < listaTokens.size(); s++) { // ITERATES OVER THE LIST
        if (!listaTokens.equals(token.getPOSTag())) {

        }
    }
}

mightduck, for your own sanity and to make things easier for whoever helps you, it is essential to indent the code logically. A good IDE helps with this. . . . Maybe mgibson's answer already solves it, but the closing brace of the first for is missing...

– brasofilo

2014/11/08 at 15:50

1 answer

by mgibsonbr • **80,631** points · Answer 1 · 2014-11-08T15:24:24+00:00

To store elements without repetition, the ideal is to use a "set" data type instead of a "list". I suggest HashSet, or LinkedHashSet if the order of the tokens must be preserved:

Set conjuntoTokens = new HashSet(); // Can be generic, i.e. Set<Type>

for (org.cogroo.text.Token token : sentence.getTokens()) { // list of tokens
    ...

    //listaTokens.add(token.getPOSTag()); // ADDS the tags to a list
    boolean mudou = conjuntoTokens.add(token.getPOSTag()); // adds the tags to the set
                                                           // instead of the list
    if ( !mudou ) {
        ... // The element already existed in the set
    }
}

listaTokens.addAll(conjuntoTokens); // adds all the elements of the set to the list
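
For anyone who wants to try the idea without CoGrOO on the classpath, here is a minimal self-contained sketch. It assumes the tags are plain strings; the class name DedupTags, the helper distinctTags, and the sample tag values are made up for illustration and are not part of the original code. It uses LinkedHashSet so that the first-seen order is kept, and relies on Set.add returning false when the element was already present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class: plain strings stand in for the POS tags.
class DedupTags {

    // Returns the distinct tags in the order they were first seen.
    static List<String> distinctTags(List<String> tags) {
        Set<String> seen = new LinkedHashSet<>();
        for (String tag : tags) {
            // add() returns false when the tag was already in the set
            boolean changed = seen.add(tag);
            if (!changed) {
                // duplicate: the set is left untouched, nothing else to do
            }
        }
        // Copy the set into a list, as the addAll call above does
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("n", "adj", "n", "prp", "adj");
        System.out.println(distinctTags(tags)); // prints [n, adj, prp]
    }
}
```

If the list has to stay an ArrayList the whole time, checking listaTokens.contains(tag) before each add also works, but every check scans the whole list, so the set is usually the better choice for larger files.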