How to create a regex to filter and delete files with a particular chunk in the name

Viewed 1,038 times

3

I'm trying to find a way to delete the duplicate files that Windows creates when you copy a file multiple times. I managed to get something working with the code below:

import java.util.*;
import java.io.*;


public class FileCreator{

    public static void main(String[] args) throws Exception{

        File f = new File(".");
        File[] files = f.listFiles();

        for(File fl : files){
            String fileName = fl.getName();
            if(fileName.contains("Copia - Copia")){
                System.out.println(fileName);
            }

        }
    }
}

I created some test files, and the output was:

C:\Users\diego\Desktop\Nova pasta>java FileCreator
File 0 - Copia - Copia.txt
File 10 - Copia - Copia.txt
File 12 - Copia - Copia.txt
File 14 - Copia - Copia.txt
File 16 - Copia - Copia.txt
File 18 - Copia - Copia.txt
File 2 - Copia - Copia.txt
File 4 - Copia - Copia.txt
File 6 - Copia - Copia.txt
File 8 - Copia - Copia.txt

This approach would almost work for me, since I could just replace the println inside the condition with a simple fl.delete();, but I would like to have more control over what gets deleted, using a regex.

I started on something like the code below, but I couldn't come up with a regex that detects "Copia - Copia" exactly at the end of the file name so that the file can then be deleted.

    Pattern p = Pattern.compile("");
    Matcher m;

    f = new File(".");
    File[] files = f.listFiles();

    for(File fl : files){
        String fileName = fl.getName();
        m = p.matcher(fileName);
        if(m.find()){
            //fl.delete();
            System.out.println(fileName + " deletado");
        }
    }

How do I write a regex that matches these files?

Note: detecting the extension is irrelevant. I just need to detect the Copia - Copia that Windows appends to the end of the file name when it renames duplicates of duplicates.

If possible, I would also like to understand how the expression works.

  • Why don't you just use: filename.split(".")[0].substring(filename.length-13, filename.length) == "Copia - Copia"

  • I found the expression here: http://stackoverflow.com/questions/406230/regular-expression-to-match-line-that-doesnt-contain-a-word

  • @Douglas I wanted to learn to do it with regex because I also have an automated program here, where I just add a regex and set up some rules in its window and it does the rest. I know how to do it with string methods, but I may not have permission to run my own program in the folder where this will be used (a work network folder :/)

  • @diegofm see if the explanation is clear in http://answall.com/a/167965/3635

4 answers

2


The regex can look like this: Copia - Copia\\.[^.]+$

Explanation:

Copia - Copia\\.[^.]+$
^                ^   ^
1                2   3
  1. Copia - Copia\\. is the part you want to find; \\. matches the literal dot before the extension.

  2. Inside [...], a leading ^ means negation: [^.] matches any character that is not a dot. With the + after it, the extension can be any run of one or more characters, as long as it contains no further dot.

  3. The $ anchors the match at the end of the file name (the String), so the name must end with Copia - Copia.[any extension]

Alternatively you can use C[oó]pia - C[oó]pia\\.[^.]+$ to cover names both with and without the accent; note that the result can vary depending on how the accented character is encoded in Unicode.

Usage would look something like final Pattern regex = Pattern.compile("C[oó]pia - C[oó]pia\\.[^.]+$");
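The Unicode caveat can be made concrete. An accented "ó" may be stored either as one precomposed code point (U+00F3) or as a plain "o" followed by a combining acute accent (U+0301), and the character class only matches the first form. The following is a minimal sketch, assuming you normalize the name to NFC before matching; the class and method names are made up for illustration.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class AccentedCopyMatcher {
    // The character class accepts a plain "o" or the precomposed accented form (U+00F3).
    private static final Pattern COPY_OF_COPY =
            Pattern.compile("C[o\u00F3]pia - C[o\u00F3]pia\\.[^.]+$");

    // Hypothetical helper: normalize to NFC so a decomposed accent becomes one code point.
    static boolean isCopyOfCopy(String fileName) {
        String normalized = Normalizer.normalize(fileName, Normalizer.Form.NFC);
        return COPY_OF_COPY.matcher(normalized).find();
    }

    public static void main(String[] args) {
        String precomposed = "File 1 - C\u00F3pia - C\u00F3pia.txt";
        String decomposed = "File 1 - Co\u0301pia - Co\u0301pia.txt";

        System.out.println(isCopyOfCopy(precomposed));          // true
        System.out.println(isCopyOfCopy(decomposed));           // true, thanks to NFC
        System.out.println(isCopyOfCopy("File 1 - Copia.txt")); // false, only one "Copia"
    }
}
```

Without the normalization step, the decomposed name would not match, because the combining accent sits between "o" and "pia".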

An example with List<String> to test:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Exemplo
{
    public static void main(String[] args)
    {
        final Pattern regex = Pattern.compile("Copia - Copia\\.[^.]+$");

        List<String> files = new ArrayList<String>();

        files.add("File 123 - Copia.txt");
        files.add("File 10 - Copia - Copia.java");
        files.add("File 12 - Copia.java");
        files.add("File 14 - Copia - Copia.txt");
        files.add("File 16 - Copia.txt");
        files.add("File 18 - Copia - Copia.log");
        files.add("File 2 - Copia.txt");
        files.add("File 4 - Copia.log");
        files.add("File 6 - Copia - Copia.txt");
        files.add("File 8 - Copia.txt");

        for (String file : files)
        {
            if (regex.matcher(file).find())
            {
                System.out.println("Encontrado: " + file);
            }
        }
    }
}

Example in http://ideone.com/Ph7CJC
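To connect this back to the original goal of deleting files, here is a rough sketch that applies the same pattern to a real folder. It is only one possible arrangement, not something taken from the question: the class name, the shouldDelete and clean helpers, and the --delete switch are all assumptions made for illustration. By default it only lists what it would delete.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CopyOfCopyCleaner {
    // Compiled once and reused for every file name.
    private static final Pattern COPY_OF_COPY = Pattern.compile("Copia - Copia\\.[^.]+$");

    // Hypothetical helper: true when the name ends with "Copia - Copia.<extension>".
    static boolean shouldDelete(String fileName) {
        return COPY_OF_COPY.matcher(fileName).find();
    }

    // Collects matching regular files in dir; deletes them only when dryRun is false.
    static List<Path> clean(Path dir, boolean dryRun) throws IOException {
        List<Path> matched = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && shouldDelete(name)) {
                    matched.add(entry);
                }
            }
        }
        if (!dryRun) {
            for (Path path : matched) {
                Files.delete(path); // throws if the file cannot be removed
            }
        }
        return matched;
    }

    public static void main(String[] args) throws IOException {
        // Assumed convention: nothing is deleted unless "--delete" is passed.
        boolean dryRun = args.length == 0 || !args[0].equals("--delete");
        for (Path path : clean(Paths.get("."), dryRun)) {
            System.out.println((dryRun ? "Would delete: " : "Deleted: ") + path.getFileName());
        }
    }
}
```

Running it once without arguments and checking the list before passing --delete is a cheap safeguard, since a deleted file does not go to the recycle bin.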


It is also possible to use String.matches, but then you need to add .* at the front, because matches only returns true when the whole string matches the expression, so the pattern becomes .*Copia - Copia\\.[^.]+$. However, as @Victorstafusa said, this may hurt performance a little, depending on how many times it runs (I haven't been able to confirm this yet).
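The difference comes from what each method checks: Matcher.find looks for the pattern anywhere in the input, while String.matches requires the entire input to match. A small sketch to see this side by side (the class and helper names are illustrative only):

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final String SUFFIX = "Copia - Copia\\.[^.]+$";

    // find(): the pattern may start anywhere in the name.
    static boolean viaFind(String name) {
        return Pattern.compile(SUFFIX).matcher(name).find();
    }

    // matches(): the pattern must cover the whole name.
    static boolean viaMatches(String name) {
        return name.matches(SUFFIX);
    }

    // matches() with a leading .* to absorb everything before the suffix.
    static boolean viaMatchesWithPrefix(String name) {
        return name.matches(".*" + SUFFIX);
    }

    public static void main(String[] args) {
        String name = "File 10 - Copia - Copia.txt";
        System.out.println(viaFind(name));              // true
        System.out.println(viaMatches(name));           // false
        System.out.println(viaMatchesWithPrefix(name)); // true
    }
}
```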

It would look something like:

for (String file : files)
{
    if (file.matches(".*Copia - Copia\\.[^.]+$"))
    {
        System.out.println("Encontrado: " + file);
    }
}

Explanation:

.*Copia - Copia\\.[^.]+$
^  ^               ^   ^
1  2               3   4

  1. .* matches any character(s) that come before the desired text

  2. Copia - Copia\\. is the part you want to find

  3. Inside [...], a leading ^ means negation: [^.] matches any character that is not a dot. With the + after it, the extension can be any run of one or more characters, as long as it contains no further dot.

  4. The $ anchors the match at the end of the file name (the String), so the name must end with [any characters]Copia - Copia.[any extension]

Alternatively you can use .*C[oó]pia - C[oó]pia\\.[^.]+$ to cover names both with and without the accent; note that the result can vary depending on how the accented character is encoded in Unicode.

Usage would look something like if (file.matches(".*C[oó]pia - C[oó]pia\\.[^.]+$")) {

  • 1

    In fact, String.matches uses Pattern under the covers (1, 2 and 3). When you use the Pattern directly, it is compiled only once. With matches inside a loop, it is (re)compiled on every call, which performs worse.

  • 1

    @Victorstafusa I'll adjust it. I believed that Java, like other languages, kept a cache of Pattern (PHP is one example), but given the doubts (version and compatibility questions) I followed your tips; edited.
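The point of these comments can be sketched as follows: compile the expression once, outside the loop, and reuse it for every name. This is an illustrative arrangement only; the class name and the countMatches helper are assumptions, and no timing is claimed here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledPattern {
    // Compiled a single time when the class is loaded.
    private static final Pattern FULL_NAME = Pattern.compile(".*Copia - Copia\\.[^.]+$");

    // Same result as name.matches(".*Copia - Copia\\.[^.]+$") for each name,
    // but without compiling the expression again on every iteration.
    static long countMatches(List<String> names) {
        long count = 0;
        for (String name : names) {
            if (FULL_NAME.matcher(name).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "File 10 - Copia - Copia.java",
                "File 12 - Copia.java",
                "File 14 - Copia - Copia.txt");
        System.out.println(countMatches(names)); // 2
    }
}
```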

2

You can use the following regular expression:

Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");

Where:

  • Copia - Copia is the text you are looking for;
  • \\. is the literal character . (a dot). In the regex itself it would be just \. but since the expression is inside a Java string, the backslash has to be escaped once more;
  • [a-zA-Z] delimits that the character must be between a and z or A and Z;
  • {3,4} is the number of characters, which must be 3 or 4;
  • $ means the end of the string.

That is to say:

Search for the text Copia - Copia followed by a . and 3 or 4 letters from a to z (upper or lower case) at the end of the string.
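A quick way to see what this stricter expression accepts and rejects is to run it against a few names. The sample names below are invented for the test, and the class and helper names are illustrative.

```java
import java.util.regex.Pattern;

public class LetterExtensionMatcher {
    private static final Pattern COPY_WITH_LETTER_EXT =
            Pattern.compile("Copia - Copia\\.[a-zA-Z]{3,4}$");

    static boolean matchesName(String fileName) {
        return COPY_WITH_LETTER_EXT.matcher(fileName).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesName("File 1 - Copia - Copia.txt"));    // true, 3 letters
        System.out.println(matchesName("File 1 - Copia - Copia.java"));   // true, 4 letters
        System.out.println(matchesName("File 1 - Copia - Copia.c"));      // false, only 1 letter
        System.out.println(matchesName("File 1 - Copia - Copia.mp3"));    // false, contains a digit
        System.out.println(matchesName("File 1 - Copia - Copia.tar.gz")); // false, two-part extension
    }
}
```

So this version trades flexibility for strictness: extensions with digits, or with fewer than 3 or more than 4 letters, are left alone.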

  • Instead of [.], use \\.

  • @Victorstafusa what's the difference in this case?

  • It's that the notation starting with \\ is the standard way of escaping, while [] is meant for grouping characters. I mean, \\. works because it was designed to work that way, whereas [.] works more by luck, coincidence or accident. For example, [-] would no longer work, while \\- would keep working.

  • @Victorstafusa understood, I'll change the answer.
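For the dot specifically, the options discussed in these comments can be compared directly. The sketch below also uses Pattern.quote, which is not mentioned in the thread but is another standard way to treat a piece of text literally; the class and helper names are made up.

```java
import java.util.regex.Pattern;

public class EscapingTheDot {
    // True when the whole input matches the given expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // An unescaped dot matches any character, so "x" is accepted in its place.
        System.out.println(fullMatch("Copia.txt", "Copiaxtxt"));   // true

        // Both the escaped dot and the one-character class accept only a real dot.
        System.out.println(fullMatch("Copia\\.txt", "Copiaxtxt")); // false
        System.out.println(fullMatch("Copia[.]txt", "Copiaxtxt")); // false
        System.out.println(fullMatch("Copia\\.txt", "Copia.txt")); // true
        System.out.println(fullMatch("Copia[.]txt", "Copia.txt")); // true

        // Pattern.quote turns the whole text into a literal.
        System.out.println(fullMatch(Pattern.quote("Copia.txt"), "Copia.txt")); // true
        System.out.println(fullMatch(Pattern.quote("Copia.txt"), "Copiaxtxt")); // false
    }
}
```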

1

Use this expression:

(Copia - Copia)

Parentheses define a group of characters to be 'captured' from the string.

Go to http://regexr.com/3eoeg to see it working.

  • Following @Cleitonoliveira's comment, this regex only worked in Java when using just (Copia - Copia), removing the slashes and the g.
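One thing worth checking with this expression is that it is not anchored, so with find it matches the text anywhere in the name, not only right before the extension. A small sketch comparing it with an end-anchored expression (the sample names and the class and helper names are invented for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnanchoredGroup {
    private static final Pattern ANYWHERE = Pattern.compile("(Copia - Copia)");
    private static final Pattern AT_END = Pattern.compile("Copia - Copia\\.[^.]+$");

    static boolean foundAnywhere(String name) {
        return ANYWHERE.matcher(name).find();
    }

    static boolean foundAtEnd(String name) {
        return AT_END.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {
            "File 2 - Copia - Copia.txt",
            "Copia - Copia notes.txt",
            "File 2 - Copia.txt"
        };
        for (String name : names) {
            System.out.println(name + " -> anywhere=" + foundAnywhere(name)
                    + ", atEnd=" + foundAtEnd(name));
        }

        // The parentheses make the matched text available as group 1.
        Matcher matcher = ANYWHERE.matcher(names[0]);
        if (matcher.find()) {
            System.out.println("Captured: " + matcher.group(1));
        }
    }
}
```

The second name matches the unanchored expression but not the anchored one, which matters if the match decides whether a file gets deleted.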

0

// String.replace treats its argument literally, so the regex parentheses are dropped
String trechoParaRemover = "Copia - Copia";
fileName = fileName.replace(trechoParaRemover, "");

Only this would solve your case, I do not believe you need a regular expression or delete().

  • 1

    I guess you didn't read the question. I need to identify the pattern (with regex) and delete the file in the folder, not rename the file.

  • I'm sorry I wasn't clearer. A simple and efficient approach is to rename the file with the result of the replace. I read that you want to do it with regex in order to master it. @Douglas used the best site for this, regexr. In your case, simply put the regex he indicated on this line: Pattern p = Pattern.compile("(Copia - Copia)"); Personally I prefer quotes instead of slashes, and the "g" is unnecessary in your case, since there is no need for a global search over such a small string.

  • Wow, it's true, after removing the slashes and the /g the regex works correctly. I'm a real novice with regex, I thought the /g was there to filter something else at the end of the text. Live and learn, thanks for the clarification :D

  • @diegofm the end of the string is $

  • @Sorack in his case, the string still contains .txt at the end. The $ alone will not capture the part he wants.
