Instantiate a Map object only when necessary based on a regex

Asked

Viewed 193 times

2

I have a Factory what instance the PARSER responsible for extracting information from a particular invoice format. To determine the invoice format I use regex. I loop every regex added to Map and check if the regex is found within the text of that invoice, if it is found, then PARSER responsible for extracting data from this invoice.

For that, I created one Map of Class and String, where String is regex. It was necessary to use Class because I cannot instantiate this PARSER before knowing if it really is he who will be the PARSER responsible because there are some logics in each constructor PARSER.

public class InvoiceParserFactory
{
    private static final Logger logger = LoggerFactory.getLogger(InvoiceParserFactory.class);

    private static Map<Class<? extends InvoiceParser>, String> map = new HashMap<>();

    static {
        map.put(xxx_PARSER.class, "regex_xxx");
        map.put(yyy_PARSER.class, "regex_yyy");
        map.put(zzz_PARSER.class, "regex_zzz");
        //...
    }

    public static InvoiceParser getParser(InvoicePdfReader reader)
    {
        String invoiceText = reader.getText();

        for (Map.Entry<Class<? extends InvoiceParser>, String> entry : map.entrySet()) {
            Class<?> clazz = entry.getKey();
            String regex = entry.getValue();

            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(invoiceText);

            if (m.find()) {
                try {
                    Constructor<?> constructor = clazz.getConstructor(String.class);
                    return (InvoiceParser) constructor.newInstance(invoiceText);
                } catch (Exception e) {
                    logger.error("Can't instantiate parser.", e);
                }
            }
        }

        throw new InvoiceException("Can't find parser.");
    }
}

Doubt

Is this the best practice for this purpose? Particularly I found a gambiarra. Any suggestions to improve/refactor the code? Maybe using Java 8 Features?

Problem

In the future, I will need beyond regex, to perform some other condition, then I would have to completely refactor this Factory to support that besides a regex, it is possible to use other logics to instantiate the PARSER correct or something similar to this. What is the best practice in this case?

  • 2

    Personally, I didn’t think it was a scam. Of course it has limitations and is not the most beautiful code we have seen, but I believe you should think about these limitations when they really bother you (as in the case you mentioned). Let’s wait to see if someone proposes a better idea. For now I am thinking here of an alternative.

1 answer

2

I don’t think gambiarra, although I have my personal opinion as to the use of Reflection in production (which should be moderate, which in your case I think is). But this is just my opinion. Let’s go to the code:

There is a clear limitation in the code, which is the fact that all parsers are required to have a constructor that receives a String ('cause you’re using clazz.getConstructor(String.class)). If you take this care, there will be no problem.

If in the future you have other logics besides regex to build the parser, I suggest you create one or more classes that encapsulate this logic:

public class LogicaCriacaoParser {
    public boolean condicoesSatisfeitas() {
        // retorna "true" se todas as condições para criar o parser são satisfeitas
        // pode ser a regex, e o que mais você precisar
    }
}

In this case your map would have instances of LogicaCriacaoParser (or her subclasses, if you prefer):

private static Map<Class<? extends InvoiceParser>, LogicaCriacaoParser> map = new HashMap<>();

static {
    map.put(xxx_PARSER.class, new LogicaCriacaoParser());
    map.put(yyy_PARSER.class, new Subclasse1LogicaCriacaoParser());
    map.put(zzz_PARSER.class, new Subclasse2LogicaCriacaoParser());
    //...
}

Thus, in each subclass of LogicaCriacaoParser you could implement your specific logic to create each parser (within the method condicoesSatisfeitas()), can only be regex, or regex + the criteria you need.

The positive point is that you can create the logic you want within these classes (and your subclasses), and your model is not stuck to a single logic, such as regex.

To use, it would be something like:

LogicaCriacaoParser logica = entry.getValue();

if (logica.condicoesSatisfeitas()) {
    // cria o parser
}

Alternative (without map, nor Reflection)

In fact you wouldn’t even need to have the map, and it’s also possible to eliminate the use of Reflection.

The class itself LogicaCriacaoParser could have a method that returns the specific parser if the condition is satisfied. If you prefer, you can leave the class abstract and let each subclass handle your particular case:

public abstract class LogicaCriacaoParser {
    // retorna "true" se todas as condições para criar o parser são satisfeitas
    // pode ser a regex, e o que mais você precisar
    protected abstract boolean condicoesSatisfeitas();

    // possui a lógica para criar o parser
    protected abstract InvoiceParser criarParser();

    // cria o parser se as condições forem satisfeitas
    public InvoiceParser getParser() {
        if (condicoesSatisfeitas()) {
             return criarParser();
        }
        // retorna null se as condições não forem satisfeitas
        return null;
    }
}

Note that only the method getParser() is public because it is the only one that other classes need to know. With this, you need to have subclasses that know how to create a specific parser:

// logica para criar XXX_Parser
public class LogicaCriacaoXXX_Parser extends LogicaCriacaoParser {
    protected boolean condicoesSatisfeitas() {
        // verifica as regras para criar o XXX_Parser (regex, etc)
    }

    protected InvoiceParser criarParser() {
        // retorna o XXX_Parser (usando new ao invés de reflection)
    }
}

Thus, you create a subclass for each different parser (yyy, zzz, etc), each implementing its specific logic.

Then, you would only have a list with all the logic of parser creation:

List<LogicaCriacaoParser> logicas = new ArrayList<>();
logicas.add(new LogicaCriacaoXXX_Parser());
logicas.add(new LogicaCriacaoYYY_Parser());
// etc.

And to create a parser, you loop this list, until you find some parser that meets the conditions:

for (LogicaCriacaoParser logica : logicas) {
    InvoiceParser parser = logica.getParser();
    if (parser != null) {
        return parser;
    }
}

// se não encontrar nenhum, lança o InvoiceException

Of course there are some improvements, such as changing the constructors of each logic class to receive parameters, the methods themselves to check the conditions and create the parser can also receive parameters (which could be used in the parser construction, for example), etc.

But at least you’re not using Reflection (which by the way you didn’t feel comfortable using, and I understand you), and you’re not dependent on every subclass of parser having a specific constructor that gets a String (since each logic subclass can create the parser using new).

Browser other questions tagged

You are not signed in. Login or sign up in order to post.