Refactor code using Factory Pattern without using the if-elseif condition

Asked

Viewed 253 times

2

I got the following Factory. In it I urge a class responsible for parsing the file in question. For this parser to be instantiated, it is first verified in the conditions whether that parser is correct to perform the processing.

public class InvoiceParserFactory
{
    public static InvoiceParser getParser(InvoicePdfReader reader)
    {
        String invoiceText = reader.getText();

        if (isXxxParser(invoiceText)) {
            return new XXX_PARSER(invoiceText);
        }
        else if (isYyyParser(invoiceText)) {
            return new YYY_PARSER(invoiceText);
        }
        else {
            throw new InvoiceException("Can't find parser.");
        }
    }

    private static boolean is_XXX_PARSER(String invoiceText) {
        Pattern p = Pattern.compile("xxx_regex");
        Matcher m = p.matcher(invoiceText);

        //as condições aqui são diferentes, portanto não da pra simplificar esses métodos
    }

    private static boolean is_YYY_PARSER(String invoiceText) {
        Pattern p = Pattern.compile("yyy_regex");
        Matcher m = p.matcher(invoiceText);

        //as condições aqui são diferentes, portanto não da pra simplificar esses métodos
    }
}

Here an example of a PARSER:

public final class XXX_PARSER extends InvoiceParser
{
    @Override
    protected void parseSomeText() {
        //implementação
    }
}

This logic does not scale in this Pattern design, since I will have hundreds or even thousands of PARSERS. I saw in some examples that it would be interesting to put all the PARSERS in a Collection and implement a statistical method in each PARSER to check whether the PARSER in question is what needs to be instantiated. Only there I would come across the need to instantiate the PARSER before I even use it, 'cause I’d have to use maybe Set<InvoiceParser> and then set.add(new XXX_PARSER) and so on, only if I instantiate this way into a Collection I break all my algorithm which is based on the constructor of the Invoiceparser class, which is the parent of all PARSERS.

What would be the best way to refactor this Factory to meet my need?

  • 2

    If you keep a set of the kind Class<?> do not need to instantiate the various parsers beforehand. You can even save a map with the class and its regex, and if any of the elements play, instantiate the parser by reflection.

  • Which version of Java are you using?

  • I am using Java 8

3 answers

2


You could use a enum with one element for each type of parser. In it you would have the way parser is created and the Pattern corresponding. Example:

Class InvoiceParser:

private abstract class InvoiceParser {/*métodos da classe*/}

Two implementations as an example:

private class XxxParser extends InvoiceParser {
    public XxxParser(String text) {}
}

And:

private class YyyParser extends InvoiceParser {
    public YyyParser(String text) {}
}

To enum with a Factory method who finds the parser suitable:

public enum Parser {
    //O primeiro parâmetro é a regex
    //o segundo é a forma como se cria o parser
    XXX_PARSER("xxx_regex", XxxParser::new),

    YYY_PARSER("yyy_regex", YyyParser::new);

    //interface com um método para criar o parser
    //permite o uso de lambda e method references
    @FunctionalInterface
    private static interface ParseCreator {
        InvoiceParser create(String text);
    }

    //cache interno
    private static final Parser[] VALUES = Parser.values(); 

    private final Pattern pattern;

    private final ParseCreator creator;

    private Parser(String regex, ParseCreator creator) {
        this.pattern = Pattern.compile(regex);
        this.creator = creator;
    }

    //factory method que encontra o parser adequado ao reader
    public static final InvoiceParser forReader(InvoicePdfReader reader) {
        for(Parser parser : VALUES) {
            if(parser.pattern.matcher(reader.getText()).find()) {
                return parser.creator.create(reader.getText());
            }
        }
        throw new IllegalArgumentException();
    }
}

Case to regex is not the only criterion used to identify the appropriate parser, you could add to enum an object responsible for verifying those additional criteria specific to each parser, in that case the enum would look that way:

public enum Parser {
    //o primeiro parâmetro é a regex
    //o segundo é a forma como se cria o parser
    //o terceiro são os critérios adicionais
    XXX_PARSER("xxx_regex", XxxParser::new, reader -> {
        //suas condições adicionais
        return true;
    }),

    YYY_PARSER("yyy_regex", YyyParser::new, reader-> {
        //suas condições adicionais
        return true;
    });

    //interface com um método para criar o parser
    //permite o uso de lambda e method references
    @FunctionalInterface
    private static interface ParseCreator {
        InvoiceParser create(String text);
    }

    //interface com um método para verificar se o parser é o adequado
    //permite o uso de lambda e method references
    @FunctionalInterface
    private static interface Parseable {
        boolean canParse(InvoicePdfReader reader);
    }

    //cache interno
    private static final Parser[] VALUES = Parser.values();

    private final Pattern pattern;

    private final ParseCreator creator;

    private final Parseable parseable;

    private Parser(String regex, ParseCreator creator, Parseable parseable) {
        this.pattern = Pattern.compile(regex);
        this.creator = creator;
        this.parseable = parseable;
    }

    //factory method que encontra o parser adequado ao reader
    public static final InvoiceParser forReader(InvoicePdfReader reader) {
        for(Parser parser : VALUES) {
            if(!parser.pattern.matcher(reader.getText()).find()) {
                continue;
            }
            //verifica os critérios específicos do parser
            if(parser.parseable.canParse(reader)) {
                return parser.creator.create(reader.getText());
            }
        }
        throw new IllegalArgumentException();
    }
}

To find the suitable parser just do:

InvoiceParser parser = Parser.forReader(reader);
  • When using Xxxparser::new is the parser instantiated immediately? That is, all parsers placed on Enum are pre-instantiated? If yes, that would be a problem because I would have to take logic from the parsers' constructor

  • 1

    @Gustavopiucco No. Every time you evoke the method create on the line return parser.creator.create(reader.getText()); he will create a new parser.

  • It would not be a "best practice" to use a static Map with all parsers instead of Enum?

  • 1

    @Gustavopiucco I don’t think there is a "best practice" for this case. It is simply a matter of using the best solution to the problem. If you think you use one Map will be the best, whether by an increase in performance, improvement in readability or even maintain some consistency in the code (if, for example, you have solved problems similar to a Map) go ahead and use, no problem at all.

  • 1

    @Gustavopiucco Only some adaptations would be necessary (the creation of a new class to group some of the values), since in the second enum, 3 parameters are used in the constructor, while a Map only accepts 2 values (key and value).

1

You can use a ServiceLoader. This is the mechanism that was introduced in Java 6 to solve problems exactly of this type that you have. In your case, the InvoiceParserFactory corresponds to a service. The implementations of this service are spread across the classpath/modulepath in a lot of places. The function of the ServiceLoader is to find all implementations, instantiate them and give the result.

First, let’s stir up your class InvoiceParserFactory (which has become an interface):

package com.example.invoice;

import java.util.ServiceLoader;

public interface InvoiceParserFactory {
    public boolean acceptsText(String text);

    public default boolean acceptsText(InvoicePdfReader reader) {
        return acceptsText(reader.getText());
    }

    public InvoiceParser newParser(String text);

    public default InvoiceParser newParser(InvoicePdfReader reader) {
        return newParser(reader.getText());
    }

    public static InvoiceParser getParser(InvoicePdfReader reader) {
        ServiceLoader<InvoiceParserFactory> loader =
                ServiceLoader.load(InvoiceParserFactory.class);

        for (InvoiceParserFactory factory : loader) {
            if (factory.acceptsText(reader)) {
                return factory.newParser(reader);
            }
        }

        throw new InvoiceException("Can't find parser.");
    }
}

This interface takes advantage of the fact that from Java 8, interfaces may have static methods or with implementations default. From Java 9, private methods are also allowed on interfaces.

Let’s assume that this interface is inside the JAR invoice-api.jar.

So in the file xxx.jar, you put that class:

package com.example.xxx.invoice;

import com.example.invoice.InvoiceParserFactory;

public class XxxInvoiceParserFactory implements InvoiceParserFactory {
    // ...

    public XxxInvoiceParserFactory() {
    }

    @Override
    public boolean acceptsText(String text) {
        Pattern p = Pattern.compile("xxx_regex");
        Matcher m = p.matcher(invoiceText);
        // ...        
    }

    @Override
    public InvoiceParser newParser(String text) {
        return new XxxParser(text);
    }
}

Within the xxx.jar, you place a file with the same name as the class/interface of the service offered (not the implementation) inside a folder called services inside the briefcase META-INF. For example, in the above case, the file would be called META-INF/services/com.example.invoice.InvoiceParser. Inside this file you put the full name of the implement class:

com.example.xxx.invoice.XxxInvoiceParserFactory

Let’s assume that the JAR yyy.jar has two distinct implementations of InvoiceParser, a call YyyBlueInvoiceParser and the other YyyRedInvoiceParser. In this case in the file META-INF/services/com.example.invoice.InvoiceParser inside that other JAR you put this:

com.example.yyy.invoice.YyyBlueInvoiceParserFactory
com.example.yyy.invoice.YyyRedInvoiceParserFactory

That is, what happens is that each JAR has inside the folder META-INF/services, files with the names of the services offered, within each of these files, is the name of the existing implementations for these services.

An app that has xxx.jar, yyy.jar and invoice-api.jar in the classpath you can see these three implementations. To change the implementations is easy, just that each JAR that contains some implementation(s) of your service a(s) declare within the META-INF/services/com.example.invoice.InvoiceParser inside the JAR itself and then just add the JAR to the classpath and the ServiceLoader you will find him.

However, there is a catch. So that the ServiceLoader can locate and instantiate the services, they must be in public classes, have a public constructor without parameters and the corresponding class must be an implementation or subclass of the service offered.

Ideally you should have the subclasses of InvoiceParser as a service rather than implementations of InvoiceParserFactory. However, to achieve this, you need to refactor your subclasses from InvoiceParser in a way that can get rid of the need to have PDF as a parameter in order to have a constructor without parameters.

There is another way to offer services in modular Jars (Java 9+) that are in modulepath instead of classpath, but this is a bit more complicated. The class javadoc ServiceLoader describes it in detail.

I also talked a little about the ServiceLoader in that other answer of mine.

  • It would not be the Serviceloader I need, I believe it is much simpler than you just imagined. The regex serves to identify the invoice format/type, and after identifying this format, instantiate the responsible data extraction parser for that invoice. My problem was just solving the huge amount of if-elseif to instantiate those parsers that lie in the same JAR. Because for every parser, there’s a regex that identifies that that parser is the one that should be used, you know? Anyway this answer helped me to solve a future problem.

0

You can use the library Reflections and then do this:

public class InvoiceParserFactory {

    private static final Set<Class<? extends InvoiceParser>> SUB_TYPES;

    static {
        Reflections reflections = new Reflections("com.example.meupacote");
        SUB_TYPES = reflections.getSubTypesOf(InvoiceParser.class);
    }

    public static InvoiceParser getParser(InvoicePdfReader reader) {
        String invoiceText = reader.getText();

        for (Class<? extends InvoiceParser> k : SUB_TYPES) {
            try {
                return k.getDeclaredConstructor(String.class).newInstance(invoiceText);
            } catch (Exception e) {
                // Engole e pula.
            }
        }
        throw new InvoiceException("Can't find parser.");
    }
}

In each implementation of InvoiceParser, you must place a constructor that throws an exception if the String passed is not compatible with it:

public class XxxInvoiceParser extends InvoiceParser {

    // implementação...

    public XxxInvoiceParser(String invoiceText) {
        if (!isXxx(invoiceText)) throw new IllegalArgumentException();
        // implementação...
    }

    @Override
    protected void parseSomeText() {
        // implementação...
    }

    private static boolean isXxx(String invoiceText) {
        Pattern p = Pattern.compile("xxx_regex");
        Matcher m = p.matcher(invoiceText);

        // implementação...
    }
}
  • Will this code take all implementations of Invoiceparser within the package with.example.meupacote automatically? Reflection compared to the @Felipe Marinho example, lost in performance? At scale (many implementations of Invoiceparser) would be a problem?

  • 1

    Yes. In packages and in any subpackages. I don’t know about performance, but one of the purposes of having the specified package is exactly that. However, eliminating this cost of performance is easy (I changed the answer to that), because the set of classes only needs to be searched once.

  • 1

    @Gustavopiucco There is still a bottleneck that he has to test all implementations one by one until he finds the one that is suitable. If these tests are heavy, finding the correct implementation is complicated. However, both the code of your question and all the answers described here have this same problem. The solution is for implementations to reject a PDF that does not belong to them as soon as possible through very simple and obvious tests and with that, even with thousands of different implementations, the impact is small.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.