Visitor pattern - separation of data and behavior

The visitor pattern separates data from the algorithms that operate on it. Using examples, I will show when this pattern is worth using and when it is not, because we do not always want such a separation.

Suppose we are writing a module for some text processor. We have the following 3 structures:

struct Element {
    virtual ~Element() = default;  // polymorphic base: required by the dynamic_cast used later
};
struct Paragraph : Element {
    std::string text;
};
struct Link : Element {
    std::string url;
    std::string text;
};
struct List : Element {
    std::vector<std::string> items;
};

Our program should output the above structures in Markdown and HTML format.

(Note: to simplify the examples, I use struct everywhere instead of class.)

Example 1. One class, several algorithms

If one class needs several behaviors, then we don’t need to use the visitor pattern. Suppose we only have a Paragraph class, and Link and List do not exist. Then a simple polymorphic printer will suffice.

struct Printer {
    virtual void print(const Paragraph& paragraph) = 0;
};
struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << "<p>" << paragraph.text << "</p>";
    }
};
struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << paragraph.text << "\n\n";
    }
};

The usage could look like this:

int main() {
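    Paragraph paragraph;  // Paragraph has a base class, so brace-initializing it with just the text does not compile
    paragraph.text = "Hello, world!";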
    Printer* printer = getPrinter();
    printer->print(paragraph);
}

Let’s not get into how exactly getPrinter works, because it is not relevant to this example. What matters is that this function makes some decision and returns an abstract printer. Thus, we have decoupled the logic that chooses the behavior from the data and from the printing itself.

If we want to add support for a new format, we just need to add a new class that inherits from Printer and implements the print method for the Paragraph type. There is no need for the visitor pattern, because we already have full separation of data and behavior. The code is extensible and easy to maintain.
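To make the extensibility claim concrete, here is a minimal sketch of adding a third format. PlainTextPrinter, the render helper and the std::ostream parameter are assumptions of this sketch (the listing above writes straight to std::cout); Paragraph is declared without a base class to match the "only Paragraph exists" premise.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Paragraph {
    std::string text;
};

// Same shape as the Printer above, except that the output stream is passed in
// (an assumption of this sketch, so that the result can be inspected).
struct Printer {
    virtual ~Printer() = default;
    virtual void print(const Paragraph& paragraph, std::ostream& out) = 0;
};

struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph, std::ostream& out) override {
        out << "<p>" << paragraph.text << "</p>";
    }
};

// Hypothetical new format: added without touching Paragraph or HtmlPrinter.
struct PlainTextPrinter : Printer {
    void print(const Paragraph& paragraph, std::ostream& out) override {
        out << paragraph.text << "\n";
    }
};

// Helper for the sketch: renders a paragraph with whichever printer it is given.
std::string render(const Paragraph& paragraph, Printer& printer) {
    std::ostringstream out;
    printer.print(paragraph, out);
    return out.str();
}
```

Calling render with an HtmlPrinter or a PlainTextPrinter goes through the same virtual call; only the new class had to be written.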

So-called dynamic dispatch, or simply polymorphism, occurs here: the appropriate print method is chosen at run time.

If we wanted to add support for a new element, such as Link, a new method would have to be added to the Printer class and to all classes that inherit from it. But it is not nearly as easy as it seems at first glance. In the next example we will see why.

Example 2. Multiple classes, one algorithm

Now we have several classes, but all of them are printed in a single format, Markdown.

struct MarkdownPrinter {
    void print(const Paragraph& paragraph) {
        std::cout << paragraph.text << "\n\n";
    }
    void print(const Link& link) {
        std::cout << "[" << link.text << "](" << link.url << ")\n";
    }
    void print(const List& list) {
        for (const auto& item : list.items) {
            std::cout << "- " << item << "\n";
        }
    }
};

This time the usage looks like this:

int main() {
    MarkdownPrinter printer;
    Element* element = getElement();

    // Ouch, that doesn't look good!
    if (auto paragraph = dynamic_cast<Paragraph*>(element)) {
        printer.print(*paragraph);
    } else if (auto link = dynamic_cast<Link*>(element)) {
        printer.print(*link);
    } else if (auto list = dynamic_cast<List*>(element)) {
        printer.print(*list);
    }
}

Here we have so-called “static dispatch”: the print overload is chosen at compile time, based on the static type of the argument. But this is possible only after recovering the concrete element type through dynamic casting.
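The effect of static dispatch can be seen in isolation with a small sketch. The free function which and its extra Element overload are hypothetical: MarkdownPrinter above has no overload for Element, so there the call through a base reference would simply not compile. Here the extra overload makes the compile-time choice visible.

```cpp
#include <string>

struct Element {
    virtual ~Element() = default;  // polymorphic, so dynamic_cast is allowed
};
struct Paragraph : Element {};
struct Link : Element {};

// Overloads standing in for MarkdownPrinter::print; each reports which one was picked.
std::string which(const Element&)   { return "Element"; }
std::string which(const Paragraph&) { return "Paragraph"; }
std::string which(const Link&)      { return "Link"; }

// The static type of `element` is Element, so the Element overload is always chosen,
// no matter what the object really is at run time.
std::string viaBaseReference(const Element& element) {
    return which(element);
}

// Only after a cast does the compiler see a concrete static type.
std::string viaCast(const Element& element) {
    if (auto paragraph = dynamic_cast<const Paragraph*>(&element)) {
        return which(*paragraph);
    }
    if (auto link = dynamic_cast<const Link*>(&element)) {
        return which(*link);
    }
    return which(element);
}
```

For a Paragraph object, viaBaseReference reports "Element" while viaCast reports "Paragraph", which is why the chain of casts was needed in the listing above.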

If we see this type of casting, it is immediately clear that something is wrong. Seeing such code, some people will think that the print function should be a virtual method in the Element class implemented by Paragraph, Link and List. Then there will be no casting, and polymorphism will take care of things.

struct Element {
    virtual void printInMarkdown() = 0;
};
struct Paragraph : Element {
    std::string text;
    void printInMarkdown() override {
        std::cout << text << "\n\n";
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void printInMarkdown() override {
        std::cout << "[" << text << "](" << url << ")\n";
    }
};
struct List : Element {
    std::vector<std::string> items;
    void printInMarkdown() override {
        for (const auto& item : items) {
            std::cout << "- " << item << "\n";
        }
    }
};

We no longer have a dedicated printer, and the code looks like this:

int main() {
    Element* element = getElement();
    element->printInMarkdown();
}

And indeed, this is often a sufficient solution, although it has one major drawback: the implementation of writing out to Markdown will now be scattered over several classes, instead of being grouped in one place.

Moreover, the behavior we added to the Element class may not fit there at all. Simple data containers have suddenly become responsible for writing themselves out.

The visitor pattern comes to the rescue, allowing us to separate data from behavior.

struct Element {
    virtual void print(MarkdownPrinter* printer) = 0;  // no `override` here: the base class introduces the method
};
struct Paragraph : Element {
    std::string text;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};
struct List : Element {
    std::vector<std::string> items;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};

And now it’s pretty good, because we have full separation of data and behavior. This is exactly what the visitor is used for.

int main() {
    Element* element = getElement();
    MarkdownPrinter printer;
    element->print(&printer);
}

Here we have obtained so-called “double dispatch”: the Element::print method is selected dynamically (polymorphism), and the MarkdownPrinter::print overload is selected statically (function overloading), based on the type of *this inside each Element::print.
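The listings above leave out declaration order, which matters because the elements and the printer refer to each other. The following is one possible way to arrange them so that everything compiles, not the only one. The out buffer and the toMarkdown helper are assumptions of this sketch, and List is omitted for brevity.

```cpp
#include <sstream>
#include <string>

struct MarkdownPrinter;  // forward declaration: Element only needs the name

struct Element {
    virtual ~Element() = default;
    virtual void print(MarkdownPrinter* printer) = 0;
};
struct Paragraph : Element {
    std::string text;
    void print(MarkdownPrinter* printer) override;  // defined below
};
struct Link : Element {
    std::string url;
    std::string text;
    void print(MarkdownPrinter* printer) override;  // defined below
};

struct MarkdownPrinter {
    std::ostringstream out;  // assumption: collect output in a buffer instead of std::cout
    void print(const Paragraph& paragraph) {
        out << paragraph.text << "\n\n";
    }
    void print(const Link& link) {
        out << "[" << link.text << "](" << link.url << ")\n";
    }
};

// MarkdownPrinter is complete at this point, so the overload can be resolved.
// Inside each body *this has a concrete static type: that is the second dispatch.
void Paragraph::print(MarkdownPrinter* printer) { printer->print(*this); }
void Link::print(MarkdownPrinter* printer) { printer->print(*this); }

// Hypothetical helper: the virtual call below is the first dispatch.
std::string toMarkdown(Element& element) {
    MarkdownPrinter printer;
    element.print(&printer);
    return printer.out.str();
}
```

toMarkdown never names a concrete element type, yet the matching overload runs for each one.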

The most serious problem arises when we want to add support for HTML in addition to Markdown. As in example 1, this is not as simple as it may seem. We have to add a new HtmlPrinter class (which is understandable) and also modify every Element subclass so that it accepts the new printer. And if we had not used the visitor at all, we would probably end up with code similar to this:

int main() {
    Element* element = getElement();

    // We are back to the series of conditions again
    if (shouldPrintInMarkdown()) {
        element->printInMarkdown();
    } else if (shouldPrintInHtml()) {
        element->printInHtml();
    }
}

It’s not as obvious as dynamic_cast, but we still get code that is harder to extend than it could be. Adding a new format requires adding the printInXXX method and remembering to add another if in the code above.

Of course, the above examples are very simplified so that they can be understood without much effort. In a large system, the problems I show here have much more serious consequences.

Is it possible to do better? Of course! An abstract visitor comes to the rescue.

Example 3. Multiple classes, multiple algorithms

Let’s get straight to the code:

struct Element {
    virtual void print(Printer*) = 0;
};
struct Paragraph : Element {
    std::string text;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};
struct List : Element {
    std::vector<std::string> items;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};

We can see that the classes still know that they can be printed, but the logic that does the printing lives somewhere else.

struct Printer {
    virtual void print(const Paragraph& paragraph) = 0;
    virtual void print(const Link& link) = 0;
    virtual void print(const List& list) = 0;
};
struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << "<p>" << paragraph.text << "</p>";
    }
    void print(const Link& link) override {
        std::cout << "<a href=\"" << link.url << "\">" << link.text << "</a>";
    }
    void print(const List& list) override {
        std::cout << "<ul>\n";
        for (const auto& item : list.items) {
            std::cout << "  <li>" << item << "</li>\n";
        }
        std::cout << "</ul>\n";
    }
};
struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << paragraph.text << "\n\n";
    }
    void print(const Link& link) override {
        std::cout << "[" << link.text << "](" << link.url << ")\n";
    }
    void print(const List& list) override {
        for (const auto& item : list.items) {
            std::cout << "- " << item << "\n";
        }
    }
};

We now have two implementations of Printer. Each of them focuses on one format.

It is time to use this code:

int main() {
    Element* element = getElement();
    Printer* printer = getPrinter();

    element->print(printer);
}

Great! I think the above code speaks for itself. We have combined multiple data types with multiple behaviors while maintaining full separation. If we need to tweak something in the HTML syntax, we have one class that takes care of just that: outputting all elements to HTML.
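As a sketch of how this might look for a whole document rather than a single element, the code below prints a list of elements with a printer chosen at run time. makePrinter is a hypothetical stand-in for getPrinter (whose real logic is not shown in this article); the Document alias, the render helper, the constructors and the out buffer are likewise assumptions. Link is omitted for brevity.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Paragraph;
struct List;

struct Printer {
    virtual ~Printer() = default;
    virtual void print(const Paragraph& paragraph) = 0;
    virtual void print(const List& list) = 0;
    std::ostringstream out;  // assumption: buffer the output instead of using std::cout
};

struct Element {
    virtual ~Element() = default;
    virtual void print(Printer* printer) = 0;
};
struct Paragraph : Element {
    std::string text;
    explicit Paragraph(std::string t) : text(std::move(t)) {}
    void print(Printer* printer) override { printer->print(*this); }
};
struct List : Element {
    std::vector<std::string> items;
    explicit List(std::vector<std::string> i) : items(std::move(i)) {}
    void print(Printer* printer) override { printer->print(*this); }
};

struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        out << "<p>" << paragraph.text << "</p>";
    }
    void print(const List& list) override {
        out << "<ul>\n";
        for (const auto& item : list.items) out << "  <li>" << item << "</li>\n";
        out << "</ul>\n";
    }
};
struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph) override { out << paragraph.text << "\n\n"; }
    void print(const List& list) override {
        for (const auto& item : list.items) out << "- " << item << "\n";
    }
};

// Hypothetical stand-in for getPrinter(): picks a printer from a format name.
std::unique_ptr<Printer> makePrinter(const std::string& format) {
    if (format == "html") return std::make_unique<HtmlPrinter>();
    return std::make_unique<MarkdownPrinter>();
}

using Document = std::vector<std::unique_ptr<Element>>;

// Neither the element types nor the printer type are named in this loop.
std::string render(const Document& document, const std::string& format) {
    auto printer = makePrinter(format);
    for (const auto& element : document) element->print(printer.get());
    return printer->out.str();
}
```

The loop in render stays the same whichever format is requested and whatever elements the document holds.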

Adding a new format does not require modification of Element.

Adding a new element requires, besides the trivial implementation of Element::print, adding a new Printer::print overload to every printer. This is understandable; no one will do it for us. The upside is that if we forget one of these methods, compilation fails: a printer that lacks an implementation of a pure virtual overload stays abstract and cannot be instantiated. By comparison, earlier we could forget to add another if, and the program would compile and run but behave incorrectly.
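The compile-time safety net can be checked directly. In the sketch below, Image is a hypothetical new element and OutdatedPrinter a hypothetical printer that was not updated for it. Rather than triggering a real compilation error, the sketch uses static_assert with std::is_abstract to show that such a printer cannot be instantiated.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

struct Paragraph;
struct Image;  // hypothetical new element

struct Printer {
    virtual ~Printer() = default;
    virtual void print(const Paragraph& paragraph) = 0;
    virtual void print(const Image& image) = 0;  // added together with Image
};

struct Element {
    virtual ~Element() = default;
    virtual void print(Printer* printer) = 0;
};
struct Paragraph : Element {
    std::string text;
    void print(Printer* printer) override { printer->print(*this); }
};
struct Image : Element {
    std::string url;
    void print(Printer* printer) override { printer->print(*this); }
};

// Updated printer: implements every overload, so it is a concrete class.
struct HtmlPrinter : Printer {
    std::ostringstream out;  // assumption: buffer the output instead of using std::cout
    void print(const Paragraph& paragraph) override {
        out << "<p>" << paragraph.text << "</p>";
    }
    void print(const Image& image) override {
        out << "<img src=\"" << image.url << "\">";
    }
};

// Printer that was forgotten when Image was introduced.
struct OutdatedPrinter : Printer {
    void print(const Paragraph&) override {}
    // print(const Image&) is missing, so the class remains abstract.
};

static_assert(!std::is_abstract<HtmlPrinter>::value, "HtmlPrinter can be instantiated");
static_assert(std::is_abstract<OutdatedPrinter>::value,
              "declaring an OutdatedPrinter object would be a compile error");
```

Any attempt to create an OutdatedPrinter object is rejected by the compiler, which is the failure described above.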

We obtained a combination of advantages from both previous examples.

Finally, let’s map the class and method names to the visitor pattern terminology:

| Class/Method | Visitor pattern terminology |
| --- | --- |
| Printer | Visitor |
| Printer::print() | visit method |
| Element::print() | accept method |
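For comparison, this is roughly how the same skeleton reads with the classic names from the mapping above. It is only a renaming sketch: MarkdownVisitor and the out buffer are assumptions, and List is omitted for brevity.

```cpp
#include <sstream>
#include <string>

struct Paragraph;
struct Link;

// Visitor plays the role of Printer.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Paragraph& paragraph) = 0;  // was Printer::print
    virtual void visit(const Link& link) = 0;            // was Printer::print
};

struct Element {
    virtual ~Element() = default;
    virtual void accept(Visitor* visitor) = 0;  // was Element::print
};
struct Paragraph : Element {
    std::string text;
    void accept(Visitor* visitor) override { visitor->visit(*this); }
};
struct Link : Element {
    std::string url;
    std::string text;
    void accept(Visitor* visitor) override { visitor->visit(*this); }
};

// Plays the role of MarkdownPrinter.
struct MarkdownVisitor : Visitor {
    std::ostringstream out;  // assumption: buffer the output instead of using std::cout
    void visit(const Paragraph& paragraph) override { out << paragraph.text << "\n\n"; }
    void visit(const Link& link) override {
        out << "[" << link.text << "](" << link.url << ")\n";
    }
};
```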

Personally, I’m not a fan of using this terminology in code. An experienced programmer will quickly realize they are dealing with a visitor anyway, and naming functions and classes in a way that is consistent with their role in the code feels more natural to me.