Visitor pattern - separation of data and behavior
The visitor pattern separates the data from the algorithm. I will show with examples when it is worth using this pattern and when it is not. It is noteworthy that we do not always want such a separation.
Suppose we are writing a module for some text processor. We have the following 3 structures:
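#include <iostream>
#include <string>
#include <vector>
struct Element {
    // A virtual destructor makes Element polymorphic, which dynamic_cast requires.
    virtual ~Element() = default;
};
struct Paragraph : Element {
    std::string text;
};
struct Link : Element {
    std::string url;
    std::string text;
};
struct List : Element {
    std::vector<std::string> items;
};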
Our program should output the above structures in Markdown and HTML format.
(Note: to simplify the examples, I use struct everywhere instead of class.)
Example 1. One class, several algorithms
If one class needs several behaviors, then we don’t need to use the visitor pattern. Suppose we only have a Paragraph class, and Link and List do not exist. Then a simple polymorphic printer will suffice.
struct Printer {
    virtual void print(const Paragraph& paragraph) = 0;
};
struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << "<p>" << paragraph.text << "</p>";
    }
};
struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << paragraph.text << "\n\n";
    }
};
The usage could look like this:
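int main() {
    // Paragraph has a base class, so set the member explicitly
    // instead of relying on aggregate initialization.
    Paragraph paragraph;
    paragraph.text = "Hello, world!";
    Printer* printer = getPrinter();
    printer->print(paragraph);
}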
Let’s not get into how exactly getPrinter works, because it is not relevant to this example. What is important is that this function makes some decision and returns an abstract printer. Thus, we have decoupled the logic of choosing the behavior from the data and from the writing itself.
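As a rough illustration of what such a decision could look like, here is a minimal sketch. The names makePrinter and render, the format parameter, the std::ostream& argument, and the use of std::unique_ptr are all assumptions made for this sketch; the getPrinter used in this article may work quite differently. Paragraph is declared standalone here, in line with the premise of this example.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

struct Paragraph {
    std::string text;
};

struct Printer {
    virtual ~Printer() = default;
    // Assumption: write to a caller-supplied stream rather than std::cout.
    virtual void print(const Paragraph& paragraph, std::ostream& out) = 0;
};

struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph, std::ostream& out) override {
        out << "<p>" << paragraph.text << "</p>";
    }
};

struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph, std::ostream& out) override {
        out << paragraph.text << "\n\n";
    }
};

// Hypothetical factory: picks the behavior at run time from a format name.
// Anything other than "html" falls back to Markdown.
std::unique_ptr<Printer> makePrinter(const std::string& format) {
    if (format == "html") {
        return std::make_unique<HtmlPrinter>();
    }
    return std::make_unique<MarkdownPrinter>();
}

// Hypothetical helper: renders one paragraph in the requested format.
std::string render(const Paragraph& paragraph, const std::string& format) {
    std::ostringstream out;
    makePrinter(format)->print(paragraph, out);
    return out.str();
}
```

The caller of render never names HtmlPrinter or MarkdownPrinter; only the factory knows the concrete types, which is the decoupling described above.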
If we want to add support for a new format, we just need to add a new class inheriting from Printer and implement the print method for the Paragraph type. There is no need for a visitor pattern, because we have full separation of data and behavior. The code is extensible and easy to maintain.
The so-called dynamic dispatch, or simply polymorphism, occurs here. The choice of the appropriate print method is made at run time.
If we wanted to add support for a new element, such as Link, a new method would have to be added to the Printer class and to all inheriting classes. But it is not nearly as easy as it seems at first glance. In the next example we will see why.
Example 2. Multiple classes, one algorithm
Now we have several classes, but all of them will be written out in Markdown format only.
struct MarkdownPrinter {
    void print(const Paragraph& paragraph) {
        std::cout << paragraph.text << "\n\n";
    }
    void print(const Link& link) {
        std::cout << "[" << link.text << "](" << link.url << ")\n";
    }
    void print(const List& list) {
        for (const auto& item : list.items) {
            std::cout << "- " << item << "\n";
        }
    }
};
This time the usage looks like this:
int main() {
    MarkdownPrinter printer;
    Element* element = getElement();
    // Ouch, that doesn't look good!
    if (auto paragraph = dynamic_cast<Paragraph*>(element)) {
        printer.print(*paragraph);
    } else if (auto link = dynamic_cast<Link*>(element)) {
        printer.print(*link);
    } else if (auto list = dynamic_cast<List*>(element)) {
        printer.print(*list);
    }
}
Here we have the so-called “static dispatch”, that is, the choice of the print method is made at compile time. But this is possible only after obtaining a specific element type through dynamic casting.
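To see why the cast is unavoidable here, consider the following minimal sketch. Overload resolution looks only at the static type of the argument, so an object reached through a base-class reference never selects the more specific overload. The functions describe, viaBaseReference and viaConcreteType are hypothetical names introduced just for this illustration.

```cpp
#include <string>

struct Element {
    virtual ~Element() = default;
};

struct Paragraph : Element {
    std::string text;
};

// Two overloads. The compiler chooses between them from the static type
// of the argument, never from the run-time type of the object.
std::string describe(const Element&) { return "some element"; }
std::string describe(const Paragraph&) { return "paragraph"; }

// The static type of the argument is Element, even if a Paragraph is passed in.
std::string viaBaseReference(const Element& element) {
    return describe(element);
}

// The static type of the argument is Paragraph.
std::string viaConcreteType(const Paragraph& paragraph) {
    return describe(paragraph);
}
```

Passing the same Paragraph object to both helpers yields different overloads, which is exactly the gap that the dynamic_cast chain above fills by hand.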
If we see this type of casting, it is immediately clear that something is wrong. Seeing such code, some people will think that the print function should be a virtual method in the Element class, implemented by Paragraph, Link and List. Then there will be no casting, and polymorphism will take care of things.
struct Element {
    virtual void printInMarkdown() = 0;
};
struct Paragraph : Element {
    std::string text;
    void printInMarkdown() override {
        std::cout << text << "\n\n";
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void printInMarkdown() override {
        std::cout << "[" << text << "](" << url << ")\n";
    }
};
struct List : Element {
    std::vector<std::string> items;
    void printInMarkdown() override {
        for (const auto& item : items) {
            std::cout << "- " << item << "\n";
        }
    }
};
We no longer have a dedicated printer, and the code looks like this:
int main() {
    Element* element = getElement();
    element->printInMarkdown();
}
And indeed, this is often a sufficient solution, although it has one major drawback: the implementation of writing out to Markdown will now be scattered over several classes, instead of being grouped in one place.
Moreover, the behavior we added to the Element class may not fit there at all. Simple data containers have suddenly become responsible for writing themselves out.
The visitor pattern comes to the rescue here, allowing us to separate data from behavior.
struct Element {
    // No override here: this is the base declaration, there is nothing to override.
    virtual void print(MarkdownPrinter* printer) = 0;
};
struct Paragraph : Element {
    std::string text;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};
struct List : Element {
    std::vector<std::string> items;
    void print(MarkdownPrinter* printer) override {
        printer->print(*this);
    }
};
And now it’s pretty good, because we have full separation of data and behavior. This is exactly what the visitor is used for.
int main() {
    Element* element = getElement();
    MarkdownPrinter printer;
    element->print(&printer);
}
Here we have obtained the so-called “double dispatch”: the selection of the Element::print method is dynamic (polymorphism), and the selection of the MarkdownPrinter::print method is static (function name overloading).
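The elements and the printer refer to each other, so the snippets need some care with declaration order to compile in a single file. One possible layout is sketched below, under a few assumptions that are not part of the code above: the printer writes to an injected std::ostream, print is marked const, List is omitted for brevity, and toMarkdown is a hypothetical helper.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct MarkdownPrinter;  // forward declaration: the elements only need the name here

struct Element {
    virtual ~Element() = default;
    virtual void print(MarkdownPrinter* printer) const = 0;
};

struct Paragraph : Element {
    std::string text;
    void print(MarkdownPrinter* printer) const override;  // defined further down
};

struct Link : Element {
    std::string url;
    std::string text;
    void print(MarkdownPrinter* printer) const override;  // defined further down
};

struct MarkdownPrinter {
    std::ostream& out;  // assumption: the stream is injected instead of using std::cout
    void print(const Paragraph& paragraph) { out << paragraph.text << "\n\n"; }
    void print(const Link& link) {
        out << "[" << link.text << "](" << link.url << ")\n";
    }
};

// MarkdownPrinter is a complete type from here on, so the bodies can call it.
// Second dispatch: *this has a concrete static type, so the overload is
// selected at compile time.
void Paragraph::print(MarkdownPrinter* printer) const { printer->print(*this); }
void Link::print(MarkdownPrinter* printer) const { printer->print(*this); }

// First dispatch: the virtual call selects Paragraph::print or Link::print
// at run time.
std::string toMarkdown(const Element& element) {
    std::ostringstream out;
    MarkdownPrinter printer{out};
    element.print(&printer);
    return out.str();
}
```

Note that toMarkdown receives only an Element reference and still reaches the right MarkdownPrinter::print overload without any cast.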
The most serious problem will arise when we want to add support for HTML in addition to Markdown. As in example 1, this is not at all as simple as it may seem. To do so, we will have to add a new HtmlPrinter class (this is understandable) and also modify all the Element subclasses. And if we did not use the visitor at all, we would probably end up with code similar to this:
int main() {
    Element* element = getElement();
    // We are back to the series of conditions again
    if (shouldPrintInMarkdown()) {
        element->printInMarkdown();
    } else if (shouldPrintInHtml()) {
        element->printInHtml();
    }
}
It’s not as obvious as dynamic_cast, but we still get code that is harder to extend than it could be. Adding a new format requires adding the printInXXX method and remembering to add another if in the code above.
Of course, the above examples are very simplified so that they can be understood without much effort. In a large system, the problems I show here have much more serious consequences.
Is it possible to do better? Of course! This is where an abstract visitor helps.
Example 3. Multiple classes, multiple algorithms
Let’s get straight to the code:
struct Element {
    virtual void print(Printer*) = 0;
};
struct Paragraph : Element {
    std::string text;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};
struct Link : Element {
    std::string url;
    std::string text;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};
struct List : Element {
    std::vector<std::string> items;
    void print(Printer* printer) override {
        printer->print(*this);
    }
};
We can see that the classes still know that they are to be written out, but the logic that does this is somewhere else.
struct Printer {
    virtual void print(const Paragraph& paragraph) = 0;
    virtual void print(const Link& link) = 0;
    virtual void print(const List& list) = 0;
};
struct HtmlPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << "<p>" << paragraph.text << "</p>";
    }
    void print(const Link& link) override {
        std::cout << "<a href=\"" << link.url << "\">" << link.text << "</a>";
    }
    void print(const List& list) override {
        std::cout << "<ul>\n";
        for (const auto& item : list.items) {
            std::cout << " <li>" << item << "</li>\n";
        }
        std::cout << "</ul>\n";
    }
};
struct MarkdownPrinter : Printer {
    void print(const Paragraph& paragraph) override {
        std::cout << paragraph.text << "\n\n";
    }
    void print(const Link& link) override {
        std::cout << "[" << link.text << "](" << link.url << ")\n";
    }
    void print(const List& list) override {
        for (const auto& item : list.items) {
            std::cout << "- " << item << "\n";
        }
    }
};
We now have two implementations of Printer. Each of them focuses on one format.
It is time to use this code:
int main() {
    Element* element = getElement();
    Printer* printer = getPrinter();
    element->print(printer);
}
Great! I think the above code speaks for itself. We have combined multiple data types with multiple behaviors while maintaining full separation. If we need to tweak something in the HTML syntax, we have one class that takes care of just that: outputting all elements to HTML.
Adding a new format does not require modification of Element.
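A short sketch of what this could look like in practice: a third format is added as one more Printer subclass, while Element and its subclasses stay exactly as they were. PlainTextPrinter, its output conventions, the internal string buffer, and the toPlainText helper are all made up for this illustration, and List is left out to keep it short.

```cpp
#include <sstream>
#include <string>

struct Paragraph;
struct Link;

struct Printer {
    virtual ~Printer() = default;
    virtual void print(const Paragraph& paragraph) = 0;
    virtual void print(const Link& link) = 0;
};

struct Element {
    virtual ~Element() = default;
    virtual void print(Printer* printer) = 0;
};

struct Paragraph : Element {
    std::string text;
    void print(Printer* printer) override { printer->print(*this); }
};

struct Link : Element {
    std::string url;
    std::string text;
    void print(Printer* printer) override { printer->print(*this); }
};

// Everything above stays untouched; the new format is just one more Printer.
// Assumption: this printer collects its output in a buffer instead of std::cout.
struct PlainTextPrinter : Printer {
    std::ostringstream out;
    void print(const Paragraph& paragraph) override {
        out << paragraph.text << "\n";
    }
    void print(const Link& link) override {
        out << link.text << " <" << link.url << ">\n";
    }
};

// Hypothetical helper: prints a single element as plain text.
std::string toPlainText(Element& element) {
    PlainTextPrinter printer;
    element.print(&printer);
    return printer.out.str();
}
```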
Adding a new element, besides the trivial implementation of the Element::print method, requires adding a new Printer::print method to each Printer. This is understandable; no one will do it for us. The plus side is that if we forget to add one of the methods, compilation will fail, because a printer that does not implement every pure virtual method cannot be instantiated. In comparison, earlier we could forget to add another if and the program would run, but it would work incorrectly.
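The compile-time safety net can be made visible with std::is_abstract. The sketch below is simplified on purpose: Image is a hypothetical new element, the elements are plain structs without the Element base, and the print bodies are left empty, since only the set of implemented overloads matters here.

```cpp
#include <string>
#include <type_traits>

struct Paragraph { std::string text; };
struct Link { std::string url; std::string text; };
struct Image { std::string src; };  // hypothetical new element

struct Printer {
    virtual ~Printer() = default;
    virtual void print(const Paragraph&) = 0;
    virtual void print(const Link&) = 0;
    virtual void print(const Image&) = 0;  // added together with Image
};

// Already updated for Image (bodies omitted in this sketch).
struct HtmlPrinter : Printer {
    void print(const Paragraph&) override {}
    void print(const Link&) override {}
    void print(const Image&) override {}
};

// Not updated yet: the Image overload is missing.
struct MarkdownPrinter : Printer {
    void print(const Paragraph&) override {}
    void print(const Link&) override {}
};

static_assert(!std::is_abstract<HtmlPrinter>::value,
              "a complete printer can be instantiated");
static_assert(std::is_abstract<MarkdownPrinter>::value,
              "an incomplete printer stays abstract and cannot be instantiated");
```

Because MarkdownPrinter is still abstract, a declaration such as `MarkdownPrinter printer;` would be rejected by the compiler until the missing overload is added.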
We obtained a combination of advantages from both previous examples.
Finally, let’s map the class and method names to the visitor pattern terminology:
Class/Method | Visitor pattern terminology |
---|---|
Printer | Visitor |
Printer::print() | Method visit |
Element::print() | Method accept |
Personally, I’m not a fan of using this terminology in code. An experienced programmer will quickly realize they are dealing with a visitor anyway, and naming functions and classes in a way that is consistent with their role in the code feels more natural to me.