Some years ago, I found myself experimenting with template metaprogramming in C++. One of the most intriguing projects that came out of this tinkering was a generic XML parser that combined several powerful features: declarative syntax, automatic validation, and bidirectional serialization/deserialization—all without requiring duplicate code.
Today I’m dusting off this experiment to share how it works and reflect on how modern C++ features could simplify this approach even further.
The Goal: Declarative XML Handling
The primary goal of this project was to create a system where:
- You could define an XML structure once, using a declarative syntax
- The same definition would handle both parsing and serialization
- Validation would be built-in (required vs. optional elements)
- The code would be type-safe and leverage compile-time checking
What I wanted to avoid was the typical approach where you write separate code for parsing, validation, and serialization—an approach that often leads to duplicated logic and inconsistencies.
For example, consider this XML document:
<root key="mykey">
    <data id="1">D1</data>
    <data id="2">D2</data>
</root>
In a traditional approach, you might write separate parsing code, validation logic, and serialization functions for this structure. But with this template-based approach, you define the structure once and get all these capabilities automatically.
Here’s a glimpse of what the final API looked like:
auto xml =
    "root"_node(
        Required(),
        "key"_attr(Required()),
        "client_id"_attr(),
        NodeList(
            "data"_node(
                "id"_attr(Required()),
                Text(Required()))));
With this single definition, you could:
- Parse XML strings into a structured NodeData object
- Validate that all required elements exist
- Serialize the data structure back to XML
How It Works: Template Metaprogramming Magic
The implementation relies heavily on several C++ template metaprogramming techniques:
1. The NodeData Structure
The core of the system is a generic NodeData structure that acts as an intermediate representation:
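#include <map>
#include <string>
#include <vector>

struct NodeData
{
    std::string name;                                       // element name
    std::string text;                                       // text content
    std::map<std::string, std::vector<NodeData>> subnodes;  // child nodes, grouped by name
    std::map<std::string, std::string> attributes;          // attribute name -> value
};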
This structure stores everything we need about an XML node: its name, text content, child nodes, and attributes.
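To make the role of NodeData concrete, here is a minimal sketch of how a NodeData tree could be written back out as XML. The to_xml function is a hypothetical helper, not the original code, and it assumes that names, attribute values and text never need escaping.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Same layout as the NodeData structure above.
struct NodeData
{
    std::string name;
    std::string text;
    std::map<std::string, std::vector<NodeData>> subnodes;
    std::map<std::string, std::string> attributes;
};

// Hypothetical serializer: writes a NodeData tree as XML text.
// Assumes no value ever needs escaping (&, <, ", ...).
std::string to_xml(const NodeData& node)
{
    std::string out = "<" + node.name;
    for (const auto& [key, value] : node.attributes)  // std::map iterates in key order
        out += " " + key + "=\"" + value + "\"";

    if (node.text.empty() && node.subnodes.empty())
        return out + " />";  // self-closing, like <root key="mykey" />

    out += ">" + node.text;
    for (const auto& [child_name, children] : node.subnodes)
        for (const NodeData& child : children)
            out += to_xml(child);
    return out + "</" + node.name + ">";
}
```

Because subnodes is a std::map keyed by element name, children with different names come out grouped and sorted by name, so this representation does not preserve document order across differently named children.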
2. Node Types and Traits
The system defines several node types that know how to interact with both XML and the NodeData structure:
- Node<name, Args...>: Represents an XML element
- Attribute<name, Args...>: Represents an XML attribute
- Text<Args...>: Represents text content within an element
- NodeList<SubNodeType, Args...>: Represents a list of similar child nodes
Each node type implements several key methods:
- subnode(): Retrieves the relevant part of an XML document
- validate(): Checks if the node meets its requirements
- parse(): Extracts data from XML into NodeData
- serialize(): Writes data from NodeData back to XML
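The idea that one descriptor drives several operations can be illustrated with a stripped-down sketch of Attribute and Text. Since the XML library behind the original isn't shown, Element here is a hypothetical stand-in for a parsed XML node, NodeData omits subnodes for brevity, names are passed as pointers to static char arrays, and subnode() is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

struct Required {};

// Hypothetical stand-in for a parsed XML element (no real XML library here).
struct Element
{
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;
};

// Same intermediate representation as above, minus subnodes for brevity.
struct NodeData
{
    std::string name;
    std::string text;
    std::map<std::string, std::string> attributes;
};

template<class... Args>
inline constexpr bool is_required_v = (std::is_same_v<Args, Required> || ...);

// One declaration, three operations: validate, parse and serialize all read
// the attribute name and the Required flag from the same template arguments.
template<const char* Name, class... Args>
struct Attribute
{
    bool validate(const Element& e) const
    {
        return !is_required_v<Args...> || e.attributes.count(Name) != 0;
    }
    void parse(const Element& e, NodeData& out) const
    {
        if (auto it = e.attributes.find(Name); it != e.attributes.end())
            out.attributes[Name] = it->second;
    }
    void serialize(const NodeData& in, Element& e) const
    {
        if (auto it = in.attributes.find(Name); it != in.attributes.end())
            e.attributes[Name] = it->second;
    }
};

template<class... Args>
struct Text
{
    bool validate(const Element& e) const { return !is_required_v<Args...> || !e.text.empty(); }
    void parse(const Element& e, NodeData& out) const { out.text = e.text; }
    void serialize(const NodeData& in, Element& e) const { e.text = in.text; }
};

constexpr char kId[] = "id";  // static storage, so usable as a template argument
```

In the full design, a Node<name, Args...> would forward each of these calls to every descriptor in Args..., which is how a single declaration covers parsing, validation and serialization.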
3. Type Deduction and User-Defined Literals
One of the key challenges in C++14 was that a class template's constructor couldn't deduce the class's template parameters from its arguments; only function templates could. This limitation required a creative workaround using user-defined literals and builder classes:
template<class CharT, CharT... chars>
auto operator""_node()
{
    static const char name[] = {chars..., 0};  // one static array per distinct literal
    return NodeBuilder<name>();                 // its address becomes the template argument
}
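This trick allows us to write "root"_node(...). The literal returns a NodeBuilder specialized on the address of a static array holding "root" (informally, NodeBuilder<"root">), and because the builder's call operator is a function template, it can deduce Args... from its arguments and build a Node<"root", Args...> with the proper template parameters. Note that a literal operator template receiving the characters as a pack is a GNU extension supported by GCC and Clang, not standard C++.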
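The deduction half of this trick can be sketched in standard C++17 without the literal-operator extension: a namespace-scope array provides the static name, NodeBuilder fixes it, and the builder's templated call operator deduces the rest. NodeBuilder's exact shape, root_node and kRoot are assumptions for illustration, not the original code.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Required {};

// The node type carries its name and options as template parameters.
template<const char* Name, class... Args>
struct Node
{
    std::string name() const { return Name; }
};

// NodeBuilder fixes the name. Its call operator is a function template, so
// Args... is deduced from the call arguments, which a C++14 constructor of
// Node could not do.
template<const char* Name>
struct NodeBuilder
{
    template<class... Args>
    Node<Name, Args...> operator()(Args...) const { return {}; }
};

// Standard stand-in for "root"_node: a namespace-scope array has static
// storage, so its address can be a template argument.
constexpr char kRoot[] = "root";
constexpr NodeBuilder<kRoot> root_node{};
```

Calling root_node(Required{}) yields a Node<kRoot, Required>, and root_node() a Node<kRoot>; the literal operator in the text automates the step of creating one such array per distinct string.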
4. Variadic Templates for Complex Structures
Variadic templates allow us to handle arbitrary nesting and combinations of nodes:
template<class... Args>
struct is_required;

template<>
struct is_required<> : std::false_type { };

template<class Arg, class... Args>
struct is_required<Arg, Args...> :
    std::conditional_t<std::is_same_v<Arg, Required>,
                       std::true_type,
                       is_required<Args...>> { };
This recursive template specialization checks whether Required is among the arguments passed to a node, enabling us to handle validation elegantly.
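In C++17 the same check can also be written as a fold expression, which needs neither recursion nor a base case. The sketch below puts both versions side by side; is_required_fold_v is a hypothetical name chosen for the comparison.

```cpp
#include <cassert>
#include <type_traits>

struct Required {};

// Recursive version, as in the text.
template<class... Args>
struct is_required;

template<>
struct is_required<> : std::false_type { };

template<class Arg, class... Args>
struct is_required<Arg, Args...> :
    std::conditional_t<std::is_same_v<Arg, Required>,
                       std::true_type,
                       is_required<Args...>> { };

// C++17 fold-expression version: one line, no recursion, no base case.
// An empty || fold evaluates to false, matching is_required<>.
template<class... Args>
inline constexpr bool is_required_fold_v = (std::is_same_v<Args, Required> || ...);

static_assert(is_required<int, Required>::value == is_required_fold_v<int, Required>);
static_assert(is_required<>::value == is_required_fold_v<>);
```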
Example in Action
Let’s see how the system handles various XML inputs:
using namespace std::string_literals;  // enables the ""s std::string literal

auto examples = {
    "<wrong />"s,                                     // Wrong root element
    "<root />"s,                                      // Missing required attribute
    "<root key=\"mykey\" />"s,                        // Valid minimal example
    "<root key=\"mykey\"><data id=\"1\" /></root>"s,  // Missing required text
    "<root key=\"mykey\"><data id=\"1\">D1</data><data id=\"2\">D2</data></root>"s  // Fully valid
};
The system will:
- Reject <wrong /> because it has the wrong root element name
- Reject <root /> because the required “key” attribute is missing
- Accept <root key="mykey" /> as a minimal valid document
- Reject <root key="mykey"><data id="1" /></root> because the data node is missing required text
- Accept the full example with two data nodes, each with their required attributes and text
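These accept/reject decisions can be reproduced with a compact, validation-only sketch of the schema. It assumes the XML has already been parsed into a hypothetical Element tree (no real XML library is used), and Node, Attribute, Text and NodeList here are simplified stand-ins that implement only validate().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an already-parsed XML element.
struct Element
{
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;
    std::vector<Element> children;
};

struct Required {};

template<class... Args>
inline constexpr bool is_required_v = (std::is_same_v<Args, Required> || ...);

template<const char* Name, class... Args>
struct Attribute
{
    bool validate(const Element& e) const
    {
        return !is_required_v<Args...> || e.attributes.count(Name) != 0;
    }
};

template<class... Args>
struct Text
{
    bool validate(const Element& e) const
    {
        return !is_required_v<Args...> || !e.text.empty();
    }
};

template<const char* Name, class... Args>
struct Node
{
    static constexpr const char* name = Name;
    std::tuple<Args...> args;

    // The element must carry this node's name, and every descriptor in
    // Args... (attributes, text, lists) must accept it. Required is only a
    // marker, so it has nothing to check on its own.
    bool validate(const Element& e) const
    {
        if (e.name != Name)
            return false;
        return std::apply([&](const auto&... arg) { return (check(arg, e) && ...); }, args);
    }

    template<class A>
    static bool check(const A& arg, const Element& e)
    {
        if constexpr (std::is_same_v<A, Required>)
            return true;
        else
            return arg.validate(e);
    }
};

// Every child element whose name matches SubNode must validate against it;
// zero matching children is fine.
template<class SubNode>
struct NodeList
{
    SubNode sub;

    bool validate(const Element& e) const
    {
        for (const Element& child : e.children)
            if (child.name == SubNode::name && !sub.validate(child))
                return false;
        return true;
    }
};

constexpr char kRoot[] = "root", kKey[] = "key", kClientId[] = "client_id",
               kData[] = "data", kId[] = "id";

// Same shape as the "root"_node(...) definition in the text.
using Schema = Node<kRoot, Required,
                    Attribute<kKey, Required>,
                    Attribute<kClientId>,
                    NodeList<Node<kData, Attribute<kId, Required>, Text<Required>>>>;
```

Note that the optional client_id attribute never causes a rejection, which is why the minimal <root key="mykey" /> passes.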
Modern C++ Improvements
This code was written in the early days of C++17. Revisiting it today, several C++17 and C++20 features could simplify it:
1. Class Template Argument Deduction (CTAD)
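C++17 introduced CTAD, which removes the need for builder functions whenever all of a class template's parameters can be deduced from its constructor arguments. It is all-or-nothing, though: once a template argument list is written, nothing is deduced, so Node<"root">(Required()) would name Node<"root"> with an empty Args... instead of deducing Node<"root", Required>. To drop NodeBuilder, the name would have to travel as a deducible constructor argument, which C++20 string template parameters make possible (NameTag is a hypothetical tag type):
// Instead of "root"_node(Required())
Node(NameTag<"root">{}, Required())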
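A C++17 sketch of what CTAD can deduce: NodeList and Text need no template arguments at all, and Node can be deduced too once its name is passed inside a tag argument. NameTag, kRoot and the constructors shown are hypothetical, and the name is a pointer to a static array because string template arguments require C++20.

```cpp
#include <type_traits>

struct Required {};

// Hypothetical node types, reduced to the constructors CTAD needs.
template<class... Args>
struct Text
{
    explicit Text(Args...) {}
};

template<class SubNodeType, class... Args>
struct NodeList
{
    SubNodeType sub;
    explicit NodeList(SubNodeType s, Args...) : sub(s) {}
};

// Node's name is not a constructor argument, so CTAD alone cannot supply it.
// Wrapping it in a tag type turns it into something deducible.
template<const char* Name>
struct NameTag {};

template<const char* Name, class... Args>
struct Node
{
    explicit Node(NameTag<Name>, Args...) {}
};

constexpr char kRoot[] = "root";
```

With these definitions, NodeList(Text(Required())) deduces NodeList<Text<Required>>, and Node(NameTag<kRoot>{}, Required()) deduces Node<kRoot, Required>, with no builder involved.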
2. String Literals as Template Parameters
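C++20 allows class types as non-type template parameters, so a string literal can be passed as a template argument through a small wrapper type (FixedString below is a hypothetical helper; a plain template<auto name> would still reject "root", because a string literal cannot be a pointer template argument). This would greatly simplify the implementation:
template<std::size_t N>
struct FixedString {
    char value[N]{};
    constexpr FixedString(const char (&s)[N]) { for (std::size_t i = 0; i < N; ++i) value[i] = s[i]; }
};
template<FixedString name> class Node { /* ... */ };
// Usage:
Node<"root"> root;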
3. Concepts and Constraints
C++20 concepts would allow more precise constraints on template parameters, making error messages clearer and replacing much of the SFINAE machinery (NodeConcept here would be a user-defined concept):
template<NodeConcept... Children>
class Node {
// Implementation
};
4. std::visit and std::variant
Using std::variant (already part of C++17) together with std::visit would offer a type-safe way to handle the different node kinds as runtime values rather than as distinct template types:
using NodeVariant = std::variant<Element, Attribute, Text>;
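A minimal sketch of the variant-based alternative, assuming simple runtime descriptors for the three node kinds (their fields are hypothetical) and the widely used overloaded helper for building a visitor from lambdas (a community idiom, not a standard library type).

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical runtime node descriptors; the fields are assumptions.
struct Element   { std::string name; bool required = false; };
struct Attribute { std::string name; bool required = false; };
struct Text      { bool required = false; };

using NodeVariant = std::variant<Element, Attribute, Text>;

// The common "overloaded" idiom: builds one visitor out of several lambdas.
template<class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template<class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// std::visit calls the lambda matching the alternative the variant holds.
std::string describe(const NodeVariant& node)
{
    return std::visit(overloaded{
        [](const Element& e)   { return "element " + e.name; },
        [](const Attribute& a) { return "attribute " + a.name; },
        [](const Text&)        { return std::string("text"); },
    }, node);
}

// A generic lambda works when every alternative shares a member.
bool is_required(const NodeVariant& node)
{
    return std::visit([](const auto& n) { return n.required; }, node);
}
```

The trade-off is that the schema becomes data checked at runtime: simpler to build dynamically, but mistakes such as a misspelled node kind are no longer caught by the compiler.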
5. if constexpr
C++17’s if constexpr could eliminate the need for some of the template specializations:
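template<class... Args>
inline constexpr bool is_required_v = is_required<Args...>::value;

// Inside a function template parameterized on Args...:
if constexpr (is_required_v<Args...>) {
    // Handle required case
} else {
    // Handle optional case
}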
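One place where if constexpr goes beyond a plain if is that the branches may return different types. In this hypothetical read_attribute helper, a required attribute comes back as std::string and an optional one as std::optional<std::string>, both from a single function template; is_required_v is defined here with a fold expression so the block stands alone.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>

struct Required {};

template<class... Args>
inline constexpr bool is_required_v = (std::is_same_v<Args, Required> || ...);

// Hypothetical helper. With if constexpr, the discarded branch's return
// statements take no part in return-type deduction, so each instantiation
// gets its own return type.
template<class... Args>
auto read_attribute(const std::map<std::string, std::string>& attributes,
                    const std::string& name, Args...)
{
    auto it = attributes.find(name);
    if constexpr (is_required_v<Args...>) {
        // Required: return a plain string, throw if it is missing.
        if (it == attributes.end())
            throw std::runtime_error("missing required attribute: " + name);
        return it->second;
    } else {
        // Optional: return std::optional<std::string>, empty if missing.
        if (it == attributes.end())
            return std::optional<std::string>{};
        return std::optional<std::string>{it->second};
    }
}
```

With a plain if, both branches would have to agree on one return type, which is exactly the kind of situation that used to call for separate specializations.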
Conclusion
This XML parsing experiment demonstrates the power of template metaprogramming even before C++20. By combining user-defined literals, variadic templates, and SFINAE, we can create a declarative API for XML handling that provides type safety, validation, and bidirectional conversion.
While the implementation might seem complex, it delivers significant benefits:
- Write once, use for both parsing and serialization
- Built-in validation enforced at runtime
- Declarative syntax that mirrors the structure of XML
- Type safety that catches errors at compile time
Modern C++ would make this implementation even more elegant and concise, but the core ideas remain valuable. Template metaprogramming allows us to create domain-specific languages within C++ that can dramatically reduce boilerplate and improve code reliability.
The next time you find yourself writing separate code for parsing, validating, and serializing a data format, consider whether a template-based approach might let you define the structure once and get all those operations for free.
See the full code example here