Create a plugin for Google Protocol Buffer
Google’s Protocol Buffer is a library to encode and decode messages in a binary format optimised for compactness and portability between different platforms. At the moment the core library can generate code for C/C++, Java and Python but additional languages can be implemented by writing a plugin for the Protobuf’s compiler.
There is already a list of plugins to support third party languages however you can write your how plugin to output custom code tailored for your needs. In this post I’m going show an example of a plugin written in Python.
Configuration
Before start writing the plugin we need to install the Protocol Buffer compiler:
apt-get install protobuf
to be able to compile ore .proto
file through our plugin and the Python Protobuf package:
pip install protobuf
to implement the plugin.
Writing the plugin
The interface between the protoc
compiler is pretty simple: the compiler will pass a CodeGeneratorRequest
message on the stdin
and your plugin will output the generated code in a CodeGeneratorResponse
on the stdout
. So the first step is to write the code which reads the request and write an empty response:
# !/usr/bin/env python
import sys
from google.protobuf.compiler import plugin_pb2 as plugin
def generate_code(request, response):
pass
if __name__ == '__main__':
# Read request message from stdin
data = sys.stdin.read()
# Parse request
request = plugin.CodeGeneratorRequest()
request.ParseFromString(data)
# Create response
response = plugin.CodeGeneratorResponse()
# Generate code
generate_code(request, response)
# Serialise response message
output = response.SerializeToString()
# Write to stdout
sys.stdout.write(output)
The protoc
compiler follows a naming convention for the name of the plugins, as state protobuf-plugin
you can save the code above in a file called protoc-gen-custom
in your PATH
or save it with any name you prefer (like my-plugin.py
) and pass the plugin’s name and path to the --plugin
command line option.
We are choosing the second option so we’ll save our plugin as my-plugin.py
, then compiler’s invocation will looks like this (assuming that the build
directory already exists):
protoc --plugin=protoc-gen-custom=my-plugin.py --custom_out=./build hello.proto
The content of hello.proto
file is simply this:
enum Greeting {
NONE = 0;
MR = 1;
MRS = 2;
MISS = 3;
}
message Hello {
required Greeting greeting = 1;
required string name = 2;
}
The command above will not generate any output because our plugin does nothing, time now to write some meaningful output.
Generating code
Lets modify the generate_code()
function to generate a JSON representation of the .proto
file but first we need a function to traverse the AST and return all the enumerator, messages and nested types:
def traverse(proto_file):
def _traverse(package, items):
for item in items:
yield item, package
if isinstance(item, DescriptorProto):
for enum in item.enum_type:
yield enum, package
for nested in item.nested_type:
nested_package = package + item.name
for nested_item in _traverse(nested, nested_package):
yield nested_item, nested_package
return itertools.chain(
_traverse(proto_file.package, proto_file.enum_type),
_traverse(proto_file.package, proto_file.message_type),
)
And now the new generate_code()
function:
import itertools
import json
from google.protobuf.descriptor_pb2 import DescriptorProto, EnumDescriptorProto
def generate_code(request, response):
for proto_file in request.proto_file:
output = []
# Parse request
for item, package in traverse(proto_file):
data = {
'package': proto_file.package or '<root>',
'filename': proto_file.name,
'name': item.name,
}
if isinstance(item, DescriptorProto):
data.update({
'type': 'Message',
'properties': [{'name': f.name, 'type': int(f.type)}
for f in item.field]
})
elif isinstance(item, EnumDescriptorProto):
data.update({
'type': 'Enum',
'values': [{'name': v.name, 'value': v.number}
for v in item.value]
})
output.append(data)
# Fill response
f = response.file.add()
f.name = proto_file.name + '.json'
f.content = json.dumps(output, indent=2)
For every .proto
file in the request we iterate over all the items (enumerators, messages and nested types) and we write some informations in a dictionary. Then we add a new file to the response and we set the filename, in this case equal to the original filename plus the .json
extension, and the content which is the JSON representation of the dictionary.
If you run again the protobuf compiler it will output a file named hello.proto.json
in the build
directory with this content:
[
{
"type": "Enum",
"filename": "hello.proto",
"values": [
{
"name": "NONE",
"value": 0
},
{
"name": "MR",
"value": 1
},
{
"name": "MRS",
"value": 2
},
{
"name": "MISS",
"value": 3
}
],
"name": "Greeting",
"package": "<root>"
},
{
"properties": [
{
"type": 14,
"name": "greeting"
},
{
"type": 9,
"name": "name"
}
],
"filename": "hello.proto",
"type": "Message",
"name": "Hello",
"package": "<root>"
}
]
Conclusion
In this post we walked through the creation of a Protocol Buffer plugin to compile a .proto
file into simplified representation in JSON format. The core part is the interface code to read a request from the stdin
, traverse the AST and write the response on the stdout
.
However you are not limited in just transforming the input into another format but you can use the request to output any code in any language, you can parse a .proto
file and output code for a RESTful API in Node.js, converting the message and enum definitions into a XML file or even generate another .proto
file i. e. without the deprecated fields.