I have a confession to make: I’m irrationally obsessed with developer tools—integrated development environments, programming languages, container engines, you name it. I catch myself reviewing the latest SpriteKit API additions on the off chance I find time to make a game to play with my kids, which may never happen because there are a million other things I want to toy around with. If I could control myself, I’d stick to the tools that I’m most likely to actually use in the next few months. Instead, ask me about cross-platform graphical user interface toolkits and I’ll talk your ear off about the latest and greatest things I have absolutely no foreseeable use for.
My personal obsessions bleed over into doing technology surveys, since of course I get to learn even more, often in a domain I wouldn’t have explored on my own. There’s a key difference though—technology surveys aren’t doe-eyed daydreams of what might be possible, they’re grounded assessments of practicable options. Doing one well can be the difference between confidently delivering on time to a happy client and constantly running around putting out fires, mitigating delays caused by poor technical decisions, and hoping the goalposts miraculously get pushed later down the road. So while I certainly wouldn’t advise anyone else to read every single Ars Technica article and comb through software release notes with the eye of a lawyer reading a contract, I would very much recommend that every project start with a good survey.
Plenty has been written about general guidance for making technology choices, and if you haven’t read Dan McKinley’s “Choose Boring Technology” blog post, start there. He presents a nuanced approach to making technology choices based on a wealth of experience. Completing a technology survey will take you beyond general guidance and give you something actionable. It will also serve to document the options considered, key assumptions, and viable alternatives, allowing you to quickly revisit your decisions in light of new information and providing a jumping-off point for future surveys. How you approach the survey will depend on whether you are conducting it for a specific project and set of requirements (constraint-driven) or are looking to leverage something more broadly across multiple projects (opportunity-driven).
A couple years ago, before I came to Credera, I led the design and architecture for a highly concurrent network-based replication service. (I’m going to play a bit fast and loose with historical accuracy, which isn’t particularly salient here. For the sake of brevity, I will also omit details.)
Our initial requirements included running on Windows, Linux, and Android—of which only the Android requirement had been nailed down to a particular version of the OS. We might be asked to target Windows 7 or 10, and the Linux distro was completely up in the air. Thankfully, Android wasn’t going to be running on a phone or tablet, so we had decent hardware and no battery concerns. We also had to provide both a C++ and a Java SDK for interacting with our service. Most importantly, we had to have something ready to demonstrate in about three months.
The place to start in a constraint-driven survey is with the best-defined, most constraining requirements and the most impactful decisions. In my case, that meant being able to run on Android and choosing which programming language to use. Google’s own documentation highly recommended using Java on Android and falling back to C/C++ only as a last resort. Theoretically, any JVM language might work, but Google had its own virtual machine and compatibility was not a foregone conclusion. Our team had extensive experience with C#, and it could be used on Android through a third-party product we were familiar with (Xamarin). However, both CLR-based languages like C# and F# and JVM-based languages like Java would be extremely difficult to sell politically. My boss, fairly or unfairly, was strongly insisting on a compiled language for what he thought were both real and stakeholder-perceived performance reasons.
That left C and C++. I decided unilaterally (and admittedly with personal bias) that C would be unnecessarily difficult for my team compared to C++. C++ would be an easy sell, and it would make the C++ version of the SDK trivial to implement. There seemed to be no reason to continue investigating alternatives, and our team commenced the next key step of our survey—creating prototypes. Our goal was to quickly validate our assumptions, discover pitfalls, and, if the prototyping work went well, have a codebase to reference or directly use for the project. We parallelized the effort by creating one prototype per core project feature, including cross-platform deployment, concurrent I/O, and connection security.
For each prototype, a developer would conduct a survey of available libraries/toolchains and make a prototype using the most promising one or two options, then we would regroup and discuss what we had learned. We had a few key takeaways:
- Compile times were already slow for the prototypes, particularly those that ended up using multiple libraries.
- Most of the team was unfamiliar with C++, and it was taking us a significant amount of time to come up to speed.
- CMake and Visual Studio didn’t get along well at the time (that’s since been much improved, hat tip to the VS team!).
- Cross-platform compatibility was extremely painful—not only were we targeting multiple operating systems, but also different compilers and standard library implementations.
Partly due to all of the factors above, the prototype effort started dragging on too long. The cross-platform issue was particularly worrisome—we had a four-dimensional juggling act to conduct: each library we used had to work on each OS, with each compiler, and with each standard library implementation. We may have started out optimistic that most libraries would be compatible with all OSes, compilers, and standard library implementations, but we were quickly disabused of that notion and soon felt like we were tiptoeing through a minefield of incompatibility.
I decided our three-month time window was too short to go with C++. We needed to make our cross-platform issues go away or at least fade into the background. I went back to the language survey—had we considered all of our options? I dug a bit more and found the Go mobile project. It was clearly in beta, but we didn’t need it to do much—we were creating a back-end service and barely needed any platform integration. Go itself seemed to tick all the other boxes as well: most of the features we needed were part of the standard library, it appeared to have a much easier learning curve, compile times were blissfully fast, concurrency was a central capability of the language, there was one toolchain to deal with, and being a compiled language, I could get my boss onboard.
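Concurrency being central to the language was a big part of that appeal for a highly concurrent replication service. As a minimal illustration of the style we leaned on (the names and the fan-out shape here are mine, not the actual service’s), Go lets you spread work across goroutines and gather results over a channel with very little ceremony:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// replicate stands in for pushing one item to a peer; in a real
// replication service this would be a network call.
func replicate(item string) string {
	return "replicated:" + item
}

// replicateAll fans the work out to one goroutine per item and
// collects the results over a buffered channel.
func replicateAll(items []string) []string {
	results := make(chan string, len(items))
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			results <- replicate(it)
		}(item)
	}
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // goroutine completion order is nondeterministic
	return out
}

func main() {
	fmt.Println(replicateAll([]string{"a", "b", "c"}))
}
```

Compare that to the thread-pool and synchronization scaffolding the equivalent C++ prototype required—the difference in ceremony was hard to ignore.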
The team shifted gears, wrote new prototypes in Go, and this time our efforts went much more smoothly. We were able to provide both a Java and a C/C++ SDK thanks to Go’s C bridge (cgo), which wasn’t without its own pains, but at least the SDK was fairly limited in scope. We presented our technology survey first to my boss and then to the other stakeholders. Despite them never having used Go on a previous project, we were able to easily make the case for it based on our detailed survey.
In the end, we delivered something we were proud of on time and on budget. Making the right decisions up front made that possible. There were plenty of surprises along the way—Android and the C++ SDK ended up being dropped as requirements, we had perpetual issues getting the third-party Go debugger (Delve) working on our Windows machines, and a late-breaking requirement was added for setting IP header values on all packets we transmitted. The survey may have turned out very differently if we had known all that at the beginning, particularly since our biggest initial constraints were removed. Had we ever reached a point where using Go was no longer viable, though, or, more happily, begun a new project with similar requirements, our original technology survey would’ve given us a solid jumping-off point.
During my first few weeks at Credera, I was given the incredible opportunity to deep dive into a topic I was interested in and help prepare a Lunch ‘n Learn presentation with one of my new colleagues. I decided to look into the topic of distributed tracing, which tracks user requests as they move through a set of connected services and provides insight into where errors or delays are occurring and why.
With this type of open-ended, opportunity-driven survey, the first step was to understand the trade space—in this case, how and why distributed tracing came into being and who the main players in the domain are. Several of the distributed tracing tools I found referenced Google’s Dapper research paper from 2010, which did an excellent job of helping me understand the raisons d’être of distributed tracing, some of the technical terms involved, and generally how it works in practice. My colleague, who had some experience with performance analysis, pointed me to the related set of tools described as application performance management (APM) software. I also stumbled upon a great source of current information coupled with personal insights from experts using and making distributed tracing tools: conference videos posted to YouTube, in particular the Cloud Native Computing Foundation’s.
From my research, I learned that the distributed tracing community was in the process of coalescing around a shared standard for interacting with distributed tracers—OpenTracing. The appeal of OpenTracing was learning one API instead of many, with the actual tracer implementation being abstracted away. That gave me a promising avenue to dive into, and so I took the next step and started creating a prototype.
For the prototype, I wanted to exercise as much functionality as possible in a short time frame. I chose to create three microservices—one with Spring Boot, another with just Java, and the last with Go to see how OpenTracing worked on different platforms and languages. I wanted to instrument HTTP, gRPC, and database requests to ensure instrumentation and reports would translate through different communication paths.
My first realization was how much language choice and OpenTracing constrained my options for choosing a distributed tracer. I had gone into the prototype thinking that OpenTracing would enable me to switch tracers at will. However, OpenTracing requires implementations/wrappers for each platform/language and for each tracer. If I had chosen PHP as one of the languages, I’d have had no options, and if I had chosen Ruby, I’d have lost half my choices. Even with Java and Go, I really only had one open source option at the time, Twitter’s Zipkin (I was reluctant to try something proprietary for such an open-ended experiment).
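The promise of the abstraction, in miniature, is that application code is instrumented only against a tracer interface while the concrete implementation is chosen at startup. A toy sketch of that idea (deliberately not the real opentracing-go API, whose `Tracer` and `Span` interfaces are far richer):

```go
package main

import "fmt"

// Tracer and Span are toy stand-ins for the OpenTracing abstraction:
// application code sees only these interfaces.
type Tracer interface {
	StartSpan(operation string) Span
}

type Span interface {
	Finish()
}

// recordingTracer is a hypothetical concrete implementation; a real
// deployment would plug in, say, a Zipkin-backed tracer here instead.
type recordingTracer struct {
	finished []string
}

type recordedSpan struct {
	op string
	t  *recordingTracer
}

func (t *recordingTracer) StartSpan(op string) Span {
	return &recordedSpan{op, t}
}

func (s *recordedSpan) Finish() {
	s.t.finished = append(s.t.finished, s.op)
}

// handleRequest is instrumented only against the Tracer interface, so
// the implementation behind it can change without touching this code.
func handleRequest(t Tracer) {
	span := t.StartSpan("handle-request")
	defer span.Finish()
	// ... the actual request handling would go here ...
}

func main() {
	tracer := &recordingTracer{}
	handleRequest(tracer)
	fmt.Println("finished spans:", tracer.finished)
}
```

In practice, of course, the catch was exactly what I describe above: that interface only helps if someone has written a compliant wrapper for your language and your tracer.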
As I got into the prototype implementation and started integrating, I discovered that the OpenTracing specification was undergoing frequent and incompatible changes. I had known upfront that OpenTracing was pretty new, but I hadn’t anticipated how immature its library support would be. Getting all of the libraries to support the same version of the spec while avoiding key bugs in older releases felt like walking a tightrope while juggling.
I managed to get everything working for the presentation, with only minor lingering concerns like clock synchronization issues when running for hours or days. Recently, when I went back to start upgrading to newer library versions, I was stymied when one of the many libraries I was using hadn’t been updated at all in the last few months while others had moved on to an extremely recent version of the spec. At this stage, if I were asked to recommend a distributed tracing setup to a client, I would probably advise them against adding the extra complication of the OpenTracing abstraction unless they had a very homogeneous deployment—it’s just not mature enough yet to warrant avoiding the use of a specific tracer directly. However, I hope that predicament will soon be outdated as the project matures. I look forward to being able to use an open source tracer during development and then switch over with minimal effort to a proprietary one in production if it provides additional value to my client.
The trend toward open source development in general, and public bug tracking specifically, has been a huge boon to conducting a good survey. Rather than spend weeks pulling out my hair trying to get something to work, I’ve often found perusing a list of known issues during prototyping or early on in a project to be enormously time and energy saving. Known issues and their comment threads give an excellent read into what problems I might run into and how likely those problems are to be resolved soon. The smaller the project, the more likely I am to read every single open issue posted.
You may have noticed that I haven’t actually provided any examples of what the artifact looks like for a technology survey. That wasn’t a mistake—it’ll vary greatly depending on the situation and stakeholder expectations. The important thing is to capture your findings and prototypes so that they can be revisited later.
Experience has taught me the hard way that when it comes to integration, the trouble is often in the details and there are usually a lot of details. That’s why projects like JHipster and all of the myriad Linux distributions, which work to ensure that disparate software pieces fit together, are so valuable. When there isn’t a ready-made integration available that meets your needs, keep in mind that having more pieces usually leads to more integration work and plan accordingly.
The best-executed surveys will take the time to go beyond the brief marketing blurb of a particular technology and dig into its limitations, maturity, trade-offs, development community, and software ecosystem. There is no better way to attain that depth than by rolling up your sleeves, getting your hands dirty, and making prototypes that hash out how to accomplish the most potentially problematic bits of functionality. This lets your team ‘fail fast’ with an initial approach or two then move forward with a solid plan backed by real experience.