Monday, July 10, 2017
When my manager, Karl, pitched Presto, Facebook’s open-source distributed SQL query engine, as the upcoming intern project, I think all three of us analytics interns (Kathy, also a DTech Scholar; Breck, a Vanderbilt senior; and myself) were ready for the challenge. We had spent the last few weeks testing an aggregated query API using SQL-like queries and were ready to attack a new type of project. What really got me excited about this project, however, was Karl’s enthusiasm for the tool – we would later learn that the need for it accompanied a huge overhaul of Zuora’s core architecture.
Over the course of the next several weeks, we dove straight into the documentation. We wrote thorough internal documentation of our setup process for installing Presto, documented preliminary evaluations of the tool, and learned how to use the Presto Client. By the end of this time, Kathy and Breck had created a custom connector for Zuora’s query system, and I had configured multiple Presto nodes (a coordinator and a set of workers) on Amazon Web Services to benchmark the performance of various operations. We ventured through many bugs, failures, and mishaps, but made it to the other side alive.
It was in our third week, when both sub-teams were in the final stages of their tasks, that we ran into the same weakness in Presto. Designed for simplicity, Presto lacked a vital join-based optimization that our architecture relied upon. Without it, queries would take eons, at least from the user’s perspective. For our purposes, that made Presto unusable.
The following day, after both sub-teams presented the same problem, the rest of our team came to the same conclusion. And so, after three weeks of hard work to integrate Presto, we scrapped the entire operation. I had certainly heard some interns complain after leaving their companies that their hard work would never be integrated into the product. Still, experiencing this first-hand was eye-opening. I won’t lie, it was a huge blow to my confidence to have our work discarded. I didn’t immediately respond to the situation with a happy-go-lucky attitude – I felt that I had wasted my time. Of course, what type of blog post would this be if I didn’t come away with some silver lining?
It was later that week, during a talk with our senior engineers at Zuora, that my perspective pivoted. One of the senior architects, a man named Henning with a strong German accent and many pairs of sandals, revealed that his original computer science courses (“way back when,” he said) had been taught in Pascal. Languages like C have long since left Pascal in the dust. What had really been valuable to him was not the specific tool but the learning experience.
As you grow, your specific knowledge of a certain tool or language will rapidly become outdated. With any move, there will be completely new internal tools, services, and platforms to master. There will always be a learning curve. What is important is the ability to learn quickly and adapt to new and changing tools, languages, and services. And so, in many ways, learning Presto started to seem like not such a waste of time after all. I had learned how to find the resources I needed, work effectively in a team, and document and present my findings. I had interacted with the Facebook Presto team, both on GitHub and in person, learning how to ask the “right” questions. Even without tangible results, it was a valuable learning experience overall.