This Is How You Bootstrap a Data Team
Data alone is not enough—we needed the right storytellers.
Six months ago, I packed up my travel-sized toothbrush kit, my favorite coffee mug now filled with pens and business cards, and a duffel bag full of gym socks and free conference tee-shirts. With my start-up survival kit in tow, it was time to move on from my job as a back-office engineer.
I dragged my chair ten feet across the office and began my new life as the engineering lead of Betterment’s nascent data team. My new mates included two talented data analysts, a data warehousing engineer, and a marketing analyst who also served as product owner. I was thrilled. There was a lot for us to do.
In our new roles, we now inform and guide many of the ongoing product and marketing efforts at Betterment. Thinking big, we decided to dub ourselves Team Polaris, after the North Star.
Creating a tighter feedback loop
Even though our move to create an in-house data team was a natural part of our own engineering team evolution here at Betterment, it’s still something of a risky unknown for most companies.
Business intelligence tooling has traditionally been something that comes at a great upfront cost to an organization (it can reach into the millions of dollars)—but as a startup, we instead looked carefully at how we could leverage our homegrown talent and resources to build a team to seamlessly integrate into the existing company architecture.
Specifically, we wanted a tight feedback loop between the business and technology so that we could experiment and figure out what worked before committing real dollars to a solution—aka high-frequency hypothesis testing. We needed a team responsible for collecting, curating and presenting the data—and our data had to be trustworthy for objective metric-level reporting to the organization.
Our work consisted of collaborating with our marketing, analytics, and product teams to establish systems and practices that:
- Measure progress towards high-level goals
- Optimize growth and conversion
- Support product and project strategy
- Improve customer outcomes
A guide to tactical decisions
With these requirements in mind, here are some of the tactical decisions we made from the start to get our new data team off the ground. In the future, expect to read more from our team about how we use our data insights to drive product and growth development at Betterment.
1. Define our process
For us, the obvious first order of business was to deliver continuous, incremental value while gradually transitioning from legacy systems to new ones.
Our initial task was to interview internal stakeholders to get at their data-related pain points. We sent out questionnaires in advance but collected answers through face-to-face dialogue. A couple of hours of focused conversation defined a six-month tactical focus for the team. Once we had compiled our meticulous notes, it became clear that our major challenges lay in the accessibility and reliability of key performance metrics.
With the interviews in hand, the team sat down to pen a manifesto and define the pillars by which we would measure our progress. We came up with ACES (Automated, Consistent, Efficient, and Self-serviced) as the motifs by which we could create a measurable feedback loop.
2. Inform the roadmap
Within three weeks of operations, it became clear that we could use turnaround-time metrics from ad-hoc and advisory requests to show us where we needed to invest in project cycles and technology.
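To make the idea concrete, here is a minimal sketch of how turnaround times could be summarized per requesting area. The request log, its categories, and the function names are all hypothetical, invented for illustration. This is not our actual tracking system or data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical request log: (category, opened, closed). Illustrative values only.
REQUESTS = [
    ("marketing", "2014-03-03 09:00", "2014-03-03 15:00"),
    ("marketing", "2014-03-04 10:00", "2014-03-06 10:00"),
    ("product", "2014-03-03 11:00", "2014-03-03 13:00"),
    ("product", "2014-03-05 09:00", "2014-03-05 10:00"),
    ("finance", "2014-03-04 09:00", "2014-03-07 09:00"),
]

FMT = "%Y-%m-%d %H:%M"


def turnaround_hours(opened, closed):
    """Elapsed hours between a request being opened and closed."""
    delta = datetime.strptime(closed, FMT) - datetime.strptime(opened, FMT)
    return delta.total_seconds() / 3600


def median_turnaround_by_category(requests):
    """Group requests by category and take the median turnaround in hours."""
    hours = defaultdict(list)
    for category, opened, closed in requests:
        hours[category].append(turnaround_hours(opened, closed))
    return {category: median(values) for category, values in hours.items()}


def slowest_first(requests):
    """Rank categories by median turnaround, slowest first."""
    medians = median_turnaround_by_category(requests)
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for category, hours in slowest_first(REQUESTS):
        print(f"{category}: median {hours:.1f} h")
```

The categories that float to the top of such a ranking are candidates for investment. A median is used here instead of a mean so that one unusually slow request does not dominate the picture.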
Busy as we were with data projects, we were feeling the pain ourselves. We needed more easily accessible business measures, with enough context that we and our colleagues could roll up or slice and dice our data. We knew that a star schema approach would help us clarify a data narrative and give all of us a consistent view of the truth. But there was no way for us to do it all at once.
3. Limit disruption while we build
To limit disruption to our colleagues while delivering incremental improvements, we implemented a clever and completely practical transition plan within MySQL’s native feature set. Specifically, we set up a new database server dedicated to reporting and ad-hoc workloads. This dedicated MySQL instance consisted of three database schemas we now refer to as our Triumvirate Data Warehouse.
The first member of this triad is betterment_live. This database is a complete, real-time, read-only replica of our production database. It’s just native MySQL master-slave replication; easy to set up and maintain on dedicated hardware or in the cloud.
The second member is client_analytics. It is a read-write schema to which our colleagues have full privileges. The usage pattern is for folks to connect to client_analytics and, from there, cross-query against the betterment_live schema, import, export, and manipulate custom datasets with Python or R, run regressions and other analyses, and so on. Everybody wins. Our data workers retain their ability to run existing processes until we can transition them to a “better” way, while the engineering team has successfully moved business users out of an already busy production environment.
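As a rough illustration of the cross-query pattern, the sketch below joins a custom dataset in client_analytics against a table in betterment_live using schema-qualified names. It uses Python's built-in sqlite3 with attached in-memory databases as a stand-in for our MySQL server, so it runs anywhere. The table names (accounts, campaign_cohort), columns, and values are hypothetical, and the stand-in does not enforce the read-only replica.

```python
import sqlite3


def build_reporting_server():
    """Create one connection exposing two schemas, mimicking one database server.

    SQLite's ATTACH stands in for MySQL schemas living on the same server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS betterment_live")
    conn.execute("ATTACH DATABASE ':memory:' AS client_analytics")

    # Hypothetical replicated production table (read-only in the real setup).
    conn.execute(
        "CREATE TABLE betterment_live.accounts ("
        "account_id INTEGER PRIMARY KEY, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO betterment_live.accounts VALUES (?, ?)",
        [(1, 1000.0), (2, 2500.0), (3, 400.0)],
    )

    # Hypothetical custom dataset an analyst imported into the scratch schema.
    conn.execute(
        "CREATE TABLE client_analytics.campaign_cohort ("
        "account_id INTEGER, campaign TEXT)"
    )
    conn.executemany(
        "INSERT INTO client_analytics.campaign_cohort VALUES (?, ?)",
        [(1, "spring_email"), (2, "spring_email"), (3, "podcast")],
    )
    return conn


# Schema-qualified names let one query span both schemas.
CROSS_QUERY = """
SELECT c.campaign, COUNT(*) AS accounts, SUM(a.balance) AS total_balance
FROM client_analytics.campaign_cohort AS c
JOIN betterment_live.accounts AS a ON a.account_id = c.account_id
GROUP BY c.campaign
ORDER BY c.campaign
"""


def balances_by_campaign(conn):
    """Join the analyst's cohort against the replicated accounts table."""
    return conn.execute(CROSS_QUERY).fetchall()


if __name__ == "__main__":
    for row in balances_by_campaign(build_reporting_server()):
        print(row)
```

The same `schema.table` qualification is what makes cross-schema joins possible in MySQL when the schemas share a server, which is why co-locating them matters.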
Last but certainly not least is our new baby, the data warehouse. It is a read-only, star-schema representation of fact and dimension tables for growth subject areas. We’ve pushed the nuisance and complexity into our data pipeline (ETL) process, which lets us synthesize atomic and summary metrics in a format that is more intuitive for our business users.
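For readers new to star schemas, here is a small sketch of the shape: one fact table of atomic measures surrounded by dimension tables that give it context, and a helper that rolls the facts up by whichever dimensions you ask for. It uses Python's built-in sqlite3 in place of MySQL, and every table, column, and number in it (fact_signups, dim_date, dim_channel) is a made-up example and not our actual model.

```python
import sqlite3


def build_warehouse():
    """Build a tiny star schema: one fact table plus two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_channel (
            channel_key INTEGER PRIMARY KEY,
            channel_name TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            month TEXT
        );
        CREATE TABLE fact_signups (
            date_key INTEGER REFERENCES dim_date(date_key),
            channel_key INTEGER REFERENCES dim_channel(channel_key),
            signups INTEGER
        );
        INSERT INTO dim_channel VALUES (1, 'organic'), (2, 'paid');
        INSERT INTO dim_date VALUES
            (20140301, '2014-03'), (20140302, '2014-03'), (20140401, '2014-04');
        INSERT INTO fact_signups VALUES
            (20140301, 1, 10), (20140301, 2, 4),
            (20140302, 1, 7),
            (20140401, 2, 9);
        """
    )
    return conn


# Whitelist of dimension attributes that a caller may group by.
ALLOWED = {"month": "d.month", "channel": "c.channel_name"}


def signups_by(conn, *dimensions):
    """Roll up the atomic facts by one or more whitelisted dimensions."""
    if not dimensions:
        raise ValueError("pass at least one dimension name")
    cols = ", ".join(ALLOWED[name] for name in dimensions)
    sql = f"""
        SELECT {cols}, SUM(f.signups)
        FROM fact_signups AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_channel AS c ON c.channel_key = f.channel_key
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    print(signups_by(warehouse, "month"))
    print(signups_by(warehouse, "month", "channel"))
```

Because every summary is derived from the same fact rows through the same joins, a monthly total and a month-by-channel breakdown cannot disagree, which is the "consistent view" a star schema is meant to provide.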
Legacy workloads that are complex and underperforming can now be transitioned over to the data warehouse schema incrementally. Further, because all three schemas live in the same MySQL server, client_analytics becomes a central hub from which our colleagues can join tables that have not yet been modeled in the warehouse with key dimensions that have been. They get the best of both worlds while we look to what comes next. Finally, the transition is prioritized in-stream with the needs of the organization, and we never bite off more than we can chew.
4. Standardize and educate
A major part of our data warehouse build-out was clarifying the definitions of business terms and key metrics present in our daily parlance. Maintaining a Data Dictionary wiki became a part of our Definition of Done. Our dashboards, displayed on large-screen TVs and visible to all, were the first to be relabeled and remodeled. Reports available to the entire office were next. Cleaning up the most-viewed metrics helped the organization speak about and understand key data in a consistent manner.
5. Maintain a tight feedback loop
The team follows an agile process familiar to modern technology organizations. We Scrum, we Git, and we Jenkins. We stay in regular contact with stakeholders throughout a build-out and iterate over MVPs.
Now, back to the future
These are just the first few bootstrapping steps. In future posts I will be tempted to wax technical and provide more color on the choices we’ve made and why. I will also share our vision for an Event Narrative Data Warehouse and how we are leveraging start-up friendly partners such as MixPanel for real-time event processing, funneling, and segmentation. Finally, we will share some tactics for enabling data scientists to be more collaborative and presentational with their R or Python visualizations.
At Betterment, our ultimate goal is to continue developing products that change the investing world, and that starts with data. But data alone is not enough: we needed the right storytellers. As we see it, the members of Team Polaris are the bards of a data narrative who help the organization grow while delivering a top-tier product.