Free for 90 days: Sign up now and get 90 days managed free after your first deposit. See offer details

<title>Dismiss</title>

Keeping Our Code Base Simple, Optimally

Betterment engineers turned regulatory compliance rules into an optimization problem to keep the code base simple. Here's how they did it.

Articles by Mike

By Mike Matsui
  |  Published: September 4, 2013

Interested in working at Betterment as an engineer? Learn more here.

At Betterment, staying compliant with regulators, such as the Securities and Exchange Commission, is a part of everyday life.  We’ve talked before about how making sure everything is running perfectly — especially given all the cases we need to handle — makes us cringe at the cyclomatic complexity of some of our methods.

It’s a constant battle to keep things maintainable, readable, testable, and efficient. We recently put some code into production that uses an optimizer to cut down on the  amount of code we’re maintaining ourselves, and it turned out to be pretty darn cool. It makes communicating with our regulators easier, and is doing so in a pretty impressive fashion.

We were tasked with coming up with an algorithm that, at first pass, made me nervous about all the different cases it would need to handle in order to do things intelligently. Late one night, we started bouncing ideas off each other on how to pull it off. We needed to make decisions at a granular level, test how they affected the big picture, and then adjust accordingly.

To use a Seinfield analogy, the decisions we would make for Jerry had an effect on what the best decisions were for Elaine. But, if Elaine was set up a certain way, we wanted to go back to Jerry and adjust the decisions we made for him.  Then George. Then Newman. Then Kramer. Soon we had thought about so many if-statements that they no longer seemed like if-statements, and all the abstractions I was formulating were already leaking.

Then a light came on. We could not only make good decisions for Elaine, Jerry, and Newman, we could make those decisions optimally.

A little bit of disclaimer here before we start digging in a little more: I can barely scratch the surface of how solvers work. I just happen to know that it was a tool available to us, and it happened to model the problem we needed to solve very well. This is meant as an introduction to using one specific solver as a way to model and solve a problem.

An example

Let’s say at the last minute, the Soup Nazi is out to make the biggest batch of soup he possibly can. For his recipe he needs a ratio of:

  • 40% chicken
  • 12% carrots
  • 8% thyme
  • 15% onions
  • 15% noodles
  • 5% garlic
  • 5% parsley

All of the stores around him only keep limited amounts in stock. He calls around to all the stores just to see what the have in stock and puts together each store’s inventory:

Ingredients in stock (lbs)

Elaine’s George’s Jerry’s Newman’s
Chicken 5 6 2 3
Carrots 1 8 5 2
Thyme 3 19 16 6
Onions 6 12 10 4
Noodles 5 0 3 9
Garlic 2 1 1 0
Parsley 3 6 2 1

Also, the quality of the bags at all of the stores vary, limiting the total number pounds of food the Soup Nazi can carry back. (We’re also assuming he only wants to make at most one visit to each store.)

Pound of food limits

Elaine’s 12
George’s 8
Jerry’s 15
Newman’s 17

With the optimizer, the function that we are trying to minimize or maximize is called the objective function. In this example, we are trying to maximize the number of pounds of ingredients he can buy because that will result in the most soup. If we say that,


a1 = pounds of chicken purchased from Elaine’s
a2 = pounds of carrots purchased from Elaine’s
a3 = pounds of thyme purchased from Elaine’s

a7 = pounds of parsley purchased from Elaine’s
b1 = pounds of chicken purchased from George’s

c1 = pounds of chicken purchased from Jerry’s

d1 = pounds of chicken purchased from Newman’s

We’re looking to maximize,

a1 + a2 + a3 … + b1 + … + d7 = total pounds

We then have to throw in all of the constraints to our problem. First to make sure the Soup Nazi gets the ratio of ingredients he needs:

.40 * total pounds = a1 + b1 + c1 + d1
.12 * total pounds = a2 + b2 + c2 + d2
.08 * total pounds = a3 + b3 + c3 + d3
.15 * total pounds = a4 + b4 + c4 + d4
.15 * total pounds = a5 + b5 + c5 + d5
.05 * total pounds = a6 + b6 + c6 + d6
.05 * total pounds = a7 + b7 + c7 + d7

Then to make sure that the Soup Nazi doesn’t buy more pounds of food from one store than he can carry back:

a1 + a2 + … + a7 <= 12
b1 + b2 + … + b7 <= 8
c1 + c2 + … + c7 <= 15
d1 + d2 + … + d7 <= 17

We then have to put bounds on all of our variables to say that we can’t take more pounds of any ingredient than any store has in stock.

0 <= a1 <= 5
0 <= a2 <= 1
0 <= a3 <= 3
0 <= a4 <= 6

0 <= d7 <= 1

That expresses all of the constraints and bounds to our problem and the optimizer works to maximize or minimize the objective function subject to those bounds and constraints. The optimization package we’re using in this example, python’s scipy.optimize, provides a very expressive interface for specifying all of those bounds and constraints.

Translating the problem into code

If you want to jump right in, the full code sample can be found here. However, there are still a few more things to note:

  • Get numpy and scipy installed.

  • The variables we’re solving for are put into a single list. That means, x = [a1, a2, … , a7, b1, b2 … d7]. With python, it’s helpful to know that we can pull the pounds of food for a particular ingredient out of x,  i.e, [a1, b1, c1, d1] with

    x[ingredient_index :: num_of_ingredients]

    Likewise, we can pull out the ingredients for a given store with

    x[store_index * num_of_ingredients : store_index * num_of_ingredients + num_of_ingredients]

    e.g, [b1, b2, b3, b4, b5, b6, b7]

  • For this example, we’re using the scipy.optimize.minimize function using the ‘NLSQP’ method.

Arguments provided to the minimize function

Objective function

With the package we’re using, there is no option to maximize. This might seem like a show stopper, but we get around it by negating our objective function, minimizing, and then negating the results. Therefore our objective function becomes,

 −a1 − a2 − a3 − a4 − … − d6 − d7

And expressing that with numpy is pretty painless:

numpy.sum(x) * −1.0

Bounds

Bounds make sure that we don’t take more than any one ingredient than the store has in stock. The minimize function takes this in as a list of tuples where the indices line up with x. We can’t take negative ingredients from the store, so the lower bound it always 0. Therefore, 

[(0, 5), (0, 1) … (0, 1)]

In the code example, for readability, I threw all of the inputs into the program into some globals dictionaries. Therefore, we can calculate our bounds with,


def calc_bounds():
bounds = []

for s in stores:
for i in ingredients:
bounds.append((0, store_inventory[s][i]))

return bounds

Guess

Providing a good initial guess can go a long way in getting you to a desirable solution. It can also dramatically reduce the amount of time it takes to solve a problem. If you’re not seeing numbers you expect, or it is taking a long time to come up with a solution, the initial guess is often the first place to start. For this problem, we made our initial guess to be  what each store had in stock, and we supplied it to the minimize method as a list.

Constraints

One thing to note is that for the packages we’re using, constraints only deal with ‘ineq’ and ‘eq’ where ‘ineq’ means greater than. The right hand side of the equation is assumed to be zero. Also, we are providing the constraints as tuple of dictionaries.

(a1 + b1 + c1 + d1) − (.40 * total pounds) > 0
...
(a7 + b7 + c7 + d7) − (.05 * total pounds) > 0

Note here that I changed the constraints from equal-to to greater-than because comparing floats to be exactly equal is a hard problem when you’re multiplying and adding numbers. Therefore, to make sure we limit chicken to 40% of the overall ingredients, one element of the constraints tuple will be,

{'type' : 'ineq',
'fun' : lambda x : sum(extract_ingredient_specific_pounds(x, chicken)) − (calc_total_pounds_of_food(x) * .4) }

Making sure the soup nazi is able to carry everything back from the store:

12 − a1 − a2 − … − a7 >= 0

17 − d1 − d2 − … − d7 >=  17

Leads to,

{'type' : 'ineq',
'fun' : lambda x : max_per_store[store] − np.sum(extract_store_specific_pounds(x, store))}

Hopefully this gives you enough information to make sense of the code example.

The Results?

Pretty awesome. The Soup Nazi should only buy a total of 40 lbs worth ingredients because Elaine, George, Jerry, and Newman just don’t have enough chicken.

9.830 lbs of food from Elaine's. Able to carry 12.0 pounds.
chicken: 5.000 lbs (5.0 in stock)
carrots: 0.000 lbs (1.0 in stock)
thyme: 0.000 lbs (3.0 in stock)
onions: 0.699 lbs (6.0 in stock)
noodles: 1.000 lbs (5.0 in stock)
garlic: 1.565 lbs (2.0 in stock)
parsley: 1.565 lbs (3.0 in stock)
7.582 lbs of food from George's. Able to carry 8.0 pounds.
chicken: 6.000 lbs (6.0 in stock)
carrots: 0.667 lbs (8.0 in stock)
thyme: 0.183 lbs (19.0 in stock)
onions: 0.733 lbs (12.0 in stock)
noodles: 0.000 lbs (0.0 in stock)
garlic: 0.000 lbs (1.0 in stock)
parsley: 0.000 lbs (6.0 in stock)
13.956 lbs of food from Jerry's. Able to carry 15.0 pounds.
chicken: 2.000 lbs (2.0 in stock)
carrots: 3.501 lbs (5.0 in stock)
thyme: 3.017 lbs (16.0 in stock)
onions: 4.568 lbs (10.0 in stock)
noodles: 0.000 lbs (3.0 in stock)
garlic: 0.435 lbs (1.0 in stock)
parsley: 0.435 lbs (2.0 in stock)
8.632 lbs of food from Newman's. Able to carry 17.0 pounds.
chicken: 3.000 lbs (3.0 in stock)
carrots: 0.632 lbs (2.0 in stock)
thyme: 0.000 lbs (6.0 in stock)
onions: 0.000 lbs (4.0 in stock)
noodles: 5.000 lbs (9.0 in stock)
garlic: 0.000 lbs (0.0 in stock)
parsley: 0.000 lbs (1.0 in stock)

16.000 lbs of chicken. 16.0 available across all stores. 40.00%
4.800 lbs of carrots. 16.0 available across all stores. 12.00%
3.200 lbs of thyme. 44.0 available across all stores. 8.00%
6.000 lbs of onions. 32.0 available across all stores. 15.00%
6.000 lbs of noodles. 17.0 available across all stores. 15.00%
2.000 lbs of garlic. 4.0 available across all stores. 5.00%
2.000 lbs of parsley. 12.0 available across all stores. 5.00%

Bringing it all together

Hopefully this gives you a taste of the types of problems optimizers can be used for. At Betterment, instead of picking pounds of ingredients from a given store, we are using it to piece together a mix of securities, in order to keep us compliant with certain regulatory specifications.

While there was a lot of work involved in making our actual implementation production-ready (and a lot more work can be done to improve it), being able to express rules coming out of a regulatory document as a series of bounds and constraints via anonymous functions was a win for the readability of our code base. I’m also hoping that it will make tacking on additional rules painless in comparison to weaving them into a one off algorithm.

Recommended Content

View All Resources
Investing’s Pain Gap: What You Put Up With To Earn Returns

Investing’s Pain Gap: What You Put Up With To Earn Returns

Markets are frustrating—especially when you look at a year’s worth of returns. Year to year, you can easily experience what we call the pain gap. The key is to not let the pain gap create a behavior gap between your account and market performance.

ETF Selection for Portfolio Construction: A Methodology

ETF Selection for Portfolio Construction: A Methodology

Betterment seeks to maximize investor take-home returns, which drives our investment selection criteria and process.

Guidelines for Testing Rails Applications

Guidelines for Testing Rails Applications

Discusses the different responsibilities of model, request, and system specs, and other high level guidelines for writing specs using RSpec & Capybara.

Explore your first goal

Cash Reserve

Our high-yield account built to help you earn more on every dollar you save.

Safety Net

This is a great place to start—an emergency fund for life's unplanned hiccups. A safety net is a conservative portfolio.

Retirement

Whether it's a long way off or just around the corner, we'll help you save for the retirement you deserve.

General Investing

If you want to invest and build wealth over time, then this is the goal for you. This is an excellent goal type for unknown future needs or money you plan to pass to future generations.

See details and disclosure for Betterment's articles and FAQs.