Aug 24, 2022

Technically Minded | MLOps Best Practices Part 1

Vincent Yates
Jason Goth

Credera is excited to announce the release of our latest podcast: "MLOps Best Practices: Part 1"

This podcast, which is available on iTunes, Spotify, Google, and Anchor FM, brings together some of the brightest sparks in technology and transformation consulting to wax lyrical on current trends and challenges facing organizations today.

On This Episode

Artificial intelligence (AI) and machine learning (ML) can undoubtedly offer significant benefits to most organizations, but only when used correctly. So what does it take to get a machine learning application into production, and into production reliably, securely, and repeatably?

In Part 1 of our MLOps best practices series, Chief Data Scientist Vincent Yates and Credera's CTO Jason Goth focus on the challenges associated with MLOps and discuss a few solutions for leaders to implement in their organizations.

Listen now.

The following transcript has been edited for clarity and length.

What is MLOps?

Vincent Yates (00:02):

Welcome to Technically Minded, a podcast brought to you by Credera. We get technology leaders together to discuss what's happening in our world. Our discussions are fun, lighthearted, and frankly opinionated, but hopefully it gives you a sense of what matters, what to pay attention to, and what to ignore.

Today, we're going to pick up on the topic of MLOps. Previously, Jason and I wrote a blog series entitled "Mind the Gap," in which we asserted that it's not AI unless it's in production. Artificial intelligence, AI, or machine learning, ML, can undoubtedly offer step-change benefits to most organizations. In fact, looking at some companies that we've been at or helped, GE specifically has saved industrial customers over $1.6 billion, with a B, through the use of ML-powered predictive maintenance.

Highmark saved over $260 million just in 2019 by leveraging ML to protect against fraud, waste, and abuse in the healthcare insurance group. ML-powered recommendation engines account for more than 35% of Amazon's total revenue. So it's no joke. Said differently, this stuff is really important.

As we talked about on the last podcast, there are a lot of engineering challenges that go into getting models into production. Often these aren't addressed, though, which reduces the effectiveness and leads to a lot of technical debt. In fact, Google researchers call machine learning "the high-interest credit card of technical debt." So today we're going to focus on those challenges and some of the solutions, and joining me as always, freshly back from vacation, is Jason Goth, our CTO. Welcome, Jason.

Jason Goth (01:51):

Thanks, Vincent. I would say I'm excited to be here, but I think my head is still in Argentina.

Vincent Yates (01:58):

Well, how was the trip by the way?

Jason Goth (02:00):

It was great! It was a great trip.

Vincent Yates (02:01):

Is Argentina worth visiting?

Jason Goth (02:03):

Argentina is definitely worth visiting.

MLOps Best Practices

Vincent Yates (02:05):

OK. So to the topic at hand, Jason: MLOps. If I as a data scientist think about this domain, if I go back to my early days as sort of actual hands on keyboards, making algorithms, trying to put them into production, it was difficult to say the least. It was hard. I didn't really know what to do; there were a lot of challenges here.

Help me understand, if we just rewind the tape a bit, and you know me, I love to try and draw equivalents and examples back from software development: is there a world, was there a world, in which software development was kind of the same way? Stuff was being done locally on laptops, and then something changed; they realized how hard that is and all the challenges brought with that. Are there some parallels here? I'm specifically thinking about that DevOps space. What did that journey look like, and what are some lessons that we've learned there that we can sort of pull over?

Learning from the DevOps Journey

Jason Goth (02:56):

Well, there are a lot of parallels. That's why they call it MLOps, to parallel the DevOps story. If you think back to the mid-'90s, maybe late '90s, when things started really moving online, applications went from something that you had to install with a CD on your desktop or something like that, where the updates would come out every year or two or three, into being online, where things can move very quickly and we can update very quickly. We can be agile, we can learn and adapt, but we can only do that if we can get the software into production.

And that was the problem. We used the same techniques of building software very manually, very locally, and then trying to get that into production was very difficult and fraught with failure. So a lot of these concepts of automation, configuration management, etc., came up about how do we move things efficiently from a development environment, through testing, into production very quickly in a matter of weeks, or a month.

Vincent Yates (04:02):

That makes sense, it sounds a lot like what clients are calling us in today for. So often I get a lot of calls these days with clients who have potentially hired some boutique AI firm, or some large conglomerate, it doesn't actually matter. They might have hired these people even internally, and they say to me, "Look, Vince, I have all this data. I've hired these really smart, these PhDs in math or statistics or physics or whatever the domain is, these really brilliant data scientists. I've turned them loose on the data," with usually no business problem in sight here, but again, a different podcast, "and they've come back to me. They got lucky." This is a small minority that do get lucky and find something meaningful. "And they made this beautiful deck," and I said, "Great, go put that in production, actually make it happen." And now it's 12 or 14 or 18 months later, and I'm still waiting. What's happened?

Or even in the minority case where they've gotten something into production, they're like, "How do I scale this thing? Now the service keeps falling over. I want it to go to a larger number of product lines or a larger number of routes, etc. How do I get that to work?" Is that, in essence, the same sort of story that you heard in the original dev world? Because I assume so, but?

Jason Goth (05:13):

It's exactly the same story.

Jason Goth (05:15):

Right, so I love watching all the machine learning tutorials, or TensorFlow tutorials, or Keras tutorials. They get the mathematicians on, they have some data, they get it online from Kaggle or somewhere. They make all these cool models that make all of these accurate predictions. And then they all say the same thing: "OK, now we'll just go get that in production." And then next we'll go look at another modeling problem in their video-

Vincent Yates (05:44):

And so to be clear, what they're not saying is like, "Click a button and now it's in production."

Vincent Yates (05:49):

What they're saying is like, "The rest is left to the reader."

Tip #1: Getting MLOps Into Production

Jason Goth (05:52):

Right. Someone needs to get this in production, not me. And that's the real problem. What does that mean, getting it into production? And get it into production reliably, consistently, and repeatably? So, that's really the MLOps problem. And unfortunately, data scientists are brilliant at a lot of the data and math, but they are not typically trained as software developers. More importantly, they're not usually even in the software development group.

They don't even have the ability to go change the software and put those things in, or the access to go change the software and put those things in production. That's another team. Another team, which by the way, has another set of priorities with their own budget and constraints and deadlines and things that they're behind on. And they're not really interested or motivated or incented to go deploy these models. Now, that's not always true, sometimes they are. But for them, for a software developer, what's a model? How does it deploy? How do I use it? I've got to write some code to use this thing, and it's probably not going to be the four lines of Python script that you wrote in your Jupyter Notebook.

Vincent Yates (07:11):

Yeah. Well, and that's really interesting because my next question then to you is in the software domain where these people are trained as software engineers, of course, developers, that's what they do. In the early days of that journey to really incorporate CI/CD processes and DevOps processes back into the core workflow, was it that you effectively got some other group of experts in that domain and you pulled them into the projects? Was it that you retrained the people who were already doing that? Was it some combination of both? What did that look like and what might that mean then for the future of data scientists in terms of how do we get that rigor back into the core process?

Jason Goth (07:50):

Yeah. Well, today you can pull some experts in CI/CD, DevOps, and automation, automated testing, into your project and help train up those teams and get the right processes, the right tools in place. But at the time, in the late '90s, early 2000s, there were no experts to pull in.

Vincent Yates (08:09):

Fair enough.

Jason Goth (08:11):

And so, yeah. There was a lot of trial and error. And a lot of what you see today is the result of a lot of trial and error. I do think there'll be some trial and error in MLOps before we find out what are good practices... I hate the term best practices. I'm like, mm, does that mean you've tried all of them and measured, and you know this one actually worked best?

Vincent Yates (08:31):

You're saying it's just some local maximum, not global maximum.

Jason Goth (08:33):

Right. Exactly.

Vincent Yates (08:34):

OK. Fair enough.

Jason Goth (08:35):

And so we're going to have to continue to develop the tools and best practices around that.

Vincent Yates (08:42):

And I want to get to tooling in just one second, but before that, help me understand where do people, where do software developers, learn those skills? To your point, yes, there are some experts now. The average person hands on keyboards, do they learn it still? Where do they learn that? Is it all built in sort of codified within the infrastructure in the platforms or?

Tip #2: Getting the Right People on Your Team

Jason Goth (09:00):

So a big way that software developers learn to do that today is just by replicating what they have in front of them. And so most applications have some level of DevOps, continuous delivery automation, in place, and they replicate that. Either directly copy it, or use those patterns and practices and tools. And so, they learn by doing. And they learn by replicating what others have done, which is, by the way, the way a lot of software in general works, not just the DevOps piece. And so, again, that goes back to this: we don't have a lot of really good implementations of that in machine learning for people to learn from. And so there are some tools: TensorFlow Extended has some TensorFlow Serving layers; Amazon, Google, the cloud providers have some products to serve them, but it's not just a matter of serving.

Okay, I've got my model, I serve it up. Well, something still has to consume it, some mobile app. And a lot of times now they're like, we want to get the model served out on the Edge, on somebody's phone, or on some device, or in the web browser itself, not sending the data back.

Jason Goth (10:17):

And again, there's products, and technologies, and tools to do that. Like, "Oh, here's something to run it, the model, on your phone," but still I have to get it there. I have to get the thing that runs the model on the phone. I have to get the updated model to the phone. And so those are all challenges that have to be solved, and they typically are solved by the engineering group because they have to be engineered into those applications.

Vincent Yates (10:43):

Yeah. So, if I hear you right, I think part of it is that data science teams, the savvy ones, need to tap some of the experts who understand this from the very near domain, effectively DevOps, and take some of those best practices. But perhaps rather than trying to reeducate, if you will, or in this case, first-time educate, data scientists on some of these best practices, the majority of the focus in the early days should actually be on the frameworks, the paradigms, the sort of reference architectures of how to do this well. Is that right?

Jason Goth (11:14):

Yeah. I don't think that it's going to be really effective to go try to train up a bunch of data scientists as software engineers, and deployment engineers, and operational experts. I think we'll need to build that into the products, by extension, into the current deployment pipelines, and just have the data scientists be users of that.

We talk a lot about cross functional teams, having multiple roles on one team. I think that there will need to be people on the team that take care of that and provide a platform, if you will, for the data scientists to use, to then push those updates. Portions of that are what some of the Cloud providers have created with things like SageMaker with Amazon, where you can publish your models and it can serve them up. You still have to interact with them and you still have to, we talked last time about well, you might want to A/B test a model. And so you have to build that A/B testing framework. You have to measure it. You have to measure those things. And so all of that needs to get engineered in for the data scientists to use.
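To make the A/B testing plumbing Jason mentions a little more concrete, here is a minimal Python sketch. It assumes a hypothetical setup with two model versions and a string user ID; the names `assign_variant`, `conversion_by_variant`, `model_a`, and `model_b` are illustrative only and do not come from the episode or from any particular platform.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.5):
    """Deterministically map a user to 'model_a' or 'model_b'.

    Hashing the experiment name together with the user ID means the same
    user always sees the same model, with no assignment table to maintain.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "model_b" if bucket < treatment_share else "model_a"


def conversion_by_variant(events):
    """Turn (variant, converted) pairs into a conversion rate per variant."""
    totals, wins = {}, {}
    for variant, converted in events:
        totals[variant] = totals.get(variant, 0) + 1
        wins[variant] = wins.get(variant, 0) + (1 if converted else 0)
    return {variant: wins[variant] / totals[variant] for variant in totals}
```

The point of the sketch is the division of labor: the engineering side owns the assignment and measurement code, and the data scientist only needs to publish a second model and read the resulting rates.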

Tip #3: Building the Right Processes

Vincent Yates (12:23):

OK. So we've talked a lot about the people side of this, which is the data scientists aren't necessarily trained in the right domain, but we have people who are, and they just need to be pulled into that process. And part of the development process, but also the infrastructure around it, and really the platform that it sits on. Let's assume that those people are part of the project now, you understand sort of how you're going to deploy this someday. You understand how you're going to get the data to some degree. Let's talk about the process. What does that actually look like from a process standpoint?

If I just kicked us off here a little bit, I'm imagining now a world in which data scientists say, "Okay, I want to start with the business problem," which is again, different podcast, but let's start with the actual business problem here. What do we have to actually enable?

And then, it's funny, actually I'll tell an anecdote real quick. Recently, Jason and I were attending a conference by a tech provider and they were giving a demo of their platform, a machine learning platform. And in this demo it was really demonstrated how easy and quick and fast you can build these models. So to Jason's point earlier, this person had downloaded some data from Kaggle in this case, just straight to their laptop. They then opened this up into Excel, effectively, and started modifying a bunch of stuff. Columns weren't quite right, they weren't in the right format. There were some weird missing values they wanted to replace, and they just did "find and replace"... really, really simple. And then they showed you, "Oh, well great. Now that you've done all this work in Excel, you can just upload it back to the platform and we can go do a bunch of machine learning on this, and that's really cool."

The challenge is, if you go back to our assertion, which is it's not machine learning unless it's in production, you've now taken the first step of the process and made it so it's not repeatable. There's no code, there's no tracking of what in the world you just did. You can't imagine a world in which you get more data in the future and you have to go do the same thing manually. Which means what, you can retrain a model, or you can go do inference on a model, anytime somebody sitting in front of their laptop downloads and does those same corrections over and over? And really the mentality in this approach is to say, "Look, I've done this thing. Now it's somebody else's problem to figure out how to productionalize it. They have no record of what I've done. I have no record of what I've done." It's prone to errors. It's obviously manual. It's not really production worthy.

And so that anecdote is to illustrate the first point that I wanted to make, which is, it seems if you want to do this in production, if you walk in the door assuming this is going to make it to production, you have to build a process that assumes that from start to finish. Meaning every bit of data cleaning, every bit of feature engineering, every bit of machine learning needs to be auditable. It needs to be repeatable. It needs to be versioned. It needs to be transparent so that when you actually want to get to production, you have a physical record, or at least a digital record, I guess, of what things have taken place. Is that right? From your mind?

Jason Goth (15:06):

Yeah, I agree. A hundred percent. The example about, well, let's get the data from this spreadsheet in this spreadsheet. I'll rename it. We'll pull some of that data over there. I mean, that data gathering is the first step. And somehow we got to clean that data and get it combined and all into some place where now a data scientist can go do feature engineering and do their analysis, and that kind of thing.

Well, that stuff needs to be automated, and repeatable. Because let's suppose one of the things that is going into that is your sales data. For example, you're trying to predict customer behavior so that you can predict what things to offer. Well, as customer behavior changes, which it inevitably does, you have to get that data in. Well, if every time you want to make a change, you've got to go manually adjust that, then your chances of getting to the same answer again are pretty low. And it also adds time. There's an accuracy perspective there, but there's also a timeframe perspective. It just takes a lot longer.
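One way to picture the "digital record" Vincent asks for is a small manifest written next to every cleaned data set. The Python sketch below is only an illustration under stated assumptions: the field names, the `build_manifest` helper, and the idea of passing a `code_version` string (for example a commit ID) are hypothetical and not something described in the episode.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash: two files with the same bytes get the same fingerprint."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(raw: bytes, cleaned: bytes, steps, code_version):
    """Describe one cleaning run so it can be audited and reproduced later."""
    manifest = {
        "raw_sha256": fingerprint(raw),          # exactly which input was used
        "cleaned_sha256": fingerprint(cleaned),  # exactly which output was produced
        "steps": list(steps),                    # what was done, in order
        "code_version": code_version,            # which version of the script did it
    }
    # sort_keys makes the manifest text itself stable from run to run.
    return json.dumps(manifest, indent=2, sort_keys=True)
```

If a model later misbehaves, a record like this lets the team answer "which data trained this model, and how was it prepared?" without relying on anyone's memory.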

Vincent Yates (16:10):

Right. And the other problem is I actually have worked at companies where data scientists were on call. So if you built the model that made it to production, you were responsible for the uptime and reliability of that model. And if things went awry, well, you better show up in the middle of the night and figure out what's gone wrong. And I'll tell you that process where these things are done manually creates another problem, which is debugging becomes effectively impossible, because you don't even know what data was used to train said model. You don't know if this was an edge case, if you had a bad sample. In some future world, you can imagine somebody actually giving you poison pills, they're poison pilling your model, meaning giving you bad data intentionally so that your model does something weird. You wouldn't have any clue or any way to reproduce those things. So, that's definitely part of it, too, if you think about total life cycle there.

Now, in fairness to everything I've said so far, the person in the demo did something very reasonable, and I see a lot of data scientists doing it. So it's worth just quickly mentioning why they do that. And the answer is it turns out doing data clean up in Excel, for example, or Google Sheets, or whatever you use, is actually terrific. It's really easy. It's really nice. It's really convenient. These tools have been developed over decades and decades now to make a lot of the stuff that we're talking about really, really simple. And so I'm curious from your perspective, Jason, in the DevOps world, is there something similar, where it's just so much faster to not create an ETL, to not actually create a table, and to not write all the SQL statements, but rather do find and replace? Is that the same, or have the tools caught up and made it just as easy to do it in a truly repeatable, robust way as it would be to do in Excel, in my example?

Jason Goth (17:56):

Yeah. I think there are tools. You can do it. You can build... You can do some scripting, even in Python, a language data scientists are familiar with. There are plenty of libraries to go, open Excel, rename column. You can do that as a two line script. There are some tools, our friends out in the Bay area at SnapLogic have some tools that can build pipelines to repeatedly update process, transform data, and put it somewhere, like in an S3 bucket, that then can be used. You don't have to do it every time. You can then use it in that clean format. If you just have a way to clean things and detect anomalies. And it may even be another system, ML-based to detect anomalies and strip them out.
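As a rough illustration of the scripted clean-up Jason describes, here is a minimal Python sketch that does the "rename a column, find and replace the weird missing values" work in code instead of in a spreadsheet. It uses only the standard library and CSV text rather than an Excel file; the `clean_csv` function, the column names, and the missing-value tokens are all hypothetical, and the sketch assumes every row has the same number of fields as the header.

```python
import csv
import io


def clean_csv(raw_text, renames, missing_tokens, fill_value):
    """Rename columns and replace missing-value tokens, the same way every time."""
    reader = csv.DictReader(io.StringIO(raw_text))
    fieldnames = [renames.get(name, name) for name in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {}
        for name, value in row.items():
            value = (value or "").strip()
            # The scripted equivalent of "find and replace" on bad values.
            cleaned[renames.get(name, name)] = fill_value if value in missing_tokens else value
        writer.writerow(cleaned)
    return out.getvalue()


RAW = "cust id,amount\n1,10\n2,N/A\n3,\n"
CLEANED = clean_csv(RAW, {"cust id": "customer_id"}, {"", "N/A"}, "0")
```

Because the rules live in code, the same call can be rerun unchanged when next month's data arrives, which is exactly what a manual edit cannot offer.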

But once you have that, okay, well now I've got a clean set. Now that set of clean data, the data scientists can go hog wild on. Because they're just essentially reading that, they're not changing it. And that's a big thing. If you think about the whole process, there's gathering the data, there's cleaning it and getting features extracted, and other test data sets extracted, and that kind of thing. And then those exist as artifacts that then you can use for analysis. Now that analysis is going to always be very ad hoc, which is great, because it's analysis. Go for it with Jupyter Notebooks, and I'm sure you prefer R to Jupyter, or Python, Julia, you name it. But then, okay, well, eventually we come back to this: we have to produce another artifact, which is the model.

Now that artifact then has to be tracked and versioned and all of those things. And then it has to be tested, which is another topic we should dig into. How do you test a model where you don't know what the right answer's supposed to be? So there's testing that it works, and there's testing that it works correctly. Here, I'm talking mostly about that it just works. That it gives an answer.

Jason Goth (19:56):

Not necessarily the right answer. And then we have to promote that out into production, which could involve copying, there's engineering to consume that, which again, that goes through the normal software development. And then monitor that, and provide that data back so that then we can go determine, okay, do we need to make changes, adjustments to the model? There's always going to be that analysis modeling step, which is going to be very manual. And I think that's fine. I don't think... It's those other steps that need to be automated, and they're typically not.
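The "it just works" kind of test Jason describes, one that checks the model gives an answer without judging whether it is the right one, can be sketched as a smoke test. The Python below is a hedged illustration only: the JSON artifact format, the tiny logistic model, and the `load_model` and `smoke_test` names are assumptions made for the example, not a real serving API.

```python
import json
import math


def load_model(artifact_json):
    """Rebuild a hypothetical logistic model from a JSON artifact."""
    spec = json.loads(artifact_json)
    weights, bias = spec["weights"], spec["bias"]

    def predict(features):
        if len(features) != len(weights):
            raise ValueError("wrong number of features")
        score = bias + sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-score))

    return predict


def smoke_test(predict, sample_inputs):
    """Return a list of problems; an empty list means the model 'just works'."""
    problems = []
    for features in sample_inputs:
        try:
            output = predict(features)
        except Exception as exc:
            problems.append(f"{features}: raised {exc!r}")
            continue
        # A probability must be a real number between 0 and 1.
        if not isinstance(output, float) or math.isnan(output) or not 0.0 <= output <= 1.0:
            problems.append(f"{features}: bad output {output!r}")
    return problems
```

A check like this can run automatically every time a new model artifact is produced, before anyone asks the harder question of whether the predictions are any good.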

Vincent Yates (20:29):

Yeah. That analysis step, for you data scientists or statisticians who might be listening, I think we typically call EDA, or exploratory data analysis. And this is sort of the crux of it, because you're exactly right. So if you think about just the way we think about data as an industry now, we have these ideas of: you have your sort of raw data, you have some slightly cleaned up data, and we kind of give these labels of silver and gold and platinum or diamond or palladium, or who knows. I'm making stuff up at this point, but we sort of give it some value here. And I think part of what you said is okay, well, once your data's cleaned up, we can give it a good label of gold, for example, or diamond in some extreme case.

And that's what we leverage. And I think the pushback from a data scientist is potentially, "Well, look, that's awesome. And I'm happy to use that data. The challenge is that data may not be everything that I want to use." In other words, there might be signals in some other data source that we've never explored before that I want to just see: is this actually predictive of the behavior that I care about? And so in the case of sales, maybe I want to get usage data that we've never looked at before. I want to get marketing data that we've never really used before. Hopefully you're using your marketing data, but conceptually there might be other data sources.

And so, then this challenge becomes, well, geez, do I want to spend all this energy integrating these new data sources and cleaning them for something that may not, in fact most times probably won't, pan out to be anything material, and I won't ever use this pipeline again? Or do I just want to go grab some data dump, conceptually, mess around with it, and kind of explore? And this is where I don't know if there are lessons to be learned from the DevOps space, or if that's really just intuition and kind of an art.

Are there any rules of thumb that you guys developed? When do you do something just locally on your laptop versus trying to figure out how it's going to integrate with the rest of the stack? What does that look like?

Jason Goth (22:17):

Yeah. DevOps engineers have an expression, "Don't do it twice."

Jason Goth (22:24):

If you want to do it once, there's some data set, I don't know if it's going to work or not, let me just go pull it, see it. Yeah. Okay. Well, there does look like something here. Well, that's the point where, okay, I'm going to have to use this data, and now I've got to start cleaning it up, and cataloging it, and those things. Like, I would never do it manually twice.

Vincent Yates (22:44):

Got it. OK. That's a good rule of thumb. I like that. So in other words, if you have a new data set, the first time you do it, great. Go for it. Just go grab it, download it. CSV it if you have to; I would encourage you to use some form of code, SQL or Python or something, not Excel. But whatever, you do whatever you need to do, in other words, to go fast and figure out, is there value here? Don't waste a bunch of time making it super auditable. But then to your point, the second you find it's useful, don't ever do it again. Do it the right way the second time.

Jason Goth (23:11):

Yeah. Then you need to go back and I would even say redo it.

Vincent Yates (23:14):

Yeah. That's good. I like that.

Jason Goth (23:17):

Because if you didn't redo it, then you'd have to do it again twice.

Vincent Yates (23:21):

Fair enough. Yeah. That's a good point. So now we're in a place where we start thinking about the problem of developing models from look, we have to have the right people. And that includes somebody who understands this DevOps. We have to have a process that, from inception, assumes this thing's going to make it to production, and we're making choices appropriately. The last bit here, and this is sort of the last bit of process starting to push into the platform or technology powering some of these things, is really around the fact that machine learning is kind of unique in the software domain. And you can correct me if I'm wrong, Jason, now this is a little bit of speculation. I didn't actually research this, but it's my intuition that it's unique in that if you build a data science model and let's say that you build it perfectly, whatever perfectly means, you've modeled the problem really, really well.

That's awesome. And you've done all the other stuff we've talked about so far. This is the only domain in which that model has an expiration date that is effectively unknown to you, but defined. Meaning that there is concept drift, data drift; these things go on, people's behavior fundamentally shifts. And a model that's really predictive today, at some point in the future, and you may not know when, but at some point in the future, that model will no longer be predictive. It will certainly not be as powerful as it is today. And so the other half of this that I think makes it unique is that we know from inception, if you change nothing else in the system, you freeze the code of the entire system, we know this thing will break. It's just by definition, it will break. And so, I think it's unique. Is that unique, first of all? Or are there other pieces in software where that's kind of true, too?

Jason Goth (24:59):

That does still happen, but it's much less likely.

Vincent Yates (25:01):

OK. Fair enough.

Jason Goth (25:03):

And it's not guaranteed.

Vincent Yates (25:05):

I'm not as special as I think I am?

Vincent Yates (25:06):

I'm not the snowflake?

Jason Goth (25:09):

No, but it generally involves something like a hardware failure.

Vincent Yates (25:12):

OK, fair. Yeah. Fair, fair. Hardware does fail sometimes. Okay.

So given the fact that this is a certainty, this is not probabilistic, this is basically a guaranteed probability of effectively 1 that this is true, how do we start thinking about a process or a technology to start addressing that from, again, inception? Because we know it's going to make it to production. We know that once in production, assuming nothing else changes, the data powering, again, the model, the concept, will drift, and therefore we have to update it.

Tip #4: Monitoring Your MLOps Model

Jason Goth (25:39):

That to me is all through monitoring. It goes back to the thing we keep talking about here: you're doing these things for a reason. We have a goal. The goal is to increase conversion. Add new things, cross-sells, to the cart. Whatever that goal may be, we can measure that. And we can start to determine, is this still working? We can see a trend line of cross-sells being added to the cart: it's 1%, 1%, 1.5%, great, good for us. 1.5%, 1%, 2%, 2%, 0%. Okay. Well, something... That doesn't look right, and something has changed. Now maybe-

Vincent Yates (26:21):

And by the way, let me-

Jason Goth (26:21):

There may be some-

Vincent Yates (26:22):

Let me interject one second. Great example by the way, and you can keep going, but 0%, something's not right.

I would also say, and this is the thing that I think a lot of people are biased against saying is plus 10%, also something is probably not right. If it's been hovering around 2%, and you see it go positively. I think we're too keen to ignore those things. If we see it go negative, we're always very hot on top of it, but be equally cautious of things that look too good to be true. Sorry to interrupt.
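The two-sided check discussed here, flagging a sudden 0% and a suspicious +10% alike, can be sketched in a few lines of Python. The z-score rule and the threshold of 3 are choices made for this illustration, not something named in the episode, and the `check_metric` function is hypothetical; the history values are the cross-sell rates Jason reads out.

```python
import statistics


def check_metric(history, latest, z_threshold=3.0):
    """Compare the latest value with the trailing baseline, in both directions."""
    baseline = statistics.mean(history)
    spread = statistics.stdev(history)  # needs at least two historical points
    if spread == 0:
        # Flat history: any change at all is worth a look.
        if latest == baseline:
            return "ok"
        return "too_high" if latest > baseline else "too_low"
    z_score = (latest - baseline) / spread
    if z_score > z_threshold:
        return "too_high"  # too good to be true deserves the same scrutiny
    if z_score < -z_threshold:
        return "too_low"
    return "ok"


# Cross-sell rates, in percent, before the sudden drop to 0%.
HISTORY = [1.0, 1.0, 1.5, 1.5, 1.0, 2.0, 2.0]
```

With this history, a latest value of 0.0 comes back as "too_low" and 10.0 comes back as "too_high", while 1.5 is "ok". A real monitor would likely also account for seasonality and sample size, which this sketch ignores.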

Jason Goth (26:48):

No problem. I want you to dig into that because I'm like, that sounds great.

No, but if you do see these outliers, it's time to start looking. And again, that's back to some of the engineering: you then need, as a data scientist, to see what happened. And it's not just "this is the conversion"; you need to look at the data and be able to try to determine what was driving that change in behavior. And so that's data that you need as well. And you have to design a way to get it. Just like for developers, when things go wrong, we have a way to debug.

We have logs, we have debugging that we can turn on, that needs to be present for the machine learning. And this is where how those things get consumed really determines what it is that you need to start to do to log and debug. If something gets consumed by copying the model and you have some code that calls it directly, well, then that thing needs to be logging it. If it goes to a platform and that platform serves it up via an API, then we have to have that platform expose what it's doing. And so there's no general rule about what that solution might look like, but it does need to be there. And that usually involves working, again, cross-functionally with the developers and the data scientists to say, "What do you need to be able to see to tell me if this thing isn't working?" and we'll build that in.
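For the case Jason mentions where application code calls the model directly, the logging he asks for can be sketched as a thin wrapper around the prediction call. This is a minimal Python illustration under stated assumptions: the `logged_predict` function, the logger name, and the record fields are hypothetical, and the features and prediction are assumed to be JSON-serializable.

```python
import json
import logging
import time

logger = logging.getLogger("model_predictions")  # hypothetical logger name


def logged_predict(predict, model_version, features):
    """Call the model and log what went in, what came out, and which version ran."""
    started = time.perf_counter()
    prediction = predict(features)
    record = {
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "latency_ms": round((time.perf_counter() - started) * 1000, 3),
    }
    # One JSON object per line keeps the log easy to search and replay later.
    logger.info(json.dumps(record, sort_keys=True))
    return prediction
```

When the model is served from a platform behind an API instead, the same information has to come from whatever that platform exposes, which is why there is no single general solution.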

Vincent Yates (28:26):

That makes sense to me. We have, and we will have her on to give a more in-depth talk here, Amanda Aschenbrenner, who is one of our Senior Architects working on some of our MLOps reference architecture. But to this point, one thing I'd love to dig into a little bit is this concept that we have in there, because again, we understand this is going to make it to production. We understand it's going to break. We have to retrain. We sort of take that from inception and say, okay, well, how would we approach this problem differently, knowing that this is a necessary part of the overall life cycle of this? And it's really around this concept of a model factory.

Do you maybe want to just talk about what a factory in general means in computer science terms and how that might relate to this?

Understanding the Model Factory

Jason Goth (29:05):

Yeah. A factory is something that, well, generates something, builds something out of some raw materials. And so you can think of a software pipeline as, like, I've got source code and I build a running system. I take that code, I compile it, I copy it out to the servers. We'd call that a pipeline or a factory. And so as we have these models, those models may have lots of parameters with them. We may need to keep track of where they go, which servers they go to. We may need to keep track of what percentage of servers are going to try it out and tell those servers, "You need to try this out, and we're going to measure that and make sure that it doesn't explode in production," before we roll it out more broadly, that kind of canary rollout.

And so all of that metadata needs to be captured, and there's a lot of process that then goes, OK, well then let's put it out to two servers out of the 10. Which two? With which parameters? And that in itself has to be repeatable. And so that idea of scripting that, we call a model factory: it takes those models as an input, as a raw material, and updates the running system on the other side with it. We do need to probably get Amanda to talk more about the reference architecture, but that's a key part of it for us. And that's something that you don't see in a lot of other reference architectures.
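The "which two, with which parameters" step can be sketched as a small, repeatable function. This is an illustration under assumptions, not the reference architecture itself: `CanaryPlan`, `plan_canary`, the host names, and the `threshold` parameter are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CanaryPlan:
    model_version: str
    canary_servers: list
    remaining_servers: list
    parameters: dict


def plan_canary(model_version, servers, canary_count, parameters):
    """Pick canary servers deterministically so the same inputs give the same plan."""

    def rank(server):
        # Hashing (version, server) gives a stable, version-specific ordering.
        return hashlib.sha256(f"{model_version}:{server}".encode()).hexdigest()

    ordered = sorted(servers, key=rank)
    return CanaryPlan(
        model_version=model_version,
        canary_servers=sorted(ordered[:canary_count]),
        remaining_servers=sorted(ordered[canary_count:]),
        parameters=dict(parameters),
    )


servers = [f"app-{i:02d}" for i in range(1, 11)]  # hypothetical host names
plan = plan_canary("churn-model-1.3.0", servers, 2, {"threshold": 0.5})
print(json.dumps(asdict(plan), indent=2))  # the metadata record to store
```

Because the selection depends only on the model version and the server list, rerunning the script reproduces the same two canary servers, and the printed record is the metadata that would be kept alongside the rollout.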

Vincent Yates (30:29):

Yeah. And I think the idea here is exactly like you said about the data initially. So we said, look, if you understand that you're going to be using this data on an ongoing basis, then do it once. That's fine. Don't repeat yourself. Don't do it a second time. You actually need to think from inception about, how do we do the entire exploratory phase, or the actual first prototype of this phase, in a way that's reproducible?

The same must also be true for the actual training and ultimately a retraining of the model. So if we're capturing all of the signals, all of the metadata effectively, about what data goes into this model? What did we classify again? Did we do k-fold? How many Ks? All of that stuff is not only helpful when you want to go back and just audit some work you've done, but it's also helpful in order to let the machine basically retrain on its own. And so again, we can sort of automate that from inception if we're thoughtful about what we need to capture in order to understand how this model gets trained and what kind of cluster it needs: how many GPUs or CPUs or instances, how much memory, et cetera. We know we're going to do it again. So we can capture that initially, and that allows us to do it really, really quickly the second and third and fourth and 10th time.
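One way to capture that training metadata is as a small, serializable record saved next to the model. The sketch below is an assumption about what such a record might hold; `TrainingRunSpec`, its field names, and the example values (including the dataset path) are hypothetical.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class TrainingRunSpec:
    model_name: str
    dataset_uri: str          # where the training data snapshot lives
    target_column: str        # what the model is trained to predict
    cv_folds: int             # the "how many Ks" in k-fold
    hyperparameters: dict = field(default_factory=dict)
    instance_count: int = 1
    gpus_per_instance: int = 0
    memory_gb: int = 16

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "TrainingRunSpec":
        return cls(**json.loads(text))


spec = TrainingRunSpec(
    model_name="churn-model",
    dataset_uri="s3://example-bucket/churn/2022-08-01/",  # hypothetical path
    target_column="churned",
    cv_folds=5,
    hyperparameters={"learning_rate": 0.1},
    instance_count=2,
    gpus_per_instance=1,
    memory_gb=64,
)
saved = spec.to_json()
restored = TrainingRunSpec.from_json(saved)
print(restored == spec)
```

A retraining job could read the saved record and rebuild the same run without anyone re-entering the settings by hand, and the same file doubles as an audit trail.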

Jason Goth (31:38):

And the testing of it. For testing, you do want to have some separate data that you can test with, separate from your training data. And, well, we need to do that. And we need to validate that this thing looks reasonably correct before we then go push it in. So that whole span, from the data analysis phase to where we have something where we want to say, yeah, you used the term, "push the button," let's get it out into production, or let's notify something that we need to retrain and push that out in production. Having that all automated is a really important point. And that is something, frankly, that I would not do manually even once. Because if you're going to get any value, that is certainly something you're going to have to do multiple times. Even to get value out of the first one, because the model's never going to be right the first time.
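The automated check before pushing a model can be sketched as a reproducible holdout split plus a promotion gate. This is a simplified illustration: the function names, the accuracy metric, and the 0.8 threshold are assumptions, and a real gate would use whatever metric the data science team chooses.

```python
import random


def split_holdout(rows, holdout_fraction=0.2, seed=42):
    """Shuffle with a fixed seed so the same split is reproduced on retraining."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - holdout_size
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, labelled_rows):
    """Fraction of (features, label) rows that predict() gets right."""
    correct = sum(1 for features, label in labelled_rows if predict(features) == label)
    return correct / len(labelled_rows)


def should_promote(candidate, current, holdout, min_accuracy=0.8):
    """Promote only if the candidate clears a floor and is no worse than current."""
    candidate_score = accuracy(candidate, holdout)
    current_score = accuracy(current, holdout)
    return candidate_score >= min_accuracy and candidate_score >= current_score


rows = [(x, x > 5) for x in range(100)]        # toy labelled data
train_rows, holdout_rows = split_holdout(rows)
candidate_model = lambda x: x > 5              # stands in for a trained model
current_model = lambda x: False                # stands in for what is live today
print(should_promote(candidate_model, current_model, holdout_rows))
```

The holdout rows never take part in training, so the gate measures the candidate on data it has not seen.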

Vincent Yates (32:25):

That's interesting. That's a really good insight. To that point, and we've alluded to it a few times, I want to talk about how you evaluate the effectiveness of a model. And really the element here that I want to capture is the feedback loop. I think this is often overlooked for a lot of people and they think, well, hey, we're going to go build a model and it's going to predict customer churn, for example. Or it's going to predict which shirt to recommend you as you're checking out in the shopping cart. And those are all fine. Those are good. How do you know if it was effective, though, is the part that I think people don't actually think through. And the bit here that's not terribly complicated in concept, but doing it well is actually a bit complicated, is how do you define really good metrics?

And that's probably a whole podcast on its own. That goes into incentives and organizational alignment. And are you measuring the right thing? Do you have sensitivity around it? There's a great article recently, I just shared on LinkedIn, that effectively, if you're Amazon, if you choose revenue as a metric, for example, it seems very natural. Like, hey, we're going to A/B test on revenue. That seems natural. We show in the math there that you actually can't even detect a $10 million change, positive or negative. You can't even detect a $10 million change to revenue.
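The sensitivity point can be illustrated with a standard power calculation for a two-arm test. The inputs below are made-up numbers for illustration only; they are not taken from the article mentioned above, and `minimum_detectable_difference` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(std_dev, n_per_arm, alpha=0.05, power=0.8):
    """Smallest true difference in means a two-sided, two-sample z-test can detect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * std_dev * sqrt(2 / n_per_arm)


# Hypothetical inputs: revenue per customer is noisy relative to its mean.
std_dev = 200.0          # dollars per customer over the test window (assumed)
n_per_arm = 1_000_000    # customers in each arm (assumed)

mde_per_customer = minimum_detectable_difference(std_dev, n_per_arm)
mde_total = mde_per_customer * n_per_arm
print(f"Detectable shift: ${mde_per_customer:.2f} per customer, "
      f"about ${mde_total:,.0f} across one arm")
```

With these assumed inputs the detectable shift is a little under a dollar per customer, which adds up to hundreds of thousands of dollars across a million customers. Anything smaller would be indistinguishable from noise, and a noisier metric or a smaller sample pushes that floor higher.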

But again, my point here is less about how you choose the right metric, but rather the feedback loop itself. So are you actually... Do you have a mechanism to actually capture the decision based off that model? Does that make sense to you, Jason?

Jason Goth (33:44):

It does. And my answer to, "How do we determine the effectiveness?" is "ask Vincent." But, Vincent will need a lot of things. And what do you need to be able to do this, to determine if it's effective, to be able to determine if it's working, to determine if there's a problem? That's where we need to work together to say, "Okay, well, I can build a way to get you those things reliably and repeatably."

Jason Goth (34:12):

I'm certainly not the data modeling expert. We can work together on what the best way to measure that is, but as the engineer, I can build you a way to get that metric. It's probably more on the data science teams, I think, to determine here are the things we need to measure.

Vincent Yates (34:27):

Yeah. I think that's right. And I think that really comes back to, more often than not, and this isn't a hundred percent true, but more often than not, when you're doing machine learning in production, it's really driving some customer experience. Whether it's suggesting the right product, or giving the right price, or showing them the right sort of information on their educational journey, for example. A lot of that, most of that, if not all of that, typically requires a different data set entirely.

So let's just say you're doing something about pricing. Let's say you're Tesla. They launched a new product recently, it's been six months now, but still relatively new, around insurance. And so you can imagine that there's a bunch of people all behind the scenes, actuaries and data scientists, working on how do we price this insurance?

And the question becomes, great, we have a new algorithm, we have a new way of pricing it. Was it right? Or was it wrong? And in this case, you could, again, sort of measure it directly in some sense, but it's going to be in a different system, almost guaranteed to be a different system. Your claims data is probably different than your actual customer marketing data, or your "what price we showed you" data. Your CRM [inaudible 00:35:27] did you buy it? Some other data set in a different part of the organization entirely has that bit about, did you get a claim? Did you not get a claim? What's the lifetime value? And the second part of that example is that the feedback loop there is slow. You can imagine that people get in an accident pretty infrequently, hopefully.

Once every few years. If I develop a model that says, "Hey, I'm estimating how many times you're going to have a collision based off you, Jason, and your driving behavior. My model says this, I'm going to give you this price," it might take me two, three, four cycles of that to understand, did I have it right or wrong? And if those cycle times are on the scale of years, you're going to have no idea if that model's good or bad for five, 10 years, and that's not going to work. And so that's where the actual details here do get a bit nuanced, and can be a bit challenging, but just thinking through, "How am I going to measure this for real?" I think is often overlooked.
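A minimal sketch of that feedback loop: decisions logged by one system are joined to outcomes held in another, and decisions that are too recent to judge are marked as pending rather than scored. The record fields, the `join_feedback` helper, and the one-year maturity window are hypothetical.

```python
from datetime import date


def join_feedback(decisions, outcomes, as_of, maturity_days):
    """Attach outcomes to decisions; decisions too recent to judge are 'pending'."""
    outcome_by_id = {o["decision_id"]: o for o in outcomes}
    joined = []
    for d in decisions:
        outcome = outcome_by_id.get(d["decision_id"])
        age_days = (as_of - d["decided_on"]).days
        if outcome is not None:
            status = "observed"
        elif age_days >= maturity_days:
            status = "no_event"   # enough time has passed with nothing recorded
        else:
            status = "pending"    # too early to score this decision
        joined.append({**d, "status": status, "outcome": outcome})
    return joined


# Quotes come from the pricing system, claims from a separate claims system.
quotes = [
    {"decision_id": "q1", "decided_on": date(2021, 1, 10), "price": 120.0},
    {"decision_id": "q2", "decided_on": date(2021, 2, 1), "price": 95.0},
    {"decision_id": "q3", "decided_on": date(2022, 7, 1), "price": 110.0},
]
claims = [{"decision_id": "q1", "claim_amount": 3400.0}]

feedback = join_feedback(quotes, claims, as_of=date(2022, 8, 24), maturity_days=365)
for row in feedback:
    print(row["decision_id"], row["status"])
```

The join only works if the model's decision was logged with an identifier that the downstream system also carries, which is why the mechanism has to be designed in from the start.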

Jason Goth (36:18):

Yeah. I agree. I was flying home from Chicago yesterday and I logged onto the mobile app to get my boarding pass, and it was like, "Hey, do you want to take an earlier flight? It's like $400, $500, $600, pick one. And if you're the lowest we'll call you." Well, how do you determine what price to show people? And what does the success look like there? I think the success probably looks like we paid the least amount of dollars to keep the plane from being oversold. And so that's probably different than most, because you're trying to be like, "How do I minimize this?"

Vincent Yates (36:53):

Yeah, no, that's exactly right.

So again, if we think about where we are on this journey, we have to have the right people. And that includes people who understand this domain and we have to have the right process, that is, we're going to assume from day one, it's going to make it to production. And that has some implications about how we design the system, what tooling we use. And we have technology to enable this. You've named SageMaker, for example, we've talked about the model factory concept. How would you build all that in? And then ultimately, how do you measure the outcome there?

Back to the original question, which is we get people coming to us saying like, "Hey, I've gone down this path. It's not actually working right." And it's not working right because I think they've skipped at least one of the things we've just now talked about. Let's assume then that somebody's new to this space, or somebody wants to try something different. They don't want to have to make the same mistakes everybody else has made. Where would you, Jason, suggest is the starting place for this journey to actually be successful? Do you start with the people? Do you hire them? Do you start with the organizational structure? Do you start with the technology? How do you approach that?

Jason Goth (37:56):

That's a great question. Again, I think you got to start with what are you trying to accomplish?

Tip #5: Start Small, and Only Use ML to Solve a Real Business Problem

Jason Goth (38:01):

And so many times when we see people that are struggling to get things in production and get value from them, they usually start with people. I have the best data scientist. I have the best data modeler. OK, well, what problem are you having them solve? They're going to figure it out. They'll figure out my problem and they'll figure out the solution. That's somewhat of an anti-pattern, I think.

Vincent Yates (38:26):

Yeah. I agree.

Jason Goth (38:27):

You got to give someone a problem that you're going to try to address, and measure the outcome. And if you do that, that tends to focus things down small. I think the problem with building a full MLOps pipeline, or model factory, and all of these things is, well, if you try to do it generically, like, "this will handle anything, any problem, and any solution we're going to throw at it," you're almost certain to fail. You're biting off a huge chunk.

But if you take the ML problem itself and make it very small and targeted, and we're going to just figure out the right price to make you get off the plane. Well then it's a very focused MLOps solution, or whatever. Like, okay, well we just need to figure out how to get this piece to the mobile app and get that result back.

All good architectures, this is probably another topic for another podcast, but all good architectures, I think, evolve. They're not designed, they evolve like anything else. And MLOps solutions are the same. They start small and they evolve based on the pressures and forces around them. I'd be very wary of starting and saying like, "Hey, I know the right machine learning Ops, or MLOps, or model factory solution, to solve these problems. Let's go build all of that in the next six months, and then your data scientists will be publishing updates daily." That's unlikely. That's unlikely when we do it with software, too. We tend to build these things small and scale them up.

Jason Goth (40:04):

And let them evolve based on external forces.

Vincent Yates (40:09):

Right. That makes a lot of sense to me, and I think the way that I would say it is, I always like to start with the business problem. Because it turns out, we've talked before about keeping it simple. That is the goal here. And what you find is if you start with the business problem, actually you could build a really simple model that performs pretty well. And you don't have to have all of this other infrastructure initially. And the beauty of doing that is you can sort of test and learn the processes. You can test and learn the cultural, and elemental, organizational changes that you're going to have to go through here, and you can keep iterating.

And by the way, now that iteration's probably funded by the first project. So you might deploy something as simple as logistic regression, which requires almost none of what we've talked about today, but it drives real value. And you can do it quickly, and then you can take that value and then reinvest it in making this more and more robust. And so platform-first approaches, I think, are tempting. And I think that a lot of CIOs seem to enjoy them and like them, because they're technologists at their core usually. But it's usually not actually the way to drive material value.

Jason Goth (41:07):

And that's true for most things. Platform-first tends to be a problem. There's always someone that is there to sell you a platform.

Vincent Yates (41:14):

That's right. Yeah. That business model does work. Well again, hopefully this was helpful to our listeners today. I think we covered some good stuff. We'll do more on this topic in an upcoming podcast. Jason, thanks for coming back from Argentina. I appreciate it. We missed you while you were gone, but it seems like you had a good time. For those of you who would like to learn more, please visit the Insights page. Thank you for listening, and I hope you'll join again.
