5 Simple Steps to Help You Win With Your Next Machine Learning Project
Photo by Andre Hunter on Unsplash

Kanban board...check!

Project manager...check!

TensorFlow installed...check!

And...your data science project is good to go.

Or is it?

Data science, machine learning and deep learning are changing the world as we know it. Whole industries are being reshaped. Business processes are being transformed. And new startups are sprouting left, right and centre.

You're probably thinking...I need to get started with this.

And you're right. You absolutely should.

But before you do...there are a few insider tips that you can use to make sure you succeed.

Want them?

...read on!

1. Break up the tasks

The first thing you should ask is whether or not you need machine learning at all. Yes...I said it.

Do you really need machine learning?

There are a lot of problems that machine learning can solve but you should always step back and evaluate the problem without the solution in mind.

Working with machine learning can be attractive but it's just as important to be practical.

For example, if you're trying to create a driver-based model, you might be better off with Planning Analytics or Excel. Or if you've got constraints and goals, you might actually be looking at an optimisation problem, in which case a solver engine may be the better fit. Either way, it's an important question to ask.

Assuming the problem has some machine learning components to it...the next task is to break down the problem.

Say you wanted to build a Self Driving Car.

If you took that problem and broke it down into its parts, it might look a little like this:

  1. Collect video of cars driving
  2. Split video into image frames
  3. Label images where pedestrians and signs are noted
  4. Use multi-task deep learning to detect pedestrians and signs
  5. Deploy the deep learning model
  6. Use live cameras in the car to predict when there are pedestrians or signs
  7. Use motion planning to tell the car where to go
  8. Steer the car based on rules and motion planning
  9. Start again at Step 6.

You can see here that there's actually a bunch of steps involved in creating a self-driving car. And in this case, there's only really one where a machine learning model is built (Step 4); Steps 5 and 6 just deploy and run it. (Yes, it's more complicated than this, but the example is kept simple on purpose.)

This is common in practice.

Machine learning is one part of a larger system. It's important to break your problem up into the required steps in order to work out how you're going to tackle the problem as a whole but also so you can see how it's all going to piece together.

Non End-to-End Deep Learning vs End-to-End Deep Learning

N.B. There is a field of machine learning known as end-to-end deep learning, where you feed in raw inputs alongside the ideal outputs and have the model learn the relationships between them. This, however, requires a lot of data specific to your problem (which is typically hard and expensive to get).

2. Create a definition of success

In agile project management there's a huge emphasis on the definition of done.

This is partially because that's when you can move the Trello card from Doing to Done (I kid) but mainly because that's when the requirement is deemed fulfilled by everyone that signed up to it without limitation.

I'd say that having a definition of success is just as important when tackling a machine learning project.

Why?

Think about this example.

Say you're building a daily time-series forecast.

You get some data, you train a model.

It turns out that it has a mean absolute error of about $10,000.

In the worst case, where every daily error points the same way and none cancel out, this might equate to $310,000 of error over a 31-day month.

Now if the existing finance team is already forecasting monthly sales with an error of about $200,000, your ML model isn't going to get much traction, is it?
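The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the daily sales and forecast figures are made up so that the MAE comes out at $10,000, and the function names are hypothetical.

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute differences between actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def worst_case_monthly_error(daily_mae, days_in_month=31):
    """Upper bound: every daily error points the same way, so none cancel."""
    return daily_mae * days_in_month


# Hypothetical daily sales and model forecasts, in dollars (illustration only).
actual_sales = [120_000, 95_000, 130_000, 110_000]
model_forecast = [110_000, 105_000, 118_000, 102_000]

daily_mae = mean_absolute_error(actual_sales, model_forecast)
monthly_error = worst_case_monthly_error(daily_mae)
baseline_monthly_error = 200_000  # the finance team's figure from the example

beats_baseline = monthly_error < baseline_monthly_error
print(f"Daily MAE: ${daily_mae:,.0f}")
print(f"Worst-case monthly error: ${monthly_error:,.0f}")
print(f"Beats baseline: {beats_baseline}")
```

With these numbers the model loses to the finance team, which is exactly the conversation you want to have before the project starts rather than after.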

Think of the definition of success as the statement that's going to make the model worth investing in. Typically the definition of success has to be better than your current baseline and should be set before the project starts.

Here are a few examples:

  1. The model allows the Finance team to predict monthly sales with an MAE of $1,000
  2. The model predicts clients that are likely to default with an F1 score of .96
  3. The model translates call centre conversations with an accuracy of 93%
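A definition of success like the ones above is easy to turn into an automated check. Here's a minimal sketch in Python, assuming binary labels where 1 means "default"; the labels, predictions, and the helper name `meets_definition_of_success` are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def meets_definition_of_success(score, target, higher_is_better=True):
    """Compare a metric with the target agreed before the project started."""
    return score >= target if higher_is_better else score <= target


# Hypothetical held-out labels and model predictions (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

f1 = f1_score(y_true, y_pred)
print(f"F1: {f1:.2f}")
print("F1 target of .96 met:", meets_definition_of_success(f1, 0.96))
# For error metrics such as MAE, lower is better.
print("MAE target of $1,000 met:",
      meets_definition_of_success(950, 1_000, higher_is_better=False))
```

The point isn't the metric itself but that the target is written down up front, so nobody gets to move the goalposts later.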

3. Gather the subject matter experts

People who are already working within the business are your friends.

If you have users who are already experts within the business, start recruiting them.

They often know the data better than anyone and can help you with all the phases of the data science pipeline, but especially those that are most time intensive.

It's often said that about 80% of the time on a machine learning project is spent cleaning data and creating features that explain the target variable. More often than not, if you have users who are already working with and analysing the data, you can leverage their wisdom to help accelerate this process.

This is particularly important when working with structured data!

4. Build a model as fast as possible

Planning is important, don't get me wrong, but action is probably more so.

Whilst you're still in the project initiation phase, it's a great idea to try to get your hands on some data (NDAs and other legal stuff aside) and try to build a model.

If you take a look at the CRISP-DM model there's an arrow that goes from Evaluation back to Business Understanding. This is because it's meant to be iterative.

The sooner you're able to start, the faster you're able to evaluate what the model looks like. This goes against traditional wisdom but helps you begin to ask the more pointed questions like:

1) Whether you need more data

2) Whether you have the right data to solve the problem

3) Whether the problem can be solved with the approach you're currently taking
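One way to get a first model in an afternoon is to start with the simplest forecasts you can think of and score them on held-out data. The sketch below is a minimal Python example, not a recommended modelling approach: the sales figures, the function names, and the seven-day window are all made up for illustration.

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute differences between actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future day."""
    return [history[-1]] * horizon


def moving_average_forecast(history, horizon, window=7):
    """Repeat the mean of the last `window` observations."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


# Hypothetical daily sales; the last four days are held out for evaluation.
series = [100, 102, 98, 105, 110, 95, 90, 101, 103, 99, 106, 111, 104, 108]
train, test = series[:-4], series[-4:]

results = {
    "naive": mean_absolute_error(test, naive_forecast(train, len(test))),
    "moving_average": mean_absolute_error(
        test, moving_average_forecast(train, len(test))
    ),
}
for name, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

Even a throwaway baseline like this gives you a number to beat, and a poor score on it is often the first hint that you need more data or different data.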

Get out there and build something!

N.B. Low-code or no-code tools like AutoAI help you do this a whole lot faster if your coding isn't quite up to scratch!

5. Leverage work that has been done before

Unless you're working on cutting-edge research, there's a pretty high chance that the machine learning problem you're trying to solve has been solved before. If that's the case, there's an even higher chance that you're able to leverage the work of others.

The machine learning and data science community tends to share a lot, so make use of that and help launch your project with a bang. There are three key areas where you can leverage existing work:

Architectures: A large part of deep learning is ensuring you get the architecture right. Most state-of-the-art architectures are open and out there; you just need to know where to look. My favourite place to start is the Keras GitHub page.

Transfer Learning: Data scientists across the globe have gone to huge efforts to train highly accurate models. You can leverage these pre-trained models and simply retrain the final layer(s) for your specific use case. This is known as transfer learning. It cuts down on training time and resource usage (and ultimately saves project budget).

White-papers/Research: arXiv is an absolute gold mine when it comes to research. You can find the collective wisdom of the academic community, as well as that of machine learning practitioners, all in one place.
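To make the transfer learning idea concrete, here's a toy sketch in plain Python with no deep learning library involved. It assumes a hypothetical `pretrained_features` function standing in for the frozen body of a pre-trained network, and trains only a small logistic "final layer" on top of it. In a real project you would load an actual pre-trained model from your framework of choice instead; the data and hyperparameters here are invented for illustration.

```python
import math


def pretrained_features(x):
    """Stand-in for a frozen, pre-trained network body (hypothetical).

    Nothing in here is ever updated during training.
    """
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_final_layer(samples, labels, epochs=200, lr=0.5):
    """Fit only the final logistic layer on top of the frozen features."""
    features = [pretrained_features(x) for x in samples]
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            pred = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            error = pred - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * v for w, v in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    f = pretrained_features(x)
    score = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
    return 1 if score >= 0.5 else 0


# Hypothetical task-specific data: label is 1 when the first value is larger.
samples = [(2, 1), (3, 0), (1, 0), (0, 1), (1, 3), (0, 2)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_final_layer(samples, labels)
predictions = [predict(x, weights, bias) for x in samples]
print("Predictions:", predictions)
```

Because the feature extractor is fixed, there are only three numbers to learn here (two weights and a bias), which is the same reason retraining just the final layer of a large network is so much cheaper than training it from scratch.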

And that's a wrap. Hopefully one or more of these tips helps get your project off the ground and kicking goals. If there are any I missed, drop a suggestion in the comments or message me. I'd love to know!

Karen Hardie

Data and AI Partner Specialist at IBM | Helping customers get Insights and Value from their data

4y

Great article Nick!
