Machine learning workflow

A complete machine learning workflow starts with a problem that you want to solve and ends with a mechanism by which your customers can make predictions about new instances of similar data. To arrive at this end, you must work through four phases:

data exploration and preparation
model development and training
model testing and deployment
operational development and management.

These phases, and the steps within them, are iterative: you may need to reevaluate and return to a previous step at any point in the process.

Data exploration and preparation

Machine learning is all about data. To build a good model, you must start with good data. You must analyze your data and assess whether it can help you model your machine learning problem. In addition, you must prepare and stage this input data in the ways that TensorFlow and Cloud ML Engine expect. That might include:

Joining data from multiple sources and rationalizing it into one dataset.
Visualizing the data to look for trends.
Using data-centric languages and tools to find patterns in the data.
Identifying _features_ in your data: the subset of data attributes in your raw data that you use in your model.
Cleaning the data to find any anomalous values, which can be caused by data entry mistakes or errors in measurement.
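
Two of the steps above, joining data from multiple sources and finding anomalous values, can be sketched in plain Python. This is only an illustration under stated assumptions: the `customers` and `purchases` records, the shared `id` field, and the helper names are all hypothetical, and the anomaly check uses a modified z-score based on the median absolute deviation, which is one common heuristic among many rather than a method this document prescribes.

```python
from statistics import median

# Hypothetical records from two sources, keyed by a shared "id" field.
customers = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
    {"id": 4, "region": "east"},
    {"id": 5, "region": "south"},
]
purchases = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 22.5},
    {"id": 3, "amount": 19.0},
    {"id": 4, "amount": 2100.0},  # suspicious: possibly a data entry error
    {"id": 5, "amount": 21.0},
]


def join_on_id(left, right):
    """Inner-join two lists of dicts on their "id" field."""
    right_by_id = {row["id"]: row for row in right}
    return [
        {**row, **right_by_id[row["id"]]}
        for row in left
        if row["id"] in right_by_id
    ]


def flag_anomalies(rows, field, threshold=3.5):
    """Return rows whose modified z-score for `field` exceeds `threshold`.

    The score is 0.6745 * |value - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a conventional default,
    assumed here for illustration.
    """
    values = [row[field] for row in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        row for row in rows
        if 0.6745 * abs(row[field] - med) / mad > threshold
    ]


dataset = join_on_id(customers, purchases)
suspects = flag_anomalies(dataset, "amount")
```

A flagged row is a candidate for review, not an automatic deletion: whether an unusual value is an error or a genuine observation depends on your problem.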

Most of the work of preparing your data for machine learning is about getting a consistent dataset and deciding how it might best be used to solve your problem. The final step in getting your data ready to use is _preprocessing_. In this step you transform valid, clean data into the format that best suits the needs of your model. Here are some examples of data preprocessing:

Normalizing numeric data to a common scale.
Applying formatting rules to data, like removing the HTML tagging from a text feature.
Reducing data redundancy through simplification, as when converting a text feature to a bag of words representation.
Representing text numerically, as when assigning values to each possible value in a categorical feature.
Assigning key values to data instances.
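
As a rough illustration of four of the preprocessing examples above, the following standard-library sketch normalizes numbers, strips HTML tags, builds a bag of words, and encodes a categorical feature. The function names are hypothetical, the tag-stripping regular expression is a simplification rather than a full HTML parser, and min-max scaling is just one of several ways to bring numeric data to a common scale.

```python
import re
from collections import Counter


def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def strip_html(text):
    """Remove anything that looks like an HTML tag (simple regex, not a parser)."""
    return re.sub(r"<[^>]+>", "", text)


def bag_of_words(text):
    """Count lowercase word occurrences, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def encode_categories(values):
    """Map each distinct category to an integer index, assigned in sorted order."""
    index = {category: i for i, category in enumerate(sorted(set(values)))}
    return [index[v] for v in values], index


scaled = min_max_normalize([10, 20, 30])
clean = strip_html("<p>Great <b>value</b></p>")
words = bag_of_words("Great value, great price")
codes, mapping = encode_categories(["red", "green", "red", "blue"])
```

Note that integer codes imply an ordering between categories that may not exist in the data; depending on the model, a one-hot representation may be a better fit.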

Cloud ML Engine services do not provide any data wrangling functionality. However, you can use other services, such as Google Cloud Datalab, BigQuery, Cloud Dataproc, and Dataflow, to explore and transform your data.

Model development and training

Once your data is ready, you can begin to build models to solve your problem. A given dataset can be used to create many models, each predicting different aspects of similar data. Even when you want to solve a specific problem, you can use many different approaches. The first model that you build rarely offers the best predictive performance; model building is very much an iterative process. For each model you develop with your data, you'll go through three steps in order, though you'll frequently go back to previous steps to make corrections and refinements:

Develop your model using established or novel machine learning techniques.
Train your model by fitting it to training data for which you already know the target values.
Evaluate your trained model by using data that, like your training data, includes target values. You compare the results of your model's predictions to the actual values for the evaluation data using statistical measures.
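
The three steps above can be sketched with a deliberately tiny model. This example assumes a one-feature linear model fitted by ordinary least squares and evaluated with mean squared error; it stands in for whatever technique and statistical measure you actually choose, and it does not use TensorFlow or Cloud ML Engine. All data values and function names are hypothetical.

```python
def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one input value."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual targets."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data with known target values (roughly y = 2x + 1).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

# Hypothetical evaluation data, kept out of training, also with known targets.
eval_x = [5.0, 6.0]
eval_y = [11.1, 12.8]

model = train_linear_model(train_x, train_y)               # develop + train
eval_mse = mean_squared_error(model, eval_x, eval_y)       # evaluate
```

If the evaluation error is too high for your application, you would return to an earlier step: revisit the features, change the model, or gather more data.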

Model testing and deployment

During training, you apply the model to known data to find optimal parameters to achieve a desired predictive power. When your results are good enough for the needs of your application, you should deploy the model to whatever system your application uses and test it.

To test your model, run data through it in a context as close as possible to how it will be used in your final application. You should use a different dataset for testing than you use for training and validation. You should set aside a separate set of data each time you test so that your model gets tested using data that it has never seen before.
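
One way to keep test data separate is to partition the dataset once, before any training happens. The sketch below assumes a simple shuffled split with hypothetical proportions (70% training, 15% validation, the remainder for testing); real projects may need stratified or time-based splits instead.

```python
import random


def split_dataset(rows, train_frac=0.7, validation_frac=0.15, seed=0):
    """Shuffle rows and split them into disjoint train, validation, and test lists.

    The fractions and the fixed seed are illustrative defaults; whatever is
    left after the train and validation slices becomes the test set.
    """
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


# Hypothetical dataset of 100 instance keys.
instances = list(range(100))
train, validation, test = split_dataset(instances)
```

Because the three lists never overlap, any instance in `test` is one the model has not seen during training or validation.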

At this stage you can uncover problems in your model or in its interaction with the rest of your application, and you make adjustments based on the results of your testing.

Operational development and management

In the previous step, your primary goal was to test your model in its real-world context. After you find a model that works well, you put it into production use. At this point, you need to be able to monitor the model's operation and manage the jobs and resources that it uses. From a practical perspective, this is the phase where development activity gives way to operations.
