AWS Machine Learning Foundations

Here are my notes from a course on Machine Learning from the AWS Educate program.

Here's the course

Module 1 - Intro

Careers in ML

Module 2 - Introduction to Machine Learning Foundations

AI Overview

AI Terms

Classification and Regression Problem Types (used in supervised learning)

Classical Programming vs ML

Use classical programming when you can define explicit rules for a task.

Use machine learning when the rules cannot be coded by hand and instead need to be learned from data sets.

Algorithm Types

Features and Weights

Features and weights are used in algorithms for training models.

Features are the inputs identified as important for accurate outcomes. For example, it might be important to identify whether something is a hat or not a hat. The model will use a 1 for yes and a 0 for no.

Weights can be applied to features. For example, if a feature is more important in determining an accurate outcome, then it will have a higher weight.

Here's my attempt to demonstrate how features and weights work (based on the example in the course).

Step 1

Here's what the equation looks like, where x is the feature value (1 if the item is a hat, 0 if not) and a is the weight:

f(x) = a * x
f(x) = 0.8 * 1

Step 2

Our equation then looks like this:

f(x) = a1*x1 + a2*x2
f(x) = 0.8*1 + 0.25*1 

The final equation looks like this:

f(x) = 1.05

You might determine that if the final value is greater than 1, then the model should recommend that the customer purchase the hat.
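The same calculation can be sketched in a few lines of Python. The feature values (1, 1), the weights (0.8, 0.25), and the threshold of 1 come from the example above; the function and variable names are my own.

```python
def score(features: list[float], weights: list[float]) -> float:
    """Weighted sum: f(x) = a1*x1 + a2*x2 + ..."""
    return sum(a * x for a, x in zip(weights, features))


features = [1, 1]       # x1, x2: 1 means "yes", 0 means "no"
weights = [0.8, 0.25]   # a1, a2: a higher weight means a more important feature

total = score(features, weights)
recommend_hat = total > 1  # threshold taken from the example above

print(round(total, 2), recommend_hat)
```

If the second feature were 0 instead of 1, the score would drop to 0.8 and the model would not recommend the hat.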

Module 3 - Machine Learning Pipeline

A series of steps that are used to solve a business problem.

  1. Problem formulation
  2. Collect, clean, and label data
  3. Evaluate data
  4. Feature engineering - extracting more information from existing data to improve the model's predicting power
  5. Select and train a model (based on use case and business problem)
  6. Evaluate the model
  7. Tune model and return to feature engineering to repeat an iteration from there
    1. Does the model meet your business goals? If not, return to feature engineering

Problem Formulation

Identify a business problem that you believe can benefit from ML.

Articulate the problem and convert it into an ML problem.

Which algorithm type will you use (supervised, unsupervised, reinforcement learning)?

Do you have data that can be labelled? How should the data be labelled?

What data do you need to train your model? There are three types of data to consider:

Identify features (feature engineering) and weights.

Training data vs Validation data

Don't use all the data to train your model. Instead, hold some of it back so that you can use it to validate the performance of the model.
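Here is a minimal sketch of holding data back for validation, using only the Python standard library. The function name, the 20% validation fraction, and the fixed seed are my own illustrative choices, not values from the course.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold back a fraction for validation."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    train = shuffled[n_validation:]
    return train, validation


rows = list(range(10))  # stand-in for ten labelled records
train, validation = train_validation_split(rows)
print(len(train), len(validation))
```

Shuffling before splitting matters when the data is ordered (for example by date or by label), because otherwise the validation set may not look like the training set.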

A general rule:

Choose a machine learning algorithm (supervised, unsupervised, reinforcement)

Evaluate and tune the model (iteratively)

The goal of training is a model that is balanced(?) and generalizes well

Finally, when you are happy with the model, you can deploy it in SageMaker

Module 4 - ML Tools and Services

The three layers of the ML stack:

  1. The data layer
  2. The model layer
  3. The deployment and monitoring layer

There are many libraries that are used in ML

AWS has instances that are optimized for ML

  1. C5 and P3 instances

AWS IoT Greengrass - brings intelligence to edge devices

Amazon Elastic Inference - adds GPU acceleration to instances to help with ML processing (and apparently saves up to 75% on ML inference costs by avoiding GPU over-provisioning)

AWS ML Managed Services

  1. Computer vision
    • Rekognition
    • Textract
  2. Chat interface
    • Amazon Lex
  3. Speech
    • Amazon Polly
    • Amazon Transcribe
  4. Fraud detection
    • Amazon Fraud Detector
  5. Language
    • Comprehend
    • Translate
  6. Recommendations
    • Personalize
  7. SageMaker
    • prepare, build, train & tune, deploy and manage models
    • SageMaker can help you with every step in the ML pipeline

SageMaker Tools

A SageMaker notebook instance is a machine learning (ML) compute instance that runs the Jupyter Notebook application.