Machine Learning Algorithms Notes - Class 11 AI (843)

Boost your exam preparation with highly structured Machine Learning Algorithms Notes for Class 11 AI (843), carefully designed for CBSE students with simplified concepts, exam-oriented explanations, important definitions and quick revision material for last-minute preparation.

Contents

1. Machine Learning

2. Types of Machine Learning

3. Supervised Learning

4. Regression

4.1. Correlation: Foundation of Regression Analysis

4.2. Types of Correlation

4.3. PEARSON’S r (Correlation Coefficient)

4.4. When Regression is NOT Suitable

4.5. REGRESSION

4.6. Linear Regression

5. Applications of Linear Regression

6. Classification (Concept)

7. How Classification Works

8. Types of Classification

9. K-Nearest Neighbour (KNN) Algorithm

10. Clustering (Unsupervised Learning)

11. Uses of Clustering

12. How Clustering Works:

13. Types of Clustering Methods

14. K-Means Clustering

15. Steps in K-Means Clustering

16. Applications of K-Means Clustering

17. Advantages of K-Means

18. Limitations of K-Means

Machine Learning

Machine Learning (ML) is a part of Artificial Intelligence that enables computers to learn from data and make decisions without being explicitly programmed.
Instead of following fixed instructions, ML models learn patterns and relationships from data and can make predictions on new data.
ML works with different types of data such as images, text, sensor data, and historical records.
Common ML algorithms include decision trees, neural networks, and support vector machines.
ML is used in many real-life applications like recommendation systems (Netflix), speech recognition, medical diagnosis, chatbots, fraud detection, and self-driving cars.

Types of Machine Learning

Machine Learning is mainly divided into three types:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning.

Supervised Learning

Supervised learning is a type of Machine Learning where the model learns from labeled data (input with correct output).
It helps the model predict results or make decisions based on past examples.
There are two main types of supervised learning:
- Regression → works with continuous data
- Classification → works with discrete data

Regression

Correlation: Foundation of Regression Analysis

Regression is used to predict continuous values such as price, height, or temperature.
It is based on the concept of correlation, which shows the relationship between two variables.
Correlation describes how one variable changes with respect to another.

Types of Correlation

Positive Correlation → both variables move in the same direction (increase or decrease together).
Negative Correlation → variables move in opposite directions (one increases, the other decreases).
Zero Correlation → no linear relationship between variables.

Correlation Values

+1 → Perfect positive correlation
0 → No correlation
-1 → Perfect negative correlation

PEARSON’S r (Correlation Coefficient)

Pearson’s r measures the strength and direction of a linear relationship between two continuous variables.
It is important in regression because a strong correlation suggests that a straight line can describe the relationship between the variables well.

Key Conditions to Use Pearson’s r

Data should be on interval or ratio scale.
Variables should be normally distributed (approximately).
Relationship between variables should be linear.
There should be no significant outliers in the data.

Pearson’s r is calculated using the formula:

r = Σ(x - x̄)(y - ȳ) / √[ Σ(x - x̄)² × Σ(y - ȳ)² ]

Here x̄ and ȳ are the means of x and y.

Range of Values

r = +1 → Perfect positive correlation
r = 0 → No correlation
r = -1 → Perfect negative correlation

Interpretation

r > 0 → Positive relationship (both variables increase together)
r < 0 → Negative relationship (one increases, the other decreases)
r = 0 → No linear relationship between variables
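
To make the calculation of Pearson’s r concrete, here is a minimal sketch in plain Python. The function name pearson_r and the sample numbers (study hours and marks) are made up for illustration. It computes the sum of deviation products divided by the square root of the product of the squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]   # made-up study hours
marks = [2, 4, 5, 4, 5]   # made-up marks

print(round(pearson_r(hours, marks), 2))             # positive relationship
print(round(pearson_r(hours, [10, 8, 6, 4, 2]), 2))  # perfect negative relationship
```

The first value is positive and below 1, so the relationship is positive but not perfect. The second list falls by exactly 2 for every unit increase in x, so r comes out as -1.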

When Regression is NOT Suitable

No correlation: Regression is not useful when there is no correlation between variables, as they change independently.
Non-linear relationships: Regression may not work well for non-linear relationships, as simple regression mainly captures linear patterns.
Outliers (extreme values): Outliers can pull the regression line towards themselves and lead to incorrect predictions.
Violated assumptions: If basic assumptions (like linearity or no multicollinearity) are violated, results may become unreliable.
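
The effect of an outlier can be seen with a small experiment. The sketch below uses made-up data and a hypothetical helper slope_of_best_fit. It fits a least-squares line twice: once on clean data and once after one value is replaced by an extreme value.

```python
def slope_of_best_fit(xs, ys):
    """Least-squares slope: deviation products divided by squared x deviations."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]     # made-up data lying exactly on y = 2x
outlier_ys = [2, 4, 6, 8, 30]   # same data with one extreme value

clean_slope = slope_of_best_fit(xs, clean_ys)
outlier_slope = slope_of_best_fit(xs, outlier_ys)
print("Slope without outlier:", clean_slope)
print("Slope with outlier:", outlier_slope)
```

A single extreme point changes the slope from 2 to 6, so predictions for every other point become worse.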

REGRESSION

Regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables.

It helps in predicting the value of the dependent variable from the values of the independent variables.
It is mainly used with continuous data such as height, temperature, salary, etc.
It is useful for prediction, forecasting, and understanding relationships between variables.

When there are two variables x and y, and y depends on x, it is called simple regression.
y (Dependent Variable) → the value we want to predict or understand.
x (Independent Variable) → the value used to predict or explain changes in y.
In simple linear regression, the relationship between x and y is represented by the equation:

Simple Linear Regression Equation

y = a + bx + e

a → Intercept (value of y when x = 0)
b → Slope (change in y for one unit change in x)
e → Error term (difference between actual and predicted values)

Finding the Line (Regression Line)

Regression analysis finds a best-fit line or curve to show the relationship between variables.
It explains how the dependent variable changes with the independent variable.
The aim is to make the line as close as possible to all data points.
The Least Squares Method is used to determine this best-fit line.
It minimizes the squared differences between actual and predicted values.
This method helps calculate the slope and intercept of the line.
These values are then used to predict outcomes.

Properties of Regression Line

It minimizes the total squared error between actual (y) and predicted (ŷ) values.
The regression line always passes through the mean of both x and y values.
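
Both properties can be checked with a short sketch. The helper names fit_line and squared_error and the data values are assumptions made for illustration. The slope and intercept follow the usual least-squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean
    return a, b

def squared_error(xs, ys, a, b):
    """Total squared difference between actual y and predicted y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
print("Intercept:", round(a, 2), "Slope:", round(b, 2))
print("Prediction at mean of x:", round(a + b * x_mean, 2), "Mean of y:", y_mean)

# Any other line (here a = 2.0, b = 0.7, chosen arbitrarily) has a larger error
print(squared_error(xs, ys, a, b) < squared_error(xs, ys, 2.0, 0.7))
```

The prediction at the mean of x equals the mean of y, and the arbitrary comparison line gives a larger total squared error than the fitted one.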

Linear Regression

Linear regression is a basic machine learning technique used to show a linear relationship between variables.
It involves a dependent variable and one or more independent (predictor) variables.
If there is one independent variable, it is called Simple Linear Regression.
If there are multiple independent variables, it is called Multiple Linear Regression.

Types of Linear Regression

Simple Linear Regression: Uses one independent variable to predict the dependent variable.
Multiple Linear Regression: Uses more than one independent variable for prediction.
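
For multiple linear regression, a minimal sketch with NumPy is given below. The data is made up: salary is generated from experience and education using known coefficients (10, 2 and 3), so we can check that least squares recovers them. The variable names are assumptions for illustration only.

```python
import numpy as np

# Made-up data: salary = 10 + 2 * experience + 3 * education
experience = np.array([1, 2, 3, 4, 5, 6], dtype=float)
education = np.array([2, 1, 4, 3, 5, 2], dtype=float)
salary = 10 + 2 * experience + 3 * education

# Design matrix: a column of ones for the intercept, then one column per variable
X = np.column_stack([np.ones_like(experience), experience, education])

# Least-squares solution of X @ coeffs = salary
coeffs, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, b_experience, b_education = coeffs
print(np.round(coeffs, 2))
```

Each coefficient tells how much the dependent variable changes for a one unit change in that independent variable, keeping the others fixed.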

Applications of Linear Regression

Used in market analysis to understand relationships between factors like price and sales.
Helps in sales forecasting by analyzing past data and trends.
Used to predict salary based on experience, education, etc.
Applied in sports analysis to study player and team performance.
Used in medical research to analyze health-related factors.

Python Program for Advanced Learners

import numpy as np
import matplotlib.pyplot as plt
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
# Calculate mean and sample standard deviation (ddof=1 divides by n - 1)
x_mean = np.mean(x)
y_mean = np.mean(y)
x_std = np.std(x, ddof=1)
y_std = np.std(y, ddof=1)
# Calculate sample covariance and slope (slope = covariance / variance of x)
covariance = np.sum((x - x_mean) * (y - y_mean)) / (len(x) - 1)
slope = covariance / (x_std ** 2)
# Calculate y-intercept
intercept = y_mean - slope * x_mean
# Predicted values
y_pred = slope * x + intercept
# Plot data and regression line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
# Add labels and title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Simple Linear Regression')
# Show the plot
plt.show()
# Print slope and intercept
print("Slope:", round(slope, 2))
print("Intercept:", round(intercept, 2))

Program Explanation

Imports NumPy for calculations and Matplotlib for graph plotting.
Defines sample data for x (input) and y (output).
Calculates mean, standard deviation, covariance, and slope.
Finds the y-intercept using slope and mean values.
Uses the regression equation to predict y values.
Plots actual data points and the regression line.
Prints the slope and intercept values.

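Output

A scatter plot of the data points with the regression line drawn in red, followed by the printed slope and intercept values.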

Classification (Concept)

Classification is a machine learning technique used to group data into predefined categories.
The main goal is to assign correct labels to data based on its features.
It works on labeled data, so it is a type of supervised learning.
The model learns from training data and then predicts labels for new (unseen) data.

Example

Sorting waste into categories like paper, plastic, metal, food waste.
Each type of waste is given a label, which is similar to classification in ML.

How Classification Works

Classes/Categories: Data is divided into different classes or categories (e.g., positive/negative).
Features/Attributes: Each data instance has features that help identify its class.
Training Data: Model learns from labeled data with correct class labels.
Classification Model: Algorithm builds a model to learn patterns from data.
Prediction/Inference: Model uses learned patterns to predict classes of new data.
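
The five parts above can be seen together in a small sketch. It uses a nearest-centroid classifier, which is only one simple choice of model and is picked here for illustration. The features (number of links and number of exclamation marks in an email) and all values are hypothetical.

```python
import math

# Training data: (features, class label)
training_data = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((6, 4), "spam"),
    ((1, 0), "not spam"), ((0, 1), "not spam"), ((2, 1), "not spam"),
]

def train(data):
    """Classification model: learn one mean feature vector per class."""
    grouped = {}
    for features, label in data:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Prediction: return the class whose mean vector is closest to the new item."""
    return min(model, key=lambda label: math.dist(model[label], features))

model = train(training_data)
print(predict(model, (6, 5)))
print(predict(model, (1, 1)))
```

An email with many links and exclamation marks falls closer to the spam examples, while one with very few falls closer to the other class.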

Types of Classification

Binary Classification → Only two classes (e.g., yes/no, spam/not spam)
Multi-Class Classification → More than two classes (e.g., cat, dog, bird)
Multi-Label Classification → One data item can have multiple labels (e.g., movie genres: action + comedy)
Imbalanced Classification → Classes are unevenly distributed (e.g., fraud detection where fraud cases are very few)

Binary Classification	Multi-Class Classification	Multi-Label Classification	Imbalanced Classification
Two class labels	More than two class labels	One item can have multiple labels	Unequal class distribution
Email spam (spam/not spam)	Face classification	Photo tagging (multiple objects)	Fraud detection
Medical test (yes/no)	Plant species classification	Image with multiple labels	Outlier detection
Exam result (pass/fail)	OCR / Image classification	—	Medical diagnostic tests
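
Imbalanced classification needs care because accuracy alone can be misleading. The sketch below uses made-up numbers (2 fraud cases among 100 transactions) and a useless "model" that labels every transaction as genuine.

```python
# 1 = fraud, 0 = genuine (made-up data)
actual = [1] * 2 + [0] * 98
# A "model" that always predicts genuine
predicted = [0] * 100

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Share of actual fraud cases that the model caught
fraud_caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fraud_recall = fraud_caught / sum(actual)

print("Accuracy:", accuracy)
print("Share of fraud cases caught:", fraud_recall)
```

The accuracy is 98% even though no fraud case is detected, which is why imbalanced problems are usually judged by how well the rare class is found.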

K-Nearest Neighbour (KNN) Algorithm

KNN is a supervised learning algorithm used for both classification and regression.
It is a non-parametric method, meaning it does not assume any fixed form for the underlying data distribution.
It works on the principle of proximity (closeness).
It classifies a new data point based on similarity with nearest data points.

Why KNN is Needed

Useful when data structure is not clearly defined.
Works well when decision boundaries are complex.
Provides a simple and effective way to classify new data.

Steps in KNN Algorithm

Select the value of K (number of neighbors).
Calculate the Euclidean distance between the new point and all data points.
Identify the K nearest neighbors.
Count how many neighbors belong to each category.
Assign the new data point to the category with maximum neighbors.
Model is ready for prediction.
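
The steps above can be sketched in plain Python as follows. The function name knn_predict, the value K = 3 and the sample points are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Euclidean distance from the new point to every training point
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Identify the k nearest neighbours
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Count neighbours per category and pick the most common one
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

An odd value of K is a common choice with two categories because it avoids tied votes.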

Applications of KNN

Image recognition and classification
Recommendation systems
Healthcare diagnostics
Text mining and sentiment analysis
Anomaly detection
Fraud detection
Outlier detection
Medical diagnostic tests
Photo classification (identifying objects like bicycle, apple, person, etc.)

Advantages of KNN

Easy to understand and implement
No separate training phase required
Works for both classification and regression
Can handle noisy data reasonably well when K is not too small

Limitations of KNN

Slow for large datasets
Depends on choice of K and distance metric
Needs proper data preprocessing and scaling
Not suitable for high-dimensional data
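
Why scaling matters for KNN can be shown with a small sketch. The people, their (age, salary) values and the assumed minimum and maximum of each feature are all hypothetical. Min-max scaling is used here as one possible scaling method.

```python
import math

def min_max_scale(point, mins, maxs):
    """Rescale each feature to the 0-1 range using the given min and max values."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs))

# Hypothetical people described by (age in years, salary in rupees)
new_person = (25, 50000)
similar_age = (26, 52000)
different_age = (60, 50500)

# Raw distances: the salary column dominates because its numbers are larger
raw_similar = math.dist(new_person, similar_age)
raw_different = math.dist(new_person, different_age)

# Assumed feature ranges used for scaling
mins, maxs = (20, 20000), (60, 100000)
scaled = [min_max_scale(p, mins, maxs)
          for p in (new_person, similar_age, different_age)]
scaled_similar = math.dist(scaled[0], scaled[1])
scaled_different = math.dist(scaled[0], scaled[2])

print("Before scaling, 60-year-old looks nearer:", raw_different < raw_similar)
print("After scaling, 26-year-old is nearer:", scaled_similar < scaled_different)
```

Without scaling, the nearest neighbour is decided almost entirely by salary. After scaling, both features contribute to the distance.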

Python Program for Advanced Users

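# Importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Importing dataset
data_set = pd.read_csv('user_data.csv')
# Extracting independent (columns 2 and 3) and dependent (column 4) variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# Splitting the dataset into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)
# Feature scaling (fit on the training set only, then apply to the test set)
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
# Fitting KNN to the training set (K = 5 and Euclidean distance are assumed values)
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)
# Predicting the test set results and checking accuracy
y_pred = classifier.predict(x_test)
print("Accuracy:", accuracy_score(y_test, y_pred))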

Clustering (Unsupervised Learning)

Clustering is a technique used to group similar data points together.
It works on unlabeled data, so it is a type of unsupervised learning.
Data is divided into clusters where:
- Points in the same cluster are more similar
- Points in different clusters are less similar
It finds patterns based on features like shape, size, color, behavior, etc.
No predefined labels are given; the model discovers patterns on its own.

Uses of Clustering

Market segmentation
Customer grouping
Image recognition
Data analysis

How Clustering Works:

To cluster data effectively, follow these key steps:
1) Prepare the Data: Select the right features for clustering and make sure the data is ready by scaling or transforming it as needed.
2) Create Similarity Metrics: Define how similar data points are by comparing their features. This similarity measure is crucial for clustering.
3) Run the Clustering Algorithm: Apply a clustering algorithm to group the data. Choose one that works well with your dataset size and characteristics.
4) Interpret the Results: Analyze the clusters to understand what they represent. Since clustering is unsupervised, interpretation is essential for assessing the quality of the clusters.

Types of Clustering Methods

Some of the common clustering methods used in Machine learning are:

1) Partitioning Clustering
2) Density-Based Clustering
3) Distribution Model-Based Clustering
4) Hierarchical Clustering

Partitioning Clustering

Divides data into non-hierarchical groups (k clusters).
Also called a centroid-based method.
Example: K-Means algorithm.
Each cluster has a center (centroid).
Data points are grouped so that the distance between each point and the centroid of its cluster is as small as possible.

Density-Based Clustering

Forms clusters based on high-density regions.
Can create clusters of any shape.
Separates clusters by low-density (sparse) areas.
May face difficulty with varying densities or high-dimensional data.

Distribution Model-Based Clustering

Assumes data follows a probability distribution (usually Gaussian).
Calculates the likelihood of data points belonging to clusters.
Example: Gaussian Mixture Model (GMM).

Hierarchical Clustering

Does not require pre-defined number of clusters.
Creates a tree-like structure (dendrogram).
Clusters are formed by splitting or merging data points.
Number of clusters is decided by cutting the tree at a level.
Example: Agglomerative Hierarchical algorithm.

K-Means Clustering

K-Means is an unsupervised learning algorithm used for clustering.
It divides data into K number of clusters.
Each cluster has a centroid (center point).
It groups data such that points in the same cluster are more similar.
The value of K must be decided beforehand.

Steps in K-Means Clustering

Select the number of clusters K.
Choose K random centroids.
Assign each data point to the nearest centroid.
Calculate new centroids based on clusters.
Reassign data points to the nearest new centroid.
Repeat until no changes occur.
Model is ready.
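
The steps above can be sketched in plain Python as follows. The function name k_means and the sample points are assumptions for illustration. The starting centroids are passed in as fixed values instead of being chosen randomly, only so that the result is repeatable.

```python
import math

def k_means(points, start_centroids, max_iters=100):
    """Plain K-Means: returns (final centroids, cluster index of each point)."""
    centroids = list(start_centroids)
    assignments = None
    for _ in range(max_iters):
        # Assign each data point to the nearest centroid
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no point changes its cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Recalculate each centroid as the mean of its cluster's points
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centroids, assignments = k_means(points, start_centroids=[(1, 1), (8, 8)])
print(assignments)
print(centroids)
```

The first three points end up in one cluster and the last three in the other, and each final centroid is the mean of its cluster's points.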

Applications of K-Means Clustering

Market Segmentation: Groups customers based on buying behavior.
Image Segmentation: Divides images into similar color regions.
Document Clustering: Groups similar documents for easy organization.
Anomaly Detection: Finds unusual data points (outliers).
Customer Segmentation: Helps in targeted marketing and personalization.

Advantages of K-Means

Simple and easy to implement.
Works well with large datasets.
Handles many features and data points efficiently.
Results are easy to understand.
Useful in multiple domains.

Limitations of K-Means

Depends on initial centroid selection.
Assumes spherical clusters (not always true).
Requires predefined number of clusters (K).
Sensitive to outliers.
May give suboptimal results, because it can stop at a grouping that is not the best possible one.
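
The dependence on initial centroids can be seen in a small sketch. The four points and both sets of starting centroids are made up, and the helper names k_means and total_squared_distance are assumptions for illustration. The same algorithm is run twice on the same data with different starting centroids.

```python
import math

def k_means(points, start_centroids, max_iters=100):
    """Plain K-Means: returns (final centroids, cluster index of each point)."""
    centroids = list(start_centroids)
    assignments = None
    for _ in range(max_iters):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

def total_squared_distance(points, centroids, assignments):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))

# Four corners of a wide rectangle
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_centroids, good_labels = k_means(points, [(0, 0.5), (4, 0.5)])
poor_centroids, poor_labels = k_means(points, [(2, 0), (2, 1)])

good_cost = total_squared_distance(points, good_centroids, good_labels)
poor_cost = total_squared_distance(points, poor_centroids, poor_labels)
print("Good start:", good_labels, good_cost)
print("Poor start:", poor_labels, poor_cost)
```

With the poor start, the algorithm stops with a top and bottom split whose total squared distance is much larger than that of the left and right split. In practice this is why K-Means is often run several times with different starting centroids.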

# K-Means Clustering Program

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate synthetic dataset
X, _ = make_blobs(n_samples=300,
                  centers=4,
                  cluster_std=0.60,
                  random_state=0)
# Apply K-Means clustering (n_init and random_state make the result repeatable)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)
# Predict cluster labels
y_kmeans = kmeans.predict(X)
# Plot data points
plt.scatter(X[:, 0], X[:, 1],
            c=y_kmeans,
            s=50,
            cmap='viridis')
# Plot centroids
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1],
            c='red',
            s=200,
            alpha=0.75)
# Add labels and title
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
# Show graph
plt.show()

This Program Performs the Following Steps:

Generates Data:
Creates synthetic data using make_blobs from sklearn.datasets.
Applies K-Means:
Performs clustering with 4 clusters (n_clusters = 4).
Visualizes Results:
- Plots data points with different colors for each cluster.
- Displays centroids as red circles.
