
Machine Learning Algorithms Notes – Class 11 AI (843) | Pointwise & Exam-Oriented

Boost your exam preparation with highly structured Machine Learning Algorithms Notes for Class 11 AI (843), carefully designed for CBSE students with simplified concepts, exam-oriented explanations, important definitions and quick revision material for last-minute preparation.

Machine Learning

  • Machine Learning (ML) is a part of Artificial Intelligence that enables computers to learn from data and make decisions without being explicitly programmed.
  • Instead of following fixed instructions, ML models learn patterns and relationships from data and can make predictions on new data.
  • ML works with different types of data such as images, text, sensor data, and historical records.
  • Common ML algorithms include decision trees, neural networks, and support vector machines.
  • ML is used in many real-life applications like recommendation systems (Netflix), speech recognition, medical diagnosis, chatbots, fraud detection, and self-driving cars.

Types of Machine Learning

  • Machine Learning is mainly divided into three types:
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning.

Supervised Learning

  • Supervised learning is a type of Machine Learning where the model learns from labeled data (input with correct output).
  • It helps the model predict results or make decisions based on past examples.
  • There are two main types of supervised learning:
    • Regression → works with continuous data
    • Classification → works with discrete data

Regression

Correlation: Foundation of Regression Analysis

  • Regression is used to predict continuous values such as price, height, or temperature.
  • It is based on the concept of correlation, which shows the relationship between two variables.
  • Correlation means how one variable changes with respect to another.

Types of Correlation

  • Positive Correlation → both variables move in the same direction (increase or decrease together).
  • Negative Correlation → variables move in opposite directions (one increases, the other decreases).
  • Zero Correlation → no relationship between variables.

Correlation Values

  • +1 → Perfect positive correlation
  • 0 → No correlation
  • -1 → Perfect negative correlation

PEARSON’S r (Correlation Coefficient)

  • Pearson’s r measures the strength and direction of a linear relationship between two continuous variables.
  • It is important in regression because a strong correlation indicates a meaningful relationship between variables.

Key Conditions to Use Pearson’s r

  • Data should be on interval or ratio scale.
  • Variables should be normally distributed (approximately).
  • Relationship between variables should be linear.
  • There should be no outliers in the data.

Pearson’s r is calculated using the formula:

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² × Σ(y − ȳ)² ]

where x̄ and ȳ are the means of x and y.

Range of Values

  • r = +1 → Perfect positive correlation
  • r = 0 → No correlation
  • r = -1 → Perfect negative correlation

Interpretation

  • r > 0 → Positive relationship (both variables increase together)
  • r < 0 → Negative relationship (one increases, the other decreases)
  • r = 0 → No linear relationship between variables
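
The calculation of Pearson's r can be sketched in a few lines of plain Python. This is a minimal illustration, not an official method from the syllabus: the function name `pearson_r` and the sample lists `hours` and `marks` are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]   # hypothetical sample values
marks = [2, 4, 5, 4, 5]   # hypothetical sample values
r = pearson_r(hours, marks)
print(round(r, 2))        # 0.77 -> fairly strong positive correlation
```

Since r is positive and fairly close to +1, the two lists tend to increase together.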

When Regression is NOT Suitable

  • No correlation: Regression is not useful when there is no correlation between variables, as they change independently.
  • Non-linear relationships: Simple regression mainly captures linear patterns, so it may not work well when the relationship is non-linear.
  • Outliers (extreme values): They can pull the regression line away from the main pattern and lead to incorrect predictions.
  • Violated assumptions: If basic assumptions (like linearity or no multicollinearity) are violated, results may become unreliable.
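
The non-linear case can be seen with a small made-up example. In the sketch below, y is exactly x squared, so y depends completely on x, yet the best straight line is flat and predicts the same value everywhere. The data is invented purely for illustration.

```python
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]          # y depends entirely on x, but not linearly

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)

slope = sxy / sxx                  # least-squares slope of the straight line
intercept = y_mean - slope * x_mean
predictions = [intercept + slope * x for x in xs]

print(slope, intercept)            # 0.0 2.0
print(predictions)                 # every prediction is 2.0
```

The line predicts 2.0 for every x even though the actual values range from 0 to 4, which is why a straight-line model is a poor choice here.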

REGRESSION

Regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables.

  • Regression is a technique used to find the relationship between variables and use it for prediction.
  • It helps in predicting the value of a dependent variable based on one or more independent variables.
  • It is mainly used with continuous data such as height, temperature, salary, etc.
  • It is useful for prediction, forecasting, and understanding relationships between variables.
  • When there are two variables x and y, and y depends on x, it is called simple regression.
  • y (Dependent Variable) → the value we want to predict or understand.
  • x (Independent Variable) → the value used to predict or explain changes in y.
  • In simple linear regression, the relationship between x and y is represented by the equation:

Simple Linear Regression Equation

y = a + bx + e

  • a → Intercept (value of y when x = 0)
  • b → Slope (change in y for one unit change in x)
  • e → Error term (difference between actual and predicted values)
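
A short sketch of how these terms fit together is given below. The values of `a` and `b` and the single observation are assumed for illustration only; they are not taken from any real dataset.

```python
a = 2.2   # intercept (assumed value for illustration)
b = 0.6   # slope (assumed value for illustration)

def predict(x):
    """Predicted value: a + b*x (the error term is not part of the prediction)."""
    return a + b * x

x_obs, y_obs = 4, 4        # one hypothetical observation
y_hat = predict(x_obs)     # predicted value of y
e = y_obs - y_hat          # error: actual value minus predicted value

print(round(y_hat, 2), round(e, 2))   # 4.6 -0.6
```

A negative error means the line predicted a value higher than the actual one.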

Finding the Line (Regression Line)

  • Regression analysis finds a best-fit line or curve to show the relationship between variables.
  • It explains how the dependent variable changes with the independent variable.
  • The aim is to make the line as close as possible to all data points.
  • The Least Squares Method is used to determine this best-fit line.
  • It minimizes the squared differences between actual and predicted values.
  • This method helps calculate the slope and intercept of the line.
  • These values are then used to predict outcomes.
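
The least squares calculation can be sketched in plain Python using the standard formulas slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. The function name `least_squares_fit` and the sample data are chosen only for this illustration.

```python
def least_squares_fit(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: summed product of deviations divided by summed squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: chosen so that the line passes through the point of means
    a = y_mean - b * x_mean
    return a, b

xs = [1, 2, 3, 4, 5]      # sample inputs
ys = [2, 4, 5, 4, 5]      # sample outputs
a, b = least_squares_fit(xs, ys)
print(round(a, 2), round(b, 2))      # 2.2 0.6

predicted = a + b * 6                # prediction for a new x value
print(round(predicted, 2))           # 5.8
```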

Properties of Regression Line

  • It minimizes the total squared error between actual (y) and predicted (ŷ) values.
  • The regression line always passes through the point (x̄, ȳ), where x̄ is the mean of the x values and ȳ is the mean of the y values.
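
Both properties can be checked numerically. The sketch below uses a small sample dataset for which the least-squares intercept and slope work out to 2.2 and 0.6; the helper name `sse` (sum of squared errors) is made up for this example.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6   # least-squares intercept and slope for this sample data

def sse(intercept, slope):
    """Total squared error between actual and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Property: the line passes through the point of means
passes_through_means = abs((a + b * x_mean) - y_mean) < 1e-9
print(passes_through_means)          # True

# Property: nudging the line in any direction increases the squared error
best = sse(a, b)
nudged = [sse(a + 0.1, b), sse(a - 0.1, b), sse(a, b + 0.05), sse(a, b - 0.05)]
print(all(best < s for s in nudged))  # True
```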

Linear Regression

  • Linear regression is a basic machine learning technique used to show a linear relationship between variables.
  • It involves a dependent variable and one or more independent (predictor) variables.
  • If there is one independent variable, it is called Simple Linear Regression.
  • If there are multiple independent variables, it is called Multiple Linear Regression.

Types of Linear Regression

  • Simple Linear Regression: Uses one independent variable to predict the dependent variable.
  • Multiple Linear Regression: Uses more than one independent variable for prediction.
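
The difference between the two types shows up in the prediction formula. In the sketch below, all coefficient values and the salary scenario are hypothetical and chosen only to keep the arithmetic easy; in practice they would be estimated from data.

```python
def predict_simple(x, a, b):
    """Simple linear regression: one independent variable."""
    return a + b * x

def predict_multiple(features, a, coefficients):
    """Multiple linear regression: one coefficient per independent variable."""
    return a + sum(b * x for b, x in zip(coefficients, features))

# Salary from years of experience only (hypothetical coefficients)
salary_simple = predict_simple(5, a=20000, b=3000)

# Salary from years of experience and number of degrees (hypothetical coefficients)
salary_multiple = predict_multiple((5, 2), a=15000, coefficients=(3000, 4000))

print(salary_simple)     # 35000
print(salary_multiple)   # 38000
```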

Applications of Linear Regression

  • Used in market analysis to understand relationships between factors like price and sales.
  • Helps in sales forecasting by analyzing past data and trends.
  • Used to predict salary based on experience, education, etc.
  • Applied in sports analysis to study player and team performance.
  • Used in medical research to analyze health-related factors.

Python Program for Advanced Learners

import numpy as np
import matplotlib.pyplot as plt
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
# Calculate mean and sample standard deviation
# (ddof=1 divides by n - 1, matching the covariance below)
x_mean = np.mean(x)
y_mean = np.mean(y)
x_std = np.std(x, ddof=1)
y_std = np.std(y, ddof=1)
# Calculate sample covariance and slope (slope = covariance / variance of x)
covariance = np.sum((x - x_mean) * (y - y_mean)) / (len(x) - 1)
slope = covariance / (x_std ** 2)
# Calculate y-intercept
intercept = y_mean - slope * x_mean
# Predicted values
y_pred = slope * x + intercept
# Plot data and regression line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
# Add labels and title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Simple Linear Regression')
# Show the plot
plt.show()
# Print slope and intercept
print("Slope:", round(slope, 2))
print("Intercept:", round(intercept, 2))

Program Explanation

  • Imports NumPy for calculations and Matplotlib for graph plotting.
  • Defines sample data for x (input) and y (output).
  • Calculates mean, standard deviation, covariance, and slope.
  • Finds the y-intercept using slope and mean values.
  • Uses the regression equation to predict y values.
  • Plots actual data points and the regression line.
  • Prints the slope and intercept values.

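Output

  • The program displays a scatter plot of the data points with the regression line drawn in red, and prints the slope and intercept.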

Classification (Concept)

  • Classification is a machine learning technique used to group data into predefined categories.
  • The main goal is to assign correct labels to data based on its features.
  • It works on labeled data, so it is a type of supervised learning.
  • The model learns from training data and then predicts labels for new (unseen) data.

Example

  • Sorting waste into categories like paper, plastic, metal, food waste.
  • Each type of waste is given a label, which is similar to classification in ML.

How Classification Works

  • Classes/Categories: Data is divided into different classes or categories (e.g., positive/negative).
  • Features/Attributes: Each data instance has features that help identify its class.
  • Training Data: Model learns from labeled data with correct class labels.
  • Classification Model: Algorithm builds a model to learn patterns from data.
  • Prediction/Inference: Model uses learned patterns to predict classes of new data.
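
The five parts above can be seen in a very small example. This sketch uses a nearest-centroid classifier, which is just one simple choice of algorithm picked for illustration; the data (hours studied, attendance %) and the function names `train` and `predict` are hypothetical.

```python
import math
from collections import defaultdict

def train(samples, labels):
    """Training: learn one average feature vector (centroid) per class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Prediction: choose the class whose centroid is closest to the new item."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Training data: features are (hours studied, attendance %), classes are pass/fail
samples = [(8, 90), (7, 85), (9, 95), (2, 50), (3, 60), (1, 40)]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

model = train(samples, labels)          # classification model
print(predict(model, (6, 80)))          # pass
print(predict(model, (2, 45)))          # fail
```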

Types of Classification

  • Binary Classification → Only two classes (e.g., yes/no, spam/not spam)
  • Multi-Class Classification → More than two classes (e.g., cat, dog, bird)
  • Multi-Label Classification → One data item can have multiple labels (e.g., movie genres: action + comedy)
  • Imbalanced Classification → Classes are unevenly distributed (e.g., fraud detection where fraud cases are very few)

Summary of classification types with examples:

Binary Classification | Multi-Class Classification | Multi-Label Classification | Imbalanced Classification
Two class labels | More than two class labels | One item can have multiple labels | Unequal class distribution
Email spam (spam/not spam) | Face classification | Photo tagging (multiple objects) | Fraud detection
Medical test (yes/no) | Plant species classification | Image with multiple labels | Outlier detection
Exam result (pass/fail) | OCR / Image classification | - | Medical diagnostic tests

K-Nearest Neighbour (KNN) Algorithm

  • KNN is a supervised learning algorithm used for both classification and regression.
  • It is a non-parametric method, meaning it does not assume any particular form (such as a straight line) for the underlying data.
  • It works on the principle of proximity (closeness).
  • It classifies a new data point based on similarity with nearest data points.

Why KNN is Needed

  • Useful when data structure is not clearly defined.
  • Works well when decision boundaries are complex.
  • Provides a simple and effective way to classify new data.

Steps in KNN Algorithm

  • Select the value of K (number of neighbors).
  • Calculate the Euclidean distance between the new point and all data points.
  • Identify the K nearest neighbors.
  • Count how many neighbors belong to each category.
  • Assign the new data point to the category with maximum neighbors.
  • Model is ready for prediction.
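
The steps above can be sketched in plain Python. This is a minimal illustration rather than a full implementation: the function name `knn_predict`, the two categories "A" and "B", and the sample points are all made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Euclidean distance from the new point to every training point
    distances = sorted(
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k nearest neighbours
    nearest_labels = [label for _, label in distances[:k]]
    # Count neighbours per category and pick the most common one
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2), k=3))   # A
print(knn_predict(train_points, train_labels, (6, 5), k=3))   # B
```

An odd value of K is often preferred with two categories so that the vote cannot end in a tie.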

Applications of KNN

  • Image recognition and classification
  • Recommendation systems
  • Healthcare diagnostics
  • Text mining and sentiment analysis
  • Anomaly detection
  • Fraud detection
  • Outlier detection
  • Medical diagnostic tests
  • Photo classification (identifying objects like bicycle, apple, person, etc.)

Advantages of KNN

  • Easy to understand and implement
  • No separate training phase required
  • Works for both classification and regression
  • Can handle noisy data reasonably well when K is not too small

Limitations of KNN

  • Slow for large datasets
  • Depends on choice of K and distance metric
  • Needs proper data preprocessing and scaling
  • Performs poorly on high-dimensional data, because distances become less meaningful as the number of features grows
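
Why scaling matters for KNN can be shown with a small made-up example. In the sketch below, each person is described by (age in years, salary in rupees). Without scaling, the salary column dominates the distance because its numbers are much larger. The data, the names A to D, and the helper functions are all hypothetical.

```python
import math

train = {"A": (60, 50500), "B": (26, 52000), "C": (45, 30000), "D": (30, 80000)}
query = (25, 50000)

def nearest(points, q):
    """Name of the point with the smallest Euclidean distance to q."""
    return min(points, key=lambda name: math.dist(points[name], q))

# Per-feature mean and standard deviation, computed from the training data
columns = list(zip(*train.values()))
means = [sum(col) / len(col) for col in columns]
stds = [
    math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    for col, m in zip(columns, means)
]

def standardize(point):
    """Rescale each feature to (value - mean) / standard deviation."""
    return tuple((v - m) / s for v, m, s in zip(point, means, stds))

scaled_train = {name: standardize(p) for name, p in train.items()}

nearest_raw = nearest(train, query)
nearest_scaled = nearest(scaled_train, standardize(query))
print(nearest_raw)      # A (age 60, picked only because the salaries are close)
print(nearest_scaled)   # B (similar in both age and salary)
```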

Python Program for Advanced Users

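# Importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Importing dataset
data_set = pd.read_csv('user_data.csv')
# Extracting independent variables (columns 2 and 3) and dependent variable (column 4)
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# Splitting the dataset into training and test sets (75% / 25%)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)
# Feature scaling: fit on the training set only, then apply to the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
# The original listing ends at feature scaling. The remaining steps are an
# assumed completion: K = 5 is an illustrative choice, not a value from the notes.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)
# Predicting the test set results and checking accuracy
y_pred = classifier.predict(x_test)
print("Accuracy:", accuracy_score(y_test, y_pred))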

Clustering (Unsupervised Learning)

  • Clustering is a technique used to group similar data points together.
  • It works on unlabeled data, so it is a type of unsupervised learning.
  • Data is divided into clusters where:
    • Points in the same cluster are more similar
    • Points in different clusters are less similar
  • It finds patterns based on features like shape, size, color, behavior, etc.
  • No predefined labels are given; the model discovers patterns on its own.

Uses of Clustering

  • Market segmentation
  • Customer grouping
  • Image recognition
  • Data analysis

How Clustering Works:

To cluster data effectively, follow these key steps:
1) Prepare the Data: Select the right features for clustering and make sure the data is ready by scaling or transforming it as needed.
2) Create Similarity Metrics: Define how similar data points are by comparing their features. This similarity measure is crucial for clustering.
3) Run the Clustering Algorithm: Apply a clustering algorithm to group the data. Choose one that works well with your dataset size and characteristics.
4) Interpret the Results: Analyze the clusters to understand what they represent. Since clustering is unsupervised, interpretation is essential for assessing the quality of the clusters.

Types of Clustering Methods

Some of the common clustering methods used in Machine learning are:

1) Partitioning Clustering
2) Density-Based Clustering
3) Distribution Model-Based Clustering
4) Hierarchical Clustering

Partitioning Clustering

  • Divides data into non-hierarchical groups (k clusters).
  • Also called a centroid-based method.
  • Example: K-Means algorithm.
  • Each cluster has a center (centroid).
  • Data points are grouped so that the distance between each point and its cluster centroid is as small as possible.

Density-Based Clustering

  • Forms clusters based on high-density regions.
  • Can create clusters of any shape.
  • Separates clusters by low-density (sparse) areas.
  • May face difficulty with varying densities or high-dimensional data.

Distribution Model-Based Clustering

  • Assumes data follows a probability distribution (usually Gaussian).
  • Calculates the likelihood of data points belonging to clusters.
  • Example: Gaussian Mixture Model (GMM).

Hierarchical Clustering

  • Does not require pre-defined number of clusters.
  • Creates a tree-like structure (dendrogram).
  • Clusters are formed by repeatedly merging smaller clusters (agglomerative) or splitting larger ones (divisive).
  • Number of clusters is decided by cutting the tree at a level.
  • Example: Agglomerative Hierarchical algorithm.
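
The merging idea behind agglomerative clustering can be sketched as follows. This is a simplified illustration that assumes single linkage (distance between the closest pair of points) as the rule for comparing clusters; stopping at `num_clusters` plays the role of cutting the tree at a level. The function names and sample points are made up for the example.

```python
import math

def single_linkage(cluster_1, cluster_2):
    """Distance between two clusters = distance between their closest points."""
    return min(math.dist(p, q) for p in cluster_1 for q in cluster_2)

def agglomerative(points, num_clusters):
    """Start with one cluster per point and merge the closest pair repeatedly."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(1, 1), (1.5, 1), (5, 5), (5, 5.5), (9, 1)]
result = agglomerative(points, 3)
print(result)
# [[(1, 1), (1.5, 1)], [(5, 5), (5, 5.5)], [(9, 1)]]
```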

K-Means Clustering

  • K-Means is an unsupervised learning algorithm used for clustering.
  • It divides data into K number of clusters.
  • Each cluster has a centroid (center point).
  • It groups data such that points in the same cluster are more similar.
  • The value of K must be decided beforehand.

Steps in K-Means Clustering

  • Select the number of clusters K.
  • Choose K random centroids.
  • Assign each data point to the nearest centroid.
  • Calculate new centroids based on clusters.
  • Reassign data points to the nearest new centroid.
  • Repeat until no changes occur.
  • Model is ready.
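
These steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `k_means`, the `seed` and `max_iterations` parameters, and the sample points are assumptions made for the example.

```python
import math
import random

def k_means(points, k, seed=0, max_iterations=100):
    """Return (centroids, assignments) after running basic K-Means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # choose K random points as centroids
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to the index of its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:  # no changes -> stop
            break
        assignments = new_assignments
        # Recalculate each centroid as the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                     # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)      # first three points share one label, last three share the other
print(centroids)
```

Which cluster gets label 0 and which gets label 1 depends on the random starting centroids, so only the grouping itself is meaningful.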

Applications of K-Means Clustering

  • Market Segmentation: Groups customers based on buying behavior.
  • Image Segmentation: Divides images into similar color regions.
  • Document Clustering: Groups similar documents for easy organization.
  • Anomaly Detection: Finds unusual data points (outliers).
  • Customer Segmentation: Helps in targeted marketing and personalization.

Advantages of K-Means

  • Simple and easy to implement.
  • Works well with large datasets.
  • Handles many features and data points efficiently.
  • Results are easy to understand.
  • Useful in multiple domains.

Limitations of K-Means

  • Depends on initial centroid selection.
  • Assumes spherical clusters (not always true).
  • Requires predefined number of clusters (K).
  • Sensitive to outliers.
  • May give suboptimal results.

# K-Means Clustering Program

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate synthetic dataset
X, _ = make_blobs(n_samples=300,
                  centers=4,
                  cluster_std=0.60,
                  random_state=0)
# Apply K-Means clustering (random_state fixed so results are repeatable)
kmeans = KMeans(n_clusters=4, random_state=0)
kmeans.fit(X)
# Predict cluster labels
y_kmeans = kmeans.predict(X)
# Plot data points
plt.scatter(X[:, 0], X[:, 1],
            c=y_kmeans,
            s=50,
            cmap='viridis')
# Plot centroids
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1],
            c='red',
            s=200,
            alpha=0.75)
# Add labels and title
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
# Show graph
plt.show()

This Program Performs the Following Steps:

  1. Generates Data:
    Creates synthetic data using make_blobs from sklearn.datasets.
  2. Applies K-Means:
    Performs clustering with 4 clusters (n_clusters = 4).
  3. Visualizes Results:
    • Plots data points with different colors for each cluster.
    • Displays centroids as red circles.
