Computer Vision Notes – Class 10 AI (417) | CBSE Exam Preparation
Level up your Class 10 AI (417) exam preparation with these complete Computer Vision revision notes! Covering key topics like Computer Vision tasks, Image Processing, and CNN (Convolutional Neural Networks), everything is explained in simple, student-friendly language. With clear images and diagrams, these notes make learning faster, easier, and more effective—helping you revise quickly and aim for full marks with confidence.
Computer Vision
Computer Vision is the process of extracting meaningful information from images, videos, and other visual data. It enables machines to process, analyze, and understand visuals in a way similar to how humans interpret what they see.
Computer Vision and Artificial Intelligence
- Computer vision is a field of artificial intelligence (AI).
- AI enables computers to think, and computer vision enables AI to see, observe and make sense of visual data (like images & videos).
Computer Vision vs Image Processing
| Computer Vision | Image Processing |
| --- | --- |
| Extracts meaningful information from images or videos to understand and predict visual input. | Processes raw images to enhance or prepare them for further tasks. |
| Focuses on interpreting and understanding visual data. | Focuses on improving image quality or modifying images. |
| Superset of Image Processing. | Subset of Computer Vision. |
| Examples: Object detection, Handwriting recognition | Examples: Rescaling images, Correcting brightness, Changing tones |
Applications of Computer Vision
- Facial Recognition: Used in smart homes and smart cities for security purposes. Helps in identifying individuals, managing visitor logs, and enabling attendance systems in schools.
- Face Filters: Used in apps like Instagram and Snapchat. Detects facial features and applies real-time filters based on facial movements and expressions.
- Search by Image: Allows users to search using images instead of text. The system compares image features with a database to find similar results.
- Retail Industry: Helps track customer movement, analyze shopping behaviour, and improve store layout. Also used in inventory management to monitor stock levels and optimize shelf space.
- Self-Driving Cars: Core technology behind autonomous vehicles. Helps in object detection, navigation, and real-time environment monitoring for safe driving.
- Medical Imaging: Assists doctors in analyzing medical scans. Converts 2D images into 3D models for better diagnosis and understanding of patient conditions.
- Google Translate App: Uses camera input to read and translate text instantly. Combines Optical Character Recognition (OCR) and Augmented Reality for real-time translation.
Computer Vision Tasks
The tasks used in a computer vision application are:
Image Classification
- Assigns a single label to the entire image from a fixed set of categories (e.g., cat, dog, car)
- Widely used in basic image recognition tasks
Classification + Localization
- Identifies what object is present in the image
- Also determines the location of the object
- Works for only a single object in the image
Object Detection
- Process of identifying real-world objects in images or videos like faces, bicycles, buildings, etc.
- Used in applications like image retrieval systems, automated vehicle parking and surveillance systems
Instance Segmentation
- Detects different objects present in an image and assigns them a category (label)
- Labels every pixel of the object for precise identification
- Divides the image into regions or segments
Basics of Image
Components of Image
- Pixels
- Resolution
- Pixel Value
- Grayscale Image
- RGB
Pixels
- Pixel means Picture Element.
- Smallest unit of a digital image.
- Images are made up of thousands/millions of pixels.
- Arranged in a 2D grid (rows × columns).
- More pixels → Higher image quality.
Resolution
- Resolution = Total number of pixels in an image.
- Expressed as Width × Height (e.g., 1280 × 1024).
- Can also be expressed in Megapixels (MP): 1 Megapixel = 1 million pixels.
- Example: 1280 × 1024 = 1,310,720 pixels ≈ 1.31 MP.
- Higher resolution → More detailed image.
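The resolution arithmetic above can be checked with a few lines of Python (the 1280 × 1024 figure is the example from the notes):

```python
# Resolution arithmetic: total pixels and megapixels for a 1280 x 1024 image.
width, height = 1280, 1024

total_pixels = width * height          # 1,310,720 pixels
megapixels = total_pixels / 1_000_000  # 1 megapixel = 1 million pixels

print(total_pixels)          # 1310720
print(round(megapixels, 2))  # 1.31
```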
Pixel Value
- Each pixel has a value that represents:
- Brightness
- Color
- In most images, pixel value ranges from 0 to 255.
- 0 = Black, 255 = White.
- Why 0–255? 1 byte = 8 bits, each bit has 2 values (0 or 1), so 2⁸ = 256 values (0–255).
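The byte arithmetic can be verified directly:

```python
# Why a pixel value fits in 0-255: one byte is 8 bits, and each bit has
# 2 possible states, so a byte can represent 2**8 distinct values.
bits_per_byte = 8
values = 2 ** bits_per_byte

print(values)      # 256
print(values - 1)  # 255, the maximum pixel value (0 is the minimum)
```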
Grayscale Image
- Contain only shades of gray (no color).
- Pixel value range: 0 (black) to 255 (white).
- Each pixel uses 1 byte (8 bits).
- Image size = Height × Width.
- Stored as a 2D array of pixel values.
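A grayscale image as a 2D array can be sketched with NumPy (the pixel values below are made up for illustration):

```python
import numpy as np

# A tiny 3 x 4 grayscale image: one byte (0-255) per pixel, stored as a 2D array.
img = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
], dtype=np.uint8)

print(img.shape)             # (3, 4) -> height x width
print(img.size)              # 12 pixels, so 12 bytes of storage at 1 byte each
print(img.min(), img.max())  # 0 255
```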
RGB
- Color images made from:
- Red
- Green
- Blue
- Different colors are created by combining different intensities of R, G, and B.
- Each pixel has three values (R, G, B).
How Do Computers Store RGB Images?
- RGB images are stored in three separate channels: Red (R), Green (G), and Blue (B).
- Each channel contains pixel values ranging from 0 to 255.
- Each channel is stored as a separate layer (plane) in memory.
- Every pixel in an RGB image has three values — one each for Red, Green, and Blue.
- These three values together determine the final color of that pixel.
- When viewed separately, each channel appears as a grayscale image.
0 = Black (no color intensity)
255 = White (full color intensity)
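Splitting an RGB image into its three channels can be sketched with NumPy (the 2 × 2 image below is made up for illustration):

```python
import numpy as np

# A 2 x 2 RGB image stored as height x width x 3: one R, G, B value per pixel.
img = np.array([
    [[255, 0, 0],   [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255],   [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

# Each channel, viewed on its own, is a 2D grayscale image.
red, green, blue = img[:, :, 0], img[:, :, 1], img[:, :, 2]

print(red)
# [[255   0]
#  [  0 255]]
```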
No Code AI Tools
The following are some popular no-code AI tools:
- Lobe
- Teachable Machine
- Orange Data Mining
Lobe
- An AutoML (Automated Machine Learning) no-code AI tool.
- Designed mainly for image classification tasks.
- Allows users to upload labeled images for training.
- Automatically selects and trains the most optimal model.
- Makes image model creation simple and code-free.
Teachable Machine
- AI, Machine Learning, and Deep Learning tool developed by Google (2017).
- Built on top of TensorFlow.js.
- Web-based platform — works directly in a browser.
- Allows users to train models using images, audio, and body poses.
- Accepts input through webcam, microphone, or uploaded files.
- Simple and beginner-friendly — no coding required.
Orange Data Mining
- A no-code, open-source machine learning tool.
- Can be used for simple image classification tasks.
- Provides a drag-and-drop interface (no coding required).
- Helps in visualizing data and model results easily.
- Suitable for beginners and school-level AI projects.
Image Features
- Features of an image refer to the details that tell us what is in the image.
- They may be specific structures in the image such as points, edges, or objects.
Image Features – Key Points
- Image processing extracts features like blobs, edges, and corners.
- These features help in analyzing images for different applications.
- Corners are considered the best features because:
- They are found at specific locations.
- They are unique and easy to detect.
- Their appearance changes when moved in any direction.
- Edges are less reliable because:
- They look similar along the entire line.
- It is harder to locate their exact position.
- Flat areas (blobs) are difficult to track because:
- They look the same everywhere.
Convolution
- Convolution is a basic mathematical operation used in image processing.
- It is commonly used to apply filters or effects like blurring, sharpening, and embossing.
- It combines two arrays (matrices) of numbers to create a new array.
- The two arrays:
- May be of different sizes
- Must have the same dimensionality (e.g., both 2D)
- It is simply an element-wise multiplication of the image array with another array called the kernel, followed by a sum.
- The result is a new array (filtered image) of the same dimensionality as the input image.
Kernel
- A kernel is a small matrix that slides across the image and is multiplied with the pixel values of the input image to produce effects like blurring or sharpening.
- Each kernel has different values depending on the effect we want to apply.
- Convolution is used in image processing to extract features from images.
- These extracted features are later used in applications like Convolutional Neural Networks (CNNs).
- During convolution, the center of the kernel overlaps with the center of the image region to generate the output.
- The output image usually becomes smaller because the edges are not fully covered.
- To maintain the same size, padding is applied by adding extra pixels (usually with value 0) around the image.
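The convolution described above can be sketched in NumPy. This is a minimal "valid" convolution (element-wise multiply and sum, without kernel flipping, as the notes describe it); the 5 × 5 image and the averaging kernel are made up for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise with the covered region and sum the result."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1  # output shrinks because edges
    ow = image.shape[1] - kw + 1  # are not fully covered
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5 x 5 image
kernel = np.ones((3, 3)) / 9.0                    # 3 x 3 averaging (blur) kernel

print(convolve2d(image, kernel).shape)  # (3, 3) -> smaller than the input

# Zero padding restores the original size: pad 1 pixel for a 3 x 3 kernel.
padded = np.pad(image, 1, mode="constant", constant_values=0)
print(convolve2d(padded, kernel).shape)  # (5, 5) -> same as the input
```

Note how the unpadded output loses one pixel on every side, exactly as described above.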
Convolution Operator Application
Convolution Neural Network (CNN)
A Convolutional Neural Network (CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other.
The process of deploying a CNN is as follows:
Layers of a Convolutional Neural Network (CNN)
- Convolutional Layer
- Rectified Linear Unit (ReLU)
- Pooling Layer
- Fully Connected Layer
Convolutional Layer
- Extracts features such as edges from the input image.
- There can be multiple convolutional layers.
- The first convolutional layer captures low-level features such as edges and colors; later layers capture higher-level features.
- Several kernels are used to produce several features
- The output of this layer is called Feature Map (also called Activation map)
- We reduce the image size so that it can be processed more efficiently.
Rectified Linear Unit Function
- This layer removes all the negative numbers from the feature map and lets the positive numbers stay as they are.
- Passing the feature map through the ReLU layer introduces non-linearity.
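ReLU is a one-line operation in NumPy (the feature-map values below are made up for illustration):

```python
import numpy as np

# ReLU on a feature map: negatives become 0, positives pass through unchanged.
feature_map = np.array([
    [ 2.0, -1.5,  0.0],
    [-3.0,  4.0, -0.5],
])

relu = np.maximum(feature_map, 0)
print(relu)
# [[2. 0. 0.]
#  [0. 4. 0.]]
```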
Pooling Layer
- Pooling Layer: Reduces the spatial size of feature maps while preserving important features.
- It helps in lowering computation and preventing overfitting.
- Types of Pooling:
- Max Pooling: Selects the maximum value from the region covered by the kernel.
- Average Pooling: Calculates the average value of all pixels in the region covered by the kernel.
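Both pooling types can be sketched in NumPy. This is a minimal non-overlapping pooling (window size equals the stride); the 4 × 4 feature map is made up for illustration:

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    # Reshape into (h, size, w, size) blocks, then reduce each block.
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))     # max pooling
    return blocks.mean(axis=(1, 3))        # average pooling

fmap = np.array([
    [1.0, 3.0, 2.0, 4.0],
    [5.0, 6.0, 1.0, 2.0],
    [7.0, 2.0, 8.0, 3.0],
    [4.0, 9.0, 0.0, 1.0],
])

print(pool2d(fmap, mode="max"))  # picks the largest value in each 2 x 2 block
# [[6. 4.]
#  [9. 8.]]
print(pool2d(fmap, mode="avg"))  # averages each 2 x 2 block
```

Either way, the 4 × 4 feature map shrinks to 2 × 2, which is the computation saving the notes describe.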
Fully Connected Layer
- The fully connected layer takes the results of the convolution/pooling process and uses them to classify the image into a label.
- The output of convolution/pooling is flattened into a single vector of values, each representing a probability that a certain feature belongs to a label.
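The flatten-and-classify step can be sketched in NumPy. The weight matrix below is hypothetical (a trained CNN would learn these weights), and softmax is used to turn scores into probabilities:

```python
import numpy as np

# Flatten the pooled feature maps into one vector, then map it to class
# scores with a (hypothetical) weight matrix: a minimal fully connected step.
pooled = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy 2 x 2 pooled output
flat = pooled.flatten()                       # shape (4,)

weights = np.ones((3, 4)) * 0.1               # 3 classes; illustrative weights
scores = weights @ flat                       # one score per label

# Softmax turns the scores into probabilities that sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()

print(flat.shape)            # (4,)
print(round(probs.sum(), 6))  # 1.0
```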