Computer Vision Notes – Class 10 AI (417) | CBSE Exam Preparation
Level up your Class 10 AI (417) exam preparation with these complete Computer Vision revision notes! Covering key topics like Computer Vision tasks, Image Processing, and CNN (Convolutional Neural Networks), everything is explained in simple, student-friendly language. With clear images and diagrams, these notes make learning faster, easier, and more effective—helping you revise quickly and aim for full marks with confidence.
Computer Vision
Computer Vision is the process of extracting meaningful information from images, videos, and other visual data. It enables machines to process, analyze, and understand visuals in a way similar to how humans interpret what they see.
Computer Vision and Artificial Intelligence
- Computer vision is a field of artificial intelligence (AI).
- AI enables computers to think, and computer vision enables AI to see, observe and make sense of visual data (like images & videos).
Computer Vision vs Image Processing
| Computer Vision | Image Processing |
| --- | --- |
| Extracts meaningful information from images or videos to understand and predict visual input. | Processes raw images to enhance or prepare them for further tasks. |
| Focuses on interpreting and understanding visual data. | Focuses on improving image quality or modifying images. |
| Superset of Image Processing. | Subset of Computer Vision. |
| Examples: Object detection, Handwriting recognition | Examples: Rescaling images, Correcting brightness, Changing tones |
Applications of Computer Vision
- Facial Recognition: Used in smart homes and smart cities for security purposes. Helps in identifying individuals, managing visitor logs, and enabling attendance systems in schools.
- Face Filters: Used in apps like Instagram and Snapchat. Detects facial features and applies real-time filters based on facial movements and expressions.
- Search by Image: Allows users to search using images instead of text. The system compares image features with a database to find similar results.
- Retail Industry: Helps track customer movement, analyze shopping behaviour, and improve store layout. Also used in inventory management to monitor stock levels and optimize shelf space.
- Self-Driving Cars: Core technology behind autonomous vehicles. Helps in object detection, navigation, and real-time environment monitoring for safe driving.
- Medical Imaging: Assists doctors in analyzing medical scans. Converts 2D images into 3D models for better diagnosis and understanding of patient conditions.
- Google Translate App: Uses camera input to read and translate text instantly. Combines Optical Character Recognition (OCR) and Augmented Reality for real-time translation.
Computer Vision Tasks
The tasks used in a computer vision application are:
Image Classification
- Assigns a single label to the entire image from a fixed set of categories (e.g., cat, dog, car)
- Widely used in basic image recognition tasks
Classification + Localization
- Identifies what object is present in the image
- Also determines the location of the object
- Works for only a single object in the image
Object Detection
- Process of identifying real-world objects in images or videos like faces, bicycles, buildings, etc.
- Used in applications like image retrieval systems, automated vehicle parking and surveillance systems
Instance Segmentation
- Detects different objects present in an image and assigns them a category (label)
- Labels every pixel of the object for precise identification
- Divides the image into regions or segments
Basics of Image
Components of Image
- Pixels
- Resolution
- Pixel Value
- Grayscale Image
- RGB
Pixels
- Pixel means Picture Element.
- Smallest unit of a digital image.
- Images are made up of thousands/millions of pixels.
- Arranged in a 2D grid (rows × columns).
- More pixels → Higher image quality.
Resolution
- Resolution = Total number of pixels in an image.
- Expressed as Width × Height (e.g., 1280 × 1024).
- Can also be expressed in Megapixels (MP): 1 Megapixel = 1 million pixels.
- Example: 1280 × 1024 = 1,310,720 pixels ≈ 1.31 MP.
- Higher resolution → More detailed image.
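The resolution arithmetic above can be checked with a few lines of Python (the 1280 × 1024 figure is the example from the notes):

```python
# Resolution arithmetic: total pixels and megapixels for a 1280 x 1024 image.
width, height = 1280, 1024

total_pixels = width * height          # 1,310,720 pixels
megapixels = total_pixels / 1_000_000  # 1 megapixel = 1 million pixels

print(total_pixels)          # 1310720
print(round(megapixels, 2))  # 1.31
```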
Pixel Value
- Each pixel has a value that represents:
- Brightness
- Color
- In most images, pixel value ranges from 0 to 255.
- 0 = Black, 255 = White.
- Why 0–255? 1 byte = 8 bits, each bit has 2 values (0 or 1), so 2⁸ = 256 values (0–255).
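The byte arithmetic can be verified directly:

```python
# Why a pixel value fits in 0-255: one byte is 8 bits, and each bit has
# 2 possible states, so a byte can represent 2**8 distinct values.
bits_per_byte = 8
values = 2 ** bits_per_byte

print(values)      # 256
print(values - 1)  # 255, the maximum pixel value (0 is the minimum)
```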
Grayscale Image
- Contain only shades of gray (no color).
- Pixel value range: 0 (black) to 255 (white).
- Each pixel uses 1 byte (8 bits).
- Image size = Height × Width.
- Stored as a 2D array of pixel values.
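A grayscale image as a 2D array can be sketched with NumPy (the pixel values below are made up for illustration):

```python
import numpy as np

# A tiny 3 x 4 grayscale image: one byte (0-255) per pixel, stored as a 2D array.
img = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
], dtype=np.uint8)

print(img.shape)             # (3, 4) -> height x width
print(img.size)              # 12 pixels, so 12 bytes of storage at 1 byte each
print(img.min(), img.max())  # 0 255
```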
RGB
- Color images made from:
- Red
- Green
- Blue
- Different colors are created by combining different intensities of R, G, and B.
- Each pixel has three values (R, G, B).
How Do Computers Store RGB Images?
- RGB images are stored in three separate channels: Red (R), Green (G), and Blue (B).
- Each channel contains pixel values ranging from 0 to 255.
- Each channel is stored as a separate layer (plane) in memory.
- Every pixel in an RGB image has three values — one each for Red, Green, and Blue.
- These three values together determine the final color of that pixel.
- When viewed separately, each channel appears as a grayscale image.
0 = Black (no color intensity)
255 = White (full color intensity)
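Splitting an RGB image into its three channels can be sketched with NumPy (the 2 × 2 image below is made up for illustration):

```python
import numpy as np

# A 2 x 2 RGB image stored as height x width x 3: one R, G, B value per pixel.
img = np.array([
    [[255, 0, 0],   [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255],   [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

# Each channel, viewed on its own, is a 2D grayscale image.
red, green, blue = img[:, :, 0], img[:, :, 1], img[:, :, 2]

print(red)
# [[255   0]
#  [  0 255]]
```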
No Code AI Tools
The following are some popular no-code AI tools:
- Lobe
- Teachable Machine
- Orange Data Mining
Lobe
- An AutoML (Automated Machine Learning) no-code AI tool.
- Designed mainly for image classification tasks.
- Allows users to upload labeled images for training.
- Automatically selects and trains the most optimal model.
- Makes image model creation simple and code-free.
Teachable Machine
- AI, Machine Learning, and Deep Learning tool developed by Google (2017).
- Built on top of TensorFlow.js.
- Web-based platform — works directly in a browser.
- Allows users to train models using images, audio, and body poses.
- Accepts input through webcam, microphone, or uploaded files.
- Simple and beginner-friendly — no coding required.
Orange Data Mining
- A no-code, open-source machine learning tool.
- Can be used for simple image classification tasks.
- Provides a drag-and-drop interface (no coding required).
- Helps in visualizing data and model results easily.
- Suitable for beginners and school-level AI projects.
Image Features
- Features of an image refer to the details that tell us what is in the image.
- They may be specific structures in the image such as points, edges, or objects.
Image Features – Key Points
- Image processing extracts features like blobs, edges, and corners.
- These features help in analyzing images for different applications.
- Corners are considered the best features because:
- They are found at specific locations.
- They are unique and easy to detect.
- Their appearance changes when moved in any direction.
- Edges are less reliable because:
- They look similar along the entire line.
- It is harder to locate their exact position.
- Flat areas (blobs) are difficult to track because:
- They look the same everywhere.
Convolution
- Convolution is a basic mathematical operation used in image processing.
- It is commonly used to apply filters or effects like blurring, sharpening, and embossing.
- It combines two arrays (matrices) of numbers to create a new array.
- The two arrays:
- May be of different sizes
- Must have the same dimensionality (e.g., both 2D)
- It is simply an element-wise multiplication of the image array with another array called the kernel, followed by a sum.
- The result is a new array (filtered image) of the same dimensionality as the input image.
Kernel
- A kernel is a small matrix that slides across the image and is multiplied with the pixel values of the input image to produce effects like blurring or sharpening.
- Each kernel has different values depending on the effect we want to apply.
- Convolution is used in image processing to extract features from images.
- These extracted features are later used in applications like Convolutional Neural Networks (CNNs).
- During convolution, the center of the kernel overlaps with the center of the image region to generate the output.
- The output image usually becomes smaller because the edges are not fully covered.
- To maintain the same size, padding is applied by adding extra pixels (usually with value 0) around the image.
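The convolution described above can be sketched in NumPy. This is a minimal "valid" convolution (element-wise multiply and sum, without kernel flipping, as the notes describe it); the 5 × 5 image and the averaging kernel are made up for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position, multiply
    element-wise with the covered region and sum the result."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1  # output shrinks because edges
    ow = image.shape[1] - kw + 1  # are not fully covered
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5 x 5 image
kernel = np.ones((3, 3)) / 9.0                    # 3 x 3 averaging (blur) kernel

print(convolve2d(image, kernel).shape)  # (3, 3) -> smaller than the input

# Zero padding restores the original size: pad 1 pixel for a 3 x 3 kernel.
padded = np.pad(image, 1, mode="constant", constant_values=0)
print(convolve2d(padded, kernel).shape)  # (5, 5) -> same as the input
```

Note how the unpadded output loses one pixel on every side, exactly as described above.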
Convolution Operator Application
Convolution Neural Network (CNN)
A Convolutional Neural Network (CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other.
The process of deploying a CNN is as follows:
Layers of a Convolutional Neural Network (CNN)
- Convolutional Layer
- Rectified Linear Unit (ReLU)
- Pooling Layer
- Fully Connected Layer
Convolutional Layer
- Extracts features such as edges from the input image.
- There can be multiple convolutional layers.
- The first convolutional layer captures low-level features such as edges and colors; later layers capture higher-level features.
- Several kernels are used to produce several features
- The output of this layer is called Feature Map (also called Activation map)
- We reduce the image size so that it can be processed more efficiently.
Rectified Linear Unit Function
- This layer removes all the negative numbers from the feature map and lets the positive numbers stay as they are.
- Passing the feature map through the ReLU layer introduces non-linearity.
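ReLU is a one-line operation in NumPy (the feature-map values below are made up for illustration):

```python
import numpy as np

# ReLU on a feature map: negatives become 0, positives pass through unchanged.
feature_map = np.array([
    [ 2.0, -1.5,  0.0],
    [-3.0,  4.0, -0.5],
])

relu = np.maximum(feature_map, 0)
print(relu)
# [[2. 0. 0.]
#  [0. 4. 0.]]
```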
Pooling Layer
- Pooling Layer: Reduces the spatial size of feature maps while preserving important features.
- It helps in lowering computation and preventing overfitting.
- Types of Pooling:
- Max Pooling: Selects the maximum value from the region covered by the kernel.
- Average Pooling: Calculates the average value of all pixels in the region covered by the kernel.
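Both pooling types can be sketched in NumPy. This is a minimal non-overlapping pooling (window size equals the stride); the 4 × 4 feature map is made up for illustration:

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    # Reshape into (h, size, w, size) blocks, then reduce each block.
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))     # max pooling
    return blocks.mean(axis=(1, 3))        # average pooling

fmap = np.array([
    [1.0, 3.0, 2.0, 4.0],
    [5.0, 6.0, 1.0, 2.0],
    [7.0, 2.0, 8.0, 3.0],
    [4.0, 9.0, 0.0, 1.0],
])

print(pool2d(fmap, mode="max"))  # picks the largest value in each 2 x 2 block
# [[6. 4.]
#  [9. 8.]]
print(pool2d(fmap, mode="avg"))  # averages each 2 x 2 block
```

Either way, the 4 × 4 feature map shrinks to 2 × 2, which is the computation saving the notes describe.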
Fully Connected Layer
- The fully connected layer takes the results of the convolution/pooling process and uses them to classify the image into a label.
- The output of convolution/pooling is flattened into a single vector of values, each representing a probability that a certain feature belongs to a label.
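The flatten-and-classify step can be sketched in NumPy. The weight matrix below is hypothetical (a trained CNN would learn these weights), and softmax is used to turn scores into probabilities:

```python
import numpy as np

# Flatten the pooled feature maps into one vector, then map it to class
# scores with a (hypothetical) weight matrix: a minimal fully connected step.
pooled = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy 2 x 2 pooled output
flat = pooled.flatten()                       # shape (4,)

weights = np.ones((3, 4)) * 0.1               # 3 classes; illustrative weights
scores = weights @ flat                       # one score per label

# Softmax turns the scores into probabilities that sum to 1.
probs = np.exp(scores) / np.exp(scores).sum()

print(flat.shape)            # (4,)
print(round(probs.sum(), 6))  # 1.0
```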