Introduction to Big Data and Data Analytics Notes – Class 12 AI (843)
Whether you are preparing for school exams, practicals, or revision, these Introduction to Big Data and Data Analytics Notes for Class 12 AI will help you study in a smarter and more effective way.
These notes are carefully prepared for CBSE Class 12 Artificial Intelligence students by keeping the latest syllabus, board exam pattern, and student understanding in mind.
What is Big Data?
- Big Data refers to extremely large and complex datasets.
- It cannot be handled by regular computer programs and traditional databases.
- Main Sources of Big Data:
- Transactional Data – data from online purchases and business transactions
- Machine Data – data from sensors, devices, and machines
- Social Data – data from social media posts, comments, and likes
- Special tools and techniques are required to analyze Big Data. These tools help organizations:
- Find valuable insights
- Improve decision-making
- Drive innovation
- Examples:
- Amazon uses Big Data to recommend products
- Netflix uses Big Data to suggest shows based on user activity
Types of Big Data
| Aspect | Structured Data | Semi-Structured Data | Unstructured Data |
| --- | --- | --- | --- |
| Definition | Quantitative data with a defined structure | A mix of quantitative and qualitative properties | Data without a predefined structure |
| Data Model | Uses a dedicated data model | May not have strict structure or formal rules | Lacks a consistent or specific data model |
| Organization | Organized in clearly defined rows and columns | Less organized than structured data | No clear organization; varies over time |
| Accessibility | Easily accessible and searchable | Accessible but harder to analyse | Depends on data format; often difficult to process |
| Examples | Customer information, transaction records, product data | XML, CSV, JSON, HTML files, semi-structured documents | Audio, images, videos, emails, PDFs, social media posts |
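The three types in the table can be illustrated with a short Python sketch. The records below are hypothetical examples made up for illustration, not real datasets:

```python
import csv
import io
import json

# Structured data: fixed rows and columns, like a transaction table
structured = io.StringIO("order_id,product,price\n101,Book,250\n102,Pen,20\n")
rows = list(csv.DictReader(structured))

# Semi-structured data: JSON has keys and nesting, but no rigid table schema
semi_structured = json.loads('{"user": "asha", "likes": ["AI", "Big Data"]}')

# Unstructured data: free text with no predefined data model
unstructured = "Loved the new phone! Battery could be better though..."

print(rows[0]["product"])         # structured fields are directly addressable
print(semi_structured["likes"])   # semi-structured values sit in a flexible tree
print(len(unstructured.split()))  # unstructured text allows only generic operations
```

Notice how the structured rows can be queried by column name, the JSON by key, while the free text offers no such handles, which is why unstructured data is the hardest to search and analyse.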
Advantages of Big Data
- Enhanced Decision Making – helps organizations make data-driven decisions using insights from large datasets
- Improved Efficiency and Productivity – identifies inefficiencies, streamlines processes, and optimizes resource use
- Better Customer Insights – provides a deeper understanding of customer behaviour, preferences, and needs
- Competitive Advantage – helps identify market trends and opportunities, keeping businesses ahead of competitors
- Innovation and Growth – supports the development of new products, services, and business models
Disadvantages of Big Data
- Privacy and Security Concerns – risk of data breaches, unauthorized access, and misuse of personal data
- Data Quality Issues – unstructured and varied data may lead to errors and inaccurate analysis
- Technical Complexity – requires skilled professionals and advanced tools to manage Big Data systems
- Regulatory Compliance – organizations must follow laws such as the General Data Protection Regulation and the Digital Personal Data Protection Act, 2023 to avoid legal issues
- Cost and Resource Intensiveness – high cost of data storage, processing, and hiring skilled staff
Characteristics of Big Data
The characteristics of Big Data are described by the 6Vs framework, which includes:
- Volume – Large quantity of data
- Velocity – High speed of data generation
- Variety – Different forms of data
- Veracity – Accuracy and reliability of data
- Variability – Inconsistency and changing nature of data
- Value – Useful insights obtained from data
Velocity
- Refers to the speed at which data is generated, delivered, and analysed
- Data is produced very rapidly in today’s digital world
- Example: Google processes over 40,000 searches per second
- High velocity requires fast processing tools
Volume
- Refers to the huge amount of data generated daily
- Growth due to increasing use of online platforms
- Typically, once the data volume grows beyond gigabytes, it falls into the realm of Big Data
- Data size ranges from:
- Gigabytes → Terabytes → Petabytes → Exabytes
- Approx. 328.77 million terabytes of data are created every day
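The daily figure above can be checked with simple unit arithmetic, assuming decimal units where each step up the scale is a factor of 1,000:

```python
# Decimal data units: 1 TB = 1,000 GB; 1 PB = 1,000 TB; 1 EB = 1,000 PB
TB_PER_EB = 1_000_000  # terabytes in one exabyte

daily_tb = 328.77e6              # approx. terabytes created per day
daily_eb = daily_tb / TB_PER_EB  # the same amount expressed in exabytes

print(f"{daily_eb:.2f} EB per day")
```

So roughly 328.77 million terabytes per day is about 328.77 exabytes, which shows why Big Data sizes are quoted in petabytes and exabytes rather than gigabytes.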
Variety
- Refers to different types and formats of data
- Big data encompasses:
- Structured data
- Semi-structured data
- Unstructured data
- Examples: text, images, audio, video
- Unstructured data is difficult to store in traditional databases but is highly valuable
Veracity
- Refers to accuracy, quality, and reliability of data
- Not all data is useful → needs data cleaning
- Ensures data is trustworthy for analysis
Value
- Refers to the usefulness of data in decision-making
- Main goal of Big Data is to extract meaningful insights
- Without value, other characteristics are not important
Variability
- Refers to inconsistency and changing nature of data
- Data flow may vary over time
- Important to handle unpredictable data patterns
Big Data Analytics
- Involves analysing large and complex datasets.
- Works with: Structured data, Semi-structured data, Unstructured data
- Data size ranges from terabytes to zettabytes
- Includes processes like: Data collection, Data organization, Data storage, Data analysis
- The main objectives of Big Data Analytics are to:
- Extract meaningful insights
- Solve problems
- Improve decision-making
- Importance of Big Data Analytics in business:
- Improves business processes
- Enhances performance
- Helps in forecasting and planning
Types of Big Data Analytics
- Descriptive Analytics – What happened
- Diagnostic Analytics – Why it happened
- Predictive Analytics – What might happen
- Prescriptive Analytics – What should be done
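The four types can be contrasted with a tiny Python sketch. The monthly sales figures below are hypothetical, chosen only to make each question concrete:

```python
# Hypothetical monthly sales figures for illustration
sales = [100, 120, 90, 130]

# Descriptive: what happened? -> summarise the past
average = sum(sales) / len(sales)

# Diagnostic: why did it happen? -> locate the weakest month
worst_month = sales.index(min(sales)) + 1

# Predictive: what might happen? -> naive forecast from the most recent change
forecast = sales[-1] + (sales[-1] - sales[-2])

# Prescriptive: what should be done? -> a simple rule applied to the forecast
action = "increase stock" if forecast > average else "run a promotion"

print(average, worst_month, forecast, action)
```

Real systems replace each step with far more sophisticated models, but the progression from describing the past to recommending an action is the same.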
Big Data Analytics emerges as a consequence of four significant global trends:
- Moore’s Law
- Growth in computing power enables handling large datasets
- Mobile Computing
- Smartphones allow real-time data access and generation
- Social Networking
- Platforms like Facebook, Foursquare, and Pinterest generate massive user data
- Cloud Computing
- Provides remote access to storage and computing resources
- Reduces need for physical infrastructure
Working on Big Data Analytics
Big Data Analytics involves collecting, processing, cleaning, and analysing large datasets to improve decision-making and organizational performance.
Steps in Big Data Analytics
Step 1: Gather Data
- Data is collected from various sources, which include cloud storage, mobile applications, and IoT sensors
- Data can be: Structured or Unstructured
Step 2: Process Data
- Data is processed to make it ready for analysis; this step is especially important for handling large and unstructured data
- Various processing options are:
- Batch Processing – processes large blocks of data that have been collected and stored over time
- Stream Processing – processes small batches of data as they arrive, enabling near real-time analysis
Step 3: Clean Data
- Scrubbing all data, regardless of size, improves quality and yields better results.
- Ensures accurate and reliable results
Step 4: Analyse Data
- Final step to extract useful insights
- Uses advanced analytical techniques
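The four steps above can be sketched as a minimal Python pipeline. The sensor-style readings are hypothetical, standing in for real sources such as cloud storage or IoT sensors:

```python
def gather():
    # Step 1: collect raw records from various sources (hypothetical readings)
    return ["23.5", "24.1", "", "invalid", "22.8"]

def process(raw):
    # Step 2: convert raw text into a uniform numeric form where possible
    out = []
    for record in raw:
        try:
            out.append(float(record))
        except ValueError:
            out.append(None)  # mark records that could not be parsed
    return out

def clean(values):
    # Step 3: scrub bad records so the analysis stays accurate
    return [v for v in values if v is not None]

def analyse(values):
    # Step 4: extract a useful insight (here, the average reading)
    return sum(values) / len(values)

readings = clean(process(gather()))
print(round(analyse(readings), 2))
```

In practice each stage would be a distributed system rather than a function, but the flow of data from collection through processing and cleaning to analysis is the same.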
Examples of Data Analytics Tools
- Tableau
- Apache Hadoop
- Apache Cassandra
- MongoDB
- SAS
Mining Data Streams
- A data stream is a continuous, real-time flow of data.
- It is generated from various sources such as: Sensors, Satellite image data, Internet and web traffic
- Mining data streams refers to extracting meaningful patterns, trends, and knowledge from continuous real-time data
- It is different from traditional data mining because:
- Data is processed as it arrives
- Data is not stored completely
- Example: Website Data
- Websites receive continuous data streams daily
- A sudden increase in searches like “election results” may indicate:
- Elections recently held
- High public interest in results
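The website example can be sketched in Python using a running counter. The stream of search queries below is hypothetical; the key point is that each item is processed as it arrives and the full stream is never stored, which is what separates data-stream mining from traditional data mining:

```python
from collections import Counter

def search_stream():
    # Hypothetical stream of search queries arriving one at a time
    for query in ["weather", "election results", "election results",
                  "cricket score", "election results"]:
        yield query

# Mine the stream: update counts incrementally as each query arrives,
# without keeping the stream itself in memory
counts = Counter()
for query in search_stream():
    counts[query] += 1

trending = counts.most_common(1)[0]
print(trending)  # a sudden spike in a query flags high public interest
```

Real stream-mining systems add tricks such as sliding windows and approximate counting to cope with unbounded data, but the incremental update shown here is the core idea.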
Future of Big Data Analytics
Real-Time Analytics
- Enables instant data processing
- Provides immediate insights for decision-making
- Helps in monitoring customer behaviour and tracking supply chain activities
Advanced Predictive Analytics Models
- Uses Machine Learning and AI algorithms
- Improves accuracy of predictions
- Helps organizations forecast trends and predict customer behaviour
Quantum Computing
- Offers extremely high processing power
- Can solve complex problems faster than traditional computers
- Has the potential to revolutionize Big Data Analytics