Wednesday, November 15, 2023

UNIT-3-ml



Welcome to the letslearningcse blog! In this post, I've come up with some cool and interesting stuff in machine learning: none other than ensemble learning. According to the JNTU syllabus, ensemble learning is Unit 3 of machine learning.
 
We'll start with a brief introduction to classification and its types.
Classification is nothing but a supervised learning technique in which the target variable holds categorical data; the model learns to predict which of those categories (classes) each input belongs to.
There are two types of classifications:
  • Binary Classification
  • Multi-class Classification

Binary Classification: If a target variable contains categorical data with exactly 2 categories, this is known as binary classification.

Multi-class Classification: If a target variable contains categorical data with more than two categories, this is known as multi-class classification. Here we have multiple class labels present in the given dataset. A multi-class problem can be reduced to binary problems using two mechanisms:

  • One vs. All [OVA] or One vs. Rest [OVR]: a problem with n classes generates n binary classifier models. Formula: n classes = n classifiers
  • One vs. One [OVO]: a problem with n classes generates n(n-1)/2 binary classifier models. Formula: n classes = n(n-1)/2 classifiers
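Here is a minimal sketch of both mechanisms using scikit-learn's OneVsRestClassifier and OneVsOneClassifier (the iris dataset and logistic regression are just illustrative choices, not part of the syllabus material):

# OvR vs. OvO sketch: count how many binary classifiers each strategy trains
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)          # 3 classes
base = LogisticRegression(max_iter=1000)

ovr = OneVsRestClassifier(base).fit(X, y)  # trains n = 3 binary classifiers
ovo = OneVsOneClassifier(base).fit(X, y)   # trains n(n-1)/2 = 3*2/2 = 3 binary classifiers

print(len(ovr.estimators_))  # 3
print(len(ovo.estimators_))  # 3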

MNIST [Modified National Institute of Standards and Technology] Dataset:

  • It is a set of 70,000 small images of digits (0–9) handwritten by high school students and employees of the US Census Bureau.
  • This is considered a hello-world program in deep learning, which is a subset of machine learning.
  • MNIST is a dataset of handwritten digits, consisting of 70,000 images of size 28x28 pixels.
  • The dataset is divided into two sets: 60,000 images for training and 10,000 images for testing purposes.
  • The sklearn.datasets package contains three major kinds of functions for datasets:
  • load_*: these functions load small toy datasets that are bundled with scikit-learn.
  • fetch_*: functions such as fetch_openml() can be used to download larger real-world datasets.
  • make_*: these functions generate fake (synthetic) datasets that are useful for testing.
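As a quick illustration (a minimal sketch; fetch_openml() downloads MNIST from openml.org the first time it is called), loading MNIST and making the standard 60,000/10,000 split looks like this:

from sklearn.datasets import fetch_openml

# Download the 70,000 handwritten-digit images (each a flattened 28x28 = 784-pixel vector)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target

# Standard split: first 60,000 images for training, last 10,000 for testing
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
print(X_train.shape, X_test.shape)   # (60000, 784) (10000, 784)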
Let's get into a brief introduction to ensemble learning. It is nothing but the combination of numerous machine learning models to produce a model that is more accurate and efficient than any single one of them.
It has mainly two types of ensembles:

  • Homogeneous ensemble:
In this type of ensemble, the same algorithm is used for all the base models.
  • Heterogeneous ensemble:
In this type of ensemble, different algorithms are used for the base models.
 
Sequential ensemble means each model depends on the previous models' output: the misclassified (error-containing) samples are given higher weights in the next round, and the process continues until a maximum number of models is reached or the output is practically error-free.
Parallel ensemble means each model is independent of the other models; each model predicts the output independently, and the individual predictions are then combined.
 
Ensemble techniques are classified into two types:
 
Traditional Techniques:
    • Mean: We all know that the basic principle of the mean is the ratio of the sum of the observations to the number of observations; here, the individual models' predictions are simply averaged.
    • Mode: the item that repeats the most (the majority vote among the models' predictions) is the mode.
    • Weighted Mean: It is nothing but a mean in which each individual model's prediction carries its own weight.
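A minimal sketch of these three traditional combiners on some made-up model predictions (the numbers and weights below are purely illustrative):

import numpy as np

# Predictions from three hypothetical models on the same three samples
pred_a = np.array([2.1, 3.0, 4.2])
pred_b = np.array([1.9, 3.4, 4.0])
pred_c = np.array([2.3, 2.8, 4.4])

# Mean: simple average of the models' outputs
mean_pred = np.mean([pred_a, pred_b, pred_c], axis=0)

# Weighted mean: each model's output gets its own (illustrative) weight
weights = [0.5, 0.3, 0.2]
weighted_pred = np.average([pred_a, pred_b, pred_c], axis=0, weights=weights)

# Mode: majority vote over predicted class labels (rows = models, columns = samples)
labels = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
mode_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, labels)

print(mean_pred, weighted_pred, mode_pred)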
Advanced Techniques:
  • Boosting: Boosting is a type of ensemble learning that combines many weak learners into a strong learner to perform the predictions on new data (a combined sketch of all five advanced techniques follows this list).
  • Bagging: Bagging is a combination of two words [bootstrap and aggregation]. Bootstrapping means randomly creating samples of data out of a dataset with replacement, and aggregation means the bootstrap subsets are given as training data to separate models whose outputs are combined to predict on new data. When the sampling is done with replacement it is called bagging, and without replacement it is called pasting.
  • Random Forest: This algorithm builds multiple decision trees and merges them together to get more accurate and stable predictions. It gets a prediction from each tree, and by means of majority voting, it selects the decision that gets the most votes.
  • Voting Classifier: A very simple way to create a better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes.
  • Stacking [blending]: Stacking was introduced by Wolpert. It is also known as stacked generalization, and it is an extended form of the model-averaging ensemble technique in which the sub-models contribute according to their performance and a new (meta) model is trained on their predictions to make better predictions.
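Here is a minimal combined sketch of these five techniques using scikit-learn's ensemble module (the breast-cancer dataset, base estimators, and hyperparameters are just illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    # Boosting: weak learners are trained sequentially, re-weighting the misclassified samples
    "boosting": AdaBoostClassifier(n_estimators=100, random_state=42),
    # Bagging: bootstrap samples with replacement (bootstrap=False would be pasting)
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 bootstrap=True, random_state=42),
    # Random forest: bagged decision trees with extra randomness in feature selection
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Voting classifier: heterogeneous models, the class with the most votes wins
    "voting": VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=5000)),
                    ("dt", DecisionTreeClassifier(random_state=42)),
                    ("rf", RandomForestClassifier(random_state=42))],
        voting="hard"),
    # Stacking: a meta-model (logistic regression) learns from the base models' predictions
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=42)),
                    ("rf", RandomForestClassifier(random_state=42))],
        final_estimator=LogisticRegression(max_iter=5000)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))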
What are you going to see in the material?
  • Classification and its types
  • Performance measures
  • MNIST dataset and different types of dataset
  • About ensemble learning
  • Ensemble classification
  • Traditional ensemble learning
  • Advanced ensemble learning
    • boosting
    • bagging
    • random forest
    • voting classifier
    • stacking 
link for the material: UNIT-3-ML

 
