Chapter 7 - Unsupervised Learning

7.1 Chapter Overview

Unsupervised Learning is a Machine Learning approach where the model learns from data without labelled output. Unlike supervised learning, there is no target column such as Pass/Fail or Price. The algorithm studies the structure of the data and discovers hidden patterns automatically.

Unsupervised learning is commonly used for customer segmentation, market analysis, anomaly detection, document grouping, image compression and dimensionality reduction.

Learning Outcome: By the end of this chapter, learners should be able to explain clustering and dimensionality reduction, apply K-Means, Hierarchical Clustering, DBSCAN and PCA, and discover hidden patterns in datasets.

1. Collect Unlabeled Data
2. Scale Features
3. Find Groups
4. Reduce Dimensions
5. Interpret Patterns

7.2 Supervised vs Unsupervised Learning

Aspect	Supervised Learning	Unsupervised Learning
Target label	Available	Not available
Goal	Predict known output	Discover hidden structure
Example	Predict Pass or Fail	Group students by learning behavior
Algorithms	Linear Regression, Logistic Regression	K-Means, DBSCAN, PCA

Simple Example: If you already know which students passed or failed, that is supervised learning. If you only have attendance, marks and study hours and want to discover natural student groups, that is unsupervised learning.
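
Illustrative Sketch: The contrast above can be shown in a few lines, assuming scikit-learn is installed. The student values and the Pass/Fail labels below are invented for illustration only.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Hypothetical student records (values invented for illustration).
df = pd.DataFrame({
    "Attendance": [95, 90, 85, 40, 35, 30],
    "Marks": [92, 88, 85, 40, 38, 35],
})
X = df[["Attendance", "Marks"]]

# Supervised: a known target exists, so the model learns a mapping X -> y.
y = [1, 1, 1, 0, 0, 0]  # 1 = Pass, 0 = Fail (assumed labels)
clf = LogisticRegression(max_iter=1000).fit(X, y)
new_student = pd.DataFrame({"Attendance": [88], "Marks": [90]})
predicted = clf.predict(new_student)

# Unsupervised: no target column; the model only sees X and proposes groups.
groups = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

print("Predicted label:", predicted[0])
print("Discovered groups:", groups)
```

The supervised model returns a label it was taught. The clustering model returns group numbers that have no meaning until someone interprets them.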

7.3 Main Types of Unsupervised Learning

Clustering

Groups similar data points together. Example: grouping customers based on spending behavior.

Dimensionality Reduction

Reduces many features into fewer important components. Example: reducing 20 features into 2 for visualization.

Anomaly Detection

Finds unusual data points. Example: detecting abnormal bank transactions.
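
Illustrative Sketch: One simple way to flag unusual values is a median-based score. This is only one of many possible rules; the transaction amounts and the 3.5 cut-off below are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical transaction amounts; the last value is deliberately extreme.
amounts = np.array([20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 5000.0])

median = np.median(amounts)
mad = np.median(np.abs(amounts - median))  # median absolute deviation

# Modified z-score: 0.6745 rescales the MAD so it is comparable
# to a standard deviation for normally distributed data.
modified_z = 0.6745 * (amounts - median) / mad

# 3.5 is a commonly used cut-off, treated here as an assumption.
is_anomaly = np.abs(modified_z) > 3.5

print(amounts[is_anomaly])  # prints [5000.]
```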

Association Discovery

Finds relationships between items. Example: customers who buy bread may also buy butter.
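
Illustrative Sketch: The bread and butter idea can be measured by counting how often items appear together. The baskets below are invented, and "support" and "confidence" are the usual names for these two ratios rather than terms defined in this chapter.

```python
# Hypothetical shopping baskets (invented for illustration).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

n_baskets = len(baskets)
with_bread = sum(1 for b in baskets if "bread" in b)
with_both = sum(1 for b in baskets if {"bread", "butter"} <= b)

# Support: share of all baskets that contain both items.
support = with_both / n_baskets
# Confidence: among baskets with bread, the share that also contain butter.
confidence = with_both / with_bread

print("Support:", support)        # 0.6
print("Confidence:", confidence)  # 0.75
```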

7.4 What is Clustering?

Clustering is the process of grouping similar data points together. The model does not know the group names in advance. It creates groups based on similarity, distance and data patterns.
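
Illustrative Sketch: "Similarity" is usually measured as a distance between rows. The example below assumes Euclidean distance and three made-up students described by attendance and marks.

```python
import numpy as np

# Hypothetical students as (attendance, marks).
student_a = np.array([95, 92])
student_b = np.array([90, 88])
student_c = np.array([30, 35])

# Euclidean distance: square root of the summed squared differences.
dist_ab = np.linalg.norm(student_a - student_b)
dist_ac = np.linalg.norm(student_a - student_c)

print(round(dist_ab, 2))  # about 6.4
print(round(dist_ac, 2))  # about 86.45
```

A and B are close, so a clustering algorithm would tend to place them in the same group. C is far from both.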

Use Case	Possible Clusters
Student learning data	High performers, average learners, at-risk learners
Customer purchases	Budget buyers, premium buyers, occasional buyers
Website behavior	Frequent visitors, new visitors, inactive users
Manufacturing sensor data	Normal operation, warning pattern, abnormal pattern

Visual Idea: Similar points form natural groups or clusters.

7.5 K-Means Clustering

K-Means is one of the most popular clustering algorithms. It divides data into K groups, where K is the number of clusters selected by the user.

How K-Means Works

1. Choose the number of clusters K.
2. Randomly place cluster centers called centroids.
3. Assign each data point to the nearest centroid.
4. Move each centroid to the average position of its assigned points.
5. Repeat steps 3 and 4 until the clusters become stable.

Goal: Minimize the total squared distance between each data point and the centroid of its assigned cluster.
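
Illustrative Sketch: The steps above can be written as a short NumPy loop. This is a teaching sketch, not scikit-learn's implementation; the function name kmeans_sketch and its parameters are hypothetical, and starting centroids are simply drawn from the data points.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=100, seed=0):
    """Plain K-Means loop for illustration only."""
    rng = np.random.default_rng(seed)
    # Place centroids: pick k distinct data points as starting centers.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.array([[95, 92], [90, 88], [85, 85],
                   [60, 65], [55, 60], [50, 58],
                   [30, 35], [35, 38], [40, 40]], dtype=float)

labels, centroids = kmeans_sketch(points, k=3)
print(labels)
print(centroids)
```

The result depends on the starting centroids, and a poor start can leave the loop stuck in a weaker grouping. This is one reason library implementations usually try several starts and keep the best one.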

Python Working Example: Student Grouping

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Create a small student dataset.
# Attendance and marks are used to discover natural groups.
data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

# Convert dictionary into a Pandas DataFrame.
df = pd.DataFrame(data)

# Create K-Means model with 3 clusters.
model = KMeans(n_clusters=3, random_state=42)

# Train model and assign cluster labels to each row.
df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])

print(df)

# Visualize clusters.
plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Clusters using K-Means")
plt.show()

Expected Output:
A table showing Attendance, Marks and Cluster number for each student.

Expected Graph:
A scatter plot where students are grouped into 3 clusters.

Line-by-Line Explanation

Code	Explanation
from sklearn.cluster import KMeans	Imports the K-Means clustering algorithm.
df = pd.DataFrame(data)	Creates a table from the dataset.
KMeans(n_clusters=3)	Creates a model that will form 3 groups.
fit_predict()	Trains the model and returns cluster labels.
plt.scatter()	Draws the clusters on a graph.

7.6 Choosing the Number of Clusters: Elbow Method

The Elbow Method helps choose a suitable K value. It fits K-Means for several values of K and compares the within-cluster sum of squares (WCSS) of each fit. The point on the curve where adding more clusters gives only a small further reduction is called the elbow.

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

wcss = []

for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=42)
    model.fit(df[["Attendance", "Marks"]])
    wcss.append(model.inertia_)

plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of Clusters K")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()

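Explanation: WCSS means Within-Cluster Sum of Squares. Lower WCSS means points are closer to their cluster center. WCSS generally keeps falling as K increases, so the aim is not the lowest value but the elbow point, after which extra clusters add little. The elbow point suggests a good K value.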
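
Illustrative Sketch: To see what WCSS measures, it can be computed by hand and compared with the inertia_ value reported by scikit-learn. The sketch reuses the student values from this chapter; n_init=10 is set explicitly as an assumption so that several starts are tried.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Look up the center of the cluster each point was assigned to.
own_centers = model.cluster_centers_[model.labels_]

# WCSS: squared distance from every point to its own center, summed.
wcss_manual = ((X - own_centers) ** 2).sum()

print("Manual WCSS:", wcss_manual)
print("model.inertia_:", model.inertia_)
```

The two numbers should agree up to floating-point rounding.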

7.7 Feature Scaling for Clustering

Clustering algorithms depend on distance. If one feature has very large values, it may dominate the clustering. Scaling makes features comparable.

from sklearn.preprocessing import StandardScaler
import pandas as pd

data = {
    "Annual_Income": [20000, 25000, 80000, 90000],
    "Spending_Score": [30, 35, 80, 85]
}

df = pd.DataFrame(data)

scaler = StandardScaler()

scaled_data = scaler.fit_transform(df)

print(scaled_data)

Key Idea: Always consider scaling before distance-based clustering such as K-Means and DBSCAN. The same applies to distance-based supervised methods such as KNN.
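
Illustrative Sketch: The effect of scaling can be checked by hand. The sketch below assumes the same four customers as above, standardizes them with the formula (value - mean) / standard deviation, and compares the result with StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: Annual_Income, Spending_Score.
X = np.array([[20000, 30], [25000, 35], [80000, 80], [90000, 85]], dtype=float)

# Before scaling, the gap between the first two customers is
# 5000 on income but only 5 on spending score.
raw_gap = np.abs(X[0] - X[1])

# Standardize by hand: subtract the column mean, divide by the column
# standard deviation (population form, ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

# After scaling, both features contribute on a similar scale.
scaled_gap = np.abs(scaled[0] - scaled[1])

print("Raw gap:", raw_gap)
print("Scaled gap:", scaled_gap)
```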

7.8 Hierarchical Clustering

Hierarchical Clustering builds a tree-like structure of clusters. It can be useful when you want to understand how groups are related at different levels.

Type	Explanation
Agglomerative	Starts with each point as its own cluster and merges similar clusters.
Divisive	Starts with one large cluster and splits it into smaller clusters.

Python Example

import pandas as pd
from sklearn.cluster import AgglomerativeClustering

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

model = AgglomerativeClustering(n_clusters=3)

df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])

print(df)

Use Case: Hierarchical clustering is useful when you want to explore nested relationships between groups.
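
Illustrative Sketch: The nested structure can be inspected with SciPy, assuming it is installed. The sketch builds the merge tree once with Ward linkage and then cuts it at two different levels; the choice of Ward linkage and of 3 and 2 clusters is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

# Build the full merge tree. Each row of Z records one merge:
# the two clusters joined, the merge distance, and the new cluster size.
Z = linkage(X, method="ward")

# Cut the same tree at two levels without refitting.
labels_3 = fcluster(Z, t=3, criterion="maxclust")
labels_2 = fcluster(Z, t=2, criterion="maxclust")

print("3 clusters:", labels_3)
print("2 clusters:", labels_2)
```

Comparing the two cuts shows which of the three groups are merged first when fewer clusters are requested.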

7.9 DBSCAN Clustering

DBSCAN groups points based on density. It can detect irregular cluster shapes and identify noise or outliers.

Term	Meaning
eps	Maximum distance between two points for them to count as neighbors.
min_samples	Minimum number of points within eps of a point (including the point itself) for it to count as part of a dense region.
Noise	Points that do not belong to any cluster.

Python Example

import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40, 5],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40, 10]
}

df = pd.DataFrame(data)

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

model = DBSCAN(eps=0.8, min_samples=2)

df["Cluster"] = model.fit_predict(scaled_data)

print(df)

Expected Output:
Rows assigned to cluster numbers. Outliers may appear as cluster -1.
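
Illustrative Sketch: Because noise is labelled -1, it should be left out when counting clusters. The small dataset below is invented, with two tight groups and one isolated point, and the eps and min_samples values are assumptions chosen to suit it.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups and one isolated point (invented values).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
              [9.0, 0.0]])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Count noise points and real clusters separately.
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print("Labels:", labels)          # [ 0  0  0  1  1  1 -1]
print("Clusters:", n_clusters)    # 2
print("Noise points:", n_noise)   # 1
```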

7.10 Cluster Evaluation

Because unsupervised learning has no true labels, evaluation is more challenging. We often use internal scores and visual inspection.

Method	Purpose
Silhouette Score	Measures how well points fit into their clusters; ranges from -1 to 1, and higher is better.
Inertia / WCSS	Measures compactness in K-Means.
Visualization	Helps inspect whether groups make sense.
Business Interpretation	Checks whether clusters are useful in real decisions.

from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
import pandas as pd

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

model = KMeans(n_clusters=3, random_state=42)
labels = model.fit_predict(df)

score = silhouette_score(df, labels)

print("Silhouette Score:", score)
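
Illustrative Sketch: The same score can be used to compare several K values. The sketch assumes the student data used above and tries K from 2 to 5; the range is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Silhouette needs at least 2 clusters and fewer clusters than points.
    scores[k] = silhouette_score(X, labels)

# Pick the K with the highest average silhouette.
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(k, round(s, 3))
print("Best K by silhouette:", best_k)
```

A high score only says the groups are compact and well separated. Whether they are useful still needs the business interpretation listed in the table.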

7.11 Dimensionality Reduction

Dimensionality reduction reduces the number of features while keeping important information. It helps with visualization, noise reduction, faster training and simpler analysis.

Problem	How Dimensionality Reduction Helps
Too many features	Reduces complexity
Difficult visualization	Converts many features into 2D or 3D
Noise in data	Removes less useful variation
Slow training	Reduces computation

7.12 Principal Component Analysis (PCA)

PCA is a popular dimensionality reduction technique. It transforms original features into new features called principal components. These components capture the most important variation in the data.

PCA finds directions where data varies the most.

Python Example: PCA to 2 Components

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}

df = pd.DataFrame(data)

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

pca = PCA(n_components=2)

pca_data = pca.fit_transform(scaled_data)

print("PCA Data:")
print(pca_data)

print("Explained Variance Ratio:")
print(pca.explained_variance_ratio_)

Explanation: n_components=2 reduces the dataset into 2 new columns while keeping as much of the variance as possible. explained_variance_ratio_ shows the share of the total variance captured by each component.
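
Illustrative Sketch: To see where the explained variance ratio comes from, the directions of greatest variation can be found by hand from the covariance matrix. This is a sketch of the idea using NumPy's eigendecomposition, not a description of how scikit-learn computes PCA internally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: Attendance, Marks, Study_Hours.
X = np.array([[95, 92, 5], [90, 88, 5], [85, 85, 4],
              [60, 65, 3], [55, 60, 3], [50, 58, 2],
              [30, 35, 1], [35, 38, 1], [40, 40, 2]], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

# Covariance matrix of the standardized features (rows are observations).
cov = np.cov(X_scaled, rowvar=False)

# eigh returns eigenvalues in ascending order, so sort them largest first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]

# Share of the total variance along each direction.
manual_ratio = eigenvalues / eigenvalues.sum()

pca = PCA(n_components=2).fit(X_scaled)
print("Manual ratio:", manual_ratio[:2])
print("sklearn ratio:", pca.explained_variance_ratio_)
```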

7.13 PCA Visualization

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}

df = pd.DataFrame(data)

scaled_data = StandardScaler().fit_transform(df)

pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)

plt.scatter(pca_result[:, 0], pca_result[:, 1])
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Visualization")
plt.show()

Expected Graph:
A 2D scatter plot showing data points after dimensionality reduction.

7.14 Hidden Pattern Discovery

The main purpose of unsupervised learning is to discover useful hidden structure in data. These patterns may not be obvious in raw tables.

Dataset	Hidden Pattern	Business Use
Customer sales	High-value and low-value customer groups	Marketing strategy
Student performance	At-risk learner group	Early support intervention
Machine sensor data	Abnormal operating pattern	Predictive maintenance
Website analytics	User behavior segments	Personalized content

7.15 Complete Mini Project: Student Segmentation

This project groups students into natural learning segments using K-Means clustering.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

data = {
    "Student": ["Amin", "Mei Ling", "Ravi", "Siti", "John", "Farah", "Kumar", "Ali"],
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1]
}

df = pd.DataFrame(data)

features = df[["Attendance", "Marks", "Study_Hours"]]

scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

model = KMeans(n_clusters=3, random_state=42)

df["Cluster"] = model.fit_predict(scaled_features)

print(df)

plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Segmentation")
plt.show()

Project Interpretation: Students can be grouped into high performers, moderate learners and at-risk learners. This helps trainers plan targeted support.
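
Illustrative Sketch: Cluster numbers are arbitrary, so segment names have to be attached after looking at each cluster's averages. The sketch below assumes the project data above and ranks clusters by mean marks; the three segment names are an interpretation added for illustration, not something the algorithm produces.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1],
})
features = ["Attendance", "Marks", "Study_Hours"]

scaled = StandardScaler().fit_transform(df[features])
df["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)

# Average of each feature per cluster, strongest group first.
profile = (
    df.groupby("Cluster")[features]
    .mean()
    .sort_values("Marks", ascending=False)
)

# Attach names in ranked order (names are an assumed interpretation).
profile["Segment"] = ["High performers", "Moderate learners", "At-risk learners"]
df["Segment"] = df["Cluster"].map(profile["Segment"])

print(profile)
print(df[["Attendance", "Marks", "Study_Hours", "Segment"]])
```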

7.16 Common Beginner Mistakes

Mistake	Problem	Correction
Choosing K randomly	Clusters may not make sense	Use Elbow Method and business understanding
Not scaling features	Large values dominate distance	Use StandardScaler
Assuming clusters are automatically meaningful	Groups may not be useful	Interpret clusters carefully
Ignoring outliers	Clusters may be distorted	Use DBSCAN or clean data
Using PCA without explanation	Components may be hard to understand	Check explained variance ratio
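
Illustrative Sketch: Checking the explained variance ratio can be turned into a simple rule for how many components to keep. The sketch assumes the three-feature student data from this chapter and a 95% threshold, which is a common but arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: Attendance, Marks, Study_Hours.
X = np.array([[95, 92, 5], [90, 88, 5], [85, 85, 4],
              [60, 65, 3], [55, 60, 3], [50, 58, 2],
              [30, 35, 1], [35, 38, 1], [40, 40, 2]], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components kept so the full ratio list is available.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 95%.
n_needed = int(np.searchsorted(cumulative, 0.95) + 1)

print("Cumulative explained variance:", cumulative)
print("Components needed for 95%:", n_needed)
```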

7.17 Hands-On Activities

Activity 1: K-Means Clustering

Create a dataset of customer income and spending score. Use K-Means to group customers into 3 clusters.

Activity 2: Elbow Method

Run the Elbow Method for K values from 1 to 8 and choose a suitable number of clusters.

Activity 3: DBSCAN

Create a dataset with one unusual outlier and use DBSCAN to detect noise.

Activity 4: PCA

Create a dataset with 4 features and reduce it into 2 principal components using PCA.

Mini Project

Build a student segmentation system that groups students into learning support categories using attendance, marks and study hours.

7.19 Chapter Summary

In this chapter, learners studied Unsupervised Learning, clustering, K-Means, Elbow Method, Hierarchical Clustering, DBSCAN, cluster evaluation and PCA. Learners also explored how hidden patterns can support business decisions, student support and anomaly detection.

Remember: Unsupervised Learning is about discovering structure when labels are not available. It helps reveal patterns that may not be visible in raw data.

7.18 Interactive Final Assessment Quiz

1. Unsupervised learning uses data without target labels.

2. Clustering is used to:

3. K-Means requires choosing the number of clusters K.

4. The Elbow Method helps choose a suitable K value.

5. DBSCAN can detect noise or outliers.

6. PCA is used for:

7. Scaling is important before distance-based clustering.

8. Silhouette Score can evaluate clustering quality.

9. PCA creates principal components from original features.

10. Unsupervised learning can help discover hidden patterns.
