Chapter 7: Unsupervised Learning

Explore clustering and dimensionality reduction techniques to discover hidden patterns, natural groups, data structure and relationships in unlabeled datasets.

Topics: Clustering, K-Means, DBSCAN, PCA, Hidden Patterns

Figure: Unlabeled Data; Clustering (Groups); PCA (Reduce); Patterns (Discover)

7.1 Chapter Overview

Unsupervised Learning is a Machine Learning approach where the model learns from data without labeled outputs. Unlike supervised learning, there is no target column such as Pass/Fail or Price. The algorithm studies the structure of the data and discovers hidden patterns automatically.

Unsupervised learning is commonly used for customer segmentation, market analysis, anomaly detection, document grouping, image compression and dimensionality reduction.

Learning Outcome: By the end of this chapter, learners should be able to explain clustering and dimensionality reduction, apply K-Means, Hierarchical Clustering, DBSCAN and PCA, and discover hidden patterns in datasets.

  1. Collect Unlabeled Data
  2. Scale Features
  3. Find Groups
  4. Reduce Dimensions
  5. Interpret Patterns

7.2 Supervised vs Unsupervised Learning

Aspect | Supervised Learning | Unsupervised Learning
Target label | Available | Not available
Goal | Predict known output | Discover hidden structure
Example | Predict Pass or Fail | Group students by learning behavior
Algorithms | Linear Regression, Logistic Regression | K-Means, DBSCAN, PCA

Simple Example: If you already know which students passed or failed, that is supervised learning. If you only have attendance, marks and study hours and want to discover natural student groups, that is unsupervised learning.

7.3 Main Types of Unsupervised Learning

Clustering

Groups similar data points together. Example: grouping customers based on spending behavior.

Dimensionality Reduction

Reduces many features into fewer important components. Example: reducing 20 features into 2 for visualization.

Anomaly Detection

Finds unusual data points. Example: detecting abnormal bank transactions.

Association Discovery

Finds relationships between items. Example: customers who buy bread may also buy butter.

7.4 What is Clustering?

Clustering is the process of grouping similar data points together. The model does not know the group names in advance. It creates groups based on similarity, distance and data patterns.

Use Case | Possible Clusters
Student learning data | High performers, average learners, at-risk learners
Customer purchases | Budget buyers, premium buyers, occasional buyers
Website behavior | Frequent visitors, new visitors, inactive users
Manufacturing sensor data | Normal operation, warning pattern, abnormal pattern

Visual Idea: Similar points form natural groups or clusters.
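
To make "similarity" concrete, the short sketch below measures Euclidean distance between three students. The rows reuse values from this chapter's sample student data; the array name and the A/B/C labels are illustrative assumptions. A small distance means two students are similar and likely to share a cluster.

```python
import numpy as np

# Three students described by (attendance, marks).
students = np.array([
    [95, 92],  # student A
    [90, 88],  # student B
    [30, 35],  # student C
], dtype=float)

# Pairwise Euclidean distances: entry [i, j] is the distance between student i and student j.
diff = students[:, None, :] - students[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=2))

print(distances.round(2))
```

Students A and B end up far closer to each other than either is to student C, which is the kind of gap a clustering algorithm picks up on.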

7.5 K-Means Clustering

K-Means is one of the most popular clustering algorithms. It divides data into K groups, where K is the number of clusters selected by the user.

How K-Means Works

  1. Choose the number of clusters K.
  2. Randomly place cluster centers called centroids.
  3. Assign each data point to the nearest centroid.
  4. Move each centroid to the average position of its assigned points.
  5. Repeat until clusters become stable.
Goal: Minimize the sum of squared distances between data points and their assigned cluster centroid
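
The five steps above can be sketched in plain NumPy. This is a teaching sketch, not scikit-learn's implementation: the function name `kmeans_sketch`, its parameters and the choice to start from randomly selected data points are all assumptions made for illustration.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=42):
    """Illustrative K-Means loop that mirrors the five steps."""
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct data points as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

labels, centroids = kmeans_sketch(X, k=3)  # Step 1: K is chosen by the user.
print(labels)
print(centroids.round(2))
```

Because the starting centroids are random, a single run can settle on a poorer grouping. Library implementations typically reduce this risk with smarter initialization and several restarts.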

Python Working Example: Student Grouping

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Create a small student dataset.
# Attendance and marks are used to discover natural groups.
data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

# Convert dictionary into a Pandas DataFrame.
df = pd.DataFrame(data)

# Create K-Means model with 3 clusters.
model = KMeans(n_clusters=3, random_state=42)

# Train model and assign cluster labels to each row.
df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])

print(df)

# Visualize clusters.
plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Clusters using K-Means")
plt.show()
Expected Output:
A table showing Attendance, Marks and Cluster number for each student.

Expected Graph:
A scatter plot where students are grouped into 3 clusters.

Line-by-Line Explanation

Code | Explanation
from sklearn.cluster import KMeans | Imports the K-Means clustering algorithm.
df = pd.DataFrame(data) | Creates a table from the dataset.
KMeans(n_clusters=3) | Creates a model that will form 3 groups.
fit_predict() | Trains the model and returns cluster labels.
plt.scatter() | Draws the clusters on a graph.
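
After fitting, the model can also be inspected and reused. The sketch below reads the learned centroids from `cluster_centers_` and assigns a new student with `predict()`. The new student's values (88, 90) are hypothetical, and `n_init=10` is set explicitly as an assumption so the result does not depend on the library version's default.

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
})

model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(df[["Attendance", "Marks"]])

# One row per cluster: the average attendance and marks of its members.
centers = pd.DataFrame(model.cluster_centers_, columns=["Attendance", "Marks"])
print(centers.round(1))

# Assign a new (hypothetical) student to the nearest existing centroid.
new_student = pd.DataFrame({"Attendance": [88], "Marks": [90]})
new_cluster = model.predict(new_student)[0]
print("New student falls in cluster:", new_cluster)
```

Cluster numbers are arbitrary identifiers, so the centroid values, not the numbers themselves, tell you which group is which.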

7.6 Choosing the Number of Clusters: Elbow Method

The Elbow Method helps choose a suitable K value. It compares the within-cluster sum of squares (WCSS) for different values of K. The point where adding another cluster stops giving a large improvement is called the elbow.

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

wcss = []

for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=42)
    model.fit(df[["Attendance", "Marks"]])
    wcss.append(model.inertia_)

plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of Clusters K")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
Explanation: WCSS means Within-Cluster Sum of Squares. Lower WCSS means points are closer to their cluster center. Because WCSS keeps falling as K increases, look for the elbow (the K after which the drop becomes small) rather than the lowest value.
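
To see what WCSS measures, the sketch below recomputes it by hand and compares it with the model's `inertia_` attribute. For every point it takes the squared distance to the centroid of its own cluster and sums the results. The variable names are illustrative, and `n_init=10` is an explicit assumption rather than part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Look up, for each point, the centroid of the cluster it was assigned to.
assigned_centroids = model.cluster_centers_[model.labels_]

# WCSS: sum of squared distances from each point to its own centroid.
wcss_manual = ((X - assigned_centroids) ** 2).sum()

print("Manual WCSS:   ", round(float(wcss_manual), 2))
print("model.inertia_:", round(float(model.inertia_), 2))
```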

7.7 Feature Scaling for Clustering

Clustering algorithms depend on distance. If one feature has very large values, it may dominate the clustering. Scaling makes features comparable.

from sklearn.preprocessing import StandardScaler
import pandas as pd

data = {
    "Annual_Income": [20000, 25000, 80000, 90000],
    "Spending_Score": [30, 35, 80, 85]
}

df = pd.DataFrame(data)

scaler = StandardScaler()

scaled_data = scaler.fit_transform(df)

print(scaled_data)
Key Idea: Always consider scaling before distance-based clustering such as K-Means, Hierarchical Clustering and DBSCAN. The same applies to distance-based supervised methods such as KNN.
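
The sketch below shows why scaling matters for the income example and what StandardScaler does. It first measures how much of the squared distance between two customers comes from income alone, then reproduces the scaler's output with a manual z-score. Variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: Annual_Income, Spending_Score
X = np.array([
    [20000, 30],
    [25000, 35],
    [80000, 80],
    [90000, 85],
], dtype=float)

# Unscaled: the gap between the first two customers is almost entirely income.
gap = X[1] - X[0]
income_share = gap[0] ** 2 / (gap ** 2).sum()
print("Share of squared distance due to income:", round(float(income_share), 6))

# Manual z-score: subtract the column mean, divide by the column standard deviation.
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
z_sklearn = StandardScaler().fit_transform(X)

print("Manual z-score matches StandardScaler:", np.allclose(z_manual, z_sklearn))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither feature dominates the distance calculation purely because of its units.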

7.8 Hierarchical Clustering

Hierarchical Clustering builds a tree-like structure of clusters. It can be useful when you want to understand how groups are related at different levels.

Type | Explanation
Agglomerative | Starts with each point as its own cluster and merges similar clusters.
Divisive | Starts with one large cluster and splits it into smaller clusters.

Python Example

import pandas as pd
from sklearn.cluster import AgglomerativeClustering

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

model = AgglomerativeClustering(n_clusters=3)

df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])

print(df)
Use Case: Hierarchical clustering is useful when you want to explore nested relationships between groups.
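
The tree itself can be examined with SciPy, which this chapter does not otherwise use. The sketch below is one possible approach, assuming SciPy is installed: `linkage` records every merge step and `fcluster` cuts the tree into a chosen number of clusters. Ward linkage is chosen here because it is also the default for AgglomerativeClustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40]], dtype=float)

# Ward linkage merges, at each step, the pair of clusters whose union
# increases the within-cluster variance the least.
Z = linkage(X, method="ward")

# Each row of Z is one merge: the two cluster ids joined, the merge distance
# and the number of points in the new cluster.
print(Z.round(2))

# Cut the tree so that at most 3 clusters remain (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Reading the merge distances from top to bottom shows the nesting. Small distances early on join close neighbors, and the large jumps near the end join whole groups.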

7.9 DBSCAN Clustering

DBSCAN groups points based on density. It can detect irregular cluster shapes and identify noise or outliers.

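Term | Meaning
eps | Maximum distance between two points for them to count as neighbors.
min_samples | Minimum number of points (including the point itself) within eps for a point to be a core point of a dense region.
Noise | Points that do not belong to any cluster (labeled -1 in scikit-learn).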

Python Example

import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40, 5],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40, 10]
}

df = pd.DataFrame(data)

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

model = DBSCAN(eps=0.8, min_samples=2)

df["Cluster"] = model.fit_predict(scaled_data)

print(df)
Expected Output:
Rows assigned to cluster numbers. Outliers may appear as cluster -1.
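
A common follow-up is to summarize the result: how many clusters were found and which rows were flagged as noise. The sketch below does this for the same data and parameters; the summary variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X = np.array([[95, 92], [90, 88], [85, 85],
              [60, 65], [55, 60], [50, 58],
              [30, 35], [35, 38], [40, 40],
              [5, 10]], dtype=float)

scaled = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(scaled)

# Noise is labeled -1, so it is excluded when counting clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("Labels:", labels)
print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
print("Noise rows (attendance, marks):", X[labels == -1])
```

The results are sensitive to eps and min_samples, so it is worth rerunning the summary with a few different values before trusting the outlier list.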

7.10 Cluster Evaluation

Because unsupervised learning has no true labels, evaluation is more challenging. We often use internal scores and visual inspection.

Method | Purpose
Silhouette Score | Measures how well points fit into their clusters (ranges from -1 to 1; higher is better).
Inertia / WCSS | Measures compactness in K-Means (lower is more compact).
Visualization | Helps inspect whether groups make sense.
Business Interpretation | Checks whether clusters are useful in real decisions.

from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
import pandas as pd

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}

df = pd.DataFrame(data)

model = KMeans(n_clusters=3, random_state=42)
labels = model.fit_predict(df)

score = silhouette_score(df, labels)

print("Silhouette Score:", score)
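
The silhouette score can also be used to compare candidate K values, as a complement to the Elbow Method. The sketch below scores K from 2 to 5 on the same data and picks the highest. The range of K and the `n_init=10` setting are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

df = pd.DataFrame({
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
})

scores = {}
for k in range(2, 6):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(df)
    scores[k] = silhouette_score(df, labels)
    print(f"K={k}: silhouette = {scores[k]:.3f}")

# The K with the highest average silhouette is the best-separated candidate.
best_k = max(scores, key=scores.get)
print("Best K by silhouette:", best_k)
```

Unlike WCSS, the silhouette score does not improve automatically as K grows, so its maximum can be read directly.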

7.11 Dimensionality Reduction

Dimensionality reduction reduces the number of features while keeping important information. It helps with visualization, noise reduction, faster training and simpler analysis.

Problem | How Dimensionality Reduction Helps
Too many features | Reduces complexity
Difficult visualization | Converts many features into 2D or 3D
Noise in data | Removes less useful variation
Slow training | Reduces computation

7.12 Principal Component Analysis (PCA)

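PCA is a popular dimensionality reduction technique. It transforms the original features into new features called principal components. Each component is a linear combination of the original features, the components are uncorrelated with one another, and they are ordered so that the first captures the most variance in the data.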

PCA finds directions where data varies the most.

Python Example: PCA to 2 Components

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}

df = pd.DataFrame(data)

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

pca = PCA(n_components=2)

pca_data = pca.fit_transform(scaled_data)

print("PCA Data:")
print(pca_data)

print("Explained Variance Ratio:")
print(pca.explained_variance_ratio_)
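Explanation: n_components=2 projects the dataset onto 2 new columns (principal components) that retain as much of the variance as possible. explained_variance_ratio_ reports the share of total variance captured by each component.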
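
To show where the components come from, the sketch below reproduces the explained variance ratios with a plain eigen-decomposition of the covariance matrix. It is an illustration of the idea, not a description of how scikit-learn computes PCA internally; variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: Attendance, Marks, Study_Hours
X = np.array([[95, 92, 5], [90, 88, 5], [85, 85, 4],
              [60, 65, 3], [55, 60, 3], [50, 58, 2],
              [30, 35, 1], [35, 38, 1], [40, 40, 2]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)

# Covariance matrix of the scaled features (3 x 3).
cov = np.cov(X_scaled, rowvar=False)

# Eigenvectors give the principal directions; eigenvalues give the variance along each.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

ratio_manual = eigenvalues / eigenvalues.sum()
ratio_sklearn = PCA(n_components=3).fit(X_scaled).explained_variance_ratio_
print("Manual: ", ratio_manual.round(4))
print("sklearn:", ratio_sklearn.round(4))

# Projecting onto the first two directions gives the 2-component representation.
projected = X_scaled @ eigenvectors[:, :2]
print(projected.shape)
```

The projected columns can differ from scikit-learn's output by a sign flip, because an eigenvector and its negative describe the same direction.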

7.13 PCA Visualization

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = {
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}

df = pd.DataFrame(data)

scaled_data = StandardScaler().fit_transform(df)

pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)

plt.scatter(pca_result[:, 0], pca_result[:, 1])
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Visualization")
plt.show()
Expected Graph:
A 2D scatter plot showing data points after dimensionality reduction.
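
Before trusting a 2D plot, it helps to check how much variance the plotted components retain. The sketch below keeps all components and accumulates their explained variance ratios. The 95% threshold is an arbitrary illustrative choice, not a rule.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: Attendance, Marks, Study_Hours
X = np.array([[95, 92, 5], [90, 88, 5], [85, 85, 4],
              [60, 65, 3], [55, 60, 3], [50, 58, 2],
              [30, 35, 1], [35, 38, 1], [40, 40, 2]], dtype=float)

# PCA() with no arguments keeps every component.
pca = PCA().fit(StandardScaler().fit_transform(X))
cumulative = np.cumsum(pca.explained_variance_ratio_)

for i, total in enumerate(cumulative, start=1):
    print(f"{i} component(s): {total:.1%} of variance")

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.95
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print("Components needed for 95%:", n_needed)
```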

7.14 Hidden Pattern Discovery

The main purpose of unsupervised learning is to discover useful hidden structure in data. These patterns may not be obvious in raw tables.

Dataset | Hidden Pattern | Business Use
Customer sales | High-value and low-value customer groups | Marketing strategy
Student performance | At-risk learner group | Early support intervention
Machine sensor data | Abnormal operating pattern | Predictive maintenance
Website analytics | User behavior segments | Personalized content

7.15 Complete Mini Project: Student Segmentation

This project groups students into natural learning segments using K-Means clustering.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

data = {
    "Student": ["Amin", "Mei Ling", "Ravi", "Siti", "John", "Farah", "Kumar", "Ali"],
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1]
}

df = pd.DataFrame(data)

features = df[["Attendance", "Marks", "Study_Hours"]]

scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

model = KMeans(n_clusters=3, random_state=42)

df["Cluster"] = model.fit_predict(scaled_features)

print(df)

plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Segmentation")
plt.show()
Project Interpretation: Students can be grouped into high performers, moderate learners and at-risk learners. This helps trainers plan targeted support.
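
Cluster numbers carry no meaning on their own, so the interpretation step has to be done explicitly. The sketch below is one way to do it: profile each cluster by its average features, then rank clusters by average marks before naming them. The three segment names are illustrative labels supplied by the analyst, not model output.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Student": ["Amin", "Mei Ling", "Ravi", "Siti", "John", "Farah", "Kumar", "Ali"],
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1],
})
features = ["Attendance", "Marks", "Study_Hours"]

scaled = StandardScaler().fit_transform(df[features])
df["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)

# Profile each cluster using the original (unscaled) feature averages.
profile = df.groupby("Cluster")[features].mean()
print(profile.round(1))

# Rank clusters from highest to lowest average marks, then attach readable names.
ranked = profile["Marks"].sort_values(ascending=False).index
names = dict(zip(ranked, ["High performer", "Moderate learner", "At-risk learner"]))
df["Segment"] = df["Cluster"].map(names)
print(df[["Student", "Marks", "Segment"]])
```

Naming by rank rather than by cluster number keeps the labels stable even if a different random seed changes the numbering.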

7.16 Common Beginner Mistakes

Mistake | Problem | Correction
Choosing K randomly | Clusters may not make sense | Use Elbow Method and business understanding
Not scaling features | Large values dominate distance | Use StandardScaler
Assuming clusters are automatically meaningful | Groups may not be useful | Interpret clusters carefully
Ignoring outliers | Clusters may be distorted | Use DBSCAN or clean data
Using PCA without explanation | Components may be hard to understand | Check explained variance ratio
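
One way to guard against the "not scaling features" mistake is to bundle scaling and clustering into a single scikit-learn pipeline, so the scaling step cannot be skipped by accident. The sketch below applies this to the income data from the scaling section. The choice of 2 clusters and `n_init=10` are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Annual_Income": [20000, 25000, 80000, 90000],
    "Spending_Score": [30, 35, 80, 85],
})

# The pipeline scales first and clusters second, in one fit_predict call.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=42),
)

df["Cluster"] = pipeline.fit_predict(df[["Annual_Income", "Spending_Score"]])
print(df)
```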

7.17 Hands-On Activities

Activity 1: K-Means Clustering

Create a dataset of customer income and spending score. Use K-Means to group customers into 3 clusters.

Activity 2: Elbow Method

Run the Elbow Method for K values from 1 to 8 and choose a suitable number of clusters.

Activity 3: DBSCAN

Create a dataset with one unusual outlier and use DBSCAN to detect noise.

Activity 4: PCA

Create a dataset with 4 features and reduce it into 2 principal components using PCA.

Mini Project

Build a student segmentation system that groups students into learning support categories using attendance, marks and study hours.

7.18 Interactive Final Assessment Quiz

Each correct answer gives +1 mark. Each wrong answer gives -0.5 mark.

1. Unsupervised learning uses data without target labels.

2. Clustering is used to:

3. K-Means requires choosing the number of clusters K.

4. The Elbow Method helps choose a suitable K value.

5. DBSCAN can detect noise or outliers.

6. PCA is used for:

7. Scaling is important before distance-based clustering.

8. Silhouette Score can evaluate clustering quality.

9. PCA creates principal components from original features.

10. Unsupervised learning can help discover hidden patterns.


7.19 Chapter Summary

In this chapter, learners studied Unsupervised Learning, clustering, K-Means, Elbow Method, Hierarchical Clustering, DBSCAN, cluster evaluation and PCA. Learners also explored how hidden patterns can support business decisions, student support and anomaly detection.

Remember: Unsupervised Learning is about discovering structure when labels are not available. It helps reveal patterns that may not be visible in raw data.