Chapter 7: Unsupervised Learning
Explore clustering and dimensionality reduction techniques to discover hidden patterns, natural groups, data structure and relationships in unlabeled datasets.
7.1 Chapter Overview
Unsupervised learning is a machine learning approach in which the model learns from data that has no labeled output. Unlike supervised learning, there is no target column such as Pass/Fail or Price. The algorithm studies the structure of the data and discovers hidden patterns without being told what to look for.
Unsupervised learning is commonly used for customer segmentation, market analysis, anomaly detection, document grouping, image compression and dimensionality reduction.
7.2 Supervised vs Unsupervised Learning
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Target label | Available | Not available |
| Goal | Predict known output | Discover hidden structure |
| Example | Predict Pass or Fail | Group students by learning behavior |
| Algorithms | Linear Regression, Logistic Regression | K-Means, DBSCAN, PCA |
7.3 Main Types of Unsupervised Learning
Clustering
Groups similar data points together. Example: grouping customers based on spending behavior.
Dimensionality Reduction
Reduces many features into fewer important components. Example: reducing 20 features into 2 for visualization.
Anomaly Detection
Finds unusual data points. Example: detecting abnormal bank transactions.
Association Discovery
Finds relationships between items. Example: customers who buy bread may also buy butter.
7.4 What is Clustering?
Clustering is the process of grouping similar data points together. The model does not know the group names in advance. It creates groups based on similarity, distance and data patterns.
| Use Case | Possible Clusters |
|---|---|
| Student learning data | High performers, average learners, at-risk learners |
| Customer purchases | Budget buyers, premium buyers, occasional buyers |
| Website behavior | Frequent visitors, new visitors, inactive users |
| Manufacturing sensor data | Normal operation, warning pattern, abnormal pattern |
Visual Idea: Similar points form natural groups or clusters.
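To make "similarity" concrete, one common choice is Euclidean distance: the smaller the distance between two rows, the more alike they are. The sketch below uses three hypothetical students described by attendance and marks; the names `student_a`, `student_b` and `student_c` are illustrative assumptions only.

```python
import numpy as np

# Hypothetical students described by (attendance, marks).
student_a = np.array([95, 92])
student_b = np.array([90, 88])
student_c = np.array([30, 35])

# Euclidean distance: square root of the summed squared differences.
dist_ab = np.linalg.norm(student_a - student_b)
dist_ac = np.linalg.norm(student_a - student_c)

print(f"A to B: {dist_ab:.2f}")
print(f"A to C: {dist_ac:.2f}")
```

Students A and B are much closer to each other than A and C, so a distance-based algorithm would tend to place A and B in the same group.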
7.5 K-Means Clustering
K-Means is one of the most popular clustering algorithms. It divides data into K groups, where K is the number of clusters selected by the user.
How K-Means Works
- Choose the number of clusters K.
- Randomly place cluster centers called centroids.
- Assign each data point to the nearest centroid.
- Move each centroid to the average position of its assigned points.
- Repeat until clusters become stable.
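The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements K-Means: the function name `simple_kmeans`, the choice to start from K randomly picked data points, and the rule for handling an empty cluster are all assumptions made for this example.

```python
import numpy as np

# Attendance and marks for nine students.
X = np.array([
    [95, 92], [90, 88], [85, 85],
    [60, 65], [55, 60], [50, 58],
    [30, 35], [35, 38], [40, 40],
], dtype=float)

def simple_kmeans(X, k, seed=0, max_iter=100):
    """Illustrative K-Means loop (hypothetical helper, for learning only)."""
    rng = np.random.default_rng(seed)
    # Place the initial centroids on k distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        # A centroid that has no points stays where it was.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = simple_kmeans(X, k=3)
print(labels)
print(centroids)
```

Because the starting centroids are random, a different `seed` can give different label numbers or, on harder data, a different grouping.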
Python Working Example: Student Grouping
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Create a small student dataset.
# Attendance and marks are used to discover natural groups.
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}
# Convert dictionary into a Pandas DataFrame.
df = pd.DataFrame(data)
# Create K-Means model with 3 clusters.
model = KMeans(n_clusters=3, random_state=42)
# Train model and assign cluster labels to each row.
df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])
print(df)
# Visualize clusters.
plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Clusters using K-Means")
plt.show()
Expected Output:
A table showing Attendance, Marks and Cluster number for each student.
Expected Graph:
A scatter plot where students are grouped into 3 clusters.
Line-by-Line Explanation
| Code | Explanation |
|---|---|
| from sklearn.cluster import KMeans | Imports the K-Means clustering algorithm. |
| df = pd.DataFrame(data) | Creates a table from the dataset. |
| KMeans(n_clusters=3) | Creates a model that will form 3 groups. |
| fit_predict() | Trains the model and returns cluster labels. |
| plt.scatter() | Draws the clusters on a graph. |
7.6 Choosing the Number of Clusters: Elbow Method
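The Elbow Method helps choose a suitable K value. It compares the within-cluster sum of squares (WCSS), which scikit-learn exposes as inertia_, for different values of K. WCSS generally decreases as K increases, so the useful signal is the point where the improvement slows down sharply. That point is called the elbow.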
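As a rough check of what the within-cluster error measures, the sketch below sums the squared distance from every point to its assigned centroid and compares the result with the `inertia_` attribute of a fitted `KMeans` model. The variable names and the explicit `n_init=10` setting are choices made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Attendance and marks for nine students.
X = np.array([
    [95, 92], [90, 88], [85, 85],
    [60, 65], [55, 60], [50, 58],
    [30, 35], [35, 38], [40, 40],
], dtype=float)

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Look up the centroid assigned to each point, then sum the squared distances.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_wcss = np.sum((X - assigned_centroids) ** 2)

print("Manual WCSS:   ", manual_wcss)
print("model.inertia_:", model.inertia_)
```

The two numbers should agree up to floating-point rounding.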
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}
df = pd.DataFrame(data)
# Record the within-cluster sum of squares (inertia) for K = 1 to 5.
wcss = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=42)
    model.fit(df[["Attendance", "Marks"]])
    wcss.append(model.inertia_)
# Plot WCSS against K and look for the bend in the curve.
plt.plot(range(1, 6), wcss, marker="o")
plt.xlabel("Number of Clusters K")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
7.7 Feature Scaling for Clustering
Distance-based clustering algorithms such as K-Means and DBSCAN compare points by distance. If one feature has much larger values than the others, it dominates the distance calculation. Scaling puts the features on a comparable range.
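The effect can be seen by measuring the distance between two customers before and after standardization. The sketch below also standardizes by hand (subtract the column mean, divide by the column standard deviation) and compares the result with `StandardScaler`. The variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: annual income, spending score.
X = np.array([
    [20000, 30],
    [25000, 35],
    [80000, 80],
    [90000, 85],
], dtype=float)

# Raw distance between the first two customers is driven almost entirely by income.
raw_distance = np.linalg.norm(X[0] - X[1])

# Standardize by hand: subtract the column mean, divide by the column standard deviation.
manual_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler applies the same formula (population standard deviation).
sklearn_scaled = StandardScaler().fit_transform(X)

scaled_distance = np.linalg.norm(manual_scaled[0] - manual_scaled[1])
print("Raw distance:   ", raw_distance)
print("Scaled distance:", scaled_distance)
```

After scaling, income and spending score contribute on a similar footing to the distance between the two customers.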
from sklearn.preprocessing import StandardScaler
import pandas as pd
data = {
"Annual_Income": [20000, 25000, 80000, 90000],
"Spending_Score": [30, 35, 80, 85]
}
df = pd.DataFrame(data)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)
7.8 Hierarchical Clustering
Hierarchical Clustering builds a tree-like structure of clusters. It can be useful when you want to understand how groups are related at different levels.
| Type | Explanation |
|---|---|
| Agglomerative | Starts with each point as its own cluster and merges similar clusters. |
| Divisive | Starts with one large cluster and splits it into smaller clusters. |
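To see the tree-like structure itself, one option is SciPy's hierarchy tools, which build the full merge tree and let you cut it at different levels. This is a sketch under assumptions: SciPy is not used elsewhere in this chapter, and Ward linkage is picked here only as a reasonable default.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Attendance and marks for nine students.
X = np.array([
    [95, 92], [90, 88], [85, 85],
    [60, 65], [55, 60], [50, 58],
    [30, 35], [35, 38], [40, 40],
], dtype=float)

# Build the merge tree bottom-up (agglomerative) using Ward linkage.
Z = linkage(X, method="ward")

# Each row of Z records one merge: the two clusters joined,
# the merge distance, and the size of the new cluster.
print(Z.shape)

# Cut the same tree at two different levels.
labels_3 = fcluster(Z, t=3, criterion="maxclust")
labels_2 = fcluster(Z, t=2, criterion="maxclust")
print("3 clusters:", labels_3)
print("2 clusters:", labels_2)
```

Cutting one tree at several levels shows how smaller groups combine into larger ones without refitting the model.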
Python Example
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}
df = pd.DataFrame(data)
model = AgglomerativeClustering(n_clusters=3)
df["Cluster"] = model.fit_predict(df[["Attendance", "Marks"]])
print(df)
7.9 DBSCAN Clustering
DBSCAN groups points based on density. It can detect irregular cluster shapes and identify noise or outliers.
| Term | Meaning |
|---|---|
| eps | Maximum distance between two points for them to count as neighbors. |
| min_samples | Minimum number of points needed to form a dense region. |
| Noise | Points that do not belong to any cluster. |
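The terms in the table can be checked by hand: count how many points lie within eps of each point, and treat a point as part of a dense region when that count reaches min_samples. The sketch below does this with NumPy and compares the result with the `core_sample_indices_` attribute of a fitted `DBSCAN` model. The helper variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Nine students plus one unusual row at the end (attendance 5, marks 10).
X = np.array([
    [95, 92], [90, 88], [85, 85],
    [60, 65], [55, 60], [50, 58],
    [30, 35], [35, 38], [40, 40],
    [5, 10],
], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

eps, min_samples = 0.8, 2

# Count how many points lie within eps of each point (the point itself is included).
pairwise = np.linalg.norm(X_scaled[:, None, :] - X_scaled[None, :, :], axis=2)
neighbor_counts = (pairwise <= eps).sum(axis=1)

# A point belongs to a dense region when it has at least min_samples neighbors.
manual_core = np.flatnonzero(neighbor_counts >= min_samples)

model = DBSCAN(eps=eps, min_samples=min_samples).fit(X_scaled)
print("Neighbor counts:     ", neighbor_counts)
print("Core points (manual):", manual_core)
print("Core points (DBSCAN):", model.core_sample_indices_)
print("Labels:              ", model.labels_)
```

The last row has no neighbor within eps other than itself, so it is not part of any dense region and receives the noise label -1.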
Python Example
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40, 5],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40, 10]
}
df = pd.DataFrame(data)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
model = DBSCAN(eps=0.8, min_samples=2)
df["Cluster"] = model.fit_predict(scaled_data)
print(df)
Expected Output:
Rows assigned to cluster numbers. Outliers may appear as cluster -1.
7.10 Cluster Evaluation
Because unsupervised learning has no true labels, evaluation is more challenging. We often use internal scores and visual inspection.
| Method | Purpose |
|---|---|
| Silhouette Score | Measures how well points fit into their clusters. Ranges from -1 to 1; higher is better. |
| Inertia / WCSS | Measures compactness in K-Means. Lower values mean tighter clusters. |
| Visualization | Helps inspect whether groups make sense. |
| Business Interpretation | Checks whether clusters are useful in real decisions. |
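Internal scores are most useful when compared across several candidate settings. The sketch below computes the Silhouette Score for K from 2 to 5 on the student data. The range of K and the `scores` dictionary are assumptions made for this example; the score needs at least two clusters, so K=1 is left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Attendance and marks for nine students.
X = np.array([
    [95, 92], [90, 88], [85, 85],
    [60, 65], [55, 60], [50, 58],
    [30, 35], [35, 38], [40, 40],
], dtype=float)

# Fit one model per candidate K and record its Silhouette Score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f"K={k}: silhouette={score:.3f}")
```

A higher score suggests better-separated clusters, but the result should still be checked against a plot and against what the groups mean in practice.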
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
import pandas as pd
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40]
}
df = pd.DataFrame(data)
model = KMeans(n_clusters=3, random_state=42)
labels = model.fit_predict(df)
score = silhouette_score(df, labels)
print("Silhouette Score:", score)
7.11 Dimensionality Reduction
Dimensionality reduction reduces the number of features while keeping important information. It helps with visualization, noise reduction, faster training and simpler analysis.
| Problem | How Dimensionality Reduction Helps |
|---|---|
| Too many features | Reduces complexity |
| Difficult visualization | Converts many features into 2D or 3D |
| Noise in data | Removes less useful variation |
| Slow training | Reduces computation |
7.12 Principal Component Analysis (PCA)
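PCA is a popular dimensionality reduction technique. It transforms the original features into new features called principal components, each of which is a linear combination of the original features. The components are ordered so that the first captures the most variance in the data, the second captures the next most, and so on.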
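One way to see where principal components come from is to compute them by hand from the covariance matrix of the scaled data. The sketch below does this with NumPy and compares the explained variance ratio with scikit-learn's `PCA`. It is an illustration under assumptions (the variable names and the eigen-decomposition route are choices made here) and is not a description of how scikit-learn computes PCA internally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: attendance, marks, study hours.
X = np.array([
    [95, 92, 5], [90, 88, 5], [85, 85, 4],
    [60, 65, 3], [55, 60, 3], [50, 58, 2],
    [30, 35, 1], [35, 38, 1], [40, 40, 2],
], dtype=float)
X_scaled = StandardScaler().fit_transform(X)

# Covariance matrix of the scaled features.
cov = np.cov(X_scaled, rowvar=False)

# Eigenvalues give the variance along each principal direction.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance captured by each component.
manual_ratio = eigenvalues / eigenvalues.sum()

# Project the data onto the top two directions.
manual_projection = X_scaled @ eigenvectors[:, :2]

pca = PCA(n_components=2).fit(X_scaled)
print("Manual ratio: ", manual_ratio[:2])
print("sklearn ratio:", pca.explained_variance_ratio_)
```

The projected values may differ from scikit-learn's by a sign flip per component, which does not change the variance captured.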
Python Example: PCA to 2 Components
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
"Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}
df = pd.DataFrame(data)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)
print("PCA Data:")
print(pca_data)
print("Explained Variance Ratio:")
print(pca.explained_variance_ratio_)
7.13 PCA Visualization
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
data = {
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35, 40],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38, 40],
"Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1, 2]
}
df = pd.DataFrame(data)
scaled_data = StandardScaler().fit_transform(df)
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)
plt.scatter(pca_result[:, 0], pca_result[:, 1])
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Visualization")
plt.show()
Expected Graph:
A 2D scatter plot showing data points after dimensionality reduction.
7.14 Hidden Pattern Discovery
The main purpose of unsupervised learning is to discover useful hidden structure in data. These patterns may not be obvious in raw tables.
| Dataset | Hidden Pattern | Business Use |
|---|---|---|
| Customer sales | High-value and low-value customer groups | Marketing strategy |
| Student performance | At-risk learner group | Early support intervention |
| Machine sensor data | Abnormal operating pattern | Predictive maintenance |
| Website analytics | User behavior segments | Personalized content |
7.15 Complete Mini Project: Student Segmentation
This project groups students into natural learning segments using K-Means clustering.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
data = {
"Student": ["Amin", "Mei Ling", "Ravi", "Siti", "John", "Farah", "Kumar", "Ali"],
"Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
"Marks": [92, 88, 85, 65, 60, 58, 35, 38],
"Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1]
}
df = pd.DataFrame(data)
features = df[["Attendance", "Marks", "Study_Hours"]]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
model = KMeans(n_clusters=3, random_state=42)
df["Cluster"] = model.fit_predict(scaled_features)
print(df)
plt.scatter(df["Attendance"], df["Marks"], c=df["Cluster"])
plt.xlabel("Attendance")
plt.ylabel("Marks")
plt.title("Student Segmentation")
plt.show()
7.16 Common Beginner Mistakes
| Mistake | Problem | Correction |
|---|---|---|
| Choosing K randomly | Clusters may not make sense | Use Elbow Method and business understanding |
| Not scaling features | Large values dominate distance | Use StandardScaler |
| Assuming clusters are automatically meaningful | Groups may not be useful | Interpret clusters carefully |
| Ignoring outliers | Clusters may be distorted | Use DBSCAN or clean data |
| Using PCA without explanation | Components may be hard to understand | Check explained variance ratio |
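A simple way to start interpreting clusters is to summarize the original, unscaled features per cluster. The sketch below uses the student data from the mini project; the `summary` table and the `Students` column name are assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Attendance": [95, 90, 85, 60, 55, 50, 30, 35],
    "Marks": [92, 88, 85, 65, 60, 58, 35, 38],
    "Study_Hours": [5, 5, 4, 3, 3, 2, 1, 1],
})
feature_cols = ["Attendance", "Marks", "Study_Hours"]

# Cluster on scaled features so no single column dominates the distance.
scaled = StandardScaler().fit_transform(df[feature_cols])
df["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)

# Average of each original (unscaled) feature per cluster, plus cluster size.
summary = df.groupby("Cluster")[feature_cols].mean().round(1)
summary["Students"] = df.groupby("Cluster").size()
print(summary)
```

Cluster numbers are arbitrary labels, so names such as "at-risk learners" should be assigned only after reading the per-cluster averages.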
7.17 Hands-On Activities
Activity 1: K-Means Clustering
Create a dataset of customer income and spending score. Use K-Means to group customers into 3 clusters.
Activity 2: Elbow Method
Run the Elbow Method for K values from 1 to 8 and choose a suitable number of clusters.
Activity 3: DBSCAN
Create a dataset with one unusual outlier and use DBSCAN to detect noise.
Activity 4: PCA
Create a dataset with 4 features and reduce it into 2 principal components using PCA.
Mini Project
Build a student segmentation system that groups students into learning support categories using attendance, marks and study hours.
7.18 Interactive Final Assessment Quiz
Each correct answer gives +1 mark. Each wrong answer gives -0.5 mark.
1. Unsupervised learning uses data without target labels.
2. Clustering is used to:
3. K-Means requires choosing the number of clusters K.
4. The Elbow Method helps choose a suitable K value.
5. DBSCAN can detect noise or outliers.
6. PCA is used for:
7. Scaling is important before distance-based clustering.
8. Silhouette Score can evaluate clustering quality.
9. PCA creates principal components from original features.
10. Unsupervised learning can help discover hidden patterns.
7.19 Chapter Summary
In this chapter, learners studied Unsupervised Learning, clustering, K-Means, Elbow Method, Hierarchical Clustering, DBSCAN, cluster evaluation and PCA. Learners also explored how hidden patterns can support business decisions, student support and anomaly detection.