{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 05 - K-Means Clustering\n",
"\n",
"Welcome to the final module of this basic series! We are exploring **Unsupervised Learning** with **K-Means Clustering**.\n",
"\n",
"### Objectives:\n",
"1. **Unsupervised Learning**: Pattern discovery without labels.\n",
"2. **K-Means**: How the algorithm groups data.\n",
"3. **Elbow Method**: Deciding the number of clusters (K).\n",
"\n",
"---"
]
},
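{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make objective 2 concrete, K-Means is usually described as alternating two steps until the assignments stop changing. What follows is a sketch of that textbook formulation (Lloyd's algorithm), not something defined elsewhere in this notebook. The notation is assumed here: $x_i$ is a data point, $\\mu_j$ is the centroid of cluster $j$, $c_i$ is the cluster assigned to $x_i$, and $C_j = \\{ i : c_i = j \\}$ is the set of points assigned to cluster $j$.\n",
"\n",
"$$c_i = \\arg\\min_{j \\in \\{1, \\dots, K\\}} \\lVert x_i - \\mu_j \\rVert^2 \\qquad \\text{(assignment step)}$$\n",
"\n",
"$$\\mu_j = \\frac{1}{|C_j|} \\sum_{i \\in C_j} x_i \\qquad \\text{(update step)}$$\n",
"\n",
"Neither step can increase the total squared distance between points and their assigned centroids, so the procedure converges. It only reaches a local optimum that depends on the initial centroids, which is one reason to try several initializations."
]
},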
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"We generate a synthetic two-feature dataset so that the clusters are easy to see."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import make_blobs\n",
"\n",
"# Generate synthetic data\n",
"X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)\n",
"df = pd.DataFrame(X, columns=['Feature 1', 'Feature 2'])\n",
"\n",
"plt.scatter(df['Feature 1'], df['Feature 2'], s=30, alpha=0.5)\n",
"plt.title(\"Original Data (Unlabeled)\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. K-Means Implementation\n",
"\n",
"### Task 1: Find Optimal K (Elbow Method)\n",
"Calculate the inertia (Within-Cluster Sum of Squares) for each K from 1 to 10, then plot inertia against K."
]
},
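{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reference for the quantity being computed, inertia can be written as below. This is a sketch in assumed notation: $x_i$ are the $n$ data points and $\\mu_j$ are the $K$ fitted centroids. In scikit-learn, a fitted `KMeans` model exposes this value as its `inertia_` attribute.\n",
"\n",
"$$\\text{Inertia}(K) = \\sum_{i=1}^{n} \\min_{1 \\le j \\le K} \\lVert x_i - \\mu_j \\rVert^2$$\n",
"\n",
"Adding clusters generally lowers inertia, so its minimum is not informative by itself. The elbow method instead looks for the K after which the decrease flattens out."
]
},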
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"# Fit K-Means for K = 1..10 and record the inertia of each fit\n",
"inertia = []\n",
"for k in range(1, 11):\n",
"    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)\n",
"    kmeans.fit(X)\n",
"    inertia.append(kmeans.inertia_)\n",
"\n",
"# Plot inertia against K and look for the bend (the elbow)\n",
"plt.plot(range(1, 11), inertia, 'bx-')\n",
"plt.xlabel('Number of clusters (K)')\n",
"plt.ylabel('Inertia')\n",
"plt.title('Elbow Method')\n",
"plt.show()\n",
"```\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 2: Fit K-Means\n",
"Choose K from the elbow plot and fit the model. The bend should appear near K = 4, which matches the four centers used to generate the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
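"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"# Fit with the chosen K and store each point's cluster label\n",
"kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)\n",
"df['cluster'] = kmeans.fit_predict(X)\n",
"```\n",
"</details>"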
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Task 3: Visualize Clusters\n",
"Redraw the scatter plot, coloring each point by its assigned cluster and marking the cluster centroids."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<details>\n",
"<summary>Click to see Solution</summary>\n",
"\n",
"```python\n",
"# Color each point by its cluster label\n",
"plt.scatter(df['Feature 1'], df['Feature 2'], c=df['cluster'], cmap='viridis', s=30)\n",
"# Overlay the fitted centroids\n",
"plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', s=200, marker='X', label='Centroids')\n",
"plt.legend()\n",
"plt.title(\"Clustered Data\")\n",
"plt.show()\n",
"```\n",
"</details>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Congratulations! \n",
"You've completed the foundational Machine Learning practice series. \n",
"You now have hands-on experience with:\n",
"1. EDA & Feature Engineering\n",
"2. Linear Regression\n",
"3. Logistic Regression\n",
"4. Decision Trees & Random Forests\n",
"5. K-Means Clustering\n",
"\n",
"Keep practicing with new datasets!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}