{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 05 - K-Means Clustering\n", "\n", "Welcome to the final module of this basic series! We are exploring **Unsupervised Learning** with **K-Means Clustering**.\n", "\n", "### Objectives:\n", "1. **Unsupervised Learning**: Pattern discovery without labels.\n", "2. **K-Means**: How the algorithm groups data.\n", "3. **Elbow Method**: Deciding the number of clusters (K).\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup\n", "We will generate a synthetic dataset for this exercise to clearly see the clusters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.cluster import KMeans\n", "from sklearn.datasets import make_blobs\n", "\n", "# Generate synthetic data\n", "X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)\n", "df = pd.DataFrame(X, columns=['Feature 1', 'Feature 2'])\n", "\n", "plt.scatter(df['Feature 1'], df['Feature 2'], s=30, alpha=0.5)\n", "plt.title(\"Original Data (Unlabeled)\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. K-Means Implementation\n", "\n", "### Task 1: Find Optimal K (Elbow Method)\n", "Calculate inertia (Within-Cluster Sum of Squares) for K values from 1 to 10." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "# Fit K-Means for K = 1..10 and record the inertia of each fit\n", "inertia = []\n", "for k in range(1, 11):\n", "    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)\n", "    kmeans.fit(X)\n", "    inertia.append(kmeans.inertia_)\n", "\n", "# The elbow is the K after which inertia stops dropping sharply\n", "plt.plot(range(1, 11), inertia, 'bx-')\n", "plt.xlabel('Number of clusters (K)')\n", "plt.ylabel('Inertia')\n", "plt.title('Elbow Method')\n", "plt.show()\n", "```\n", "</details>
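
The task above describes inertia as the Within-Cluster Sum of Squares. As a minimal sketch of what that means, the block below recomputes it by hand with NumPy and compares it with `kmeans.inertia_`. It regenerates the same synthetic data as the Setup cell so that it runs on its own; the names `assigned_centroids` and `manual_inertia` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the Setup cell
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)

kmeans = KMeans(n_clusters=4, random_state=42, n_init=10).fit(X)

# For every point, look up the centroid of the cluster it was assigned to
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared distances from each point to its own centroid
manual_inertia = np.sum((X - assigned_centroids) ** 2)

print(manual_inertia, kmeans.inertia_)
```

The two printed values should agree up to floating-point error, which is why a lower inertia indicates tighter clusters.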
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: Fit K-Means\n", "From the elbow plot, choose the best K (looks like 4) and fit the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
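<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "# Fit K-Means with K=4 and store each point's cluster label\n", "kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)\n", "df['cluster'] = kmeans.fit_predict(X)\n", "```\n", "</details>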
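
To get a feel for what `fit_predict` does, here is a simplified sketch of the classic K-Means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. This is not scikit-learn's implementation. The function `simple_kmeans`, its parameters, and the plain random initialization are assumptions made for illustration (scikit-learn defaults to the smarter k-means++ initialization), so the result may differ from the solution above.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same synthetic data as in the Setup cell
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)

def simple_kmeans(X, k, n_iter=100, seed=0):
    # Hypothetical helper for illustration, not part of scikit-learn
    rng = np.random.default_rng(seed)
    # Simplification: start from k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = simple_kmeans(X, k=4)
print(centroids)
```

With a random start the loop can settle in a poor local optimum, which is one reason the solution passes `n_init=10` to rerun the fit from several starting points and keep the best one.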
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 3: Visualize Clusters\n", "Scatter plot again, but color points by their assigned cluster." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
<details>\n", "<summary>Click to see Solution</summary>\n", "\n", "```python\n", "# Color each point by its assigned cluster\n", "plt.scatter(df['Feature 1'], df['Feature 2'], c=df['cluster'], cmap='viridis', s=30)\n", "# Overlay the fitted centroids as red X markers\n", "plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', s=200, marker='X', label='Centroids')\n", "plt.legend()\n", "plt.title(\"Clustered Data\")\n", "plt.show()\n", "```\n", "</details>
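
The red centroids in the plot are also what the model uses to label unseen data: a new point gets the cluster of its nearest centroid. The sketch below checks this by comparing `kmeans.predict` with a nearest-centroid rule computed by hand. The coordinates in `new_points` are made up purely for illustration, and the data is regenerated so that the block runs on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data and model settings as in the earlier tasks
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10).fit(X)

# Hypothetical new observations, chosen only for illustration
new_points = np.array([[0.0, 0.0], [-7.0, -7.0], [5.0, 2.0]])

# Cluster labels according to the fitted model
predicted = kmeans.predict(new_points)

# Nearest-centroid rule computed by hand
distances = np.linalg.norm(
    new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

print(predicted, nearest)
```

The label numbers themselves are arbitrary, so only the agreement between the two arrays is meaningful here.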
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Congratulations! \n", "You've completed the foundational Machine Learning practice series. \n", "You now have hands-on experience with:\n", "1. EDA & Feature Engineering\n", "2. Linear Regression\n", "3. Logistic Regression\n", "4. Decision Trees & Random Forests\n", "5. K-Means Clustering\n", "\n", "Keep practicing with new datasets!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }