241005
week2/C1_W2_Lab06_Sklearn_Normal_Soln.ipynb
							@@ -0,0 +1,241 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Optional Lab: Linear Regression using Scikit-Learn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab you will:\n",
    "- Utilize scikit-learn to implement linear regression using a closed-form solution based on the normal equation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tools\n",
    "You will utilize functions from scikit-learn as well as matplotlib and NumPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from lab_utils_multi import load_house_data\n",
    "plt.style.use('./deeplearning.mplstyle')\n",
    "np.set_printoptions(precision=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"toc_40291_2\"></a>\n",
    "# Linear Regression, closed-form solution\n",
    "Scikit-learn has the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) which implements a closed-form linear regression.\n",
    "\n",
    "Let's use the data from the earlier labs: a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n",
    "\n",
    "| Size (1000 sqft) | Price (1000s of dollars) |\n",
    "| ---------------- | ------------------------ |\n",
    "| 1                | 300                      |\n",
    "| 2                | 500                      |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = np.array([1.0, 2.0])   # features: size in 1000 sqft\n",
    "y_train = np.array([300, 500])   # target values: price in 1000s of dollars"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create and fit the model\n",
    "The code below performs regression using scikit-learn.  \n",
    "The first step creates a regression object.  \n",
    "The second step utilizes one of the methods associated with the object, `fit`. This performs regression, fitting the parameters to the input data. The toolkit expects a two-dimensional X matrix, which is why the 1-D `X_train` array is reshaped with `reshape(-1, 1)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "linear_model = LinearRegression()\n",
    "# X must be a 2-D matrix\n",
    "linear_model.fit(X_train.reshape(-1, 1), y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Parameters\n",
    "The $\\mathbf{w}$ and $\\mathbf{b}$ parameters are referred to as 'coefficients' and 'intercept' in scikit-learn and are available as the `coef_` and `intercept_` attributes of the fitted model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = linear_model.intercept_\n",
    "w = linear_model.coef_\n",
    "print(f\"w = {w}, b = {b:0.2f}\")\n",
    "# Size is in units of 1000 sqft, so a 1200 sqft house is x = 1.2\n",
    "print(f\"'manual' prediction: f_wb = wx+b : {1.2*w + b}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make Predictions\n",
    "\n",
    "Calling the `predict` function generates predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = linear_model.predict(X_train.reshape(-1, 1))\n",
    "\n",
    "print(\"Prediction on training set:\", y_pred)\n",
    "\n",
    "# Size is in units of 1000 sqft, so a 1200 sqft house is 1.2\n",
    "X_test = np.array([[1.2]])\n",
    "# Predictions are in 1000s of dollars; multiply by 1000 to print dollars\n",
    "print(f\"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]*1000:0.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Second Example\n",
    "The second example is from an earlier lab with multiple features. The final parameter values and predictions are very close to the results from the un-normalized 'long-run' from that lab. That un-normalized run took hours to produce results, while this is nearly instantaneous. The closed-form solution works well on smaller data sets such as these but can be computationally demanding on larger data sets.\n",
    ">The closed-form solution does not require normalization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the dataset\n",
    "X_train, y_train = load_house_data()\n",
    "X_features = ['size(sqft)','bedrooms','floors','age']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "linear_model = LinearRegression()\n",
    "linear_model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = linear_model.intercept_\n",
    "w = linear_model.coef_\n",
    "print(f\"w = {w}, b = {b:0.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Prediction on training set:\\n {linear_model.predict(X_train)[:4]}\")\n",
    "print(f\"Prediction using w, b:\\n {(X_train @ w + b)[:4]}\")\n",
    "print(f\"Target values:\\n {y_train[:4]}\")\n",
    "\n",
    "x_house = np.array([1200, 3, 1, 40]).reshape(-1, 4)\n",
    "x_house_predict = linear_model.predict(x_house)[0]\n",
    "print(f\"Predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Congratulations!\n",
    "In this lab you:\n",
    "- utilized an open-source machine learning toolkit, scikit-learn\n",
    "- implemented linear regression using a closed-form solution from that toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
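
For readers who want to see what a closed-form fit computes on the notebook's first example, the following is a minimal sketch in plain Python. It uses the textbook one-feature least-squares formulas (slope as covariance over variance, intercept from the means) and is not a description of how scikit-learn implements `LinearRegression` internally. The helper name `fit_line` is hypothetical, and the data and units (size in 1000 sqft, price in 1000s of dollars) are taken from the notebook's table.

```python
def fit_line(x, y):
    """Closed-form least squares for a single feature; returns (w, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of co-deviations divided by sum of squared x deviations
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    w = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    b = y_mean - w * x_mean
    return w, b


x_train = [1.0, 2.0]      # size in 1000 sqft
y_train = [300.0, 500.0]  # price in 1000s of dollars

w, b = fit_line(x_train, y_train)
prediction = w * 1.2 + b  # a 1200 sqft house is x = 1.2 in these units
print(f"w = {w:0.2f}, b = {b:0.2f}, prediction = {prediction:0.2f}")
```

With only two training points the line passes through both exactly, so the fitted parameters can be checked by hand against the table.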