Notes on Andrew Ng: Machine Learning, Week 1
1. Introduction
What is Machine Learning
definition
Arthur Samuel (an older, informal definition):
The field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (a more modern definition):
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications:
Supervised learning and Unsupervised learning.
Supervised Learning
In supervised learning, we are given a data set and already know what our correct output should look like,
having the idea that there is a relationship between the input and the output.
Regression problem or classification problem
Supervised learning problems are categorized into “regression” and “classification” problems:
- In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
- In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
Example:
- Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem. We could turn this example into a classification problem by instead making our output about whether the house “sells for more or less than the asking price.” Here we are classifying the houses based on price into two discrete categories. (A small sketch of the regression version follows after these examples.)
- (a) Regression: Given a picture of a person, we have to predict their age on the basis of the given picture.
- (b) Classification: Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
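The housing example above can be made concrete with a tiny script. This is a minimal sketch, not code from the course: it fits a straight line to a few invented (size, price) pairs with NumPy and then predicts the price of an unseen house; the data values and the asking price are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical training data: house sizes in square feet and prices in $1000s.
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
prices = np.array([200.0, 280.0, 370.0, 450.0, 540.0])

# Fit a degree-1 polynomial (a straight line): price ≈ slope * size + intercept.
slope, intercept = np.polyfit(sizes, prices, deg=1)

# Regression: predict a continuous output for a new input.
new_size = 1750.0
predicted_price = slope * new_size + intercept
print(f"Predicted price for {new_size:.0f} sq ft: ${predicted_price:.0f}k")

# Classification version: discretize the same output into two categories.
asking_price = 320.0  # assumed asking price, in $1000s
print("more than asking" if predicted_price > asking_price else "less than asking")
```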
Unsupervised Learning
Unsupervised learning allows us to approach problems with little or no idea what our results should look like.
We can derive structure from data where we don’t necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables in the data.
With unsupervised learning there is no feedback based on the prediction results.
Example: Clustering or Non-clustering
- Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on. (A small clustering sketch follows after these examples.)
- Non-clustering: The “Cocktail Party Algorithm” allows you to find structure in a chaotic environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
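To illustrate the clustering idea (again a sketch, not course code), the snippet below groups a handful of invented 2-D “gene feature” vectors with scikit-learn's KMeans; the feature values and the choice of two clusters are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: each row is a gene described by two made-up features
# (say, a lifespan-related score and an expression level). No labels are given.
genes = np.array([
    [1.0, 2.1], [0.9, 1.8], [1.2, 2.0],   # one loose group
    [8.0, 9.1], [7.8, 8.9], [8.3, 9.5],   # another loose group
])

# KMeans discovers group structure on its own; the output is a cluster index per gene,
# not a "correct answer" checked against known labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(genes)
print(labels)  # e.g. [0 0 0 1 1 1]
```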
2. Linear Regression with One Variable
Linear regression predicts a real-valued output based on an input value. We discuss the application of linear regression to housing price prediction, present the notion of a cost function, and introduce the gradient descent method for learning.
Model and Cost Function
Model Representation


When the target variable that we’re trying to predict is continuous, such as in our housing example,
we call the learning problem a regression problem. When y can take on only a small number of discrete values
(such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say),
we call it a classification problem.
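For the single-variable housing example, the hypothesis is a straight line in the input; the standard course notation, reproduced here for reference, uses a training set of m examples (x^(i), y^(i)):

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```

θ0 is the intercept and θ1 the slope; learning amounts to choosing these two parameters so that h_θ(x^(i)) is close to y^(i) on the training set.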
Cost Function
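For reference, the squared-error cost function used throughout the course measures the average squared difference between the hypothesis's predictions and the true outputs over the m training examples:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \right)^2
```

The factor 1/2 is a convenience that cancels when the cost is differentiated for gradient descent; the goal is to choose θ0 and θ1 that minimize J.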

Cost Function - Intuition 1
Cost Function - Intuition 2
Parameter Learning
Gradient Descent
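For reference, the gradient descent update rule from the lecture repeatedly moves each parameter a small step opposite to the partial derivative of the cost:

```latex
\text{repeat until convergence: } \{ \\
\quad \theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\quad \text{for } j = 0 \text{ and } j = 1 \\
\}
```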

Note: θ0 and θ1 must be updated simultaneously, i.e. both right-hand sides are computed with the current parameter values before either parameter is overwritten.
Learning rate
The way we do this is by taking the derivative of our cost function. The slope of the tangent line at the current point is the derivative there, and it gives us a direction to move in.
We make steps down the cost function in the direction of steepest descent.
The size of each step is determined by the parameter α, which is called the learning rate.
A smaller α results in a smaller step and a larger α results in a larger step.
Gradient Descent Intuition

Learning rate
- We should adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time: if α is too small, gradient descent takes tiny steps and is slow to reach the minimum; if α is too large, it can overshoot the minimum, fail to converge, or even diverge.

- How does gradient descent converge with a fixed step size α?
As we approach a local minimum, the derivative term gets smaller, so gradient descent automatically takes smaller steps; at the minimum itself the derivative is zero and the parameters stop changing. There is therefore no need to decrease α over time.
Gradient Descent For Linear Regression
When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived.
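Substituting h_θ(x) = θ0 + θ1x into J and working out the partial derivatives gives the linear-regression-specific form (the standard course result):

```latex
\text{repeat until convergence: } \{ \\
\quad \theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \\
\quad \theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)} \\
\}
```

where m is the size of the training set and both parameters are updated simultaneously.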
The point: if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
This method looks at every example in the entire training set on every step, and is called batch gradient descent.
Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global optimum and no other local optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex quadratic function.
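To tie the pieces together, here is a minimal NumPy sketch of batch gradient descent for univariate linear regression. The toy dataset, the learning rate α = 0.01, and the 10,000 iterations are all assumptions made for the example; it is an illustration of the update rule above, not the course's reference implementation.

```python
import numpy as np

# Toy training data (hypothetical): sizes in 1000s of sq ft, prices in $1000s.
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([200.0, 280.0, 370.0, 450.0, 540.0])
m = len(x)

theta0, theta1 = 0.0, 0.0   # initial guess for the parameters
alpha = 0.01                # learning rate (assumed value)

for _ in range(10_000):
    predictions = theta0 + theta1 * x   # h_theta(x^(i)) for every training example
    errors = predictions - y            # h_theta(x^(i)) - y^(i)
    # Batch gradient descent: every training example contributes to each step,
    # and both parameters are updated simultaneously.
    grad0 = errors.mean()
    grad1 = (errors * x).mean()
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

cost = ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)
print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2f}, J = {cost:.4f}")
```

Because J is convex, this run ends near the single global minimum regardless of the (zero) starting point, as the note above explains.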