Read and plot data

# graphical parameters: thicker lines, cross plotting symbol
par(lwd = 2, pch = 4)
# read the comma-separated training data (no header line)
data <- read.table('ex1data1.txt',
                   sep = ",",
                   encoding = "UTF-8",
                   header = FALSE)

# plot the data
plot(data, col = "red",
     xlab = 'Population of City in 10,000s',
     ylab = 'Profit in $10,000s')

Gradient descent: initial values

# some gradient descent settings
theta_init <- c(0, 0)  # initial fitting parameters
iterations <- 1500
alpha <- 0.01

# compute initial cost
cost_init <- computeCost(data, theta_init)
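`computeCost` itself is not defined in this document. As a hedged sketch, assuming `data` holds the population x in its first column and the profit y in its second, and `theta` is `c(theta0, theta1)`, the usual mean-squared-error cost \(J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\Theta(x^{(i)}) - y^{(i)})^2\) could be implemented as:

```r
# Hypothetical computeCost (assumed, not shown in the source):
# mean squared error over the m training examples
computeCost <- function(data, theta) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])  # prepend an intercept column of ones
  y <- data[[2]]
  sum((X %*% theta - y)^2) / (2 * m)
}
```

With this definition a perfect fit has cost 0, and the cost grows quadratically with the residuals.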

Basic parameters for gradient descent:
* \(\alpha\): learning rate 0.01
* \(\Theta_{\text{init}}\): vector of initial parameters \(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\)
* iterations: 1500
* initial cost: \(J(\Theta_{\text{init}})= J\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\right) = 32.0727339\)

Gradient descent: results

# run gradient descent
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
theta <- grad_desc$theta
theta_vec <- grad_desc$theta_vec
cost_final <- computeCost(data, theta)

# plot cost development over the iterations
plotCostDev(grad_desc)

# plot the linear fit
plotLinearFit(data, grad_desc$theta)

* resulting theta: \(\Theta=\begin{pmatrix} -3.6302914 \\ 1.1663624 \end{pmatrix}\)
* resulting cost: \(J(\Theta)= 4.4833883\)
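`gradientDescent` is likewise not shown. A minimal sketch consistent with how its result is used above (a list with the final `theta` and a per-iteration history `theta_vec`), assuming standard batch updates \(\theta_j \leftarrow \theta_j - \frac{\alpha}{m}\sum_i (h_\Theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}\):

```r
# Hypothetical gradientDescent (assumed): batch updates on both parameters,
# recording theta after every iteration in theta_vec
gradientDescent <- function(data, theta, alpha, iterations) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])  # design matrix with intercept column
  y <- data[[2]]
  theta_vec <- matrix(0, nrow = iterations, ncol = 2)
  for (i in 1:iterations) {
    # simultaneous update of both parameters via the vectorized gradient
    theta <- theta - (alpha / m) * t(X) %*% (X %*% theta - y)
    theta_vec[i, ] <- theta
  }
  list(theta = as.vector(theta), theta_vec = theta_vec)
}
```

Keeping the full `theta_vec` history is what makes the cost-development and contour plots below possible.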

Gradient descent: some predicted values

# predict profits for population sizes of 35,000 and 70,000
# (population is given to h in units of 10,000s)
predict1 <- h(theta, c(1, 3.5))
predict2 <- h(theta, c(1, 7))

For population = 35,000, we predict a profit of about $4,520 (predict1 × 10,000).
For population = 70,000, we predict a profit of about $45,342 (predict2 × 10,000).
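The hypothesis function `h` used for these predictions is not defined in this document. Assuming the usual linear hypothesis \(h_\Theta(x) = \Theta^{T}x\) with `x` carrying a leading 1 for the intercept, a one-line sketch:

```r
# Hypothetical hypothesis h (assumed): linear model theta' * x,
# where x = c(1, population_in_10000s)
h <- function(theta, x) sum(theta * x)
```

Its result is in units of $10,000s, so multiplying by 10,000 gives the dollar amount quoted above.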

Contour plot of the cost function

plotCostSurface(data,grad_desc)
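`plotCostSurface` is not defined here either. One possible sketch, assuming it evaluates the cost on a \(\theta_0 \times \theta_1\) grid and draws a contour plot marking the gradient-descent result; the grid ranges and the logarithmically spaced contour levels are assumptions:

```r
# Hypothetical plotCostSurface (assumed): evaluate J on a grid of
# (theta0, theta1) pairs and draw contours, marking the found optimum
plotCostSurface <- function(data, grad_desc,
                            t0_vals = seq(-10, 10, length.out = 100),
                            t1_vals = seq(-1, 4, length.out = 100)) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])
  y <- data[[2]]
  # cost for a single (t0, t1) pair, vectorized for outer()
  J <- outer(t0_vals, t1_vals,
             Vectorize(function(t0, t1) sum((X %*% c(t0, t1) - y)^2) / (2 * m)))
  contour(t0_vals, t1_vals, J, levels = 10^seq(-2, 3, length.out = 20),
          xlab = expression(theta[0]), ylab = expression(theta[1]))
  points(grad_desc$theta[1], grad_desc$theta[2], col = "red", pch = 4)
  invisible(J)
}
```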


Cut through contour plot: \(\theta_1\) fixed to optimum

# vary theta0 only, with theta1 fixed to its optimum
t1 <- theta[2]
t0_vals <- seq(from = -10, to = 10, length.out = 100)
J0_vals <- rep(0, times = length(t0_vals))
for (i in seq_along(t0_vals)) {
  J0_vals[i] <- computeCost(data, c(t0_vals[i], t1))
}

plot(t0_vals, J0_vals, type = "l", col = "red",
     xlab = "t0", ylab = "cost",
     main = paste("t1 fixed to optimum:", round(t1, digits = 3)))


Cut through contour plot: \(\theta_0\) fixed to optimum

# vary theta1 only, with theta0 fixed to its optimum
t0 <- theta[1]
t1_vals <- seq(from = -1, to = 4, length.out = 100)
J1_vals <- rep(0, times = length(t1_vals))
for (i in seq_along(t1_vals)) {
  J1_vals[i] <- computeCost(data, c(t0, t1_vals[i]))
}

plot(t1_vals, J1_vals, type = "l", col = "red",
     xlab = "t1", ylab = "cost",
     main = paste("t0 fixed to optimum:", round(t0, digits = 3)))


Questions

  1. What happens if one starts gradient descent with the optimal \(\Theta\)-value?
  2. Does gradient descent ever reach the optimal \(\Theta\)-value?
  3. Derive the formulas for the partial derivatives \(\frac{\partial}{\partial\theta_0}J(\Theta)\) and \(\frac{\partial}{\partial\theta_1}J(\Theta)\).
  4. Name three degenerate matrices (i.e., matrices that have no inverse).