Read and plot data

# graphical parameters: thicker lines, cross plotting symbol
par(lwd = 2, pch = 4)
# read the comma-separated training data (no header line)
data <- read.table('ex1data1.txt',
                   sep = ",",
                   encoding = "UTF-8",
                   header = FALSE)

# plot the data
plot(data, col = "red",
     xlab = 'Population of City in 10,000s',
     ylab = 'Profit in $10,000s')

Gradient descent: initial values

# some gradient descent settings
theta_init <- c(0, 0)  # initial fitting parameters
iterations <- 1500
alpha <- 0.01

# compute initial cost
cost_init <- computeCost(data, theta_init)
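`computeCost` itself is not defined in this document. As a hedged sketch, assuming `data` holds the population x in its first column and the profit y in its second, and `theta` is `c(theta0, theta1)`, the usual mean-squared-error cost \(J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\Theta(x^{(i)}) - y^{(i)})^2\) could be implemented as:

```r
# Hypothetical computeCost (assumed, not shown in the source):
# mean squared error over the m training examples
computeCost <- function(data, theta) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])  # prepend an intercept column of ones
  y <- data[[2]]
  sum((X %*% theta - y)^2) / (2 * m)
}
```

With this definition a perfect fit has cost 0, and the cost grows quadratically with the residuals.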

Basic parameters for gradient descent:
* \(\alpha\): learning rate 0.01
* \(\Theta_{\text{init}}\): vector of initial parameters \(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\)
* iterations: 1500
* initial cost: \(J(\Theta_{\text{init}})= J\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\right) = 32.0727339\)

Gradient descent: results

# run gradient descent
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
theta <- grad_desc$theta
theta_vec <- grad_desc$theta_vec
cost_final <- computeCost(data, theta)

# plot cost development over the iterations
plotCostDev(grad_desc)

# plot the linear fit
plotLinearFit(data, grad_desc$theta)

* resulting theta: \(\Theta=\begin{pmatrix} -3.6302914 \\ 1.1663624 \end{pmatrix}\)
* resulting cost: \(J(\Theta)= 4.4833883\)
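`gradientDescent` is likewise not shown. A minimal sketch consistent with how its result is used above (a list with the final `theta` and a per-iteration history `theta_vec`), assuming standard batch updates \(\theta_j \leftarrow \theta_j - \frac{\alpha}{m}\sum_i (h_\Theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}\):

```r
# Hypothetical gradientDescent (assumed): batch updates on both parameters,
# recording theta after every iteration in theta_vec
gradientDescent <- function(data, theta, alpha, iterations) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])  # design matrix with intercept column
  y <- data[[2]]
  theta_vec <- matrix(0, nrow = iterations, ncol = 2)
  for (i in 1:iterations) {
    # simultaneous update of both parameters via the vectorized gradient
    theta <- theta - (alpha / m) * t(X) %*% (X %*% theta - y)
    theta_vec[i, ] <- theta
  }
  list(theta = as.vector(theta), theta_vec = theta_vec)
}
```

Keeping the full `theta_vec` history is what makes the cost-development and contour plots below possible.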

Gradient descent: some predicted values

# predict profits for population sizes of 35,000 and 70,000
# (population is given to h in units of 10,000s)
predict1 <- h(theta, c(1, 3.5))
predict2 <- h(theta, c(1, 7))

For population = 35,000, we predict a profit of about $4,520 (predict1 × 10,000).
For population = 70,000, we predict a profit of about $45,342 (predict2 × 10,000).
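The hypothesis function `h` used for these predictions is not defined in this document. Assuming the usual linear hypothesis \(h_\Theta(x) = \Theta^{T}x\) with `x` carrying a leading 1 for the intercept, a one-line sketch:

```r
# Hypothetical hypothesis h (assumed): linear model theta' * x,
# where x = c(1, population_in_10000s)
h <- function(theta, x) sum(theta * x)
```

Its result is in units of $10,000s, so multiplying by 10,000 gives the dollar amount quoted above.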

Contour plot of the cost function

plotCostSurface(data,grad_desc)
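`plotCostSurface` is not defined here either. One possible sketch, assuming it evaluates the cost on a \(\theta_0 \times \theta_1\) grid and draws a contour plot marking the gradient-descent result; the grid ranges and the logarithmically spaced contour levels are assumptions:

```r
# Hypothetical plotCostSurface (assumed): evaluate J on a grid of
# (theta0, theta1) pairs and draw contours, marking the found optimum
plotCostSurface <- function(data, grad_desc,
                            t0_vals = seq(-10, 10, length.out = 100),
                            t1_vals = seq(-1, 4, length.out = 100)) {
  m <- nrow(data)
  X <- cbind(1, data[[1]])
  y <- data[[2]]
  # cost for a single (t0, t1) pair, vectorized for outer()
  J <- outer(t0_vals, t1_vals,
             Vectorize(function(t0, t1) sum((X %*% c(t0, t1) - y)^2) / (2 * m)))
  contour(t0_vals, t1_vals, J, levels = 10^seq(-2, 3, length.out = 20),
          xlab = expression(theta[0]), ylab = expression(theta[1]))
  points(grad_desc$theta[1], grad_desc$theta[2], col = "red", pch = 4)
  invisible(J)
}
```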


Cut through contour plot: \(\theta_1\) fixed to optimum

# vary theta0 only, with theta1 fixed to its optimum
t1 <- theta[2]
t0_vals <- seq(from = -10, to = 10, length.out = 100)
J0_vals <- rep(0, times = length(t0_vals))
for (i in seq_along(t0_vals)) {
  J0_vals[i] <- computeCost(data, c(t0_vals[i], t1))
}

plot(t0_vals, J0_vals, type = "l", col = "red",
     xlab = "t0", ylab = "cost",
     main = paste("t1 fixed to optimum:", round(t1, digits = 3)))


Cut through contour plot: \(\theta_0\) fixed to optimum

# vary theta1 only, with theta0 fixed to its optimum
t0 <- theta[1]
t1_vals <- seq(from = -1, to = 4, length.out = 100)
J1_vals <- rep(0, times = length(t1_vals))
for (i in seq_along(t1_vals)) {
  J1_vals[i] <- computeCost(data, c(t0, t1_vals[i]))
}

plot(t1_vals, J1_vals, type = "l", col = "red",
     xlab = "t1", ylab = "cost",
     main = paste("t0 fixed to optimum:", round(t0, digits = 3)))


Questions

  1. What happens if one starts gradient descent with the optimal \(\Theta\)-value?
  2. Does gradient descent ever reach the optimal \(\Theta\)-value?
  3. Derive the formulas for the partial derivatives \(\frac{\partial}{\partial\theta_0}J(\Theta)\) and \(\frac{\partial}{\partial\theta_1}J(\Theta)\).
  4. Name three degenerate matrices (i.e., matrices that have no inverse).