02 May, 2017
In the following, we will try alternative cost functions for the data set `data1`.
\[\begin{eqnarray} J \left(\begin{pmatrix}\theta_0\\ \theta_1\end{pmatrix} \right) &=& \frac{1}{2m}\sum_{i=1}^{m} (h(x^{(i)})-y^{(i)})^2 \\ &=& \frac{1}{2m}\sum_{i=1}^{m} ((\theta_0 + \theta_1 x^{(i)})-y^{(i)})^2 \\ &=& \frac{1}{2m}(X\times\Theta'-y)'\times (X\times\Theta'-y) \end{eqnarray}\]
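The helper `computeCost` used below is defined elsewhere; the following is only a minimal sketch of a vectorized implementation of the formula above, assuming `data` holds \(x\) in the first column and \(y\) in the second:

# vectorized squared-error cost; a sketch, assuming data = cbind(x, y)
computeCost <- function(data, theta) {
  m <- nrow(data)                  # number of samples
  X <- cbind(rep(1, m), data[, 1]) # design matrix with intercept column
  y <- data[, 2]
  res <- X %*% theta - y           # h(x^(i)) - y^(i) for all samples
  sum(res^2) / (2 * m)
}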
## experiment with cost functions
### original cost function
# graphical parameter
par(lwd=2,pch=4)
### Play with data and linear regression
n = 120 # number of samples (the m in the cost formula above)
m = 10 # max value of x (min is zero); not the m in the formula
X <- seq(from=0,to=m,length.out=n)
y <- rep(1,times=n)
y[seq(from=1,to=n,by=3)] <- 4
data1 <- cbind(X,y)
data <- data1
plot(data,col="red",main="data1")
Looking at the data, it is clear that the optimal value for \(\theta_1\) is \(0\). Let \(J_0\) denote the cost function with \(\theta_1=0\) fixed, thus \(J_0(\theta)=J(\theta,0)\).
# vary only theta0
theta0_vals <- seq(from=0,to=4,length.out=100)
cost_vals <- rep(1,times=100)
for (i in 1:100){
cost_vals[i] <- computeCost(data,c(theta0_vals[i],0))
}
plot(theta0_vals,cost_vals,type="l",main="Original cost function with fixed theta1=0")
plotCostSurface_pure(data,computeCost,c(-3,3),c(-2,2))
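`plotCostSurface_pure` is a helper defined elsewhere; presumably it evaluates the given cost function on a grid of \((\theta_0,\theta_1)\) values and draws contours. A rough sketch under that assumption (the grid resolution is made up):

# contours of costFn over a (theta0, theta1) grid; a rough sketch
plotCostSurface_pure <- function(data, costFn, t0_range, t1_range) {
  t0 <- seq(t0_range[1], t0_range[2], length.out = 60)
  t1 <- seq(t1_range[1], t1_range[2], length.out = 60)
  z <- outer(t0, t1, Vectorize(function(a, b) costFn(data, c(a, b))))
  contour(t0, t1, z, xlab = "theta0", ylab = "theta1")
}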
# tabulate the cost at selected theta0 values (theta1 fixed at 0)
val_vec <- c(0,1,1.5,2,2.5,3,4)
val_df <- data.frame()
for (i in val_vec){
  val_df <- rbind(val_df,c(i,round(computeCost(data,c(i,0)),digits=3)))
}
names(val_df) <- c("t0","cost")
val_df$t1 <- 0
val_df <- val_df[,c(1,3,2)] # reorder columns to t0, t1, cost
The optimal \(\theta_0\)-value for the original cost function is \(2\): the squared-error minimizer of a constant fit is the mean of \(y\), here \(\frac{2}{3}\cdot 1+\frac{1}{3}\cdot 4=2\). That gives the following linear fit:
plotLinearFit(data,c(2,0))
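`plotLinearFit` is likewise defined elsewhere; in essence it plots the data and adds the line \(y=\theta_0+\theta_1 x\), with the original cost in the subtitle. A sketch of what it might do:

# data plus fitted line; a sketch of what plotLinearFit might do
plotLinearFit <- function(data, theta) {
  plot(data, col = "red",
       sub = paste("cost:", round(computeCost(data, theta), digits = 3)))
  abline(a = theta[1], b = theta[2], col = "blue") # intercept theta0, slope theta1
}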
### cost function 1
data <- data1
# vary only theta0
theta0_vals <- seq(from=0,to=4,length.out=100)
cost_vals <- rep(1,times=100)
for (i in 1:100){
cost_vals[i] <- computeCost1(data,c(theta0_vals[i],0))
}
plot(theta0_vals,cost_vals,type="l")
plotCostSurface_pure(data,computeCost1,c(-3,3),c(-2,2))
cost1<- c()
for (i in val_vec){
cost1<- c(cost1,round(computeCost1(data,c(i,0)),digits=3))
}
val_df$cost1 <- cost1
The optimal \(\theta_0\)-value for the 1st alternative cost function is \(1\): the absolute-error minimizer of a constant fit is the median of \(y\), and two thirds of the \(y\)-values are \(1\). That gives the following linear fit:
plotLinearFit(data,c(1,0))
Careful: the cost given in the subtitle of the plot is the cost computed by the original cost function. The cost of the 1st alternative cost function at the parameters \(\theta_0=1\) and \(\theta_1=0\) is \(1\) (\(=\frac{1}{m}(0\cdot\frac{2m}{3}+3\cdot\frac{m}{3})\), with \(m\) the number of samples).
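The computation above is just the mean absolute error, so `computeCost1` is presumably the MAE; a minimal sketch under that assumption (not the original definition):

# mean-absolute-error cost; a sketch consistent with the values above
computeCost1 <- function(data, theta) {
  m <- nrow(data)
  X <- cbind(rep(1, m), data[, 1])
  y <- data[, 2]
  mean(abs(X %*% theta - y))
}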
### cost function 2
data <- data1
# vary only theta0
theta0_vals <- seq(from=0,to=4,length.out=100)
cost_vals <- rep(1,times=100)
for (i in 1:100){
cost_vals[i] <- computeCost2(data,c(theta0_vals[i],0))
}
plot(theta0_vals,cost_vals,type="l")
plotCostSurface_pure(data,computeCost2,c(-3,3),c(-2,2))
cost2<-c()
for (i in val_vec){
cost2<- c(cost2,round(computeCost2(data,c(i,0)),digits=3))
}
val_df$cost2 <- cost2
print("cost values for selected t0 values")
## [1] "cost values for selected t0 values"
print(val_df,row.names=FALSE)
##   t0 t1  cost cost1 cost2
##  0.0  0 3.000 2.000   240
##  1.0  0 1.500 1.000   120
##  1.5  0 1.125 1.167    60
##  2.0  0 1.000 1.333     0
##  2.5  0 1.125 1.500    60
##  3.0  0 1.500 1.667   120
##  4.0  0 3.000 2.000   240
The optimal \(\theta_0\)-value for the 2nd alternative cost function is \(2\): the signed residuals cancel exactly at the mean of \(y\). That gives the following linear fit:
plotLinearFit(data,c(2,0))
Careful: the cost given in the subtitle of the plot is the cost computed by the original cost function. The cost of the 2nd alternative cost function at the parameters \(\theta_0=2\) and \(\theta_1=0\) is \(0\) (\(=-1\cdot\frac{2m}{3}+2\cdot\frac{m}{3}=0\)).
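The tabulated values are consistent with `computeCost2` being the absolute value of the unnormalized signed residual sum; a sketch under that assumption:

# |sum of signed residuals|, without 1/m normalization; a sketch
computeCost2 <- function(data, theta) {
  m <- nrow(data)
  X <- cbind(rep(1, m), data[, 1])
  y <- data[, 2]
  abs(sum(X %*% theta - y))
}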
## Experiments on setting the initial parameters
# data preparation
X <- seq(from=0,to=m,length.out=n)
y <- 2 + 4*X # true line: intercept 2, slope 4
res_vec <- rnorm(n, mean = 0, sd = 8) # gaussian noise
y <- y + res_vec
data4 <- cbind(X,y)
data <- data4 # select the data you are interested in
X <- cbind(rep(1,times=dim(data)[1]),data[,1]) # design matrix with intercept column
y <- data[,2]
# plot data
plot(data,col="red")
# gradient descent
theta_init = c(-20,-10)
iterations = 1000
alpha = 0.058 # try different values
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
plotCostSurface(data,grad_desc)
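`gradientDescent` is defined elsewhere; the following is only a minimal sketch of batch gradient descent that matches the way its return value is used below (the fields `$theta`, `$cost_vec`, `$theta_vec` are inferred from later code):

# batch gradient descent for linear regression; a sketch only
gradientDescent <- function(data, theta_init, alpha, iterations) {
  m <- nrow(data)
  X <- cbind(rep(1, m), data[, 1])
  y <- data[, 2]
  theta <- theta_init
  cost_vec <- numeric(iterations)
  theta_vec <- matrix(0, nrow = iterations, ncol = 2)
  for (i in 1:iterations) {
    grad <- t(X) %*% (X %*% theta - y) / m   # gradient of J at theta
    theta <- theta - alpha * as.vector(grad) # simultaneous update
    cost_vec[i] <- computeCost(data, theta)
    theta_vec[i, ] <- theta
  }
  list(theta = theta, cost_vec = cost_vec, theta_vec = theta_vec)
}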
The same data `data4` and the same starting point, but fewer iterations and a slightly larger learning rate:

# gradient descent
theta_init = c(-20,-10)
iterations = 100
alpha = 0.06 # try different values
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
plotCostSurface(data,grad_desc)
With \(\alpha=0.06\) the descent clearly goes wrong. Is there a way to see that something is amiss before we draw the contour plot of the cost function? Yes: plot the cost as a function of the iteration step:
# plot cost development
plotCostDev(grad_desc)
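`plotCostDev` is a small helper defined elsewhere; presumably it just plots the recorded cost vector against the iteration index, roughly:

# cost per iteration step; a rough sketch of plotCostDev
plotCostDev <- function(grad_desc) {
  plot(seq_along(grad_desc$cost_vec), grad_desc$cost_vec, type = "l",
       xlab = "iteration", ylab = "cost")
}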
So which of the parameters should we change: `theta_init`, `alpha`, or `iterations`?

`iterations`: not a good idea.
iterations = 10
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
plotCostDev(grad_desc)
`theta_init`: could work, but only if one knows the optimal fit:
# first find a near-optimal theta with a safe learning rate
iterations = 1000
alpha = 0.02
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
theta_opt <- grad_desc$theta
# then restart from that optimum with the problematic learning rate
alpha = 0.058
grad_desc <- gradientDescent(data, theta_opt, alpha, iterations)
plotCostDev(grad_desc)
plotCostSurface(data,grad_desc)
Here `theta_init` was in effect `c(3.4571955, 3.7885747)` (the optimum found with \(\alpha=0.02\)) and `iterations = 1000`.

`alpha`: yes, but don't make it too small:
theta_init = c(-20,-10)
iterations = 1000
alpha = 0.00002 # try different values
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
plotCostDev(grad_desc)
plotLinearFit(data,grad_desc$theta)
plotCostSurface(data,grad_desc)
Note that the cost per iteration step is still decreasing: with such a small \(\alpha\), gradient descent has not converged within 1000 iterations.
Set `theta_init` to some reasonable value:

theta_init = c(0,0)
iterations = 1000
alpha = 0.02 # try different values
grad_desc <- gradientDescent(data, theta_init, alpha, iterations)
plotCostDev(grad_desc)
plotLinearFit(data,grad_desc$theta)
plotCostSurface(data,grad_desc)
Also check the last cost values you obtained and how much the theta values are still changing:
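(The helper `last(v, n)` is assumed here to return the last \(n\) elements of \(v\); base R's `tail(v, n)` behaves the same.)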
values <- cbind(last(grad_desc$cost_vec,10),
last(grad_desc$theta_vec[,1],10),
last(grad_desc$theta_vec[,2],10))
colnames(values) <- c("last 10 cost values","last 10 theta0 values","last 10 theta1 values")
kable(values,caption="important control values",row.names=FALSE)
last 10 cost values | last 10 theta0 values | last 10 theta1 values |
---|---|---|
36.90818 | 3.582604 | 3.769703 |
36.90818 | 3.582712 | 3.769686 |
36.90818 | 3.582819 | 3.769670 |
36.90817 | 3.582925 | 3.769654 |
36.90817 | 3.583031 | 3.769638 |
36.90817 | 3.583136 | 3.769622 |
36.90817 | 3.583241 | 3.769607 |
36.90817 | 3.583345 | 3.769591 |
36.90817 | 3.583449 | 3.769575 |
36.90817 | 3.583552 | 3.769560 |
This looks good: there are nearly no changes anymore; the cost has converged.