Title: | Sparse Additive Modelling |
---|---|
Description: | Computationally efficient tools for high dimensional predictive modeling (regression and classification). SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems with several algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by warm-start and active-set tricks. |
Authors: | Haoming Jiang, Yukun Ma, Han Liu, Kathryn Roeder, Xingguo Li, and Tuo Zhao |
Maintainer: | Haoming Jiang <[email protected]> |
License: | GPL-2 |
Version: | 1.1.3 |
Built: | 2024-10-08 04:31:57 UTC |
Source: | https://github.com/hmjianggatech/sam |
The package SAM targets high dimensional predictive modeling (regression and classification) for complex data analysis. SAM is short for sparse additive modeling, and adopts the computationally efficient basis spline technique. We solve the optimization problems with several algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by warm-start and active-set tricks.
Package: | SAM |
Type: | Package |
Version: | 1.0.5 |
Date: | 2014-02-11 |
License: | GPL-2 |
Tuo Zhao, Xingguo Li, Haoming Jiang, Han Liu, and Kathryn Roeder
Maintainer: Haoming Jiang <[email protected]>
P. Ravikumar, J. Lafferty, H. Liu and L. Wasserman. "Sparse Additive Models", Journal of the Royal Statistical Society: Series B, 2009.
T. Zhao and H. Liu. "Sparse Additive Machine", International Conference on Artificial Intelligence and Statistics, 2012.
"samEL"
This function plots the regularization path (regularization parameter versus functional norm)
## S3 method for class 'samEL'
plot(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The horizontal axis is for the regularization parameters in log scale. The vertical axis is for the functional norm of each component.
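The layout described above can be sketched with base R graphics. This is only an illustration: `lambda` and `func_norm` below are made-up stand-ins rather than output from the package, and it is an assumption that the package's plot method draws the path in this way.

```r
# Hypothetical stand-ins for a fitted path: 5 components, 10 lambda values.
lambda <- exp(seq(log(1), log(0.1), length.out = 10))   # decreasing sequence

# Each row is one component; the norms grow as lambda shrinks (illustrative).
func_norm <- outer(c(2, 1.5, 1, 0.5, 0), pmax(0, 1 - lambda))

# One curve per component: log(lambda) on the horizontal axis,
# functional norm on the vertical axis.
matplot(log(lambda), t(func_norm), type = "l", lty = 1,
        xlab = "log(lambda)", ylab = "Functional norm",
        main = "Regularization path (illustrative)")
```

Components that stay at zero along the whole path appear as a flat line on the horizontal axis.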
"samHL"
This function plots the regularization path (regularization parameter versus functional norm)
## S3 method for class 'samHL'
plot(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The horizontal axis is for the regularization parameters in log scale. The vertical axis is for the functional norm of each component.
"samLL"
This function plots the regularization path (regularization parameter versus functional norm)
## S3 method for class 'samLL'
plot(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The horizontal axis is for the regularization parameters in log scale. The vertical axis is for the functional norm of each component.
"samQL"
This function plots the regularization path (regularization parameter versus functional norm)
## S3 method for class 'samQL'
plot(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The horizontal axis is for the regularization parameters in log scale. The vertical axis is for the functional norm of each component.
"samEL"
Predict the labels for testing data.
## S3 method for class 'samEL'
predict(object, newdata, ...)
object |
An object with S3 class |
newdata |
The testing dataset represented in a |
... |
System reserved (No specific usage) |
The testing dataset is rescaled to the same range and expanded by the same spline basis functions as the training data.
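A minimal sketch of the rescaling step, in base R. The vectors play the role of the `X.min` and `X.ran` components stored in a fitted object; the data are made up, and clipping out-of-range values to [0, 1] is an assumption of this sketch rather than documented behavior.

```r
# Hypothetical training data: 3 samples, 2 variables.
X_train <- cbind(c(0, 2, 4), c(10, 20, 30))
X.min <- apply(X_train, 2, min)            # per-variable minimum
X.ran <- apply(X_train, 2, max) - X.min    # per-variable range

# New data are shifted and scaled with the TRAINING minimum and range.
X_new <- cbind(c(1, 5), c(15, 40))
X_scaled <- sweep(sweep(X_new, 2, X.min, "-"), 2, X.ran, "/")

# Assumption: values outside the training range are clipped to [0, 1].
X_scaled <- pmin(pmax(X_scaled, 0), 1)
X_scaled
```

Reusing the training statistics, instead of recomputing them on the test set, keeps both datasets on the same scale before the spline expansion.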
expectations |
Estimated expected counts also represented in a |
"samHL"
Predict the labels for testing data.
## S3 method for class 'samHL'
predict(object, newdata, thol = 0, ...)
object |
An object with S3 class |
newdata |
The testing dataset represented in a |
thol |
The decision value threshold for prediction. The default value is 0. |
... |
System reserved (No specific usage) |
The testing dataset is rescaled to the same range and expanded by the same spline basis functions as the training data.
values |
Predicted decision values also represented in a |
labels |
Predicted labels also represented in a |
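How decision values relate to labels can be sketched as follows. The values are made up, and the rule used here (label +1 when the value exceeds `thol`, otherwise -1, including ties) is an assumption for illustration, not a statement of the package's exact rule.

```r
# Hypothetical decision values for five test points (not package output).
values <- c(-1.3, -0.2, 0.0, 0.4, 2.1)
thol <- 0                       # decision threshold, as in the usage above

# Assumption: +1 when the decision value exceeds thol, -1 otherwise.
labels <- ifelse(values > thol, 1, -1)
labels
```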
"samLL"
Predict the labels for testing data.
## S3 method for class 'samLL'
predict(object, newdata, thol = 0.5, ...)
object |
An object with S3 class |
newdata |
The testing dataset represented in a |
thol |
The decision value threshold for prediction. The default value is 0.5 |
... |
System reserved (No specific usage) |
The testing dataset is rescaled to the same range and expanded by the same spline basis functions as the training data.
probs |
Estimated Posterior Probability for Prediction also represented in a |
labels |
Predicted labels also represented in a |
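The relation between probabilities, the threshold and the labels can be sketched in base R. The scores `f` are hypothetical additive model outputs, and the 0/1 labelling rule is an assumption consistent with the 0/1 responses used in the samLL example.

```r
# Hypothetical additive scores f(x) for four test points (not package output).
f <- c(-2, 0, 0.5, 3)

# Logistic link: posterior probability of the positive class.
probs <- 1 / (1 + exp(-f))

thol <- 0.5                     # default threshold from the usage above
# Assumption: label 1 when the probability exceeds thol, 0 otherwise.
labels <- as.numeric(probs > thol)
labels
```

Raising `thol` above 0.5 makes the positive label harder to assign, which trades false positives for false negatives.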
"samQL"
Predict the labels for testing data.
## S3 method for class 'samQL'
predict(object, newdata, ...)
object |
An object with S3 class |
newdata |
The testing dataset represented in a |
... |
System reserved (No specific usage) |
The testing dataset is rescaled to the same range and expanded by the same spline basis functions as the training data.
values |
Predicted values also represented in a |
"samEL"
Summarize the information of the object with S3 class samEL.
## S3 method for class 'samEL'
print(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The output includes length and d.f. of the regularization path.
"samHL"
Summarize the information of the object with S3 class samHL.
## S3 method for class 'samHL'
print(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The output includes length and d.f. of the regularization path.
"samLL"
Summarize the information of the object with S3 class samLL.
## S3 method for class 'samLL'
print(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The output includes length and d.f. of the regularization path.
"samQL"
Summarize the information of the object with S3 class samQL.
## S3 method for class 'samQL'
print(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
The output includes length and d.f. of the regularization path.
The log-linear model is learned using training data.
samEL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.25,
      thol = 1e-05, max.ite = 1e+05, regfunc = "L1")
X |
The |
y |
The |
p |
The number of basis spline functions. The default value is 3. |
lambda |
A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda; supply a decreasing sequence of lambda values instead. samEL relies on warm starts for speed, and it is often faster to fit a whole path than to compute a single fit. |
nlambda |
The number of lambda values. The default value is 20. |
lambda.min.ratio |
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default is 0.25. |
thol |
Stopping precision. The default value is 1e-5. |
max.ite |
The number of maximum iterations. The default value is 1e5. |
regfunc |
A string indicating the regularizer. The default value is "L1". You can also assign "MCP" or "SCAD" to it. |
We adopt several computational algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by "warm-start" and "active-set" tricks.
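To make the "soft-thresholding" step concrete, here is the textbook group soft-thresholding operator that block coordinate descent methods for group-sparse penalties typically apply to one block of coefficients at a time. This is a generic sketch, not the package's internal code; the function name `group_soft_threshold` is hypothetical.

```r
# Group soft-thresholding: shrink the whole coefficient block v toward zero
# by lambda in Euclidean norm, and set it exactly to zero when its norm
# does not exceed lambda.
group_soft_threshold <- function(v, lambda) {
  nv <- sqrt(sum(v^2))
  if (nv <= lambda) return(rep(0, length(v)))
  (1 - lambda / nv) * v
}

shrunk <- group_soft_threshold(c(3, 4), 1)      # norm 5, scaled by 0.8
zeroed <- group_soft_threshold(c(0.3, 0.4), 1)  # norm 0.5 <= 1, set to zero
```

Because a whole block is zeroed at once, an entire component function drops out of the model, which is where the sparsity in a sparse additive model comes from.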
p |
The number of basis spline functions used in training. |
X.min |
A vector with each entry corresponding to the minimum of each input variable. (Used for rescaling in testing) |
X.ran |
A vector with each entry corresponding to the range of each input variable. (Used for rescaling in testing) |
lambda |
A sequence of regularization parameters used in training. |
w |
The solution path matrix ( |
df |
The degrees of freedom of the solution path (the number of non-zero component functions) |
knots |
The |
Boundary.knots |
The |
func_norm |
The functional norm matrix ( |
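The relation between `w`, `func_norm` and `df` can be sketched as below. The layout is an assumption made for illustration: `w` is taken to have one column per lambda value and consecutive blocks of `p` rows per input variable, and the functional norm of a component is taken to be the Euclidean norm of its block of coefficients. The numbers are made up.

```r
p <- 3; d <- 4; nlambda <- 2

# Hypothetical coefficient path: (d * p) rows, one column per lambda.
w <- matrix(0, nrow = d * p, ncol = nlambda)
w[1:3, 2] <- c(1, 2, 2)         # variable 1 active at the second lambda only
w[7:9, ]  <- c(3, 0, 4)         # variable 3 active at both lambdas

# Assumption: rows (j-1)*p + 1, ..., j*p hold the coefficients of variable j.
group <- rep(seq_len(d), each = p)

# Euclidean norm of each block, for each lambda: a d x nlambda matrix.
func_norm <- sqrt(rowsum(w^2, group))

# Number of non-zero component functions at each lambda.
df <- colSums(func_norm > 0)
df
```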
SAM, plot.samEL, print.samEL, predict.samEL
library(SAM)

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
u = exp(-2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1+1)
y = rep(0,n)
for(i in 1:n) y[i] = rpois(1,u[i])

## Training
out.trn = samEL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
ut = exp(-2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1+1)
yt = rep(0,nt)
for(i in 1:nt) yt[i] = rpois(1,ut[i])

## predicting response
out.tst = predict(out.trn,Xt)
The classifier is learned using training data.
samHL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.4,
      thol = 1e-05, mu = 0.05, max.ite = 1e+05, w = NULL)
X |
The |
y |
The |
p |
The number of basis spline functions. The default value is 3. |
lambda |
A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda; supply a decreasing sequence of lambda values instead. samHL relies on warm starts for speed, and it is often faster to fit a whole path than to compute a single fit. |
nlambda |
The number of lambda values. The default value is 20. |
lambda.min.ratio |
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default is 0.4. |
thol |
Stopping precision. The default value is 1e-5. |
mu |
Smoothing parameter used to approximate the hinge loss. The default value is 0.05. |
max.ite |
The number of maximum iterations. The default value is 1e5. |
w |
The |
We adopt several computational algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by "warm-start" and "active-set" tricks.
p |
The number of basis spline functions used in training. |
X.min |
A vector with each entry corresponding to the minimum of each input variable. (Used for rescaling in testing) |
X.ran |
A vector with each entry corresponding to the range of each input variable. (Used for rescaling in testing) |
lambda |
A sequence of regularization parameters used in training. |
w |
The solution path matrix ( |
df |
The degrees of freedom of the solution path (the number of non-zero component functions) |
knots |
The |
Boundary.knots |
The |
func_norm |
The functional norm matrix ( |
SAM, plot.samHL, print.samHL, predict.samHL
library(SAM)

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
y = y*sign(runif(n)-0.05)

## Training
out.trn = samHL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
yt = yt*sign(runif(nt)-0.05)

## predicting response
out.tst = predict(out.trn,Xt)
The logistic model is learned using training data.
samLL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.1,
      thol = 1e-05, max.ite = 1e+05, regfunc = "L1")
X |
The |
y |
The |
p |
The number of basis spline functions. The default value is 3. |
lambda |
A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda; supply a decreasing sequence of lambda values instead. samLL relies on warm starts for speed, and it is often faster to fit a whole path than to compute a single fit. |
nlambda |
The number of lambda values. The default value is 20. |
lambda.min.ratio |
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default is 0.1. |
thol |
Stopping precision. The default value is 1e-5. |
max.ite |
The number of maximum iterations. The default value is 1e5. |
regfunc |
A string indicating the regularizer. The default value is "L1". You can also assign "MCP" or "SCAD" to it. |
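The role of `p` can be illustrated with the `splines` package that ships with R. This sketch expands each input variable into `p` spline basis functions and binds the blocks side by side. Using natural cubic splines via `ns()` is an assumption made for the illustration; this section does not say which spline routine the package calls internally.

```r
library(splines)   # distributed with base R

set.seed(1)
n <- 50; d <- 4; p <- 3
X <- matrix(runif(n * d), n, d)      # inputs already scaled to [0, 1]

# Expand every column into p spline basis functions and bind the blocks
# side by side: the design matrix has d * p columns.
Z <- do.call(cbind, lapply(seq_len(d), function(j) ns(X[, j], df = p)))
dim(Z)
```

A larger `p` lets each component function bend more, at the cost of `p` coefficients per variable to estimate.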
We adopt several computational algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by "warm-start" and "active-set" tricks.
p |
The number of basis spline functions used in training. |
X.min |
A vector with each entry corresponding to the minimum of each input variable. (Used for rescaling in testing) |
X.ran |
A vector with each entry corresponding to the range of each input variable. (Used for rescaling in testing) |
lambda |
A sequence of regularization parameters used in training. |
w |
The solution path matrix ( |
df |
The degrees of freedom of the solution path (the number of non-zero component functions) |
knots |
The |
Boundary.knots |
The |
func_norm |
The functional norm matrix ( |
SAM, plot.samLL, print.samLL, predict.samLL
library(SAM)

## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
y = y*sign(runif(n)-0.05)
y = sign(y==1)

## Training
out.trn = samLL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)

## flipping about 5 percent of y
yt = yt*sign(runif(nt)-0.05)
yt = sign(yt==1)

## predicting response
out.tst = predict(out.trn,Xt)
The regression model is learned using training data.
samQL(X, y, p = 3, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.005,
      thol = 1e-05, max.ite = 1e+05, regfunc = "L1")
X |
The |
y |
The |
p |
The number of basis spline functions. The default value is 3. |
lambda |
A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda; supply a decreasing sequence of lambda values instead. samQL relies on warm starts for speed, and it is often faster to fit a whole path than to compute a single fit. |
nlambda |
The number of lambda values. The default value is 30. |
lambda.min.ratio |
Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default is 5e-3. |
thol |
Stopping precision. The default value is 1e-5. |
max.ite |
The number of maximum iterations. The default value is 1e5. |
regfunc |
A string indicating the regularizer. The default value is "L1". You can also assign "MCP" or "SCAD" to it. |
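How `nlambda` and `lambda.min.ratio` determine a decreasing sequence can be sketched as follows. The value of `lambda.max` is made up (in practice it is derived from the data), and spacing the values evenly on the log scale is an assumption of this sketch.

```r
# Hypothetical inputs: lambda.max would normally be derived from the data.
lambda.max <- 2
lambda.min.ratio <- 5e-3
nlambda <- 30

# Assumption: values are equally spaced on the log scale, decreasing from
# lambda.max down to lambda.max * lambda.min.ratio.
lambda <- lambda.max * exp(seq(log(1), log(lambda.min.ratio),
                               length.out = nlambda))
range(lambda)
```

A sequence built this way is already decreasing, which is the order the `lambda` argument asks for when a user supplies their own values.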
We adopt several computational algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. The computation is further accelerated by "warm-start" and "active-set" tricks.
p |
The number of basis spline functions used in training. |
X.min |
A vector with each entry corresponding to the minimum of each input variable. (Used for rescaling in testing) |
X.ran |
A vector with each entry corresponding to the range of each input variable. (Used for rescaling in testing) |
lambda |
A sequence of regularization parameters used in training. |
w |
The solution path matrix ( |
intercept |
The solution path of the intercept. |
df |
The degrees of freedom of the solution path (the number of non-zero component functions) |
knots |
The |
Boundary.knots |
The |
func_norm |
The functional norm matrix ( |
sse |
Sums of squared errors along the solution path. |
SAM, plot.samQL, print.samQL, predict.samQL
library(SAM)

## generating training data
n = 100
d = 500
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)

## generating response
y = -2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1

## Training
out.trn = samQL(X,y)
out.trn

## plotting solution path
plot(out.trn)

## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
yt = -2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1

## predicting response
out.tst = predict(out.trn,Xt)