Title: | Conditional Multivariate t Distribution |
---|---|
Description: | The packages helps sample from the conditional multivariate t distribution. |
Authors: | Paul Kimani Kinyanjui, Cox Lwaka Tamba, Justin Obwoge Okenye |
Maintainer: | Paul Kimani Kinyanjui <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-10-31 21:07:10 UTC |
Source: | https://github.com/paulkinyanjui01/condmvt |
These functions provide the conditional location vector, scatter matrix, and degrees of freedom of [Y given X], where Z = (X,Y) is the fully-joint multivariate t distribution with location vector equal to mean, scatter matrix sigma, and degrees of freedom df.
CondMVT(mean, sigma, df, dependent.ind, given.ind, X.given, check.sigma = TRUE)
CondMVT(mean, sigma, df, dependent.ind, given.ind, X.given, check.sigma = TRUE)
mean |
location vector, which must be specified. |
sigma |
a symmetric, positive-definte matrix of dimension n x n, which must be specified. |
df |
degrees of freedom, which must be specified. |
dependent.ind |
a vector of integers denoting the indices of dependent variable Y. |
given.ind |
a vector of integers denoting the indices of conditoning variable X. |
X.given |
a vector of reals denoting the conditioning value of X. When both given.ind and X.given are missing, the distribution of Y becomes Z[dependent.ind] |
check.sigma |
logical; if TRUE, the scatter matrix is checked for appropriateness (symmetry, positive-definiteness). This could be set to FALSE if the user knows it is appropriate. |
# 10-dimensional multivariate normal distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) CondMVT(mean=rep(1,n), sigma=A, df=df, dependent=c(2,3,5), given=c(1,4,7,9),X.given=c(1,1,0,-1)) CondMVT(mean=rep(1,n), sigma=A, df=df, dep=3, given=c(1,4,7,9), X=c(1,1,0,-1))
# 10-dimensional multivariate normal distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) CondMVT(mean=rep(1,n), sigma=A, df=df, dependent=c(2,3,5), given=c(1,4,7,9),X.given=c(1,1,0,-1)) CondMVT(mean=rep(1,n), sigma=A, df=df, dep=3, given=c(1,4,7,9), X=c(1,1,0,-1))
These functions provide the density function and a random number generator for the conditional multivariate t distribution, [Y given X], where Z = (X,Y) is the fully-joint multivariate t distribution with location vector (or mode) equal to mean and covariance matrix sigma.
dcmvt(x, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, log = FALSE) rcmvt(n, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, method=c("eigen", "svd", "chol"))
dcmvt(x, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, log = FALSE) rcmvt(n, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, method=c("eigen", "svd", "chol"))
x |
vector or matrix of quantiles of Y. If x is a matrix, each row is taken to be a quantile. |
n |
number of random deviates. |
mean |
location vector, which must be specified. |
sigma |
a symmetric, positive-definte matrix of dimension n x n, which must be specified. |
df |
degrees of freedom, which must be specified. |
dependent.ind |
a vector of integers denoting the indices of dependent variable Y. |
given.ind |
a vector of integers denoting the indices of conditoning variable X. |
X.given |
a vector of reals denoting the conditioning value of X. When both given.ind and X.given are missing, the distribution of Y becomes Z[dependent.ind] |
check.sigma |
logical; if TRUE, the scatter matrix is checked for appropriateness (symmetry, positive-definiteness). This could be set to FALSE if the user knows it is appropriate. |
log |
logical; if TRUE, densities d are given as log(d). |
method |
string specifying the matrix decomposition used to determine the matrix root of sigma. Possible methods are eigenvalue decomposition ("eigen", default), singular value decomposition ("svd"), and Cholesky decomposition ("chol"). The Cholesky is typically fastest, not by much though. |
# 10-dimensional multivariate t distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) # density of Z[c(2,5)] given Z[c(1,4,7,9)]=c(1,1,0,-1) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,dependent.ind=c(2,5),df=df, given.ind=c(1,4,7,9), X.given=c(1,1,0,-1)) dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,df=df, dep=c(2,5)) # gives an error since `x' and `dep' are incompatibe #dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=c(2,3), # given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="eigen") rcmvt(n=10, mean=rep(1,n), sigma=A,df=df,dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="chol")
# 10-dimensional multivariate t distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) # density of Z[c(2,5)] given Z[c(1,4,7,9)]=c(1,1,0,-1) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,dependent.ind=c(2,5),df=df, given.ind=c(1,4,7,9), X.given=c(1,1,0,-1)) dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,df=df, dep=c(2,5)) # gives an error since `x' and `dep' are incompatibe #dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=c(2,3), # given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="eigen") rcmvt(n=10, mean=rep(1,n), sigma=A,df=df,dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="chol")
The sub-package contains subroutines for imputation of missing values as well as parameter estimation (for the location vector and the scatter matrix) in multivariate t distribution using the Expectation Maximization (EM) algorithm when the degrees of freedom are known. EM algorithm iteratively imputes the missing values and computes the estimates for the multivariate t parameters in two steps (E-step and M-step) as explained in Kinyanjui et al. (2021). For a single iteration, the function EM_onestep is run. For multiple iterations, the function EM_msteps is run. The function LIKE (specifying the likelihood) facilitates the setting of tolerance level for convergence of the algorithm (that is L(θ^(t+1))-L(θ^(t))≤δ, where δ is a set tolerance level and t denotes the number of iterations).
EM_onestep(Y,mu,Sigma,df) EM_msteps(Y,mu,Sigma,df,K,error)
EM_onestep(Y,mu,Sigma,df) EM_msteps(Y,mu,Sigma,df,K,error)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
K |
the number of iterations, which must be specified. |
error |
tolerance level for convergence of the EM algorithm. |
Kinyanjui, P.K., Tamba, C.L., & Okenye, J.O. (2021). Missing Data Imputation in a t -Distribution with Known Degrees of Freedom Via Expectation Maximization Algorithm and Its Stochastic Variants. International Journal of Applied Mathematics and Statistics.
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EM1=EM_onestep(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df) #Multiple Iterations (EM) EM=EM_msteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=3,K=1000,error=0.00001) #Results for Newly Completed Dataset (EM) EM$IMP #Newly completed Dataset (with imputed values) EM$mu #updated location vector EM$Sigma #updated scatter matrix EM$K1 #number of iterations the algorithm takes to converge
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EM1=EM_onestep(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df) #Multiple Iterations (EM) EM=EM_msteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=3,K=1000,error=0.00001) #Results for Newly Completed Dataset (EM) EM$IMP #Newly completed Dataset (with imputed values) EM$mu #updated location vector EM$Sigma #updated scatter matrix EM$K1 #number of iterations the algorithm takes to converge
The sub-package contains subroutines for imputation of missing values as well as parameter estimation (for the location vector and the scatter matrix) in multivariate t distribution using the Expectation Maximization (EM) algorithm when the degrees of freedom are known. EM algorithm imputes the missing values and computes the estimates for the multivariate t parameters in two steps (E-step and M-step) as explained in Kinyanjui et al. (2021). For a single iteration, the function EM_onestep is run.
EM_onestep(Y,mu,Sigma,df)
EM_onestep(Y,mu,Sigma,df)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
Kinyanjui, P.K., Tamba, C.L., & Okenye, J.O. (2021). Missing Data Imputation in a t -Distribution with Known Degrees of Freedom Via Expectation Maximization Algorithm and Its Stochastic Variants. International Journal of Applied Mathematics and Statistics.
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EM1=EM_onestep(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df) #Results for Newly Completed Dataset (EM) EM1$Y2 #Newly completed Dataset (with imputed values) EM1$mu #updated location vector EM1$Sigma #updated scatter matrix
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EM1=EM_onestep(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df) #Results for Newly Completed Dataset (EM) EM1$Y2 #Newly completed Dataset (with imputed values) EM1$mu #updated location vector EM1$Sigma #updated scatter matrix
This sub-package constitutes the subroutines for EM algorithm (for multiple iterations). It has 2 functions namely LIKE and EM_Umsteps. The function EM_Umsteps carries out missing data imputation as well as parameter estimation in multivariate t distribution in multiple iterations; assuming that the degrees of freedom are unknown. In addition to updating the location vector and the scatter matrix, therefore, the function also finds an estimate for the degrees of freedom. The bisection method is employed in the algorithm to iteratively update the degrees of freedom.The function LIKE (specifying the likelihood) facilitates the setting of tolerance level for convergence of the EM algorithm (that is L(θ^(t+1))-L(θ^(t))≤δ, where δ is a set tolerance level and t denotes the number of iterations).Details of how EM works in light of unknown degrees of freedom can be found in Kinyanjui et al. (2020) and Liu and Rubin (1995).
EM_Uonestep(Y,mu,Sigma,df,e) EM_Umsteps(Y,mu,Sigma,df,K,e,error)
EM_Uonestep(Y,mu,Sigma,df,e) EM_Umsteps(Y,mu,Sigma,df,K,e,error)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
Scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
e |
tolerance level for convergence of the bisection method for estimation of df. |
error |
tolerance level for convergence of the EM algorithm. |
K |
the number of iterations, which must be specified. |
Kinyanjui, P. K., Tamba, C. L., Orawo, L. A. O., & Okenye, J. O. (2020). Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants. Model Assisted Statistics and Applications, 15(3), 263-272.
Liu, C. and Rubin, D. B. (1995). ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 19-39.
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EMU1=EM_Uonestep (Y=Y8,mu=mu,Sigma= Sigma_stat,df= df_stat,e=0.00001) #Multiple Iterations (EM) EMU=EM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,K=1000,e=0.00001,error=0.00001) #Results for Newly Completed Dataset (EM) EMU$IMP #Newly completed Dataset (with imputed values) EMU$mu #updated location vector EMU$Sigma #updated scatter matrix EMU$df #Updated degrees of freedom. EMU$K1 #number of iterations the algorithm takes to converge
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EMU1=EM_Uonestep (Y=Y8,mu=mu,Sigma= Sigma_stat,df= df_stat,e=0.00001) #Multiple Iterations (EM) EMU=EM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,K=1000,e=0.00001,error=0.00001) #Results for Newly Completed Dataset (EM) EMU$IMP #Newly completed Dataset (with imputed values) EMU$mu #updated location vector EMU$Sigma #updated scatter matrix EMU$df #Updated degrees of freedom. EMU$K1 #number of iterations the algorithm takes to converge
This sub-package constitutes the subroutines for EM algorithm (for a single iteration). It has 4 functions namely fun1, dfun1, Bisec, and EM_Uonestep. The function EM_Uonestep carries out missing data imputation as well as parameter estimation in multivariate t distribution in one iteration; assuming that the degrees of freedom are unknown. In addition to updating the location vector and the scatter matrix, therefore, the function also finds an estimate for the degrees of freedom. The bisection method is employed in the algorithm to iteratively update the degrees of freedom. In this respect, function fun1 specifies the degrees of freedom equation to be solved. dfun1 is its derivative. The two functions (fun1 and dfun1) are then solved numerically using the bisection method as specified in the function Bisec.Details of how EM works in light of unknown degrees of freedom can be found in Kinyanjui et al. (2020) and Liu and Rubin (1995).
EM_Uonestep(Y,mu,Sigma,df,e)
EM_Uonestep(Y,mu,Sigma,df,e)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
Scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
e |
tolerance level for convergence of the bisection method for estimation of df. |
Kinyanjui, P. K., Tamba, C. L., Orawo, L. A. O., & Okenye, J. O. (2020). Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants. Model Assisted Statistics and Applications, 15(3), 263-272.
Liu, C., & Rubin, D. B. (1995). ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 19-39.
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EMU1=EM_Uonestep (Y=Y8,mu=mu,Sigma= Sigma_stat,df= df_stat,e=0.00001) #Results for Newly Completed Dataset (EM) EMU1$Y2 #Newly completed Dataset (with imputed values) EMU1$mu #updated location vector EMU1$Sigma #updated scatter matrix
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (EM) EMU1=EM_Uonestep (Y=Y8,mu=mu,Sigma= Sigma_stat,df= df_stat,e=0.00001) #Results for Newly Completed Dataset (EM) EMU1$Y2 #Newly completed Dataset (with imputed values) EMU1$mu #updated location vector EMU1$Sigma #updated scatter matrix
This function randomly creates missing values in a multivariate dataset. The resultant missing data mechanism is missing at random (MAR). The percentage of missingness has to be specified. This percentage is computed as a proportion of the sample size. In addition, the function allows for more than one missing value in any given case. It is set such that in a p-variate dataset, for any i^th case, the maximum allowable number of missing values is p-1. This helps avoid a situation where a case has no observed value.
MISS (TT, Percent)
MISS (TT, Percent)
TT |
n×p complete dataset. |
Percent |
the proportion of missing values, which must be specified. |
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset.
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset.
Computes the distribution function of the conditional multivariate t, [Y given X], where Z = (X,Y) is the fully-joint multivariate t distribution with mean equal to location vector, df equal to degrees of freedom and scatter matrix sigma. Computations are based on algorithms by Genz and Bretz.
pcmvt(lower = -Inf, upper = Inf, mean, sigma, df, dependent.ind, given.ind, X.given, check.sigma = TRUE, algorithm = GenzBretz(), ...)
pcmvt(lower = -Inf, upper = Inf, mean, sigma, df, dependent.ind, given.ind, X.given, check.sigma = TRUE, algorithm = GenzBretz(), ...)
lower |
the vector of lower limits of length n. |
upper |
the vector of upper limits of length n. |
mean |
the mean vector of length n. |
sigma |
a symmetric, positive-definte matrix, of dimension n x n, which must be specified. |
df |
degrees of freedom, which must be specified. |
dependent.ind |
a vector of integers denoting the indices of the dependent variable Y. |
given.ind |
a vector of integers denoting the indices of the conditioning variable X. |
X.given |
a vector of reals denoting the conditioning value of X. When both given.ind and X.given are missing, the distribution of Y becomes Z[dependent.ind] |
check.sigma |
logical; if TRUE, the variance-covariance matrix is checked for appropriateness (symmetry, positive-definiteness). This could be set to FALSE if the user knows it is appropriate. |
algorithm |
an object of class GenzBretz, Miwa or TVPACK specifying both the algorithm to be used as well as the associated hyper parameters. |
... |
additional parameters (currently given to GenzBretz for backward compatibility issues). |
This program involves the computation of multivariate t probabilities with arbitrary correlation matrices.
The evaluated distribution function is returned with attributes
error |
estimated absolute error and |
msg |
Normal Completion |
Genz, A. and Bretz, F. (1999), Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation, 63, 361–378.
Genz, A. and Bretz, F. (2002), Methods for the computation of multivariate t-probabilities. Journal of Computational and Graphical Statistics, 11, 950–971.
Genz, A. (2004), Numerical computation of rectangular bivariate and trivariate normal and t-probabilities, Statistics and Computing, 14, 251–260.
Genz, A. and Bretz, F. (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer-Verlag, Heidelberg.
n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) pcmvt(lower=-Inf, upper=1, mean=rep(1,n), sigma=A, df=df, dependent.ind=3, given.ind=c(1,4,7,9,10), X.given=c(1,1,0,0,-1)) pcmvt(lower=-Inf, upper=c(1,2), mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) pcmvt(lower=-Inf, upper=c(1,2), mean=rep(1,n), sigma=A,df=df, dep=c(2,5))
n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) pcmvt(lower=-Inf, upper=1, mean=rep(1,n), sigma=A, df=df, dependent.ind=3, given.ind=c(1,4,7,9,10), X.given=c(1,1,0,0,-1)) pcmvt(lower=-Inf, upper=c(1,2), mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) pcmvt(lower=-Inf, upper=c(1,2), mean=rep(1,n), sigma=A,df=df, dep=c(2,5))
These functions provide the density function and a random number generator for the conditional multivariate t distribution, [Y given X], where Z = (X,Y) is the fully-joint multivariate t distribution with location vector equal to mean and scatter matrix sigma.
dcmvt(x, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, log = FALSE) rcmvt(n, mean, sigma,df,dependent.ind, given.ind, X.given, check.sigma=TRUE,type = c("shifted", "Kshirsagar"), method=c("eigen", "svd", "chol"))
dcmvt(x, mean, sigma,df, dependent.ind, given.ind, X.given, check.sigma=TRUE, log = FALSE) rcmvt(n, mean, sigma,df,dependent.ind, given.ind, X.given, check.sigma=TRUE,type = c("shifted", "Kshirsagar"), method=c("eigen", "svd", "chol"))
x |
vector or matrix of quantiles of Y. If x is a matrix, each row is taken to be a quantile. |
n |
number of random deviates. |
mean |
location vector, which must be specified. |
df |
degrees of freedom, which must be specified |
sigma |
a symmetric, positive-definte matrix of dimension n x n, which must be specified. |
dependent.ind |
a vector of integers denoting the indices of dependent variable Y. |
given.ind |
a vector of integers denoting the indices of conditoning variable X. |
X.given |
a vector of reals denoting the conditioning value of X. When both given.ind and X.given are missing, the distribution of Y becomes Z[dependent.ind] |
check.sigma |
logical; if TRUE, the scatter matrix is checked for appropriateness (symmetry, positive-definiteness). This could be set to FALSE if the user knows it is appropriate. |
log |
logical; if TRUE, densities d are given as log(d). |
type |
type of the noncentral multivariate t distribution. type = "Kshirsagar" corresponds to formula (1.4) in Genz and Bretz (2009) (see also Chapter 5.1 in Kotz and Nadarajah (2004)). This is the noncentral t-distribution needed for calculating the power of multiple contrast tests under a normality assumption. type = "shifted" corresponds to the formula right before formula (1.4) in Genz and Bretz (2009) (see also formula (1.1) in Kotz and Nadarajah (2004)). It is a location shifted version of the central t-distribution. This noncentral multivariate t-distribution appears for example as the Bayesian posterior distribution for the regression coefficients in a linear regression. In the central case both types coincide. |
method |
string specifying the matrix decomposition used to determine the matrix root of sigma. Possible methods are eigenvalue decomposition ("eigen", default), singular value decomposition ("svd"), and Cholesky decomposition ("chol"). The Cholesky is typically fastest, not by much though. |
Genz, A. and Bretz, F. (2009), Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics, Vol. 195. Springer-Verlag, Heidelberg.
S. Kotz and S. Nadarajah (2004), Multivariate t Distributions and Their Applications. Cambridge University Press. Cambridge.
# 10-dimensional multivariate t distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) # density of Z[c(2,5)] given Z[c(1,4,7,9)]=c(1,1,0,-1) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A, df=df, dependent.ind=c(2,5), given.ind=c(1,4,7,9), X.given=c(1,1,0,-1)) dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,df=df, dep=c(2,5)) # gives an error since `x' and `dep' are incompatibe #dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=c(2,3), #given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="eigen") rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="chol")
# 10-dimensional multivariate t distribution n <- 10 df=3 A <- matrix(rt(n^2,df), n, n) A <- tcrossprod(A,A) #A %*% t(A) # density of Z[c(2,5)] given Z[c(1,4,7,9)]=c(1,1,0,-1) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A, df=df, dependent.ind=c(2,5), given.ind=c(1,4,7,9), X.given=c(1,1,0,-1)) dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) dcmvt(x=c(1.2,-1), mean=rep(1,n), sigma=A,df=df, dep=c(2,5)) # gives an error since `x' and `dep' are incompatibe #dcmvt(x=-1, mean=rep(1,n), sigma=A,df=df, dep=c(2,3), #given=c(1,4,7,9,10), X=c(1,1,0,0,-1)) rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=c(2,5), given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="eigen") rcmvt(n=10, mean=rep(1,n), sigma=A,df=df, dep=3, given=c(1,4,7,9,10), X=c(1,1,0,0,-1), method="chol")
This sub-package contains the subroutines for iterative imputation of missing values as well as parameter estimation (for the location vector and the scatter matrix) in multivariate t distribution using Stochastic EM (SEM) and Monte Carlo EM (MCEM). In this case, the degrees of freedom for the distribution are known or fixed a priori. SEM is implemented when the analyst specifies a single draw in the E-step. In case we have multiple draws in the E-step, the algorithm changes to MCEM. In both algorithms, the function SMCEM_onestep is run when we are only interested in the imputed values and the parameter updates in a single iteration. The function SMCEM_msteps is run when we are interested in multiple iterations (this is usually the case). Essentially, the first iterations (for instance, 10 percent of all iterations) is usually burnt-in in order to ward off the effects of initial values. Details of how SEM and MCEM operate can be found in among others Kinyanjui et al. (2021), Nielsen (2000), Levine and Casella (2001) Jank (2005) and Karimi et al. (2019).
SMCEM_onestep(Y,mu,Sigma,df,nob) SMCEM_msteps(Y,mu,Sigma,df, nob,K)
SMCEM_onestep(Y,mu,Sigma,df,nob) SMCEM_msteps(Y,mu,Sigma,df, nob,K)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
nob |
number of draws in the E-step |
K |
the number of iterations, which must be specified. |
Karimi, B., Lavielle, M., and Moulines, É. (2019). On the Convergence Properties of the Mini-Batch EM and MCEM Algorithms.
Kinyanjui, P.K., Tamba, C.L., & Okenye, J.O. (2021). Missing Data Imputation in a t -Distribution with Known Degrees of Freedom Via Expectation Maximization Algorithm and Its Stochastic Variants. International Journal of Applied Mathematics and Statistics.
Levine, R. A. and Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. Journal of Computational and Graphical Statistics, 10(3), 422-439.
Nielsen, S.F. (2000). The stochastic EM algorithm: estimation and asymptotic results. Bernoulli, 6(3), 457-489.
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=Sigma_stat,df=df,nob=1) #Single Iteration (MCEM) MCEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=Sigma_stat,df=df,nob=100) #Multiple Iterations (SEM) SEM=SMCEM_msteps(Y=Y8,mu= mu_stat,Sigma= Sigma_stat,df=df,nob=1,K=1000) #Results for Newly Completed Dataset (Burning in first 100 iterations in SEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*10), nrow =10) for (l in 101:1000){ T_mu = T_mu + SEM$muchain[l,] T_Sigma = T_Sigma + SEM$SigmaChain[,,l] T_Data= T_Data+ SEM$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4)#updated scatter matrix T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations. #Multiple Iterations (MCEM) MCEM=SMCEM_msteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df,nob=100, K=1000) #Results for Newly Completed Dataset (Burning in first 100 iterations in MCEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*10), nrow =10) for (l in 101:1000){ T_mu = T_mu + MCEM$muchain[l,] T_Sigma = T_Sigma + MCEM$SigmaChain[,,l] T_Data= T_Data+ MCEM$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4) #updated scatter matrix T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations.
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=Sigma_stat,df=df,nob=1) #Single Iteration (MCEM) MCEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=Sigma_stat,df=df,nob=100) #Multiple Iterations (SEM) SEM=SMCEM_msteps(Y=Y8,mu= mu_stat,Sigma= Sigma_stat,df=df,nob=1,K=1000) #Results for Newly Completed Dataset (Burning in first 100 iterations in SEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*10), nrow =10) for (l in 101:1000){ T_mu = T_mu + SEM$muchain[l,] T_Sigma = T_Sigma + SEM$SigmaChain[,,l] T_Data= T_Data+ SEM$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4)#updated scatter matrix T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations. #Multiple Iterations (MCEM) MCEM=SMCEM_msteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df,nob=100, K=1000) #Results for Newly Completed Dataset (Burning in first 100 iterations in MCEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*10), nrow =10) for (l in 101:1000){ T_mu = T_mu + MCEM$muchain[l,] T_Sigma = T_Sigma + MCEM$SigmaChain[,,l] T_Data= T_Data+ MCEM$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4) #updated scatter matrix T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations.
This sub-package contains the subroutines for iterative imputation of missing values as well as parameter estimation (for the location vector and the scatter matrix) in multivariate t distribution using Stochastic EM (SEM) and Monte Carlo EM (MCEM). In this case, the degrees of freedom for the distribution are known or fixed a priori. SEM is implemented when the analyst specifies a single draw in the E-step. In case we have multiple draws in the E-step, the algorithm changes to MCEM. In both algorithms, the function SMCEM_onestep is run when we are only interested in the imputed values and the parameter updates in a single iteration.
SMCEM_onestep(Y,mu,Sigma,df,nob)
SMCEM_onestep(Y,mu,Sigma,df,nob)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
nob |
number of draws in the E-step |
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=A,df=df,nob=1) #Single Iteration (MCEM) MCEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=A,df=df,nob=100) #Results for Newly Completed Dataset (SEM) SEM1$Y2 #Newly completed Dataset (with imputed values) SEM1$mu #updated location vector SEM1$Sigma #updated scatter matrix #Results for Newly Completed Dataset (MCEM) MCEM1$Y2 #Newly completed Dataset (with imputed values) MCEM1$mu #updated location vector MCEM1$Sigma #updated scatter matrix
# 3-dimensional multivariate t distribution n <- 10 p=3 df=3 mu=c(1:3) A <- matrix(rt(p^2,df), p, p) A <- tcrossprod(A,A) #A %*% t(A) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=A,df=df,nob=1) #Single Iteration (MCEM) MCEM1=SMCEM_onestep(Y=Y8,mu= mu_stat,Sigma=A,df=df,nob=100) #Results for Newly Completed Dataset (SEM) SEM1$Y2 #Newly completed Dataset (with imputed values) SEM1$mu #updated location vector SEM1$Sigma #updated scatter matrix #Results for Newly Completed Dataset (MCEM) MCEM1$Y2 #Newly completed Dataset (with imputed values) MCEM1$mu #updated location vector MCEM1$Sigma #updated scatter matrix
This sub-package provides subroutines for implementation of SEM and MCEM techniques in imputing missing values as well as estimating multivariate t parameters when the degrees of freedom are unknown.The functions SMCEM_msteps constitute the SEM and MCEM algorithms for multiple-iterative data imputation and parameter estimation for multivariate t data with unknown degrees of freedom. The functions represent SEM when the number of draws in the E-step (denoted by nob) is 1 and MCEM when we have more than one draw in the E-step.More details on the implementation of SEM and MCEM techniques can be found in Kinyanjui et al. (2020).
SMCEM_Uonestep(Y,mu,Sigma,df,nob,e) SMCEM_Umsteps(Y,mu,Sigma,df,nob,K,e)
SMCEM_Uonestep(Y,mu,Sigma,df,nob,e) SMCEM_Umsteps(Y,mu,Sigma,df,nob,K,e)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
nob |
number of draws in the E-step |
K |
the number of iterations, which must be specified. |
e |
tolerance level for convergence of the bisection method for estimation of df. |
Kinyanjui, P. K., Tamba, C. L., Orawo, L. A. O., & Okenye, J. O. (2020). Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants. Model Assisted Statistics and Applications, 15(3), 263-272.
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1,e=0.00001) #Single Iteration (MCEM) MCEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1000,e=0.00001) #Multiple Iterations (SEM) SEMU=SMCEM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,nob=1,K=1000,e=0.00001) #Results for Newly Completed Dataset (Burning in first 100 iterations in SEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*25), nrow =25) T_df=rep() for (l in 101:1000){ T_mu = T_mu + SEMU$muchain[l,] T_Sigma = T_Sigma + SEMU$SigmaChain[,,l] T_Data= T_Data+ SEMU$YChain[,,l] } #Updated location vector round((T_mu/900),4) #Updated scatter matrix round((T_Sigma/900),4) #Updated degrees of freedom udfs=mean(SEMU$dfchain[101:1000]) #Complete dataset as an average of (K-100) complete datasets for the various iterations. T_Data1= T_Data/900 #Multiple Iterations (MCEM) MCEMU=SMCEM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,nob=1000,K=1000,e=0.00001) #Results for Newly Completed Dataset (Burning in first 100 iterations in MCEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*25), nrow =25) T_df=rep() for (l in 101:1000){ T_mu = T_mu + MCEMU$muchain[l,] T_Sigma = T_Sigma + MCEMU$SigmaChain[,,l] T_Data= T_Data+ MCEMU$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4) #updated scatter matrix udf=mean(MCEMU$dfchain[101:1000]) #updated degrees of freedom T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations.
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1,e=0.00001) #Single Iteration (MCEM) MCEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1000,e=0.00001) #Multiple Iterations (SEM) SEMU=SMCEM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,nob=1,K=1000,e=0.00001) #Results for Newly Completed Dataset (Burning in first 100 iterations in SEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*25), nrow =25) T_df=rep() for (l in 101:1000){ T_mu = T_mu + SEMU$muchain[l,] T_Sigma = T_Sigma + SEMU$SigmaChain[,,l] T_Data= T_Data+ SEMU$YChain[,,l] } #Updated location vector round((T_mu/900),4) #Updated scatter matrix round((T_Sigma/900),4) #Updated degrees of freedom udfs=mean(SEMU$dfchain[101:1000]) #Complete dataset as an average of (K-100) complete datasets for the various iterations. T_Data1= T_Data/900 #Multiple Iterations (MCEM) MCEMU=SMCEM_Umsteps(Y=Y8,mu=mu_stat,Sigma=Sigma_stat,df=df_stat,nob=1000,K=1000,e=0.00001) #Results for Newly Completed Dataset (Burning in first 100 iterations in MCEM) T_mu=rep(0,3) T_Sigma=matrix(rep(0,3*3),nrow=3) T_Data=matrix(rep(0,3*25), nrow =25) T_df=rep() for (l in 101:1000){ T_mu = T_mu + MCEMU$muchain[l,] T_Sigma = T_Sigma + MCEMU$SigmaChain[,,l] T_Data= T_Data+ MCEMU$YChain[,,l] } round((T_mu/900),4) #updated location vector round((T_Sigma/900),4) #updated scatter matrix udf=mean(MCEMU$dfchain[101:1000]) #updated degrees of freedom T_Data1= T_Data/900 #complete dataset as an average of (K-100) complete datasets for the various iterations.
This sub-package provides subroutines for implementation of SEM and MCEM techniques in imputing missing values as well as estimating multivariate t parameters when the degrees of freedom are unknown. It has 4 functions namely fun1, dfun1, Bisec, and SMCEM_Uonestep. The functions fun1 and dfun1 in the sub-package constitute the equation for the degrees of freedom and its derivative respectively. The Bisec function contains the bisection method subroutines to facilitate the iterative estimation of the degrees of freedom using fun1 and dfun1. The function SMCEM_Uonestep constitute the SEM and MCEM algorithms for single-iteration data imputation and parameter estimation for multivariate t data with unknown degrees of freedom. The functions represent SEM when the number of draws in the E-step (denoted by nob) is 1 and MCEM when we have more than one draw in the E-step.Details of how SEM and MCEM impute missing values and estimate parameters in multivariate t context (unknown degrees of freedom) are explained by Kinyanjui et al. (2020).
SMCEM_Uonestep(Y,mu,Sigma,df,nob,e)
SMCEM_Uonestep(Y,mu,Sigma,df,nob,e)
Y |
the multivariate t dataset |
mu |
the location vector, which must be specified. In cases where it is unknown, starting values are provided. |
Sigma |
scatter matrix, which must be specified. In cases where it is unknown, starting values are provided. |
df |
degrees of freedom, which must be specified. |
nob |
number of draws in the E-step |
e |
tolerance level for convergence of the bisection method for estimation of df. |
Kinyanjui, P. K., Tamba, C. L., Orawo, L. A. O., & Okenye, J. O. (2020). Missing data imputation in multivariate t distribution with unknown degrees of freedom using expectation maximization algorithm and its stochastic variants. Model Assisted Statistics and Applications, 15(3), 263-272.
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1,e=0.00001) #Single Iteration (MCEM) MCEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1000,e=0.00001) #Results for Newly Completed Dataset (SEM) SEMU1$Y2 #Newly completed Dataset (with imputed values) SEMU1$mu #updated location vector SEMU1$Sigma #updated scatter matrix #Results for Newly Completed Dataset (MCEM) MCEMU1$Y2 #Newly completed Dataset (with imputed values) MCEMU1$mu #updated location vector MCEMU1$Sigma #updated scatter matrix
# 3-dimensional multivariate t distribution n <- 25 p=3 df=3 mu=c(10,20,30) A=matrix(c(14,10,12,10,13,9,12,9,18), 3,3) Y7 <-mvtnorm::rmvt(n, delta=mu, sigma=A, df=df) Y7 TT=Y7 #Complete Dataset #Introduce MAR Data Y8= MISS(TT,20) #The newly created incomplete dataset. #Initializing Values mu_stat=c(0.5,1,2) Sigma_stat=matrix(c(0.33,0.31,0.3,0.31,0.335,0.295,0.3,0.295,0.32),3,3) df_stat=6 #Imputing Missing Values and Updating Parameter Estimates #Single Iteration (SEM) SEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1,e=0.00001) #Single Iteration (MCEM) MCEMU1=SMCEM_Uonestep(Y=Y8,mu=mu,Sigma=Sigma_stat,df= df_stat,nob=1000,e=0.00001) #Results for Newly Completed Dataset (SEM) SEMU1$Y2 #Newly completed Dataset (with imputed values) SEMU1$mu #updated location vector SEMU1$Sigma #updated scatter matrix #Results for Newly Completed Dataset (MCEM) MCEMU1$Y2 #Newly completed Dataset (with imputed values) MCEMU1$mu #updated location vector MCEMU1$Sigma #updated scatter matrix