How to estimate feasible generalized least squares regression using Stata?
How do you estimate a feasible generalized least squares (FGLS) regression using Stata? This question comes up frequently in social media groups, including on Facebook and LinkedIn. In this tutorial, we show you how to estimate FGLS regression using Stata.
INTRODUCTION
The general linear regression model is a statistical model that describes a data generation process. The general linear regression model is a generalization of the classical linear regression model. You can obtain the general linear regression from the classical linear regression by changing one assumption: assume that the disturbances are nonspherical rather than spherical. Because of this, the general linear regression model can be used to describe data generation processes characterized by heteroscedasticity and autocorrelation.
SPECIFICATION
The specification of the general linear regression model is defined by the following set of assumptions.
Assumptions
The functional form is linear in parameters.
$y = X\beta + \mu$
The error term has mean zero.
$E(\mu) = 0$
The errors are nonspherical.
$\mathrm{Cov}(\mu) = E(\mu\mu^T) = \Omega$
where $\Omega$ is a symmetric, nonsingular $T \times T$ variance-covariance matrix of the disturbances.
The error term has a normal distribution.
$\mu \sim N(0, \Omega)$
The error term is uncorrelated with each independent variable.
$\mathrm{Cov}(\mu, X) = 0$
Sources of Nonspherical Errors
There are 2 major sources of nonspherical errors.
The error term does not have constant variance.
This is called heteroscedasticity. In this case, the disturbances are drawn from probability distributions that have different variances. This often occurs when using cross-section data. When the error term has nonconstant variance, the variance-covariance matrix of disturbances is not a constant times the identity matrix (i.e., $\Omega \neq \sigma^2 I$). This is because the elements on the principal diagonal of $\Omega$, which are the variances of the distributions from which the disturbances are drawn, are not all equal to a constant $\sigma^2$ but take different values.
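As an illustration, the following Python/NumPy sketch builds a heteroscedastic $\Omega$. It assumes, purely for illustration, that $\mathrm{Var}(\mu_t) = \sigma^2 x_t^2$ for some regressor $x_t$; the function name `heteroscedastic_omega` is hypothetical. The off-diagonal covariances are zero, but the diagonal variances differ, so $\Omega \neq \sigma^2 I$.

```python
import numpy as np


def heteroscedastic_omega(x, sigma2=1.0):
    """Diagonal Omega with Var(mu_t) = sigma2 * x_t**2 (an assumed variance model)."""
    x = np.asarray(x, dtype=float)
    # Zero covariances off the diagonal; unequal variances on the diagonal.
    return np.diag(sigma2 * x**2)


x = np.array([1.0, 2.0, 3.0, 4.0])
omega = heteroscedastic_omega(x)

# Omega equals sigma^2 * I only if every diagonal entry equals the first one.
is_scalar_identity = np.allclose(omega, omega[0, 0] * np.eye(len(x)))
print(np.diag(omega), is_scalar_identity)
```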
The errors are correlated.
This is called autocorrelation or serial correlation. In this case, the disturbances are correlated with one another. This often occurs when using time-series data. When the disturbances are correlated, the variance-covariance matrix of disturbances is not a constant times the identity matrix (i.e., $\Omega \neq \sigma^2 I$). This is because the elements off the principal diagonal of $\Omega$, which are the covariances of the disturbances, are nonzero.
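For autocorrelation, a common illustration is a stationary first-order autoregressive (AR(1)) error, $\mu_t = \rho\mu_{t-1} + \varepsilon_t$ with $|\rho| < 1$, for which $\mathrm{Cov}(\mu_t, \mu_s) = \sigma_\varepsilon^2 \rho^{|t-s|}/(1-\rho^2)$. The minimal sketch below (hypothetical function `ar1_omega`, assuming NumPy) builds this $\Omega$; unlike the heteroscedastic case, its off-diagonal elements are nonzero.

```python
import numpy as np


def ar1_omega(T, rho, sigma2_eps=1.0):
    """Omega for stationary AR(1) errors mu_t = rho * mu_{t-1} + eps_t, Var(eps_t) = sigma2_eps.

    Cov(mu_t, mu_s) = sigma2_eps * rho**|t - s| / (1 - rho**2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("stationarity requires |rho| < 1")
    t = np.arange(T)
    lags = np.abs(t[:, None] - t[None, :])  # |t - s| for every pair of periods
    return sigma2_eps / (1.0 - rho**2) * rho**lags


omega = ar1_omega(T=4, rho=0.5)
print(np.round(omega, 3))
```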
Classical Linear Regression Model as a Special Case of the General Linear Regression Model
If the error term has constant variance and the errors are uncorrelated, then $\Omega = \sigma^2 I$ and the general linear regression model reduces to the classical linear regression model.
General Linear Regression Model Concisely Stated in Matrix Format
The sample of $T$ multivariate observations $(Y_t, X_{t1}, X_{t2}, \ldots, X_{tk})$ is generated by a process described as follows:
$y = X\beta + \mu, \quad \mu \sim N(0, \Omega)$
or alternatively
$y \sim N(X\beta, \Omega)$
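The data generation process above can be mimicked by simulation. This is a minimal sketch assuming NumPy; the design matrix, the "true" $\beta$ and the diagonal $\Omega$ are illustrative choices, not values from this tutorial, and `simulate_general_lrm` is a hypothetical helper name.

```python
import numpy as np


def simulate_general_lrm(X, beta, omega, rng):
    """Draw one sample y = X @ beta + mu, with mu ~ N(0, omega)."""
    T = X.shape[0]
    mu = rng.multivariate_normal(mean=np.zeros(T), cov=omega)
    return X @ beta + mu


rng = np.random.default_rng(0)
T = 50
x = rng.uniform(1.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])  # intercept plus one regressor
beta = np.array([2.0, 0.5])           # illustrative "true" parameters
omega = np.diag(0.1 * x**2)           # illustrative heteroscedastic Omega
y = simulate_general_lrm(X, beta, omega, rng)
print(y[:5])
```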
ESTIMATION
Choosing an Estimator
To obtain estimates of the parameters of the model, you need to choose an estimator. We will consider the following 3 estimators:
Ordinary least squares (OLS) estimator
Generalized least squares (GLS) estimator
Feasible generalized least squares (FGLS) estimator
Ordinary Least Squares (OLS) Estimator
To obtain estimates of the parameters of the general linear regression model, you can apply the OLS estimator to the sample data. The OLS estimator is given by the rule:
$\hat\beta = (X^TX)^{-1}X^Ty$
The conventional formula for the variance-covariance matrix of the OLS estimates, which is valid only when the errors are spherical, is
$\mathrm{Cov}(\hat\beta) = \sigma^2(X^TX)^{-1}$
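A minimal NumPy sketch of the OLS rule follows. The function `ols` (a hypothetical name) reports the conventional covariance with $\sigma^2$ replaced by the usual estimate $s^2 = e^Te/(T-k)$. For comparison, `ols_cov_under_omega` (also hypothetical) computes the covariance that actually applies when $\mathrm{Cov}(\mu) = \Omega$, namely $(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$; the gap between the two is why the OLS standard errors are unreliable under nonspherical errors. The simulated data are illustrative.

```python
import numpy as np


def ols(X, y):
    """OLS estimates and the conventional covariance estimate s^2 (X'X)^{-1}."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (T - k)  # estimate of sigma^2, appropriate only if errors are spherical
    return b, s2 * XtX_inv


def ols_cov_under_omega(X, omega):
    """Actual covariance of OLS when Cov(mu) = omega: (X'X)^{-1} X' omega X (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ X.T @ omega @ X @ XtX_inv


rng = np.random.default_rng(1)
T = 100
x = rng.uniform(1.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])
omega = np.diag(0.1 * x**2)  # illustrative heteroscedastic Omega
y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, np.sqrt(np.diag(omega)))

b, cov_conventional = ols(X, y)
cov_actual = ols_cov_under_omega(X, omega)
print(b)
print(np.sqrt(np.diag(cov_conventional)), np.sqrt(np.diag(cov_actual)))
```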
Properties of the OLS Estimator
If the sample data are generated by the general linear regression model, then the OLS estimator has the following properties.
The OLS estimator is unbiased.
The OLS estimator is inefficient.
The OLS estimator is not the maximum likelihood estimator.
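The conventional variance-covariance matrix of estimates, $\sigma^2(X^TX)^{-1}$, is incorrect because under nonspherical errors the true covariance is $(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$; therefore the estimates of the standard errors are biased and inconsistent.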
Hypothesis tests are not valid.
Property 2 means that in the class of linear unbiased estimators, the OLS estimator does not have minimum variance. Thus, an alternative estimator exists that will yield more precise estimates.
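Inefficiency can be seen in a small Monte Carlo experiment. The sketch below is illustrative only: it assumes NumPy, an error standard deviation that grows with $x_t$ (here $0.5x_t^{1.5}$), and a known $\Omega$ so that GLS can be computed; it then compares the sampling variance of the OLS and GLS slope estimates across replications. The function name `slope_variances` is hypothetical.

```python
import numpy as np


def slope_variances(reps=2000, T=50, seed=2):
    """Monte Carlo sampling variance of the slope estimate under OLS and GLS (Omega known)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 10.0, size=T)  # regressor held fixed across replications
    X = np.column_stack([np.ones(T), x])
    sd = 0.5 * x**1.5                    # assumed: error s.d. grows with x
    omega_inv = np.diag(1.0 / sd**2)
    ols_map = np.linalg.inv(X.T @ X) @ X.T                          # b_OLS = ols_map @ y
    gls_map = np.linalg.inv(X.T @ omega_inv @ X) @ X.T @ omega_inv  # b_GLS = gls_map @ y
    beta = np.array([2.0, 0.5])
    slopes_ols, slopes_gls = [], []
    for _ in range(reps):
        y = X @ beta + rng.normal(0.0, sd)
        slopes_ols.append((ols_map @ y)[1])
        slopes_gls.append((gls_map @ y)[1])
    return np.var(slopes_ols), np.var(slopes_gls)


var_ols, var_gls = slope_variances()
print(f"OLS slope variance: {var_ols:.5f}, GLS slope variance: {var_gls:.5f}")
```

Both estimators are unbiased here; the difference shows up only in how widely the estimates scatter from one sample to the next.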
Generalized Least Squares (GLS) Estimator
The GLS estimator is given by the rule:
$\hat\beta_{GLS} = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y$
The variance-covariance matrix of estimates for the GLS estimator is
$\mathrm{Cov}(\hat\beta_{GLS}) = (X^T\Omega^{-1}X)^{-1}$
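Here is a minimal NumPy sketch of the GLS rule, assuming $\Omega$ is known (which, as discussed below, is rarely the case in practice). The usage example also checks a special case: when $\Omega = \sigma^2 I$, the $\sigma^2$ terms cancel and GLS returns the OLS estimates. The function name `gls` and the simulated data are hypothetical.

```python
import numpy as np


def gls(X, y, omega):
    """GLS estimates and covariance, treating omega as the known Cov(mu)."""
    omega_inv = np.linalg.inv(omega)
    cov = np.linalg.inv(X.T @ omega_inv @ X)  # (X' Omega^{-1} X)^{-1}
    b = cov @ X.T @ omega_inv @ y             # (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    return b, cov


rng = np.random.default_rng(2)
T = 40
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, -0.5]) + rng.normal(size=T)

# With Omega = sigma^2 I (here sigma^2 = 2), GLS gives the OLS point estimates.
b_gls, cov_gls = gls(X, y, 2.0 * np.eye(T))
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_gls, b_ols)
```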
Properties of the GLS Estimator
If the sample data are generated by the general linear regression model, then the GLS estimator has the following properties.
The GLS estimator is unbiased.
The GLS estimator is efficient.
The GLS estimator is the maximum likelihood estimator.
The variance-covariance matrix of estimates is correct, and therefore the estimates of the standard errors are unbiased and consistent.
Hypothesis tests are valid.
If the sample data are generated by the general linear regression model, then the GLS estimator is the best linear unbiased estimator (BLUE) of the population parameters. The GLS estimator is more precise than the OLS estimator because the OLS estimator wastes information: it does not use the information contained in $\Omega$ about heteroscedasticity and/or autocorrelation, while the GLS estimator does.
Major Shortcoming of the GLS Estimator
To actually use the GLS estimator, you must know the elements of the variance-covariance matrix of disturbances, $\Omega$. That means you must know the true values of the variances and covariances of the disturbances. However, since you never know the true elements of $\Omega$, you cannot actually use the GLS estimator; it is therefore not a feasible estimator.
Feasible Generalized Least Squares (FGLS) Estimator
To make the GLS estimator feasible, you can use the sample data to obtain an estimate of $\Omega$. Replacing the true $\Omega$ with its estimate $\hat\Omega$ gives the FGLS estimator, which is given by the rule:
$\hat\beta_{FGLS} = (X^T\hat\Omega^{-1}X)^{-1}X^T\hat\Omega^{-1}y$
The estimated variance-covariance matrix of estimates for the FGLS estimator is
$\widehat{\mathrm{Cov}}(\hat\beta_{FGLS}) = (X^T\hat\Omega^{-1}X)^{-1}$
FGLS Estimator as a Weighted Least Squares Estimator
The FGLS estimator is also a weighted least squares estimator. The weighted least squares estimator is derived as follows. Find a $T \times T$ transformation matrix $P$ such that $\mu^* = P\mu$, where $\mu^*$ has variance-covariance matrix $\mathrm{Cov}(\mu^*) = E(\mu^*\mu^{*T}) = P\Omega P^T = \sigma^2 I$, which holds when $P^TP = \sigma^2\Omega^{-1}$. This transforms the original nonspherical error term $\mu$ into a new error term that is spherical. Use the matrix $P$ to derive a transformed model:
$Py = PX\beta + P\mu$
or $y^* = X^*\beta + \mu^*$
where $y^* = Py$, $X^* = PX$, and $\mu^* = P\mu$. The transformed model satisfies all of the assumptions of the classical linear regression model, and OLS applied to it gives $(X^TP^TPX)^{-1}X^TP^TPy = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y$, which is the GLS estimator. The FGLS estimator is the OLS estimator applied to the transformed model with $P$ built from $\hat\Omega$ instead of $\Omega$. Note that the transformed model is a computational device only. We use it to obtain efficient estimates of the parameters and standard errors of the original model of interest.
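The equivalence can be checked numerically. One valid choice of $P$ (assumed here; others exist) comes from the Cholesky factorization $\Omega = LL^T$: setting $P = L^{-1}$ gives $P\Omega P^T = I$ and $P^TP = \Omega^{-1}$, i.e. $\sigma^2 = 1$. The sketch assumes NumPy and uses a known, illustrative AR(1) $\Omega$; in FGLS you would build $P$ from $\hat\Omega$ instead. The helper name `transform_matrix` is hypothetical.

```python
import numpy as np


def transform_matrix(omega):
    """Return P with P @ omega @ P.T = I, using the Cholesky factor omega = L L'."""
    L = np.linalg.cholesky(omega)
    return np.linalg.inv(L)  # P = L^{-1}


rng = np.random.default_rng(3)
T = 30
x = rng.uniform(1.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])
t = np.arange(T)
omega = 0.6 ** np.abs(t[:, None] - t[None, :]) / (1 - 0.36)  # illustrative AR(1) Omega, rho = 0.6
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(T), omega)

P = transform_matrix(omega)
y_star, X_star = P @ y, P @ X
b_transformed = np.linalg.lstsq(X_star, y_star, rcond=None)[0]  # OLS on the transformed model

omega_inv = np.linalg.inv(omega)
b_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)  # GLS formula directly
print(b_transformed, b_gls)
```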
Major Problem with Using the FGLS Estimator
A major problem with using the FGLS estimator is that, if you leave $\Omega$ unrestricted, you must obtain an estimate of each of its elements (i.e., each variance and covariance). $\Omega$ is a $T \times T$ matrix and therefore contains $T^2$ elements; because it is symmetric, only $\frac{1}{2}T(T+1)$ of these elements are distinct. Thus, with a sample size of $T = 100$, you would need to use these 100 observations to estimate 5,050 different variances and covariances. You cannot obtain this many estimates from 100 observations because you do not have enough degrees of freedom.
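Worked out for $T = 100$, the number of unknowns compared with the 100 available observations is:

```latex
T^2 = 100^2 = 10{,}000 \text{ elements of } \Omega, \qquad
\tfrac{1}{2}T(T+1) = \tfrac{1}{2}(100)(101) = 5{,}050 \text{ distinct variances and covariances}
```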
Resolving the Degrees of Freedom Problem
To circumvent the degrees of freedom problem and obtain estimates of the variances and covariances in $\Omega$, you must specify a model that describes what you believe is the nature of the heteroscedasticity and/or autocorrelation. You can then use the sample data to estimate the parameters of your model of heteroscedasticity and/or autocorrelation, and use these parameter estimates to obtain estimates of the variances and covariances in $\Omega$. Some commonly used models of heteroscedasticity are the following.
Assume that the error variance is a linear function of the explanatory variables.
Assume that the error variance is an exponential function of the explanatory variables.
Assume the error variance is a polynomial function of the explanatory variables.
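As an example of the exponential model, the sketch below implements a common two-step FGLS procedure for $\mathrm{Var}(\mu_t) = \exp(z_t^T\gamma)$: run OLS, regress $\log e_t^2$ on $z_t$, use the fitted values to form the diagonal of $\hat\Omega$, and then apply weighted least squares. It assumes NumPy and, by default, $z_t = x_t$; the function name `fgls_exponential_het` and the simulated data are hypothetical.

```python
import numpy as np


def fgls_exponential_het(X, y, Z=None):
    """Two-step FGLS assuming Var(mu_t) = exp(z_t' gamma). Returns (b_fgls, var_hat).

    Step 1: OLS residuals e.  Step 2: regress log(e^2) on Z.
    Step 3: weighted least squares with weights 1 / sqrt(fitted variance).
    """
    Z = X if Z is None else Z
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    gamma_hat = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]
    var_hat = np.exp(Z @ gamma_hat)  # diagonal of Omega-hat (up to a constant factor)
    w = 1.0 / np.sqrt(var_hat)       # P = diag(w) makes the errors roughly homoscedastic
    b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return b_fgls, var_hat


rng = np.random.default_rng(3)
T = 2000
x = rng.uniform(0.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])
true_var = np.exp(-1.0 + 0.4 * x)  # assumed exponential variance model
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, np.sqrt(true_var))

b_fgls, var_hat = fgls_exponential_het(X, y)
print(b_fgls)
```

Because $\log e_t^2$ is a biased estimate of $\log\sigma_t^2$ by a roughly constant amount, the intercept of $\hat\gamma$ is shifted; this does not affect the FGLS coefficients, since multiplying every weight by the same constant leaves weighted least squares unchanged.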
Some often used models of autocorrelation are the following.
First-order autoregressive process
Second-order autoregressive process
Higher-order autoregressive process
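For the first-order autoregressive model, one standard FGLS method is the Prais-Winsten procedure, sketched below under stated assumptions: $\rho$ is estimated by regressing the OLS residuals $e_t$ on $e_{t-1}$, observations $2, \ldots, T$ are quasi-differenced ($y_t - \hat\rho y_{t-1}$), and the first observation is kept after scaling by $\sqrt{1-\hat\rho^2}$. This is a one-step version (iterated variants also exist); the function name `prais_winsten` and the simulated data are hypothetical.

```python
import numpy as np


def prais_winsten(X, y):
    """One-step Prais-Winsten FGLS for AR(1) errors. Returns (b_fgls, rho_hat)."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    # rho_hat: slope of e_t on e_{t-1} without an intercept.
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    if not -1.0 < rho < 1.0:
        raise ValueError("estimated rho outside (-1, 1); Prais-Winsten scaling undefined")
    scale = np.sqrt(1.0 - rho**2)
    # Keep observation 1 (scaled); quasi-difference observations 2..T.
    y_star = np.concatenate([[scale * y[0]], y[1:] - rho * y[:-1]])
    X_star = np.vstack([scale * X[:1], X[1:] - rho * X[:-1]])
    b_fgls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return b_fgls, rho


# Illustrative data: AR(1) errors with rho = 0.7, started from the stationary distribution.
rng = np.random.default_rng(4)
T, rho_true = 2000, 0.7
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
eps = rng.normal(size=T)
mu = np.empty(T)
mu[0] = eps[0] / np.sqrt(1.0 - rho_true**2)
for t in range(1, T):
    mu[t] = rho_true * mu[t - 1] + eps[t]
y = X @ np.array([1.0, 2.0]) + mu

b_fgls, rho_hat = prais_winsten(X, y)
print(b_fgls, rho_hat)
```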
Properties of the FGLS Estimator
If the sample data are generated by the general linear regression model, then the FGLS estimator has the following properties. The FGLS estimator may or may not be unbiased in small samples. However, if $\hat\Omega$ is a consistent estimator of $\Omega$, then the FGLS estimator is asymptotically unbiased, consistent, and asymptotically efficient. In this case, Monte Carlo studies have shown that the FGLS estimator generally yields better estimates than the OLS estimator.
Caveat
For $\hat\Omega$ to be a consistent estimator of $\Omega$, your model of heteroscedasticity or autocorrelation must be a reasonable approximation of the true unknown heteroscedasticity or autocorrelation. If it is not, then the FGLS estimator will not have desirable small or large sample properties.
HYPOTHESIS TESTING
The following statistical tests can be used to test hypotheses in the general linear regression model:
t-test
F-test
Likelihood ratio test
Wald test
Lagrange multiplier test
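As an example of the Wald test, for a linear hypothesis $H_0: R\beta = r$ the statistic is $(R\hat\beta - r)^T[R\,\widehat{\mathrm{Cov}}(\hat\beta)R^T]^{-1}(R\hat\beta - r)$, which is asymptotically chi-square with as many degrees of freedom as there are restrictions. With FGLS you would plug in $\widehat{\mathrm{Cov}}(\hat\beta_{FGLS}) = (X^T\hat\Omega^{-1}X)^{-1}$. The sketch below assumes NumPy, uses a hypothetical function name, and uses made-up numbers purely to show the arithmetic.

```python
import numpy as np


def wald_statistic(b, cov_b, R, r):
    """Wald statistic for H0: R b = r, given an estimated covariance matrix cov_b."""
    diff = R @ b - r
    return float(diff @ np.linalg.solve(R @ cov_b @ R.T, diff))


b = np.array([1.0, 2.0])     # made-up coefficient estimates
cov_b = np.eye(2)            # made-up covariance matrix of the estimates
R = np.array([[0.0, 1.0]])   # H0: second coefficient equals zero
r = np.array([0.0])

wald = wald_statistic(b, cov_b, R, r)  # (2 - 0)^2 / 1 = 4.0
print(wald)
```

Here the statistic equals 4.0, which exceeds 3.84, the 5% critical value of the chi-square distribution with one degree of freedom, so $H_0$ would be rejected at the 5% level.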
GOODNESS-OF-FIT
It is somewhat more difficult to measure the goodness of fit of the model when the sample data are generated by the general linear regression model. The FGLS estimator is simply the OLS estimator applied to a transformed regression that purges the heteroscedasticity and/or autocorrelation. Many economists use the R² statistic of the transformed regression as their measure of goodness of fit. However, the transformed regression is simply a computational device, not the original model of interest, so a good or bad fit for the transformed regression may be of no interest.
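One way to see the issue is to evaluate the fit of the original model directly. The sketch below (hypothetical helper `r_squared`, illustrative simulated data, weights assumed known so the weighted fit is GLS) computes $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$ on the untransformed data for both the OLS and the GLS coefficients. Because OLS minimizes the sum of squared residuals on the original data, this $R^2$ can never be higher for GLS or FGLS coefficients than for OLS, even though GLS and FGLS are the more efficient estimators.

```python
import numpy as np


def r_squared(X, y, b):
    """R^2 = 1 - SSR/SST on the original (untransformed) data for coefficients b."""
    resid = y - X @ b
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / sst


rng = np.random.default_rng(5)
T = 200
x = rng.uniform(1.0, 10.0, size=T)
X = np.column_stack([np.ones(T), x])
sd = 0.3 * x  # assumed heteroscedastic error s.d.
y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, sd)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w = 1.0 / sd  # weights assumed known here
b_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(r_squared(X, y, b_ols), r_squared(X, y, b_wls))
```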
ESTIMATING FGLS IN STATA
In Stata, FGLS for panel data can be estimated with the built-in xtgls command. To view its documentation, type:
help xtgls
xtgls fits panel-data linear models by FGLS and is a common way to deal with heteroskedasticity across panels, correlation across panels, and first-order (AR(1)) autocorrelation within panels. As discussed above, FGLS can be a better estimator than OLS when such problems are present.
The following example FGLS commands, taken from the xtgls help file in Stata, can be run as follows:
```stata
* Setup
webuse invest2
xtset company time

* Fit panel-data model with heteroskedasticity across panels
xtgls invest market stock, panels(hetero)

* Correlation and heteroskedasticity across panels
xtgls invest market stock, panels(correlated)

* Heteroskedasticity across panels and autocorrelation within panels
xtgls invest market stock, panels(hetero) corr(ar1)
```
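The panels(hetero) option allows each panel to have its own error variance. The idea can be sketched in Python/NumPy as a two-step procedure: pooled OLS, one variance per panel from the residuals, then weighted least squares. This is a simplified illustration under those assumptions; it is not claimed to reproduce xtgls output, and `groupwise_het_fgls` and the simulated panels are hypothetical.

```python
import numpy as np


def groupwise_het_fgls(X, y, groups):
    """Two-step FGLS with a separate error variance per panel. Returns (b_fgls, var_hat).

    Simplified sketch: each panel's variance is the mean squared pooled-OLS residual
    within that panel.
    """
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    var_hat = np.empty_like(y, dtype=float)
    for g in np.unique(groups):
        in_g = groups == g
        var_hat[in_g] = np.mean(e[in_g] ** 2)
    w = 1.0 / np.sqrt(var_hat)  # down-weight noisy panels
    b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return b_fgls, var_hat


rng = np.random.default_rng(6)
n_panels, T = 5, 200
groups = np.repeat(np.arange(n_panels), T)      # panel identifier for each row
panel_sd = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # assumed error s.d. per panel
x = rng.normal(size=n_panels * T)
X = np.column_stack([np.ones(n_panels * T), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, panel_sd[groups])

b_fgls, var_hat = groupwise_het_fgls(X, y, groups)
print(b_fgls)
print(var_hat[::T])  # one estimated variance per panel
```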
Interpreting Results from FGLS
FGLS coefficients are interpreted in the same way as coefficients from other panel-data estimators such as fixed effects (FE) and random effects (RE). FE, RE and FGLS differ in their background assumptions about the error structure, which determine how the covariance matrix and standard errors are handled; these differences do not change the meaning of the coefficients.
Professionally trained and highly recognized online course provider in Advanced Econometrics and Top Freelancer in Stata, Eviews, SPSS, Nvivo10/11, WinRATS, GAUSS, Gretl and Minitab. Anees has helped 1100+ clients (organizations, students and researchers) from around the world in applied econometrics and applied statistics. He has completed research and analytics in corporate governance, financial performance, economic problems, business evaluation, value at risk, options pricing, stock evaluation, currency and pairs trading, and backtesting using major statistical software. Academically, he has masters in Econometrics and Economics from The University of Sheffield, UK.