**Initial thoughts**

Estimating causal relationships from data is one of the fundamental endeavors of researchers, but causality is elusive. In the presence of omitted confounders, endogeneity, omitted variables, or a misspecified model, estimates of predicted values and effects of interest are inconsistent; causality is obscured.

A controlled experiment to estimate causal relations is an alternative. Yet conducting a controlled experiment may be infeasible. Policy makers cannot randomize taxation, for example. In the absence of experimental data, an option is to use instrumental variables or a control function approach.

Stata has many built-in estimators to implement these potential solutions and tools to construct estimators for situations that are not covered by built-in estimators. Below I illustrate both possibilities for a linear model and, in a later post, will talk about nonlinear models. Read more…

We discuss estimating population-averaged parameters when some of the data are missing. In particular, we show how to use **gmm** to estimate population-averaged parameters for a probit model when the process that causes some of the data to be missing is a function of observable covariates and a random process that is independent of the outcome. This type of missing data is known as missing at random, selection on observables, and exogenous sample selection.

This is a follow-up to an earlier post where we estimated the parameters of a probit model under endogenous sample selection (http://blog.stata.com/2015/11/05/using-mlexp-to-estimate-endogenous-treatment-effects-in-a-probit-model/). In endogenous sample selection, the random process that affects which observations are missing is correlated with an unobservable random process that affects the outcome. Read more…

In Stata 14.2, we added the ability to use **margins** to estimate covariate effects after **gmm**. In this post, I illustrate how to use **margins** and **marginsplot** after **gmm** to estimate covariate effects for a probit model.

Margins are statistics calculated from predictions of a previously fit model at fixed values of some covariates and averaging or otherwise integrating over the remaining covariates. They can be used to estimate population average parameters like the marginal mean, average treatment effect, or the average effect of a covariate on the conditional mean. I will demonstrate how using margins is useful after estimating a model with the generalized method of moments. Read more…

**teffects ipw** uses multinomial logit to estimate the weights needed to estimate the potential-outcome means (POMs) from a multivalued treatment. I show how to estimate the POMs when the weights come from an ordered probit model. Moment conditions define the ordered probit estimator and the subsequent weighted average used to estimate the POMs. I use **gmm** to obtain consistent standard errors by stacking the ordered-probit moment conditions and the weighted mean moment conditions. Read more…

We estimate the average treatment effect (ATE) for an exponential mean model with an endogenous treatment. We have a two-step estimation problem where the first step corresponds to the treatment model and the second to the outcome model. As shown in *Using gmm to solve two-step estimation problems*, this can be solved with the generalized method of moments using **gmm**.

This continues the series of posts where we illustrate how to obtain correct standard errors and marginal effects for models with multiple steps. In the previous posts, we used **gsem** and **mlexp** to estimate the parameters of models with separable likelihoods. In the current model, because the treatment is endogenous, the likelihood for the model is no longer separable. We demonstrate how we can use **gmm** to estimate the parameters in these situations. Read more…

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The command **gmm** is used to estimate the parameters of a model using the generalized method of moments (GMM). GMM can be used to estimate the parameters of models that have more identification conditions than parameters, overidentified models. The specification of these models can be evaluated using Hansen’s *J* statistic (Hansen, 1982).

We use **gmm** to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use **estat overid** to calculate Hansen’s *J* statistic and test the validity of the overidentification restrictions.

In previous posts Read more…

\(\newcommand{\Eb}{{\bf E}}\)This post was written jointly with Enrique Pinzon, Senior Econometrician, StataCorp.

The generalized method of moments (**GMM**) is a method for constructing estimators, analogous to maximum likelihood (**ML**). **GMM** uses assumptions about specific moments of the random variables instead of assumptions about the entire distribution, which makes **GMM** more robust than **ML**, at the cost of some efficiency. The assumptions are called moment conditions.

**GMM** generalizes the method of moments (**MM**) by allowing the number of moment conditions to be greater than the number of parameters. Using these extra moment conditions makes **GMM** more efficient than **MM**. When there are more moment conditions than parameters, the estimator is said to be overidentified. **GMM** can efficiently combine the moment conditions when the estimator is overidentified.

We illustrate these points by estimating the mean of a \(\chi^2(1)\) by **MM**, **ML**, a simple **GMM** estimator, and an efficient **GMM** estimator. This example builds on Efficiency comparisons by Monte Carlo simulation and is similar in spirit to the example in Wooldridge (2001). Read more…

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}

\newcommand{\ebi}{\boldsymbol{\epsilon}_i}

\newcommand{\Sigmab}{\boldsymbol{\Sigma}}

\newcommand{\Omegab}{\boldsymbol{\Omega}}

\newcommand{\Lambdab}{\boldsymbol{\Lambda}}

\newcommand{\betab}{\boldsymbol{\beta}}

\newcommand{\gammab}{\boldsymbol{\gamma}}

\newcommand{\Gammab}{\boldsymbol{\Gamma}}

\newcommand{\deltab}{\boldsymbol{\delta}}

\newcommand{\xib}{\boldsymbol{\xi}}

\newcommand{\iotab}{\boldsymbol{\iota}}

\newcommand{\xb}{{\bf x}}

\newcommand{\xbit}{{\bf x}_{it}}

\newcommand{\xbi}{{\bf x}_{i}}

\newcommand{\zb}{{\bf z}}

\newcommand{\zbi}{{\bf z}_i}

\newcommand{\wb}{{\bf w}}

\newcommand{\yb}{{\bf y}}

\newcommand{\ub}{{\bf u}}

\newcommand{\Gb}{{\bf G}}

\newcommand{\Hb}{{\bf H}}

\newcommand{\thetab}{\boldsymbol{\theta}}

\newcommand{\XBI}{{\bf x}_{i1},\ldots,{\bf x}_{iT}}

\newcommand{\Sb}{{\bf S}} \newcommand{\Xb}{{\bf X}}

\newcommand{\Xtb}{\tilde{\bf X}}

\newcommand{\Wb}{{\bf W}}

\newcommand{\Ab}{{\bf A}}

\newcommand{\Bb}{{\bf B}}

\newcommand{\Zb}{{\bf Z}}

\newcommand{\Eb}{{\bf E}}\) This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

**Overview**

We provide an introduction to parameter estimation by maximum likelihood and method of moments using **mlexp** and **gmm**, respectively (see **[R] mlexp** and **[R] gmm**). We include some background about these estimation techniques; see Pawitan (2001, Casella and Berger (2002), Cameron and Trivedi (2005), and Wooldridge (2010) for more details.

Maximum likelihood (ML) estimation finds the parameter values that make the observed data most probable. The parameters maximize the log of the likelihood function that specifies the probability of observing a particular set of data given a model.

Method of moments (MM) estimators specify population moment conditions and find the parameters that solve the equivalent sample moment conditions. MM estimators usually place fewer restrictions on the model than ML estimators, which implies that MM estimators are less efficient but more robust than ML estimators. Read more…

Two-step estimation problems can be solved using the **gmm** command.

When a two-step estimator produces consistent point estimates but inconsistent standard errors, it is known as the two-step-estimation problem. For instance, inverse-probability weighted (IPW) estimators are a weighted average in which the weights are estimated in the first step. Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. The two-step-estimation problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach. Read more…