Statistics in Geophysics: Generalized Linear Regression


Steffen Unkel
Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Winter Term 2013/14

Outline: Generalized linear models; Binary regression; Count data regression

Components of the classical linear model

Generalized linear models (GLMs) are an extension of classical linear models. Recall the classical linear regression model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} .$$

The systematic part of the model is a specification for the (conditional) mean of $\mathbf{y}$, which takes the form $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$. For the random part we assume $\mathrm{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$.

A further specialization of the model involves the assumption that $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$. Then $\mathbf{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I})$ and $E(\mathbf{y}) = \boldsymbol{\mu}$, where $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$ and the $i$th component of $\boldsymbol{\mu} \in \mathbb{R}^{n \times 1}$ is $\mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ $(i = 1, \ldots, n)$.

Components of a generalized linear model II

Three-part specification of the classical linear model:

1. The random component: $\mathbf{y} \sim N(\boldsymbol{\mu}, \sigma^2 \mathbf{I})$.
2. The systematic component: the $p$ predictor variables produce a linear predictor $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_n)^\top$, where $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ $(i = 1, \ldots, n)$.
3. The link between the random and systematic components: $\boldsymbol{\mu} = \boldsymbol{\eta}$.

This specification introduces a new symbol $\boldsymbol{\eta}$ for the linear predictor, and the third component then specifies that $\boldsymbol{\mu}$ and $\boldsymbol{\eta}$ are identical.

The generalization

If we write

$$\eta_i = g(\mu_i) \quad \text{or} \quad \mu_i = h(\eta_i),$$

then $g(\cdot)$ is called the link function and $h(\cdot)$ the response function, with $g = h^{-1}$.

Classical linear models have a Gaussian distribution in component 1 and the identity function for the link in component 3. GLMs allow two extensions:

  • The distribution in component 1 may come from an exponential family other than the Gaussian.
  • The link function in component 3 may become any monotonic differentiable function.
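As a small illustration of the relation $g = h^{-1}$, the sketch below pairs three common link functions with their response functions and checks that applying $g$ and then $h$ returns the original mean. The dictionary `LINKS` and the helper `round_trip` are hypothetical names chosen for this example, not part of the lecture.

```python
import math

# Illustrative (link g, response h) pairs. The dictionary and its keys are
# hypothetical names chosen for this sketch.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def round_trip(name, mu):
    """Apply the link g and then the response h; the result should equal mu."""
    g, h = LINKS[name]
    return h(g(mu))


if __name__ == "__main__":
    for name, mu in [("identity", -1.5), ("log", 2.0), ("logit", 0.3)]:
        print(name, mu, round_trip(name, mu))
```

Each link maps the admissible range of the mean (the real line, the positive reals, or the unit interval) onto the real line, which is where the linear predictor lives.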
Exponential family

We assume that each component of $\mathbf{y}$ has a distribution in the (univariate) exponential family, taking the form

$$f(y \mid \theta) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right),$$

for some specific functions $b(\cdot)$ and $c(\cdot)$. The parameter $\theta$ is called the natural or canonical parameter. The second parameter $\phi$ is a dispersion parameter. It can be shown that

$$E(y) = \mu = b'(\theta) \quad \text{and} \quad \mathrm{Var}(y) = \phi\, b''(\theta).$$

Exponential family parameters, expectation and variance

| Distribution | | $\theta(\mu)$ | $b(\theta)$ | $\phi$ |
|---|---|---|---|---|
| Normal | $N(\mu, \sigma^2)$ | $\mu$ | $\theta^2/2$ | $\sigma^2$ |
| Bernoulli | $B(1, \pi)$ | $\log(\pi/(1-\pi))$ | $\log(1 + \exp(\theta))$ | $1$ |
| Poisson | $P(\lambda)$ | $\log(\lambda)$ | $\exp(\theta)$ | $1$ |

| Distribution | $E(y) = b'(\theta)$ | $b''(\theta)$ | $\mathrm{Var}(y) = b''(\theta)\phi$ |
|---|---|---|---|
| Normal | $\mu = \theta$ | $1$ | $\sigma^2$ |
| Bernoulli | $\pi = \exp(\theta)/(1 + \exp(\theta))$ | $\pi(1-\pi)$ | $\pi(1-\pi)$ |
| Poisson | $\lambda = \exp(\theta)$ | $\lambda$ | $\lambda$ |

Maximum likelihood estimation in GLMs

The ML estimator $\hat{\boldsymbol{\beta}}$ is obtained in the form of iteratively weighted least squares estimates

$$\hat{\boldsymbol{\beta}}^{(t+1)} = (\mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W}^{(t)} \tilde{\mathbf{y}}^{(t)}, \quad t = 0, 1, 2, \ldots,$$

where $\mathbf{W}^{(t)} = \mathrm{diag}\big( \tilde{w}_1(\hat{\eta}_1^{(t)}), \ldots, \tilde{w}_n(\hat{\eta}_n^{(t)}) \big)$ is a matrix of "working weights"

$$\tilde{w}_i(\hat{\eta}_i^{(t)}) = \frac{\big(h'(\hat{\eta}_i^{(t)})\big)^2}{\sigma_i^2(\hat{\eta}_i^{(t)})}$$

and $\tilde{\mathbf{y}}^{(t)} = \big( \tilde{y}_1(\hat{\eta}_1^{(t)}), \ldots, \tilde{y}_n(\hat{\eta}_n^{(t)}) \big)^\top$ is a vector of "working observations" with elements

$$\tilde{y}_i(\hat{\eta}_i^{(t)}) = \hat{\eta}_i^{(t)} + \frac{y_i - h(\hat{\eta}_i^{(t)})}{h'(\hat{\eta}_i^{(t)})}.$$

Maximum likelihood estimation in GLMs II

The matrix $\mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{X}$ plays a key role in the iterations. Invertibility of $\mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{X}$ does not follow from the invertibility of $\mathbf{X}^\top \mathbf{X}$. However, usually all of the weights are positive, such that $\mathbf{X}^\top \mathbf{W}^{(t)} \mathbf{X}$ is invertible. Then the algorithm typically converges after a number of iterative steps.

Maximum likelihood estimation in GLMs III

Asymptotic properties of the ML estimator: let $\hat{\boldsymbol{\beta}}_n$ denote the ML estimator based on a sample of size $n$. Under regularity conditions,

$$\hat{\boldsymbol{\beta}}_n \overset{a}{\sim} N\big(\boldsymbol{\beta}, \mathbf{F}^{-1}(\boldsymbol{\beta})\big),$$

where $\mathbf{F}(\boldsymbol{\beta}) = \mathbf{X}^\top \mathbf{W} \mathbf{X}$ is the expected Fisher information matrix. The expected Fisher information matrix is

$$E\left( -\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^\top} \right),$$

where $-\partial^2 l(\boldsymbol{\beta}) / \partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^\top = \mathbf{F}_{\mathrm{obs}}$ is the observed Fisher information matrix and $l(\boldsymbol{\beta})$ is the log-likelihood.

Estimation of the scale parameter

Denote by $v(\mu_i) = b''(\theta_i)$ the so-called variance function and note that $b''(\theta_i)$ implicitly depends on $\mu_i$ through the relation $b'(\theta_i) = \mu_i$. The dispersion parameter is estimated by

$$\hat{\phi} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{v(\hat{\mu}_i)},$$

where $p$ denotes the number of regression parameters, $\hat{\mu}_i = h(\mathbf{x}_i^\top \hat{\boldsymbol{\beta}})$ is the estimated expectation and $v(\hat{\mu}_i)$ is the estimated variance function.

Testing linear hypotheses

Hypotheses: $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{d}$ versus $H_1: \mathbf{C}\boldsymbol{\beta} \neq \mathbf{d}$. Let $\tilde{\boldsymbol{\beta}}$ be the ML estimator under $H_0$.

Test statistics:

  • Likelihood ratio statistic: $lr = -2\{ l(\tilde{\boldsymbol{\beta}}) - l(\hat{\boldsymbol{\beta}}) \}$
  • Wald statistic: $w = (\mathbf{C}\hat{\boldsymbol{\beta}} - \mathbf{d})^\top [\mathbf{C}\mathbf{F}^{-1}(\hat{\boldsymbol{\beta}})\mathbf{C}^\top]^{-1} (\mathbf{C}\hat{\boldsymbol{\beta}} - \mathbf{d})$
  • Score statistic: $u = \mathbf{s}^\top(\tilde{\boldsymbol{\beta}}) \mathbf{F}^{-1}(\tilde{\boldsymbol{\beta}}) \mathbf{s}(\tilde{\boldsymbol{\beta}})$, where $\mathbf{s}(\boldsymbol{\beta}) = \partial l(\boldsymbol{\beta}) / \partial \boldsymbol{\beta}$ is the score function

Test decision: for large $n$ and under $H_0$, it holds that

$$lr,\ w,\ u \overset{a}{\sim} \chi^2_r,$$

where $r$ is the (full) row rank of the $r \times p$ matrix $\mathbf{C}$. We reject $H_0$ when $lr, w, u > \chi^2_r(1 - \alpha)$.

Criteria for model fit

The most frequently used goodness-of-fit statistics are the Pearson statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{v(\hat{\mu}_i)}$$

and the deviance

$$D = 2\{ l(\mathbf{y}) - l(\hat{\boldsymbol{\mu}}) \},$$

where $l(\hat{\boldsymbol{\mu}})$ and $l(\mathbf{y})$ represent the log-likelihood for the estimated and the saturated model, respectively. Both statistics are approximately $\chi^2_{n-p}$-distributed.

Criteria for model selection

The Akaike information criterion (AIC) for model selection is defined generally as

$$\mathrm{AIC} = -2 l(\hat{\boldsymbol{\beta}}) + 2p .$$

The Bayesian information criterion (BIC) is defined generally as

$$\mathrm{BIC} = -2 l(\hat{\boldsymbol{\beta}}) + \log(n)\, p .$$

If the model contains a dispersion parameter $\phi$, its ML estimator should be substituted into the respective model and the total number of parameters should be increased to $p + 1$.

Binary regression models

Suppose that the response variable $y$ is binary and can take only two possible values, denoted by 0 and 1. We may write $\pi_i = P(y_i = 1)$ and $1 - \pi_i = P(y_i = 0)$ for the probabilities of 'success' and 'failure', respectively $(i = 1, \ldots, n)$.

We want to model and estimate the effects of the covariates on the (conditional) probability

$$\pi_i = P(y_i = 1) = E(y_i)$$

for the outcome $y_i = 1$, given values of the covariates $x_{i1}, \ldots, x_{ik}$. In this specification, the response variables are assumed to be (conditionally) independent.

Binary regression models III

We combine the probability $\pi_i$ with the linear predictor $\eta_i$ through a relation of the form

$$\pi_i = h(\eta_i) = h(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}),$$

where the response function $h$ is a strictly monotonically increasing cdf on the real line. This ensures $h(\eta) \in [0, 1]$, and the relation above can always be expressed in the form

$$\eta_i = g(\pi_i),$$

where the link function $g = h^{-1}$ is the inverse of the response function. Logit and probit models are the most widely used binary regression models.

Logit model

The logit model results from the choice of the logistic response function

$$\pi = h(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)}$$

or (equivalently) the logit link function

$$g(\pi) = \mathrm{logit}(\pi) = \log\left( \frac{\pi}{1 - \pi} \right) = \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .$$

This yields a linear model for the logarithmic odds (log-odds) $\log(\pi/(1 - \pi))$. The covariates affect the odds $\pi/(1 - \pi)$ in an exponential-multiplicative form.
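The iteratively weighted least squares scheme from the maximum likelihood section can be sketched for a logit model with an intercept and a single covariate. For the logistic response function, $h'(\eta) = \pi(1 - \pi)$ and $\sigma^2 = \pi(1 - \pi)$, so the working weight reduces to $\pi(1 - \pi)$. The function names, the restriction to one covariate, the zero starting values and the toy data are assumptions made for this illustration only.

```python
import math


def fitted_probabilities(x, b0, b1):
    """Logistic response function applied to the linear predictor b0 + b1*x."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


def fit_logit_irls(x, y, max_iter=25, tol=1e-10):
    """IRLS for a logit model with intercept and one covariate (sketch)."""
    b0, b1 = 0.0, 0.0  # starting values (an assumption of this sketch)
    for _ in range(max_iter):
        # Accumulate the entries of X'WX and X'W y~ for the 2x2 system.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            pi = 1.0 / (1.0 + math.exp(-eta))
            w = pi * (1.0 - pi)        # working weight h'(eta)^2 / sigma^2
            z = eta + (yi - pi) / w    # working observation
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the weighted normal equations by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    # Toy data, made up for illustration (not linearly separable).
    x = [0, 1, 2, 3, 4, 5, 6, 7]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logit_irls(x, y)
    print("beta0 =", b0, "beta1 =", b1, "odds ratio =", math.exp(b1))
```

At convergence the score equations $\sum_i (y_i - \hat{\pi}_i) = 0$ and $\sum_i x_i (y_i - \hat{\pi}_i) = 0$ hold, which gives a simple check on the fit. With perfectly separable data the weights tend to zero and the iteration does not converge, so the sketch is not suitable for that case.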
Probit model

In the probit model we use for $h$ the standard normal cumulative distribution function $\Phi(\cdot)$, that is,

$$\pi = \Phi(\eta) = \Phi(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$$

or (equivalently) the probit link function

$$g(\pi) = \mathrm{probit}(\pi) = \Phi^{-1}(\pi) = \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .$$

A (minor) disadvantage is the required numerical evaluation of $\Phi$ in the maximum likelihood estimation of $\boldsymbol{\beta}$.

Interpretation of the logit model

Summary: the odds $\pi_i/(1 - \pi_i) = P(y_i = 1 \mid \mathbf{x}_i)/P(y_i = 0 \mid \mathbf{x}_i)$ follow the multiplicative model

$$\frac{P(y_i = 1 \mid \mathbf{x}_i)}{P(y_i = 0 \mid \mathbf{x}_i)} = \exp(\beta_0) \cdot \exp(x_{i1}\beta_1) \cdots \exp(x_{ik}\beta_k).$$

If, for example, $x_{i1}$ increases by one unit to $x_{i1} + 1$, the following applies to the odds ratio:

$$\frac{P(y_i = 1 \mid x_{i1} + 1, \ldots)}{P(y_i = 0 \mid x_{i1} + 1, \ldots)} \bigg/ \frac{P(y_i = 1 \mid x_{i1}, \ldots)}{P(y_i = 0 \mid x_{i1}, \ldots)} = \exp(\beta_1).$$

  • $\beta_1 > 0$: odds ratio $> 1$
  • $\beta_1 < 0$: odds ratio $< 1$
  • $\beta_1 = 0$: odds ratio $= 1$

Fitting the logit model

The parameters of the logistic regression model are estimated by the method of maximum likelihood. Once $\hat{\boldsymbol{\beta}}$ has been obtained, the relationship between the estimated response probability and values $x_1, x_2, \ldots, x_k$ can be expressed as

$$\mathrm{logit}(\hat{\pi}) = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k,$$

or equivalently,

$$\hat{\pi} = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)}.$$

Fitting the logit model II

The estimated value of the linear systematic component of the model for the $i$th observation is

$$\hat{\eta}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}.$$

From this, the fitted probabilities $\hat{\pi}_i$ can be found from

$$\hat{\pi}_i = \frac{\exp(\hat{\eta}_i)}{1 + \exp(\hat{\eta}_i)}.$$

Standard errors of parameter estimates

Following the estimation of the $\beta$-parameters in a linear logistic model, information about their precision will generally be needed. Such information is conveyed in the standard error of an estimate, $\mathrm{se}(\hat{\beta}_j)$, for $j = 0, \ldots, k$.

From the standard error of $\hat{\beta}_j$, $100(1 - \alpha)\%$ confidence limits for the corresponding true value $\beta_j$ are

$$\hat{\beta}_j \pm z_{1 - \alpha/2} \times \mathrm{se}(\hat{\beta}_j).$$

These interval estimates indicate the likely range of values of the parameter.

Count data

Count data are frequently observed when the number of events within a fixed time frame or frequencies in a contingency table have to be analyzed. Sometimes a normal approximation can be sufficient. In general, however, discrete distributions that recognize the specific properties of count data are most appropriate. The Poisson distribution is the simplest and most widely used choice.

Log-linear Poisson model

The most widely used model for count data connects the rate $\lambda_i = E(y_i)$ of the Poisson distribution with the linear predictor $\eta_i = \mathbf{x}_i^\top \boldsymbol{\beta}$ via

$$\lambda_i = \exp(\eta_i) = \exp(\beta_0) \exp(\beta_1 x_{i1}) \cdots \exp(\beta_k x_{ik})$$

or in log-linear form through

$$\log(\lambda_i) = \eta_i = \mathbf{x}_i^\top \boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}.$$

The effect of covariates on the rate $\lambda$ is thus exponentially multiplicative, similar to the effect on the odds $\pi/(1 - \pi)$ in the logit model. The effect on the logarithm of the rate is linear.

Overdispersion

The assumption of a Poisson distribution for the responses implies

$$\lambda_i = E(y_i) = \mathrm{Var}(y_i).$$

For reasons similar to those in the case of binomial data, a significantly higher empirical variance is frequently observed in applications of Poisson regression. This phenomenon is known as overdispersion. For this reason, it is often useful to introduce an overdispersion parameter $\phi$ by assuming

$$\mathrm{Var}(y_i) = \phi \lambda_i .$$
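The log-linear Poisson model can be fitted with the same iteratively weighted least squares scheme as any other GLM. With the log link, $h(\eta) = h'(\eta) = \exp(\eta) = \lambda$ and $\sigma^2 = \lambda$, so the working weight is simply $\lambda$. The following is a minimal sketch for an intercept and one covariate; the function names, the starting values and the toy counts are assumptions made for illustration.

```python
import math


def fitted_rates(x, b0, b1):
    """Exponential response function applied to the linear predictor."""
    return [math.exp(b0 + b1 * xi) for xi in x]


def fit_poisson_irls(x, y, max_iter=50, tol=1e-10):
    """IRLS for a log-linear Poisson model with one covariate (sketch).

    Assumes sum(y) > 0 so that the intercept-only starting value exists.
    """
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            lam = math.exp(eta)
            w = lam                      # working weight h'(eta)^2 / sigma^2
            z = eta + (yi - lam) / lam   # working observation
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    # Toy counts, made up for illustration.
    x = [0, 1, 2, 3, 4, 5]
    y = [1, 2, 2, 5, 7, 12]
    b0, b1 = fit_poisson_irls(x, y)
    # exp(b1) is the multiplicative change in the rate per unit increase in x.
    print("beta0 =", b0, "beta1 =", b1, "rate ratio =", math.exp(b1))
```

The fitted rates returned by `fitted_rates` are the quantities against which the empirical variance of the counts would be compared when checking for overdispersion.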
Overdispersion II

The overdispersion parameter $\phi$ can be estimated as the average Pearson statistic or the average deviance:

$$\hat{\phi}_P = \frac{\chi^2}{n - p} \quad \text{or} \quad \hat{\phi}_D = \frac{D}{n - p}.$$

We then have to multiply the estimated covariance matrix by $\hat{\phi}$, i.e.,

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \hat{\phi}\, \mathbf{F}^{-1}(\hat{\boldsymbol{\beta}}).$$

This approach to the estimation of overdispersion does not correspond to a true likelihood method, but rather to a quasi-likelihood model.
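The Pearson-based estimate $\hat{\phi}_P$ and the resulting adjustment of standard errors can be sketched as follows. The sketch assumes the fitted means are already available and uses the Poisson variance function $v(\mu) = \mu$ by default; the function names and the numbers in the example are hypothetical.

```python
import math


def pearson_dispersion(y, mu_hat, p, variance=lambda mu: mu):
    """Average Pearson statistic chi^2 / (n - p).

    `variance` is the variance function v(mu); the default v(mu) = mu
    corresponds to the Poisson case.
    """
    n = len(y)
    if n <= p:
        raise ValueError("need more observations than regression parameters")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return chi2 / (n - p)


def scale_standard_errors(se, phi_hat):
    """Scaling the covariance matrix by phi scales each se by sqrt(phi)."""
    return [math.sqrt(phi_hat) * s for s in se]


if __name__ == "__main__":
    # Made-up counts and fitted means, for illustration only.
    y = [2, 0, 5, 9]
    mu_hat = [1.0, 1.0, 4.0, 9.0]
    phi_hat = pearson_dispersion(y, mu_hat, p=2)
    print("phi_hat =", phi_hat)
    print("adjusted se =", scale_standard_errors([0.5, 0.1], phi_hat))
```

A value of $\hat{\phi}_P$ clearly above 1 points to overdispersion; the adjusted standard errors are then wider than the ones from the plain Poisson fit.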

Date posted: 04/12/2015, 17:08


Contents

  • Generalized linear models

  • Binary regression

  • Count data regression
