Course essay, Business Valuation: Further development and analysis of the classical linear regression model

Chapter 4

Further development and analysis of the classical linear regression model

Phan Tuyết Trinh, Tô Thị Phương Thảo, Nguyễn Hoàng Minh Huy

Lâm Bá Du, Lê Chí Cang, Huỳnh Thái Huy

Supervisor: Dr. Phùng Đức Nam

1 Generalising the simple model to multiple linear regression

2 The constant term

3 How are the parameters calculated in the generalised case?

4 Testing multiple hypotheses: the F-test

5 Sample output for multiple hypothesis tests

6 Multiple regression using an APT-style model

7 Data mining and the true size of the test

8 Goodness of fit statistics

9 Hedonic pricing models

10 Tests of non-nested hypotheses

11 Quantile regression

4.1 Generalising the simple model to multiple linear regression

Stock returns might be purported to depend on their sensitivity to unexpected changes in:

• inflation

• the differences in returns on short- and long-dated bonds

• industrial production

• default risks
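
As a hedged sketch, the four factors listed above would enter a multiple regression of the following general form. The symbols are assumptions made for illustration: $y_t$ is the stock return, $x_{2t}, \dots, x_{kt}$ are the unexpected changes in the explanatory variables (so $k = 5$ with four factors plus a constant), and $u_t$ is the error term.

```latex
y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \dots + \beta_k x_{kt} + u_t,
\qquad t = 1, \dots, T
```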

4.2 The constant term

k is defined as the number of ‘explanatory variables’ or ‘regressors’, including the constant term.

k is therefore equal to the number of parameters that are estimated in the regression equation.

4.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

The elements of the β vector

● SRF (Sample Regression Function)

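Ordinary least squares (OLS)

● $\hat{\beta} = (X'X)^{-1}X'y$

● $s^2 = \dfrac{\hat{u}'\hat{u}}{T-k}$ ($s^2$: an estimate of the variance of the errors, $\sigma^2$)

● $\mathrm{var}(\hat{\beta}) = s^2 (X'X)^{-1}$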
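
The OLS quantities for the generalised case can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard matrix formulas (coefficients $(X'X)^{-1}X'y$, error variance estimate $\hat{u}'\hat{u}/(T-k)$, coefficient variance $s^2(X'X)^{-1}$); the function name `ols` and the four data points are made up for the example.

```python
import numpy as np

def ols(X, y):
    """Return (beta_hat, s2, std_errors) for the model y = X beta + u.

    X must already contain a column of ones for the constant term,
    so k = X.shape[1] counts the constant as a regressor.
    """
    T, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y             # (X'X)^{-1} X'y
    resid = y - X @ beta_hat                 # u_hat
    s2 = resid @ resid / (T - k)             # estimate of the error variance
    std_errors = np.sqrt(np.diag(s2 * xtx_inv))
    return beta_hat, s2, std_errors

# Hypothetical data: one regressor plus a constant (k = 2, T = 4).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat, s2, std_errors = ols(X, y)
print(beta_hat, s2, std_errors)
```

Inverting X'X directly mirrors the formula on the slide; for real data a least-squares solver such as `np.linalg.lstsq` is usually the numerically safer choice.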

● Original model / unrestricted model (UnRestricted)

Estimating it by OLS gives the residual sum of squares, URSS, with degrees of freedom df = T – k

● Restricted model (the reduced model, which loses m regression coefficients)

Example of a restricted model
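
The two residual sums of squares are combined in the F-test statistic, F = ((RRSS – URSS)/URSS) × ((T – k)/m), where RRSS is the residual sum of squares from the restricted regression and m is the number of restrictions. A minimal sketch, with purely illustrative input numbers and a hypothetical function name:

```python
def f_statistic(rrss, urss, T, k, m):
    """F-test statistic for m linear restrictions.

    rrss: residual sum of squares of the restricted regression
    urss: residual sum of squares of the unrestricted regression
    T:    number of observations
    k:    number of regressors in the unrestricted model (incl. constant)
    m:    number of restrictions
    """
    return (rrss - urss) / urss * (T - k) / m

# Illustrative numbers only: 144 observations, 4 regressors, 2 restrictions.
f_stat = f_statistic(rrss=436.1, urss=397.2, T=144, k=4, m=2)
print(round(f_stat, 2))
```

Under the null hypothesis the statistic follows an F(m, T – k) distribution, so the restrictions are rejected when it exceeds the corresponding critical value.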

View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions

4.6 Multiple regression using an APT-style model

• Can the monthly returns on Microsoft stock be explained by reference to unexpected changes in a set of macroeconomic and financial variables?

=> Arbitrage pricing theory (APT)

Steps to estimate the regression model

• Step 1: Open a new EViews workfile

• Step 2: Import the data

• Step 3: Generate variables:

The APT posits that the stock return can be explained by reference to the unexpected changes in the macroeconomic variables rather than their levels.

Unexpected value = Actual value – Expected value

Generate variables

• Genr

Dspread = baa_aaa_spread – baa_aaa_spread(-1)

Dcredit = consumer_credit – consumer_credit(-1)

Rmsoft = 100*dlog(microsoft)

Rsandp = 100*dlog(sandp)

Dmoney = m1money_supply – m1money_supply(-1)

Inflation = 100*dlog(cpi)

Term = ustb10y – ustb3m

Dinflation = inflation – inflation(-1)

Mustb3m = ustb3m/12

Rterm = term – term(-1)

Ermsoft = rmsoft – mustb3m

Ersandp = rsandp – mustb3m
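
For readers without EViews, the same transformations can be sketched in NumPy. This is an illustration only: the three-observation series are made up, the helper names are hypothetical, and `np.diff` drops the first observation whereas EViews keeps it as NA, so the level series must be trimmed before being combined with a differenced one.

```python
import numpy as np

def pct_log_change(series):
    """Equivalent of 100*dlog(x): percentage log change between periods."""
    return 100.0 * np.diff(np.log(np.asarray(series, dtype=float)))

def first_difference(series):
    """Equivalent of x - x(-1)."""
    return np.diff(np.asarray(series, dtype=float))

# Made-up monthly data, three observations each.
microsoft = np.array([100.0, 110.0, 99.0])   # price level
ustb3m = np.array([3.0, 3.6, 2.4])           # annualised yield, %
ustb10y = np.array([5.0, 5.2, 5.1])          # annualised yield, %

rmsoft = pct_log_change(microsoft)           # Rmsoft = 100*dlog(microsoft)
mustb3m = ustb3m / 12.0                      # Mustb3m = ustb3m/12
term = ustb10y - ustb3m                      # Term = ustb10y - ustb3m
rterm = first_difference(term)               # Rterm = term - term(-1)

# Differencing loses the first observation, so align the level series.
ermsoft = rmsoft - mustb3m[1:]               # Ermsoft = rmsoft - mustb3m
print(rmsoft, rterm, ermsoft)
```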

• Step 4: Object/New Object/Equation, named msoftreg

• Equation specification: ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM

• Method: Least Squares

• View/Coefficient Diagnostics/Wald Test – Coefficient Restrictions

• C(3) = 0, C(4) = 0, C(5) = 0, C(6) = 0, C(7) = 0

Null Hypothesis Summary:

Normalized Restriction (= 0)    Value    Std. Err.

Stepwise regression

• Stepwise regression is an automatic variable selection procedure which chooses the jointly most ‘important’ explanatory variables from a set of candidate variables.

• The simplest is the uni-directional forwards method.

• It starts with no variables in the regression, first selects the variable with the lowest p-value, then adds the variable with the next lowest p-value given the variables already included, and so on.
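
The forwards method can be sketched as follows. This is a simplified illustration rather than the EViews STEPLS routine: at each step it ranks candidates by the absolute t-ratio, which gives the same ordering as the p-value because all trial regressions at a given step have the same degrees of freedom, and it stops after a fixed number of variables (`max_vars`, a hypothetical stopping rule) instead of using a p-value threshold. The simulated data are made up.

```python
import numpy as np

def last_abs_t_ratio(X, y):
    """Absolute t-ratio on the last column's coefficient in an OLS fit."""
    T, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (T - k)
    return abs(beta_hat[-1]) / np.sqrt(s2 * xtx_inv[-1, -1])

def forward_stepwise(candidates, y, max_vars):
    """Add, one at a time, the candidate with the largest |t| when included."""
    T = len(y)
    chosen = []
    while len(chosen) < min(max_vars, len(candidates)):
        best_name, best_t = None, -np.inf
        for name, x in candidates.items():
            if name in chosen:
                continue
            # Constant, variables chosen so far, then the trial variable last.
            columns = [np.ones(T)] + [candidates[c] for c in chosen] + [x]
            t = last_abs_t_ratio(np.column_stack(columns), y)
            if t > best_t:
                best_name, best_t = name, t
        chosen.append(best_name)
    return chosen

# Simulated example: y depends strongly on x1, weakly on x2, not on x3.
rng = np.random.default_rng(0)
T = 200
candidates = {name: rng.standard_normal(T) for name in ("x1", "x2", "x3")}
y = 3.0 * candidates["x1"] + 1.0 * candidates["x2"] + 0.1 * rng.standard_normal(T)

selected = forward_stepwise(candidates, y, max_vars=2)
print(selected)
```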

• Object/New Object

• Equation: Msoftstepwise

• Method: STEPLS - Stepwise Least Squares

• Dependent variable (followed by the always-included constant): ERMSOFT C

• Explanatory variables: ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD

Stepwise procedures have been strongly criticised by statistical purists. At the most basic level, they are sometimes argued to be no better than automated procedures for data mining, in particular if the list of potential candidate variables is long and results from a ‘fishing trip’ rather than a strong prior financial theory.

Sample sizes and asymptotic theory

• A question that is often asked by those new to econometrics is ‘what is an appropriate sample size for model estimation?’

• Most testing procedures in econometrics rely on asymptotic theory. The results in theory hold only if there are an infinite number of observations.

4.7 Data mining and the true size of the test

• Test statistics are assumed to follow a random distribution

=> they will take on extreme values that fall in the rejection region some of the time by chance alone

=> hence the possibility of rejecting a correct null hypothesis

• If enough explanatory variables are employed in a regression, often one or more will be significant by chance alone.

• If an α% size of test is used, on average one in every (100/α) regressions will have a significant slope coefficient by chance alone.
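
To see how quickly chance findings accumulate, the probability of at least one spuriously significant result can be computed directly. The sketch below rests on two simplifying assumptions that are not in the slides: every null hypothesis is true, and the tests are independent. The function name is hypothetical.

```python
def prob_at_least_one_significant(alpha, n_tests):
    """P(at least one rejection) across n independent tests of true nulls.

    alpha is the size of each test as a proportion (0.05 for a 5% test).
    """
    return 1.0 - (1.0 - alpha) ** n_tests

# With a 5% test, one regression in 100/5 = 20 is significant on average,
# and trying 20 unrelated variables already makes a chance finding likely.
for n in (1, 5, 20):
    print(n, round(prob_at_least_one_significant(0.05, n), 3))
```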

• Trying many variables in a regression without basing the selection of the candidate variables on a financial or economic theory is known as

‘data mining’ or ‘data snooping’.

=> The true significance level will be considerably greater than the nominal significance level assumed.

To avoid data mining:

• ensuring that the selection of candidate regressors for inclusion in a model is made on the basis of financial or economic theory

• examining the forecast performance of the model in an ‘out-of-sample’ data set

4.8 Goodness of fit statistics

“How well does the model containing the explanatory variables that was proposed actually explain variations in the dependent variable?”

• Quantities known as goodness of fit statistics

are available to test how well the SRF fits the data – that is, how ‘close’ the fitted regression line is to all of the data points taken together.

What measures might make plausible candidates to be goodness of fit statistics?

• RSS

The value of RSS depends to a great extent on the scale of the dependent variable

• R2

A scaled version of RSS

• R2 is the square of the correlation coefficient between y and ŷ, that is, the square of the correlation between the values of the dependent variable and the corresponding fitted values from the model

• R2 must lie between 0 and 1

• If this correlation is high, the model fits the data well, while if the correlation is low (close to zero), the model is not providing a good fit to the data

The TSS can be split into 2 parts:

• the part that has been explained by the model (the

explained sum of squares, ESS)

• the part that the model was not able to explain (the RSS).

R2

R2 = ESS/TSS = 1 – RSS/TSS, since TSS = ESS + RSS

RSS = TSS, i.e. ESS = 0, so R2 = ESS/TSS = 0

• The model has not succeeded in explaining any of the variability of y about its mean value

• This would happen only where the estimated values of all of the slope coefficients were exactly zero

ESS = TSS, i.e. RSS = 0, so R2 = ESS/TSS = 1

• The model has explained all of the variability of

y about its mean value

• This would happen only in the case where all of the observation points lie exactly on the fitted line.

Problems with R2 as a goodness of fit measure

• R2 is defined in terms of variation about the mean of y, so that if a model is reparameterised (rearranged) and the dependent variable changes, R2 will change.

• R2 never falls if more regressors are added to the regression.

• R2 can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models, since a wide array of models will frequently have broadly similar (and high) values of R2.

Adjusted R2

Adjusted R2 = 1 – [(T – 1)/(T – k)](1 – R2)

So if an extra regressor (variable) is added to the model, k increases and, unless R2 increases by a more than offsetting amount, adjusted R2 will actually fall.
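
Both measures can be computed from the actual and fitted values. The sketch below assumes the standard definitions R2 = 1 – RSS/TSS and adjusted R2 = 1 – [(T – 1)/(T – k)](1 – R2); the function name and the four data points are made up, with the fitted values coming from an OLS line with intercept 0.9 and slope 1.9.

```python
import numpy as np

def goodness_of_fit(y, fitted, k):
    """Return (R2, adjusted R2) for a regression with k parameters.

    k counts the constant term. R2 is computed as 1 - RSS/TSS, which
    equals ESS/TSS when the regression is OLS with a constant.
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    T = len(y)
    tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
    rss = np.sum((y - fitted) ** 2)      # residual sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (T - 1) / (T - k) * (1.0 - r2)
    return r2, adj_r2

# Hypothetical example: T = 4 observations, k = 2 (constant plus one slope).
y = [1.0, 3.0, 4.0, 7.0]
fitted = [0.9, 2.8, 4.7, 6.6]
r2, adj_r2 = goodness_of_fit(y, fitted, k=2)
print(r2, adj_r2)
```

Because the penalty factor (T – 1)/(T – k) grows with k, adjusted R2 is always below R2 whenever k > 1 and R2 < 1.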

4.9 Hedonic pricing models

• One application of econometric techniques where the coefficients have a particularly intuitively appealing interpretation is in the area of hedonic pricing models.

• Hedonic models are often used to produce appraisals or valuations of properties, given their characteristics (e.g. size of dwelling, number of bedrooms, location, number of bathrooms, etc.). In these models, the coefficient estimates represent ‘prices of the characteristics’.

• One such application of a hedonic pricing model is given by Des Rosiers and Thériault (1996), who consider the effect of various amenities on rental values for buildings and apartments in five sub-markets in the Quebec area of Canada.

• The paper employs 1990 data for the Quebec City region, and there are 13,378 observations

LnAGE       log of the apparent age of the property

NBROOMS     number of bedrooms

AREABYRM    area per room (in square metres)

ELEVATOR    a dummy variable = 1 if the building has an elevator; 0 otherwise

BASEMENT    a dummy variable = 1 if the unit is located in a basement; 0 otherwise

OUTPARK     number of outdoor parking spaces

INDPARK     number of indoor parking spaces

NOLEASE     a dummy variable = 1 if the unit has no lease attached to it; 0 otherwise

LnDISTCBD   log of the distance in kilometres to the central business district (CBD)

SINGLPAR    percentage of single parent families in the area where the building stands

DSHOPCNTR   distance in kilometres to the nearest shopping centre

VACDIFF1    vacancy difference between the building and the census figure

Hedonic model of rental values in Quebec City, 1990. Dependent variable: Canadian dollars per month

• This list includes several variables that are dummy variables.

• Dummy variables can be used in the context of cross-sectional or time series regressions.

• The dummy variables are used in the same way as other explanatory variables and the coefficients on the dummy variables can be interpreted as the average differences in the values of the dependent variable for each category.

The relationship between the regression F-statistic and R2

• Recall that the regression F-statistic tests the null hypothesis that all of the regression slope parameters are simultaneously zero
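
The link between the two statistics can be sketched as a short derivation, assuming the usual F-test formula with RRSS and URSS denoting the restricted and unrestricted residual sums of squares. Under this null hypothesis the restricted model contains only the constant, so RRSS = TSS and the number of restrictions is m = k - 1, while URSS is simply the RSS of the full model:

```latex
F = \frac{RRSS - URSS}{URSS} \times \frac{T-k}{m}
  = \frac{TSS - RSS}{RSS} \times \frac{T-k}{k-1}
  = \frac{R^2}{1 - R^2} \times \frac{T-k}{k-1}
```

The last step divides numerator and denominator by TSS and uses $R^2 = 1 - RSS/TSS$, so a higher $R^2$ always implies a larger regression F-statistic for given T and k.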

• One limitation of such studies that is worth

mentioning at this stage is their assumption that the implicit price of each characteristic is

identical across types of property, and that

these characteristics do not become saturated.

4.10 Tests of non-nested hypotheses

• Suppose that there are two researchers working independently, each with a separate financial theory for explaining the variation in some variable, yt

$y_t = \alpha_1 + \alpha_2 x_{2t} + u_t$ (4.48)

$y_t = \beta_1 + \beta_2 x_{3t} + v_t$ (4.49)

• An alternative approach to comparing between non-nested models would be to estimate an encompassing or hybrid model. In the case of (4.48) and (4.49), the relevant encompassing model would be

$y_t = \gamma_1 + \gamma_2 x_{2t} + \gamma_3 x_{3t} + w_t$ (4.50)

Selecting between models

1. γ2 is statistically significant but γ3 is not. In this case, (4.50) collapses to (4.48), and the latter is the preferred model.

2. γ3 is statistically significant but γ2 is not. In this case, (4.50) collapses to (4.49), and the latter is the preferred model.

3. γ2 and γ3 are both statistically significant. This would imply that both x2 and x3 have incremental explanatory power for y, in which case both variables should be retained. Models (4.48) and (4.49) are both ditched and (4.50) is the preferred model.

4. Neither γ2 nor γ3 is statistically significant. In this case, none of the models can be dropped, and some other method for choosing between them must be employed.

• There are several limitations to the use of encompassing regressions to select between non-nested models.

• It could be the case that if they are both included, neither γ2 nor γ3 is statistically significant, while each is significant in its separate regression, (4.48) or (4.49). This can happen, for example, when x2 and x3 are highly correlated.

4.11 Quantile regression

Background and motivation

• We may think of there being a non-linear (∩-shaped) relationship between regulation and GDP growth

• Estimating a standard linear regression model may lead to seriously misleading estimates: it will ‘average’ the positive and negative effects from very low and very high regulation.

• Quantile regressions, developed by Koenker

and Bassett (1978), represent a more natural and flexible way to capture the complexities inherent in the relationship by estimating models for the conditional quantile functions.

• Quantile regressions can be conducted in both time series and cross-sectional contexts

• It is usually assumed that the dependent variable (called the response variable in the literature on quantile regressions) is independently distributed and homoscedastic

• Quantile regression is a non-parametric technique

• Quantiles, denoted τ, refer to the position where an observation falls within an ordered series for y

• The τ-th quantile, Q(τ), of a random variable y with cumulative distribution function F(y) is defined as

Q(τ) = inf{y : F(y) ≥ τ}

where inf refers to the infimum, or the ‘greatest lower bound’, which is the smallest value of y satisfying the inequality

• Quantiles (τ) must lie between 0 and 1
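
The infimum definition can be applied directly to a sample by using the empirical distribution function. This is a minimal sketch under that assumption; the function name and data are made up, and it expects 0 < τ ≤ 1. Note that `np.quantile` interpolates between observations by default, so it can return a different number from this definition.

```python
import numpy as np

def empirical_quantile(y, tau):
    """Smallest sample value whose empirical CDF is at least tau.

    Implements Q(tau) = inf{y : F(y) >= tau}, with F replaced by the
    empirical distribution function of the sample. Requires 0 < tau <= 1.
    """
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n = len(y_sorted)
    ecdf = np.arange(1, n + 1) / n           # F at each ordered observation
    idx = np.searchsorted(ecdf, tau, side="left")  # first index with F >= tau
    return y_sorted[idx]

sample = [3.0, 1.0, 2.0, 4.0]
print(empirical_quantile(sample, 0.5))    # median under the inf definition
print(empirical_quantile(sample, 0.75))
```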

Estimation of quantile functions
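
Quantile estimates in the Koenker and Bassett approach are obtained by minimising a sum of asymmetrically weighted absolute residuals (the ‘check’ loss), with positive residuals weighted by τ and negative residuals by 1 – τ. The sketch below illustrates only the simplest case, a model with a constant and no regressors, where the minimiser is a sample quantile; the function names and data are hypothetical, and a full quantile regression would replace the constant c with a linear function of the explanatory variables and use a linear-programming solver.

```python
import numpy as np

def check_loss(u, tau):
    """Sum of tau*u for u >= 0 and (1 - tau)*|u| for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.sum(u * (tau - (u < 0)))

def best_constant(y, tau):
    """Constant c minimising the check loss of y - c.

    The search is restricted to the observed values, since the
    piecewise-linear loss attains its minimum at a data point.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return y[int(np.argmin(losses))]

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(data, 0.5))   # the median, unaffected by the outlier
print(best_constant(data, 0.9))   # an upper quantile
```

With τ = 0.5 the loss is proportional to the sum of absolute deviations, which is why the median, rather than the mean, is recovered.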

An application of quantile regression: evaluating fund performance

• A study by Bassett and Chen (2001) performs a style attribution analysis for a mutual fund and, for comparison, the S&P500 index.

• Examine how a portfolio’s exposure to various styles varies with performance

• Bassett and Chen (2001) conduct a style

analysis in this spirit by regressing the returns of a fund on the returns of a large growth

portfolio, the returns of a large value portfolio, the returns of a small growth portfolio, and the returns of a small value portfolio.