Unit VII discussion Board RCH

Please make sure that it is your own work and not copied and pasted. Please read the study guide and watch out for spelling and grammar errors. Please use APA 7th edition.

Book Reference: Fox, J. (2017). Using the R Commander: A point-and-click interface for R. CRC Press. https://online.vitalsource.com/#/books/9781498741934

Provide an example of how simple linear regression could be used within your potential field of study for your dissertation. Please make sure you address the purpose of regression and the type of results you would obtain. Also please discuss the assumptions that need to be met to use this type of analysis. Your EOSA modules discuss this. Clearly identify the variables you are considering.

7.1 Linear Regression Models

As mentioned, linear least-squares regression is typically taken up in a basic statistics course. The normal linear regression model is written

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i = E(y_i) + \varepsilon_i \qquad (7.1)$$

where $y_i$ is the value of the response variable for the $i$th of $n$ independently sampled observations; $x_{1i}, x_{2i}, \ldots, x_{ki}$ are the values of $k$ explanatory variables; and the errors $\varepsilon_i$ are normally and independently distributed with 0 means and constant variance, $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$. Both $y$ and the $x$s are numeric variables, and the model assumes that the average value $E(y)$ of $y$ is a linear function (that is, a simple weighted sum) of the $x$s.

If there is just one $x$ (i.e., if $k = 1$), then Equation 7.1 is called the linear simple regression model; if there is more than one $x$ ($k \geq 2$), then it is called the linear multiple regression model.

The normal linear model is optimally estimated by the method of least squares, producing the fitted model

$$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i = \hat{y}_i + e_i$$

where $\hat{y}_i$ is the fitted value and $e_i$ the residual for observation $i$. The least-squares criterion selects the values of the $b$s that minimize the sum of squared residuals, $\sum e_i^2$. The least-squares regression coefficients are easily computed, and, in addition to having desirable statistical properties under the model (such as efficiency and unbiasedness), statistical inference based on the least-squares estimates is very simple (see, e.g., the references given at the beginning of the chapter).

FIGURE 7.1: The Linear Regression dialog for Duncan's occupational prestige data.
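The least-squares criterion can be checked numerically for the simple-regression case (k = 1). The following is a minimal sketch on simulated data; the variables x and y and the true coefficients are illustrative assumptions, not part of the text.

```r
# Simulated data; x, y, and the true coefficients are illustrative assumptions.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Closed-form least-squares estimates for simple regression (k = 1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates obtained from lm()
fit <- lm(y ~ x)
rbind(closed_form = c(b0, b1), lm = unname(coef(fit)))

# Residuals e_i = y_i - yhat_i; lm() minimizes their sum of squares
sum(residuals(fit)^2)
```

Any other choice of intercept and slope would give a larger sum of squared residuals for these data.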

The simplest way to fit a linear regression model in the R Commander is by the Linear Regression dialog. To illustrate, I'll use Duncan's occupational prestige data (introduced in Chapter 4). Duncan's data set resides in the car package, and so I can read the data into the R Commander via Data > Data in packages > Read data from an attached package (see Section 4.2.4). Then selecting Statistics > Fit models > Linear regression produces the dialog in Figure 7.1. To complete the dialog, I click on prestige in the Response variable list, and Ctrl-click on education and income in the Explanatory variables list. Finally, pressing the OK button produces the output shown in Figure 7.2.

The commands generated by the Linear Regression dialog use the lm (linear model) function in R to fit the model, creating RegModel.1, and then summarize the model to produce printed output. The summary output includes information about the distribution of the residuals; coefficient estimates, their standard errors, t statistics for testing the null hypothesis that each population regression coefficient is 0, and the two-sided p-values for these tests; the standard deviation of the residuals (residual standard error) and residual degrees of freedom; the squared multiple correlation, $R^2$, for the model and $R^2$ adjusted for degrees of freedom; and the omnibus F test for the hypothesis that all population slope coefficients (here the coefficients of education and income) are 0 ($H_0\colon \beta_1 = \beta_2 = 0$, for the example).
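As a rough sketch of the kind of commands involved, the code below fits a two-predictor model with lm and pulls the pieces of the summary described above. The data are simulated stand-ins (an assumption made so the block is self-contained); they are not Duncan's data, and the commands the R Commander actually generates may differ in detail.

```r
# Simulated stand-in data; values and coefficients are assumptions for illustration.
set.seed(2)
n <- 45
sim <- data.frame(education = runif(n, 0, 100), income = runif(n, 0, 100))
sim$prestige <- -5 + 0.5 * sim$education + 0.6 * sim$income + rnorm(n, sd = 12)

RegModel.1 <- lm(prestige ~ education + income, data = sim)
s <- summary(RegModel.1)

s$coefficients   # estimates, standard errors, t statistics, two-sided p-values
s$sigma          # residual standard error
s$df[2]          # residual degrees of freedom (n minus number of coefficients)
s$r.squared      # squared multiple correlation
s$adj.r.squared  # R-squared adjusted for degrees of freedom
s$fstatistic     # omnibus F statistic with its degrees of freedom
```

Because the fitted model is stored as an object, the same quantities remain available for later computations.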

This is more or less standard least-squares regression output, similar to printed output produced by almost all statistical packages. What is unusual is that in addition to the printout in Figure 7.2, the R Commander creates and retains a linear model object on which I can perform further computations, as illustrated later in this chapter.

The Model button in the R Commander toolbar now reads RegModel.1, rather than

FIGURE 7.2: Output from Duncan's regression of occupational prestige on income and education, produced by the Linear Regression dialog.

The variable lists in the Linear Regression dialog in Figure 7.1 include only numeric variables. For example, the factor type (type of occupation) in Duncan's data set, with levels bc (blue-collar), wc (white-collar), and prof (professional, technical, or managerial), doesn't appear in either variable list. Moreover, the explanatory variables that are selected enter the model linearly and additively. The Linear Model dialog, described in the next section, is capable of fitting a much wider variety of regression models.

In completing the Linear Regression dialog in Figure 7.1, I left the name of the model at its default, RegModel.1. The R Commander generates unique model names automatically during a session, each time incrementing the model number (here 1).

I also left the Subset expression at its default. Had I entered a logical expression selecting only the bc level of type, for example, the regression model would have been fit only to blue-collar occupations. As in this example, the subset expression can be a logical expression, returning the value TRUE or FALSE for each case (see Section 4.4.2), a vector of case indices to include, or a negative vector of case indices to exclude. For example, 1:25 would include the first 25 occupations, while -c(6, 16) would exclude occupations 6 and 16. All of the statistical modeling dialogs in the R Commander allow subsets of cases to be specified in this manner.
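The three kinds of subset expression can be tried directly through the subset argument of lm. This is a sketch on a simulated data frame; the data frame occ and its values are assumptions for illustration.

```r
# Simulated data frame; occ and its contents are illustrative assumptions.
set.seed(3)
occ <- data.frame(
  type = factor(rep(c("bc", "wc", "prof"), each = 10)),
  education = runif(30, 0, 100)
)
occ$prestige <- 10 + 0.6 * occ$education + rnorm(30, sd = 8)

# Logical expression: TRUE or FALSE for each case
m_bc <- lm(prestige ~ education, data = occ, subset = type == "bc")
# Vector of case indices to include
m_first <- lm(prestige ~ education, data = occ, subset = 1:25)
# Negative vector of case indices to exclude
m_drop <- lm(prestige ~ education, data = occ, subset = -c(6, 16))

# Number of cases used in each fit
sapply(list(bc = m_bc, first25 = m_first, drop2 = m_drop), nobs)
```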

7.2 Linear Models with Factors*

Like the Linear Regression dialog described in the preceding section, the Linear Model dialog can fit additive linear regression models, but it is much more flexible: The Linear Model dialog accommodates transformations of the response and explanatory variables, factors as well as numeric explanatory variables on the right-hand side of the regression model, nonlinear functions of explanatory variables expressed as polynomials and regression splines, and interactions among explanatory variables. All this is accomplished by allowing the user to specify the model as an R linear-model formula. Linear-model formulas in R are inherited from the S programming language (Chambers and Hastie, 1992), and are a version of notation for expressing linear models originally introduced by Wilkinson and Rogers (1973).

7.2.1 Linear-Model Formulas

An R linear-model formula is of the general form response-variable ~ linear-predictor. The tilde (~) in a linear-model formula can be read as "is regressed on." Thus, in this general form, the response variable is regressed on a linear predictor comprising the terms in the right-hand side of the model.

The left-hand side of the model formula, response-variable, is an R expression that evaluates to the numeric response variable in the model, and is usually simply the name of the response variable (for example, prestige in Duncan's regression). You can, however, transform the response variable directly in the model formula (e.g., log10(income)) or compute the response as a more complex arithmetic expression (e.g., log(investment.income + hourly.wage.rate*hours.worked)).

The formulation of the linear predictor on the right-hand side of a model formula is more complex. What are normally arithmetic operators (+, -, *, /, and ^) in R expressions have special meanings in a model formula, as do the operators : (colon) and %in%. The numeral 1 (one) may be used to represent the regression constant (i.e., the intercept) in a model formula; this is usually unnecessary, however, because an intercept is included by default. A period (.) represents all of the variables in the data set with the exception of the response. Parentheses may be used for grouping, much as in an arithmetic expression.

In the large majority of cases, you'll be able to formulate a model using only the operators + (interpreted as "and") and * (interpreted as "crossed with"), and so I'll emphasize these operators here. The meanings of these and the other model-formula operators are summarized and illustrated in Table 7.1. Especially on first reading, feel free to ignore everything in the table except +, :, and * (and : is rarely used directly).

A final formula subtlety: As I've explained, the arithmetic operators take on special meanings on the right-hand side of a linear-model formula. A consequence is that you can't use these operators directly for arithmetic. For example, fitting the model savings ~ wages + interest + dividends estimates a separate regression coefficient for each of wages, interest, and dividends. Suppose, however, that you want to estimate a single coefficient for the sum of these variables, in effect setting the three coefficients equal to each other. The solution is to protect the + operator inside a call to the I (identity or inhibit) function, which simply returns its argument unchanged: savings ~ I(wages + interest + dividends). This formula works as desired because arithmetic operators like + have their usual meaning within a function call on the right-hand side of the formula, implying, incidentally, that savings ~ log10(wages + interest + dividends) also works as intended, estimating a single coefficient for the log base 10 of the sum of wages, interest, and dividends.
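The difference between the two formulas shows up in the number of estimated coefficients. A minimal sketch, assuming a simulated data frame d with the variable names used in the text:

```r
# Simulated data; the data frame d and its values are illustrative assumptions.
set.seed(4)
d <- data.frame(wages = rexp(40, 1 / 30),
                interest = rexp(40, 1 / 5),
                dividends = rexp(40, 1 / 3))
d$savings <- 1 + 0.1 * (d$wages + d$interest + d$dividends) + rnorm(40)

# + as a formula operator: a separate coefficient for each variable
separate <- lm(savings ~ wages + interest + dividends, data = d)
# + protected by I(): ordinary addition, so a single coefficient for the sum
single <- lm(savings ~ I(wages + interest + dividends), data = d)

length(coef(separate))  # intercept plus three slopes
length(coef(single))    # intercept plus one slope
```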

TABLE 7.1: Operators and other symbols used on the right-hand side of R linear-model formulas.

Operator  Meaning           Example                Interpretation
+         and               x1 + x2                x1 and x2
:         interaction       x1:x2                  interaction of x1 and x2
*         crossing          x1*x2                  x1 crossed with x2 (i.e., x1 + x2 + x1:x2)
-         remove            x1 - 1                 regression through the origin (for numeric x1)
^k        cross to order k  (x1 + x2 + x3)^2       same as x1*x2 + x1*x3 + x2*x3
%in%      nesting           province %in% country  province nested in country
/         nesting           country/province       same as country + province %in% country

Symbol    Meaning                      Example       Interpretation
1         intercept                    x1 - 1        suppress the intercept
.         everything but the response  y ~ .         regress y on everything else
( )       grouping                     x1*(x2 + x3)  same as x1*x2 + x1*x3

The symbols x1, x2, and x3 represent explanatory variables and could be either numeric or factors.
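One way to see how R expands these operators is to inspect the terms of a formula without fitting anything. The helper terms_of below is a hypothetical convenience function introduced only for this sketch.

```r
# terms_of is a hypothetical helper: it lists the terms R derives from a formula.
terms_of <- function(f) attr(terms(f), "term.labels")

terms_of(~ x1 * x2)            # "x1" "x2" "x1:x2"
terms_of(~ (x1 + x2 + x3)^2)   # main effects plus all two-way interactions
terms_of(~ x1 * (x2 + x3))     # same terms as x1*x2 + x1*x3

# Removing 1 suppresses the intercept
attr(terms(~ x1 - 1), "intercept")
```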

7.2.2 The Principle of Marginality

Introduced by Nelder (1977), the principle of marginality is a rule for formulating and interpreting linear (and similar) statistical models. According to the principle of marginality, if an interaction, say x1:x2, is included in a linear model, then so should the main effects, x1 and x2, that are marginal to (that is, lower-order relatives of) the interaction. Similarly, the lower-order interactions x1:x2, x1:x3, and x2:x3 are marginal to the three-way interaction x1:x2:x3. The regression constant (1 in an R model formula) is marginal to every other term in the model.

It is in most circumstances difficult in R to formulate models that violate the principle of marginality, and trying to do so can produce unintended results. For example, although it may appear that the model y ~ f*x - x - 1, where f is a factor and x is a numeric explanatory variable, violates the principle of marginality by removing the regression constant and x slope, the model that R actually fits includes a separate intercept and slope for each level of the factor f. Thus, the model y ~ f*x - x - 1 is equivalent to (i.e., an alternative parametrization of) y ~ f*x. It is almost always best to stay away from such unusual model formulas.
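The equivalence of the two parametrizations can be checked by comparing fitted values. A sketch on simulated data, where the data frame d, the factor f with three levels, and the coefficients are assumptions:

```r
# Simulated data; d, f, x, and y are illustrative assumptions.
set.seed(5)
d <- data.frame(f = factor(rep(c("a", "b", "c"), each = 10)), x = rnorm(30))
d$y <- as.numeric(d$f) + 0.5 * d$x + rnorm(30)

m_full <- lm(y ~ f * x, data = d)
m_odd  <- lm(y ~ f * x - x - 1, data = d)

# One intercept and one slope for each level of f
names(coef(m_odd))

# Different coefficients, identical fit
all.equal(fitted(m_full), fitted(m_odd))
```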

7.2.3 Examples Using the Canadian Occupational Prestige Data

For concreteness, I'll formulate several linear models for the Canadian occupational prestige data (introduced in Section 4.2.3 and described in Table 4.2 on page 61), regressing prestige on income, education, women (gender composition), and type (type of occupation). The last variable is a factor (categorical variable) and so it cannot enter into the linear model directly. When a factor is included in a linear-model formula, R generates contrasts to represent the factor, one fewer than the number of levels of the factor. I'll explain how this works in greater detail in Section 7.2.4, but the default in the R Commander (and R more generally) is to use 0/1 dummy-variable regressors, also called indicator variables.

A version of the Canadian occupational prestige data resides in the data frame Prestige in the car package, and it's convenient to read the data into the R Commander from this source via Data > Data in packages > Read data from an attached package. Prestige replaces Duncan as the active data set.

Recall that 4 of the 102 occupations in the Prestige data set have missing values (NA) for occupational type. Because I will fit several regression models to the Prestige data, not all of which include type, I begin by filtering the data set for missing values, selecting Data > Active data set > Remove cases with missing data (as described in Section 4.5.2).

Moreover, the default alphabetical ordering of the levels of type (bc, prof, wc) is not the natural ordering, and so I also reorder the levels of this factor via Data > Manage variables in active data set > Reorder factor levels to bc, wc, prof (see Section 3.4). This last step isn't strictly necessary, but it makes the data analysis easier to follow.

I first fit an additive dummy regression to the Canadian prestige data, employing the model formula prestige ~ income + education + women + type. To do so, I select Statistics > Fit models > Linear model from the R Commander menus, producing the dialog box in Figure 7.3. The automatically supplied model name is LinearModel.2, reflecting the fact that I have already fit a statistical model in the session, RegModel.1 (in Section 7.1).

Most of the structure of the Linear Model dialog is common to statistical modeling dialogs in the R Commander. If the response text box to the left of the ~ in the model formula is empty, double-clicking on a variable name in the variable list box enters the name into the response box; thereafter, double-clicking on variable names enters the names into the right-hand side of the model formula, separated by +s (if no operator appears at the end of the partially completed formula). You can enter parentheses and operators like + and * into the formula using the toolbar in the dialog box. You can also type directly into the model-formula text boxes. In Figure 7.3, I simply double-clicked successively on prestige, education, income, women, and type. Clicking OK produces the output shown in Figure 7.4.

I already explained the general format of linear-model summary output in R. What's new in Figure 7.4 is the way in which the factor type is handled in the linear model: Two dummy-variable regressors are automatically created for the three-level factor type. The first dummy regressor, labelled type[T.wc] in the output, is coded 1 when type is wc and 0 otherwise; the second dummy regressor, type[T.prof], is coded 1 when type is prof and 0 otherwise. The first level of type, bc, is therefore selected as the reference or baseline level, coded 0 for both dummy regressors.

Consequently, the intercept in the linear-model output is the intercept for the bc reference level of type, and the coefficients for the other levels give differences in the intercepts between each of these levels and the reference level. Because the slope coefficients for the numeric explanatory variables education, income, and women in this additive model do not vary by levels of type, the dummy-variable coefficients are also interpretable as the average difference between each other level and bc for any fixed values of education, income, and women.
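The dummy coding can be inspected with model.matrix. The sketch below uses a small made-up factor and base R's default treatment contrasts, so the columns are labelled typewc and typeprof rather than type[T.wc] and type[T.prof], the labels shown in the R Commander output.

```r
# A small made-up factor with the levels ordered bc, wc, prof
type <- factor(c("bc", "wc", "prof", "bc", "prof"),
               levels = c("bc", "wc", "prof"))

# Columns: intercept, dummy for wc, dummy for prof
X <- model.matrix(~ type)
X

# Rows for the baseline level bc are 0 on both dummy regressors
X[type == "bc", c("typewc", "typeprof")]
```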

FIGURE 7.3: Linear Model dialog completed to fit an additive dummy-variable regression of prestige on the numeric explanatory variables education, income, and women, and the factor type.

To illustrate a structurally more complex, nonadditive model, I respecify the Canadian occupational prestige regression model to include interactions between type and education and between type and income, in the process removing women from the model (in the initial regression, the coefficient of women is small with a large p-value). The Linear Model dialog (not shown) reopens in its previous state, with the model name incremented to LinearModel.3. To fit the new model, I modify the formula to read prestige ~ type*education + type*income. Clicking OK produces the output in Figure 7.5.

With interactions in the model, there are different intercepts and slopes for each level of type. The intercept in the output, along with the coefficients for education and income, pertains to the baseline level bc of type. Other coefficients represent differences between each of the other levels and the baseline level. For example, type[T.wc] = -33.54 is the difference in intercepts between the wc and bc levels of type; similarly, the interaction coefficient type[T.wc]:education = 4.291 is the difference in education slopes between the wc and bc levels. The complexity of the coefficients makes it difficult to understand what the model says about the data; Section 7.6 shows how to visualize terms such as interactions in a complex linear model.

FIGURE 7.4: Output for the linear model prestige ~ income + education + women + type fit to the Prestige data.

FIGURE 7.5: Output for the linear model prestige ~ type*education + type*income fit to the Prestige data.

TABLE 7.2: Contrast-regressor codings for type generated by contr.Treatment, contr.Sum, contr.poly, and contr.Helmert.

                                  Levels of type
Function         Contrast Names   bc       wc       prof
contr.Treatment  type[T.wc]       0        1        0
                 type[T.prof]     0        0        1
contr.Sum        type[S.wc]       1        0        -1
                 type[S.prof]     0        1        -1
contr.poly       type.L           -1/√2    0        1/√2
                 type.Q           1/√6     -2/√6    1/√6
contr.Helmert    type[H.1]        -1       1        0
                 type[H.2]        -1       -1       2

7.2.4 Dummy Variables and Other Contrasts for Factors

By default in the R Commander, factors in linear-model formulas are represented by 0/1 dummy-variable regressors generated by the contr.Treatment function in the car package, picking the first level of a factor as the baseline level. This contrast coding, along with some other choices, is shown in Table 7.2, using the factor type in the Prestige data set as an example.

The function contr.Sum from the car package generates so-called "sigma-constrained" or "sum-to-zero" contrast regressors, as are used in traditional treatments of analysis of variance. The standard R function contr.poly generates orthogonal-polynomial contrasts (in this case, linear and quadratic terms for the three levels of type); in the R Commander, contr.poly is the default choice for ordered factors. Finally, contr.Helmert generates Helmert contrasts, which compare each level to the average of those preceding it.
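Base R has lowercase counterparts of these functions (contr.treatment, contr.sum, contr.helmert) alongside contr.poly. The sketch below uses them instead of the car versions, which is an assumption made to keep the example free of add-on packages; the codings are the same kind but the contrast labels differ, and the matrices print with levels as rows rather than as columns.

```r
lev <- c("bc", "wc", "prof")

contr.treatment(lev)  # 0/1 dummy coding with the first level as baseline
contr.sum(lev)        # sum-to-zero coding
contr.poly(3)         # orthogonal linear (.L) and quadratic (.Q) contrasts
contr.helmert(lev)    # Helmert contrasts

# The polynomial contrasts are scaled to unit length
colSums(contr.poly(3)^2)
```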

Selecting Data > Manage variables in active data set > Define contrasts for a factor produces the dialog box on the left of Figure 7.6. The factor type is preselected in this dialog because it's the only factor in the data set. You can use the radio buttons to choose among treatment, sum-to-zero, Helmert, and polynomial contrasts, or define customized contrasts by selecting Other, as I've done here.

Clicking OK leads to the sub-dialog shown on the right of Figure 7.6. I change the default contrast names, .1 and .2, to [bc.v.others] and [wc.v.prof], and then fill in the contrast coefficients (i.e., the values of the contrast regressors). This choice produces contrast regressors named type[bc.v.others] and type[wc.v.prof], to be used when the factor type in the Prestige data set appears in a linear-model formula. Contrasts defined directly in this manner must be linearly independent and are simplest to interpret if they obey two additional rules: (1) The coefficients for each contrast should sum to 0, and (2) each pair of contrasts should be orthogonal (i.e., the products of corresponding coefficients for each pair of contrasts sum to 0).

FIGURE 7.6: The Set Contrasts for Factor dialog box (left) and the Specify Contrasts sub-dialog (right), creating contrasts for the factor type in the Prestige data set.

To see how these contrasts are reflected in the coefficients of the model, I refit the additive regression of prestige on education, income, women, and type, producing the output in Figure 7.7. The first contrast for type estimates the difference between bc and the average of the other two levels of type, holding the other explanatory variables constant, while the second contrast estimates the difference between wc and prof. This alternative contrast coding for type produces different estimates for the intercept and type coefficients from the dummy-regressor coding for type in Figure 7.4 (on page 136), but the two models have the same fit to the data (e.g., $R^2$ = 0.8349).
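That a change of contrast coding leaves the fit unchanged can be verified on simulated data. In this sketch the contrast coefficients (2, -1, -1) and (0, 1, -1) are assumed values chosen to satisfy the two rules above; the text does not state the coefficients actually entered in the sub-dialog.

```r
# Simulated data; d and all numeric values are illustrative assumptions.
set.seed(6)
d <- data.frame(type = factor(rep(c("bc", "wc", "prof"), each = 12),
                              levels = c("bc", "wc", "prof")),
                education = runif(36, 6, 16))
d$prestige <- 5 + 3 * d$education + c(0, 4, 10)[d$type] + rnorm(36, sd = 4)

# Default dummy-regressor coding
m_dummy <- lm(prestige ~ education + type, data = d)

# Assumed custom contrasts: bc vs. the other two levels, then wc vs. prof
custom <- cbind(bc.v.others = c(2, -1, -1), wc.v.prof = c(0, 1, -1))
d2 <- d
contrasts(d2$type) <- custom
m_custom <- lm(prestige ~ education + type, data = d2)

colSums(custom)                   # each contrast sums to 0
sum(custom[, 1] * custom[, 2])    # the pair is orthogonal

# Different coefficients for type, same R-squared
c(dummy = summary(m_dummy)$r.squared, custom = summary(m_custom)$r.squared)
```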

FIGURE 7.7: Output for the linear model prestige ~ income + education + women + type fit to the Prestige data, using customized contrasts for type.

7.3 Fitting Regression Splines and Polynomials*

The second formula toolbar in the Linear Model dialog makes it easy to add nonlinear polynomial-regression and regression-spline terms to a linear model.

7.3.1 Polynomial Terms

Some simple nonlinear relationships can be represented as low-order polynomials, such as a quadratic term, using regressors $x$ and $x^2$ for a numeric explanatory variable $x$, or a cubic term, using $x$, $x^2$, and $x^3$. The resulting model is nonlinear in the explanatory variable $x$ but linear in the parameters (the $\beta$s). R and the R Commander support both orthogonal and raw polynomials in linear model formulas.

To add a polynomial term to the right-hand side of the model, single-click on a numeric variable in the Variables list box, and then press the appropriate toolbar button (either orthogonal polynomial or raw polynomial, as desired). There is a spinner in the Linear Model dialog for the degree of a polynomial term, and the default is 2 (i.e., a quadratic).

For example, inspection of the data (e.g., in a component-plus-residual plot, discussed in Section 7.8) suggests that there may be a quadratic partial relationship between prestige and women in the regression of prestige on education, income, and women for the Canadian occupational prestige data. I specify this quadratic relationship in the Linear Model dialog in Figure 7.8, using a raw second-degree polynomial, and producing the output in Figure 7.9. The quadratic coefficient in the model turns out not to be statistically significant (p = 0.15).
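Raw and orthogonal polynomials give the same fitted model in different parametrizations, which a short sketch can confirm. The data are simulated; the data frame d and its coefficients are assumptions, and only the variable names echo the example.

```r
# Simulated data; d and the coefficients are illustrative assumptions.
set.seed(7)
d <- data.frame(women = runif(60, 0, 100))
d$prestige <- 50 - 0.3 * d$women + 0.003 * d$women^2 + rnorm(60, sd = 5)

m_raw  <- lm(prestige ~ poly(women, degree = 2, raw = TRUE), data = d)
m_orth <- lm(prestige ~ poly(women, degree = 2), data = d)
m_I    <- lm(prestige ~ women + I(women^2), data = d)

# Raw and orthogonal polynomials produce the same fitted values
all.equal(fitted(m_raw), fitted(m_orth))

# A raw quadratic is just the regressors women and women^2
all.equal(unname(coef(m_raw)), unname(coef(m_I)))
```

Orthogonal polynomials are numerically better conditioned, while the raw coefficients are easier to read as multipliers of x and x^2.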

7.3.2 Regression Splines

Regression splines are flexible functions capable of representing a wide variety of nonlinear patterns in a model that, like a regression polynomial, is linear in the parameters. Both B-splines and natural splines are supported by the R Commander Linear Model dialog. Adding a spline term to the right-hand side of a linear model is similar to adding a polynomial term. For example, the model formula prestige ~ poly(women, degree=2, raw=TRUE) + ns(education, df=5) + ns(income, df=5) regresses prestige on a quadratic in women and 5-df natural splines in education and income. The output for the resulting regression model isn't shown because the model requires graphical interpretation (see Section 7.6): The coefficient estimates for the regression splines are not simply interpretable.
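A natural-spline fit of this kind can be sketched with the ns function from the splines package, which ships with R. The data below are simulated, so the data frame d, its ranges, and the generating curve are assumptions; the point is that the model is read through predictions rather than individual coefficients.

```r
library(splines)  # provides ns() and bs()

# Simulated data; d and the generating model are illustrative assumptions.
set.seed(8)
d <- data.frame(education = runif(100, 6, 16), income = runif(100, 1000, 25000))
d$prestige <- 20 + 15 * log(d$education) + 8 * log(d$income / 1000) +
  rnorm(100, sd = 5)

m_ns <- lm(prestige ~ ns(education, df = 5) + ns(income, df = 5), data = d)

# One intercept plus five coefficients for each spline term
length(coef(m_ns))

# Fitted prestige across education, holding income at an assumed fixed value
new <- data.frame(education = seq(8, 14, by = 2), income = 10000)
predict(m_ns, newdata = new)
```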

FIGURE 7.8: Linear Model dialog with a polynomial (quadratic) term for women in the regression of prestige on education, income, and women using the Prestige data set.

FIGURE 7.9: Output from the regression of prestige on education, income, and a quadratic in women for the Prestige data.

FIGURE 7.10: Linear Model dialog showing regression-spline and polynomial terms for the regression of prestige on education, income, and women in the Prestige data set.

7.4 Generalized Linear Models*

Briefly, generalized linear models (or GLMs), introduced in a seminal paper by Nelder and Wedderburn (1972), consist of three components:

1. A random component specifying the distribution of the response $y$ conditional on explanatory variables. Traditionally, the random component is a member of an exponential family, namely the Gaussian (normal), binomial, Poisson, gamma, or inverse Gaussian families, but both the theory of generalized linear models and their implementation in R are now more general: In addition to the traditional exponential families, R provides for quasi-binomial and quasi-Poisson families that accommodate "over-dispersed" binomial and count data.

2. A linear predictor

$$\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}$$

on which the expectation of the response variable $\mu_i = E(y_i)$ for the $i$th of $n$ independent observations depends, where the regressors $x_{ji}$ are prespecified functions of the explanatory variables: numeric explanatory variables, dummy regressors representing factors, interaction regressors, and so on, exactly as in the linear model.

3. A prespecified invertible link function $g(\cdot)$ that transforms the expectation of the response to the linear predictor, $g(\mu_i) = \eta_i$, and thus $\mu_i = g^{-1}(\eta_i)$. R implements identity, inverse, log, logit, probit, complementary log-log, square root, and inverse square links, with the applicable links varying by distributional family.
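The link function and its inverse are stored in R's family objects, which makes the relationship between the mean and the linear predictor easy to check. A small sketch with arbitrary example values of the mean:

```r
fam <- binomial(link = "logit")

mu  <- c(0.1, 0.5, 0.9)     # example expectations (arbitrary values)
eta <- fam$linkfun(mu)      # g(mu) = log(mu / (1 - mu)), the linear predictor
fam$linkinv(eta)            # g^{-1}(eta) recovers mu

# Default (canonical) links for some other families
poisson()$link              # "log"
gaussian()$link             # "identity"
```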

The most common GLM beyond the normal linear model (i.e., the Gaussian family paired with identity link) is the binomial logit model, suitable for dichotomous (two-category) response variables. For an illustration, I'll use data collected by Cowles and Davis (1987) on volunteering for a psychological experiment, where the subjects of the study were students in a university introductory psychology class.

The data for this example are contained in the data set Cowles in the car package, which includes the following variables: neuroticism, a personality dimension with integer scores ranging from 0 to 24; extraversion, another personality dimension, also with scores from 0 to 24; sex, a factor with levels female and male; and volunteer, a factor with levels no and yes.

In analyzing the data, Cowles and Davis performed a logistic regression of volunteering on sex and the linear-by-linear interaction between neuroticism and extraversion. To fit Cowles and Davis's model, I first read the data from the car package in the usual manner, making Cowles the active data set in the R Commander. Then I select Statistics > Fit models > Generalized linear model, producing the dialog box in Figure 7.11.

The Generalized Linear Model dialog is very similar to the Linear Model dialog of the preceding section: The name of the model at the top (GLM.7) is automatically generated, and you can change it if you wish. Double-clicking on a variable in the list box enters the variable into the model formula. There are toolbars for entering operators, regression splines, and polynomials into the model formula, and there are boxes for subsetting the data set and for specifying prior weights.

FIGURE 7.11: Generalized Linear Model dialog box for Cowles and Davis's logistic regression.

What's new in the Generalized Linear Model dialog are the Family and Link function list boxes, as are appropriate to a GLM. Families and links are coordinated: Double-clicking on a distributional family changes the available links. In each case, the canonical link for a particular family is selected by default. The initial selections are the binomial family and corresponding canonical logit link, which are coincidentally what I want for the example.

I proceed to complete the dialog by double-clicking on volunteer in the variable list, making it the response variable; then double-clicking on sex and on neuroticism; clicking the * button in the toolbar; and finally double-clicking on extraversion, yielding the model formula volunteer ~ sex + neuroticism*extraversion. As in the Linear Model dialog, an alternative is to type the formula directly.

Appropriate responses for a binomial logit model include two-level factors (such as volunteer in the current example), logical variables (i.e., with values FALSE and TRUE), and numeric variables with two unique values (most commonly 0 and 1). In each case, the logit model is for the probability of the second of the two values (the probability that volunteer is yes in the example).

Clicking the OK button produces the output in Figure 7.12. The Generalized Linear Model dialog uses the R glm function to fit the model. The summary output for a generalized linear model is very similar to that for a linear model, including a table of estimated coefficients along with their standard errors, z values (Wald statistics) for testing that the coefficients are 0, and the two-sided p-values for these tests. For a logistic regression, the R Commander also prints the exponentiated coefficients, interpretable as multiplicative effects on the odds scale (here the odds of volunteering, Pr(yes)/Pr(no)).
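A logit model with the same formula can be sketched with glm on simulated data. The data frame d below is an assumption that mimics the structure of the Cowles variables; it is not the Cowles data, so the estimates carry no substantive meaning.

```r
# Simulated data mimicking the variable structure; values are assumptions.
set.seed(9)
n <- 500
d <- data.frame(sex = factor(sample(c("female", "male"), n, replace = TRUE)),
                neuroticism = sample(0:24, n, replace = TRUE),
                extraversion = sample(0:24, n, replace = TRUE))
eta <- -1 - 0.3 * (d$sex == "male") + 0.05 * d$extraversion
d$volunteer <- factor(ifelse(runif(n) < plogis(eta), "yes", "no"),
                      levels = c("no", "yes"))

# The model is for the probability of the second level, "yes"
GLM.sim <- glm(volunteer ~ sex + neuroticism * extraversion,
               family = binomial(link = "logit"), data = d)

summary(GLM.sim)$coefficients  # estimates, standard errors, z values, p-values
exp(coef(GLM.sim))             # multiplicative effects on the odds
```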

The Wald z tests suggest a statistically significant interaction between neuroticism and extraversion, as Cowles and Davis expected, and a significant sex effect, with men less likely to volunteer than women who have equivalent scores on the personality dimensions. Because it's hard to grasp the nature of the interaction directly from the coefficient estimates, I'll return to this example in Section 7.6, where I'll plot the fitted model.

Although I've developed just one example of a generalized linear model in this section (a logit model for binary data), the R Commander Generalized Linear Model dialog is more flexible:

The probit and complementary log-log (cloglog) link functions may also be used with binary data, as alternatives to the canonical logit link.

The binomial family may also be used when the value of the response variable for each case (or observation) represents the proportion of "successes" in a given number of binomial trials, which may also vary by case. In this setting, the left-hand side of the model formula should give the proportion of successes, which could be computed as successes/trials (imagining that there are variables with these names in the active data set) directly in the left-hand box of the model formula, and the variable representing the number of trials for each observation (e.g., trials) should be given in the Weights box.

Alternatively, for binomial data, the left-hand side of the model may be a two-column matrix specifying, respectively, the numbers of successes and failures for each observation, by typing, e.g., cbind(successes, failures) (again, imagining that these variables are in the active data set) into the left-hand-side box of the model formula.

Other generalized linear models are specified by choosing a different family and corresponding link. For example, a Poisson regression model, commonly employed for count data, may be fit by selecting the poisson family and canonical log link (or, to get typically more realistic coefficient standard errors, by selecting the quasipoisson family with the log link).
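The two ways of specifying binomial data, and the Poisson versus quasi-Poisson choice, can be compared in a short sketch. All data here are simulated; the variable names successes, failures, and trials follow the text's hypothetical names, and the remaining names are assumptions.

```r
# Simulated binomial data; names follow the text's hypothetical variables.
set.seed(10)
d <- data.frame(x = seq(-2, 2, length.out = 20),
                trials = sample(5:15, 20, replace = TRUE))
d$successes <- rbinom(20, d$trials, plogis(0.5 * d$x))
d$failures  <- d$trials - d$successes

# Proportion of successes on the left, number of trials as prior weights
m_prop <- glm(successes / trials ~ x, family = binomial,
              weights = trials, data = d)
# Two-column matrix of successes and failures on the left
m_cbind <- glm(cbind(successes, failures) ~ x, family = binomial, data = d)
all.equal(coef(m_prop), coef(m_cbind))

# Simulated counts for Poisson and quasi-Poisson fits with the log link
counts <- data.frame(x = d$x, y = rpois(20, exp(0.3 + 0.4 * d$x)))
m_pois  <- glm(y ~ x, family = poisson, data = counts)
m_quasi <- glm(y ~ x, family = quasipoisson, data = counts)

# Same coefficient estimates; only the standard errors are rescaled
all.equal(coef(m_pois), coef(m_quasi))
```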

FIGURE 7.12: Output from Cowles and Davis's logistic regression (volunteer ~ sex + neuroticism*extraversion).

7.5 Other Regression Models*

In addition to linear regression, linear models, and generalized linear models, the R Commander can fit multinomial logit models for categorical response variables with more than two categories (via Statistics > Fit models > Multinomial logit model), and ordinal regression models for ordered multi-category responses, including the proportional-odds logit model and the ordered probit model (Statistics > Fit models > Ordinal regression model). Although I won't illustrate these models here, many of the menu items in the Models menu apply to these classes of models. Moreover (as I will show in Chapter 9), R Commander plug-in packages can introduce additional classes of statistical models.

7.6 Visualizing Linear and Generalized Linear Models*

Introduced by Fox (1987), effect plots are graphs for visualizing complex regression models by focusing on particular explanatory variables or combinations of explanatory variables, holding other explanatory variables to typical values. One strategy is to focus successively on the explanatory variables in the high-order terms of the model, that is, terms that aren't marginal to others (see Section 7.2.2).

In the R Commander, effect displays can be drawn for linear, generalized linear, and some other statistical models via Models > Graphs > Effect plots. Figure 7.13 shows the resulting dialog box for Cowles and Davis's logistic regression from the previous section, GLM.7, which is the current statistical model in the R C

August 19, 2022
