What is the difference between Logistic Regression and Discriminant Analysis?


What is Logistic Regression?
“Logistic regression allows one to predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, dichotomous, or a mix.” (Tabachnick and Fidell, 1996, p. 575)

What is Discriminant Analysis?
“The goal of the discriminant function analysis is to predict group membership from a set of predictors” (Tabachnick and Fidell, 1996, p. 507)

When/How to use Logistic Regression and Discriminant Analysis?
From the above definitions, it appears that both methods can answer the same research questions. Logistic regression may be better suited to cases where the dependent variable is dichotomous (Yes/No, Pass/Fail, Healthy/Ill, life/death, etc.), while the independent variables can be nominal, ordinal, interval, or ratio. Discriminant analysis might be better suited when the dependent variable has more than two groups/categories. However, the real basis for choosing between them is the set of assumptions each method makes about the distribution of, and relationships among, the independent variables and about the distribution of the dependent variable.

So, what is the difference?
Well, for both methods the categories of the outcome (i.e. the dependent variable) must be mutually exclusive. One way to decide whether to use logistic regression or discriminant analysis when the dependent variable has more than two groups is to examine the assumptions of each method. Logistic regression is much more relaxed and flexible in its assumptions than discriminant analysis. Unlike discriminant analysis, logistic regression does not require the independent variables to be normally distributed, linearly related, or of equal variance within each group (Tabachnick and Fidell, 1996, p. 575). Being free of the assumptions of discriminant analysis makes logistic regression a tool that can be used in many situations. However, “when [the] assumptions regarding the distribution of predictors are met, discriminant function analysis may be more powerful and efficient analytic strategy” (Tabachnick and Fidell, 1996, p. 579).
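One way to see why the two methods address the same question is to write out the log-odds of group membership for two groups under the discriminant analysis assumptions (predictors multivariate normal within each group, with a common covariance matrix). The notation below is introduced here for illustration and does not come from the cited texts: \pi_k are the prior group proportions, \mu_k the group mean vectors, and \Sigma the shared covariance matrix.

```latex
\log\frac{P(G=1 \mid x)}{P(G=0 \mid x)}
  = \log\frac{\pi_1}{\pi_0}
    - \tfrac{1}{2}(\mu_1 + \mu_0)^{\top}\Sigma^{-1}(\mu_1 - \mu_0)
    + x^{\top}\Sigma^{-1}(\mu_1 - \mu_0)
  = \beta_0 + \beta^{\top}x
```

The log-odds is linear in x, which is the same functional form that logistic regression fits. The difference lies in estimation: discriminant analysis obtains \beta_0 and \beta from the group means and the pooled covariance, and so leans on the normality and equal-covariance assumptions, while logistic regression estimates them directly without assuming a distribution for the predictors.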

Even though logistic regression makes fewer assumptions, and is thus usable in more instances, it does require a larger sample size: at least 50 cases per independent variable might be required for accurate hypothesis testing, especially when the dependent variable has many groups (Grimm and Yarnold, 1995, p. 221). However, given the same sample size, if the independent variables are multivariate normal within each group of the dependent variable, and each category has the same variance and covariance for the predictors, discriminant analysis might provide more accurate classification and hypothesis testing (Grimm and Yarnold, 1995, p. 241). The rule of thumb, though, is to use logistic regression when the dependent variable is dichotomous and the sample is large enough.
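To make the comparison concrete, here is a minimal sketch in plain Python (standard library only) that fits both methods to simulated data with one normally distributed predictor and equal variance in the two groups, i.e. a case where the discriminant analysis assumptions hold. Everything in it is an assumption made for illustration: the function names, the simulated group means (0 and 2), the sample sizes, and the use of simple gradient ascent to fit the logistic model. It is not taken from the cited texts, and a real analysis would normally use a statistics package.

```python
import math
import random


def simulate(n_per_group, mean0=0.0, mean1=2.0, sd=1.0, seed=0):
    """Draw one normal predictor for two groups that share the same variance."""
    rng = random.Random(seed)
    xs = [rng.gauss(mean0, sd) for _ in range(n_per_group)]
    xs += [rng.gauss(mean1, sd) for _ in range(n_per_group)]
    ys = [0] * n_per_group + [1] * n_per_group
    return xs, ys


def fit_lda(xs, ys):
    """Two-group discriminant analysis; returns (intercept, slope) of the log-odds."""
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    # Pooled within-group variance: this is the equal-variance assumption.
    ss = sum((x - m0) ** 2 for x in g0) + sum((x - m1) ** 2 for x in g1)
    var = ss / (len(xs) - 2)
    # Prior probabilities taken from the sample proportions.
    p0, p1 = len(g0) / len(xs), len(g1) / len(xs)
    slope = (m1 - m0) / var
    intercept = math.log(p1 / p0) - (m1 ** 2 - m0 ** 2) / (2 * var)
    return intercept, slope


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=3000):
    """Logistic regression by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            grad0 += resid
            grad1 += resid * x
        b0 += lr * grad0 / n
        b1 += lr * grad1 / n
    return b0, b1


def accuracy(coef, xs, ys):
    """Share of cases classified correctly (group 1 when the log-odds is > 0)."""
    b0, b1 = coef
    hits = sum((b0 + b1 * x > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


if __name__ == "__main__":
    train_x, train_y = simulate(200, seed=1)
    test_x, test_y = simulate(200, seed=2)
    for name, fit in (("discriminant", fit_lda), ("logistic", fit_logistic)):
        coef = fit(train_x, train_y)
        boundary = -coef[0] / coef[1]  # predictor value where the log-odds is 0
        print(name, "boundary:", round(boundary, 3),
              "test accuracy:", round(accuracy(coef, test_x, test_y), 3))
```

With these settings both fits should put the decision boundary near the midpoint of the two group means. Changing `simulate` so that the groups have unequal variances or a skewed predictor is a quick way to see where the discriminant analysis assumptions start to matter.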

References:
Grimm, L.G. and Yarnold, P.R. (eds.) (1995). Reading and Understanding Multivariate Statistics. Washington, D.C.: American Psychological Association.

Tabachnick, B.G. and Fidell, L.S. (1996). Using Multivariate Statistics. New York: HarperCollins.

1 Comment

Shadi Sharif said:

Hi:
As you know, in linear discriminant analysis (LDA) we want to classify data sets and obtain a vector of coefficients (for the independent variables) for predicting the result. To do this we need to define the prior probabilities (equal, or proportional to the sample sizes) and the kind of distance (Mahalanobis or Manhattan). In the end we can assess how accurate the method is from its estimated error, and clearly we want to decrease this error. I want to know: if we change the kind of prior probabilities and the distance, what is the influence on the error?
Is there anybody who can answer my question?
Thank you very much in advance.
Shadi Sharif

About this Entry

This page contains a single entry by Mentor Cana published on December 17, 2003 5:21 PM.

blog (author) = Mentor Cana, Ph.D. Candidate in Information Science at SCILS - Rutgers University.