Saturday, June 09, 2007

discriminant analysis

Chapter 18

Discriminant analysis is useful for analyzing data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval scaled. When the criterion variable has two categories, the technique is known as two-group discriminant analysis. Multiple discriminant analysis refers to the case in which three or more categories are involved.

Conducting discriminant analysis is a five-step procedure:
  1. First, formulating the discriminant problem requires identifying the objectives, the criterion variable, and the predictor variables. The sample is divided into two parts. One part, the analysis sample, is used to estimate the discriminant function. The other part, the holdout sample, is reserved for validation.
  2. Estimation, the second step, involves developing linear combinations of the predictors, called discriminant functions, such that the groups differ as much as possible on the predictor values.
  3. Determination of statistical significance is the third step. It involves testing the null hypothesis that, in the population, the means of all discriminant functions in all groups are equal. If the null hypothesis is rejected, it is meaningful to interpret the results.
  4. The fourth step, the interpretation of discriminant weights or coefficients, is similar to that in multiple regression analysis. Given the multicollinearity in the predictor variables, there is no unambiguous measure of the relative importance of the predictors in discriminating between the groups. However, some idea of the relative importance of the variables may be obtained by examining the absolute magnitude of the standardized discriminant function coefficients and by examining the structure correlations, or discriminant loadings. These simple correlations between each predictor and the discriminant function represent the variance that the predictor shares with the function. Another aid to interpreting discriminant analysis results is to develop a characteristic profile for each group, based on the group means for the predictor variables.
  5. Validation, the fifth step, involves developing the classification matrix. The discriminant weights estimated from the analysis sample are multiplied by the values of the predictor variables in the holdout sample to generate discriminant scores for the cases in the holdout sample. The cases are then assigned to groups based on their discriminant scores and an appropriate decision rule. The percentage of cases correctly classified is determined and compared to the rate that would be expected by chance classification.
Two broad approaches are available for estimating the coefficients. The direct method involves estimating the discriminant function so that all the predictors are included simultaneously. An alternative is the stepwise method, in which the predictor variables are entered sequentially, based on their ability to discriminate among groups.
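
As a rough illustration of the estimation and validation steps, the sketch below fits a two-group discriminant function with two predictors in plain Python, using the direct method (both predictors entered at once). The toy data, the function names, and the midpoint cutoff are assumptions made for illustration and are not from the chapter; the midpoint rule assumes equal group sizes and equal misclassification costs.

```python
# Hypothetical toy data: (predictor_1, predictor_2, group). Values are made up.
analysis = [
    (1.0, 2.0, 0), (2.0, 1.0, 0), (2.0, 3.0, 0), (3.0, 2.0, 0),
    (6.0, 7.0, 1), (7.0, 6.0, 1), (7.0, 8.0, 1), (8.0, 7.0, 1),
]
holdout = [(2.0, 2.0, 0), (1.5, 2.5, 0), (7.0, 7.0, 1), (5.0, 5.0, 0)]


def group_mean(rows):
    """Mean of each predictor within one group."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def pooled_covariance(groups):
    """Pooled within-group covariance (sxx, sxy, syy) for two predictors."""
    sxx = sxy = syy = 0.0
    n_total = 0
    for rows in groups:
        mx, my = group_mean(rows)
        for x, y, _ in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += len(rows)
    df = n_total - len(groups)  # degrees of freedom: cases minus groups
    return sxx / df, sxy / df, syy / df


def estimate(rows):
    """Estimate unstandardized discriminant weights and a midpoint cutoff."""
    g0 = [r for r in rows if r[2] == 0]
    g1 = [r for r in rows if r[2] == 1]
    m0, m1 = group_mean(g0), group_mean(g1)
    sxx, sxy, syy = pooled_covariance([g0, g1])
    det = sxx * syy - sxy * sxy
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # weights = (pooled covariance)^-1 * (difference in group means)
    b1 = (syy * dx - sxy * dy) / det
    b2 = (sxx * dy - sxy * dx) / det
    # Cutoff: discriminant score of the midpoint between the two group means.
    cutoff = b1 * (m0[0] + m1[0]) / 2 + b2 * (m0[1] + m1[1]) / 2
    return (b1, b2), cutoff


def classify(case, weights, cutoff):
    """Decision rule: assign to group 1 if the score exceeds the cutoff."""
    score = weights[0] * case[0] + weights[1] * case[1]
    return 1 if score > cutoff else 0


weights, cutoff = estimate(analysis)

# Classification matrix on the holdout sample: matrix[actual][predicted].
matrix = [[0, 0], [0, 0]]
for case in holdout:
    matrix[case[2]][classify(case, weights, cutoff)] += 1

hit_ratio = (matrix[0][0] + matrix[1][1]) / len(holdout)
print("weights:", weights, "cutoff:", cutoff)
print("classification matrix:", matrix, "hit ratio:", hit_ratio)
```

The weights are estimated only from the analysis sample and then applied unchanged to the holdout cases, which is what keeps the hit ratio from being inflated by the same data used for estimation.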

In multiple discriminant analysis, if there are G groups and k predictors, it is possible to estimate up to the smaller of G - 1 or k discriminant functions. The first function has the highest ratio of between-group to within-group sums of squares. The second function, uncorrelated with the first, has the second highest ratio, and so on.
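
The limit on the number of functions can be written as a one-line rule. The helper below is a minimal sketch; its name and argument names are hypothetical.

```python
def max_discriminant_functions(num_groups, num_predictors):
    """Upper bound on the number of discriminant functions: min(G - 1, k)."""
    if num_groups < 2 or num_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(num_groups - 1, num_predictors)


# Two groups always yield a single function, however many predictors there are.
two_group = max_discriminant_functions(2, 5)
# With four groups but only two predictors, the predictors are the limit.
four_group = max_discriminant_functions(4, 2)
print(two_group, four_group)
```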

discriminant analysis -- a technique for analyzing marketing research data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature
discriminant functions -- the linear combinations of independent variables developed by discriminant analysis that will best discriminate between the categories of the dependent variable
two-group discriminant analysis -- discriminant analysis technique where the criterion variable has two categories
multiple discriminant analysis -- discriminant analysis technique where the criterion variable involves three or more categories
discriminant analysis model -- the statistical model on which discriminant analysis is based
analysis sample -- the part of the total sample that is used for estimation of the discriminant function
validation sample (holdout sample) -- the part of the total sample used to check the results obtained from the analysis sample
direct method -- an approach to discriminant analysis that involves estimating the discriminant function so that all the predictors are included simultaneously
stepwise discriminant analysis -- discriminant analysis in which the predictors are entered sequentially based on their ability to discriminate between the groups
characteristic profile -- an aid to interpreting discriminant analysis results by describing each group in terms of the group means for the predictor variables
hit ratio -- the percentage of cases correctly classified by the discriminant analysis
territorial map -- a tool for assessing discriminant analysis results that plots the group membership of each case on a graph
Mahalanobis procedure -- a stepwise procedure used in discriminant analysis to maximize a generalized measure of the distance between the two closest groups
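
The hit ratio only means something relative to what chance alone would achieve. Two benchmarks commonly used for that comparison are the maximum chance criterion (the share of the largest group) and the proportional chance criterion (the sum of squared group shares). The sketch below computes both; the group sizes and the hit ratio are hypothetical numbers, and neither benchmark is named in the notes above.

```python
def chance_criteria(group_sizes):
    """Return (maximum chance, proportional chance) for the given group sizes."""
    total = sum(group_sizes)
    proportions = [n / total for n in group_sizes]
    maximum = max(proportions)                      # always guess the largest group
    proportional = sum(p * p for p in proportions)  # guess in proportion to group size
    return maximum, proportional


# Hypothetical holdout sample with 60 cases in one group and 40 in the other.
maximum, proportional = chance_criteria([60, 40])

# Hypothetical hit ratio obtained from a classification matrix.
hit_ratio = 0.75
beats_chance = hit_ratio > max(maximum, proportional)
print(maximum, proportional, beats_chance)
```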