Wednesday, May 30, 2007

data preparation

Chapter 14

Data preparation begins with a preliminary check of all questionnaires for completeness and interviewing quality. Then more thorough editing takes place. Editing consists of screening questionnaires to identify illegible, incomplete, inconsistent, or ambiguous responses. Such responses may be handled by returning questionnaires to the field, assigning missing values, or discarding the unsatisfactory respondents.

The next step is coding. A numerical or alphanumeric code is assigned to represent a specific response to a specific question, along with the column position that code will occupy. It is often helpful to prepare a codebook containing the coding instructions and the necessary information about the variables in the data set. The coded data are transcribed onto disks or magnetic tapes or entered into computers via keypunching. Mark sense forms, optical scanning, or computerized sensory analysis may also be used.
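As a rough illustration of how a codebook ties codes to column positions, here is a minimal Python sketch. The variable names, column positions, and value labels are all invented for the example and are not taken from the chapter.

```python
# Hypothetical codebook: every variable name, column range, and label below
# is an assumption made for illustration only.
CODEBOOK = {
    "respondent_id": {"columns": (0, 3), "labels": None},
    "gender": {"columns": (3, 4), "labels": {"1": "male", "2": "female", "9": "missing"}},
    "satisfaction": {"columns": (4, 5), "labels": None},  # e.g. a 1-7 rating
}


def decode_record(record: str) -> dict:
    """Split one fixed-field record into named fields using the codebook."""
    decoded = {}
    for name, spec in CODEBOOK.items():
        start, end = spec["columns"]
        raw = record[start:end]
        labels = spec["labels"]
        # Translate the code into its label when the codebook defines one.
        decoded[name] = labels.get(raw, raw) if labels else raw
    return decoded


print(decode_record("00125"))
# {'respondent_id': '001', 'gender': 'female', 'satisfaction': '5'}
```

Because every record uses the same columns for the same variable, one codebook is enough to decode all respondents.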

Cleaning the data requires consistency checks and treatment of missing responses. Options available for treating missing responses include substitution of a neutral value such as the mean, substitution of an imputed response, casewise deletion, and pairwise deletion. Statistical adjustments such as weighting, variable respecification, and scale transformations often enhance the quality of data analysis. The selection of a data analysis strategy should be based on the earlier steps of the marketing research process, known characteristics of the data, properties of statistical techniques, and the background and philosophy of the researcher. Statistical techniques may be classified as univariate or multivariate.
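Three of the missing-response options can be sketched in a few lines of Python. This is only an illustration under assumed data: the responses and the variable names q1 to q3 are made up, and `None` stands in for a missing response.

```python
from statistics import mean

# Hypothetical survey responses; None marks a missing response.
data = [
    {"q1": 5, "q2": 3, "q3": 4},
    {"q1": None, "q2": 4, "q3": 2},
    {"q1": 3, "q2": None, "q3": 5},
    {"q1": 4, "q2": 2, "q3": 3},
]


def mean_substitution(rows, var):
    """Replace missing values of one variable with the mean of its observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    fill = mean(observed)
    return [fill if r[var] is None else r[var] for r in rows]


def casewise_deletion(rows):
    """Keep only respondents with no missing responses at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_deletion(rows, a, b):
    """For one calculation on variables a and b, keep respondents who answered both."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


print(mean_substitution(data, "q1"))             # [5, 4, 3, 4]
print(len(casewise_deletion(data)))              # 2
print(len(pairwise_deletion(data, "q2", "q3")))  # 3
```

In this small example casewise deletion keeps only two of four respondents, while pairwise deletion keeps three for the q2 and q3 calculation, which shows why the two methods can lead to different sample sizes.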

Before analyzing the data in international marketing research, the researcher should ensure that the units of measurement are comparable across countries or cultural units.
The data analysis could be conducted at three levels:
  • individual
  • within country or cultural unit (intracultural analysis)
  • across countries or cultural units (pancultural or cross-cultural analysis)
Several ethical issues are related to data processing, particularly the discarding of unsatisfactory responses, violation of the assumptions underlying the data analysis techniques, and evaluation and interpretation of the results. The Internet and computers play a significant role in data preparation and analysis.

Editing -- a review of the questionnaires with the objective of increasing accuracy and precision
coding -- the assignment of a code to represent a specific response to a specific question along with the data record and column position that code will occupy
fixed-field codes -- a code in which the number of records for each respondent is the same, and the same data appear in the same columns for all respondents
codebook -- a book containing coding instructions and the necessary information about variables in the data set
data cleaning -- thorough and extensive checks for consistency and treatment of missing responses
consistency checks -- a part of the data cleaning process that identifies data that are out of range, logically inconsistent, or have extreme values. Data with values not defined by the coding scheme are inadmissible
missing responses -- values of a variable that are unknown, as these respondents did not provide unambiguous answers to the question
casewise deletion -- a method for handling missing responses in which cases or respondents with any missing responses are discarded from the analysis
pairwise deletion -- a method of handling missing values in which cases or respondents with missing values are not automatically discarded; rather, for each calculation only the cases or respondents with complete responses are considered
weighting -- a statistical adjustment to the data in which each case or respondent in the database is assigned a weight to reflect its importance relative to other cases or respondents
variable respecification -- the transformation of data to create new variables or the modification of existing variables so that they are more consistent with the objectives of the study
dummy variables -- a respecification procedure using variables that take on only two values, usually zero or one
scale transformation -- a manipulation of scale values to ensure comparability with other scales or otherwise make the data suitable for analysis
standardization -- the process of correcting data to reduce them to the same scale by subtracting the sample mean and dividing by the standard deviation
univariate techniques -- statistical techniques appropriate for analyzing data when there is a single measurement of each element in the sample or, if there are several measurements on each element, each variable is analyzed in isolation
multivariate techniques -- statistical techniques suitable for analyzing data when there are two or more measurements on each element and the variables are analyzed simultaneously. Multivariate techniques are concerned with the simultaneous relationships among two or more phenomena
metric data -- data that are interval or ratio in nature
nonmetric data -- data derived from a nominal or ordinal scale
Independent -- the samples are independent if they are drawn randomly from different populations
paired -- the samples are paired when the data for the two samples relate to the same group of respondents
dependence techniques -- multivariate techniques appropriate when one or more of the variables can be identified as dependent variables and the remaining as independent variables
interdependence techniques -- multivariate statistical techniques that attempt to group data based on underlying similarity, and thus allow for interpretation of the data structures. No distinction is made as to which variables are dependent and which are independent
intracultural analysis -- within-country analysis of international data
pancultural analysis -- across-countries analysis in which the data for all respondents from all the countries are pooled and analyzed
cross-cultural analysis -- a type of across-countries analysis in which the data are aggregated for each country and these aggregate statistics are analyzed
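
Two of the definitions above, standardization and dummy variables, lend themselves to a short worked example. The Python sketch below is illustrative only: the sample values and category names are invented, and treating the last category as the reference group (so that k categories give k - 1 dummies) is a common convention assumed here rather than something stated in these notes.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]


def dummy_code(values, categories):
    """Recode a categorical variable into 0/1 dummy variables.

    Assumption: the last entry in `categories` is the reference group and
    gets no dummy of its own.
    """
    dummies = categories[:-1]
    return [{c: int(v == c) for c in dummies} for v in values]


print(standardize([2, 4, 6]))  # [-1.0, 0.0, 1.0]

# Hypothetical product-usage categories.
usage = ["light", "heavy", "medium"]
print(dummy_code(usage, ["light", "medium", "heavy"]))
# [{'light': 1, 'medium': 0}, {'light': 0, 'medium': 0}, {'light': 0, 'medium': 1}]
```

After standardization the values have a mean of zero and a standard deviation of one, which is what makes variables measured on different scales comparable.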