Tuesday, October 17, 2006

Statistics Chapter 1 -- data collection

Statistics -- the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions
descriptive statistics -- consists of organizing and summarizing the information collected
inferential statistics -- methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result
qualitative or categorical variables -- allow for classification of individuals based on some attribute or characteristic
quantitative variables -- provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results
variables -- the characteristics of the individuals within the population
approach -- a way to look at and organize a problem so that it can be solved
discrete variable -- a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term countable means that the values result from counting, such as 0, 1, 2, 3, and so on
continuous variable -- a quantitative variable that has an infinite number of possible values that are not countable
Census -- a list of all individuals in a population along with certain characteristics of each individual
observational study -- measures the characteristics of the population by studying individuals in the sample, but does not attempt to manipulate or influence the variables of interest
designed experiment -- applies a treatment to individuals (referred to as experimental units or subjects) and attempts to isolate the effects of the treatment on a response variable
lurking variables -- characteristics that may be related to an outcome but not identified in the study
stratified sample -- obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way
systematic sample -- obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between one and k
cluster sample -- obtained by selecting all individuals within a randomly selected collection or group of individuals
convenience sampling -- a sample in which the individuals are easily obtained
nonsampling errors -- errors that result from the survey process. They are due to the nonresponse of individuals selected to be in the survey, to inaccurate responses, to poorly worded questions, to bias in the selection of individuals to be given the survey, and so on
sampling error -- error that results from using sampling to estimate information regarding a population. This type of error occurs because a sample gives incomplete information about the population
designed experiment -- a controlled study conducted to determine the effect that varying one or more explanatory variables has on a response variable. The explanatory variables are often called factors. The response variable represents the variable of interest. Control, manipulation, randomization, and replication are the key ingredients of a well-designed experiment
treatment -- any combination of the values of each factor
experimental unit/subject -- person, object, or some other well defined item to which a treatment is applied
double-blind -- neither the experimental unit nor the experimenter knows what treatment is being administered to the experimental unit
placebo -- innocuous medication with no medicinal value
completely randomized design -- one in which each experimental unit is randomly assigned to a treatment
matched-pairs design -- an experimental design in which the experimental units are paired up. The pairs are matched up so that they are somehow related. There are only two levels of treatment in a matched-pairs design
block -- each group of homogeneous individuals
blocking -- grouping similar (homogeneous) experimental units together and then randomly assigning the experimental units within each group to a treatment
confounding -- occurs when the effect of two factors on the response variable cannot be distinguished
randomized block design -- used when the experimental units are divided into homogeneous groups called blocks. Within each block, the experimental units are randomly assigned to treatments
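
The completely randomized design and the randomized block design defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit labels, the two treatments ("drug" and "placebo"), the block names, and both function names are hypothetical, and dealing treatments out in rotation after a shuffle is just one simple way to get equal-sized groups.

```python
import random

def completely_randomized(units, treatments, rng):
    """Randomly assign each experimental unit to one treatment."""
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order, so the rotation below is a random assignment
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomize to treatments separately within each homogeneous block."""
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments, rng))
    return assignment

rng = random.Random(1)  # fixed seed (an assumption) so that the sketch is repeatable
treatments = ["drug", "placebo"]          # hypothetical treatments
units = [f"u{i}" for i in range(1, 9)]    # hypothetical experimental units u1..u8
blocks = {"younger": units[:4], "older": units[4:]}  # hypothetical blocks

crd = completely_randomized(units, treatments, rng)
rbd = randomized_block(blocks, treatments, rng)
```

In the blocked version each block contributes the same number of units to each treatment, so a difference between the blocks cannot be confounded with the treatment effect.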


Summary
We defined statistics as the science in which data are collected, organized, summarized, and analyzed to infer characteristics regarding a population. Descriptive statistics consists of organizing and summarizing information, while inferential statistics consists of drawing conclusions about a population based on results obtained from a sample. The population is a collection of individuals on which the study is made, and the sample is a subset of the population.

Data are the observations of a variable. Data can either be qualitative or quantitative. Quantitative data are either discrete or continuous.

Data can be obtained from four sources: a census, existing sources, survey sampling, or a designed experiment. A census will list all of the individuals in the population, along with certain characteristics. Due to the cost of obtaining a census, most researchers opt for obtaining a sample. In observational studies, the variable of interest has already been established. For this reason, they are often referred to as ex post facto studies. Designed experiments are used when control of the individuals in the study is desired to isolate the effect of a certain treatment on the response variable.

Five sampling methods:
simple random sampling
stratified sampling
systematic sampling
cluster sampling
convenience sampling

Convenience sampling typically leads to an unrepresentative sample and biased results.
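
The four random sampling methods in the list above can be sketched in Python using only the standard library. This is a rough illustration, not a definitive implementation: the population of individuals numbered 1 to 100, the two strata, the ten clusters, and all function names are made up for the example. Convenience sampling is left out because it involves no random selection.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n individuals so that every group of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each nonoverlapping stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def systematic_sample(population, k, rng):
    """Select every kth individual, starting at a random position from 1 to k."""
    start = rng.randint(1, k)  # 1-based position of the first individual
    return population[start - 1::k]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return sample

rng = random.Random(0)  # fixed seed (an assumption) so that the sketch is repeatable
population = list(range(1, 101))  # hypothetical individuals numbered 1..100
strata = {"A": population[:50], "B": population[50:]}  # hypothetical strata
clusters = {f"c{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
syst = systematic_sample(population, 10, rng)
clus = cluster_sample(clusters, 2, rng)
```

Note the difference between the stratified and cluster versions: the first samples some individuals from every group, while the second keeps all individuals from a few randomly chosen groups.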