2 Week 1 More Vocabulary Reading
Types of Statistics
When we talk about statistics, there are two major subdivisions: descriptive and inferential. A descriptive statistic is a statistic that describes just the data collected. Thinking about our diving Emperor penguins, the dive depths from 125 penguins were recorded and an average dive depth of 56.8 meters was found. In this case, 56.8 meters is a descriptive statistic if it is used to describe JUST those 125 penguins. Now, if the average of 56.8 meters was then stated as the average dive depth of ALL Emperor penguins, then it becomes an inferential statistic. Inferential statistics are statistics that have been generalized beyond just the data collected. The average of 56.8 meters was collected from only 125 penguins, but then generalized to ALL Emperor penguins.
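To make the distinction concrete, here is a minimal sketch in Python. The dive depths are hypothetical values invented for illustration (they are NOT the data from the 125 penguins), and the interval calculation assumes the dives are a random sample from roughly bell-shaped data. The same average is descriptive when it summarizes only the recorded dives, and inferential once it is used, with a margin of error, to estimate the average for a larger population.

```python
import math
import statistics

# Hypothetical dive depths in meters (illustrative values, NOT the recorded penguin data).
dive_depths_m = [52.1, 60.4, 48.9, 63.0, 57.5, 55.2, 61.8, 50.3]

# Descriptive use: the mean summarizes only these eight recorded dives.
sample_mean = statistics.mean(dive_depths_m)

# Inferential use: treat the dives as a sample from a larger population and
# attach a margin of error to the estimate of the population mean.
sample_sd = statistics.stdev(dive_depths_m)  # sample standard deviation
standard_error = sample_sd / math.sqrt(len(dive_depths_m))
t_critical = 2.365  # assumed: t-table value for 95% confidence, 7 degrees of freedom
lower = sample_mean - t_critical * standard_error
upper = sample_mean + t_critical * standard_error

print(f"Descriptive: these dives average {sample_mean:.2f} m")
print(f"Inferential: population mean plausibly between {lower:.2f} m and {upper:.2f} m")
```

The number itself does not change; what changes is the claim being made with it, and the inferential claim needs a statement of uncertainty.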
Types of Research
Several types of statistical research can be conducted. A descriptive study focuses on a single variable and does not generalize the results beyond the data that were actually collected. This type of study merely describes a situation.
Another type of research is a correlational study, in which you look for a relationship between two variables. The variables can be quantitative, qualitative, or one of each. Cause and effect can NOT be determined from correlational research. The independent variable in correlational research may be qualitative, for example two groups that were pre-existing or otherwise NOT randomly assigned.
For example, someone might be interested in whether vitamin E supplements are effective in preventing disease. Subjects for the study are asked whether they regularly take vitamin E supplements. The researcher notices that the subjects who take vitamin E exhibit better health on average than those who do not. Does this prove that vitamin E is effective in disease prevention? NO. Subjects merely reported whether they regularly used vitamin E; the researcher did not randomly assign them to take it or not. Without random assignment to groups, this study cannot draw cause-and-effect conclusions. The people who take vitamin E regularly may be taking other steps to improve their health: diet, exercise, other vitamins, not smoking, etc. Any of these other factors (lurking variables) could be influencing the subjects’ health.
If you do wish to explore cause-and-effect relationships, you will have to conduct an experiment. In an experiment, the independent variable MUST consist of at least two groups, and subjects MUST be randomly assigned to those groups. A study is not considered an actual ‘experiment’ if subjects were not randomly assigned to the independent variable groups. The groups that make up the independent variable are often called treatments. Thinking about the vitamin E example, if subjects were randomly assigned to either take vitamin E regularly or not, then the study could show whether vitamin E actually helps prevent disease, i.e. show causation. In this situation, the treatments would consist of the control group (those not taking vitamin E) and the actual treatment group (those taking vitamin E). To have more confidence in your experimental results, it is important to repeat the study and confirm the results. This replication is critical when you wish to generalize a cause-and-effect relationship.
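Random assignment is simple to carry out in code. The following is a minimal sketch, assuming a hypothetical list of 20 subject labels and an even split into two groups; the function name `randomly_assign` and the subject labels are invented for illustration.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects, then split them into a control and a treatment group."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Hypothetical subject labels (placeholders, not real participants).
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomly_assign(subjects, seed=42)

print("Control (no vitamin E):", groups["control"])
print("Treatment (vitamin E): ", groups["treatment"])
```

Because chance alone decides who lands in each group, lurking variables such as diet or exercise tend to balance out between the groups instead of lining up with the treatment.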
Later in this course, we will conduct both correlational research and experiments via statistical hypothesis testing.
Sampling Methods
Sampling is the selection of a portion or subset of a larger population, where we use that portion (the sample) to gain information about the population. Sampling methods are very important in statistics, so much so that we could dedicate an entire course to the topic! For now, we’ll briefly introduce several of the most typical approaches to sampling. To begin, a very common sampling method is the simple random sample. In this type of sample, each subject in the population has an equal chance of being chosen, and every possible group of the desired size is equally likely to become the sample. One way to conduct a simple random sample is to assign every subject a number, then use a random number generator to draw as many numbers as you want subjects in your study. When possible, this is often the preferred method because it minimizes sampling bias and tends to produce a representative sample.
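The number-everyone-then-draw procedure can be sketched in a few lines of Python. The population size of 500 and sample size of 25 are assumptions chosen only for illustration.

```python
import random

# Assumed sizes for illustration only.
population_size = 500
sample_size = 25

# Step 1: assign every subject a number from 1 to population_size.
population_ids = range(1, population_size + 1)

# Step 2: draw sample_size distinct numbers at random.
# random.sample draws without replacement, so no subject is chosen twice.
rng = random.Random(2024)  # seeded so the same sample can be reproduced
sample_ids = rng.sample(population_ids, sample_size)

print(sorted(sample_ids))
```

Sampling without replacement matters here: a random number generator that could repeat a number would give some subjects two slots in the sample.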
Systematic sampling is a method that is not fully random, in which you ‘line up’ the subjects and select every k-th subject from the list. If you were doing a study related to the heights of FLC students, you might obtain an alphabetical list of students, then select every tenth student to be part of your sample. It’s important to note that systematic sampling can lead to bias whenever the population is arranged according to a pattern. For example, in our alphabetical list of FLC students, two siblings with the same last name who appear next to each other on the list can never both be selected, whereas in a simple random sample there is always a chance that both are chosen.
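Selecting every k-th subject can be sketched as follows. The roster of 100 student labels is hypothetical, and the random starting position within the first k entries is an assumption added here (the description above only says to take every tenth student); the function name `systematic_sample` is invented for illustration.

```python
import random

def systematic_sample(ordered_subjects, k, seed=None):
    """Pick a starting position among the first k entries, then take every k-th subject."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed random start: an index from 0 to k - 1
    return ordered_subjects[start::k]

# Hypothetical alphabetical roster (placeholder labels, not real students).
roster = [f"student_{i:03d}" for i in range(1, 101)]
sample = systematic_sample(roster, k=10, seed=7)

# Check the 'siblings' issue: are two neighbors on the list ever both chosen?
positions = [roster.index(name) for name in sample]
neighbors_both_chosen = any(b - a == 1 for a, b in zip(positions, positions[1:]))

print(sample)
print("Adjacent entries both selected:", neighbors_both_chosen)
```

Whatever the starting position, selected entries are always k places apart, so for any k greater than 1 two subjects listed next to each other cannot both end up in the sample.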
Another common sampling method that is not random at all is a sample of convenience. Maybe you would like to know about the average height of trees on campus, without actually measuring all of them, and you walk through campus and measure the first 15 trees that you come to. Super easy! But not necessarily a representative sample of all trees on campus.
Cluster sampling and stratified sampling are similar methods in which the population is first divided into groups, called clusters or strata. In cluster sampling, the population is divided into groups, often geographically, such as the eastern versus the western United States when comparing temperatures, and whole clusters are then selected for the sample. The goal is to select clusters that individually represent the population reasonably well, which can be difficult to achieve in practice! For example, if your clusters are schools, a single school may not capture the variation that is found in the population. For stratified sampling, the strata are usually defined by some characteristic, and subjects are then sampled from every stratum. For example, Fort Lewis College students could be divided up by classification (freshman, sophomore, junior, senior, other) and then their attitudes about recycling recorded. In general, several conditions must be met before stratified sampling can be used. For example, we must be sure that no subject falls into more than one stratum (this isn’t a problem when dividing FLC students by classification)!
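The difference between the two methods is easiest to see side by side. The sketch below assumes the usual textbook versions: stratified sampling draws a random sample from EVERY stratum, while cluster sampling randomly picks a few WHOLE clusters and keeps everyone in them. The student records, school names, group sizes, and function names are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)  # each subject lands in exactly one stratum
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical students tagged with a classification (the strata).
classifications = ["freshman", "sophomore", "junior", "senior", "other"]
students = [(f"student_{i}", classifications[i % 5]) for i in range(50)]
by_class = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3, seed=1)

# Hypothetical schools (the clusters), each with eight members.
schools = {f"school_{letter}": [f"{letter}{i}" for i in range(1, 9)] for letter in "ABCDEF"}
picked = cluster_sample(schools, n_clusters=2, seed=1)

print({name: len(members) for name, members in by_class.items()})
print(sorted(picked))
```

In the stratified result every classification is represented, while in the cluster result most schools are left out entirely, which is why each chosen cluster needs to resemble the population on its own.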
To hear a little more about the types of sampling methods, as well as unbiased and biased samples, check out the Sampling: Simple Random, Convenience, Systematic, Cluster, Stratified video [4:53] and the What are the Types of Sampling Techniques in Statistics video [3:37].
Student Course Learning Objectives
- Define basic statistics vocabulary (e.g., levels of measurement (nominal, ordinal, interval, ratio), discrete vs. continuous variables, descriptive vs. inferential statistics, sample vs. population, independent vs. dependent variable, explanatory vs. response variable, confounding variables, experimental vs. observational)
Attributions
Adapted from “Week 1 More Vocabulary Reading” by Sherri Spriggs and Sandi Dang, licensed under CC BY-NC-SA 4.0.