Quiz 1
- The balls are mixed before sampling to get a random sample
- Sampling variation explains why the 33 groups of friends did not all get the same number of red balls out of 50, and hence got different proportions red.
Quiz 2
- Sampling 50 balls and finding that 60% were red is not likely
- Sampling 50 balls and finding that 40% were red is likely
- Sampling 50 balls and finding that 5% were red is impossible
- Sampling 50 balls and finding that 110% were red is impossible
Quiz 3
- Population is a collection of individuals or observations that you are interested in
- Population parameter is a numerical summary quantity about the population that is unknown, but you want to know
- Census is when you count all individuals or observations in the population in order to compute the population parameter’s value exactly
- Sampling is the act of collecting a sample from the population when you don’t have the means to perform a census
- Point estimate is a summary statistic computed from a sample that estimates an unknown population parameter
- Representative sample is one that roughly looks like the population
- Sample is generalizable if any result based on the sample can generalize to the population
- Sample is biased when certain individuals or observations in a population had a higher chance of being included in a sample than others
- Sample is unbiased when every observation in a population had an equal chance of being sampled
- Sample is random when we sample randomly from the population in an unbiased fashion
- To infer is to deduce or conclude information from evidence and reasoning in your sampling activities
Quiz 4
library(ggplot2)
# Histogram of the proportion red for samples of size 26
ggplot(virtual_prop_red_26, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 26 balls that were red", title = "26")
# Histogram of the proportion red for samples of size 57
ggplot(virtual_prop_red_57, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 57 balls that were red", title = "57")
# Histogram of the proportion red for samples of size 110
ggplot(virtual_prop_red_110, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 110 balls that were red", title = "110")
Calculate the standard deviation of prop_red for each sample size (n = 26, 57, 110, in that order)
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0909
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0651
# A tibble: 1 x 1
      sd
   <dbl>
1 0.0443
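The notes show only the output, not the code that produced it. A minimal sketch of how such standard deviations might be computed with dplyr is given below. The bowl contents (2400 balls, 900 red), the helper simulate_prop_red(), the seed, and the 1000 replicates are all assumptions made for illustration, so the numbers it prints will not match the values above exactly.

```r
library(dplyr)

set.seed(76)  # arbitrary seed, used only so the sketch is reproducible

# Hypothetical bowl (assumption): 2400 balls, 900 of them red
bowl <- c(rep("red", 900), rep("white", 1500))

# Hypothetical helper: draw `reps` samples of size `n` without replacement
# and record the proportion of red balls in each sample
simulate_prop_red <- function(n, reps = 1000) {
  props <- replicate(reps, mean(sample(bowl, size = n) == "red"))
  tibble(prop_red = props)
}

virtual_prop_red_26  <- simulate_prop_red(26)
virtual_prop_red_57  <- simulate_prop_red(57)
virtual_prop_red_110 <- simulate_prop_red(110)

# Standard deviation of the sample proportions for each sample size
sd_26  <- virtual_prop_red_26  %>% summarize(sd = sd(prop_red))
sd_57  <- virtual_prop_red_57  %>% summarize(sd = sd(prop_red))
sd_110 <- virtual_prop_red_110 %>% summarize(sd = sd(prop_red))

sd_26
sd_57
sd_110
```

Each printed result is a 1 x 1 tibble with a single sd column, the same shape as the output in the notes.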
- The distribution with sample size n = 110 has the smallest standard deviation (0.0443), i.e. the least spread around the estimated proportion of red balls
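One way to see why the spread shrinks as n grows is the usual approximation for the standard error of a sample proportion. It is sketched here under two assumptions: p stands for the true proportion of red balls, which these notes do not give, and the draws are treated as roughly independent.

```latex
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{SE}_{n=110}}{\mathrm{SE}_{n=26}} \approx \sqrt{\frac{26}{110}} \approx 0.49,
\qquad
\text{observed: } \frac{0.0443}{0.0909} \approx 0.49
```

The unknown p cancels in the ratio, so the comparison uses only the sample sizes and the standard deviations reported above.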