Solved Exercises on Comparing Data Distributions in Grade 8

Master comparing data distributions (mean, median, range, and distribution shape) through these 5 detailed exercises.

Solution: Exercises 1 to 3
1 Comparing Means
Exercise 1
Compare the mean scores of two classes: Class A: 75, 80, 85, 90, 95; Class B: 65, 70, 80, 85, 90. Which class has a higher mean?
Definition:

Mean: The average value calculated by adding all values and dividing by the number of values

Mean comparison method:
  1. Calculate the mean for each data set separately
  2. Compare the resulting means
  3. Draw conclusions about which distribution has higher values on average
Class A: 75, 80, 85, 90, 95
Class B: 65, 70, 80, 85, 90
Means: A = 85, B = 78
Step 1: Calculate mean for Class A

Sum = 75 + 80 + 85 + 90 + 95 = 425

Mean = 425 ÷ 5 = 85

Step 2: Calculate mean for Class B

Sum = 65 + 70 + 80 + 85 + 90 = 390

Mean = 390 ÷ 5 = 78

Step 3: Compare the means

85 > 78, so Class A has a higher mean

Class A mean = 85, Class B mean = 78
Final answer:

Class A has a higher mean score (85) compared to Class B (78)

Applied rules:

Formula: Mean = Sum of all values ÷ Number of values

Comparison: Higher mean indicates higher average values

Sensitivity: Mean is affected by outliers
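
The arithmetic in Exercise 1 can be checked with a short Python sketch. It uses the standard library's `statistics.mean`; the variable names (`class_a`, `class_b`, `higher`) are illustrative choices, not part of the exercise.

```python
from statistics import mean

# Scores from Exercise 1
class_a = [75, 80, 85, 90, 95]
class_b = [65, 70, 80, 85, 90]

mean_a = mean(class_a)  # 425 / 5
mean_b = mean(class_b)  # 390 / 5

# Compare the two means, allowing for a tie
if mean_a > mean_b:
    higher = "Class A"
elif mean_b > mean_a:
    higher = "Class B"
else:
    higher = "Neither class"

print(f"Class A mean = {mean_a}, Class B mean = {mean_b}")
print(f"{higher} has the higher mean")
```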

2 Comparing Medians
Exercise 2
Compare the median scores of two data sets: Set 1: 12, 15, 18, 20, 25; Set 2: 10, 14, 16, 19, 30. Which has the higher median?
Definition:

Median: The middle value when data is arranged in order from least to greatest

Set 1: 12, 15, 18, 20, 25
Set 2: 10, 14, 16, 19, 30
Medians: Set 1 = 18, Set 2 = 16
Step 1: Identify middle values

Set 1: Data is already ordered; with 5 values, the middle (3rd) value is 18

Set 2: Data is already ordered; with 5 values, the middle (3rd) value is 16

Step 2: Compare medians

18 > 16, so Set 1 has the higher median

Step 3: Interpret the result

Set 1 has the higher median: its middle value is 18 against 16 for Set 2, even though Set 2 contains the largest single value (30)

Set 1 median = 18, Set 2 median = 16
Final answer:

Set 1 has a higher median (18) compared to Set 2 (16)

Applied rules:

Ordering: Always arrange data from least to greatest

Robustness: Median is less affected by outliers

Position: Median represents the 50th percentile
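
As a rough sketch of the ordering rule, the helper below sorts the data first and then picks the middle value (or averages the two middle values when the count is even). The function name `middle_value` is a made-up name for illustration; the result is cross-checked against the standard library's `statistics.median`.

```python
from statistics import median

# Data sets from Exercise 2
set_1 = [12, 15, 18, 20, 25]
set_2 = [10, 14, 16, 19, 30]

def middle_value(values):
    """Order the data, then return the middle value (hypothetical helper)."""
    ordered = sorted(values)          # always order before locating the middle
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

median_1 = middle_value(set_1)
median_2 = middle_value(set_2)
print(f"Set 1 median = {median_1}, Set 2 median = {median_2}")

# Sorting inside the helper means unordered input gives the same answer
shuffled = [25, 12, 20, 15, 18]
print(middle_value(shuffled) == median(set_1))
```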

3 Comparing Ranges
Exercise 3
Compare the ranges of two data sets: Set X: 5, 10, 15, 20, 25; Set Y: 8, 12, 16, 20, 24. Which data set has greater variability?
Definition:

Range: The difference between the maximum and minimum values in a data set

Set X: Min = 5, Max = 25
Set Y: Min = 8, Max = 24
Ranges: X = 20, Y = 16
Step 1: Find minimum and maximum for Set X

Min = 5, Max = 25

Range = 25 - 5 = 20

Step 2: Find minimum and maximum for Set Y

Min = 8, Max = 24

Range = 24 - 8 = 16

Step 3: Compare the ranges

20 > 16, so Set X has greater variability

Set X range = 20, Set Y range = 16
Final answer:

Set X has greater variability with a range of 20 compared to Set Y's range of 16

Applied rules:

Formula: Range = Maximum - Minimum

Variability: Larger range indicates greater spread

Limitation: Range only considers extremes, not internal distribution
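
A minimal Python sketch of the range formula follows. The helper name `value_range` is illustrative, and `set_z` is a made-up data set (not from the exercise) added only to show the limitation: it has the same range as Set X even though most of its values sit at 15.

```python
# Data sets from Exercise 3
set_x = [5, 10, 15, 20, 25]
set_y = [8, 12, 16, 20, 24]

def value_range(values):
    """Range = maximum - minimum (hypothetical helper name)."""
    return max(values) - min(values)

range_x = value_range(set_x)  # 25 - 5
range_y = value_range(set_y)  # 24 - 8
print(f"Set X range = {range_x}, Set Y range = {range_y}")

# Made-up set for illustration: same extremes as Set X, different interior
set_z = [5, 15, 15, 15, 25]
range_z = value_range(set_z)
print(f"Set Z range = {range_z} (same as Set X, despite the clustering at 15)")
```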

Distribution Comparison Fundamentals
Mean Formula: Mean = \(\frac{\sum x_i}{n}\)
Mean Comparison: Average values; central tendency measure
Median Comparison: Middle values; robust central measure
Range Comparison: Spread measure; variability indicator
Key definitions:

Data Distribution: How the values in a data set are spread out, including which values occur and how often

Central Tendency: Measures that represent the center of a data distribution

Spread/Variability: Measures that describe how data values are dispersed

Shape: The pattern of the distribution (symmetric, skewed, uniform)

Outlier: A data point that is significantly different from others

Skewness: Asymmetry in the distribution of data

Data Distribution Comparison Process:
  1. Organize data: Order data sets from least to greatest
  2. Calculate measures: Find means, medians, and ranges
  3. Compare central tendencies: See which distribution has higher values
  4. Compare spreads: Determine which distribution has more variability
  5. Identify patterns: Look for outliers, skewness, or clustering
  6. Draw conclusions: Interpret what differences mean in context
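
Steps 1 and 2 of this process can be sketched as one small helper. The function name `describe` and the dictionary keys are assumptions made for illustration; the data are the two classes from Exercise 1.

```python
from statistics import mean, median

def describe(values):
    """Order the data and compute the three measures (hypothetical helper)."""
    ordered = sorted(values)                  # step 1: organize
    return {
        "ordered": ordered,
        "mean": mean(ordered),                # step 2: calculate measures
        "median": median(ordered),
        "range": ordered[-1] - ordered[0],    # max - min of the ordered data
    }

class_a = describe([75, 80, 85, 90, 95])
class_b = describe([65, 70, 80, 85, 90])

# Steps 3 and 4: compare centers and spreads side by side
for key in ("mean", "median", "range"):
    print(f"{key}: Class A = {class_a[key]}, Class B = {class_b[key]}")
```

Interpreting the printed differences (steps 5 and 6) still depends on the context of the data.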
Tip 1: Always order data before comparing medians.
Tip 2: Use median when distributions have outliers or are skewed.
Tip 3: Range gives basic spread info but ignores internal distribution.
Tip 4: Consider context when interpreting distribution differences.
Common errors: Forgetting to order data, confusing mean/median/range, not considering outliers.
Exam preparation: Practice calculating all measures, comparing distributions, interpreting results.
Solution: Exercises 4 to 5
4 Comparing Shapes
Exercise 4
Compare the shapes of two distributions: Set A: 1, 2, 3, 4, 5, 6, 7; Set B: 1, 1, 1, 2, 6, 7, 7. Describe their shapes.
Definition:

Distribution Shape: The pattern formed by data points when plotted, indicating symmetry or skewness

Set A: Uniform, symmetric
Set B: Bimodal, not symmetric
Shape: Different
Step 1: Analyze Set A distribution

Set A: 1, 2, 3, 4, 5, 6, 7 - each value from 1 to 7 occurs exactly once

This creates a uniform distribution that is symmetric about the center value 4

Step 2: Analyze Set B distribution

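Set B: 1, 1, 1, 2, 6, 7, 7 - values cluster at the low and high ends

This creates a bimodal distribution with peaks at 1 and 7. The low cluster is heavier (1 occurs three times, 7 only twice), so the distribution is not symmetric

Step 3: Compare the shapes

Set A is uniform and symmetric; Set B is bimodal and not symmetric, with a gap in the middle (no values from 3 to 5)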

Set A: Uniform, Set B: Bimodal
Final answer:

Set A has a uniform distribution shape, while Set B has a bimodal distribution with clustering at the extremes

Applied rules:

Shape identification: Look for patterns, clusters, gaps, and peaks

Symmetry: Distribution balanced on both sides of center

Modality: Number of peaks (unimodal, bimodal, multimodal)
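
One way to look for clusters, gaps, and peaks is to count how often each value occurs and draw a text dot plot. The sketch below assumes whole-number data, as in Exercise 4; `dot_plot` and `is_symmetric` are hypothetical helper names, and the symmetry check is a strict mirror-image test rather than a general statistical measure.

```python
from collections import Counter

# Data sets from Exercise 4
set_a = [1, 2, 3, 4, 5, 6, 7]
set_b = [1, 1, 1, 2, 6, 7, 7]

def dot_plot(values):
    """Return one line per whole number from min to max, one '*' per occurrence."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v}: " + "*" * counts[v])  # a missing value counts as 0
    return "\n".join(lines)

def is_symmetric(values):
    """True if the ordered data mirror exactly around the center."""
    ordered = sorted(values)
    total = ordered[0] + ordered[-1]
    # Each pair (smallest, largest), (2nd smallest, 2nd largest), ... must sum to the same total
    return all(lo + hi == total for lo, hi in zip(ordered, reversed(ordered)))

print("Set A")
print(dot_plot(set_a))
print("Set B")
print(dot_plot(set_b))
print("Set A symmetric:", is_symmetric(set_a))
print("Set B symmetric:", is_symmetric(set_b))
```

In the plot for Set B, the rows for 3, 4, and 5 are empty, which makes the gap between the two clusters visible.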

5 Comprehensive Comparison
Exercise 5
Compare two quiz score distributions: Class Alpha: 70, 75, 80, 85, 90, 95; Class Beta: 60, 70, 80, 80, 90, 100. Compare all measures.
Definition:

Comprehensive Analysis: Examining all aspects of data distributions simultaneously

Class Alpha: Mean = 82.5, Median = 82.5
Class Beta: Mean = 80, Median = 80
Ranges: Alpha = 25, Beta = 40
Step 1: Calculate Class Alpha measures

Mean = (70+75+80+85+90+95)÷6 = 495÷6 = 82.5

Median = (80+85)÷2 = 82.5 (average of two middle values)

Range = 95-70 = 25

Step 2: Calculate Class Beta measures

Mean = (60+70+80+80+90+100)÷6 = 480÷6 = 80

Median = (80+80)÷2 = 80 (average of two middle values)

Range = 100-60 = 40

Step 3: Compare all measures

Alpha has higher mean (82.5 vs 80) and median (82.5 vs 80)

Beta has larger range (40 vs 25), indicating more variability

Step 4: Analyze distribution shapes

Alpha: Evenly spaced scores, symmetric

Beta: Also symmetric, with a peak (mode) at 80 and a wider spread, suggesting more variation

Alpha: Mean=82.5, Median=82.5, Range=25
Beta: Mean=80, Median=80, Range=40
Final answer:

Class Alpha has higher average scores (mean=82.5, median=82.5) but less variability (range=25). Class Beta has lower average scores (mean=80, median=80) but more variability (range=40).

Applied rules:

Multiple measures: Consider mean, median, and range together

Context: Higher mean doesn't necessarily mean better performance

Variability: Larger range indicates less consistency
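
The measure-by-measure comparison in Exercise 5 can be reproduced with a short loop. This is only a sketch: the `measures` dictionary and the `results` layout are illustrative choices, and the mean and median come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Quiz scores from Exercise 5
alpha = [70, 75, 80, 85, 90, 95]
beta = [60, 70, 80, 80, 90, 100]

# Each measure is a function applied to both classes in turn
measures = {
    "mean": mean,
    "median": median,
    "range": lambda values: max(values) - min(values),
}

results = {}
for name, func in measures.items():
    results[name] = (func(alpha), func(beta))  # (Alpha value, Beta value)
    a, b = results[name]
    print(f"{name:>6}: Alpha = {a}, Beta = {b}")
```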

Data Distribution Comparison Summary
Range Formula: Range = Max - Min
Key definitions:

Data Distribution: How the values in a data set are spread out, including which values occur and how often

Central Tendency: Measures (mean, median, mode) that represent the center of a distribution

Spread/Variability: Measures (range, interquartile range) that describe data dispersion

Shape: The pattern of the distribution (symmetric, skewed, uniform, bimodal)

Outlier: A data point significantly different from others

Skewness: Asymmetry in the distribution of data

Modality: Number of peaks in a distribution (unimodal, bimodal, multimodal)

Complete Distribution Comparison:
  1. Data organization: Order each distribution from least to greatest
  2. Central tendency: Calculate means and medians for each distribution
  3. Variability measures: Calculate ranges for each distribution
  4. Shape analysis: Examine patterns, clusters, gaps, and outliers
  5. Comparison: Compare corresponding measures between distributions
  6. Interpretation: Explain what differences mean in context
Tip 1: Mean is sensitive to outliers; median is more robust.
Tip 2: Range only considers extremes, not internal distribution.
Tip 3: Consider all measures together for complete understanding.
Tip 4: Always interpret results in the context of the situation.
Applications: Used in education, business, science, and research to compare groups.
Limitations: Range doesn't capture internal distribution; mean is affected by outliers.
Essential Concepts:

Central tendency: Mean (affected by outliers), Median (robust)

Variability: Range (simple but limited), Interquartile range (better measure)

Shape: Symmetric, skewed left/right, uniform, bimodal

Comparison: Higher mean/median indicates generally higher values

Questions & Answers

Question: When comparing two data sets, which measure should I look at first - mean, median, or range?

Answer: Start with the median when comparing distributions:

  • Median first: It's robust and gives you a sense of the center without being affected by outliers
  • Then mean: Compare with median to see if outliers or skewness affect the average
  • Finally range: Assess the spread and variability of each distribution

If median and mean are similar, the distribution is likely symmetric. If they differ significantly, there may be outliers or skewness that affects the mean.

This sequence gives you a complete picture: center, potential outliers/skewness, and spread.

Question: What if one data set has a higher mean but the other has a higher median? What does that tell me?

Answer: This usually points to outliers or skewness in at least one of the distributions:

  • The set with the higher mean but lower median: likely has high outliers or is skewed right
  • The set with the lower mean but higher median: likely has low outliers or is skewed left

For example, if Distribution A has mean=90 and median=85, but Distribution B has mean=85 and median=90, then Distribution A likely has some unusually high values pulling the mean up, while Distribution B likely has some unusually low values pulling the mean down.

The median is more reliable for comparing centers when distributions have outliers.
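
To see how a single outlier moves the mean but not the median, here is a small sketch. Both data sets are made up for illustration (they do not come from the exercises), and `center_gap` is a hypothetical helper name.

```python
from statistics import mean, median

# Made-up scores: the second list swaps the top score for a high outlier
typical = [70, 72, 74, 76, 78]
with_outlier = [70, 72, 74, 76, 128]

def center_gap(values):
    """Mean minus median; a positive gap means the mean sits above the median."""
    return mean(values) - median(values)

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(data)}, median = {median(data)}, gap = {center_gap(data)}")
```

The median is 74 for both lists, while the outlier pulls the mean of the second list up to 84.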

Question: How do I know if the difference between two means is significant or just due to chance?

Answer: At the grade 8 level, we assess significance by:

  1. Size of difference: How large is the difference relative to the data values?
  2. Sample size: More data points make differences more meaningful
  3. Spread of data: If the ranges are large compared with the difference, the difference may just reflect ordinary variation
  4. Context: Does the difference make practical sense?

For example, a difference of 2 points between means of 80 and 82 might not be significant if the ranges are large (20-30 points), but a difference of 10 points between means of 70 and 80 would be more significant.

Advanced statistical tests are needed for formal significance testing.
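
As an informal illustration of the example above (not a formal significance test), the difference between two means can be expressed as a fraction of a typical range. The function `difference_vs_spread` is a hypothetical helper, and the range of 25 points is an assumption taken from the middle of the 20-30 point band mentioned above and applied to both cases.

```python
def difference_vs_spread(mean_1, mean_2, typical_range):
    """Size of the gap between two means as a fraction of a typical range.

    This is a rough classroom heuristic, not a statistical test.
    """
    return abs(mean_1 - mean_2) / typical_range

# Assumed typical range of 25 points for both comparisons
small = difference_vs_spread(80, 82, 25)   # 2-point gap
large = difference_vs_spread(70, 80, 25)   # 10-point gap

print(f"2-point gap:  {small:.2f} of the range")
print(f"10-point gap: {large:.2f} of the range")
```

A gap that is a small fraction of the range is easier to explain by ordinary variation than one that covers a large share of it.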

Question: What's the difference between comparing distributions and just comparing means? Why is it important to look at more than one measure?

Answer: Comparing distributions gives a complete picture, while comparing only means provides limited information:

  • Means only: Tells you about average values but not consistency or spread
  • Complete comparison: Reveals center, spread, shape, and outliers

For example, two classes might have the same average test score (mean), but one class might have consistent scores around the mean while the other has very high and very low scores that average out. The range and shape would reveal this important difference in consistency.

Looking at multiple measures prevents misleading conclusions based on a single statistic.

Question: Can two distributions have the same mean but different shapes? Give an example.

Answer: Yes, absolutely! Here's an example:

Distribution A: 70, 75, 80, 85, 90 (evenly spaced)

Distribution B: 60, 70, 80, 90, 100 (same mean of 80, but wider spread)

Both have mean = 80 and evenly spaced values, but their spreads differ:
Distribution A: More consistent scores, smaller range (20)
Distribution B: More variable scores, larger range (40)

Another example, where the shapes themselves differ: Distribution C = 78, 79, 80, 81, 82 (clustered around the mean) vs Distribution D = 60, 60, 80, 100, 100 (same mean of 80, but with values piled up at the two extremes).

This shows why looking at multiple measures is crucial for understanding distributions.
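
The four example distributions can be checked with a short sketch that computes each mean and range. The dictionary layout and the name `summary` are illustrative choices.

```python
from statistics import mean

# Distributions A to D from the answer above
distributions = {
    "A": [70, 75, 80, 85, 90],
    "B": [60, 70, 80, 90, 100],
    "C": [78, 79, 80, 81, 82],
    "D": [60, 60, 80, 100, 100],
}

# Map each name to (mean, range)
summary = {
    name: (mean(values), max(values) - min(values))
    for name, values in distributions.items()
}

for name, (avg, spread) in summary.items():
    print(f"Distribution {name}: mean = {avg}, range = {spread}")
```

All four means come out to 80, while the ranges differ (20, 40, 4, and 40), so the mean alone cannot tell these distributions apart.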