Comparing Data Distributions | Solved Exercises

Solution: Exercises 1 to 3

1 Comparing Means

Exercise 1

Compare the mean scores of two classes: Class A: 75, 80, 85, 90, 95; Class B: 65, 70, 80, 85, 90. Which class has a higher mean?

Definition:

Mean: The average value calculated by adding all values and dividing by the number of values

Mean comparison method:

Calculate the mean for each data set separately
Compare the resulting means
Draw conclusions about which distribution has higher values on average

Class A

75,80,85,90,95

Class B

65,70,80,85,90

Means

A:85, B:78

Step 1: Calculate mean for Class A

Sum = 75 + 80 + 85 + 90 + 95 = 425

Mean = 425 ÷ 5 = 85

Step 2: Calculate mean for Class B

Sum = 65 + 70 + 80 + 85 + 90 = 390

Mean = 390 ÷ 5 = 78

Step 3: Compare the means

85 > 78, so Class A has a higher mean

Class A mean = 85, Class B mean = 78

Final answer:

Class A has a higher mean score (85) compared to Class B (78)

Applied rules:

• Formula: Mean = Sum of all values ÷ Number of values

• Comparison: Higher mean indicates higher average values

• Sensitivity: Mean is affected by outliers

2 Comparing Medians

Exercise 2

Compare the median scores of two data sets: Set 1: 12, 15, 18, 20, 25; Set 2: 10, 14, 16, 19, 30. Which has the higher median?

Definition:

Median: The middle value when data is arranged in order from least to greatest. When the number of values is even, the median is the average of the two middle values

Set 1

12,15,18,20,25

Set 2

10,14,16,19,30

Medians

Set 1:18, Set 2:16

Step 1: Identify middle values

Set 1: Data is already ordered, middle value is 18

Set 2: Data is already ordered, middle value is 16

Step 2: Compare medians

18 > 16, so Set 1 has the higher median

Step 3: Interpret the result

Set 1 has a higher median, so its typical (middle) value is greater, even though Set 2 contains the largest single value (30)

Set 1 median = 18, Set 2 median = 16

Final answer:

Set 1 has a higher median (18) compared to Set 2 (16)

Applied rules:

• Ordering: Always arrange data from least to greatest

• Robustness: Median is less affected by outliers

• Position: Median represents the 50th percentile
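
The median steps in Exercise 2 can be sketched in Python as follows. The function name `median` is an illustrative choice; it sorts the data first (the "Ordering" rule) and also handles an even number of values by averaging the two middle ones.

```python
def median(values):
    """Return the middle value of the sorted data.

    For an even count, return the average of the two middle values.
    """
    ordered = sorted(values)  # always order the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Data taken from Exercise 2.
set_1 = [12, 15, 18, 20, 25]
set_2 = [10, 14, 16, 19, 30]

median_1 = median(set_1)
median_2 = median(set_2)
print(median_1, median_2)
```

With the Exercise 2 data this gives 18 for Set 1 and 16 for Set 2, matching the worked solution.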

3 Comparing Ranges

Exercise 3

Compare the ranges of two data sets: Set X: 5, 10, 15, 20, 25; Set Y: 8, 12, 16, 20, 24. Which data set has greater variability?

Definition:

Range: The difference between the maximum and minimum values in a data set

Set X

Min:5, Max:25

Set Y

Min:8, Max:24

Ranges

X:20, Y:16

Step 1: Find minimum and maximum for Set X

Min = 5, Max = 25

Range = 25 - 5 = 20

Step 2: Find minimum and maximum for Set Y

Min = 8, Max = 24

Range = 24 - 8 = 16

Step 3: Compare the ranges

20 > 16, so Set X has greater variability

Set X range = 20, Set Y range = 16

Final answer:

Set X has greater variability with a range of 20 compared to Set Y's range of 16

Applied rules:

• Formula: Range = Maximum - Minimum

• Variability: Larger range indicates greater spread

• Limitation: Range only considers extremes, not internal distribution
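
The range calculation in Exercise 3 can be sketched in Python like this. The helper is called `data_range` (an illustrative name) so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


# Data taken from Exercise 3.
set_x = [5, 10, 15, 20, 25]
set_y = [8, 12, 16, 20, 24]

range_x = data_range(set_x)  # 25 - 5
range_y = data_range(set_y)  # 24 - 8

# The set with the larger range has the greater spread by this measure.
wider = "Set X" if range_x > range_y else "Set Y"
print(range_x, range_y, wider)
```

The result agrees with the worked solution: a range of 20 for Set X and 16 for Set Y.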

Distribution Comparison Fundamentals

Mean = \(\frac{\sum x_i}{n}\)

Mean Formula

Mean Comparison: Average values; a central tendency measure

Median Comparison: Middle values; a robust central measure

Range Comparison: Spread measure; a variability indicator

Key definitions:

Data Distribution: The pattern of how often each value occurs across the range of the data

Central Tendency: Measures that represent the center of a data distribution

Spread/Variability: Measures that describe how data values are dispersed

Shape: The pattern of the distribution (symmetric, skewed, uniform)

Outlier: A data point that is significantly different from others

Skewness: Asymmetry in the distribution of data

Data Distribution Comparison Process:

Organize data: Order data sets from least to greatest
Calculate measures: Find means, medians, and ranges
Compare central tendencies: See which distribution has higher values
Compare spreads: Determine which distribution has more variability
Identify patterns: Look for outliers, skewness, or clustering
Draw conclusions: Interpret what differences mean in context

Tip 1: Always order data before comparing medians.

Tip 2: Use median when distributions have outliers or are skewed.

Tip 3: Range gives basic spread info but ignores internal distribution.

Tip 4: Consider context when interpreting distribution differences.

Common errors: Forgetting to order data, confusing mean/median/range, not considering outliers.

Exam preparation: Practice calculating all measures, comparing distributions, interpreting results.

Solution: Exercises 4 to 5

4 Comparing Shapes

Exercise 4

Compare the shapes of two distributions: Set A: 1, 2, 3, 4, 5, 6, 7; Set B: 1, 1, 1, 2, 6, 7, 7. Describe their shapes.

Definition:

Distribution Shape: The pattern formed by data points when plotted, indicating symmetry or skewness

Set A

Uniform, Symmetric

Set B

Bimodal, Not symmetric

Shape

Different

Step 1: Analyze Set A distribution

Set A: 1, 2, 3, 4, 5, 6, 7 - values are evenly spaced and increase uniformly

This creates a uniform distribution that is symmetric

Step 2: Analyze Set B distribution

Set B: 1, 1, 1, 2, 6, 7, 7 - values cluster at low and high ends

This creates a bimodal distribution with peaks at 1 and 7

Step 3: Compare the shapes

Set A is uniform and symmetric; Set B is bimodal, with a gap in the middle (no values from 3 to 5), and is not symmetric

Set A: Uniform, Set B: Bimodal

Final answer:

Set A has a uniform distribution shape, while Set B has a bimodal distribution with clustering at the extremes

Applied rules:

• Shape identification: Look for patterns, clusters, gaps, and peaks

• Symmetry: Distribution balanced on both sides of center

• Modality: Number of peaks (unimodal, bimodal, multimodal)
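
One way to see the shapes in Exercise 4 is to count how often each value occurs and print a simple text dot plot. The sketch below uses `collections.Counter` from the Python standard library; the function name `frequency_rows` and the 1 to 7 axis limits are assumptions chosen to fit this exercise.

```python
from collections import Counter

# Data taken from Exercise 4.
set_a = [1, 2, 3, 4, 5, 6, 7]
set_b = [1, 1, 1, 2, 6, 7, 7]


def frequency_rows(values, low=1, high=7):
    """Return one text row per value from low to high, with one '*' per occurrence."""
    counts = Counter(values)  # values that never occur count as 0
    return [f"{v}: {'*' * counts[v]}" for v in range(low, high + 1)]


for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(name)
    print("\n".join(frequency_rows(data)))
```

Each row for Set A has exactly one star, which is the uniform pattern. The rows for Set B show stars piled up at 1 and at 6 to 7, with empty rows for 3, 4, and 5, which is the gap between the two clusters.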

5 Comprehensive Comparison

Exercise 5

Compare two quiz score distributions: Class Alpha: 70, 75, 80, 85, 90, 95; Class Beta: 60, 70, 80, 80, 90, 100. Compare all measures.

Definition:

Comprehensive Analysis: Examining all aspects of data distributions simultaneously

Class Alpha

Mean:82.5, Median:82.5

Class Beta

Mean:80, Median:80

Ranges

Alpha:25, Beta:40

Step 1: Calculate Class Alpha measures

Mean = (70+75+80+85+90+95)÷6 = 495÷6 = 82.5

Median = (80+85)÷2 = 82.5 (average of two middle values)

Range = 95-70 = 25

Step 2: Calculate Class Beta measures

Mean = (60+70+80+80+90+100)÷6 = 480÷6 = 80

Median = (80+80)÷2 = 80 (average of two middle values)

Range = 100-60 = 40

Step 3: Compare all measures

Alpha has higher mean (82.5 vs 80) and median (82.5 vs 80)

Beta has larger range (40 vs 25), indicating more variability

Step 4: Analyze distribution shapes

Alpha: Evenly spaced scores, symmetric about 82.5

Beta: Also symmetric (about 80), with a mode at 80 and a wider spread, suggesting more variation

Alpha: Mean=82.5, Median=82.5, Range=25
Beta: Mean=80, Median=80, Range=40

Final answer:

Class Alpha has higher average scores (mean=82.5, median=82.5) but less variability (range=25). Class Beta has lower average scores (mean=80, median=80) but more variability (range=40).

Applied rules:

• Multiple measures: Consider mean, median, and range together

• Context: Higher mean doesn't necessarily mean better performance

• Variability: Larger range indicates less consistency
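
The three measures in Exercise 5 can be computed together. This sketch relies on `statistics.mean` and `statistics.median` from the Python standard library; the `summarize` helper and the dictionary layout are illustrative assumptions.

```python
import statistics

# Quiz scores taken from Exercise 5.
alpha = [70, 75, 80, 85, 90, 95]
beta = [60, 70, 80, 80, 90, 100]


def summarize(scores):
    """Return the mean, median, and range of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),  # averages the two middle values for even counts
        "range": max(scores) - min(scores),
    }


alpha_summary = summarize(alpha)
beta_summary = summarize(beta)

for name, summary in (("Alpha", alpha_summary), ("Beta", beta_summary)):
    print(name, summary)

# Compare corresponding measures between the two classes.
higher_center = "Alpha" if alpha_summary["mean"] > beta_summary["mean"] else "Beta"
more_spread = "Alpha" if alpha_summary["range"] > beta_summary["range"] else "Beta"
print(higher_center, more_spread)
```

Keeping all three measures in one summary makes it easy to compare them side by side, which is the point of the "Multiple measures" rule.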

Data Distribution Comparison Summary

Range = Max - Min

Range Formula

Key definitions:

Data Distribution: The pattern of how often each value occurs across the range of the data

Central Tendency: Measures (mean, median, mode) that represent the center of a distribution

Spread/Variability: Measures (range, interquartile range) that describe data dispersion

Shape: The pattern of the distribution (symmetric, skewed, uniform, bimodal)

Outlier: A data point significantly different from others

Skewness: Asymmetry in the distribution of data

Modality: Number of peaks in a distribution (unimodal, bimodal, multimodal)

Complete Distribution Comparison:

Data organization: Order each distribution from least to greatest
Central tendency: Calculate means and medians for each distribution
Variability measures: Calculate ranges for each distribution
Shape analysis: Examine patterns, clusters, gaps, and outliers
Comparison: Compare corresponding measures between distributions
Interpretation: Explain what differences mean in context

Tip 1: Mean is sensitive to outliers; median is more robust.

Tip 2: Range only considers extremes, not internal distribution.

Tip 3: Consider all measures together for complete understanding.

Tip 4: Always interpret results in the context of the situation.

Applications: Used in education, business, science, and research to compare groups.

Limitations: Range doesn't capture internal distribution; mean is affected by outliers.
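
To illustrate why the mean is more sensitive to outliers than the median, the sketch below adds a single very low score to a small data set. The scores are hypothetical and are not taken from any of the exercises.

```python
import statistics

# Hypothetical scores, chosen only for illustration.
scores = [70, 75, 80, 85, 90]
with_outlier = scores + [10]  # one unusually low score

mean_shift = statistics.mean(scores) - statistics.mean(with_outlier)
median_shift = statistics.median(scores) - statistics.median(with_outlier)

# The mean moves much further than the median when the outlier is added.
print(round(mean_shift, 2), median_shift)
```

Here the mean drops from 80 to about 68.3, while the median only moves from 80 to 77.5.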

Essential Concepts:

• Central tendency: Mean (affected by outliers), Median (robust)

• Variability: Range (simple but limited), Interquartile range (less affected by extreme values)

• Shape: Symmetric, skewed left/right, uniform, bimodal

• Comparison: Higher mean/median indicates generally higher values

Solved Exercises on Comparing Data Distributions in Grade 8
