Comparing Distributions | Solved Exercises

Solution: Exercises 1 to 3

1 Center Comparison

Exercise 1

Compare the centers of these two data sets: Set A: 10, 12, 14, 16, 18; Set B: 8, 10, 12, 14, 16. Which set has a higher center?

Definition:

Center of distribution: A measure of the typical value in a data set, usually the mean or median.

Method:

Calculate the mean of each data set
Compare the means to determine which center is higher
Alternatively, put the data in order and compare the medians

Step 1: Calculate mean of Set A

Mean A = (10 + 12 + 14 + 16 + 18) ÷ 5 = 70 ÷ 5 = 14

Step 2: Calculate mean of Set B

Mean B = (8 + 10 + 12 + 14 + 16) ÷ 5 = 60 ÷ 5 = 12

Step 3: Compare the centers

Mean A = 14, Mean B = 12, so Set A has a higher center

Set A: (10 + 12 + 14 + 16 + 18) ÷ 5 = 14

Set B: (8 + 10 + 12 + 14 + 16) ÷ 5 = 12

Comparison: 14 vs 12, so Set A is higher

| Stat | Set A | Set B |
| --- | --- | --- |
| Mean | 14 | 12 |
| Median | 14 | 12 |

Set A has the higher center (mean = 14)

Final answer:

Set A has a higher center with a mean of 14 compared to Set B's mean of 12.

Applied rules:

• Mean calculation: Sum of values ÷ Number of values

• Center comparison: Higher mean indicates higher center

• Central tendency: Mean represents typical value
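The center comparison in Exercise 1 can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `compare_centers` is an illustrative assumption, not something from the exercise.

```python
from statistics import mean, median

set_a = [10, 12, 14, 16, 18]
set_b = [8, 10, 12, 14, 16]


def compare_centers(first, second):
    """Return which data set has the higher mean: 'first', 'second', or 'equal'."""
    mean_first, mean_second = mean(first), mean(second)
    if mean_first > mean_second:
        return "first"
    if mean_second > mean_first:
        return "second"
    return "equal"


mean_a, mean_b = mean(set_a), mean(set_b)
median_a, median_b = median(set_a), median(set_b)
result = compare_centers(set_a, set_b)

print("Mean A:", mean_a, "Mean B:", mean_b)
print("Median A:", median_a, "Median B:", median_b)
print("Higher center:", result)
```

Here `first` refers to Set A, so the result agrees with the worked solution: both the mean and the median of Set A are 2 units higher than those of Set B.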

2 Spread Comparison

Exercise 2

Compare the spread of these two data sets: Set C: 5, 10, 15, 20, 25; Set D: 10, 12, 14, 16, 18. Which set has a larger spread?

Definition:

Spread of distribution: How spread out the data values are from the center, measured by range or interquartile range.

Step 1: Calculate range of Set C

Range C = 25 - 5 = 20

Step 2: Calculate range of Set D

Range D = 18 - 10 = 8

Step 3: Compare the ranges

Range C = 20, Range D = 8, so Set C has a larger spread

Set C: Max - Min = 25 - 5 = 20

Set D: Max - Min = 18 - 10 = 8

Comparison: 20 vs 8, so Set C has the larger spread

| Measure | Set C | Set D |
| --- | --- | --- |
| Range | 20 | 8 |
| Mean | 15 | 14 |

Set C has the larger spread (range = 20)

Final answer:

Set C has a larger spread with a range of 20 compared to Set D's range of 8.

Applied rules:

• Range calculation: Maximum - Minimum

• Spread comparison: Larger range indicates greater spread

• Variability measure: Range shows overall spread
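The range rule above translates directly into code. The following is a small sketch; the function name `value_range` is an assumption chosen to avoid clashing with Python's built-in `range`.

```python
def value_range(values):
    """Range of a data set: maximum value minus minimum value."""
    return max(values) - min(values)


set_c = [5, 10, 15, 20, 25]
set_d = [10, 12, 14, 16, 18]

range_c = value_range(set_c)
range_d = value_range(set_d)

print("Range C:", range_c)
print("Range D:", range_d)
print("Larger spread:", "Set C" if range_c > range_d else "Set D")
```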

3 Shape Comparison

Exercise 3

Compare the shapes of these data sets: Set E: 2, 3, 4, 5, 6, 7, 8; Set F: 1, 2, 2, 3, 8, 8, 9. Describe the shape of each and how they differ.

Definition:

Shape of distribution: The pattern of how data is distributed, describing symmetry, skewness, and clustering.

Step 1: Analyze Set E

Set E: 2, 3, 4, 5, 6, 7, 8 - Values are evenly spaced, symmetrical around the center

Step 2: Analyze Set F

Set F: 1, 2, 2, 3, 8, 8, 9 - Values cluster at low and high ends, creating a bimodal shape

Step 3: Compare the shapes

Set E is symmetric and uniform, Set F is bimodal with clustering at extremes

Set E: 2, 3, 4, 5, 6, 7, 8 (symmetric, uniform)

Set F: 1, 2, 2, 3, 8, 8, 9 (bimodal, clustered)

Comparison: Set E is symmetric, Set F is bimodal

| Set | Data Points | Shape |
| --- | --- | --- |
| E | 2, 3, 4, 5, 6, 7, 8 | Symmetric |
| F | 1, 2, 2, 3, 8, 8, 9 | Bimodal |

Set E: Symmetric, Set F: Bimodal

Final answer:

Set E has a symmetric, uniform distribution, while Set F has a bimodal distribution with clustering at the extremes.

Applied rules:

• Shape analysis: Examine clustering and symmetry of values

• Pattern recognition: Look for peaks, gaps, and overall distribution

• Distribution types: Symmetric, skewed, bimodal, uniform
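Shape is usually judged by eye, but two simple checks can support the description in Exercise 3. The sketch below uses `statistics.multimode` (available in Python 3.8 and later) to list the most frequent values, plus a hypothetical helper `is_symmetric` that tests whether the ordered values mirror each other exactly around the middle. Real data is rarely this tidy, so treat the strict mirror test as an illustration only.

```python
from statistics import multimode


def is_symmetric(values):
    """True if the ordered values mirror each other exactly around the middle.

    Assumes a non-empty list of numbers.
    """
    ordered = sorted(values)
    target = ordered[0] + ordered[-1]  # each mirrored pair must add up to this
    return all(ordered[i] + ordered[-1 - i] == target for i in range(len(ordered)))


set_e = [2, 3, 4, 5, 6, 7, 8]
set_f = [1, 2, 2, 3, 8, 8, 9]

modes_e = multimode(set_e)  # every value appears once, so all values are returned
modes_f = multimode(set_f)  # the values tied for highest frequency

print("Set E symmetric:", is_symmetric(set_e), "modes:", modes_e)
print("Set F symmetric:", is_symmetric(set_f), "modes:", modes_f)
```

For Set F the two modes are 2 and 8, one in each cluster, which matches the bimodal description. For Set E every value occurs once, which is what a uniform shape looks like in a small data set.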

Solution: Exercises 4 to 6

4 Outlier Detection

Exercise 4

Identify any outliers in these data sets: Set G: 12, 14, 15, 16, 17, 18, 50; Set H: 8, 9, 10, 11, 12, 13, 14.

Definition:

Outlier: A data point that is significantly different from other values in the data set.

Step 1: Examine Set G

Set G: 12, 14, 15, 16, 17, 18, 50 - The value 50 is much higher than the others

Step 2: Examine Set H

Set H: 8, 9, 10, 11, 12, 13, 14 - All values are close together

Step 3: Identify outliers

Set G has an outlier at 50, Set H has no outliers

Set G: 12, 14, 15, 16, 17, 18, 50 (outlier: 50)

Set H: 8, 9, 10, 11, 12, 13, 14 (no outliers)

Conclusion: The outlier raises the mean of Set G significantly

| Set | Data Points | Outliers | Effect on Mean |
| --- | --- | --- | --- |
| G | 12, 14, 15, 16, 17, 18, 50 | 50 | Increases mean |
| H | 8, 9, 10, 11, 12, 13, 14 | None | Mean not distorted |

Set G: Outlier at 50, Set H: No outliers

Final answer:

Set G has an outlier at 50, which significantly affects the mean. Set H has no outliers.

Applied rules:

• Outlier identification: Values that are much higher or lower than others

• Impact on statistics: Outliers can significantly affect mean

• Visual inspection: Look for gaps or extreme values
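To see how much the value 50 shifts the statistics of Set G, one option is to compute the mean and median with and without it. This is an illustrative sketch only; dropping a value like this is a way to measure its influence, not a recommendation to delete data.

```python
from statistics import mean, median

set_g = [12, 14, 15, 16, 17, 18, 50]
set_g_without_50 = [value for value in set_g if value != 50]

mean_with = mean(set_g)
mean_without = mean(set_g_without_50)
median_with = median(set_g)
median_without = median(set_g_without_50)

print("Mean with 50:", round(mean_with, 1), "without 50:", round(mean_without, 1))
print("Median with 50:", median_with, "without 50:", median_without)
```

The mean moves by about 5 units (roughly 20.3 versus 15.3), while the median moves by only 0.5 (16 versus 15.5).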

5 Comparing Means and Medians

Exercise 5

Compare the mean and median of these data sets: Set I: 5, 10, 15, 20, 25; Set J: 1, 2, 3, 4, 50. How do outliers affect the measures of center?

Definition:

Robust measure: A statistic that is not heavily affected by outliers, like the median.

Step 1: Calculate mean and median for Set I

Mean I = (5+10+15+20+25) ÷ 5 = 15, Median I = 15

Step 2: Calculate mean and median for Set J

Mean J = (1+2+3+4+50) ÷ 5 = 12, Median J = 3

Step 3: Compare and analyze

Set I: Mean = Median (symmetric), Set J: Mean ≠ Median (affected by outlier)

Set I: Mean = (5 + 10 + 15 + 20 + 25) ÷ 5 = 15

Set I: Median (middle value) = 15

Set J: Mean = (1 + 2 + 3 + 4 + 50) ÷ 5 = 12

Set J: Median (middle value) = 3

| Set | Data | Mean | Median | Outlier Present |
| --- | --- | --- | --- | --- |
| I | 5, 10, 15, 20, 25 | 15 | 15 | No |
| J | 1, 2, 3, 4, 50 | 12 | 3 | Yes (50) |

Set I: Mean = Median = 15, Set J: Mean (12) ≠ Median (3)

Final answer:

Set I has equal mean and median due to symmetry. Set J's mean (12) is much higher than its median (3) due to the outlier.

Applied rules:

• Mean sensitivity: Affected by extreme values

• Median robustness: Less affected by outliers

• Skewed distributions: Mean pulled toward outlier
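The gap between mean and median gives a quick numeric signal of the pull described above. The helper `center_gap` below is a hypothetical name for this sketch: a gap near zero suggests symmetry, and a large positive gap suggests the mean is being pulled upward by high values.

```python
from statistics import mean, median


def center_gap(values):
    """Mean minus median: positive when high values pull the mean upward."""
    return mean(values) - median(values)


set_i = [5, 10, 15, 20, 25]
set_j = [1, 2, 3, 4, 50]

gap_i = center_gap(set_i)
gap_j = center_gap(set_j)

print("Set I gap:", gap_i)
print("Set J gap:", gap_j)
```

Set I has a gap of 0, while Set J has a gap of 9 (mean 12, median 3), which is why the median is the better description of a typical value in Set J.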

6 Interquartile Range Comparison

Exercise 6

Compare the interquartile ranges of these data sets: Set K: 10, 20, 30, 40, 50, 60, 70; Set L: 15, 25, 35, 45, 55, 65, 75.

Definition:

Interquartile range (IQR): The range of the middle 50% of the data, calculated as Q3 - Q1.

Step 1: Find Q1 and Q3 for Set K

Q1 = 20, Q3 = 60, IQR = 60 - 20 = 40

Step 2: Find Q1 and Q3 for Set L

Q1 = 25, Q3 = 65, IQR = 65 - 25 = 40

Step 3: Compare the IQRs

Both sets have the same IQR of 40, indicating similar spread in the middle half

Set K: Q1 = 20, Q3 = 60, IQR = 40

Set L: Q1 = 25, Q3 = 65, IQR = 40

Comparison: Both IQRs are equal at 40

| Set | Data | Q1 | Q3 | IQR |
| --- | --- | --- | --- | --- |
| K | 10, 20, 30, 40, 50, 60, 70 | 20 | 60 | 40 |
| L | 15, 25, 35, 45, 55, 65, 75 | 25 | 65 | 40 |

Both sets have equal IQR = 40

Final answer:

Both data sets have the same interquartile range of 40, indicating that the middle 50% of both distributions have the same spread.

Applied rules:

• Quartile calculation: Q1 = median of lower half, Q3 = median of upper half

• IQR formula: IQR = Q3 - Q1

• Spread measure: IQR represents middle 50% of data
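The quartile rule above (Q1 is the median of the lower half, Q3 the median of the upper half) can be sketched as follows. This version leaves the middle value out of both halves when the number of values is odd, which is the convention the worked solution uses. Other conventions exist and can give slightly different quartiles, so the function names and the chosen convention here are assumptions of this sketch.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves method.

    For an odd number of values, the middle value is excluded from both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)


def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1


set_k = [10, 20, 30, 40, 50, 60, 70]
set_l = [15, 25, 35, 45, 55, 65, 75]

print("Set K quartiles:", quartiles(set_k), "IQR:", iqr(set_k))
print("Set L quartiles:", quartiles(set_l), "IQR:", iqr(set_l))
```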

Distribution Comparison Visual Guide

Key Statistical Measures

• Mean: Average value

• Median: Middle value

• Mode: Most frequent value

• Range: Max - Min

• IQR: Q3 - Q1

Distribution Comparison Process:

Step 1: Calculate center measures (mean, median)

Step 2: Calculate spread measures (range, IQR)

Step 3: Analyze shape (symmetry, skewness)

Step 4: Identify outliers

Step 5: Compare the distributions systematically

Tip 1: Always compare the same statistical measures between distributions.

Tip 2: Use both center and spread measures for complete comparison.

Tip 3: Consider how outliers affect the mean differently than the median.

Common errors: Comparing different measures, ignoring outliers, focusing only on center.

Success strategies: Systematic approach, multiple measures, visual inspection.

Essential concepts:

• Center: Mean, median, mode indicate typical values

• Spread: Range, IQR indicate variability

• Shape: Symmetry, skewness, modality describe pattern

• Outliers: Extreme values that stand apart

Solution: Exercises 7 to 10

7 Real-World Data Comparison

Exercise 7

Compare the test scores of two classes: Class A: 78, 82, 85, 88, 90, 92, 95; Class B: 65, 70, 75, 80, 85, 90, 95. Which class performed better overall?

Definition:

Performance comparison: Evaluating which group has better overall results using statistical measures.

Step 1: Calculate mean for Class A

Mean A = (78+82+85+88+90+92+95) ÷ 7 = 610 ÷ 7 ≈ 87.1

Step 2: Calculate mean for Class B

Mean B = (65+70+75+80+85+90+95) ÷ 7 = 560 ÷ 7 = 80

Step 3: Compare centers and spreads

Class A: Mean ≈ 87.1, Range = 17; Class B: Mean = 80, Range = 30

Class A mean: (78+82+...+95) ÷ 7 ≈ 87.1

Class B mean: (65+70+...+95) ÷ 7 = 80

Comparison: 87.1 vs 80, so Class A performed better

| Class | Mean | Median | Range | Overall Performance |
| --- | --- | --- | --- | --- |
| A | 87.1 | 88 | 17 | Better |
| B | 80 | 80 | 30 | Lower |

Class A performed better (mean = 87.1 vs 80)

Final answer:

Class A performed better overall with a mean score of approximately 87.1 compared to Class B's mean of 80.

Applied rules:

• Mean comparison: Higher mean indicates better performance

• Spread consideration: Class A also has less variability

• Performance evaluation: Compare multiple measures

8 Temperature Data Analysis

Exercise 8

Compare temperatures in two cities: City X: 65°, 68°, 70°, 72°, 75°; City Y: 55°, 60°, 70°, 80°, 85°. Which city has more consistent temperatures?

Definition:

Consistency: How close the data values are to each other, measured by the spread of the distribution.

Step 1: Calculate range for City X

Range X = 75° - 65° = 10°

Step 2: Calculate range for City Y

Range Y = 85° - 55° = 30°

Step 3: Compare consistency

City X has a smaller range (10°) than City Y (30°), so City X has more consistent temperatures

City X: 65° to 75°, Range = 10°

City Y: 55° to 85°, Range = 30°

Consistency comparison: City X is more consistent

| City | Temperatures | Mean | Range | Consistency |
| --- | --- | --- | --- | --- |
| X | 65°, 68°, 70°, 72°, 75° | 70° | 10° | More consistent |
| Y | 55°, 60°, 70°, 80°, 85° | 70° | 30° | Less consistent |

City X: More consistent (range = 10°)

Final answer:

City X has more consistent temperatures with a range of 10° compared to City Y's range of 30°.

Applied rules:

• Consistency measure: Smaller range indicates more consistency

• Spread comparison: Compare ranges directly

• Equal means: Despite same mean, spreads differ significantly

9 Speed Comparison Analysis

Exercise 9

Two runners' speeds in mph: Runner A: 5, 6, 7, 8, 9; Runner B: 4, 5, 6, 7, 10. Compare their average speeds and consistency.

Definition:

Performance metrics: Average speed and consistency are key measures of athletic performance.

Step 1: Calculate mean speed for each runner

Runner A: (5+6+7+8+9) ÷ 5 = 35 ÷ 5 = 7 mph

Runner B: (4+5+6+7+10) ÷ 5 = 32 ÷ 5 = 6.4 mph

Step 2: Calculate range for each runner

Runner A: 9 - 5 = 4 mph, Runner B: 10 - 4 = 6 mph

Step 3: Compare and conclude

Runner A is faster (7 mph vs 6.4 mph) and more consistent (range 4 vs 6)

Runner A mean: (5 + 6 + 7 + 8 + 9) ÷ 5 = 7 mph

Runner B mean: (4 + 5 + 6 + 7 + 10) ÷ 5 = 6.4 mph

Consistency: A range = 4 mph, B range = 6 mph, so A is more consistent

| Runner | Speeds (mph) | Mean Speed | Range | Performance |
| --- | --- | --- | --- | --- |
| A | 5, 6, 7, 8, 9 | 7 mph | 4 mph | Faster, more consistent |
| B | 4, 5, 6, 7, 10 | 6.4 mph | 6 mph | Slower, less consistent |

Runner A: Faster (7 mph) and more consistent (range = 4 mph)

Final answer:

Runner A has a higher average speed (7 mph vs 6.4 mph) and is more consistent (range of 4 vs 6).

Applied rules:

• Mean calculation: Sum of values ÷ Count

• Range calculation: Maximum - Minimum

• Performance evaluation: Compare both center and spread

10 Comprehensive Distribution Analysis

Exercise 10

Analyze and compare these data sets: Set M: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; Set N: 1, 1, 2, 2, 8, 8, 9, 9, 10, 10. Discuss center, spread, shape, and outliers.

Definition:

Comprehensive analysis: Examining all aspects of a distribution: center, spread, shape, and unusual features.

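Step 1: Calculate center measures

Set M: Mean = 55 ÷ 10 = 5.5, Median = (5 + 6) ÷ 2 = 5.5; Set N: Mean = 60 ÷ 10 = 6, Median = (8 + 8) ÷ 2 = 8

Step 2: Calculate spread measures

Set M: Range = 10 - 1 = 9, Q1 = 3, Q3 = 8, IQR = 5; Set N: Range = 10 - 1 = 9, Q1 = 2, Q3 = 9, IQR = 7

Step 3: Analyze shapes

Set M: Uniform distribution; Set N: Bimodal with clustering at the extremes (1 to 2 and 8 to 10, with nothing in between)

Step 4: Compare distributions

Similar means and equal ranges, but different medians, IQRs, and shapes. No value in either set stands apart from the rest, so neither set has outliers.

Centers: M mean = 5.5, N mean = 6; M median = 5.5, N median = 8 (similar means, different medians)

Spreads: M range = 9, N range = 9 (equal ranges); M IQR = 5, N IQR = 7

Shapes: M = uniform, N = bimodal (different shapes)

| Aspect | Set M | Set N |
| --- | --- | --- |
| Mean | 5.5 | 6 |
| Median | 5.5 | 8 |
| Range | 9 | 9 |
| IQR | 5 | 7 |
| Shape | Uniform | Bimodal |

Same range, similar means, different medians, IQRs, and shapes

Final answer:

Both sets have the same range (9) and similar means (5.5 vs 6), but Set N has a much higher median (8 vs 5.5), a larger IQR (7 vs 5), and a bimodal shape compared to Set M's uniform shape. Neither set has outliers.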

Applied rules:

• Multiple measures: Compare center, spread, and shape

• IQR sensitivity: IQR captures middle 50% spread better than range

• Shape importance: Different shapes indicate different patterns

Comprehensive Summary: Comparing Distributions

Core Concepts & Definitions:

Distribution: The pattern of variation in a set of data, showing how data values are arranged.

Center: A typical or representative value in a data set, measured by mean, median, or mode.

Spread: How spread out the data values are, measured by range, interquartile range (IQR), or standard deviation.

Shape: The overall pattern of the data distribution, including symmetry, skewness, and modality.

Mean: The average of all values in a data set, calculated by adding all values and dividing by the count.

Median: The middle value when data is arranged in order; less affected by outliers than the mean.

Mode: The value that appears most frequently in a data set.

Range: The difference between the maximum and minimum values in a data set.

Interquartile Range (IQR): The range of the middle 50% of the data, calculated as Q3 - Q1.

Outlier: A data point that is significantly different from other values in the data set.

Core Rules & Principles:

Essential Principles:

Compare the same statistical measures between distributions
Consider both center and spread when comparing distributions
Identify and discuss outliers when present
Analyze the shape of distributions to understand patterns

Key Formulas:

Mean = (Sum of all values) ÷ (Number of values)
Range = Maximum value - Minimum value
IQR = Third quartile (Q3) - First quartile (Q1)
Median: Middle value in ordered data set

Step-by-Step Comparison Process:

Calculate center measures: Find mean and median for each distribution
Calculate spread measures: Find range and IQR for each distribution
Analyze shape: Look for symmetry, skewness, and modality
Identify outliers: Look for values significantly different from others
Compare systematically: Contrast each measure between distributions
Draw conclusions: Summarize similarities and differences
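The first two steps of this process, plus the side-by-side comparison, can be sketched as one small routine. The function names `quartiles` and `summarize` are assumptions for illustration, the quartiles use the median-of-halves convention (middle value excluded when the count is odd), and the city temperatures are taken from Exercise 8. Shape and outliers still need to be judged separately.

```python
from statistics import mean, median


def quartiles(values):
    """Return (Q1, Q3) by the median-of-halves method; assumes at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)


def summarize(values):
    """Collect center and spread measures for one distribution."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


city_x = [65, 68, 70, 72, 75]
city_y = [55, 60, 70, 80, 85]

summary_x = summarize(city_x)
summary_y = summarize(city_y)

# Compare like with like: the same measure for each distribution, side by side.
for measure in summary_x:
    print(f"{measure}: City X = {summary_x[measure]}, City Y = {summary_y[measure]}")
```

Both cities share the same mean and median (70°), so the difference only shows up in the spread measures, which is why the process asks for center and spread together.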

Examples & Applications:

Simple Comparison Example:

Set A: 10, 12, 14, 16, 18 (Mean = 14, Range = 8)
Set B: 8, 10, 12, 14, 16 (Mean = 12, Range = 8)
Conclusion: Set A has higher center but same spread

Outlier Impact Example:

Set C: 5, 10, 15, 20, 25 (Mean = 15, Median = 15)
Set D: 5, 10, 15, 20, 50 (Mean = 20, Median = 15)
Conclusion: Outlier increases mean but not median

Shape Difference Example:

Set E: 1, 2, 3, 4, 5 (Uniform/symmetric)
Set F: 1, 1, 1, 5, 5 (Bimodal: values cluster at 1 and at 5)
Conclusion: Different shapes despite same range

Tips, Tricks & Common Mistakes:

Tips & Tricks:

Always calculate multiple measures (center and spread) for complete comparison
Look for outliers before calculating mean, as they can skew results
Use median instead of mean when outliers are present
Consider the context when interpreting results
Visualize data when possible to see patterns more clearly

Common Mistakes:

Only comparing centers and ignoring spread
Not identifying outliers that affect the mean
Comparing different statistical measures between distributions
Forgetting to consider the shape of distributions

Key Notes for Memorization:

Mean is sensitive to outliers; the median is much less affected
Range is affected by outliers; the IQR is largely unaffected
Always compare like with like (mean vs mean, not mean vs median)
Center tells you about typical values
Spread tells you about variability
Shape tells you about the pattern of distribution
IQR focuses on the middle 50% of data

Additional Distribution Comparison Practice


Key definitions:

Distribution comparison: Systematically analyzing and contrasting two or more data sets.

Statistical measures: Quantitative values that describe characteristics of data sets.

Data analysis: The process of examining, cleaning, and interpreting data.

Comparison methodology:

Collect: Organize data from each distribution
Calculate: Compute relevant statistical measures
Compare: Contrast measures between distributions
Interpret: Draw meaningful conclusions
Communicate: Clearly explain findings

Tip 1: Always include units in your final answers.

Tip 2: Round decimal answers to appropriate precision.

Tip 3: Consider how outliers affect different measures of center.

Tip 4: Use both numerical and visual methods to analyze data.

Common errors: Mismatched comparisons, calculation mistakes, ignoring outliers.

Success strategies: Systematic approach, verification, contextual interpretation.

Essential concepts:

• Mean: Arithmetic average

• Median: Middle value

• Mode: Most frequent value

• Range: Max - Min

• IQR: Q3 - Q1

Questions & Answers

MathStudent7th

Grade 7 Student

Question: When should I use mean versus median to compare distributions?

TeachMath7

Middle School Mathematics Specialist

Answer: Use these guidelines:

Use mean: When data is symmetric and without significant outliers
Use median: When data is skewed or has outliers that would distort the mean
Compare both: When mean and median differ significantly, it indicates skewness

If the mean is much higher than the median, the data is typically skewed right. If the mean is much lower than the median, the data is typically skewed left.

JennyMath

Grade 7 Student

Question: How do I know if a value is an outlier?

ProfEducator

PhD in Mathematics Education

Answer: A common method is the 1.5×IQR rule:

Calculate Q1 (first quartile) and Q3 (third quartile)
Find IQR = Q3 - Q1
Calculate lower bound = Q1 - 1.5×IQR
Calculate upper bound = Q3 + 1.5×IQR
Any value below the lower bound or above the upper bound is considered an outlier

This provides a mathematical criterion for identifying outliers.
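The steps of the 1.5×IQR rule can be sketched in code and applied to Sets G and H from Exercise 4. The function names are assumptions for this illustration, and the quartiles use the median-of-halves convention (middle value excluded when the count is odd). Other quartile conventions may shift the bounds slightly.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) by the median-of-halves method; assumes at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)


def outlier_bounds(values):
    """Return (lower bound, upper bound) from the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values that fall outside the 1.5 x IQR bounds."""
    lower, upper = outlier_bounds(values)
    return [value for value in values if value < lower or value > upper]


set_g = [12, 14, 15, 16, 17, 18, 50]
set_h = [8, 9, 10, 11, 12, 13, 14]

print("Set G bounds:", outlier_bounds(set_g), "outliers:", find_outliers(set_g))
print("Set H bounds:", outlier_bounds(set_h), "outliers:", find_outliers(set_h))
```

For Set G, Q1 = 14 and Q3 = 18, so the bounds are 8 and 24 and the value 50 is flagged. For Set H the bounds are 3 and 19, so nothing is flagged, in agreement with the visual inspection in Exercise 4.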

AlexLearning

Middle School Level

Question: What's the difference between range and interquartile range? When should I use each?

ScienceExpert

Mathematics & Science Educator

Answer: The key differences are:

Range: Maximum - Minimum (sensitive to outliers)
IQR: Q3 - Q1 (focuses on the middle 50%, largely unaffected by outliers)
Use range: When you want the overall spread including extremes
Use IQR: When you want to focus on the core of the data

IQR is more robust and is often preferred for comparing the spread of the central part of the data.
