Comparing Data Sets | Solved Exercises

Solution: Exercises 1 to 3

1 Comparing Centers

Exercise 1

Two classes took the same test. Class A scores: 75, 80, 85, 90, 95. Class B scores: 70, 75, 80, 85, 90. Compare the centers (mean and median) of both classes. Which class performed better?

Definition:

Center of Data: A measure that represents the middle or typical value of a data set.

Mean: The average value of the data set.

Median: The middle value when data is arranged in order; with an even number of values, the mean of the two middle values.

Comparison Method:

Calculate mean and median for each data set
Compare the measures of center
Consider the context of the comparison
Make conclusions based on the comparison

Class A

Mean=85, Med=85

Class B

Mean=80, Med=80

Better Class

Class A

Step 1: Calculate Class A measures

Class A: 75, 80, 85, 90, 95

Mean = (75+80+85+90+95)÷5 = 425÷5 = 85

Median = middle value = 85

Step 2: Calculate Class B measures

Class B: 70, 75, 80, 85, 90

Mean = (70+75+80+85+90)÷5 = 400÷5 = 80

Median = middle value = 80

Step 3: Compare centers

Class A mean: 85, Class B mean: 80

Class A median: 85, Class B median: 80

Both measures show Class A has higher center

Step 4: Draw conclusion

Class A performed better as both mean and median are higher

Class A: Mean=85, Median=85
Class B: Mean=80, Median=80
Class A performed better

Final Answer:

Class A performed better than Class B. Class A has a mean of 85 and median of 85, while Class B has a mean of 80 and median of 80.

Applied Rules:

• Mean Calculation: Sum of values divided by count

• Median Calculation: Middle value when sorted

• Comparison: For test scores, a higher center indicates better typical performance
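The four steps of Exercise 1 can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names (`class_a`, `class_b`, `centers`, `higher`) are chosen here for illustration and are not part of the exercise.

```python
from statistics import mean, median

class_a = [75, 80, 85, 90, 95]
class_b = [70, 75, 80, 85, 90]

# Steps 1 and 2: compute both measures of center for each class.
centers = {
    "Class A": (mean(class_a), median(class_a)),
    "Class B": (mean(class_b), median(class_b)),
}

for name, (avg, mid) in centers.items():
    print(f"{name}: mean={avg}, median={mid}")

# Steps 3 and 4: pick the class with the larger mean.
# Here the median agrees, so both measures point the same way.
higher = max(centers, key=lambda name: centers[name][0])
print("Higher center:", higher)
```

When the mean and median disagree about which set is higher, the comparison needs more care than this sketch provides.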

2 Comparing Spreads

Exercise 2

Compare the spreads of the following data sets using range: Set X: 10, 15, 20, 25, 30 and Set Y: 5, 15, 20, 25, 35. Which set has more variability?

Definition:

Spread: How far apart the values in a data set lie from one another.

Range: The difference between maximum and minimum values.

Variability: The degree to which data points differ from each other.

Set X

Range = 20

Set Y

Range = 30

More Variable

Set Y

Step 1: Calculate range for Set X

Set X: 10, 15, 20, 25, 30

Maximum: 30, Minimum: 10

Range = 30 - 10 = 20

Step 2: Calculate range for Set Y

Set Y: 5, 15, 20, 25, 35

Maximum: 35, Minimum: 5

Range = 35 - 5 = 30

Step 3: Compare ranges

Set X range: 20

Set Y range: 30

Set Y has a larger range

Step 4: Interpret results

Set Y has more variability since its range is larger

Set Y values are more spread out than Set X values

Set X Range = 20
Set Y Range = 30
Set Y has more variability

Final Answer:

Set X has a range of 20 and Set Y has a range of 30. Set Y has more variability since it has the larger range.

Applied Rules:

• Range Formula: Range = Maximum - Minimum

• Spread Comparison: Larger range indicates more variability

• Data Interpretation: Range measures data spread
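The range comparison in Exercise 2 can be sketched in a few lines. The helper name `value_range` is hypothetical, introduced only for this example.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

set_x = [10, 15, 20, 25, 30]
set_y = [5, 15, 20, 25, 35]

range_x = value_range(set_x)
range_y = value_range(set_y)
print("Set X range:", range_x)
print("Set Y range:", range_y)

# The set with the larger range is the more variable one by this measure.
if range_y > range_x:
    more_variable = "Set Y"
elif range_x > range_y:
    more_variable = "Set X"
else:
    more_variable = "neither"
print("More variable:", more_variable)
```

Note that the range uses only the two extreme values, so two sets can share a range while differing in how the middle values are distributed.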

3 Comprehensive Comparison

Exercise 3

Compare the following data sets: Team A: 12, 15, 18, 20, 22, 25, 28 and Team B: 10, 14, 16, 18, 20, 24, 30. Calculate and compare mean, median, and range for both teams.

Definition:

Comprehensive Comparison: Analyzing multiple aspects of data sets including center and spread.

Multiple Measures: Using several statistical measures for complete comparison.

Team A

Mean=20, Med=20, Range=16

Team B

Mean=18.9, Med=18, Range=20

Comparison

Team A has higher center, Team B has more spread

Step 1: Calculate Team A measures

Team A: 12, 15, 18, 20, 22, 25, 28

Mean = (12+15+18+20+22+25+28)÷7 = 140÷7 = 20

Median = middle value (4th) = 20

Range = 28-12 = 16

Step 2: Calculate Team B measures

Team B: 10, 14, 16, 18, 20, 24, 30

Mean = (10+14+16+18+20+24+30)÷7 = 132÷7 ≈ 18.86 ≈ 18.9

Median = middle value (4th) = 18

Range = 30-10 = 20

Step 3: Compare centers

Team A mean: 20, Team B mean: 18.9

Team A median: 20, Team B median: 18

Team A has higher center values

Step 4: Compare spreads

Team A range: 16, Team B range: 20

Team B has greater spread

Step 5: Overall comparison

Team A has higher performance (center)

Team B has more variability (spread)

Team A: Mean=20, Median=20, Range=16
Team B: Mean=18.9, Median=18, Range=20

Final Answer:

Team A: Mean=20, Median=20, Range=16. Team B: Mean=18.9, Median=18, Range=20. Team A has higher center values but less variability than Team B.

Applied Rules:

• Multiple Measures: Calculate center and spread for comprehensive comparison

• Systematic Approach: Calculate each measure for both sets

• Contextual Interpretation: Consider what each measure represents
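Computing all three measures for each team can be wrapped in one helper, which keeps the side-by-side comparison systematic. This is a sketch only; the function name `summarize` and the rounding to one decimal place (matching the 18.9 used in the solution) are assumptions made for this example.

```python
from statistics import mean, median

def summarize(values):
    """Return mean (rounded to 1 decimal), median, and range of a data set."""
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "range": max(values) - min(values),
    }

team_a = [12, 15, 18, 20, 22, 25, 28]
team_b = [10, 14, 16, 18, 20, 24, 30]

summary_a = summarize(team_a)
summary_b = summarize(team_b)

# Print the measures side by side, one measure per row.
for measure in ("mean", "median", "range"):
    print(f"{measure:>6}: Team A = {summary_a[measure]}, Team B = {summary_b[measure]}")
```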

Rules and methods, laws,...

$\text{Mean} = \frac{\sum x}{n}, \quad \text{Range} = \max - \min$

Key Formulas

Measures of Center

Mean, Median, Mode

Represent typical values

Measures of Spread

Range, IQR

Measure variability

Comparison Method

Calculate and compare

Systematic approach

Key Definitions:

Comparing Data Sets: Analyzing and contrasting different data sets using statistical measures

Measures of Center: Mean, median, and mode that represent typical values

Measures of Spread: Range, interquartile range that measure variability

Center Comparison: Comparing typical values between data sets

Spread Comparison: Comparing variability between data sets

Systematic Approach: Following a structured method for comparison
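The interquartile range (IQR) is named above but not worked in the exercises. The sketch below uses the median-of-halves rule often taught at this level: Q1 is the median of the lower half, Q3 is the median of the upper half, and the overall median is left out when the count is odd. Other quartile conventions exist and can give slightly different values, so treat this as one possible method. The function name `iqr` is hypothetical, and the Team A and Team B data from Exercise 3 are reused for illustration.

```python
from statistics import median

def iqr(values):
    """Interquartile range (Q3 - Q1) using the median-of-halves rule."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(upper) - median(lower)

team_a = [12, 15, 18, 20, 22, 25, 28]
team_b = [10, 14, 16, 18, 20, 24, 30]

iqr_a = iqr(team_a)  # Q1 = 15, Q3 = 25
iqr_b = iqr(team_b)  # Q1 = 14, Q3 = 24
print("Team A IQR:", iqr_a)
print("Team B IQR:", iqr_b)
```

Under this rule both teams have the same IQR even though their ranges differ (16 and 20), which shows that the extra spread in Team B comes from its extreme values rather than from its middle half.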

Complete Methodology:

Identify Data Sets: Clearly define the data sets to compare
Calculate Measures: Compute center and spread measures for each set
Compare Centers: Analyze mean, median, and mode
Compare Spreads: Analyze range and other spread measures
Draw Conclusions: Make informed statements about the differences
Interpret Results: Consider the context of the comparison

Tip 1: Always calculate multiple measures for comprehensive comparison.

Tip 2: Consider both center and spread when comparing data sets.

Tip 3: Look for outliers that might affect comparisons.

Tip 4: Use visual representations to support your analysis.

Tip 5: Consider the context when interpreting differences.

Common Errors: Forgetting to calculate multiple measures, not considering outliers, misinterpreting results.

Exam Preparation: Practice with various data sets, memorize formulas, understand interpretation.

Solution: Exercises 4 to 5

4 Effect of Outliers

Exercise 4

Compare the following data sets: Set C: 10, 12, 14, 16, 18 and Set D: 10, 12, 14, 16, 50. Explain how the outlier in Set D affects the comparison.

Definition:

Outlier: A value significantly different from other values in the data set.

Outlier Effect: How extreme values impact statistical measures.

Set C

Mean=14, Med=14, Range=8

Set D

Mean=20.4, Med=14, Range=40

Outlier Effect

Mean and range affected, median not

Step 1: Calculate Set C measures

Set C: 10, 12, 14, 16, 18

Mean = (10+12+14+16+18)÷5 = 70÷5 = 14

Median = middle value = 14

Range = 18-10 = 8

Step 2: Calculate Set D measures

Set D: 10, 12, 14, 16, 50

Mean = (10+12+14+16+50)÷5 = 102÷5 = 20.4

Median = middle value = 14

Range = 50-10 = 40

Step 3: Compare the sets

Means: Set C (14) vs Set D (20.4) - large difference

Medians: Set C (14) vs Set D (14) - no difference

Ranges: Set C (8) vs Set D (40) - large difference

Step 4: Explain outlier effect

The value 50 is an outlier in Set D: it lies far above the other values (10 to 16)

The outlier raises the mean from 14 to 20.4

The outlier raises the range from 8 to 40

The median stays at 14

Step 5: Draw conclusions

Mean and range are sensitive to outliers

Median is resistant to outliers

Set C: Mean=14, Median=14, Range=8
Set D: Mean=20.4, Median=14, Range=40

Final Answer:

Set C: Mean=14, Median=14, Range=8. Set D: Mean=20.4, Median=14, Range=40. The outlier (50) in Set D increases the mean and range significantly, but does not affect the median.

Applied Rules:

• Outlier Sensitivity: Mean and range are sensitive to outliers

• Resistance: Median is resistant to outliers

• Impact Analysis: Different measures are affected differently
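The effect of the outlier can be made explicit by computing each measure for both sets and then taking the difference. This is an illustrative sketch; the names `measures` and `shift` are assumptions introduced here.

```python
from statistics import mean, median

def measures(values):
    """Mean, median, and range of a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }

set_c = [10, 12, 14, 16, 18]
set_d = [10, 12, 14, 16, 50]  # same as Set C except the last value

m_c = measures(set_c)
m_d = measures(set_d)

# Change in each measure caused by replacing 18 with 50
# (rounded to 1 decimal to hide floating-point noise).
shift = {name: round(m_d[name] - m_c[name], 1) for name in m_c}

for name, change in shift.items():
    print(f"{name}: {m_c[name]} -> {m_d[name]} (change {change})")
```

A change of zero for the median alongside nonzero changes for the mean and range is what "resistant" and "sensitive" mean in the rules above.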

5 Real-World Application

Exercise 5

A company tracks daily sales for two stores: Store A: $1200, $1300, $1400, $1500, $1600 and Store B: $1100, $1300, $1400, $1500, $1700. Compare the performance of both stores using statistical measures.

Definition:

Real-World Application: Applying statistical concepts to practical business situations.

Performance Metrics: Statistical measures used to evaluate performance.

Store A

Mean=$1400, Med=$1400, Range=$400

Store B

Mean=$1400, Med=$1400, Range=$600

Comparison

Same center, different spread

Step 1: Calculate Store A measures

Store A: $1200, $1300, $1400, $1500, $1600

Mean = ($1200+$1300+$1400+$1500+$1600)÷5 = $7000÷5 = $1400

Median = middle value = $1400

Range = $1600-$1200 = $400

Step 2: Calculate Store B measures

Store B: $1100, $1300, $1400, $1500, $1700

Mean = ($1100+$1300+$1400+$1500+$1700)÷5 = $7000÷5 = $1400

Median = middle value = $1400

Range = $1700-$1100 = $600

Step 3: Compare centers

Store A mean: $1400, Store B mean: $1400

Store A median: $1400, Store B median: $1400

Both stores have identical center values

Step 4: Compare spreads

Store A range: $400, Store B range: $600

Store B has greater variability in sales

Step 5: Interpret results

Both stores have same average performance

Store B has more variable sales (higher highs and lower lows)

Store A has more consistent sales

Step 6: Business implications

Store A: More predictable revenue stream

Store B: More potential for high sales days but also low sales days

Both stores: Mean=$1400, Median=$1400
Store A Range=$400, Store B Range=$600

Final Answer:

Both stores have identical average performance (mean=$1400, median=$1400), but Store B has greater variability in sales (range=$600) compared to Store A (range=$400). Store A has more consistent sales while Store B has more variable performance.

Applied Rules:

• Real-World Context: Apply statistical concepts to practical situations

• Multiple Measures: Consider both center and spread for complete picture

• Business Interpretation: Relate statistical findings to practical implications
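When the centers tie, as they do for the two stores, the spread decides which store is more consistent. The sketch below encodes that reasoning; the names `spread`, `same_center`, and `more_consistent` are hypothetical and chosen for this example.

```python
from statistics import mean, median

store_a = [1200, 1300, 1400, 1500, 1600]  # daily sales in dollars
store_b = [1100, 1300, 1400, 1500, 1700]

def spread(values):
    """Range of the data: maximum - minimum."""
    return max(values) - min(values)

# Steps 1 to 3: check whether both measures of center agree.
same_center = (
    mean(store_a) == mean(store_b)
    and median(store_a) == median(store_b)
)

# Steps 4 and 5: with equal centers, the smaller range marks
# the more consistent store.
if same_center and spread(store_a) != spread(store_b):
    more_consistent = "Store A" if spread(store_a) < spread(store_b) else "Store B"
else:
    more_consistent = None  # centers differ or ranges tie

print("Same center:", same_center)
print("Ranges:", spread(store_a), spread(store_b))
print("More consistent:", more_consistent)
```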

Detailed Summary: Comparing Data Sets Fundamentals

$\text{Mean} = \frac{\sum x}{n}, \quad \text{Median} = \text{middle value}, \quad \text{Range} = \max - \min$

Core Formulas

Key definitions:

Comparing Data Sets: The process of analyzing and contrasting different collections of data using statistical measures to identify similarities and differences

Measures of Center: Statistical measures (mean, median, mode) that represent the typical or central value of a data set

Measures of Spread: Statistical measures (range, interquartile range) that describe how spread out the values are in a data set

Center Comparison: Comparing the typical values between different data sets

Spread Comparison: Comparing the variability between different data sets

Outlier: A value that is significantly different from other values in the data set

Complete methodology:

Data Set Identification: Clearly define the data sets to be compared
Center Calculation: Calculate mean, median, and mode for each set
Spread Calculation: Calculate range and other spread measures for each set
Systematic Comparison: Compare corresponding measures between sets
Pattern Recognition: Identify similarities and differences
Contextual Interpretation: Understand the meaning of differences in context

Tip 1: Always calculate multiple measures (center and spread) for comprehensive comparison.

Tip 2: Consider both center and spread when evaluating data sets.

Tip 3: Be aware that outliers affect different measures differently.

Tip 4: Use visual representations to support your analysis.

Tip 5: Consider the context when interpreting statistical differences.

Common errors: Forgetting to calculate multiple measures, not considering outliers, misinterpreting results, focusing only on center and ignoring spread.

Exam preparation: Practice with various data sets, memorize formulas, understand how to interpret differences, know the effect of outliers on different measures.

Formulas to know by heart:

• Mean Formula: Mean = (Sum of all values) ÷ (Number of values)

• Median Rule: Middle value when data is sorted in order (mean of the two middle values if the count is even)

• Range Formula: Range = Maximum - Minimum

• Comparison Method: Calculate measures for each set and compare

• Outlier Impact: Mean and range are sensitive, median is resistant

Exercise with Visualization: Data Set Comparison

Exercise 6: Visual Data Analysis

Compare the following data sets visually:
Set A: 5, 10, 15, 20, 25 (symmetric)
Set B: 2, 4, 6, 8, 30 (with outlier)
Set C: 18, 19, 20, 21, 22 (tight cluster)

Analysis: The visualization shows different data characteristics.

Set A: Moderate spread, symmetric distribution
Set B: High spread due to outlier, skewed distribution
Set C: Low spread, tight clustering around center
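A rough text-based dot plot is one way to see these three shapes without graphing software. The sketch below marks each data value on a number line from 0 to 30; the function name `dot_line` and the choice of axis limits are assumptions made for this example, not part of the exercise.

```python
def dot_line(values, lo=0, hi=30):
    """Return a text number line with '*' at each data value and '.' elsewhere."""
    marks = set(values)
    return "".join("*" if x in marks else "." for x in range(lo, hi + 1))

sets = {
    "Set A": [5, 10, 15, 20, 25],
    "Set B": [2, 4, 6, 8, 30],
    "Set C": [18, 19, 20, 21, 22],
}

lines = {name: dot_line(values) for name, values in sets.items()}

for name, line in lines.items():
    print(f"{name}: {line}")
```

In the printed lines, Set A's marks are evenly spaced, Set B has four marks near the left and one isolated mark at the far right, and Set C's marks sit side by side.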

Solved Exercises on Comparing Data Sets in Grade 7
