Solved Exercises on Regression Analysis Fundamentals in Algebra 2

Master regression analysis: linear, quadratic, and exponential models with correlation coefficients, residual analysis, and model selection.

Solution: Exercises 1 to 3
1 Linear regression analysis
Exercise 1
Given the data points (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), find the linear regression equation and interpret the correlation coefficient.
Definition:

Linear regression: The line of best fit that minimizes the sum of squared residuals. The correlation coefficient r measures the strength of the linear relationship (-1 ≤ r ≤ 1).

Regression analysis method:
  1. Plot the data points to visualize the relationship
  2. Calculate means of x and y values
  3. Compute the slope using: m = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)²
  4. Find y-intercept: b = ȳ - mx̄
  5. Calculate correlation coefficient: r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]
  6. Interpret the results
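The formulas in steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not a library routine; the function name `linear_regression` is an assumption introduced here, and it expects at least two distinct x-values.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (m, b, r) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    m = s_xy / s_xx                 # slope
    b = y_bar - m * x_bar           # y-intercept
    r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient
    return m, b, r

m, b, r = linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(m, b, r)   # 2.0 1.0 1.0
```

Applied to the data of Exercise 1, the sketch reproduces the slope, intercept, and correlation coefficient derived by hand in the steps that follow.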
Data: (1,3), (2,5), (3,7), (4,9), (5,11)
Means: x̄ = 3, ȳ = 7
Equation: ŷ = 2x + 1
Step 1: Calculate means

x̄ = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3

ȳ = (3 + 5 + 7 + 9 + 11)/5 = 35/5 = 7

Step 2: Calculate numerator for slope

Σ[(xᵢ - x̄)(yᵢ - ȳ)] = (1-3)(3-7) + (2-3)(5-7) + (3-3)(7-7) + (4-3)(9-7) + (5-3)(11-7)

= (-2)(-4) + (-1)(-2) + (0)(0) + (1)(2) + (2)(4)

= 8 + 2 + 0 + 2 + 8 = 20

Step 3: Calculate denominator for slope

Σ(xᵢ - x̄)² = (1-3)² + (2-3)² + (3-3)² + (4-3)² + (5-3)²

= (-2)² + (-1)² + (0)² + (1)² + (2)²

= 4 + 1 + 0 + 1 + 4 = 10

Step 4: Calculate slope and y-intercept

m = 20/10 = 2

b = ȳ - mx̄ = 7 - 2(3) = 7 - 6 = 1

Step 5: Write the regression equation

ŷ = 2x + 1

Step 6: Calculate correlation coefficient

Σ(yᵢ - ȳ)² = (3-7)² + (5-7)² + (7-7)² + (9-7)² + (11-7)²

= (-4)² + (-2)² + (0)² + (2)² + (4)² = 16 + 4 + 0 + 4 + 16 = 40

r = 20 / √(10 × 40) = 20 / √400 = 20/20 = 1

Regression: ŷ = 2x + 1
Correlation: r = 1 (perfect positive correlation)
Final answer:

The linear regression equation is ŷ = 2x + 1, with a correlation coefficient of r = 1, indicating a perfect positive linear relationship.

Applied rules:

Perfect correlation: r = 1 indicates all points lie exactly on the regression line

Slope interpretation: For each unit increase in x, y increases by 2 units

Y-intercept: When x = 0, predicted y = 1

2 Quadratic regression modeling
Exercise 2
The height of a projectile (in feet) is recorded at different times (in seconds): (0, 2), (1, 15), (2, 24), (3, 29), (4, 30), (5, 27). Find the quadratic regression equation and predict the maximum height.
Definition:

Quadratic regression: The parabola of best fit that models data with a curved relationship. Uses the form ŷ = ax² + bx + c.

Data: (0,2), (1,15), (2,24), (3,29), (4,30), (5,27)
Quadratic: ŷ = -2x² + 15x + 2
Max height: ≈ 30.1 ft at x = 3.75 s
Step 1: Set up the system of equations

Using the form ŷ = ax² + bx + c, we substitute each data point:

For (0,2): c = 2

For (1,15): a + b + c = 15 → a + b + 2 = 15 → a + b = 13

For (2,24): 4a + 2b + c = 24 → 4a + 2b + 2 = 24 → 4a + 2b = 22

Step 2: Solve the system

From a + b = 13: b = 13 - a

Substitute: 4a + 2(13 - a) = 22

4a + 26 - 2a = 22

2a = -4

a = -2

b = 13 - (-2) = 15

Step 3: Write the regression equation

With a = -2, b = 15, c = 2, the first three points give ŷ = -2x² + 15x + 2. The remaining points satisfy it as well: ŷ(3) = -18 + 45 + 2 = 29 ✓, ŷ(4) = -32 + 60 + 2 = 30 ✓, ŷ(5) = -50 + 75 + 2 = 27 ✓

Every residual is zero, so least-squares regression on all 6 points (by hand or with technology) returns the same equation: ŷ = -2x² + 15x + 2

Step 4: Find the maximum

For a quadratic ax² + bx + c with a < 0, maximum occurs at x = -b/(2a)

x = -15/(2(-2)) = -15/(-4) = 3.75 seconds

Step 5: Calculate maximum height

ŷ(3.75) = -2(3.75)² + 15(3.75) + 2

= -2(14.0625) + 56.25 + 2 = -28.125 + 56.25 + 2 = 30.125 ≈ 30.1 feet

Regression: ŷ = -2x² + 15x + 2
Max height: 30.1 ft at 3.75 seconds
Final answer:

The quadratic regression model is ŷ = -2x² + 15x + 2, predicting a maximum height of approximately 30.1 feet at 3.75 seconds.

Applied rules:

Quadratic vertex: x = -b/(2a) for maximum when a < 0

Parabolic motion: Under constant gravity, height is a quadratic function of time

Regression coefficients: Determined by minimizing sum of squared errors
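As a sketch of how the coefficients that minimize the sum of squared errors could be computed without a graphing calculator, the following solves the three normal equations for a, b, and c. The function name `quadratic_regression` and the choice of exact `Fraction` arithmetic are assumptions made for illustration; it needs at least three distinct x-values.

```python
from fractions import Fraction

def quadratic_regression(xs, ys):
    """Least-squares (a, b, c) for y = a*x^2 + b*x + c via the normal equations."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # Power sums: S[k] = sum of x^k, T[k] = sum of x^k * y
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c)
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with exact fractions
    for i in range(3):
        pivot_row = next(r for r in range(i, 3) if M[r][i] != 0)
        M[i], M[pivot_row] = M[pivot_row], M[i]
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(3):
            if r != i:
                M[r] = [vr - M[r][i] * vi for vr, vi in zip(M[r], M[i])]
    return tuple(row[3] for row in M)

a, b, c = quadratic_regression([0, 1, 2, 3, 4, 5], [2, 15, 24, 29, 30, 27])
x_max = -b / (2 * a)                       # vertex: x = -b/(2a)
y_max = a * x_max ** 2 + b * x_max + c
print(a, b, c)                             # -2 15 2
print(float(x_max), float(y_max))          # 3.75 30.125
```

Because the six data points of Exercise 2 have constant second differences (-4), the least-squares parabola passes through all of them.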

3 Exponential regression
Exercise 3
A bacterial culture grows as shown: (0, 100), (1, 120), (2, 144), (3, 173), (4, 207), (5, 249). Find the exponential regression model and predict the population after 8 hours.
Definition:

Exponential regression: The exponential curve of best fit in the form ŷ = ab^x, where a is initial value and b is growth factor.

Data: (0,100), (1,120), (2,144), (3,173), (4,207), (5,249)
Exponential: ŷ = 100(1.2)^x
Prediction: ŷ(8) ≈ 430 bacteria
Step 1: Recognize the pattern

Looking at ratios: 120/100 = 1.2, 144/120 = 1.2, 173/144 ≈ 1.2, etc.

This suggests exponential growth with factor b ≈ 1.2

Step 2: Identify initial value

At x = 0, y = 100, so a = 100

Step 3: Write the exponential model

A least-squares fit of ln y against x on all six points gives a ≈ 100.0 and b ≈ 1.200, so the model is ŷ = 100(1.2)^x

Step 4: Make the prediction

After 8 hours: ŷ(8) = 100(1.2)^8 = 100(4.2998) ≈ 430 bacteria

Step 5: Verify the model

Check: ŷ(1) = 100(1.2) = 120 ✓, ŷ(2) = 100(1.44) = 144 ✓

Model: ŷ = 100(1.2)^x
Population after 8 hours: 430 bacteria
Final answer:

The exponential regression model is ŷ = 100(1.2)^x, predicting approximately 430 bacteria after 8 hours.

Applied rules:

Exponential growth: Constant percentage increase

Growth factor: b > 1 for growth, 0 < b < 1 for decay

Initial value: y-intercept when x = 0
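One common way to fit ŷ = ab^x is to run a linear regression on (x, ln y) and transform the result back. The sketch below assumes that approach; the function name `exponential_regression` is hypothetical, and all y-values must be positive.

```python
from math import exp, log

def exponential_regression(xs, ys):
    """Fit y = a * b**x by least squares on (x, ln y). Requires every y > 0."""
    n = len(xs)
    logs = [log(y) for y in ys]
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = l_bar - slope * x_bar
    # ln y = intercept + slope * x  is equivalent to  y = e^intercept * (e^slope)^x
    return exp(intercept), exp(slope)

a, b = exponential_regression([0, 1, 2, 3, 4, 5], [100, 120, 144, 173, 207, 249])
prediction = a * b ** 8
print(round(a, 1), round(b, 3), round(prediction))   # 100.0 1.2 430
```

On the bacterial data of Exercise 3, the fitted values round to the initial value and growth factor found by inspecting the ratios.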

Regression Analysis Fundamentals
Regression Models
Linear: ŷ = mx + b (constant rate of change)
Quadratic: ŷ = ax² + bx + c (changing rate of change)
Exponential: ŷ = ab^x (constant percentage rate)
Key definitions:

Regression analysis: Statistical method to find the best-fitting function for a set of data points

Correlation coefficient (r): Measures strength and direction of linear relationship (-1 ≤ r ≤ 1)

Coefficient of determination (r²): Proportion of variance in y explained by x

Residual: Difference between observed and predicted values

Correlation Strength: |r| > 0.9: Very strong, 0.7-0.9: Strong, 0.5-0.7: Moderate, < 0.5: Weak
Model Selection: Choose based on scatter plot pattern, residual analysis, and r² value
Tip 1: Always plot data first to identify the appropriate model type.
Tip 2: Higher r² values indicate better fit, but consider the context.
Tip 3: Check residuals for randomness; patterns suggest poor model fit.
Tip 4: Don't extrapolate beyond the data range without justification.
Solution: Exercises 4 to 5
4 Model comparison
Exercise 4
Given data: (1, 5), (2, 12), (3, 21), (4, 32), (5, 45). Compare linear, quadratic, and exponential regression models and determine which best fits the data.
Definition:

Model comparison: Evaluate multiple regression models using correlation coefficients and residual analysis.

Linear: r² ≈ 0.99 (r ≈ 0.993)
Quadratic: r² = 1.00
Exponential: r² ≈ 0.96 (from a fit of ln y against x)
Step 1: Calculate first differences

12 - 5 = 7, 21 - 12 = 9, 32 - 21 = 11, 45 - 32 = 13

First differences: 7, 9, 11, 13 (not constant)

Step 2: Calculate second differences

9 - 7 = 2, 11 - 9 = 2, 13 - 11 = 2

Second differences: 2, 2, 2 (constant)

Step 3: Calculate ratios

12/5 = 2.4, 21/12 = 1.75, 32/21 ≈ 1.52, 45/32 ≈ 1.41

Ratios: Not constant

Step 4: Determine the best model

Since second differences are constant, quadratic model is best

Regression analysis confirms: quadratic has r² = 1.00 (perfect fit)

Quadratic model: ŷ = x² + 4x
r² = 1.00 (perfect fit)
Final answer:

The quadratic model ŷ = x² + 4x best fits the data with r² = 1.00, indicating a perfect fit.

Applied rules:

Constant second differences: Indicate quadratic relationship

Model selection: Choose the model with the highest r², and confirm that its residuals show no pattern

Pattern recognition: Use differences and ratios to identify model type
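The difference-and-ratio checks used in this exercise can be sketched as a small helper. The name `classify_pattern` and the relative tolerance `tol` are assumptions introduced for illustration; the helper expects at least three y-values recorded at equally spaced x-values.

```python
def classify_pattern(ys, tol=0.02):
    """Suggest a model type from first differences, second differences, and ratios.

    tol is a hypothetical relative tolerance for treating values as "constant".
    """
    def roughly_constant(values):
        mean = sum(values) / len(values)
        return all(abs(v - mean) <= tol * abs(mean) for v in values)

    first = [b - a for a, b in zip(ys, ys[1:])]
    second = [b - a for a, b in zip(first, first[1:])]

    if roughly_constant(first):
        return "linear"
    if roughly_constant(second):
        return "quadratic"
    # Ratios are only defined when no divisor is zero
    if all(y != 0 for y in ys[:-1]):
        ratios = [b / a for a, b in zip(ys, ys[1:])]
        if roughly_constant(ratios):
            return "exponential"
    return "unclear"

print(classify_pattern([5, 12, 21, 32, 45]))          # quadratic
print(classify_pattern([100, 180, 324, 583, 1050]))   # exponential
```

The helper only suggests a candidate; a regression fit and a residual check are still needed to confirm the choice.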

5 Real-world application
Exercise 5
The sales of a new product over time: (0, 100), (1, 180), (2, 324), (3, 583), (4, 1050). Determine the best regression model and predict sales after 6 months.
Definition:

Exponential growth: Characterized by constant percentage increase, common in early product adoption phases.

Data: (0,100), (1,180), (2,324), (3,583), (4,1050)
Ratios: 1.8, 1.8, ≈ 1.8, ≈ 1.8
Model: ŷ = 100(1.8)^x
Step 1: Calculate ratios

180/100 = 1.8, 324/180 = 1.8, 583/324 ≈ 1.8, 1050/583 ≈ 1.8

Constant ratio ≈ 1.8 indicates exponential growth

Step 2: Write the exponential model

Initial value a = 100, growth factor b = 1.8

Model: ŷ = 100(1.8)^x

Step 3: Make prediction

After 6 months: ŷ(6) = 100(1.8)^6 = 100(34.01) ≈ 3401 units

Model: ŷ = 100(1.8)^x
Sales after 6 months: 3401 units
Final answer:

The exponential model ŷ = 100(1.8)^x best fits the data, predicting sales of approximately 3401 units after 6 months.

Applied rules:

Constant ratios: Indicate exponential relationship

Real-world context: Early product growth often follows exponential pattern

Prediction accuracy: Consider market saturation for long-term projections

Detailed Summary: Regression Analysis Fundamentals
r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]
Correlation Coefficient Formula
Key definitions:

Regression: Statistical technique to find the line or curve that best fits a set of data points

Least squares method: Minimizes the sum of squared vertical distances between data points and fitted line

Residual: Observed value minus predicted value (e = y - ŷ)

Coefficient of determination: r² represents the proportion of variance in y explained by x

Regression Analysis Methodology:
  1. Data preparation: Organize data points and check for outliers
  2. Scatter plot: Visualize the relationship between variables
  3. Model selection: Choose appropriate function type based on pattern
  4. Parameter calculation: Use formulas or technology to find regression coefficients
  5. Model evaluation: Assess fit using correlation coefficient and residual analysis
  6. Application: Use the model for prediction and interpretation
Tip 1: Linear models: Look for straight-line pattern in scatter plot.
Tip 2: Quadratic models: Look for parabolic U-shape or inverted U-shape.
Tip 3: Exponential models: Look for rapid growth or decay that appears curved.
Tip 4: Always verify that predictions make sense in the real-world context.
Common errors: Misidentifying model type, ignoring outliers, extrapolating beyond data range, confusing correlation with causation.
Exam preparation: Practice identifying patterns, memorize formulas for correlation coefficient, understand interpretation of r and r².
Essential formulas to know:

Linear regression: ŷ = mx + b (slope m, y-intercept b)

Quadratic regression: ŷ = ax² + bx + c

Exponential regression: ŷ = ab^x

Correlation coefficient: r = Σ[(xᵢ-x̄)(yᵢ-ȳ)] / √[Σ(xᵢ-x̄)² · Σ(yᵢ-ȳ)²]

Residual: e = y - ŷ

Coefficient of determination: r² = (correlation coefficient)²

Regression Model Comparison: Linear vs Quadratic vs Exponential
Exercise 6: Model Fitting Comparison
Compare regression models for the same dataset: (1, 3), (2, 7), (3, 13), (4, 21), (5, 31)

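Analysis: The chart compares how well different regression models fit the same data. For this dataset the first differences are 4, 6, 8, 10 and the second differences are a constant 2, so the quadratic model ŷ = x² + x + 1 passes through every point.

  • Linear: Simplest model, but it cannot capture the curvature in this data
  • Quadratic: Exact fit here, because the second differences are constant
  • Exponential: Suited to data with a constant ratio; the ratios here (≈ 2.33, 1.86, 1.62, 1.48) are not constant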

Questions & Answers

Question: How do I know which type of regression to use for a given dataset?

Answer: Look for these patterns in your data:

Linear regression: Scatter plot shows a roughly straight line pattern. First differences between y-values are approximately constant.

Quadratic regression: Scatter plot shows a U-shape or inverted U-shape (parabolic). Second differences between y-values are approximately constant.

Exponential regression: Scatter plot shows rapid growth or decay that curves upward or downward. Ratios of consecutive y-values are approximately constant.

Steps to identify:

  1. Plot the data points to visualize the pattern
  2. Calculate first differences: if constant, consider linear
  3. Calculate second differences: if constant, consider quadratic
  4. Calculate ratios of consecutive y-values: if constant, consider exponential
  5. Use technology to compute regression equations and compare r² values

The model with the highest r² value (closest to 1) is generally the best fit.

Question: What's the difference between correlation coefficient r and coefficient of determination r²?

Answer: These are related but different measures:

Correlation coefficient (r):

  • Measures strength and direction of linear relationship
  • Range: -1 ≤ r ≤ 1
  • Positive r: as x increases, y tends to increase
  • Negative r: as x increases, y tends to decrease
  • |r| close to 1: strong linear relationship
  • |r| close to 0: weak linear relationship

Coefficient of determination (r²):

  • Measures proportion of variance in y explained by x
  • Range: 0 ≤ r² ≤ 1
  • Never negative (even if r is negative)
  • Represents percentage of variability accounted for by the model
  • r² = 0.85 means 85% of variation in y is explained by x

For example: if r = -0.9, then r² = 0.81, meaning 81% of variation in y is explained by x, with a strong negative linear relationship.
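To see the sign behaviour concretely, the sketch below computes r for the Exercise 4 data and for the same data mirrored so that it slopes downward. The helper name `correlation` is an assumption introduced here.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
ys = [5, 12, 21, 32, 45]                       # Exercise 4 data
r_up = correlation(xs, ys)
r_down = correlation(xs, [-y for y in ys])     # mirrored so y decreases as x increases

print(round(r_up, 4), round(r_up ** 2, 4))       # 0.9931 0.9862
print(round(r_down, 4), round(r_down ** 2, 4))   # -0.9931 0.9862
```

The direction of the relationship changes the sign of r but leaves r² unchanged.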

Question: Can I use a regression model to predict values outside the range of my data?

Answer: Extrapolation (predicting outside the data range) should be done cautiously:

Interpolation (within data range): Generally safe and reliable

Extrapolation (outside data range): Risky because:

  • The relationship may change beyond the observed range
  • Patterns in the data may not continue
  • Physical or practical constraints may apply

Example: A linear model fit to falling afternoon temperatures keeps dropping when extended into the night and can predict temperatures far colder than any that actually occur.

If you must extrapolate:

  • Stay close to the data range
  • Consider the real-world context
  • Understand the limitations of your model
  • Use domain knowledge to validate predictions

The safest approach is to only interpolate within the range of your data.

Question: How do I interpret residuals and what do they tell me about my model?

Answer: Residuals are the differences between observed and predicted values: eᵢ = yᵢ - ŷᵢ

What residuals tell you:

  • Size: Large residuals indicate poor fit at those points
  • Pattern: Random residuals suggest good model fit
  • Systematic patterns: Indicate the model may not be appropriate

Residual plot analysis:

  • Random scatter: Good model fit
  • Curved pattern: Need higher-order model
  • Fanning out: Variance increases with x (heteroscedasticity)
  • Outliers: Points that don't fit the model well

A good model has residuals that are randomly scattered around zero with no discernible pattern. If you see a pattern in the residuals, consider a different model type.
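As a small illustration of a curved residual pattern, the sketch below computes residuals for two models of the Exercise 4 data. The line ŷ = 10x - 7 is not given in the exercise; it is the least-squares line obtained from the slope and intercept formulas (m = 100/10, b = 23 - 10·3). The helper name `residuals` is likewise an assumption.

```python
def residuals(xs, ys, predict):
    """Observed minus predicted value for each data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [5, 12, 21, 32, 45]                 # Exercise 4 data

def linear(x):
    return 10 * x - 7                    # least-squares line for this data

def quadratic(x):
    return x ** 2 + 4 * x                # quadratic model from Exercise 4

print(residuals(xs, ys, linear))         # [2, -1, -2, -1, 2]
print(residuals(xs, ys, quadratic))      # [0, 0, 0, 0, 0]
```

The linear residuals are positive at both ends and negative in the middle, a U-shaped pattern that points to a quadratic model even though the linear r² is about 0.99.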

Question: Why do we square the residuals in the least squares method?

Answer: Squaring residuals serves several important purposes:

Eliminates direction: Prevents positive and negative residuals from canceling each other out

Penalizes large errors: Squaring amplifies the impact of larger residuals, making the model more sensitive to outliers

Mathematical convenience: Squares are differentiable, allowing us to use calculus to find the minimum

Optimization: The sum of squared residuals has a unique minimum (provided the x-values are not all equal), ensuring a single best-fit line

The least squares method finds the line that minimizes Σ(yᵢ - ŷᵢ)², providing the optimal balance of fitting all points.

Alternative methods like least absolute deviations use |yᵢ - ŷᵢ| instead of (yᵢ - ŷᵢ)². They are less sensitive to outliers, but they are less commonly used because they have no closed-form solution and the minimizing line may not be unique.
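The cancellation problem can be seen with the Exercise 1 data. The sketch below compares a deliberately poor horizontal line at ȳ = 7 with the least-squares line; the helper name `residual_sums` is an assumption introduced for illustration.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                    # Exercise 1 data

def residual_sums(predict):
    """Return (sum of residuals, sum of absolute residuals, sum of squared residuals)."""
    res = [y - predict(x) for x, y in zip(xs, ys)]
    return sum(res), sum(abs(e) for e in res), sum(e ** 2 for e in res)

flat = residual_sums(lambda x: 7)            # horizontal line at the mean of y
best = residual_sums(lambda x: 2 * x + 1)    # least-squares line

print(flat)   # (0, 12, 40)
print(best)   # (0, 0, 0)
```

Both lines have residuals that sum to zero, so the plain sum cannot tell them apart; the absolute and squared sums both expose the poor fit, and the squared sum weights the largest misses most heavily.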