Regression Analysis Basics | Solved Exercises

Solution: Exercises 1 to 3

1 Linear regression analysis

Exercise 1

Given the data points (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), find the linear regression equation and interpret the correlation coefficient.

Definition:

Linear regression: The line of best fit that minimizes the sum of squared residuals. The correlation coefficient r measures the strength of the linear relationship (-1 ≤ r ≤ 1).

Regression analysis method:

Plot the data points to visualize the relationship
Calculate means of x and y values
Compute the slope using: m = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)²
Find y-intercept: b = ȳ - mx̄
Calculate correlation coefficient: r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]
Interpret the results
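
The method above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name linear_fit and the variable names s_xy, s_xx, s_yy are assumptions introduced here for the three sums in the formulas.

```python
from math import sqrt

def linear_fit(points):
    """Return (slope m, intercept b, correlation r) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    m = s_xy / s_xx              # slope
    b = y_bar - m * x_bar        # y-intercept
    r = s_xy / sqrt(s_xx * s_yy) # correlation coefficient
    return m, b, r

# Data from Exercise 1
m, b, r = linear_fit([(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)])
print(m, b, r)
```

The sketch assumes at least two distinct x values and some variation in y; otherwise one of the divisions fails.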

Data

(1,3), (2,5), (3,7), (4,9), (5,11)

Means

x̄ = 3, ȳ = 7

Equation

ŷ = 2x + 1

Step 1: Calculate means

x̄ = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3

ȳ = (3 + 5 + 7 + 9 + 11)/5 = 35/5 = 7

Step 2: Calculate numerator for slope

Σ[(xᵢ - x̄)(yᵢ - ȳ)] = (1-3)(3-7) + (2-3)(5-7) + (3-3)(7-7) + (4-3)(9-7) + (5-3)(11-7)

= (-2)(-4) + (-1)(-2) + (0)(0) + (1)(2) + (2)(4)

= 8 + 2 + 0 + 2 + 8 = 20

Step 3: Calculate denominator for slope

Σ(xᵢ - x̄)² = (1-3)² + (2-3)² + (3-3)² + (4-3)² + (5-3)²

= (-2)² + (-1)² + (0)² + (1)² + (2)²

= 4 + 1 + 0 + 1 + 4 = 10

Step 4: Calculate slope and y-intercept

m = 20/10 = 2

b = ȳ - mx̄ = 7 - 2(3) = 7 - 6 = 1

Step 5: Write the regression equation

ŷ = 2x + 1

Step 6: Calculate correlation coefficient

Σ(yᵢ - ȳ)² = (3-7)² + (5-7)² + (7-7)² + (9-7)² + (11-7)²

= (-4)² + (-2)² + (0)² + (2)² + (4)² = 16 + 4 + 0 + 4 + 16 = 40

r = 20 / √(10 × 40) = 20 / √400 = 20/20 = 1

Regression: ŷ = 2x + 1
Correlation: r = 1 (perfect positive correlation)

Final answer:

The linear regression equation is ŷ = 2x + 1, with a correlation coefficient of r = 1, indicating a perfect positive linear relationship.

Applied rules:

• Perfect correlation: r = 1 indicates all points lie exactly on the regression line

• Slope interpretation: For each unit increase in x, y increases by 2 units

• Y-intercept: When x = 0, predicted y = 1
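
The three rules can be checked directly against the fitted line. The short sketch below assumes the equation ŷ = 2x + 1 from the solution; the helper name predict is introduced here for illustration only.

```python
points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]

def predict(x):
    """Fitted line from Exercise 1: y-hat = 2x + 1."""
    return 2 * x + 1

# Residual = observed y minus predicted y; all zero means every point is on the line
residuals = [y - predict(x) for x, y in points]
# Change in the prediction for a one-unit increase in x (the slope)
step = predict(4) - predict(3)
# Prediction at x = 0 (the y-intercept)
intercept = predict(0)

print(residuals)
print(step, intercept)
```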

2 Quadratic regression modeling

Exercise 2

The height of a projectile (in feet) is recorded at different times (in seconds): (0, 2), (1, 15), (2, 24), (3, 29), (4, 30), (5, 27). Find the quadratic regression equation and predict the maximum height.

Definition:

Quadratic regression: The parabola of best fit that models data with a curved relationship. Uses the form ŷ = ax² + bx + c.

Data

(0,2), (1,15), (2,24), (3,29), (4,30), (5,27)

Quadratic

ŷ = -2x² + 15x + 2

Max Height

30.1 ft at x = 3.75 s

Step 1: Set up the system of equations

Using the form ŷ = ax² + bx + c, we substitute each data point:

For (0,2): c = 2

For (1,15): a + b + c = 15 → a + b + 2 = 15 → a + b = 13

For (2,24): 4a + 2b + c = 24 → 4a + 2b + 2 = 24 → 4a + 2b = 22

Step 2: Solve the system

From a + b = 13: b = 13 - a

Substitute: 4a + 2(13 - a) = 22

4a + 26 - 2a = 22

2a = -4

a = -2

b = 13 - (-2) = 15
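
The substitution can be reproduced in code, together with a check of the three data points that were not used to build the system. This is a small sketch; the helper name parabola and the list name leftover are assumptions made for the example.

```python
points = [(0, 2), (1, 15), (2, 24), (3, 29), (4, 30), (5, 27)]

# c comes directly from the point (0, 2)
c = 2
# Remaining system: a + b = 13 and 4a + 2b = 22.
# Substituting b = 13 - a gives 2a + 26 = 22.
a = (22 - 2 * 13) / 2
b = 13 - a

def parabola(x):
    """Parabola through the first three data points."""
    return a * x**2 + b * x + c

# Compare the parabola with the three points not used in the system
leftover = [(x, y, parabola(x)) for x, y in points[3:]]
print(a, b, c)
print(leftover)
```

If the observed and computed heights agree for the leftover points, the parabola through the first three points already describes the whole data set.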

Step 3: Write the regression equation

The three points give ŷ = -2x² + 15x + 2. The remaining points also satisfy it: ŷ(3) = -18 + 45 + 2 = 29, ŷ(4) = -32 + 60 + 2 = 30, ŷ(5) = -50 + 75 + 2 = 27.

Because all 6 points lie exactly on this parabola, a least-squares fit using technology returns the same equation: ŷ = -2x² + 15x + 2

Step 4: Find the maximum

For a quadratic ax² + bx + c with a < 0, the maximum occurs at x = -b/(2a)

x = -15/(2(-2)) = -15/(-4) = 3.75 seconds

Step 5: Calculate maximum height

ŷ(3.75) = -2(3.75)² + 15(3.75) + 2

= -2(14.0625) + 56.25 + 2 = -28.125 + 56.25 + 2 = 30.125 ≈ 30.1 feet

Regression: ŷ = -2x² + 15x + 2
Max height: 30.1 ft at 3.75 seconds

Final answer:

The quadratic regression model is ŷ = -2x² + 15x + 2, predicting a maximum height of approximately 30.1 feet at 3.75 seconds.

Applied rules:

• Quadratic vertex: x = -b/(2a) for maximum when a < 0

• Parabolic motion: Under constant gravity, height is a quadratic function of time

• Regression coefficients: Determined by minimizing sum of squared errors
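
To show what "minimizing the sum of squared errors" means in practice, the sketch below fits a quadratic to all six data points by solving the normal equations, then locates the vertex. It is one possible approach written with the standard library only; the function names solve and quadratic_fit are assumptions, and a calculator or statistics package would normally do this step.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column in every other row
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def quadratic_fit(points):
    """Least-squares coefficients (a, b, c) of y-hat = a x^2 + b x + c."""
    def s(power):
        return sum(x**power for x, _ in points)
    def t(power):
        return sum(y * x**power for x, y in points)
    # Normal equations obtained by setting the derivatives of the squared error to zero
    normal = [[s(4), s(3), s(2)],
              [s(3), s(2), s(1)],
              [s(2), s(1), s(0)]]
    return solve(normal, [t(2), t(1), t(0)])

points = [(0, 2), (1, 15), (2, 24), (3, 29), (4, 30), (5, 27)]
a, b, c = quadratic_fit(points)
x_peak = -b / (2 * a)                       # vertex formula, valid as a maximum when a < 0
y_peak = a * x_peak**2 + b * x_peak + c
print(round(a, 6), round(b, 6), round(c, 6))
print(round(x_peak, 3), round(y_peak, 3))
```

Results are floating-point approximations, so they are rounded before printing.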

3 Exponential regression

Exercise 3

A bacterial culture grows as shown: (0, 100), (1, 120), (2, 144), (3, 173), (4, 207), (5, 249). Find the exponential regression model and predict the population after 8 hours.

Definition:

Exponential regression: The exponential curve of best fit in the form ŷ = ab^x, where a is the initial value and b is the growth factor.

Data

(0,100), (1,120), (2,144), (3,173), (4,207), (5,249)

Exponential

ŷ = 100(1.2)^x

Prediction

ŷ(8) = 430 bacteria

Step 1: Recognize the pattern

Looking at ratios: 120/100 = 1.2, 144/120 = 1.2, 173/144 ≈ 1.2, etc.

This suggests exponential growth with factor b ≈ 1.2

Step 2: Identify initial value

At x = 0, y = 100, so a = 100

Step 3: Write the exponential model

Using regression analysis: ŷ = 100(1.2)^x

Step 4: Make the prediction

After 8 hours: ŷ(8) = 100(1.2)^8 = 100(4.2998) ≈ 430 bacteria

Step 5: Verify the model

Check: ŷ(1) = 100(1.2) = 120 ✓, ŷ(2) = 100(1.44) = 144 ✓

Model: ŷ = 100(1.2)^x
Population after 8 hours: 430 bacteria

Final answer:

The exponential regression model is ŷ = 100(1.2)^x, predicting approximately 430 bacteria after 8 hours.

Applied rules:

• Exponential growth: Constant percentage increase

• Growth factor: b > 1 for growth, 0 < b < 1 for decay

• Initial value: y-intercept when x = 0
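
The model can also be estimated from all six points instead of by inspection. The sketch below uses one common approach, a straight-line least-squares fit to (x, ln y), which is an assumption about method and not necessarily what a given calculator does; the function name exponential_fit is likewise introduced here.

```python
from math import exp, log

def exponential_fit(points):
    """Fit y-hat = a * b**x by least squares on (x, ln y); returns (a, b)."""
    n = len(points)
    logged = [(x, log(y)) for x, y in points]
    x_bar = sum(x for x, _ in logged) / n
    ln_bar = sum(ln_y for _, ln_y in logged) / n
    slope = (sum((x - x_bar) * (ln_y - ln_bar) for x, ln_y in logged)
             / sum((x - x_bar) ** 2 for x, _ in logged))
    # ln y = ln a + x ln b, so a = exp(intercept) and b = exp(slope)
    return exp(ln_bar - slope * x_bar), exp(slope)

a, b = exponential_fit([(0, 100), (1, 120), (2, 144), (3, 173), (4, 207), (5, 249)])
prediction = a * b**8
print(round(a, 2), round(b, 4), round(prediction))
```

Because the data values are rounded to whole bacteria, the fitted a and b come out very close to, but not exactly, 100 and 1.2.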

Regression Analysis Fundamentals

ŷ = ax + b (Linear)

Regression Models

Linear

ŷ = ax + b

Constant rate

Quadratic

ŷ = ax² + bx + c

Changing rate

Exponential

ŷ = ab^x

Percentage rate

Key definitions:

Regression analysis: Statistical method to find the best-fitting function for a set of data points

Correlation coefficient (r): Measures strength and direction of linear relationship (-1 ≤ r ≤ 1)

Coefficient of determination (r²): Proportion of variance in y explained by x

Residual: Difference between observed and predicted values

Correlation Strength: |r| > 0.9: Very strong, 0.7-0.9: Strong, 0.5-0.7: Moderate, < 0.5: Weak

Model Selection: Choose based on scatter plot pattern, residual analysis, and r² value

Tip 1: Always plot data first to identify the appropriate model type.

Tip 2: Higher r² values indicate better fit, but consider the context.

Tip 3: Check residuals for randomness; patterns suggest poor model fit.

Tip 4: Don't extrapolate beyond the data range without justification.
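
Tip 3 can be illustrated with a deliberately poor model choice: a straight line fitted to the projectile data from Exercise 2. The sketch below is only an illustration; the variable names are assumptions.

```python
points = [(0, 2), (1, 15), (2, 24), (3, 29), (4, 30), (5, 27)]

# Least-squares line through the data
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x, _ in points))
intercept = y_bar - slope * x_bar

# Residual = observed minus predicted
residuals = [y - (slope * x + intercept) for x, y in points]
signs = ["+" if e > 0 else "-" for e in residuals]
print([round(e, 2) for e in residuals])
print(signs)
```

The residuals are negative at both ends and positive in the middle. That systematic arch, rather than a random scatter of signs, suggests the line is missing curvature in the data.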

Solution: Exercises 4 to 5

4 Model comparison

Exercise 4

Given data: (1, 5), (2, 12), (3, 21), (4, 32), (5, 45). Compare linear, quadratic, and exponential regression models and determine which best fits the data.

Definition:

Model comparison: Evaluate multiple regression models using correlation coefficients and residual analysis.

Linear

r² ≈ 0.99

Quadratic

r² = 1.00

Exponential

r² ≈ 0.96

Step 1: Calculate first differences

12 - 5 = 7, 21 - 12 = 9, 32 - 21 = 11, 45 - 32 = 13

First differences: 7, 9, 11, 13 (not constant)

Step 2: Calculate second differences

9 - 7 = 2, 11 - 9 = 2, 13 - 11 = 2

Second differences: 2, 2, 2 (constant)

Step 3: Calculate ratios

12/5 = 2.4, 21/12 = 1.75, 32/21 ≈ 1.52, 45/32 ≈ 1.41

Ratios: Not constant
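
Steps 1 to 3 can be computed in a few lines. The sketch assumes equally spaced x values, as in this data set, and the helper name differences is introduced for the example.

```python
ys = [5, 12, 21, 32, 45]

def differences(values):
    """Differences between consecutive entries."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

first = differences(ys)        # constant for linear data
second = differences(first)    # constant for quadratic data
ratios = [round(later / earlier, 2) for earlier, later in zip(ys, ys[1:])]  # constant for exponential data

print(first)
print(second)
print(ratios)
```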

Step 4: Determine the best model

Since second differences are constant, quadratic model is best

Regression analysis confirms: quadratic has r² = 1.00 (perfect fit)

Quadratic model: ŷ = x² + 4x
r² = 1.00 (perfect fit)

Final answer:

The quadratic model ŷ = x² + 4x best fits the data with r² = 1.00, indicating a perfect fit.

Applied rules:

• Constant second differences: Indicate quadratic relationship

• Model selection: Prefer the model with the highest r², and confirm the choice with the residuals

• Pattern recognition: Use differences and ratios to identify model type
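
The r² comparison can be reproduced as a sketch. It makes two assumptions: r² is computed as 1 - SSE/SST, and the exponential model is fitted and scored on (x, ln y), which is a common convention but not the only one. The quadratic model ŷ = x² + 4x is taken from the solution above; the helper names line_fit and r_squared are introduced here.

```python
from math import log

points = [(1, 5), (2, 12), (3, 21), (4, 32), (5, 45)]

def line_fit(pairs):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    return slope, y_bar - slope * x_bar

def r_squared(observed, predicted):
    """1 - SSE/SST for matching lists of observed and predicted values."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1 - sse / sst

xs = [x for x, _ in points]
ys = [y for _, y in points]

# Linear model fitted to (x, y)
m, b = line_fit(points)
r2_linear = r_squared(ys, [m * x + b for x in xs])

# Quadratic model from the solution
r2_quadratic = r_squared(ys, [x**2 + 4 * x for x in xs])

# Exponential model: straight line fitted to (x, ln y), scored on the log scale
log_ys = [log(y) for y in ys]
k, c = line_fit(list(zip(xs, log_ys)))
r2_exponential = r_squared(log_ys, [k * x + c for x in xs])

print(round(r2_linear, 3), round(r2_quadratic, 3), round(r2_exponential, 3))
```

The linear fit scores high even though the data are exactly quadratic, which is why the difference pattern and the residuals are worth checking alongside r².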

5 Real-world application

Exercise 5

Sales of a new product are recorded over time (x in months): (0, 100), (1, 180), (2, 324), (3, 583), (4, 1050). Determine the best regression model and predict sales after 6 months.

Definition:

Exponential growth: Characterized by constant percentage increase, common in early product adoption phases.

Data

(0,100), (1,180), (2,324), (3,583), (4,1050)

Ratios

1.8, 1.8, 1.8, 1.8

Model

ŷ = 100(1.8)^x

Step 1: Calculate ratios

180/100 = 1.8, 324/180 = 1.8, 583/324 ≈ 1.8, 1050/583 ≈ 1.8

Constant ratio ≈ 1.8 indicates exponential growth

Step 2: Write the exponential model

Initial value a = 100, growth factor b = 1.8

Model: ŷ = 100(1.8)^x

Step 3: Make prediction

After 6 months: ŷ(6) = 100(1.8)^6 = 100(34.01) ≈ 3401 units

Model: ŷ = 100(1.8)^x
Sales after 6 months: 3401 units

Final answer:

The exponential model ŷ = 100(1.8)^x best fits the data, predicting sales of approximately 3401 units after 6 months.

Applied rules:

• Constant ratios: Indicate exponential relationship

• Real-world context: Early product growth often follows exponential pattern

• Prediction accuracy: Consider market saturation for long-term projections
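
The ratio test and the prediction can be checked with a short sketch. It assumes the model ŷ = 100(1.8)^x from the solution; the function name model is introduced here.

```python
sales = [100, 180, 324, 583, 1050]

# Ratio of each value to the one before it
ratios = [later / earlier for earlier, later in zip(sales, sales[1:])]

def model(x):
    """Exponential model from Exercise 5: y-hat = 100 * 1.8**x."""
    return 100 * 1.8**x

fitted = [round(model(x)) for x in range(5)]   # compare with the recorded sales
prediction = model(6)                          # extrapolation two months past the data

print([round(r, 3) for r in ratios])
print(fitted)
print(round(prediction))
```

The prediction lies outside the observed range of x = 0 to 4, so it depends on growth continuing at the same rate.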

Detailed Summary: Regression Analysis Fundamentals

r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]

Correlation Coefficient Formula

Key definitions:

Regression: Statistical technique to find the line or curve that best fits a set of data points

Least squares method: Minimizes the sum of squared vertical distances between data points and fitted line

Residual: Observed value minus predicted value (e = y - ŷ)

Coefficient of determination: r² represents the proportion of variance in y explained by x

Regression Analysis Methodology:

Data preparation: Organize data points and check for outliers
Scatter plot: Visualize the relationship between variables
Model selection: Choose appropriate function type based on pattern
Parameter calculation: Use formulas or technology to find regression coefficients
Model evaluation: Assess fit using correlation coefficient and residual analysis
Application: Use the model for prediction and interpretation

Tip 1: Linear models: Look for straight-line pattern in scatter plot.

Tip 2: Quadratic models: Look for parabolic U-shape or inverted U-shape.

Tip 3: Exponential models: Look for rapid growth or decay that appears curved.

Tip 4: Always verify that predictions make sense in the real-world context.

Common errors: Misidentifying model type, ignoring outliers, extrapolating beyond data range, confusing correlation with causation.

Exam preparation: Practice identifying patterns, memorize formulas for correlation coefficient, understand interpretation of r and r².

Essential formulas to know:

• Linear regression: ŷ = ax + b

• Quadratic regression: ŷ = ax² + bx + c

• Exponential regression: ŷ = ab^x

• Correlation coefficient: r = Σ[(xᵢ-x̄)(yᵢ-ȳ)] / √[Σ(xᵢ-x̄)² · Σ(yᵢ-ȳ)²]

• Residual: e = y - ŷ

• Coefficient of determination: r² = (correlation coefficient)² for linear regression

Regression Model Comparison: Linear vs Quadratic vs Exponential

Exercise 6: Model Fitting Comparison

Compare regression models for the same dataset: (1, 3), (2, 7), (3, 13), (4, 21), (5, 31)

Analysis: The chart compares how well different regression models fit the same data.

Linear: Simplest model, may not capture curvature
Quadratic: Better fit for curved data
Exponential: Good for rapidly increasing data

Solved Exercises on Regression Analysis Fundamentals in Algebra 2

Questions & Answers