Simple Linear Regression
Let’s explore simple linear regression with a classic example: predicting a car’s stopping distance (\(y\)) based on its speed (\(x\)) using R’s cars dataset. Imagine we’re driving—the faster we go, the longer it takes to stop. This relationship isn’t perfect (other factors like road conditions matter too), but we can model the average trend. Simple linear regression does this by fitting a straight line through the data points. The line’s equation looks like \[\text{distance} = \text{intercept} + \text{slope} \times \text{speed} + \text{random error.}\] Here, the intercept is the predicted distance when speed is 0, and the slope tells us how much distance increases per additional mph. We estimate these by minimizing the squared prediction errors, called residuals.
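The least-squares estimates behind this can be sketched in a few lines. This is a hand-rolled Python illustration (not R's `lm()`), and the speed/distance pairs below are made up for demonstration, not the actual cars data:

```python
def fit_line(xs, ys):
    """Estimate intercept and slope by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x), computed from deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical (speed in mph, distance in feet) pairs for illustration:
speeds = [4, 7, 10, 15, 20, 24]
dists = [2, 13, 26, 54, 70, 93]
b0, b1 = fit_line(speeds, dists)
```

Minimizing the sum of squared residuals has a closed-form solution, which is what the covariance-over-variance ratio above computes.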
For the cars data, the fitted line is \(\text{distance} = -17.58 + 3.93 \times \text{speed}\). This means, on average, stopping distance increases by ~3.93 feet per mph. (The negative intercept has no physical meaning—no speed in the data is near 0—it simply positions the line over the observed range.) But before trusting this, we check whether the data “behaves” like a straight-line relationship: plot the points, see if they scatter evenly around the line, and make sure the residuals show no pattern (such as curves or funnels). If all looks good, we can estimate stopping distances for new speeds, like predicting ~61 feet at 20 mph. The key takeaway: regression helps us model trends while acknowledging real-world randomness!
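The 20 mph prediction is just the fitted line evaluated at that speed. A minimal Python sketch, plugging in the coefficients quoted above:

```python
# Coefficients taken from the fitted line in the text, not recomputed here.
intercept, slope = -17.58, 3.93

def predict_distance(speed_mph):
    """Predicted average stopping distance in feet at a given speed."""
    return intercept + slope * speed_mph

# -17.58 + 3.93 * 20 = 61.02, i.e. the "~61 feet at 20 mph" above
print(predict_distance(20))
```

Note the prediction is an average: an individual car at 20 mph may stop shorter or longer, which is exactly the random-error term in the model.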