Feature selection in linear regression primarily aims to:
A. Increase the number of features used for prediction
B. Ensure that all features have a statistically significant p-value
C. Improve model performance and generalization by focusing on the most relevant predictors
D. Make the model more complex and harder to interpret
What does the assumption of independence in linear regression refer to?
A. Independence between the observations
B. Independence between the coefficients of the regression model
C. Independence between the independent and dependent variables
D. Independence between the errors and the dependent variable
What does a high R-squared value indicate?
A. The independent variables are not correlated with the dependent variable.
B. The model is a perfect fit for the data.
C. The model is not a good fit for the data.
D. A large proportion of the variance in the dependent variable is explained by the independent variables.
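R-squared can be computed directly from the residuals; a minimal sketch using made-up illustrative values for the observations and fitted predictions:

```python
import numpy as np

# Illustrative values only: observed y and predictions from some fitted line.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot        # fraction of variance explained
print(r_squared)  # ~0.99: about 99% of the variance in y is explained
```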
In the context of linear regression, what is an error term?
A. The difference between the observed value of the dependent variable and the predicted value.
B. The variation in the independent variable.
C. The difference between the slope and the intercept of the regression line.
D. A mistake made in collecting or entering data.
If a Durbin-Watson test statistic is close to 2, what does it suggest about the residuals?
A. They are homoscedastic
B. They exhibit a linear pattern
C. They are independent
D. They are normally distributed
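The Durbin-Watson statistic compares successive residuals: values near 2 suggest little first-order autocorrelation. A minimal sketch with hypothetical residuals:

```python
import numpy as np

# Hypothetical residuals from a fitted regression (illustrative values).
resid = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3])

# DW = sum of squared successive differences / sum of squared residuals.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(dw)  # ~2.41 here: close to 2, so little evidence of autocorrelation
```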
What is the purpose of the coefficient of determination (R-squared) in linear regression?
A. To determine the statistical significance of the model.
B. To measure the proportion of variation in the dependent variable explained by the independent variable(s).
C. To assess the linearity assumption of the model.
D. To identify the presence of outliers in the data.
How does the Mean Squared Error (MSE) penalize larger errors compared to smaller errors?
A. It uses a logarithmic scale to compress larger errors.
B. It doesn't; all errors are penalized equally.
C. It squares the errors, giving more weight to larger deviations.
D. It takes the absolute value of the errors, ignoring the sign.
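The squaring step is what makes MSE sensitive to large deviations; a small sketch contrasting it with the mean absolute error, using illustrative numbers:

```python
import numpy as np

errors = np.array([1.0, 1.0, 4.0])   # one error is four times the others

mae = np.mean(np.abs(errors))        # (1 + 1 + 4) / 3 = 2.0
mse = np.mean(errors ** 2)           # (1 + 1 + 16) / 3 = 6.0
# Under MSE the single large error contributes 16/18 of the total penalty,
# versus 4/6 under MAE: squaring weights larger deviations more heavily.
```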
Which of the following situations might make feature selection particularly important?
A. Having a very large dataset with only a few features
B. Having a small dataset with a very large number of features
C. When all features are highly correlated with the target variable
D. When computational resources are unlimited
Which of the following is NOT a benefit of feature selection in linear regression?
A. Reduced computational cost
B. Improved model interpretability
C. Potential for better generalization to new data
D. Increased risk of overfitting
If the coefficient of determination (R-squared) for a linear regression model is 0.8, what does this indicate?
A. There is a weak relationship between the independent and dependent variables.
B. 80% of the variation in the dependent variable is explained by the independent variable.
C. The model is a poor fit for the data.
D. 20% of the variation in the dependent variable is explained by the independent variable.