Which of the following is NOT a benefit of feature selection in linear regression?
Reduced computational cost
Potential for better generalization to new data
Improved model interpretability
Increased risk of overfitting
How does the Mean Squared Error (MSE) penalize larger errors compared to smaller errors?
It takes the absolute value of the errors, ignoring the sign.
It doesn't; all errors are penalized equally.
It squares the errors, giving more weight to larger deviations.
It uses a logarithmic scale to compress larger errors.
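To see the effect of squaring, the minimal sketch below (assuming NumPy is installed; the helper names `mse` and `mae` and the data values are made up for illustration) compares two prediction sets with the same total absolute error. Mean Absolute Error scores them equally, while MSE penalizes the single large error four times as heavily.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(residuals ** 2))

def mae(y_true, y_pred):
    """Mean Absolute Error: the average of the absolute residuals (for comparison)."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(residuals)))

y_true = np.array([10.0, 10.0, 10.0, 10.0])
# Both prediction sets are off by 4 units in total (absolute terms)
spread_out = np.array([11.0, 9.0, 11.0, 9.0])  # four errors of size 1
one_big = np.array([10.0, 10.0, 10.0, 14.0])   # one error of size 4

print(mae(y_true, spread_out), mae(y_true, one_big))  # 1.0 1.0
print(mse(y_true, spread_out), mse(y_true, one_big))  # 1.0 4.0
```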
Why is normality of errors an important assumption in linear regression?
It is necessary for the calculation of the regression coefficients
It ensures the linearity of the relationship between variables
It validates the use of hypothesis testing for the model's coefficients
It guarantees the homoscedasticity of the errors
What is the purpose of the coefficient of determination (R-squared) in linear regression?
To determine the statistical significance of the model.
To identify the presence of outliers in the data.
To assess the linearity assumption of the model.
To measure the proportion of variation in the dependent variable explained by the independent variable(s).
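As a rough illustration of what R-squared measures, the sketch below (assuming NumPy; the toy data and the helper `r_squared` are hypothetical) fits a least-squares line and computes R-squared as one minus the ratio of unexplained variation to total variation. For simple linear regression this equals the squared correlation between x and y.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the model
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
y_hat = slope * x + intercept

print(round(r_squared(y, y_hat), 3))  # 0.6: the line explains 60% of the variation in y
```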
Feature selection in linear regression primarily aims to:
Ensure that all features have a statistically significant p-value
Improve model performance and generalization by focusing on the most relevant predictors
Make the model more complex and harder to interpret
Increase the number of features used for prediction
What does a correlation coefficient of 0 indicate?
No linear relationship
A strong negative linear relationship
A strong positive linear relationship
A perfect linear relationship
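The word "linear" matters here. In the small sketch below (assuming NumPy; the data are made up), y = x² is completely determined by x, yet the Pearson correlation over a symmetric range of x is 0, because the relationship has no linear component. An exact linear relationship, by contrast, gives a correlation of 1.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_linear = 3.0 * x + 1.0  # exact linear relationship
y_quadratic = x ** 2      # exact, but non-linear, relationship

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is corr(x, y)
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

# r_linear is 1 (up to floating-point rounding); r_quadratic is 0
print(r_linear, r_quadratic)
```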
Which visualization is commonly used for an initial assessment of the relationship between two continuous variables in linear regression?
Histogram
Scatter plot
Pie chart
Bar chart
Which of the following situations might make feature selection particularly important?
Having a small dataset with a very large number of features
Having a very large dataset with only a few features
When computational resources are unlimited
When all features are highly correlated with the target variable
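The small-data, many-features case can be sketched as follows (assuming NumPy; the sizes, random seed, and pure-noise data are arbitrary choices for illustration). With more features than observations, X has rank at most n_samples, so XᵀX is singular and the coefficients are not uniquely determined. A least-squares fit can also drive the training error to essentially zero even when the target is pure noise, which is the overfitting that feature selection helps prevent.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 50  # far more features than observations
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)  # pure noise: no real signal in X

# rank(X^T X) = rank(X) <= n_samples, so the 50x50 matrix X^T X is singular
rank = np.linalg.matrix_rank(X)
print(rank, "of", n_features)

# lstsq returns the minimum-norm solution, which fits the noise perfectly
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
train_mse = np.mean((X @ beta - y) ** 2)
print(train_mse)  # essentially zero: the model has memorized the noise
```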
Which Python library is primarily used for numerical computing and provides the array and matrix support essential for linear regression calculations?
NumPy
matplotlib
pandas
scikit-learn
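As a brief sketch of why array support matters (assuming NumPy; the feature values and coefficients are hypothetical), a linear regression model's predictions for every observation can be computed with a single matrix-vector product once a column of ones is added to the design matrix for the intercept.

```python
import numpy as np

# Hypothetical data: four observations of two predictors
features = np.array([[1.0, 2.0],
                     [2.0, 0.0],
                     [3.0, 1.0],
                     [4.0, 3.0]])

# Prepend a column of ones so the intercept is treated as another coefficient
X = np.column_stack([np.ones(len(features)), features])
beta = np.array([0.5, 2.0, -1.0])  # hypothetical [intercept, b1, b2]

y_hat = X @ beta  # one prediction per row of X
print(X.shape, y_hat)  # (4, 3) [0.5 4.5 5.5 5.5]
```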
Which method is used in linear regression to estimate the model parameters by minimizing the sum of squared errors?
Maximum Likelihood Estimation
Least Squares Estimation
Method of Moments
Bayesian Estimation
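A minimal least-squares sketch (assuming NumPy; the data points are made up and only loosely follow y = 1 + 2x) estimates the intercept and slope in two ways: by solving the normal equations (XᵀX)β = Xᵀy directly, and with `np.linalg.lstsq`, which is generally the numerically safer choice because it avoids forming XᵀX. Both give the coefficients that minimize the sum of squared errors.

```python
import numpy as np

# Made-up data loosely following y = 1 + 2x, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column

# Least squares via the normal equations: (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimate from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)  # both approximately [1.04 1.99]
```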