What is a potential drawback of using a purely automated feature selection technique (like forward selection or backward elimination) without careful consideration?
It can sometimes overlook features that might be important in combination with others.
It completely eliminates the need for domain expertise in model building.
It guarantees the most interpretable model.
It can lead to models that are less accurate than using all available features.
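A minimal sketch of greedy forward selection using scikit-learn's SequentialFeatureSelector (the synthetic data and number of features are made-up assumptions for illustration). Because each step scores candidate features one at a time, features that only matter in combination with others can be passed over on other datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                        # 5 candidate features (synthetic)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Greedy forward selection: at each step, add the single feature whose
# inclusion most improves the cross-validated R^2 of the linear model.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))
```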
Which of the following is the general equation for a simple linear regression model?
y = b0 + b1*x + e
y = b0 * x^b1
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
y = e^(b0 + b1*x)
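A minimal sketch, with made-up coefficients, of fitting y = b0 + b1*x + e by ordinary least squares and recovering the intercept and slope (NumPy only).

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
e = rng.normal(scale=1.0, size=100)            # random error term
y = 2.0 + 0.7 * x + e                          # true b0 = 2.0, b1 = 0.7 (assumed for illustration)

# np.polyfit with degree 1 performs ordinary least squares for b0 + b1*x.
b1, b0 = np.polyfit(x, y, deg=1)               # coefficients returned highest degree first
print(f"estimated intercept b0 ~ {b0:.2f}, slope b1 ~ {b1:.2f}")
```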
Feature selection in linear regression primarily aims to:
Increase the number of features used for prediction
Improve model performance and generalization by focusing on the most relevant predictors
Ensure that all features have a statistically significant p-value
Make the model more complex and harder to interpret
Which of the following is NOT an assumption of linear regression?
Normality of residuals
Multicollinearity
Linearity
Homoscedasticity
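A hedged sketch of how two of the listed assumptions (normality of residuals, homoscedasticity) are often checked in practice with statsmodels and SciPy; the synthetic data and the choice of tests are illustrative assumptions, not the only options.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)       # synthetic data for illustration

X = sm.add_constant(x)                         # adds the intercept column
results = sm.OLS(y, X).fit()
resid = results.resid

# Normality of residuals: Shapiro-Wilk test (large p-value -> no evidence against normality).
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)

# Homoscedasticity: Breusch-Pagan test (large p-value -> no evidence of heteroscedasticity).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)
```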
Can the R-squared value be negative?
No, it is always positive.
No, it always ranges between 0 and 1.
Yes, if the model fits the data worse than a horizontal line.
Yes, if there is a perfect negative correlation between the variables.
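A quick numeric check of the claim above: R-squared is zero when the predictions are simply the mean of y (a horizontal line), and it goes negative as soon as the predictions do worse than that. The tiny arrays here are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Predicting the mean (a horizontal line) gives R^2 = 0.
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))   # 0.0

# Predictions worse than the mean give a negative R^2.
print(r2_score(y_true, np.array([5.0, 4.0, 3.0, 2.0, 1.0])))   # -3.0
```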
What distinguishes simple linear regression from multiple linear regression?
Simple linear regression has one independent variable, while multiple linear regression has two or more.
Simple linear regression uses a curved line, while multiple linear regression uses a straight line.
There is no difference; the terms are interchangeable.
Simple linear regression analyzes categorical data, while multiple linear regression analyzes numerical data.
If the coefficient of determination (R-squared) for a linear regression model is 0.8, what does this indicate?
There is a weak relationship between the independent and dependent variables.
The model is a poor fit for the data.
20% of the variation in the dependent variable is explained by the independent variable.
80% of the variation in the dependent variable is explained by the independent variable.
What is the primary goal of feature selection in linear regression?
Introduce bias into the model
Improve the model's interpretability and reduce overfitting
Maximize the number of features used in the model
Increase the complexity of the model
Backward elimination in linear regression involves removing features based on what criterion?
The feature that contributes the least to multicollinearity
The feature with the lowest p-value
The feature that results in the smallest decrease in model performance
The feature with the highest correlation with the target variable
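A hedged sketch of backward elimination using p-values as the removal criterion, one common variant of the procedure; the 0.05 threshold, the column names, and the synthetic data are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
y = 2.0 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=200)   # x3 and x4 are pure noise

features = list(X.columns)
while features:
    model = sm.OLS(y, sm.add_constant(X[features])).fit()
    pvalues = model.pvalues.drop("const")      # ignore the intercept term
    worst = pvalues.idxmax()                   # least significant remaining feature
    if pvalues[worst] <= 0.05:                 # stop once every feature is significant
        break
    features.remove(worst)                     # eliminate it and refit

print("retained features:", features)
```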
Which method is used in linear regression to estimate the model parameters that minimize the sum of squared errors?
Maximum Likelihood Estimation
Bayesian Estimation
Least Squares Estimation
Method of Moments
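A minimal sketch of least squares estimation with NumPy: the coefficients that minimize the sum of squared errors solve the normal equations (X^T X) b = X^T y. The design matrix and true coefficients below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept column + 2 predictors
true_b = np.array([1.0, 2.0, -0.5])                          # assumed true parameters
y = X @ true_b + rng.normal(scale=0.3, size=n)

# Solve the normal equations (X^T X) b = X^T y for the least squares estimate.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("least squares estimate:", b_hat)

# np.linalg.lstsq gives the same answer and is more numerically robust.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq estimate:        ", b_lstsq)
```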