What is the purpose of splitting the dataset into training and testing sets in Linear Regression?
To reduce the dimensionality of the data.
To handle missing values in the dataset.
To evaluate the model's performance on unseen data.
To visualize the relationship between variables.
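For reference, a minimal sketch of a train/test split using scikit-learn on synthetic data (the dataset and the 80/20 ratio here are illustrative, not from the quiz):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=100)

# Hold out 20% of the rows so the model is scored on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))  # performance on unseen data
```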
What is the main difference between forward selection and backward elimination in linear regression?
Forward selection is used for classification, while backward elimination is used for regression.
There is no difference; both techniques achieve the same outcome.
Forward selection starts with no features and adds one by one, while backward elimination starts with all features and removes one by one.
Forward selection starts with all features and removes one by one, while backward elimination starts with no features and adds one by one.
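As a sketch of the contrast, scikit-learn's SequentialFeatureSelector implements both directions; the dataset and feature counts below are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic regression problem with 8 candidate features, 3 informative
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

# Forward: start with no features, greedily add the most helpful one
fwd = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward").fit(X, y)
# Backward: start with all features, greedily remove the least helpful one
bwd = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="backward").fit(X, y)

print("forward keeps: ", fwd.get_support(indices=True))
print("backward keeps:", bwd.get_support(indices=True))
```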
In the context of linear regression, what is an error term?
The variation in the independent variable.
The difference between the slope and the intercept of the regression line.
The difference between the observed value of the dependent variable and the predicted value.
A mistake made in collecting or entering data.
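A minimal sketch of computing the error terms (residuals) for a fitted model, on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y = 2 + 0.5x + noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 + 0.5 * X.ravel() + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)  # observed minus predicted, per observation
print(residuals[:5])
```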
What does the assumption of independence in linear regression refer to?
Independence between the coefficients of the regression model
Independence between the errors and the dependent variable
Independence between the independent and dependent variables
Independence between the observations
If a Durbin-Watson test statistic is close to 2, what does it suggest about the residuals?
They are normally distributed
They are homoscedastic
They exhibit a linear pattern
They are independent
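A sketch of checking independence of the errors with the Durbin-Watson statistic from statsmodels, on synthetic data; a value near 2 indicates no first-order autocorrelation in the residuals:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Illustrative data with independent errors
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=100)

model = sm.OLS(y, sm.add_constant(x)).fit()

# ~2 -> no first-order autocorrelation; toward 0 -> positive
# autocorrelation; toward 4 -> negative autocorrelation
print(f"Durbin-Watson: {durbin_watson(model.resid):.2f}")
```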
Feature selection in linear regression primarily aims to:
Make the model more complex and harder to interpret
Ensure that all features have a statistically significant p-value
Increase the number of features used for prediction
Improve model performance and generalization by focusing on the most relevant predictors
Backward elimination in linear regression involves removing features based on what criterion?
The feature with the highest correlation with the target variable
The feature whose removal results in the smallest decrease in model performance
The feature that contributes the least to multicollinearity
The feature with the lowest p-value
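As context, one common textbook variant refits the model repeatedly and drops the predictor with the highest p-value until every survivor is significant. The helper and its threshold below are a hypothetical sketch, not a library routine:

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Hypothetical sketch of p-value-based backward elimination:
    refit OLS, drop the predictor with the highest p-value, and
    repeat until every remaining p-value is below alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = model.pvalues[1:]        # skip the intercept's p-value
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            return cols, model           # all survivors are significant
        cols.pop(worst)
    return cols, None
```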
Which of the following is the general equation for a simple linear regression model?
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
y = b0 * x^b1
y = e^(b0 + b1*x)
y = b0 + b1*x + e
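For reference, fitting the simple form y = b0 + b1*x + e by least squares on illustrative data (np.polyfit returns the slope first):

```python
import numpy as np

# Illustrative data generated from y = 4 + 1.5x + noise
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=60)
y = 4.0 + 1.5 * x + rng.normal(0, 2, size=60)

b1, b0 = np.polyfit(x, y, deg=1)  # least-squares slope and intercept
print(f"intercept b0 ~ {b0:.2f}, slope b1 ~ {b1:.2f}")
```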
Why is a residual plot useful in evaluating a linear regression model?
To predict future values of the dependent variable.
To check for non-linearity and other violations of the linear regression assumptions.
To calculate the R-squared value.
To determine the slope of the regression line.
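A minimal sketch of a residuals-vs-fitted plot with matplotlib, on synthetic data; random scatter around zero is the healthy pattern:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Illustrative data: y = 1 + 2x + noise
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(80, 1))
y = 1.0 + 2.0 * X.ravel() + rng.normal(0, 1, size=80)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)

# Curvature in this plot suggests non-linearity; a funnel shape
# suggests heteroscedasticity (non-constant error variance)
plt.scatter(fitted, y - fitted)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```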
What does a high R-squared value indicate?
The independent variables are not correlated with the dependent variable.
A large proportion of the variance in the dependent variable is explained by the independent variables.
The model is a perfect fit for the data.
The model is not a good fit for the data.
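For reference, R-squared computed directly with scikit-learn on made-up observations and predictions:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical observed values and model predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])

# R^2 = 1 - SS_res/SS_tot: fraction of variance in y explained by the model
print("R^2:", r2_score(y_true, y_pred))
```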