Which of these is NOT a recommended approach for dealing with outliers in linear regression?
Automatically removing all outliers without investigation
Investigating the cause of outliers and correcting data errors where possible
Transforming the data to reduce the outliers' influence
Using robust regression methods less sensitive to outliers
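To illustrate the robust-regression option: Huber loss (the basis of Huber regression) is quadratic for small residuals but linear for large ones, so an outlier contributes far less to the fit than under squared error. A minimal sketch in plain Python (the delta and residual values are illustrative):

```python
def squared_loss(r):
    return r * r

def huber_loss(r, delta=1.0):
    # quadratic near zero, linear in the tails
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

# An outlier residual of 10 contributes 100 to squared loss
# but only 9.5 to Huber loss (delta = 1).
print(squared_loss(10), huber_loss(10))
```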
What does the adjusted R-squared value tell you in multiple linear regression?
The statistical significance of the overall model.
The presence of outliers in the data.
The accuracy of the model's predictions.
The proportion of variance in the outcome explained by the predictors, adjusted for the number of predictors in the model.
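The adjustment in the correct option can be written out directly: adjusted R-squared rescales (1 - R^2) by (n - 1)/(n - p - 1), penalizing extra predictors. A sketch with made-up example values:

```python
def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, p):
    # p = number of predictors, n = number of observations;
    # the (n - 1) / (n - p - 1) factor penalizes extra predictors
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)

y = [1, 2, 3, 4, 5]
y_hat = [1.1, 1.9, 3.2, 3.9, 5.1]
# With p = 2 predictors, adjusted R-squared comes out below plain R-squared.
print(r_squared(y, y_hat), adjusted_r_squared(y, y_hat, 2))
```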
How does Lasso Regression differ from Ridge Regression in terms of feature selection?
Neither Lasso nor Ridge Regression performs feature selection; they only shrink coefficients.
Lasso Regression can shrink coefficients to exactly zero, effectively performing feature selection.
Ridge Regression tends to shrink all coefficients towards zero but rarely sets them exactly to zero.
Both Lasso and Ridge Regression can shrink coefficients to zero, but Lasso does it more aggressively.
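The contrast in the correct options has a clean closed form in the special case of an orthonormal design: ridge rescales each OLS coefficient, while lasso soft-thresholds it, which is exactly how coefficients reach zero. A sketch (the coefficient and lambda values are arbitrary):

```python
def ridge_coef(beta_ols, lam):
    # ridge shrinks every coefficient but never to exactly zero
    return beta_ols / (1 + lam)

def lasso_coef(beta_ols, lam):
    # soft-thresholding: exactly zero when |beta_ols| <= lam
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0

print(ridge_coef(0.3, 0.5))  # shrunk but nonzero
print(lasso_coef(0.3, 0.5))  # exactly zero -> feature dropped
```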
What is a potential drawback of removing a highly correlated independent variable to deal with multicollinearity?
It may result in a loss of valuable information and reduce the model's accuracy.
It may improve the model's overall fit but reduce its interpretability.
It may lead to an increase in the model's complexity.
It has no drawbacks and is always the best solution.
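A common way to gauge how redundant a predictor is before dropping it is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 - r^2), where r is their correlation; a sketch with made-up data:

```python
def pearson_r(x, z):
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    vx = sum((a - mx) ** 2 for a in x)
    vz = sum((b - mz) ** 2 for b in z)
    return cov / (vx * vz) ** 0.5

def vif_two_predictors(x, z):
    # VIF = 1 / (1 - R^2); with two predictors R^2 is just r^2
    r = pearson_r(x, z)
    return 1 / (1 - r * r)

# Two nearly collinear predictors: VIF well above the usual cutoff of 10
print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 6, 8, 11]))
```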
Elastic Net Regression combines the penalties of which two regularization techniques?
Linear Regression and Ridge Regression
Ridge Regression and Polynomial Regression
Lasso Regression and Polynomial Regression
Lasso Regression and Ridge Regression
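The combined penalty can be written out directly. One common parameterization (the one scikit-learn's ElasticNet uses, with mixing weight l1_ratio) looks like this sketch:

```python
def elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5):
    # l1_ratio = 1.0 recovers the lasso penalty,
    # l1_ratio = 0.0 recovers the ridge penalty
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return alpha * (l1_ratio * l1 + 0.5 * (1 - l1_ratio) * l2)

# Pure-lasso and pure-ridge extremes for coefficients [1, -2]:
print(elastic_net_penalty([1.0, -2.0], l1_ratio=1.0))  # 3.0
print(elastic_net_penalty([1.0, -2.0], l1_ratio=0.0))  # 2.5
```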
Which robust regression technique is particularly well-suited for handling datasets with a high proportion of outliers?
Theil-Sen estimator
RANSAC (Random Sample Consensus)
Ordinary Least Squares (OLS) regression
Huber regression
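To make the RANSAC option concrete, here is a minimal line-fitting sketch in plain Python (the point set, tolerance, and iteration count are illustrative): it fits a line to random minimal samples, counts inliers for each, and refits on the largest inlier set, so even a large fraction of gross outliers is simply voted out.

```python
import random

def fit_line(pts):
    # least-squares line through points [(x, y), ...]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, y in pts)
    slope = num / den
    return slope, my - slope * mx

def ransac_line(pts, n_iter=200, tol=0.5, seed=0):
    # keep the minimal-sample model with the most inliers,
    # then refit on that inlier set
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        sample = rng.sample(pts, 2)
        if sample[0][0] == sample[1][0]:
            continue  # vertical pair, skip
        m, b = fit_line(sample)
        inliers = [p for p in pts if abs(p[1] - (m * p[0] + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line(best_inliers)

# 8 points on y = 2x plus 4 gross outliers (1/3 of the data)
pts = [(x, 2 * x) for x in range(8)] + [(1, 30), (2, -25), (5, 40), (6, -35)]
m, b = ransac_line(pts)
print(m, b)  # recovers the underlying line despite the outliers
```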
You are evaluating a regression model whose errors are skewed, with a few extreme values. Which metric would be a more appropriate measure of model performance?
RMSE, as it is less sensitive to skewed distributions.
MAE, as it is less influenced by extreme values in a skewed distribution.
Adjusted R-squared, as it is not affected by the distribution of errors.
R-squared, as it provides a standardized measure of fit.
Which metric penalizes large errors more heavily than smaller errors, making it particularly sensitive to outliers?
Root Mean Squared Error (RMSE)
Adjusted R-squared
R-squared
Mean Absolute Error (MAE)
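The two metric questions above hinge on the same fact: squaring the errors makes RMSE grow much faster than MAE when an extreme error appears. A quick sketch with made-up residuals:

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

clean = [1, -1, 1, -1]
with_outlier = [1, -1, 1, -9]
# One extreme residual moves MAE from 1.0 to 3.0,
# while RMSE jumps from 1.0 to sqrt(21), about 4.58.
print(mae(clean), rmse(clean))
print(mae(with_outlier), rmse(with_outlier))
```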
The Theil-Sen estimator is known for its robustness and non-parametric nature. What does 'non-parametric' imply in this context?
It does not have any parameters that need to be estimated from the data
It does not require a linear relationship between the variables
It does not require a dependent variable for model fitting
It does not require assumptions about the distribution of the data
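"Non-parametric" here concerns distributional assumptions, not an absence of fitted parameters: the Theil-Sen slope is the median of all pairwise slopes, so no normality (or any other error distribution) is assumed. A minimal sketch with an illustrative data set:

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    # slope = median of slopes over all point pairs; the median
    # makes no distributional assumption and resists outliers
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]
    m = median(slopes)
    b = median(y - m * x for x, y in zip(xs, ys))
    return m, b

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 2, 4, 6, 8, 100]   # last point is a gross outlier
m, b = theil_sen(xs, ys)
print(m, b)  # the outlier barely moves the median-based fit
```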
You are comparing two linear regression models for predicting house prices. Model A has a lower RMSE than Model B. What does this imply about their predictive performance?
Model A is guaranteed to make better predictions on all new data points.
Model A has a higher R-squared value than Model B.
Model A, on average, has smaller prediction errors than Model B.
Model B is definitely overfitting the data.