Which robust regression technique is particularly well-suited for handling datasets with a high proportion of outliers?
Theil-Sen estimator
RANSAC (Random Sample Consensus)
Ordinary Least Squares (OLS) regression
Huber regression
Why is evaluating the model on a separate test set crucial in Polynomial Regression?
To estimate the model's performance on unseen data and assess its generalization ability.
To calculate the model's complexity and determine the optimal degree of the polynomial.
To visualize the residuals and check for any non-linear patterns.
To fine-tune the model's hyperparameters and improve its fit on the training data.
Why might centering or scaling independent variables be insufficient to completely resolve multicollinearity?
It doesn't address the fundamental issue of high correlations between the variables.
It requires a large sample size to be effective.
It only works for linear relationships between variables.
It can make the model more complex and harder to interpret.
What is the primary role of a link function in a Generalized Linear Model?
It transforms the predictor variables to follow a normal distribution.
It establishes a connection between the linear predictor and the mean of the response variable.
It calculates the residuals between the observed and predicted values.
It determines the optimal number of predictor variables to include in the model.
What does a Variance Inflation Factor (VIF) value greater than 10 generally suggest?
Perfect multicollinearity
No multicollinearity
Heteroscedasticity
Severe multicollinearity
How do polynomial features help in capturing non-linear relationships in data?
They introduce non-linear terms, allowing the model to fit curved relationships.
They convert categorical variables into numerical variables.
They reduce the impact of outliers on the regression line.
They make the model less complex and easier to interpret.
Which of the following is a potential drawback of using robust regression methods?
They are not applicable to datasets with categorical variables
They can be computationally more expensive than OLS regression
They always result in models with lower predictive accuracy than OLS regression
They always require data normalization before model fitting
What does heteroscedasticity in a residual plot typically look like?
A U-shape or inverted U-shape
A straight line with non-zero slope
A funnel shape, widening or narrowing along the x-axis
A random scattering of points
Which of these is NOT a recommended approach for dealing with outliers in linear regression?
Transforming the data to reduce the outlier's influence
Investigating the cause of the outlier and correcting errors if possible
Automatically removing all outliers without investigation
Using robust regression methods less sensitive to outliers
Which metric is in the same units as the dependent variable, making it easier to interpret directly?
MAE
RMSE
Adjusted R-squared
R-squared