How do polynomial features help in capturing non-linear relationships in data?
They make the model less complex and easier to interpret.
They convert categorical variables into numerical variables.
They introduce non-linear terms, allowing the model to fit curved relationships.
They reduce the impact of outliers on the regression line.
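The idea behind the correct option can be sketched in plain NumPy on made-up data: adding an x² column to the design matrix lets ordinary least squares recover a purely quadratic relationship that a straight line cannot fit.

```python
import numpy as np

# Hypothetical data: y depends on x quadratically, not linearly.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Linear design matrix vs. one augmented with a squared (polynomial) term.
X_lin = np.column_stack([np.ones_like(x), x])
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])

coef_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
coef_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

# Residual sums of squares: the polynomial design fits this data exactly,
# while the best straight line cannot capture the curvature.
rss_lin = np.sum((y - X_lin @ coef_lin) ** 2)
rss_poly = np.sum((y - X_poly @ coef_poly) ** 2)
```

The model is still linear in its coefficients; only the features are non-linear, which is why ordinary least squares can fit it unchanged.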
Which of the following is NOT a valid approach to address multicollinearity?
Transforming the independent variables (e.g., using principal component analysis)
Removing one or more of the highly correlated independent variables
Increasing the sample size of the dataset
Centering or scaling the independent variables
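One standard way to *detect* the multicollinearity these options address is the variance inflation factor (VIF): regress each predictor on the others and compute 1/(1−R²). A minimal sketch on simulated data, where x2 is constructed to be nearly collinear with x1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """Variance inflation factor: regress column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)
```

A common rule of thumb treats VIF above 5–10 as problematic; here the collinear column scores far above that threshold while the independent one stays near 1.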
What does the adjusted R-squared value tell you in multiple linear regression?
The statistical significance of the overall model.
The presence of outliers in the data.
The proportion of variance in the outcome explained by the predictors, adjusted for the number of predictors in the model.
The accuracy of the model's predictions.
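The adjustment in the correct option has a closed form: with n observations and p predictors, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), which penalises models for adding predictors that do not pull their weight.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: n observations, p predictors (excluding intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same raw R-squared, more predictors -> lower adjusted value.
few = adjusted_r2(0.90, n=100, p=5)
many = adjusted_r2(0.90, n=100, p=10)
```

Unlike plain R², this quantity can decrease (or even go negative) when extra predictors add noise rather than explanatory power.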
Poisson regression, another type of GLM, is particularly well-suited for analyzing which kind of data?
Continuous measurements
Count data of rare events
Proportions or percentages
Ordinal data with a specific order
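To make the count-data option concrete, here is a minimal sketch (with made-up coefficients and simulated counts) of fitting a Poisson GLM with a log link by Fisher scoring, rather than calling a library routine:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, size=n)
true_beta = np.array([0.5, 1.2])            # intercept and slope on the log scale
y = rng.poisson(np.exp(true_beta[0] + true_beta[1] * x))  # simulated counts

X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):                          # Fisher scoring iterations
    mu = np.exp(X @ beta)                    # log link: mean = exp(eta)
    grad = X.T @ (y - mu)                    # score vector
    info = X.T @ (X * mu[:, None])           # Fisher information (Poisson var = mean)
    beta = beta + np.linalg.solve(info, grad)
```

The log link keeps the fitted mean positive, which is what makes this family a natural match for counts of events.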
How does stepwise selection work in feature selection?
It transforms the original features into a lower-dimensional space while preserving important information.
It ranks features based on their correlation with the target variable and selects the top-k features.
It iteratively adds or removes features based on a statistical criterion, aiming to find the best subset.
It uses L1 or L2 regularization to shrink irrelevant feature coefficients to zero.
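The iterative add/remove procedure in the correct option can be sketched as greedy forward selection on simulated data, using residual sum of squares as the (simplified) statistical criterion; only columns 0 and 3 truly drive the response here:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

def rss(cols):
    """Residual sum of squares of OLS on the given columns (plus intercept)."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return r @ r

selected, remaining = [], list(range(p))
for _ in range(2):                  # greedily add the best feature, twice
    best = min(remaining, key=lambda c: rss(selected + [c]))
    selected.append(best)
    remaining.remove(best)
```

Real stepwise procedures use criteria such as AIC, BIC, or partial F-tests instead of raw RSS, and may also remove previously added features at each step; the greedy structure is the same.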
What advantage does polynomial regression offer over simple linear regression when dealing with non-linear relationships between variables?
It introduces polynomial terms, enabling the model to fit curved relationships in the data.
It always results in a better fit regardless of the data distribution.
It simplifies the model, making it easier to interpret.
It reduces the need for feature scaling.
What is a common consequence of autocorrelation in linear regression?
Heteroscedasticity
Biased coefficient estimates
Inflated standard errors of coefficients
Reduced model fit
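A standard diagnostic for the autocorrelation this question concerns is the Durbin-Watson statistic on the residuals: values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation, which makes the usual OLS standard-error formulas unreliable. A sketch on simulated residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences / SSR."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
white = rng.normal(size=5000)               # independent residuals
ar1 = np.empty(5000)                        # AR(1) residuals with rho = 0.8
ar1[0] = white[0]
for t in range(1, 5000):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]
```

For an AR(1) process the statistic is approximately 2(1 − ρ), so the autocorrelated series lands near 0.4 while the white-noise series stays near 2.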
Which of the following is a method for detecting outliers in linear regression?
Leverage values
Residual plots
Cook's distance
All of the above
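All three diagnostics listed here come from the same ingredients. A minimal sketch on made-up data with one injected outlier, computing leverage from the hat matrix and Cook's distance from leverage plus residuals:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
y[10] += 8.0                                 # inject one gross outlier

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix
h = np.diag(H)                               # leverage values
resid = y - H @ y                            # residuals (H @ y = fitted values)
p = X.shape[1]
s2 = resid @ resid / (n - p)                 # residual variance estimate

# Cook's distance: combines residual size with leverage.
cooks = (resid ** 2 / (p * s2)) * h / (1 - h) ** 2
```

The injected point dominates the Cook's distance values, which is exactly what a residual plot would also reveal visually.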
The performance of the Theil-Sen estimator can be sensitive to which characteristic of the data?
The presence of categorical variables
The presence of heteroscedasticity (unequal variances of errors)
The presence of multicollinearity (high correlation between independent variables)
The non-normality of the residuals
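For context on this question: the Theil-Sen estimator is the median of all pairwise slopes, which makes it robust to outliers in y, though other data characteristics can still degrade it. A small sketch on made-up data with one gross outlier, compared against ordinary least squares:

```python
import numpy as np
from itertools import combinations

def theil_sen_slope(x, y):
    """Median of all pairwise slopes (skipping pairs with equal x)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return np.median(slopes)

x = np.arange(10.0)
y = 3.0 * x + 1.0
y[9] += 50.0                                 # one gross outlier

robust_slope = theil_sen_slope(x, y)         # stays at the true slope, 3
ols_slope = np.polyfit(x, y, 1)[0]           # pulled upward by the outlier
```

With 45 pairwise slopes and only 9 of them contaminated, the median is untouched, while the single outlier shifts the OLS slope substantially.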
You are working with a dataset that has a skewed distribution of errors. Which metric would be a more appropriate measure of model performance?
Adjusted R-squared, as it is not affected by the distribution of errors.
MAE, as it is less influenced by extreme values in a skewed distribution.
RMSE, as it is less sensitive to skewed distributions.
R-squared, as it provides a standardized measure of fit.
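The contrast between the two error metrics in these options is easy to see numerically: on a hypothetical error vector with one extreme value, squaring inside RMSE lets that single error dominate, while MAE weights all errors linearly.

```python
import numpy as np

errors = np.array([0.5, -0.4, 0.3, -0.6, 12.0])   # one extreme error
mae = np.mean(np.abs(errors))                      # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))               # root mean squared error
```

Here MAE is 2.76 while RMSE is roughly 5.38, almost double, driven almost entirely by the single large error.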