Who is credited with developing the foundational principles of linear regression?
Sir Francis Galton
Albert Einstein
Marie Curie
Isaac Newton
What is the main difference between forward selection and backward elimination in linear regression?
Forward selection is used for classification, while backward elimination is used for regression.
Forward selection starts with no features and adds one by one, while backward elimination starts with all features and removes one by one.
There is no difference; both techniques achieve the same outcome.
Forward selection starts with all features and removes one by one, while backward elimination starts with no features and adds one by one.
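As a side-by-side illustration of the two strategies, here is a minimal sketch using scikit-learn's SequentialFeatureSelector; the diabetes toy dataset and the choice to keep four features are assumptions made purely for demonstration.

```python
# Contrast the two directions with scikit-learn's SequentialFeatureSelector.
# The diabetes toy dataset and n_features_to_select=4 are illustrative choices.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Forward: start from an empty set and greedily add features.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
).fit(X, y)

# Backward: start from the full set and greedily remove features.
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="backward"
).fit(X, y)

print("forward keeps: ", forward.get_support(indices=True))
print("backward keeps:", backward.get_support(indices=True))
```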
In the context of linear regression, what is an error term?
The difference between the observed value of the dependent variable and the predicted value.
The variation in the independent variable.
The difference between the slope and the intercept of the regression line.
A mistake made in collecting or entering data.
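In symbols, the error (residual) for observation i is e_i = y_i - ŷ_i. A tiny sketch with made-up numbers:

```python
# Residuals with made-up numbers: e_i = y_i - y_hat_i.
import numpy as np

y_observed = np.array([3.0, 5.0, 7.5, 9.0])   # observed dependent variable
y_predicted = np.array([2.8, 5.4, 7.1, 9.3])  # predictions from the fitted line

residuals = y_observed - y_predicted
print(residuals)  # [ 0.2 -0.4  0.4 -0.3]
```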
Which of the following is NOT an assumption of linear regression?
Homoscedasticity
Multicollinearity
Linearity
Normality of residuals
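As a practical aside, two of these assumptions can be eyeballed or tested on the fitted residuals. The sketch below uses synthetic data and NumPy's polyfit as a stand-in for any fitted line.

```python
# Check two assumptions on synthetic data: homoscedasticity (residual plot)
# and normality of residuals (Shapiro-Wilk test).
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=80)
y = 2.0 * x + 5.0 + rng.normal(0, 1, size=80)

slope, intercept = np.polyfit(x, y, 1)   # fitted line (any fit would do)
fitted = slope * x + intercept
residuals = y - fitted

plt.scatter(fitted, residuals)           # look for a constant vertical spread
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()

print(stats.shapiro(residuals))          # small p-value would flag non-normality
```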
Which Python library is primarily used for numerical computing and provides support for arrays and matrices, essential for Linear Regression calculations?
matplotlib
scikit-learn
pandas
NumPy
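For context, a minimal sketch of NumPy doing the underlying linear algebra: arrays carry the data and a least-squares solve fits the line. The synthetic data and the true coefficients 2 and 1 are assumptions for illustration.

```python
# NumPy arrays carry the data; NumPy's linear algebra does the fitting.
# Synthetic data generated from y = 2x + 1 plus noise.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # roughly [1.0, 2.0]: (intercept, slope)
```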
In forward selection, what criterion is typically used to decide which feature to add at each step?
The feature that is least correlated with the other features
The feature with the highest p-value
The feature that results in the smallest increase in R-squared
The feature that results in the largest improvement in model performance
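To make the criterion concrete, here is a hand-rolled sketch of a single forward-selection step that scores each remaining candidate by cross-validated R² and adds the best one; the diabetes dataset, the pretend already-selected feature, and five-fold CV are illustrative assumptions.

```python
# One forward-selection step: score each remaining candidate by cross-validated
# R^2 and add the feature giving the largest improvement. The dataset, the
# pretend already-selected feature, and 5-fold CV are illustrative choices.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
selected = [2]  # pretend feature 2 was added in an earlier step
candidates = [i for i in range(X.shape[1]) if i not in selected]

scores = {
    i: cross_val_score(LinearRegression(), X[:, selected + [i]], y,
                       scoring="r2", cv=5).mean()
    for i in candidates
}
best = max(scores, key=scores.get)
print(f"add feature {best} (CV R^2 = {scores[best]:.3f})")
```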
How does the Mean Squared Error (MSE) penalize larger errors compared to smaller errors?
It doesn't; all errors are penalized equally.
It squares the errors, giving more weight to larger deviations.
It uses a logarithmic scale to compress larger errors.
It takes the absolute value of the errors, ignoring the sign.
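A quick worked example: squaring turns errors of 1, 2, and 4 into contributions of 1, 4, and 16, so the largest error dominates the mean.

```python
# Squaring magnifies large deviations: an error of 4 contributes 16 times as
# much to the MSE as an error of 1.
import numpy as np

errors = np.array([1.0, 2.0, 4.0])
print(errors ** 2)           # [ 1.  4. 16.]
print(np.mean(errors ** 2))  # MSE = 7.0
```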
Why is normality of errors an important assumption in linear regression?
It is necessary for the calculation of the regression coefficients.
It guarantees the homoscedasticity of the errors.
It ensures the linearity of the relationship between variables.
It validates the use of hypothesis testing for the model's coefficients.
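For illustration, the coefficient t-tests and p-values reported by an ordinary least squares fit, shown here with statsmodels on synthetic data, are the hypothesis tests that lean on the normality assumption.

```python
# The t-tests and p-values in an OLS summary assume approximately normal
# errors; that is what "validates hypothesis testing". Synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())  # coefficient t-tests rest on the normality assumption
```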
What is the purpose of splitting the dataset into training and testing sets in Linear Regression?
To handle missing values in the dataset.
To reduce the dimensionality of the data.
To evaluate the model's performance on unseen data.
To visualize the relationship between variables.
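A minimal sketch of the usual workflow: fit on the training split, score on the held-out test split. The 80/20 split and synthetic data are assumptions for illustration.

```python
# Fit on the training split, evaluate on the held-out test split.
# The 80/20 split and synthetic data are illustrative choices.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 1))
y = 1.5 * X[:, 0] + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on unseen data:", model.score(X_test, y_test))
```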
What type of visualization tool is commonly used to initially assess the relationship between two continuous variables in linear regression?
Scatter plot
Bar chart
Histogram
Pie chart
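For reference, a minimal matplotlib sketch of that first-look scatter plot, using synthetic data with a roughly linear trend.

```python
# A scatter plot is the usual first look at two continuous variables.
# Synthetic data with a roughly linear trend.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=60)
y = 0.8 * x + rng.normal(0, 1.5, size=60)

plt.scatter(x, y)
plt.xlabel("independent variable x")
plt.ylabel("dependent variable y")
plt.title("Initial check for a linear relationship")
plt.show()
```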