What does a high R-squared value indicate?
The independent variables are not correlated with the dependent variable.
A large proportion of the variance in the dependent variable is explained by the independent variables.
The model is not a good fit for the data.
The model is a perfect fit for the data.
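Illustration: R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal pure-Python version; the function name r_squared and the sample values are assumptions made up for this example.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and two sets of predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
close_fit = [3.1, 4.9, 7.2, 8.8]   # predictions that track the data closely
mean_only = [6.0, 6.0, 6.0, 6.0]   # always predicts the mean of y_true

r2_close = r_squared(y_true, close_fit)  # close to 1
r2_mean = r_squared(y_true, mean_only)   # 0: explains none of the variance
```

A value near 1 means most of the variance is accounted for, but it is not by itself proof of a perfect or well-specified model.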
Why is normality of errors an important assumption in linear regression?
It validates the use of hypothesis testing for the model's coefficients.
It ensures the linearity of the relationship between variables.
It guarantees the homoscedasticity of the errors.
It is necessary for the calculation of the regression coefficients.
Which pandas function is used to read a CSV file containing the dataset for linear regression?
read_csv()
from_csv()
load()
loadtxt()
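Illustration: a minimal sketch of loading a CSV dataset with pandas before fitting a regression. The column names area and price and the in-memory CSV text are assumptions for this example; with a real file, a path would be passed to pd.read_csv instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example is self-contained.
csv_text = "area,price\n50,150\n80,240\n120,360\n"

df = pd.read_csv(io.StringIO(csv_text))

# Separate the predictor column(s) from the target column.
X = df[["area"]]   # DataFrame of independent variables
y = df["price"]    # Series holding the dependent variable
```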
What does a correlation coefficient of 0 indicate?
A perfect linear relationship
A strong positive linear relationship
No linear relationship
A strong negative linear relationship
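Illustration: the Pearson correlation coefficient measures linear association only. The sketch below, with made-up data and a hypothetical helper pearson_r, shows a coefficient of 0 for variables that are clearly related, just not linearly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

xs = [-2, -1, 0, 1, 2]
ys_quadratic = [x ** 2 for x in xs]   # y depends on x, but symmetrically
ys_linear = [2 * x + 1 for x in xs]   # exact straight-line relationship

r_quadratic = pearson_r(xs, ys_quadratic)  # 0.0
r_linear = pearson_r(xs, ys_linear)        # approximately 1
```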
Which of these is a common visual tool for diagnosing heteroscedasticity?
Scatter plot of residuals vs. predicted values
Box plot
Normal probability plot
Histogram
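Illustration: a residuals-versus-predicted plot is built from the pairs (predicted value, residual). The sketch below computes those pairs for synthetic data whose noise grows with x (an assumption chosen to make the effect visible) and compares the average residual size in the lower and upper halves of the predictions as a rough numeric stand-in for reading the plot. The helper fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data (assumed for illustration): the noise term scales with x.
xs = list(range(1, 11))
ys = [2 * x + (-1) ** x * 0.3 * x for x in xs]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# These pairs are the points the diagnostic scatter plot would show.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low_spread = sum(abs(r) for _, r in pairs[:half]) / half
high_spread = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)
```

Here high_spread exceeds low_spread, which corresponds to the funnel shape that typically signals non-constant error variance on the plot.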
Who is credited with developing the foundational principles of linear regression?
Isaac Newton
Sir Francis Galton
Albert Einstein
Marie Curie
What is the purpose of splitting the dataset into training and testing sets in linear regression?
To visualize the relationship between variables.
To handle missing values in the dataset.
To reduce the dimensionality of the data.
To evaluate the model's performance on unseen data.
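Illustration: a minimal sketch of a hold-out split in pure Python. The model is fitted on the training rows only and then scored on rows it has not seen. The helpers split_rows and fit_line, the 25% test fraction, and the noiseless data are all assumptions for this example; libraries such as scikit-learn provide their own splitting utilities.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Ordinary least squares for (x, y) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical noiseless data following y = 3x + 1.
points = [(x, 3 * x + 1) for x in range(20)]

train, test = split_rows(points)
slope, intercept = fit_line(train)   # fit on the training rows only

# Mean squared error on the held-out rows.
test_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
```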
What is the primary goal of feature selection in linear regression?
Introduce bias into the model
Increase the complexity of the model
Maximize the number of features used in the model
Improve the model's interpretability and reduce overfitting
If a Durbin-Watson test statistic is close to 2, what does it suggest about the residuals?
They are homoscedastic.
They are normally distributed.
They are independent.
They exhibit a linear pattern.
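Illustration: the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It ranges from 0 to 4; values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual sequences below are made up for illustration, and the function is a minimal sketch rather than a replacement for a statistics library.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual patterns.
persistent = [1, 1, 1, 1, -1, -1, -1, -1]    # long runs of the same sign
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # sign flips at every step
mixed = [1, -1, -1, 1, 1, -1, -1, 1]         # no consistent carry-over

dw_persistent = durbin_watson(persistent)    # 0.5
dw_alternating = durbin_watson(alternating)  # 3.5
dw_mixed = durbin_watson(mixed)              # 2.0
```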
What does the linearity assumption in linear regression imply?
The dependent variable must have a normal distribution.
The relationship between the dependent and independent variables can best be represented by a straight line.
The data points are evenly distributed around the regression line.
The independent variables are unrelated to each other.