Which method in pandas is used to read a CSV file containing the dataset for Linear Regression?
Explanation:
The read_csv() function in pandas is the standard way to import data from a CSV file into a pandas DataFrame, suitable for use in Linear Regression.
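As a minimal sketch of that step, the snippet below loads a small CSV into a DataFrame and separates a feature column from a target column. The column names (area, price) and the data are made up for illustration, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file you would pass its path,
# e.g. pd.read_csv("my_data.csv").
csv_text = """area,price
50,150
80,240
120,360
"""

df = pd.read_csv(io.StringIO(csv_text))

X = df[["area"]]  # 2-D feature table (DataFrame), as regressors usually expect
y = df["price"]   # 1-D target (Series)

print(df.shape)
```

Selecting the feature with double brackets keeps it two-dimensional, which is the shape most regression estimators expect for their inputs.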
If the coefficient of determination (R-squared) for a linear regression model is 0.8, what does this indicate?
Explanation:
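R-squared represents the proportion of variance in the dependent variable that is predictable from the independent variable(s). It typically ranges from 0 to 1, although it can be negative when a model fits worse than simply predicting the mean. An R-squared of 0.8 means the model explains 80% of the variation in the dependent variable, which is usually considered a good fit.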
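To make the "proportion of variance explained" concrete, here is a small sketch that computes R-squared from its definition, 1 - SS_res / SS_tot. The function name r_squared and the numbers are illustrative assumptions, chosen so that the result comes out at 0.8.

```python
def r_squared(y_true, y_pred):
    """R-squared as 1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [2.0, 2.0, 3.0, 4.0, 4.0]

score = r_squared(y_true, y_pred)
print(round(score, 3))
```

Here the total sum of squares is 10 and the residual sum of squares is 2, so 80% of the variation is accounted for by the predictions.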
What is the purpose of the coefficient of determination (R-squared) in linear regression?
Explanation:
R-squared quantifies the goodness of fit of the model, indicating how well the independent variable(s) explain the variation in the dependent variable.
What is a potential drawback of using a purely automated feature selection technique (like forward selection or backward elimination) without careful consideration?
Explanation:
Automated techniques may miss interactions between features. A feature unimportant on its own could be highly relevant when combined with others.
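A tiny constructed example can show how this happens. In the sketch below, the target is the product of two features; each feature on its own has zero correlation with the target, yet their product predicts it perfectly. The data and the helper pearson are made up for illustration.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Constructed data: the target depends only on the interaction x1 * x2.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [a * b for a, b in zip(x1, x2)]

interaction = [a * b for a, b in zip(x1, x2)]

print(pearson(x1, y), pearson(x2, y), pearson(interaction, y))
```

A selection procedure that scores x1 and x2 one at a time would see nothing useful in either, which is why domain knowledge and explicit interaction terms are worth considering alongside automated selection.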
What does the 'fit_intercept' parameter in 'LinearRegression()' control?
Explanation:
The 'fit_intercept' parameter in 'LinearRegression()' determines whether the model should fit an intercept term to the linear equation.
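The effect is easiest to see by fitting the same data both ways. The sketch below assumes scikit-learn's LinearRegression and uses made-up data generated from y = 2x + 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that lies exactly on the line y = 2x + 5.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 5.0

# Default behaviour: estimate both a slope and an intercept.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# No intercept: the fitted line is forced through the origin.
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)

print(with_intercept.coef_, with_intercept.intercept_)
print(no_intercept.coef_, no_intercept.intercept_)
```

With the intercept, the model recovers the slope 2 and intercept 5. Without it, the intercept is fixed at 0 and the slope is distorted to compensate, so fit_intercept=False is generally only appropriate when the data are already centered or the relationship is known to pass through the origin.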
What graphical tool is commonly used to visualize the relationship between two continuous variables in linear regression?
Explanation:
Scatter plots are the go-to visualization for exploring the relationship between two continuous variables, helping us assess linearity, direction, and strength of the association.
Backward elimination in linear regression involves removing features based on what criterion?
Explanation:
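Backward elimination starts with all features in the model and removes them one at a time. At each step, the feature whose removal causes the least degradation in model performance (the least useful feature) is dropped, and the process repeats until a stopping criterion is met.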
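The loop can be sketched in a few lines. This version is an illustration under simplifying assumptions: it scores each candidate removal by in-sample R-squared and stops at a fixed number of features, whereas practical workflows often use p-values, adjusted R-squared, AIC, or cross-validated error instead. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def r2_for(X, y, cols):
    """In-sample R-squared of a least squares fit using only the given columns."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def backward_eliminate(X, y, n_keep):
    """Drop one feature per step until n_keep features remain."""
    selected = list(range(X.shape[1]))
    while len(selected) > n_keep:
        # Score the model that results from removing each remaining feature.
        scores = {c: r2_for(X, y, [k for k in selected if k != c])
                  for c in selected}
        # Remove the feature whose absence hurts the fit the least.
        drop = max(scores, key=scores.get)
        selected.remove(drop)
    return selected


# Synthetic data: the target depends on features 0 and 1; feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

kept = backward_eliminate(X, y, n_keep=2)
print(kept)
```

On this data the irrelevant third column is the one eliminated, because removing it leaves the fit essentially unchanged.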
Who is credited as a pioneer in developing the method of least squares, a foundational element of linear regression?
Explanation:
Carl Friedrich Gauss, a prominent mathematician, is widely recognized for his contributions to the development of the least squares method, a fundamental technique employed in linear regression to estimate the best-fitting line through a set of data points.
What does a residual represent in linear regression?
Explanation:
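A residual is the difference between an observed value and the value the model predicts for it (observed minus predicted). Graphically, it is the signed vertical distance between a data point and the regression line. A positive residual means the model underpredicted, while a negative residual means it overpredicted.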
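A short sketch with made-up numbers shows the sign convention, taking the residual as observed minus predicted:

```python
# Made-up observed values and model predictions.
observed = [10.0, 12.0, 15.0]
predicted = [9.0, 12.5, 15.0]

# Residual = observed - predicted, one per data point.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]


def describe(residual):
    """Translate the sign of a residual into plain language."""
    if residual > 0:
        return "underpredicted"
    if residual < 0:
        return "overpredicted"
    return "exact"


labels = [describe(r) for r in residuals]
print(list(zip(residuals, labels)))
```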
Which of these is a common visual tool for diagnosing heteroscedasticity?
Explanation:
A scatter plot of residuals against predicted values is a standard diagnostic tool for checking homoscedasticity. If the spread of residuals is relatively constant across the range of predicted values, it suggests homoscedasticity. A funnel-shaped pattern often indicates heteroscedasticity.
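The following sketch produces such a plot with matplotlib. It uses synthetic data whose noise grows with x, so the funnel shape is deliberate; the variable names and the choice of np.polyfit for the straight-line fit are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: the noise standard deviation grows in proportion to x.
rng = np.random.default_rng(0)
x = np.linspace(1, 100, 200)
y = 2.0 * x + rng.normal(scale=0.5 * x)

# Fit a straight line, then compute predictions and residuals.
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
residuals = y - predicted

# Residuals-versus-predicted plot; a widening band suggests heteroscedasticity.
fig, ax = plt.subplots()
ax.scatter(predicted, residuals, s=10)
ax.axhline(0.0, color="gray", linewidth=1)
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual")
ax.set_title("Residuals vs. predicted values")
# Call fig.savefig(...) or plt.show() to view the figure.
```

In the resulting figure the points fan out as the predicted value increases. For homoscedastic data the band of residuals would instead stay roughly the same width from left to right.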
Which assumption of linear regression ensures that the relationship between the independent and dependent variables is linear?
Explanation:
Linearity is the fundamental assumption of linear regression. It states that the relationship between the independent and dependent variables can be adequately represented by a straight line (or, with several predictors, by a linear combination of them).
Which of the following is NOT a benefit of feature selection in linear regression?
Explanation:
Feature selection actually helps to reduce the risk of overfitting. By removing irrelevant features, we prevent the model from learning noise and fitting too closely to the training data.
A positive coefficient of the independent variable in a simple linear regression model indicates what?
Explanation:
A positive coefficient signifies a positive linear relationship. As the independent variable increases, the dependent variable also tends to increase.
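As a quick illustration, the slope of a simple linear regression can be computed directly as the covariance of x and y divided by the variance of x. The data below are made up; reversing the y values flips the sign of the slope.

```python
def slope_of(x, y):
    """Least squares slope for simple linear regression: cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


# Made-up data in which y rises as x rises.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_rising = [2.1, 3.9, 6.2, 7.8, 10.1]
y_falling = list(reversed(y_rising))

rising_slope = slope_of(x, y_rising)
falling_slope = slope_of(x, y_falling)
print(round(rising_slope, 2), round(falling_slope, 2))
```

The first slope is positive, so each one-unit increase in x is associated with an increase in the predicted y. The second is negative, so the predicted y falls as x grows.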
What type of visualization tool is commonly used to initially assess the relationship between two continuous variables in linear regression?
Explanation:
Scatter plots are ideal for visualizing the relationship between two continuous variables, giving a visual indication of whether a linear relationship exists.
Which of the following indicates a strong positive correlation between two variables?
Explanation:
A correlation coefficient close to +1 signifies a strong positive linear relationship: as one variable increases, the other tends to increase as well.
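The Pearson correlation coefficient can be checked numerically with NumPy's corrcoef, as in this sketch on made-up data:

```python
import numpy as np

# Made-up data: y_up rises almost linearly with x, y_down falls with it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
y_down = y_up[::-1]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the two inputs.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(round(r_up, 3), round(r_down, 3))
```

The first value is close to +1 (strong positive correlation) and the second is close to -1 (strong negative correlation). Values near 0 would indicate little or no linear relationship.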