Linear Regression

In today’s discussion we learned about linear regression, a statistical approach that lets us study the relationship between a dependent variable and an independent variable.

Mathematically, simple linear regression can be written as follows:

Y = α + β1X

where Y is the dependent variable, X is the independent variable, α is the intercept, and β1 is the slope.
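To make the equation concrete, here is a minimal sketch of how α and β1 can be estimated by ordinary least squares. The function name and the toy numbers are made up for illustration; they are not taken from the diabetes dataset.

```python
def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (alpha, beta1) for y = alpha + beta1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")

    beta1 = sxy / sxx                # slope
    alpha = mean_y - beta1 * mean_x  # intercept
    return alpha, beta1


# Toy values (hypothetical): these points lie exactly on y = 1 + 2x.
x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [3.0, 5.0, 7.0, 9.0]
alpha, beta1 = fit_simple_linear_regression(x_toy, y_toy)
print(alpha, beta1)
```

With real data the points will not fall exactly on a line, and the same formulas give the line that minimizes the sum of squared vertical distances to the points.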

In the diabetes dataset provided, the variables considered here are % obesity and % inactivity. In terms of the equation, X represents % obesity and Y represents % inactivity.

Linear regression lets us predict % diabetes from variables such as % obesity and % inactivity by modeling the relationship as linear.

There are 1370 data points available for % inactivity, and all of these data points also have corresponding data for % diabetes. Descriptive statistics were calculated for the 1370 data points of % diabetes for which % inactivity data is available. The % diabetes data exhibits a slight skewness with a kurtosis of about 4, compared with 3 for a normal distribution, indicating some departure from normality. Similarly, % inactivity data is slightly skewed in the opposite direction with a kurtosis of about 2.
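As a rough illustration of how skewness and kurtosis can be computed, the sketch below uses the moment-based definitions, with the non-excess convention in which a normal distribution has kurtosis 3. The function name and sample values are hypothetical, and statistical packages may apply bias corrections that give slightly different numbers.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")

    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values must not all be equal")

    skew = m3 / m2 ** 1.5  # 0 for a symmetric sample
    kurt = m4 / m2 ** 2    # 3 for a normal distribution
    return skew, kurt


# Toy sample (hypothetical, not from the CDC data): symmetric, so skewness is 0.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
skew, kurt = skewness_and_kurtosis(sample)
print(skew, kurt)
```

A kurtosis above 3 points to heavier tails than a normal distribution, and a value below 3 points to lighter tails.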
Correlation: Pearson’s correlation coefficient (Pearson’s R) was calculated between % diabetes and % inactivity, resulting in a value of approximately 0.442. This suggests a moderate positive correlation between the two variables.
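For reference, Pearson’s R can be computed directly from paired values as sketched below. The function name and the toy pairs are assumptions for illustration only; the 0.442 figure above comes from the actual dataset, not from this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("neither variable may be constant")

    return sxy / math.sqrt(sxx * syy)


# Toy pairs (hypothetical): a positive but imperfect relationship.
x_pairs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pairs = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x_pairs, y_pairs)
print(r)
```

In simple linear regression, the square of R gives the share of variance in one variable explained by the other, so an R of about 0.442 corresponds to roughly 0.195, or a bit under 20%.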

Scatterplot: A scatterplot was created to visualize the relationship between % diabetes and % inactivity, and it was confirmed that the data pairs are correctly aligned by rows.

In summary, this analysis explores the relationship between % diabetes and % inactivity using the available CDC data. The data indicate a moderate positive correlation between the two variables, and both variables show some departure from a normal distribution. The dataset’s limited size may pose challenges for predictive modeling.
