Regression Modeling in Practice - Week2 | Basics of Linear Regression

This is the second assignment of the third course (of five) in the Data Analysis and Interpretation Specialization; detailed information about it can be seen here.

Variables

Details of my project can be seen here. To make things easier, I wrote a summary below:

Variable Name   Description                                  Type
Income          Moderator: GDP per capita (1)                Explanatory, quantitative
Life            Explanatory variable: life expectancy (2)    Response, quantitative

(1) 2010 Gross Domestic Product per capita in constant 2000 US$

(2) 2011 life expectancy at birth (years)

Center the Explanatory Variable

My explanatory variable is quantitative and many of its observations already lie close to zero; even so, I centered it to meet the assignment requirements. The code below shows the procedure: first I calculated the mean of "income", then I created a new variable (income_center) that holds each value of "income" minus that mean, so the new variable has a mean of approximately zero.

from collections import OrderedDict
from tabulate import tabulate

# Measures used for centering and for the graph
measures = OrderedDict()
measures['Mean'] = data1.income.mean()

# New variable: income centered on its mean
data1['income_center'] = data1.income - measures['Mean']

measures['Center'] = data1.income_center.mean()
measures['cMin'] = data1.income_center.min()
measures['cMax'] = data1.income_center.max()
measures['Min life'] = data1.life.min()
measures['Max life'] = data1.life.max()

# Print the measures as a table
print(tabulate([measures], tablefmt='grid', headers='keys'))

   Mean       Center      cMin     cMax   Min life   Max life
7327.44  -1.1007e-12  -7223.67  44974.1     47.794     83.394

This table shows the measures: Mean is the original mean of income and Center is the new mean after centering. The other measures (the minimum and maximum of income_center and of life) were calculated to help with plotting the graph.
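To make the effect of centering concrete, here is a minimal sketch on a handful of made-up values (hypothetical numbers, not the data used in this post; the helper `ols_fit` is also just an illustration). It shows that centering moves the mean of the explanatory variable to zero, leaves the slope of a simple least-squares line unchanged, and turns the intercept into the mean of the response. On real data the centered mean is usually a tiny number such as -1.1007e-12 rather than exactly zero, because of floating-point rounding.

```python
from statistics import mean

# Hypothetical toy values, NOT the data analysed in this post.
income = [500.0, 1200.0, 3400.0, 9800.0, 21000.0]
life = [52.0, 58.5, 66.0, 73.0, 80.5]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Center income by subtracting its mean from every value.
income_mean = mean(income)
income_center = [xi - income_mean for xi in income]

raw_intercept, raw_slope = ols_fit(income, life)
cen_intercept, cen_slope = ols_fit(income_center, life)

print("mean of centered income:", mean(income_center))
print("slope     (raw, centered):", raw_slope, cen_slope)
print("intercept (raw, centered):", raw_intercept, cen_intercept)
print("mean of life:", mean(life))
```

Only the intercept changes between the two fits, which is why a centered model is easier to read: its intercept is the predicted response at the average value of the explanatory variable.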

To see the code, click here.

Summary

import statsmodels.formula.api as smf

print("OLS regression model")
reg1 = smf.ols('life ~ income_center', data=data1).fit()
print(reg1.summary())

OLS regression model
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   life   R-squared:                       0.362
Model:                            OLS   Adj. R-squared:                  0.358
Method:                 Least Squares   F-statistic:                     98.65
Date:                Sat, 24 Sep 2016   Prob (F-statistic):           1.07e-18
Time:                        13:57:02   Log-Likelihood:                -610.14
No. Observations:                 176   AIC:                             1224.
Df Residuals:                     174   BIC:                             1231.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept        69.6547      0.588    118.550      0.000        68.495    70.814
income_center     0.0006   5.58e-05      9.932      0.000         0.000     0.001
==============================================================================
Omnibus:                       19.382   Durbin-Watson:                   1.948
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               23.222
Skew:                          -0.877   Prob(JB):                     9.06e-06
Kurtosis:                       2.698   Cond. No.                     1.05e+04
==============================================================================

The results of the linear regression model show a high F-statistic (98.65) and a very small p-value (1.07e-18), considerably less than our alpha level of 0.05. This indicates that life expectancy is significantly and positively associated with income level.
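As a quick sanity check on the summary table, the sketch below uses two standard identities for a regression with a single explanatory variable: the F-statistic equals the square of the slope's t-value, and it can also be written as R² / (1 - R²) multiplied by the residual degrees of freedom. The inputs are the rounded numbers printed in the table, so the results only agree with 98.65 up to rounding.

```python
# Values copied from the OLS summary above (already rounded there).
t_slope = 9.932        # t-value of income_center
r_squared = 0.362      # R-squared
df_resid = 174         # Df Residuals
f_reported = 98.65     # F-statistic

# With one explanatory variable, F is the squared t-value of the slope.
f_from_t = t_slope ** 2

# F can also be rebuilt from R-squared and the residual degrees of freedom.
f_from_r2 = r_squared / (1 - r_squared) * df_resid

print("F from t:        ", round(f_from_t, 2))
print("F from R-squared:", round(f_from_r2, 2))
print("F reported:      ", f_reported)
```

Both reconstructed values land close to the reported F-statistic, which suggests the table is internally consistent.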

The coefficient for income_center is 0.0006 and the intercept is 69.6547. So now we know that the equation for the best-fit line of this graph is:

life = 69.6547 + 0.0006 * income_center
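Here is a small sketch of how the fitted line can be used for a prediction. Since the summary table shows that the model was fitted on income_center, an income value has to be centered first by subtracting the mean income (7327.44, from the measures table). The function name `predict_life` and the income of 10000 are only illustrative assumptions, and because the slope is rounded to four decimals in the summary, the result is a rough figure.

```python
# Coefficients and mean copied from the tables in this post.
INTERCEPT = 69.6547
SLOPE = 0.0006          # rounded to four decimals in the summary
INCOME_MEAN = 7327.44   # mean of income, used for centering


def predict_life(income):
    """Predicted life expectancy (years) for a given GDP per capita."""
    income_center = income - INCOME_MEAN
    return INTERCEPT + SLOPE * income_center


# At the mean income the centered value is zero, so the prediction
# is simply the intercept.
print(predict_life(INCOME_MEAN))

# Hypothetical country with a GDP per capita of 10000.
print(round(predict_life(10000.0), 2))
```

This also gives the intercept a direct reading: it is the life expectancy the model predicts for a country with average income.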

The OLS regression results show R-squared = 0.362 (the proportion of the variance in the response variable that can be explained by the explanatory variable), indicating that this model accounts for about 36% of the variability we see in the response variable, life.
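To show what "proportion of variance explained" means in practice, the following sketch computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The x and y values are made up for illustration and are not the data from this post.

```python
from statistics import mean

# Hypothetical toy values, NOT the data analysed in this post.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# Least-squares slope and intercept for y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual sum of squares: variation the line does not explain.
fitted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Total sum of squares: variation of y around its own mean.
ss_tot = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - ss_res / ss_tot
print("R-squared:", round(r_squared, 4))
```

The toy points lie almost on a straight line, so their R-squared is close to 1; the value of 0.362 in this post means the fitted line leaves roughly 64% of the variation in life unexplained.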

To see the code, click here.

Graph

[Figure: plot]

See the entire code for this week here.

Written on September 24, 2016