Determine The Coefficient Of Correlation


02 Nov 2017


Course project: Part C

A scatter plot is a graph of ordered pairs (x, y) of data, with the independent variable x on the horizontal axis and the dependent variable y on the vertical axis.

From the scatter plot it is apparent that the slope of the 'best fit' line is positive, which indicates that Credit Balance varies directly with Size: as Size increases, Credit Balance increases, and vice versa. A "line of best fit" is used to make predictions from past data. Statistical formulas exist for computing this line exactly, but for now we estimate it by drawing a line through the points on the graph that appears to follow the trend of the data.

Q2 ) Determine the equation of the "best fit" line, which describes the relationship between CREDIT BALANCE and SIZE.

Credit Balance ($) = 2591 + 403.2 Size
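As a minimal sketch of how the fitted line is used, the snippet below evaluates it for a few household sizes. The coefficients 2591.4 and 403.22 are the unrounded values from the Minitab output later in this report; the function name `predict_credit_balance` is only illustrative.

```python
# Intercept and slope as reported in the Minitab regression output.
INTERCEPT = 2591.4
SLOPE = 403.22


def predict_credit_balance(size):
    """Return the fitted credit balance ($) for a given household size."""
    return INTERCEPT + SLOPE * size


# Evaluate the line at a few household sizes inside the observed range.
for size in (1, 3, 5, 7):
    print(size, round(predict_credit_balance(size), 1))
```

Each additional household member raises the fitted credit balance by the slope, about $403.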

Q3 ) Determine the coefficient of correlation. Interpret.

The graph shows that credit balance increases as size increases.

H0: ρ = 0 (there is no linear relationship between the two variables)

H1: ρ ≠ 0 (there is a linear relationship between the two variables)

Correlations: Credit Balance($), Size

Pearson correlation of Credit Balance($) and Size = 0.752

P-Value = 0.000

From the output, the Pearson correlation between size and credit balance is 0.752, which indicates a fairly strong positive linear relationship. The p-value of 0.000 is smaller than the 0.05 significance level, so we reject H0 and conclude that there is a significant relationship between credit balance and size.
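The significance test for a correlation can be checked by hand. The sketch below assumes the standard test statistic t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 = 48 degrees of freedom, and takes the two-tailed critical value 2.011 from Q5 of this report; the variable names are illustrative.

```python
import math

r = 0.752        # Pearson correlation from the Minitab output
n = 50           # sample size
t_crit = 2.011   # two-tailed critical value for alpha = 0.05, df = 48 (quoted in Q5)

# Test statistic for H0: rho = 0.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The statistic comes out at roughly 7.9, essentially the t-value Minitab reports for the slope, which is expected in simple regression.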

Q4) Determine the coefficient of determination. Interpret.

Using Excel we obtain R² ≈ 0.566. This means that about 57% of the variation in credit balance is explained by household size, and the remaining 43% is not explained by the model. This is consistent with the correlation of 0.752 and indicates a moderately strong linear relationship between credit balance and size.
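The coefficient of determination can be recomputed two ways as a sanity check: as the ratio SSR/SST from the analysis of variance table reported for this regression, and as the square of the correlation coefficient. This is only a sketch using the rounded figures printed in this report.

```python
ssr = 24092210   # regression sum of squares from the ANOVA table
sst = 42553062   # total sum of squares from the ANOVA table
r = 0.752        # Pearson correlation (rounded)

r_squared_anova = ssr / sst
r_squared_corr = r ** 2

print(f"R^2 from ANOVA       = {r_squared_anova:.4f}")
print(f"R^2 from correlation = {r_squared_corr:.4f}")
```

The two values differ slightly in the third decimal place only because r is rounded to three digits.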

Q5) Test the utility of this regression model (use a two-tailed test with α = 0.05). Interpret your results, including the p-value.

Regression is usually carried out in order to make predictions, so one should ask under which circumstances the estimated regression line y = b0 + b1x is useful. The hypotheses for the model utility test are:

H0: β1 = 0

vs.

HA: β1 ≠ 0

α = 0.05 and n = 50, so df = n − 2 = 48 (two parameters, β0 and β1, have been estimated).

The rejection region is therefore |t| > 2.011.

Regression Analysis: Credit Balance($) versus Size

Predictor    Coef     SE Coef  T      P
Constant     2591.4   195.1    13.29  0.000
Size         403.22   50.95    7.91   0.000

Analysis of Variance

Source          DF  SS        MS        F      P
Regression      1   24092210  24092210  62.64  0.000
Residual Error  48  18460853  384601
Total           49  42553062

From the table above, t = 7.91, which exceeds 2.011 and so falls in the rejection region. We therefore reject the null hypothesis and conclude that β1 is not equal to 0. The p-value leads to the same conclusion: p = 0.000 is less than the significance level of 0.05, so we reject the null hypothesis.
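The t-value in the output is simply the slope estimate divided by its standard error, and in simple regression its square equals the F statistic. The following sketch reproduces both from the rounded coefficients printed above; small differences from the Minitab figures are due to rounding.

```python
coef = 403.22     # estimated slope for Size
se_coef = 50.95   # standard error of the slope
t_crit = 2.011    # two-tailed critical value, alpha = 0.05, df = 48

t_stat = coef / se_coef

print(f"t   = {t_stat:.2f}")         # Minitab reports 7.91
print(f"t^2 = {t_stat ** 2:.2f}")    # close to the reported F = 62.64
print("reject H0:", abs(t_stat) > t_crit)
```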

Q6) Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT BALANCE? Explain.

Since the linear regression model Credit Balance (Y) = 2591.4 + 403.22 × Size (X) is statistically significant and has reasonably high explanatory power (R² = 56.6%), we conclude that the model is useful for predicting Credit Balance from Size.

Q7 ) Compute the 95% confidence interval for beta-1 (the population slope).  Interpret this interval.

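The 95% confidence interval for the slope uses the critical value t = 2.0106 for 48 degrees of freedom: (403.22 − 2.0106 × 50.95, 403.22 + 2.0106 × 50.95) ≈ (300.79, 505.66). With 95% confidence, the true population slope lies between 300.79 and 505.66; that is, each additional household member is associated with an increase in mean credit balance of between about $301 and $506.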
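The interval is the estimate plus or minus the critical t-value times the standard error. The sketch below assumes a critical value of about 2.0106 for 48 degrees of freedom (consistent with the 2.011 used in Q5) and uses the rounded slope and standard error from the Minitab table.

```python
b1 = 403.22       # estimated slope
se_b1 = 50.95     # standard error of the slope
t_crit = 2.0106   # assumed t critical value for 95% confidence, df = 48

margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Because zero is far outside this interval, the interval agrees with the t-test conclusion that the slope is not zero.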

Q8) Using an interval, estimate the average credit balance for customers that have a household size of 5. Interpret this interval.

The 95% confidence interval for the mean credit balance of customers with a household size of 5 is ($4368.2, $4846.9). With 95% confidence, the true mean credit balance for customers with a household size of 5 lies within this interval.

Q9 ) Using an interval, predict the credit balance for a customer that has a household size of 5. Interpret this interval.

Predicting the balance of an individual customer calls for a prediction interval rather than the confidence interval for the mean used in question 8. The predicted credit balance for a customer with a household size of 5 is $4607.5, and the Minitab output gives a 95% prediction interval of ($3337.9, $5877.2). With 95% confidence, the credit balance of a single customer with a household size of 5 will fall within this interval. It is much wider than the interval for the mean, ($4368.2, $4846.9), because it also accounts for the variation of individual customers around the mean.

Q10 ) What can we say about the credit balance for a customer that has a household size of 10? Explain your answer.

The 95% confidence interval for the mean credit balance of customers with a household size of 10 is ($5927.0, $7320.4), and the predicted credit balance for such a customer is $6623.7. Taken at face value, this means that with 95% confidence the true mean credit balance for customers with a household size of 10 lies within this interval.

However, a household size of 10 is outside the range of the data used to compute the regression line (sizes from 1 to 7). Hence this estimate of credit balance is an extrapolation and is not reliable.
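One simple safeguard against this kind of extrapolation is to check the predictor against the observed range before trusting the fitted value. The sketch below assumes the minimum and maximum household sizes of 1 and 7 given in the descriptive statistics of this report; `predict_with_range_check` is a hypothetical helper name.

```python
# Observed range of household size in the sample (from the descriptive statistics).
SIZE_MIN, SIZE_MAX = 1.0, 7.0


def predict_with_range_check(size):
    """Return (fitted credit balance, True if the size lies outside the data range)."""
    fit = 2591.4 + 403.22 * size
    extrapolated = not (SIZE_MIN <= size <= SIZE_MAX)
    return fit, extrapolated


for size in (5, 10):
    fit, extrapolated = predict_with_range_check(size)
    note = "extrapolation, unreliable" if extrapolated else "within data range"
    print(f"size {size}: fit = {fit:.1f} ({note})")
```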

Predicted Values for New Observations

New Obs  Fit     SE Fit  95% CI            95% PI
1        4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations

New Obs  Size
1        5.00

Predicted Values for New Observations

New Obs  Fit     SE Fit  95% CI            95% PI
1        6623.7  346.5   (5927.0, 7320.4)  (5195.3, 8052.0) XX

XX denotes a point that is an extreme outlier in the predictors.

Values of Predictors for New Observations

New Obs  Size
1        10.0
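The confidence and prediction intervals in this output can be reproduced from the Fit and SE Fit columns. The sketch below assumes the usual formulas, Fit ± t × SE Fit for the mean and Fit ± t × sqrt(S² + SE Fit²) for an individual, with S = 620.162 taken from the Size regression output in this report and an assumed critical value of 2.0106 for 48 degrees of freedom.

```python
import math

T_CRIT = 2.0106   # assumed t critical value, 95% confidence, df = 48
S = 620.162       # residual standard error of the Size-only regression


def intervals(fit, se_fit):
    """Return the (confidence interval, prediction interval) for a fitted value."""
    ci_margin = T_CRIT * se_fit
    pi_margin = T_CRIT * math.sqrt(S ** 2 + se_fit ** 2)
    return (fit - ci_margin, fit + ci_margin), (fit - pi_margin, fit + pi_margin)


ci5, pi5 = intervals(4607.5, 119.0)     # household size 5
ci10, pi10 = intervals(6623.7, 346.5)   # household size 10

print("size 5 :", [round(v, 1) for v in ci5], [round(v, 1) for v in pi5])
print("size 10:", [round(v, 1) for v in ci10], [round(v, 1) for v in pi10])
```

The results agree with the Minitab intervals to within rounding. The much larger SE Fit at size 10 reflects how far that point lies from the centre of the data.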

Q11: Using MINITAB, run the multiple regression analysis using the variables INCOME, SIZE and YEARS to predict CREDIT BALANCE. State the equation for this multiple regression model.

The general equation for a multiple regression model with these predictors is

y = b0 + (b1 × INCOME) + (b2 × SIZE) + (b3 × YEARS) + e

MiniTab provides the following basic descriptive analysis of the four variables.

Descriptive Statistics: Income ($1000), Size, Years, Credit Balance($)

Variable           Mean    StDev  Minimum  Q1     Median  Q3      Maximum
Income ($1000)     43.74   14.64  21.00    30.00  43.00   55.00   67.00
Size               3.420   1.739  1.000    2.000  3.000   5.000   7.000
Years              12.260  5.086  1.000    9.000  13.000  16.000  20.000
Credit Balance($)  3970    932    1864     3109   4090    4748    5678

Further MiniTab analysis of correlation between all 4 statistics provides the following.

Correlations: Income ($1000), Size, Years, Credit Balance($)

                   Income ($1000)  Size   Years
Size               0.198
Years              -0.206          0.107
Credit Balance($)  0.627           0.752  0.008

Cell Contents: Pearson correlation

The Credit Balance / Income correlation of 0.627 and the Credit Balance / Size correlation of 0.752 are the strongest correlations in this data sample.

Running the MiniTab regression analysis tool, we obtain:

Regression Analysis: Credit Balance($) versus Income ($1000), Size, Years

The regression equation is

Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years

Predictor       Coef    SE Coef  T     P
Constant        1276.0  273.6    4.66  0.000
Income ($1000)  32.272  4.348    7.42  0.000
Size            346.85  36.03    9.63  0.000
Years           7.88    12.34    0.64  0.526

S = 424.715   R-Sq = 80.5%   R-Sq(adj) = 79.2%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression      3   34255444  11418481  63.30  0.000
Residual Error  46  8297619   180383
Total           49  42553062

Source          DF  Seq SS
Income ($1000)  1   16703393
Size            1   17478430
Years           1   73620

The value R² = 80.5% indicates that the model fits the data fairly closely. R² is the proportion of variation in y that is 'explained' by the model, i.e. 19.5% is not explained.
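As a rough check, R² and adjusted R² can be recomputed from the sums of squares in the analysis of variance table, and the fitted equation can be evaluated for a customer. In the sketch below the function name `predict` and the example customer (income of $40,000, household size 3, 10 years in residence) are hypothetical illustrations, not values from the data set.

```python
sse = 8297619    # residual (error) sum of squares
sst = 42553062   # total sum of squares
n, k = 50, 3     # observations and number of predictors

r_sq = 1 - sse / sst
adj_r_sq = 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def predict(income_thousands, size, years):
    """Fitted credit balance ($) from the three-predictor model."""
    return 1276.0 + 32.272 * income_thousands + 346.85 * size + 7.88 * years


print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {adj_r_sq:.1%}")
print(f"Hypothetical customer: {predict(40, 3, 10):.2f}")
```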

Q12: The Global Test for Utility.

The value F is defined as

Explained variance / Unexplained variance

The F-test determines whether the data justify the model as a whole. We define a null hypothesis under which Credit Balance is not linearly related to any of the proposed predictors: Income, Size of household or Years in residence.

Ho: bi = 0 for all i > 0.

Ha: bi ≠ 0 for at least one value of i > 0.

The intercept b0 is not tested.

The F-test statistic is defined as

F = MSR / MSE

MSR: mean square of the variation explained by the model

MSE: mean square of the variation not explained by the model

Or, equivalently,

$F = \dfrac{\mathrm{SSR} / k}{\mathrm{SSE} / (n - k - 1)}$

SSR is the regression sum of squares:

$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

This is the sum of the squares of the differences between the predicted value of y and the mean value of y.

SSE is the residual or error sum of squares:

$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

This is the sum of the squares of the differences between the value of y found in the sample data and the predicted value.

k = number of predictors. In our case k = 3: Income, family size and years in residence.

n = number of observations. In our case n = 50.

The Minitab printout gives the result of this arithmetic: F = 63.30. To evaluate the utility of our model we compare this with the critical value of F at the 5% significance level with k = 3 and n − k − 1 = 46 degrees of freedom.

An online calculator gives this critical value as 2.807. Since F = 63.30 is far larger than this value, we reject Ho and conclude that the model as a whole is useful: at least one of the predictors is significantly related to credit balance. Minitab also gives a p-value for each coefficient. All are 0.000 except for YEARS, at 0.526, which indicates that years in residence does not contribute significantly once income and size are in the model.
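The F statistic can be recomputed directly from the sums of squares in the analysis of variance table. This sketch uses the critical value 2.807 quoted in the text rather than looking it up itself.

```python
ssr = 34255444   # regression sum of squares (three-predictor model)
sse = 8297619    # residual (error) sum of squares
n, k = 50, 3     # observations and number of predictors
f_crit = 2.807   # critical value quoted in the text (alpha = 0.05, df = 3 and 46)

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(f"MSR = {msr:.0f}, MSE = {mse:.0f}, F = {f_stat:.2f}")
print("reject Ho:", f_stat > f_crit)
```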

Q13: T-tests

Regression Analysis: Credit Balance($) versus Income ($1000)

The regression equation is

Credit Balance($) = 2226 + 39.9 Income ($1000)

Predictor       Coef    SE Coef  T     P
Constant        2226.0  330.0    6.75  0.000
Income ($1000)  39.882  7.161    5.57  0.000

S = 733.849   R-Sq = 39.3%   R-Sq(adj) = 38.0%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression      1   16703393  16703393  31.02  0.000
Residual Error  48  25849669  538535
Total           49  42553062

Regression Analysis: Credit Balance ($) versus Size

The regression equation is Credit Balance($) = 2591 + 403 Size

Predictor  Coef    SE Coef  T      P
Constant   2591.4  195.1    13.29  0.000
Size       403.22  50.95    7.91   0.000

S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression      1   24092210  24092210  62.64  0.000
Residual Error  48  18460853  384601
Total           49  42553062

Regression Analysis: Credit Balance ($) versus Years

The regression equation is Credit Balance($) = 3952 + 1.5 Years

From this output, the p-value for Years in residence as a predictor of Credit Balance is 0.955, which indicates that there is no significant linear relationship between them. If we eliminate this predictor and use only family size and income to predict the credit balance, we obtain the following:

Regression Analysis: Credit Balance ($) versus Income ($1000), Size

The regression equation is Credit Balance($) = 1389 + 31.6 Income ($1000) + 350 Size

S = 422.033   R-Sq = 80.3%   R-Sq(adj) = 79.5%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression      2   34181824  17090912  95.96  0.000
Residual Error  47  8371239   178111
Total           49  42553062
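Dropping Years can also be justified with a partial F-test that compares the error sums of squares of the full and reduced models. The sketch below assumes the standard partial F formula; the critical value of about 4.05 for 1 and 46 degrees of freedom is an assumption taken from standard F tables, not from the Minitab output.

```python
import math

sse_full, df_full = 8297619, 46         # Income, Size and Years
sse_reduced, df_reduced = 8371239, 47   # Income and Size only
f_crit = 4.05                           # assumed critical value, alpha = 0.05, df = 1 and 46

# Increase in unexplained variation per dropped predictor, relative to the full-model MSE.
partial_f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
t_equiv = math.sqrt(partial_f)

print(f"partial F = {partial_f:.3f}, equivalent |t| = {t_equiv:.2f}")
print("Years adds significant explanatory power:", partial_f > f_crit)
```

The square root of the partial F matches the t-value of 0.64 that Minitab reports for Years in the three-predictor model, so both tests agree that Years can be dropped.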

Q14: Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

In Q1-10, credit balance was modelled using family size alone. Here we have also analysed the predictive contribution of family income and the number of years in residence at the current address. The sample data show no significant relationship between credit balance and years in residence, but a significant relationship with family income. Adding income raises R² from 56.6% to 80.3% and lowers the standard error S from 620.162 to 422.033, so the multiple regression model can be expected to give better predictions than the simple linear model. The preferred equation is:

Credit Balance ($) = 1389 + 31.6 × Income ($1000) + 350 × Size
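The comparison between the candidate models can be summarised from the error sums of squares reported in the Minitab outputs above. The sketch below recomputes S, R² and adjusted R² for each model; the dictionary layout and the `summarize` helper are illustrative only.

```python
import math

sst, n = 42553062, 50   # total sum of squares and sample size

# Model name -> (error sum of squares, number of predictors), from the Minitab outputs.
models = {
    "Size only": (18460853, 1),
    "Income + Size": (8371239, 2),
    "Income + Size + Years": (8297619, 3),
}


def summarize(sse, k):
    """Return (S, R-squared, adjusted R-squared) for a fitted model."""
    df_error = n - k - 1
    s = math.sqrt(sse / df_error)
    r_sq = 1 - sse / sst
    adj_r_sq = 1 - (sse / df_error) / (sst / (n - 1))
    return s, r_sq, adj_r_sq


results = {name: summarize(sse, k) for name, (sse, k) in models.items()}
for name, (s, r_sq, adj_r_sq) in results.items():
    print(f"{name:22s} S = {s:8.3f}  R-Sq = {r_sq:.1%}  R-Sq(adj) = {adj_r_sq:.1%}")

best = max(results, key=lambda name: results[name][2])
print("Highest adjusted R-Sq:", best)
```

Adjusted R² is used for the comparison because plain R² never decreases when a predictor is added, whereas the adjusted value penalises predictors such as Years that add little.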


