The Usefulness And Applicability Of Statistical


02 Nov 2017


Course Title: Business statistics II (BMKT 22122)

Course Coordinator: Mr. D. Wasantha Kumara

2nd Year, 2nd Semester

Department of Marketing Management

University of Kelaniya

T Distribution

The T distribution is a theoretical probability distribution that resembles the normal distribution. It differs from the normal distribution in its degrees of freedom: the higher the degrees of freedom, the more closely the distribution resembles a standard normal distribution with a mean of 0 and a standard deviation of 1.

The T distribution is used when the standard deviation of the population is unknown; it allows the analyst to approximate probabilities based on the sample mean, the population mean, the sample standard deviation, and the sample's degrees of freedom. As the sample's degrees of freedom increase toward about 30, the T distribution becomes virtually identical to the normal distribution.

When to Use the T-Distribution vs. the Normal Distribution for Confidence Interval and Hypothesis Testing Problems for Means

Main Point to Remember:

You must use the t-distribution table when working problems in which the population standard deviation (σ) is not known and the sample size is small (n < 30).

General Correct Rule:

If σ is not known, then using the t-distribution is correct. If σ is known, then using the normal distribution is correct.

What is Most Common Practice:

Since people often prefer to use the normal, and since the t-distribution becomes equivalent to the normal when the number of cases becomes large, common practice often is:

If σ known, then use normal.

If σ not known:

If n is large, then use normal.

If n is small, then use t-distribution.

What is Another Common Way Textbooks Teach This:

Textbooks often simplify this to "large-sample" vs. "small-sample" methods: use the normal distribution with large samples and the t-distribution with small samples. This is right almost all the time, because in real sampling problems we seldom have a basis for knowing σ. However, there are some situations in which we do have a basis for assuming a value for σ, such as a σ based on past data; in those situations, even if the sample size is small, the correct procedure is to use the normal distribution, so the simplified "large-sample" vs. "small-sample" approach would lead to an error.

DEFINITION

The t distribution is a theoretical probability distribution. It is symmetrical, bell-shaped, and similar to the standard normal curve. It differs from the standard normal curve, however, in that it has an additional parameter, called degrees of freedom, which changes its shape.

DEGREES OF FREEDOM

Degrees of freedom, usually symbolized by df, is a parameter of the t distribution which can be any real number greater than zero (0.0). Setting the value of df defines a particular member of the family of t distributions. A member of the family of t distributions with a smaller df has more area in the tails of the distribution than one with a larger df.

The effect of df on the t distribution is illustrated in the four t distributions below.

[Figure: four t distributions with different degrees of freedom — http://www.psychstat.missouristate.edu/introbook/sbgraph/tdist01.gif]

Note that the smaller the df, the flatter the shape of the distribution, resulting in greater area in the tails of the distribution.

RELATIONSHIP TO THE NORMAL CURVE

The smart reader will no doubt observe that the t distribution looks similar to the normal curve. As the df increase, the t distribution approaches the standard normal distribution (μ = 0.0, σ = 1.0). The standard normal curve is a special case of the t distribution when df = ∞. For practical purposes, the t distribution approaches the standard normal distribution relatively quickly, such that when df = 30 the two are almost identical.
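As a quick numeric check (not part of the original text), the two-sided 5% critical values below show how fast the t distribution's quantiles approach the normal value of roughly 1.96 as df grows:

```python
# Sketch: compare t critical values with the standard normal critical value.
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # two-sided 5% critical value, about 1.96
for df in (1, 5, 10, 30, 100):
    print(f"df = {df:>3}: t crit = {t.ppf(0.975, df):.3f} (z crit = {z_crit:.3f})")
```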

According to the central limit theorem, the sampling distribution of a statistic (like a sample mean) will follow a normal distribution, as long as the sample size is sufficiently large. Therefore, when we know the standard deviation of the population, we can compute a z-score, and use the normal distribution to evaluate probabilities with the sample mean.

But sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occurs, statisticians rely on the distribution of the t statistic (also known as the t score):

t = (x̄ − μ) / (s / √n)

where x̄ is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size. The t distribution allows us to conduct statistical analyses on certain data sets that are not appropriate for analysis using the normal distribution.
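A minimal sketch (with a hypothetical sample) computing this t statistic by hand and checking it against scipy.stats.ttest_1samp:

```python
# Sketch: one-sample t statistic computed by hand and via SciPy.
import numpy as np
from scipy.stats import ttest_1samp

sample = np.array([12.1, 11.4, 12.8, 13.0, 11.9, 12.5])  # hypothetical data
mu = 12.0                                 # hypothesized population mean

x_bar = sample.mean()
s = sample.std(ddof=1)                    # sample standard deviation
n = len(sample)
t_stat = (x_bar - mu) / (s / np.sqrt(n))  # t = (x̄ − μ) / (s / √n)
print(f"by hand: t = {t_stat:.3f}")

t_check, p_value = ttest_1samp(sample, mu)
print(f"scipy:   t = {t_check:.3f}, p = {p_value:.4f}")
```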

How are we to test a hypothesis about a sample mean when σ is unknown and the sample size is less than 30?

Enter the t-Distribution.

[Figure: the Student t distribution curve — http://mips.stanford.edu/courses/stats_data_analsys/lesson_4/t_curv.gif]

Properties of the t-Distribution

If a population is essentially normal, then the distribution of

t = (x̄ − μ) / (s / √n)

is the Student t-Distribution, or simply t-Distribution, for all samples of size n less than 30.

To find the rejection region, we can use the t-Distribution Table. This table requires only the sample size, which allows us to calculate the degrees of freedom, and the significance level α.

Degrees of Freedom = n - 1
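As a sketch, the table lookup can be reproduced with SciPy's t.ppf; the sample size and significance level below are assumptions for illustration:

```python
# Sketch: rejection-region boundaries from the t distribution.
from scipy.stats import t

n, alpha = 15, 0.05
df = n - 1                          # degrees of freedom = n - 1

t_upper = t.ppf(1 - alpha, df)      # one-tailed critical value
t_two = t.ppf(1 - alpha / 2, df)    # two-tailed critical value
print(f"one-tailed: reject H0 if t > {t_upper:.3f}")
print(f"two-tailed: reject H0 if |t| > {t_two:.3f}")
```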

A paired t-test is usually used when the two samples are dependent; this happens when each individual observation in one sample has a unique relationship with a particular member of the other sample.

A Paired T Test is a statistical test of significance that compares the means of two related sets of measurements, such as the same objects measured at two points in time.

A Paired T Test may be used when there is:

One Measurement Variable

Two Nominal Variables

Generally, one of the nominal variables in the T Test will take only two values, such as yes/no, blue/red, taco/pizza, or car/bike.

A Paired T Test is used to test whether the difference between two means is equal to 0. The Paired T Test is intended to study the difference between the means over a period of time, t.

For example, we may wish to test whether a newly developed intervention program for disadvantaged students is useful. For this, we need to obtain scores from, say, 22 students on a standardized test before administering the program. After the program is over, the same test needs to be administered again to the same group of 22 students and scores obtained.

The two samples, the sample of pre-intervention scores and the sample of post-intervention scores, are related because each student has two scores. The samples are therefore dependent, and the paired t-test is applicable in such scenarios.
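A sketch of how this example could be analyzed, with hypothetical before/after scores standing in for the 22 students' actual data:

```python
# Sketch: paired t-test on hypothetical pre/post intervention scores.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
before = rng.normal(60, 10, size=22)        # hypothetical pre-test scores
after = before + rng.normal(5, 8, size=22)  # hypothetical post-test scores

t_stat, p_value = ttest_rel(after, before)
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")
```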

The importance of the t-distribution

The t-distribution is one of the most useful statistics available to a behavioral scientist. A t-test is also used to assess whether correlation coefficients and regression coefficients are significantly different from zero. In short, a behavioral scientist must have a good working knowledge of the various uses of the t-statistic because its use is so prevalent. Part of this good working knowledge is understanding the assumptions that one makes when using a particular t-test.

For example, to use the independent samples t-test presented above, it must be assumed that the variance in Population 1 is equal to the variance in Population 2.

A good researcher will check the validity of the important assumptions associated with a statistical test before testing the hypothesis, and will understand how observed t-values are used to make decisions about hypotheses.

In marketing research, the researcher is often interested in making statements about a single variable against a known or given standard, for example, that the market share for a new product will exceed 18%, or that a certain percentage of dealers will prefer the new pricing policy. These statements can be translated into null hypotheses that can be tested using a one-sample test, such as the t test for a single mean, where the researcher tests whether the population mean conforms to a given hypothesis.

 The Chi Square Statistic

A chi square (χ²) statistic is used to investigate whether distributions of categorical variables differ from one another.

Basically, categorical variables yield data in categories and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous.

Another way to describe the Chi-square test is that it tests the null hypothesis that the variables are independent. The test compares the observed data to a model that distributes the data according to the expectation that the variables are independent. Wherever the observed data don't fit the model, the likelihood that the variables are dependent becomes stronger, providing evidence against the null hypothesis.

The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: Chi square tests can only be used on actual counts, not on percentages, proportions, means, etc.)

Chi-Square Test Requirements

1. Quantitative data.

2. One or more categories.

3. Independent observations.

4. Satisfactory sample size (at least 10).

5. Simple random sample.

6. Data in frequency form.

7. Chi-square requires that you use numerical values, not percentages or ratios.

It is also important that you have enough data to perform a viable Chi-square test. If the expected count in any given cell is below 5, then there is not enough data to perform a Chi-square test.

Chi Square test Formula

χ² = Σ (O − E)² / E, where O is an observed frequency and E is the corresponding expected frequency.

Expected Frequencies

When you find the value for chi square, you determine whether the observed frequencies differ significantly from the expected frequencies.

DEGREES OF FREEDOM (as discussed earlier)

Once you calculate a Chi-square value, you use this number and the degrees of freedom to decide the probability, or p-value, of independence. This is the crucial result of a Chi-square test, which means that knowing the degrees of freedom is critical. Degrees of freedom are important in the chi square test because they factor into the calculation of the probability of independence. For an r × c contingency table, the degrees of freedom are (r − 1)(c − 1).

A Chi-square test can tell you information based on how you divide up the data. However, it cannot tell you whether the categories you constructed are meaningful.

For example, if you are working with data on groups of people, you can divide them into age groups (18-25, 26-40, 41-60...) or income level, but the Chi-square test will treat the divisions between those categories exactly the same as the divisions between male and female, or alive and dead!

It's up to you to assess whether your categories make sense, and whether the difference (for example) between age 25 and age 26 is enough to make the categories 18-25 and 26-40 meaningful. This does not mean that categories based on age are a bad idea, but only that you need to be aware of the control you have over organizing data of that sort.

[Figure: chi-square distribution with the upper-tail critical region shaded — http://filebox.vt.edu/users/jamonroe/5116/week7/xdist.gif]

The critical region, as before, is the upper (shaded) region in the graph, and the boundary of the critical region is called the critical value. For a 5% level of significance, the critical value is written as χ²(0.05).
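As a sketch, the critical value can be computed with SciPy rather than read from a table; the degrees of freedom below are an assumption for illustration:

```python
# Sketch: chi-square critical value for a given significance level.
from scipy.stats import chi2

df, alpha = 1, 0.05                       # e.g., a 2x2 table has (2-1)(2-1) = 1 df
print(round(chi2.ppf(1 - alpha, df), 3))  # about 3.841
```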

Steps for using and interpreting chi-square

 

1. State the null and research/alternative hypotheses.

2. Specify the decision rule and the level of statistical significance for the test, i.e., .05, .01, or .001. (A significance level of .01 would mean that the probability of the chi-square value must be .01 or less to reject the null hypothesis, a more stringent criterion than .05.)

3. Compute the expected values.

4. Compute the chi-square statistic.

5. Determine the degrees of freedom for the table. Then identify the critical value of chi-square at the specified level of significance and appropriate degrees of freedom.

6. Compare the computed chi-square statistic with the critical value of chi-square; reject the null hypothesis if the chi-square is equal to or larger than the critical value; accept the null hypothesis if the chi-square is less than the critical value.

7. State a substantive conclusion, i.e., describe the meaning and importance of the test results in terms of the problem under investigation.

As an example, consider a Chi Square test used to measure whether subjects from a high-tech community have more internet information concerns than individuals from a small town. The research team defines information issues as concerns about content legitimacy, privacy, security, spam, and others. The team reports statistically significant results as follows:

 

 

                                   Rural (n = 24)    High-tech (n = 24)
Have information concerns                13                  22
Don't have information concerns          11                   2

The table above appears to illustrate a valid application of the Chi Square test. The categorical independent variable, community type, is compared against a categorical dependent variable, concern about internet information. Thus, the data fit the Chi Square parameter requirements.

Chi Square is not the correct statistical test, however, because it is applicable only when each of the cell counts is greater than 5, and the 'High-tech'/'Don't have information concerns' cell has a count of 2.

Fisher's exact test would have been the appropriate test because it does not have a minimum cell value requirement.

A quick run of Fisher's exact test shows the study results were still statistically significant when it was applied.
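A sketch (not part of the original study) of how both tests could be run on the table above with SciPy:

```python
# Sketch: chi-square test vs. Fisher's exact test on the 2x2 table above.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Rows: have / don't have information concerns; columns: rural, high-tech.
observed = np.array([[13, 22],
                     [11, 2]])

chi2_stat, p_chi2, dof, expected = chi2_contingency(observed)
print("expected counts:\n", expected)
print(f"chi-square = {chi2_stat:.3f}, df = {dof}, p = {p_chi2:.4f}")

# Fisher's exact test has no minimum cell count requirement.
odds_ratio, p_exact = fisher_exact(observed)
print(f"Fisher's exact: odds ratio = {odds_ratio:.3f}, p = {p_exact:.4f}")
```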

Applicability of the chi-square test

The chi-square test is most widely used to conduct tests of hypotheses that involve data that can be presented in a 2×2 table.

Indeed, this tabular format is a feature of the case-control study design that is commonly used in public health research. Within this contingency table, under the null hypothesis of no association between the two variables, the expected number in each cell is calculated from the observed values as (row total × column total) / grand total.

The use of the chi-square test can be illustrated using hypothetical data from a public health study.

Another application of the chi-square test is to examine issues of fairness and cheating in games of chance, such as cards, dice, and roulette. Since such games usually involve wagering, there is a significant incentive for people to try to rig them, and the test can be used to investigate allegations of cheating, such as missing cards.

One-way analysis of variance (ANOVA)

Analysis of variance (ANOVA) is a general method for studying relationships in sampled data.

The method enables the difference between two or more sample means to be analysed, achieved by subdividing the total sum of squares. One-way ANOVA is the simplest case. The purpose is to test for significant differences between class means, and this is done by analysing the variances. Incidentally, if we are only comparing two different means, then the method is the same as the t-test for independent samples. The basis of ANOVA is the partitioning of sums of squares into between-class (SS_b) and within-class (SS_w) components. It enables all classes to be compared with each other simultaneously rather than individually; it assumes that the samples are normally distributed. The one-way analysis is calculated in three steps: first the sum of squares for all samples, then the within-class and between-class cases. For each stage the degrees of freedom are also determined, where a degree of freedom is the number of independent 'pieces of information' that go into the estimate of a parameter. These calculations are used via the Fisher statistic to analyse the null hypothesis.

The null hypothesis that we are going to test is based upon the assumption that there is no significant difference among the means of the different populations.

The null hypothesis states that there are no differences between means of different classes, suggesting that the variance of the within-class samples should be identical to that of the between-class samples (resulting in no between-class discrimination capability).

The alternative hypothesis states that at least two means differ from each other. In order to accept the null hypothesis, all means must be equal. Even if one mean is not equal to the others, we cannot accept the null hypothesis. The simultaneous comparison of several population means is called ANOVA.

Assumptions

The methodology is based on the following assumptions:

1. Each sample is drawn randomly, and each sample is independent of the other samples.

2. The populations are normally distributed.

3. The populations from which the samples are drawn have equal variances.

Analysis of Variance

1. One factor: one-way analysis of variance

2. More than one factor: N-way analysis of variance

In ANOVA, a particular combination of factor levels, or categories, is called a treatment.

One way analysis of variance

Marketing researchers are often interested in examining the differences between the mean values of the dependent variable for several categories of a single independent variable or factor.

Only the sample means of each group are used when computing the between group variance. In other words, we don't look at the actual data in each group, only the summary statistics.

In the between group variation, each data value in the group is assumed to be identical to the mean of the group, so we weight each squared deviation with the sample size for that group.

However, because of sampling errors and other variations, some disparity between these two values will exist even when the null hypothesis is true, meaning that all population means are equal.

The extent of disparity between the two variances, and consequently the value of F, will influence our decision on whether to accept or reject the null hypothesis. It is logical to conclude that if the population means are not equal, then their sample means will also vary greatly from one another, resulting in a larger value of the between-group variance estimate (σ²_between), and hence a larger value of F.

The larger the value of F, the more likely the decision to reject the null hypothesis. But how large must the value of F be to reject the null hypothesis?

The answer is that the computed value of F must be larger than the critical value of F at the given level of significance and the calculated number of degrees of freedom for:

1. The numerator.

2. The denominator.

The F distribution

The F distribution is an asymmetric distribution that has a minimum value of 0, but no maximum value. The curve reaches a peak not far to the right of 0, and then gradually approaches the horizontal axis as the F value grows. The F distribution approaches, but never quite touches, the horizontal axis. The F distribution has two degrees of freedom: d1 for the numerator and d2 for the denominator. For each combination of these degrees of freedom there is a different F distribution. The F distribution is most spread out when the degrees of freedom are small. As the degrees of freedom increase, the F distribution becomes less dispersed.

A plot of the distribution shows the F value on the horizontal axis, with the probability density for each F value represented by the vertical axis. The shaded area in such a diagram represents the level of significance α shown in the table.

There is a different F distribution for each combination of the degrees of freedom of the numerator and denominator. Since there are so many F distributions, the F tables are organized somewhat differently than the tables for the other distributions. The three tables which follow are organized by the level of significance.

The first table gives the F values that are associated with α = 0.10 of the area in the right tail of the distribution. The second table gives the F values for α = 0.05 of the area in the right tail, and the third table gives F values for the α = 0.01 level of significance. In each of these tables, the F values are given for various combinations of degrees of freedom.
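As a sketch, the same right-tail critical values the three tables provide can be computed with SciPy's f.ppf; the degrees of freedom below are arbitrary examples:

```python
# Sketch: right-tail F critical values for the three table significance levels.
from scipy.stats import f

d1, d2 = 3, 20                     # numerator and denominator df (examples)
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: F crit = {f.ppf(1 - alpha, d1, d2):.3f}")
```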

The variation due to the interaction between the samples is denoted SS(B), for Sum of Squares Between groups. If the sample means are close to each other (and therefore to the Grand Mean), this will be small. There are m samples involved, with one data value for each sample (the sample mean), so there are m − 1 degrees of freedom.

ANOVA table

Source     df       SS            MS                          F
Between    m − 1    SS(Between)   MSB = SS(Between)/(m − 1)   F = MSB/MSE
Error      n − m    SS(Error)     MSE = SS(Error)/(n − m)
Total      n − 1    SS(Total)

If there are n total data points collected, then there are n − 1 total degrees of freedom. If there are m groups being compared, then there are m − 1 degrees of freedom associated with the factor of interest.

If there are n total data points collected and m groups being compared, then there are n−m error degrees of freedom.

Now, the sums of squares (SS) column:

(1) As we'll soon formalize below, SS(Between) is the sum of squares between the group means and the grand mean. As the name suggests, it quantifies the variability between the groups of interest.

(2) Again, as we'll formalize below, SS(Error) is the sum of squares between the data and the group means. It quantifies the variability within the groups of interest.

(3) SS(Total) is the sum of squares between the n data points and the grand mean. As the name suggests, it quantifies the total variability in the observed data. We'll soon see that the total sum of squares, SS(Total), can be obtained by adding the between sum of squares, SS(Between), to the error sum of squares, SS(Error). That is:

SS(Total) = SS(Between) + SS(Error)

The mean squares (MS) column, as the name suggests, contains the "average" sum of squares for the Factor and the Error:

(1) The Mean Sum of Squares between the groups, denoted MSB, is calculated by dividing the Sum of Squares between the groups by the between group degrees of freedom. That is, MSB = SS(Between)/(m−1).

(2) The Error Mean Sum of Squares, denoted MSE, is calculated by dividing the Sum of Squares within the groups by the error degrees of freedom. That is, MSE = SS(Error)/(n−m).

The F column, not surprisingly, contains the F-statistic. Because we want to compare the "average" variability between the groups to the "average" variability within the groups, we take the ratio of the Between Mean Sum of Squares to the Error Mean Sum of Squares. That is, the F-statistic is calculated as F = MSB/MSE.
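To tie the columns together, here is a minimal sketch (with made-up data for m = 3 groups) that builds SS(Between), SS(Error), MSB, MSE, and F by hand and checks the result against scipy.stats.f_oneway:

```python
# Sketch: one-way ANOVA table quantities computed by hand, checked via SciPy.
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([78, 85, 82, 90]),   # hypothetical group 1 scores
          np.array([70, 74, 80, 72]),   # hypothetical group 2 scores
          np.array([88, 92, 85, 95])]   # hypothetical group 3 scores

all_data = np.concatenate(groups)
n, m = len(all_data), len(groups)
grand_mean = all_data.mean()

# SS(Between): squared deviations of group means from the grand mean,
# each weighted by that group's sample size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# SS(Error): variability of the data around their own group means.
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

msb = ss_between / (m - 1)   # mean square between
mse = ss_error / (n - m)     # mean square error
print(f"by hand: F = {msb / mse:.3f}")

f_stat, p_value = f_oneway(*groups)  # SciPy should agree
print(f"scipy:   F = {f_stat:.3f}, p = {p_value:.4f}")
```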

How ANOVA is practiced

To determine whether retailers, wholesalers, and agents differ in their attitudes toward the firm's distribution policies.

To compare the brand evaluations of groups exposed to different commercials.

To determine whether various segments differ in terms of their volume of product consumption.

To determine whether consumers' intentions to buy the brand vary with different price levels.

To determine the effect of consumers' familiarity with the store on preference for the store.


