Statistics Essay
Type of paper: Essay
Topic: Education, Size, Distribution, Information, Gender, Discrimination, Height, Correlation
Pages: 7
Words: 1925
Published: 2020/10/10
Introduction
The company is interested in making shoes in one size only, and sample data has been provided. Before I proceed, I need to establish whether the data come from a random sample. It is also important to inspect the data for outliers that clearly do not belong to the sample. For instance, in a study of cigarette smoking among youth aged 15 to 25, a recorded age of 81 stands out as a data handling error: someone may have intended to enter 18 and, in the rush, entered 81 without noticing. Alternatively, other statistics can be used in place of the mean and variance, which are in most cases the measures affected by outliers. The median can replace the mean, and the interquartile range can replace the standard deviation. However, the median does not always represent the sample as required, especially when the sample values are skewed to the right or to the left. The semi-interquartile range can also be used as an alternative to the interquartile range; both are usually preferred when outliers are present because they describe the spread of the middle half of the data.
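The effect of the 18-versus-81 entry error described above can be illustrated with a minimal sketch. The ages below are hypothetical values invented for illustration, not data from any study, and the helper name `summarize` is my own.

```python
import statistics

def summarize(values):
    """Return the mean, standard deviation, median and interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Hypothetical ages of respondents aged 15 to 25 (illustration only).
clean_ages = [15, 16, 17, 18, 18, 19, 20, 21, 22, 23, 24, 25]
# The same sample with one 18 mistyped as 81.
typo_ages = [15, 16, 17, 18, 81, 19, 20, 21, 22, 23, 24, 25]

clean = summarize(clean_ages)
typo = summarize(typo_ages)

for name in ("mean", "stdev", "median", "iqr"):
    print(f"{name:>6}: clean={clean[name]:.2f}  with typo={typo[name]:.2f}")
```

In this sketch the single mistyped value moves the mean and the standard deviation far more than it moves the median and the interquartile range, which is the reason the latter pair is suggested when outliers are suspected.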
Discussion
Once I have established the nature of the data, I will first determine the descriptive statistics, that is the mean, mode and median, before carrying out tests that will help establish whether the sample distribution is identical to that of a normally distributed random variable (Kanji 26). Key to the analysis is the relationship that the variables in the sample data have with each other. This will help me decide whether or not the company should make a one-sized shoe, and from there I can present my conclusions and recommendations. These relationships can be relied on more strongly if the distribution of the sample data is identical to that of a normally distributed random variable. I also need to estimate the proportion of males in the population and test whether it differs from the proportion of females.
Since the data for height and shoe size are both continuous and quantitative, I can test whether they come from a random sample. Although I am using a non-parametric test for this purpose, it does not jeopardize the interpretation of our estimates or our inference. I carry out a runs test with the null hypothesis that the data come from a random sample. The results indicate that the test is insignificant, and as such I fail to reject the null hypothesis of randomness. The distribution of gender is most probably binomial, so I carry out a binomial test to determine whether the proportion of females differs significantly from that of males. The observed proportion of females (coded 1) is 0.51 and that of males is 0.49. The test is insignificant, thus I cannot reject the hypothesis that the proportion of females equals 0.5. The test is insignificant even at the 90% confidence level, which is considered too loose a test (Kanji 208).
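The two tests in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure that produced the results above: the runs test cuts at the median and drops values equal to it (other conventions exist), both tests use a normal approximation without continuity correction, the alternating sequence is invented, and the sample size of 100 used with the observed proportion 0.51 is hypothetical because the essay does not state it.

```python
import math
import statistics

def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def runs_test(values):
    """Wald-Wolfowitz runs test about the median (normal approximation).

    H0: the order of the observations is random.
    Assumes both sides of the median are represented in the data.
    """
    cut = statistics.median(values)
    # Assumption: values equal to the median are dropped.
    signs = [v > cut for v in values if v != cut]
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    z = (runs - expected) / math.sqrt(variance)
    return runs, z, normal_two_sided_p(z)

def binomial_z_test(successes, n, p0=0.5):
    """Two-sided z-approximation test of H0: proportion equals p0."""
    z = (successes - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return z, normal_two_sided_p(z)

# Invented, strictly alternating sequence: far too many runs to be random.
alternating = [1, 9, 2, 8, 3, 7, 1, 9, 2, 8]
runs, runs_z, runs_p = runs_test(alternating)

# Observed proportion of females 0.51, with a HYPOTHETICAL n of 100.
gender_z, gender_p = binomial_z_test(successes=51, n=100)

print(f"runs={runs}, z={runs_z:.3f}, p={runs_p:.4f}")
print(f"binomial z={gender_z:.3f}, p={gender_p:.4f}")
```

A small p-value in the runs test would lead to rejecting randomness, while a large p-value in the binomial test means the hypothesis of equal proportions cannot be rejected.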
Next, I determine the basic statistics that I will use to establish the distribution of the sample. A normal distribution has the same mean, mode and median, so I am more interested in the measures of central tendency than in the measures of dispersion. I am particularly interested in the normal distribution because, when a sample is normally distributed, I can make statistical inferences about the population freely. Furthermore, important tests like the t-test and the F-test can be carried out only on condition that the sample is normally distributed. The z-test also requires the normality assumption, and working with normally distributed data is not only easier but also yields reliable inferences and interpretation (Kanji 41). In our case I will use the coefficient of correlation to establish the degree of the relationship between gender and shoe size and the magnitude of the relationship between height and shoe size. I will then test the hypothesis that the distribution of the sample is identical to the normal distribution. If there is no significant difference between the sample distribution and the normal distribution, I will use the assumption of normality to carry out tests that will help establish whether the company can go ahead with the decision to make a one-size shoe.
The mean shoe size is 9.1429, the median shoe size is 9 and the modal shoe size is 7; the skewness is 0.367 and the kurtosis is -1.083. The mean height of the sample is 68.94 and the median height is 70, which equals the modal height. There are multiple modes in this sample and the smallest is reported. For the height data I obtained a variance of 16.323, a skewness of -0.233 and a kurtosis of 0.336. The skewness of a distribution measures its degree of asymmetry, so a symmetrical distribution has a skewness of zero. If the skewness is greater than zero, the distribution is skewed to the right, and if it is negative, the distribution is skewed to the left. Kurtosis measures the extent to which the frequencies are concentrated close to the mean relative to the tails. A normal, bell-shaped distribution has a kurtosis of three, while a flatter distribution has a kurtosis of less than three. Since kurtosis itself cannot be negative, the values reported above are excess kurtosis, that is kurtosis minus three, for which the normal benchmark is zero. Kurtosis will help us understand how the company’s decision to manufacture a single shoe size may affect the population.
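The descriptive measures discussed above can be sketched with simple moment-based formulas. This is an illustration only: the shoe sizes are hypothetical, and statistical packages typically apply small-sample adjustments to skewness and kurtosis, so this sketch would not be expected to reproduce the figures reported above exactly. Excess kurtosis is kurtosis minus three, so a normal distribution scores zero on that scale.

```python
import statistics

def central_moment(values, k):
    """k-th central moment, dividing by n (no small-sample adjustment)."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness: zero for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis: three for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def excess_kurtosis(values):
    """Kurtosis minus three: zero for a normal distribution."""
    return kurtosis(values) - 3.0

# Hypothetical shoe sizes (illustration only, not the essay's data).
shoe_sizes = [7, 7, 7, 8, 9, 9, 9, 10, 11, 12]

mean_size = statistics.fmean(shoe_sizes)
median_size = statistics.median(shoe_sizes)
# When several modes exist, report the smallest one.
smallest_mode = min(statistics.multimode(shoe_sizes))

print(mean_size, median_size, smallest_mode)
print(round(skewness(shoe_sizes), 3), round(excess_kurtosis(shoe_sizes), 3))
```

A positive skewness here indicates a longer right tail, and a negative excess kurtosis indicates a distribution flatter than the normal.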
If the sample is not normally distributed and we fail to find a transformation of the variable that is normally distributed, we will have to use non-parametric alternatives. Although these are not as powerful as the parametric tests, they still allow us to make inferences about the population from the sample we have. The fact that the mean, mode and median differ slightly in both the shoe size and the height data does not rule out the sample being normally distributed. From the histograms and Q-Q plots I have observed that the distribution is reasonably close to a normal distribution. Since observation alone is inconclusive, I need to carry out a formal test to determine whether the distribution of the sample is identical to that of a normally distributed random variable. I used the one-sample Kolmogorov-Smirnov test, with the normal as the test distribution and its parameters calculated from the data, and on that basis I treat the sample distribution as normal (Kanji 109).
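A one-sample Kolmogorov-Smirnov test against a normal distribution can be sketched as below. This is a minimal illustration under stated assumptions: the normal parameters are estimated from the data, the p-value uses the asymptotic Kolmogorov series (which tends to be conservative when parameters are estimated), and both samples are synthetic. The "normal-like" sample is built from the quantiles of a normal distribution with the height mean (68.94) and variance (16.323) reported above; it is not the essay's data.

```python
import math
import statistics

def ks_normal(values):
    """One-sample Kolmogorov-Smirnov test against a fitted normal.

    Returns (D, p), where D is the largest gap between the empirical and
    the fitted normal distribution functions and p is the asymptotic
    two-sided p-value.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # Compare with the empirical step function just after and before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    lam = math.sqrt(n) * d
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    p = min(1.0, max(0.0, 2.0 * series))
    return d, p

# Synthetic sample that follows a normal shape closely.
reference = statistics.NormalDist(68.94, math.sqrt(16.323))
normal_like = [reference.inv_cdf((i + 0.5) / 50) for i in range(50)]
# Synthetic, strongly right-skewed sample.
skewed = [math.exp(i / 5) for i in range(50)]

d_normal, p_normal = ks_normal(normal_like)
d_skewed, p_skewed = ks_normal(skewed)

print(f"normal-like: D={d_normal:.3f}, p={p_normal:.3f}")
print(f"skewed:      D={d_skewed:.3f}, p={p_skewed:.5f}")
```

A large p-value means the hypothesis of normality is not rejected, which is weaker than proof that the sample is normal; a small p-value, as for the skewed sample, leads to rejecting normality.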
Firstly, it is of interest to determine whether shoe size differs across gender. Boxplots are generated to indicate whether there is a notable difference between shoe sizes by gender. The upper limit for the mean female shoe size lies below the lower limit for the mean male shoe size. The outliers in shoe size do not lie far from the rest of the sample, and as such they are not removed from it. In some cases, errors in data entry and handling produce outliers that lie too far from the sample. It is crucial to establish whether such values really belong to the sample or are errors introduced during data handling. Although some outliers are present, it can be safely concluded that the mean shoe size of males is higher than the mean shoe size of females, as is normally the case. The company has considered making one size of shoe regardless of height and gender.
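The quantities a boxplot displays for each gender can be sketched as follows. The shoe sizes are hypothetical, the helper name `box_stats` is my own, and the 1.5 times interquartile range rule for flagging outliers is the common Tukey convention, assumed here rather than taken from the analysis above.

```python
import statistics

def box_stats(values):
    """Quartiles, whisker ends and outliers as drawn in a Tukey boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical shoe sizes by gender (illustration only).
female_sizes = [6, 6.5, 7, 7, 7.5, 8, 8.5]
male_sizes = [9.5, 10, 10.5, 11, 11, 12, 13]

female = box_stats(female_sizes)
male = box_stats(male_sizes)

print("female:", female)
print("male:  ", male)
# Non-overlapping whiskers suggest a clear difference between the groups.
print("separated:", female["whisker_high"] < male["whisker_low"])
```

Values listed under `outliers` are the ones worth checking against the original records before deciding whether they are genuine observations or data handling errors.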
Now that I have established that the distribution of the sample is normal, I can use parametric tests to establish whether shoe size is correlated with height and with gender. I first test whether there is a correlation between height and shoe size. Since both variables are quantitative, I use the Pearson method of computing the correlation coefficient. The correlation coefficient between height and shoe size is 0.853, indicating a strong positive correlation. The test is also significant at the 99% confidence level, which increases the reliability of the estimated value and its interpretation. The positive sign indicates that height and shoe size change in the same direction, that is, an increase in height is associated with an increase in shoe size. The value 0.853 indicates that the relationship is very strong. The correlation coefficient between gender and shoe size is also very large, with a magnitude of 0.804.
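The Pearson coefficient and a significance check can be sketched as below. The heights and shoe sizes are hypothetical, and the p-value uses the Fisher z-transformation as a normal approximation, which is my substitution for the t-based test that statistical packages usually report.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses the Fisher z-transformation; requires n > 3 and |r| < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical paired observations (illustration only).
heights = [62, 64, 65, 67, 68, 70, 70, 72, 73, 75]
shoe_sizes = [6, 7, 7, 8, 9, 9, 10, 11, 12, 13]

r = pearson_r(heights, shoe_sizes)
p = fisher_z_p_value(r, len(heights))

print(f"r={r:.3f}, approximate p={p:.2e}")
print("significant at the 0.01 level:", p < 0.01)
```

Significance at the 99% confidence level corresponds to a p-value below 0.01, which matches the "significant at the 0.01 level (2-tailed)" note in the appendix.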
I used the dummy coding male = 0 and female = 1, and as expected the correlation coefficient is negative. In this case we are interested only in its magnitude, since the numbering for gender was chosen merely to make the computation possible. The magnitude of the correlation coefficient indicates that the relationship between gender and shoe size is very strong. The fact that the test is significant at the 99% confidence level indicates that our estimate of the correlation coefficient and the interpretation of the result are highly reliable. In normal situations a 95% confidence level is considered appropriate, as it strikes a balance between too strict a test (99%) and too loose a one (90%) (Kanji 19). At the moment, establishing the degree of relationship between gender and height may not be instrumental to the task at hand, since we are interested in the outcomes of the company’s decision to make a one-sized shoe for all, which is not targeted at a specific gender.
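The point that only the magnitude matters under dummy coding can be illustrated with a short sketch. The data are hypothetical; the coding male = 0, female = 1 follows the text, and the reversed coding is added purely for comparison.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample: five females followed by five males.
shoe_sizes = [6, 7, 7, 8, 9, 9, 10, 11, 12, 13]
female_is_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # male = 0, female = 1
male_is_1 = [1 - g for g in female_is_1]      # reversed coding

r_female_coded = pearson_r(female_is_1, shoe_sizes)
r_male_coded = pearson_r(male_is_1, shoe_sizes)

print(f"female = 1 coding: r = {r_female_coded:.3f}")
print(f"male = 1 coding:   r = {r_male_coded:.3f}")
```

Reversing the coding flips the sign of the coefficient but leaves its magnitude unchanged, so the strength of the relationship does not depend on which gender is labeled 1.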
The last two analyses of the correlation coefficient, the first between height and shoe size and the second between gender and shoe size, are both significant at the 99% level of confidence. This, coupled with the fact that our data fitted the normal distribution model, greatly increases the reliability of our analysis and the subsequent interpretation. Shoe size is highly correlated with height and gender. I was also able to establish that mean shoe size differs between the genders, so the company will have a difficult time deciding on a standard shoe size without discriminating along gender lines.
Conclusion and Recommendation
The company should not go ahead with its decision to make a one-sized shoe, since it will have a negative impact on its clients and possibly its market share. As the company is facing financial hardship, I recommend that a further study be carried out to establish the shoe size and height of most of its clients, and perhaps to arrive at a narrow range of shoe sizes targeted at retaining its loyal clients and its market share. Alternatively, other cost-cutting measures could be adopted so as to meet the current budgetary expectations of the company.
Work Cited
Kanji, Gopal. 100 Statistical Tests. New York: SAGE, 2006. Print.
Appendix
Binomial Test
a Based on Z Approximation.
Statistics
a Multiple modes exist. The smallest value is shown
One-Sample Kolmogorov-Smirnov Test
a Test distribution is Normal.
b Calculated from data.
Correlations
** Correlation is significant at the 0.01 level (2-tailed).