Good Statistical Research Essay Example
Type of paper: Essay
Topic: Cost, Distribution, Information, Correlation, Business, Commerce, Model, Statistics
Pages: 6
Words: 1650
Published: 2020/12/17
Introduction
In this paper I apply the basics of statistics and probability theory to a real-world problem.
Specifically, I consider how the annual franchise fee of a pizza franchise relates to the cost of starting a pizza business.
All calculations in this research were performed with Minitab 16 Statistical Software.
Body
The data was retrieved from:
http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html (Pizza Franchise).
The data set contains 36 observations on two variables.
X represents the annual franchise fee (in $1000)
Y represents the start-up cost (in $1000)
We believe that the franchise fee affects the start-up cost; hence, X is the predictor and Y is the response variable.
The analysis follows the steps given in the instructions for this research.
1.
a) Range and IQR:
Descriptive Statistics: Y; X
Variable  Range    IQR
Y         780.0   50.0
X         675.0  170.0
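The two measures of spread above can be reproduced outside Minitab. The sketch below is only an illustration: the sample values are made-up toy numbers (not the pizza franchise data), the function names are my own, and I assume that quartiles are taken at positions p(n + 1), which is what Python's `statistics.quantiles` does with `method="exclusive"` and which, to my understanding, matches Minitab's convention.

```python
import statistics

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range Q3 - Q1, with quartiles at positions p * (n + 1)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

# Made-up toy values for illustration only, NOT the pizza franchise data.
toy_sample = [2, 4, 4, 5, 7, 9, 10, 12]

print("Range:", value_range(toy_sample))
print("IQR:  ", iqr(toy_sample))
```

Applied to the real X and Y columns, the same two functions should give the values in the table, provided the quartile convention is indeed the same.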
b) Make a histogram
The blue bell-shaped curve on each graph is the fitted normal (Gaussian) curve, which lets me compare the distribution of each variable with the normal distribution. The distribution of annual fees is right-skewed; the distribution of start-up cost is symmetric and close to normal.
c) Make a boxplot
The middle half of X is more spread out than that of Y (IQR of 170 versus 50), although Y has the larger overall range (780 versus 675) because of its outliers. X has few outliers, whereas Y has many (they are represented as “*” on the graphs).
2. Make a scatterplot and describe the direction, form, and strength of the correlation, and any outliers
The direction of the association is positive; the form is close to linear (look at the group of points in the right lower corner of the graph). At first glance the association seems strong, because most of the points are concentrated around a straight line. To check this impression, compute the correlation coefficient:
Correlations: X; Y
Pearson correlation of X and Y = 0.477
P-Value = 0.003
The correlation is in fact moderate and positive (r = 0.477), and it is significant at the 1% level of significance (p = 0.003).
There are some outliers in the data: a group of points in the left lower corner and one point in the right upper corner of the graph.
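The significance of the correlation reported above can be checked by hand with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The following is a minimal sketch; the function name is my own, and the critical value 2.728 is an assumption taken from a standard t table (two-sided, 1% level, 34 degrees of freedom), not from the Minitab output.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.477, 36  # values reported in the essay
t_stat = correlation_t_statistic(r, n)

# Assumption: two-sided 1% critical value for 34 df, read from a t table.
T_CRIT_1PCT_DF34 = 2.728

print("t statistic:", round(t_stat, 2))
print("significant at 1%:", abs(t_stat) > T_CRIT_1PCT_DF34)
```

In simple linear regression this t statistic is the same as the t statistic of the slope, so it should agree with the T value that Minitab reports for X in the regression output.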
3. Construct a linear regression
a-f)
Mean and standard deviation:
Descriptive Statistics: Y; X
Variable    Mean  StDev
Y         1291.1  124.1
X         1134.8  158.6
The coefficient of correlation r was computed above and is equal to 0.477.
The coefficients a and b of the model are given in the output below:
Regression Analysis: Y versus X
The regression equation is
Y = 868 + 0.373 X
Predictor    Coef  SE Coef     T      P
Constant    867.6    135.1  6.42  0.000
X          0.3732   0.1179  3.16  0.003
S = 110.626   R-Sq = 22.8%   R-Sq(adj) = 20.5%
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  122565  122565  10.01  0.003
Residual Error  34  416099   12238
Total           35  538664
The coefficient of determination R-squared is 0.228. This means that approximately 22.8% of the variance in start-up cost is explained by this model.
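As a cross-check, the main regression quantities can be recovered from the summary statistics alone, using the standard identities b = r * s_y / s_x, a = mean(y) - b * mean(x), R-squared = r^2, F = MS(regression) / MS(error) and S = sqrt(MS(error)). The sketch below only re-uses numbers already reported in this essay; the variable names are my own, and small differences in the last digit are expected because the inputs are rounded.

```python
import math

# Summary statistics and ANOVA entries as reported in the essay.
mean_x, sd_x = 1134.8, 158.6
mean_y, sd_y = 1291.1, 124.1
r = 0.477
ms_regression, ms_error = 122565, 12238

slope = r * sd_y / sd_x                  # b in Y = a + b X
intercept = mean_y - slope * mean_x      # a in Y = a + b X
r_squared = r ** 2                       # coefficient of determination
f_stat = ms_regression / ms_error        # overall F statistic
s = math.sqrt(ms_error)                  # residual standard error S

print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.1f}")
print(f"R-squared = {r_squared:.3f}")
print(f"F         = {f_stat:.2f}")
print(f"S         = {s:.3f}")
```

Up to rounding, these values agree with the Minitab output: a slope of about 0.373, an intercept of about 868, R-squared of about 22.8%, F of about 10.0 and S of about 110.6.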
g) Using the obtained model, calculate the residuals and plot them against X:
The residuals are scattered in much the same pattern as the original data. I can see that the outliers reduce the accuracy of my model.
h) There are some unusual observations in my data; some of them have high leverage and may be influential:
Unusual Observations
Obs     X       Y     Fit  SE Fit  Residual  St Resid
10   1350  1830.0  1371.4    31.4     458.6    4.32 R
31    750  1250.0  1147.5    49.0     102.5    1.03 X
33    700  1300.0  1128.8    54.5     171.2    1.78 X
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
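The diagnostics in this table can be approximately reproduced from the summary statistics. For simple regression the leverage of an observation is h = 1/n + (x - mean(x))^2 / Sxx with Sxx = (n - 1) * s_x^2, the standard error of the fit is S * sqrt(h), and the standardized residual is residual / (S * sqrt(1 - h)). In the sketch below the function name and the flagging cutoffs are my assumptions: I take |St Resid| > 2 for the R flag and h > 3p/n (with p = 2 coefficients) for the X flag, which I believe is the rule Minitab applies.

```python
import math

# Values reported in the essay.
n, mean_x, sd_x, s = 36, 1134.8, 158.6, 110.626
intercept, slope = 867.6, 0.3732
sxx = (n - 1) * sd_x ** 2  # sum of squared deviations of X

def diagnostics(x, y):
    """Return fit, residual, leverage, SE of fit and standardized residual."""
    fit = intercept + slope * x
    residual = y - fit
    leverage = 1 / n + (x - mean_x) ** 2 / sxx
    se_fit = s * math.sqrt(leverage)
    st_resid = residual / (s * math.sqrt(1 - leverage))
    return fit, residual, leverage, se_fit, st_resid

# Assumed flagging rules (not stated in the output itself).
RESIDUAL_CUTOFF = 2.0
LEVERAGE_CUTOFF = 3 * 2 / n

for obs, x, y in [(10, 1350, 1830.0), (31, 750, 1250.0), (33, 700, 1300.0)]:
    fit, residual, leverage, se_fit, st_resid = diagnostics(x, y)
    flags = ("R" if abs(st_resid) > RESIDUAL_CUTOFF else "") + \
            ("X" if leverage > LEVERAGE_CUTOFF else "")
    print(obs, round(fit, 1), round(se_fit, 1), round(residual, 1),
          round(st_resid, 2), flags)
```

Under these assumptions observation 10 is flagged only for its large standardized residual, while observations 31 and 33 are flagged only for leverage, because their X values lie far below the mean of X.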
Make forecasts for X = 1150, 1250 and 1050 (all values in $1000):
y = 868 + 0.373 * 1150 = 1296.95
y = 868 + 0.373 * 1250 = 1334.25
y = 868 + 0.373 * 1050 = 1259.65
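The same forecasts can be computed with a small helper. This is a minimal sketch that uses the rounded coefficients from the regression equation (868 and 0.373); the function name is hypothetical, and with the unrounded coefficients (867.6 and 0.3732) the results would differ slightly in the first decimal place.

```python
def predict_startup_cost(annual_fee):
    """Predicted start-up cost (in $1000) for an annual fee (in $1000)."""
    return 868 + 0.373 * annual_fee

for fee in (1150, 1250, 1050):
    print(fee, round(predict_startup_cost(fee), 2))
```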