Type of paper: Essay

Topic: Information, House, Bedroom, Value, Confidence, Size, Model, Statistics

Pages: 4

Words: 1100

Published: 2020/10/31

Introduction

In this paper we apply basic statistical methods and hypothesis testing to investigate the association between the variables of the given data set. We work with data on 781 houses sold in and around San Luis Obispo County. The purpose of this paper is to find a relationship between house prices and other factors that may affect the price. Each variable is introduced and analyzed with descriptive statistics. 95% confidence intervals are constructed for the mean of each numerical variable. Finally, the most appropriate multiple regression model is developed to predict house prices from the values of the other variables. The results and recommendations are summarized in the conclusion section.

Data Description

We are given a data set of houses, which was taken from:
https://wiki.csc.calpoly.edu/datasets/wiki/Houses

There are 781 observations for each of the eight variables used. The descriptions of the variables are given below:

MLS: house ID
Price: listing price of the house (in United States dollars)
Bedrooms: the number of bedrooms in the observed house
Bathrooms: the number of bathrooms in the observed house
Size: the size of the given house, in square feet
Price/SQ.ft: a relative indicator, the price per square foot
Status: type of sale: Short Sale, Foreclosure or Regular.
As one of the purposes of this project is to use at least one dummy variable, we exclude the observations with Foreclosure sales and keep only the observations for the other two types of sale. We then code a Regular sale as 1 and a Short Sale as 0. This is our dummy variable.
We believe that the Price/SQ.ft variable should not be used to explain price because it is just a relative indicator that can be calculated from the total price and the house size. The house ID also has no influence on the price. That is why these two variables are not used in the regression analysis.
We also know that one of the most significant factors in a house's price is its location. As we have data from many different towns, we handle this in the following way: we consider only those houses that are located in Santa Maria-Orcutt and omit the others. As all houses in the new data set are from one location, this variable is also omitted from the regression analysis.
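
The filtering and dummy coding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the column name "Location", the helper name `prepare`, and the sample rows (including their prices) are made up for the example and are not taken from the data set.

```python
# Made-up rows for illustration; the real data set has 781 observations.
# "Location" is an assumed column name, not one given in the source.
rows = [
    {"MLS": 1, "Location": "Santa Maria-Orcutt", "Status": "Short Sale", "Price": 200000},
    {"MLS": 2, "Location": "Santa Maria-Orcutt", "Status": "Foreclosure", "Price": 150000},
    {"MLS": 3, "Location": "Santa Maria-Orcutt", "Status": "Regular", "Price": 260000},
    {"MLS": 4, "Location": "Another Town", "Status": "Regular", "Price": 400000},
]


def prepare(rows, town="Santa Maria-Orcutt"):
    """Keep one town, drop foreclosures, and code Status as a 0/1 dummy."""
    prepared = []
    for row in rows:
        if row["Location"] != town:
            continue  # houses from other towns are omitted
        if row["Status"] == "Foreclosure":
            continue  # foreclosure sales are excluded
        coded = dict(row)  # copy so the input rows are left unchanged
        coded["Status"] = 1 if row["Status"] == "Regular" else 0
        prepared.append(coded)
    return prepared


sample = prepare(rows)
print([(row["MLS"], row["Status"]) for row in sample])
```

Copying each row before recoding keeps the original records intact, so the same input can be filtered again for a different town.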

As a result we have a data set of 238 observations. The variables that take part in the regression analysis are:

Price (as dependent variable)
Bedrooms, Bathrooms, Size (as independent numerical variables)
Status (as independent dummy variable)

Descriptive Statistics

We used Excel Analysis Tools to obtain descriptive statistics for the variables mentioned in the previous section.
According to the descriptive statistics we may conclude that the price variable is approximately normally distributed. The number of Short Sales is significantly larger than the number of Regular sales: only 14 houses were sold as Regular. The size of the houses is mostly between 800 and 160 square feet (see the histogram). Houses with 2 bathrooms and 3 bedrooms are the most common.
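
For readers who want to reproduce this kind of summary outside Excel, a minimal Python sketch is given below. The helper name `describe` and the bedroom counts are illustrative assumptions; they are not the values from the data set.

```python
import statistics


def describe(values):
    """Return a small set of descriptive statistics for a numeric sample."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Made-up bedroom counts, used only to show the call.
bedrooms = [3, 3, 4, 2, 3, 4, 3]
summary = describe(bedrooms)
print(summary)
```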

Confidence Intervals

A confidence interval (CI) indicates the precision of an estimate. It also indicates how stable the obtained value is, i.e. how close to the initial value the result would be if the measurement (experiment) were repeated. A 95% confidence interval may be interpreted in the following way: “We are 95% confident that the population mean of the variable (factor) is between the lower bound and the upper bound of this interval”. In our case we calculate the 95% CI in Excel. The formula for the 95% CI is:
$$\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation and $n$ is the sample size. The results are calculated in the .xls file and given below:
The interpretation of the 95% CI for price is: we are 95% confident that the mean price of the houses sold in Santa Maria-Orcutt is between $226,147.60 and $250,747.70.
The interpretation of the 95% CI for the number of bedrooms is: we are 95% confident that the mean number of bedrooms in the houses sold in Santa Maria-Orcutt is between 3.109665 and 3.2393697.
The interpretation of the 95% CI for the number of bathrooms is: we are 95% confident that the mean number of bathrooms in the houses sold in Santa Maria-Orcutt is between 2.228167 and 2.385279.
The interpretation of the 95% CI for size is: we are 95% confident that the mean size of the houses sold in Santa Maria-Orcutt is between 1540.473 and 1686.611 square feet.

It should be noted that the interpretations above refer to the population mean.
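
The interval calculation used in this section (sample mean plus or minus 1.96 standard errors) can be sketched in Python as shown below. The function name `mean_ci_95` and the house sizes are assumptions made for illustration; they are not the values from the data set.

```python
import math
import statistics


def mean_ci_95(values, z=1.96):
    """95% confidence interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


# Made-up house sizes in square feet, used only to show the call.
sizes = [1200, 1500, 1600, 1800, 1400, 1700]
lower, upper = mean_ci_95(sizes)
print(lower, upper)
```

The sketch follows the normal-quantile formula of this section. With 238 observations, the value 1.96 is very close to the corresponding t quantile, so the two approaches give nearly the same interval.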

Multiple Regression Analysis

Now we develop a multiple regression equation for the prices of the houses based on the number of bedrooms, the number of bathrooms, the size and the type of sale.

We test the following hypotheses:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$$
$$H_a: \text{not all } \beta_i \text{ are equal to } 0$$

We use MS Excel statistical tools to run the regression analysis.

The resulting regression equation is:
$$\text{Price} = 118085.9 + 12686.26 \cdot \text{Bedrooms} + 15683.05 \cdot \text{Bathrooms} + 76.11912 \cdot \text{Size} - 84206.8 \cdot \text{Status}$$
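
To show how the fitted equation is applied, the following minimal Python sketch plugs values into it. The coefficients are the ones reported in the equation; the function name `predict_price` and the example house (3 bedrooms, 2 bathrooms, 1600 square feet) are illustrative assumptions, not part of the original analysis.

```python
def predict_price(bedrooms, bathrooms, size, status):
    """Predicted price from the fitted equation.

    status is the dummy variable: 1 for a Regular sale, 0 for a Short Sale.
    """
    return (
        118085.9
        + 12686.26 * bedrooms
        + 15683.05 * bathrooms
        + 76.11912 * size
        - 84206.8 * status
    )


# Hypothetical house, priced under each type of sale.
short_sale = predict_price(bedrooms=3, bathrooms=2, size=1600, status=0)
regular = predict_price(bedrooms=3, bathrooms=2, size=1600, status=1)
print(round(short_sale, 2), round(regular, 2))
```

The two predictions differ by exactly the Status coefficient, which is how a dummy variable shifts the intercept without changing the other slopes.
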
According to the regression statistics, the coefficient of determination R-square is 0.388831. This means that approximately 38.88% of the variance of the response variable is explained by this model. According to the ANOVA output the model is significant because the F-value is 37.05919 and its p-value is less than 0.001. Hence, we reject the null hypothesis and have enough evidence to say that not all betas are equal to 0 (at the 5% level of significance). However, not all coefficients of the model are significant. The coefficients of the Bedrooms and Bathrooms variables are significant only at a significance level of 17% or higher, which is far above the 5% level used here. That is why these variables may be excluded from the model.
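
As a consistency check on the reported figures, the overall F statistic can be recovered from R-square, the number of observations and the number of predictors. The sketch below uses the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)); the function name `f_statistic` is an assumption made for illustration, and the inputs are the values reported in this paper (R-square 0.388831, 238 observations, 4 predictors).

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic of a regression, computed from R-squared."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained


f_value = f_statistic(0.388831, n_obs=238, n_predictors=4)
print(round(f_value, 2))  # about 37.06, in line with the ANOVA F-value
```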

Conclusion

As the result of this work we have developed the most appropriate (as far as this data allows) multiple regression model to predict the price of houses sold in Santa Maria-Orcutt. However, the accuracy of the model is not very high: the coefficient of determination is relatively low. This suggests that other factors that affect prices should be included in the model (or that additional data must be collected). The numbers of bedrooms and bathrooms do not have a significant effect on the prices of the houses. These factors may be excluded from the prediction equation.

Cite this page
WePapers. (2020, October, 31) Free Dummy Variable Essay Sample. Retrieved November 05, 2024, from https://www.wepapers.com/samples/free-dummy-variable-essay-sample/