Published: 2021/01/03
Interpreting Logistic Regression Output from Excel
Introduction
When we have a binary output variable Y and wish to model the conditional probability of Y given X as a function of X, one approach is to perform logistic regression. Logistic regression introduces the ‘logistic transformation’, log[p(x)/(1 - p(x))], which is the log of the odds. Because this transformation maps probabilities between 0 and 1 onto the whole real line, a linear model fitted on the transformed scale cannot produce predicted probabilities below 0 or above 1, which would not make sense. Using this approach, logistic regression estimates the probability of a categorical dependent variable based on one or more predictor variables (Carnegie Mellon University, n.d.).
The challenge is to convert this data into a prediction model for Riyadh’s water supply. Essentially, we wish to answer the question: what is the probability of failure of any pipe, given its type, material, pressure, size, length or time elapsed?
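The logistic transformation and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's calculation: the function names and the `intercept` and `coefficients` arguments are assumptions introduced here for the example.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Map any real-valued linear predictor z back to a probability."""
    # Two branches avoid overflow in exp() for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(intercept: float,
                        coefficients: Sequence[float],
                        x: Sequence[float]) -> float:
    """Predicted probability for one observation (hypothetical inputs)."""
    # Linear predictor on the log-odds scale: b0 + b1*x1 + b2*x2 + ...
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return inverse_logit(z)
```

Whatever values the linear predictor takes, `inverse_logit` returns a number between 0 and 1, which is why the model is fitted on the log-odds scale rather than on the probability itself.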
Strength of the Model
The Receiver Operating Characteristic (ROC) curve is a tool for assessing the predictive power of the logistic regression model. The larger the area under the curve, the better the model discriminates between failures and non-failures. In the sheet ‘logistic regression’, the ROC table is created by mapping the predicted probability (p-pred) against failure, success, cumulative failure and cumulative success. From this table the ‘True Positive Rate’ (TPR), the ‘False Positive Rate’ (FPR) and the ‘Area under the curve’ (AUC) are determined. The net AUC is 0.91. This means that, for a randomly chosen pipe that failed and a randomly chosen pipe that did not, the model assigns the higher predicted probability of failure to the failed pipe about 91% of the time. This is evident in the ROC curve, where the True Positive Rate climbs steeply, leaving 91% of the area under the curve.
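The ROC table described above can be reproduced outside a spreadsheet. The sketch below assumes two hypothetical inputs, a list of 0/1 failure labels and a list of predicted probabilities, and uses the trapezoidal rule for the area; the workbook's exact layout may differ.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    labels: 1 for a failure, 0 for a non-failure.
    scores: predicted probability of failure for each observation.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one failure and one non-failure")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every observation tied at this score before adding a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Calling `area_under_curve(roc_points(labels, scores))` gives the AUC; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation of failures from non-failures.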
Intercept Table
The intercept table tests the hypothesis that the proportion of failures is the same across the various categories of pipe.
H0: P(var1) = P(var2) = P(var3) = P(var4) = P(var5) = P(var6), where var1 is material, var2 is pressure, var3 is size, var4 is length, var5 is time and var6 is failure.
H1: The proportions are different.
For each variable, the intercept table reports the b-coefficient, the standard error, the Wald chi-square and the p value. The Wald chi-square is the squared ratio of the coefficient to its standard error, and the p value is the probability of obtaining a statistic at least that large if the null hypothesis (H0) were true. A p value below 0.05 means that a result this extreme would be expected no more than one time in 20 if H0 held, so the null hypothesis is rejected at the 5% level. In the intercept table, the p value is more than 0.05 for var1, var2 and var4. This means that material, pressure and length are not statistically significant predictors of failure in this model. The Wald chi-square is highest for var3 (size) at 77.3 and var6 (failure state) at 137. This means that the evidence of an association with failure patterns is strongest for these two variables. This inference is corroborated by the correlation graphs that follow (UCLA Institute of Digital Research and Regressions, n.d.).
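The Wald chi-square and p value in a coefficient table are conventionally derived from the coefficient and its standard error. The sketch below assumes the usual single-degree-of-freedom form of the test; the function names are illustrative and are not taken from the workbook.

```python
import math


def p_value_from_wald(wald_chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    if wald_chi_square < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For 1 degree of freedom, P(X > w) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(wald_chi_square / 2.0))


def wald_test(coefficient: float, standard_error: float):
    """Return (Wald chi-square, p value) for one coefficient."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    wald = (coefficient / standard_error) ** 2
    return wald, p_value_from_wald(wald)
```

Applied to the Wald chi-square values of 77.3 and 137 reported above, `p_value_from_wald` returns values far below 0.05, consistent with those two rows being the strongest in the table.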
Correlation Table
The correlation table sheet shows the correlations between the various inputs, namely type, material, pressure, size, length, time elapsed and failure state. The correlation between failure and time elapsed is 0.538, which suggests that the older the pipe, the more likely it is to fail. The correlation between pressure and length is 0.88, which suggests that longer pipes tend to carry higher pressure. The correlation between pressure and size is 0.345.
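Each entry in a correlation table of this kind is typically a Pearson correlation coefficient between two columns. A minimal sketch of that calculation follows, assuming the two columns are available as equal-length lists of numbers; the function name is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

The coefficient ranges from -1 to 1 and is not itself a percentage. Its square gives the share of variance that one variable explains in the other under a straight-line fit, so a correlation of 0.538 corresponds to roughly 29% of the variance.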
Correlation Graphs
The correlation table is depicted graphically to show how each dependent variable depends on the corresponding independent variable. An interpretation of each graph follows.
Prediction vs. Pressure
In the graph of prediction vs. pressure, the intent is to determine whether pressure (the independent variable) is related to the prediction. Prediction (the ‘y’ variable) is related to pressure (the ‘x’ variable) through the following equation:
y = 0.5771ln(x) + 66.232
The R square value is 0.061, meaning that pressure explains only 6.1% of the variation in the prediction. A low R square value is not necessarily a negative finding; it has to be seen in the context of other tests and parameters. In this case, the correlation between prediction and pressure is 0.3, indicating a weak positive association between the prediction of failure and pressure.
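Equations of the form y = a ln(x) + b, together with an R square value, are what a logarithmic trendline produces. The sketch below assumes such a trendline is an ordinary least-squares fit of y on ln(x); the function name and inputs are illustrative, and the workbook's data are not reproduced here.

```python
import math


def fit_log_trendline(xs, ys):
    """Least-squares fit of y = a*ln(x) + b. Returns (a, b, r_squared)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    if any(x <= 0 for x in xs):
        raise ValueError("ln(x) requires every x to be positive")

    log_x = [math.log(x) for x in xs]
    n = len(log_x)
    mean_u = sum(log_x) / n
    mean_y = sum(ys) / n

    s_uu = sum((u - mean_u) ** 2 for u in log_x)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(log_x, ys))
    if s_uu == 0:
        raise ValueError("all x values are identical")

    a = s_uy / s_uu              # slope on the ln(x) scale
    b = mean_y - a * mean_u      # intercept

    # R square: share of the variation in y explained by the fitted curve.
    ss_res = sum((y - (a * u + b)) ** 2 for u, y in zip(log_x, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R square is undefined when y is constant")
    return a, b, 1.0 - ss_res / ss_tot
```

For this kind of fit, R square equals the squared Pearson correlation between ln(x) and y, which is why a low R square and a modest correlation coefficient tend to appear together in the graphs.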
Length vs. Time Elapsed
In the graph of length vs. time elapsed, the intent is to determine how the time elapsed (the independent variable) relates to the length (the dependent variable). Length (the ‘y’ variable) is related to time elapsed (the ‘x’ variable) through the following equation:
y = -1.277ln(x) + 14.327
The R square value is 0.15, meaning that time elapsed explains 15% of the variation in length. The correlation coefficient is 0.4, indicating a moderate association between the length of the pipe and the time elapsed. The fitted slope is negative, so pipes with more time elapsed tend to be shorter; equivalently, more recently laid pipes tend to be longer. This suggests that, with increased experience in the water supply mechanism, the Riyadh authorities have tended to lay longer pipes over time.
Time elapsed vs. Pressure
In the graph of time elapsed vs. pressure, the intent is to determine whether time elapsed (the dependent variable) can be related to pressure (independent variable). Time elapsed (the ‘y’ variable) is related to pressure (the ‘x’ variable) through the following equation:
y = -8.3ln(x) + 84.487
The R squared value is 0.12, meaning that pressure explains 12% of the variation in time elapsed. The correlation coefficient is 0.48, indicating a moderate association between time elapsed and pressure. The analysis reads this as pipes under higher pressure tending to survive for a longer time. While this conclusion might appear counter-intuitive, it makes sense from the engineering perspective: if adequate water pressure is retained in pipes, they should have a longer lifespan; on the other hand, pipes in disuse are likely to deteriorate earlier.
Prediction vs. Time Elapsed
In the graph of prediction vs. time elapsed, the aim is to determine how the prediction of failure of pipes relates to the time elapsed, i.e. the life of the pipes. Prediction (the ‘y’ variable) is related to time elapsed (the ‘x’ variable) through the following equation:
y = 0.6269ln(x) + 10.806
The R square value of 0.45 indicates that time elapsed explains 45% of the variation in the prediction, the strongest fit among the four graphs. The correlation coefficient is 0.65, indicating a fairly strong positive association between the prediction of failure and the life of the pipes. It is therefore likely that the older the pipe, the more likely it is to fail.
Conclusion
The logistic regression model created for the water supply system of Riyadh has indicated correlations between time elapsed and the failure of pipes, as well as between water pressure and time elapsed. The managerial implication is that it would be prudent for the Riyadh administration authorities to ensure that the pipelines remain in regular use, as pipelines in disuse are more likely to fail. It would also be imperative to institute regular checks on older pipes, as older pipes are more likely to fail.
References
Carnegie Mellon University. (n.d.). Logistic regression. Retrieved March 28, 2015, from http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf
UCLA Institute of Digital Research and Regressions. (n.d.). Summary table for logistical regression models. Retrieved March 28, 2015, from http://www.ats.ucla.edu/stat/sas/code/logit_table.htm