Data mining and statistical analysis using SQL

This volume is designed for database administrators who want to buttress their understanding of statistics to support data mining, customer relationship management, and analytics, and who want to do that work in Structured Query Language (SQL).

Bibliographic Details
Main Author: Trueblood, Robert P.
Other Authors: Lovett, John N.
Format: Book
Language: English
Published: Berkeley, CA : Apress, ©2001.
Series: Books for professionals by professionals.
Table of Contents:
  • Chapter 1 Basic Statistical Principles and Diagnostic Tree 1
    • Categories of Data 2
    • Sampling Methods 2
    • Diagnostic Tree 5
    • SQL Data Extraction Examples 7
  • Chapter 2 Measures of Central Tendency and Dispersion 9
    • Measures of Central Tendency 10
    • Mean 10
    • Median 12
    • Mode 16
    • Geometric Mean 17
    • Weighted Mean 20
    • Measures of Dispersion 22
    • Histogram Construction 22
    • Range 32
    • Standard Deviation 33
  • Chapter 3 Goodness of Fit 41
    • Tests of Hypothesis 43
    • Goodness of Fit Test 46
    • Fitting a Normal Distribution to Observed Data 47
    • Fitting a Poisson Distribution to Observed Data 62
    • Fitting an Exponential Distribution to Observed Data 68
    • T-SQL Source Code 72
    • Make_Intervals 73
    • Combine_Intervals 74
    • Compare_Observed_And_Expected 80
    • Procedure Calls 82
  • Chapter 4 Additional Tests of Hypothesis 85
    • Comparing a Single Mean to a Specified Value 88
    • Comparing Means and Variances of Two Samples 94
    • Comparisons of More Than Two Samples 101
    • T-SQL Source Code 105
    • Calculate_T_Statistic 105
    • Calculate_Z_Statistic 107
    • Compare_Means_2_Samples 108
    • Contingency_Test 113
    • Procedure Calls 117
  • Chapter 5 Curve Fitting 119
    • Linear Regression in Two Variables 121
    • Linear Correlation in Two Variables 127
    • Polynomial Regression in Two Variables 130
    • Other Nonlinear Regression Models 136
    • Linear Regression in More Than Two Variables 141
    • T-SQL Source Code 147
    • Linear_Regression_2_Variables 148
    • Gaussian_Elimination 150
    • Array_2D 158
    • Polynomial_Regression 160
    • Exponential_Model 169
    • Multiple_Linear_Regression 172
    • Procedure Calls 179
  • Chapter 6 Control Charting 181
    • Common and Special Causes of Variation 183
    • Dissecting the Control Chart 193
    • Control Charts for Sample Range and Mean Values 195
    • Control Chart for Fraction Nonconforming 206
    • Control Chart for Number of Nonconformities 213
    • T-SQL Source Code 216
    • Sample_Range_and_Mean_Charts 216
    • Standard_P_Chart 219
    • Stabilized_P_Chart 222
    • C_Chart 224
    • Procedure Calls 227
  • Chapter 7 Analysis of Experimental Designs 229
    • One-Way ANOVA 231
    • Two-Way ANOVA 238
    • ANOVA Involving Three Factors 245
    • T-SQL Source Code 261
    • ANOVA 261
    • Procedure Calls 275
  • Chapter 8 Time Series Analysis 277
    • Simple Moving Average 278
    • Single Exponential Smoothing 286
    • Double Exponential Smoothing 292
    • Incorporating Seasonal Influences 300
    • Criteria for Selecting the Most Appropriate Forecasting Technique 308
    • T-SQL Source Code 311
    • Simple Moving Average 312
    • Weighted Moving Average 315
    • Single Exponential Smoothing 318
    • Double Exponential Smoothing 322
    • Seasonal Adjustment 327
    • Procedure Calls 332
  • Appendix A Overview of Relational Database Structure and SQL 337
  • Appendix B Statistical Tables 359
  • Appendix C Tables of Statistical Distributions and Their Characteristics 373
  • Appendix D Visual Basic Routines 381