Applies to: SQL Server 2019 and earlier Analysis Services Azure Analysis Services Fabric/Power BI Premium
Data mining was deprecated in SQL Server 2017 Analysis Services and now discontinued in SQL Server 2022 Analysis Services. Documentation is not updated for deprecated and discontinued features. To learn more, see Analysis Services backward compatibility.
Validation is the process of assessing how well your mining models perform against real data. It is important that you validate your mining models by understanding their quality and characteristics before you deploy them into a production environment.
This section introduces some basic concepts related to model quality, and describes the strategies for model validation that are provided in Microsoft SQL Server Analysis Services. For an overview of how model validation fits into the larger data mining process, see Data Mining Solutions.
There are many approaches for assessing the quality and characteristics of a data mining model.
All of these methods are useful in data mining methodology and are used iteratively as you create, test, and refine models to answer a specific problem. No single comprehensive rule can tell you when a model is good enough, or when you have enough data.
Measures of data mining generally fall into the categories of accuracy, reliability, and usefulness.
Accuracy is a measure of how well the model correlates an outcome with the attributes in the data that has been provided. There are various measures of accuracy, but all measures of accuracy are dependent on the data that is used. In reality, values might be missing or approximate, or the data might have been changed by multiple processes. Particularly in the phase of exploration and development, you might decide to accept a certain amount of error in the data, especially if the data is fairly uniform in its characteristics. For example, a model that predicts sales for a particular store based on past sales can be strongly correlated and very accurate, even if that store consistently used the wrong accounting method. Therefore, measurements of accuracy must be balanced by assessments of reliability.
Reliability assesses the way that a data mining model performs on different data sets. A data mining model is reliable if it generates the same type of predictions or finds the same general kinds of patterns regardless of the test data that is supplied. For example, the model that you generate for the store that used the wrong accounting method would not generalize well to other stores, and therefore would not be reliable.
Usefulness includes various metrics that tell you whether the model provides useful information. For example, a data mining model that correlates store location with sales might be both accurate and reliable, but might not be useful, because you cannot generalize that result by adding more stores at the same location. Moreover, it does not answer the fundamental business question of why certain locations have more sales. You might also find that a model that appears successful in fact is meaningless, because it is based on cross-correlations in the data.
SQL Server Analysis Services supports multiple approaches to validation of data mining solutions, supporting all phases of the data mining test methodology.
These metrics do not aim to answer the question of whether the data mining model answers your business question; rather, these metrics provide objective measurements that you can use to assess the reliability of your data for predictive analytics, and to guide your decision of whether to use a particular iterate on the development process.
The topics in this section provide an overview of each method and walk you through the process of measuring the accuracy of models that you build using SQL Server Data Mining.
Topics | Links |
---|---|
Learn how to set up a testing data set using a wizard or DMX commands | Training and Testing Data Sets |
Learn how to test the distribution and representativeness of the data in a mining structure | Cross-Validation (Analysis Services - Data Mining) |
Learn about the accuracy chart types provided. | Lift Chart (Analysis Services - Data Mining) |