What is Data Mining?

6

Well, the title says it all:

What is Data Mining?

  • This term comes up mostly in the context of databases and BI.

  • Good point, @rray. It could be either one; I think I mixed up the terms. The first is on-topic, the second is off-topic.

  • @rray updated the question

  • @Marceloboni updated the question

  • 1

    This may be the beginning of an answer: https://msdn.microsoft.com/pt-br/library/ms174949(v=sql.120).aspx

1 answer

10


Data Mining

Since information became so important for decision-making, data has been stored on a large scale. As the volume of stored data grew daily, a question arose: what to do with so much stored data? Traditional data exploration techniques are no longer suitable for the vast majority of repositories. To answer this question, data mining emerged at the end of the 1980s.

Data mining is one of the most promising technologies today. One reason for its success is that companies spend tens, and often hundreds, of millions of reais on data collection and still fail to extract useful information from it. Han (2006) describes this situation in his book as "rich in data, poor in information". Besides private enterprise, the public sector and the third sector (NGOs) can also benefit from data mining.

Data mining is not just something used with Bitcoin; according to Witten and Bramer, it can be applied satisfactorily in several areas, such as:

  • Customer retention: profiling for certain products, cross-selling;
  • Banks: identifying patterns that help manage customer relationships;
  • Credit cards: identifying market segments and customer turnover patterns;
  • Collections: fraud detection;
  • Telemarketing: easier access to customer data;
  • Medicine: more accurate diagnostic indications;
  • Security: detecting terrorist and criminal activity;
  • Decision-making: filtering the relevant information and providing probability indicators;
  • Retail: a supermarket improves how its products are arranged on the shelves based on its customers' buying patterns.
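The supermarket case in the list above is usually approached as market-basket analysis. What follows is only a minimal sketch of the idea, not a method taken from the cited books: the baskets, the item names, and the 0.6 support threshold are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; every item name here is made up.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets that contain `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(transactions)
# Keep only pairs bought together in at least 60% of baskets (arbitrary threshold).
frequent_pairs = {pair: s for pair, s in support.items() if s >= 0.6}

for pair, s in sorted(frequent_pairs.items()):
    print(pair, s)
print("confidence(beer -> diapers):", confidence(transactions, "beer", "diapers"))
```

Pairs with high support and high confidence are candidates for being placed near each other on the shelves. Real projects would typically rely on a dedicated algorithm such as Apriori rather than counting every pair by hand.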

But how does data mining work in practice?

Data mining is applied to large amounts of data and uses mathematical analysis to uncover anomalies, patterns and correlations, capturing only what is relevant after the tool has been trained beforehand. Companies use this technology to support decision-making and gain strategic advantages. With a wide range of techniques, this information can be used to increase revenue, reduce costs, improve customer relationships, reduce risks, and more.
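To make "anomalies" and "correlations" a bit more concrete, here is a small sketch using only the Python standard library. The sales and advertising figures are invented, and the z-score threshold of 2.0 is an arbitrary assumption; real tools use far more elaborate techniques.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` (an assumed cutoff)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical data, made up for this example.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 6, 8, 10]

print("anomalous days:", anomalies(daily_sales))
print("correlation:", pearson(ad_spend, revenue))
```

The z-score flags values that sit far from the mean, and the correlation coefficient measures how closely two variables move together. Neither says anything about cause, which is why the business question still has to be defined first.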

The most important thing for any project that decides to use data mining is to clearly define which problem will be solved.

According to the SAS Institute website:

Data mining is defined as a combined discipline that represents a variety of methods or techniques used in different analytical capabilities, addressing a range of organizational needs, answering different types of questions, and using different levels of rules to reach a decision.

The site also defines several types of modeling, such as descriptive, predictive, and prescriptive modeling.

One of the most widespread standards for working with data mining is CRISP-DM (Cross-Industry Standard Process for Data Mining), thanks to the extensive literature available; according to Han (2006), it is currently considered the most widely accepted standard.

As stated by Olson et al. (2008) in their book, the CRISP-DM process consists of six phases arranged in a cycle, as shown in the figure below. Although it is composed of phases, the flow is not unidirectional and can move back and forth between them.

[Figure: the six phases of the CRISP-DM cycle]

The phases of the CRISP-DM process are:

  1. Business Understanding: At this stage, the focus is on understanding the objective to be achieved with data mining. This understanding guides the next steps.
  2. Data Understanding: Data may come from various sources and in various formats. According to Olson et al., after the objectives are defined, it is necessary to get to know the data in order to: describe the problem clearly; identify the data relevant to the problem at hand; and make sure that the variables relevant to the project are not interdependent. Clustering and visual exploration techniques are also commonly used at this stage.
  3. Data Preparation: Because the data can come from many sources, it is often not ready for data mining methods to be applied directly. Depending on its quality, some actions may be required. This data cleaning process usually involves filtering, matching, and filling in empty values.
  4. Modeling: This is the stage at which the mining techniques (algorithms) are applied. The choice of technique(s) depends on the desired objectives.
  5. Evaluation: Considered a critical phase of the mining process, this stage requires the participation of data experts, people who know the business, and decision makers. Several graphical tools are used to visualize and analyze the results (models). Tests and validations must be run to establish confidence in the models (cross-validation, supplied test set, use training set, percentage split), and indicators that support the analysis of the results must be obtained (confusion matrix, rates of correctly and incorrectly classified instances, kappa statistic, mean absolute error, mean relative error, accuracy, F-measure, among others).
  6. Deployment: After the model has been run on real, complete data, the people involved need to be informed of the results.
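Some of the indicators mentioned in the evaluation phase can be computed by hand for a two-class problem. The sketch below is only an illustration under assumptions: the "fraud"/"ok" labels and the predictions are invented, and the function names are hypothetical rather than taken from any particular tool.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def accuracy(tp, fp, fn, tn):
    """Fraction of instances classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Hypothetical labels and model output, made up for this example.
actual = ["fraud", "fraud", "ok", "ok", "ok", "fraud", "ok", "ok", "ok", "ok"]
predicted = ["fraud", "ok", "ok", "ok", "fraud", "fraud", "ok", "ok", "ok", "ok"]

tp, fp, fn, tn = confusion_matrix(actual, predicted, positive="fraud")
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("F-measure:", f_measure(tp, fp, fn))
print("kappa:", kappa(tp, fp, fn, tn))
```

Looking at several indicators together matters because accuracy alone can look good on unbalanced data, for example when fraud cases are rare.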

Note: This answer is based on the books mentioned. Everything in it rests on the statements and expertise of the leading authorities on the subject.
