Introduction to data mining

Document Type: Research Paper

Subject Area: Statistics

Document 1

Big data refers to data sets so large and complex that traditional statistical methods cannot capture, store, visualize, or analyze them. Data mining, in turn, draws on neural network analysis, decision tree optimization, and machine learning.

Big Data

Philanthropic organizations and companies handle huge amounts of data about their customers. A record is collected for every transaction that occurs, and within just a few years the volume typically reaches dozens to hundreds of entries per year. Some companies store product data separately from customer information, which leads them to build relational databases that link the other indexed databases. Analysis can begin with customer response and customer purchase behavior, and with specific questions raised about the data.
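The idea of linking separately stored customer and product data through a relational database can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the table names, column names, and sample rows are all hypothetical and are not taken from any real system.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical, illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, label TEXT, price REAL);
        -- Each transaction links one customer to one product.
        CREATE TABLE transactions (
            transaction_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            product_id INTEGER REFERENCES products(product_id),
            quantity INTEGER
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
        INSERT INTO products VALUES (10, 'notebook', 2.5), (11, 'pen', 1.0);
        INSERT INTO transactions VALUES
            (100, 1, 10, 2), (101, 1, 11, 3), (102, 2, 11, 5);
    """)
    return conn

def spend_per_customer(conn):
    """Join the three tables and total each customer's spending."""
    rows = conn.execute("""
        SELECT c.name, SUM(t.quantity * p.price) AS total
        FROM transactions AS t
        JOIN customers AS c ON c.customer_id = t.customer_id
        JOIN products AS p ON p.product_id = t.product_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """).fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(spend_per_customer(build_demo_db()))
```

Keeping customers and products in separate tables means each fact is stored once, and the transaction table carries only the keys needed to join them at analysis time.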

The approach of asking questions to identify patterns is referred to as "Online Analytical Processing" (OLAP). OLAP is used in finance, inventory, budgeting, marketing, and sales to pose specific queries about the data in those fields. Another approach is the predictive model, which tries to predict a response from predictor variables. The role of data mining is similar to that of traditional statistical analysis, since both rely on modeling and data analysis. Software vendors often exaggerate the automatic capabilities of their tools in order to increase sales. Data mining can identify useful patterns in customer behavior and anticipate future trends, but the better users understand their business, the higher their chances of applying data mining successfully.
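An OLAP-style query can be thought of as summing a measure over chosen dimensions. The following is a minimal sketch in plain Python rather than a real OLAP engine; the SALES records, the dimension names, and the roll_up helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: two dimensions (region, quarter) and one measure (amount).
SALES = [
    {"region": "North", "quarter": "Q1", "amount": 120},
    {"region": "North", "quarter": "Q2", "amount": 80},
    {"region": "South", "quarter": "Q1", "amount": 200},
    {"region": "South", "quarter": "Q2", "amount": 50},
]

def roll_up(records, dimensions, measure="amount"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record[measure]
    return dict(totals)

if __name__ == "__main__":
    # "How much did each region sell?" and "How much was sold per quarter?"
    print(roll_up(SALES, ["region"]))
    print(roll_up(SALES, ["quarter"]))
```

Each call answers one specific question that the analyst has to pose explicitly, which is the sense in which OLAP depends on asking the right questions.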

Therefore, data mining is not magic that can overcome poor data collection and poor data quality. Some major myths of data mining include:

Myth 1: Data mining answers questions that were never asked.
Myth 2: Data mining monitors datasets automatically to find desirable patterns.
Myth 3: Data mining removes the need to understand the business.
Myth 4: Data mining removes the need to collect high-quality data.
Myth 5: Data mining removes the need for skillful data analysis techniques.

Successful Data Mining

The size of the data held in a warehouse makes the whole data mining process challenging. A classification problem is one in which the variable under analysis is categorical. In a classification problem, the model estimates a probability for every class and predicts the most likely outcome. Both regression and classification problems are good examples of supervised problems.
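Predicting the most likely class from per-class probabilities can be sketched as follows. This is one simple way to do it, assuming a single categorical predictor and probabilities estimated from observed frequencies; the function names and the sample campaign data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_class_probabilities(rows):
    """Estimate P(class | predictor value) from (predictor value, class) pairs."""
    counts = defaultdict(Counter)
    for value, label in rows:
        counts[value][label] += 1
    return {
        value: {label: n / sum(c.values()) for label, n in c.items()}
        for value, c in counts.items()
    }

def predict_class(probabilities, value):
    """Return the class with the highest estimated probability for this value."""
    class_probs = probabilities[value]
    return max(class_probs, key=class_probs.get)

# Hypothetical history: contact channel and whether the customer responded.
HISTORY = [
    ("email", "yes"), ("email", "yes"), ("email", "no"),
    ("mail", "no"), ("mail", "no"), ("mail", "yes"), ("mail", "no"),
]

if __name__ == "__main__":
    model = fit_class_probabilities(HISTORY)
    print(model["mail"])
    print(predict_class(model, "email"))
```

The model is only as good as the history it is fitted on, and a predictor value that never appeared in the history raises a KeyError rather than producing a guess.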

Supervised problems involve a training set containing original data that guides the data miner in creating the model. The analyst then uses a test set to check the predictions of the constructed model. Different algorithms use several procedures to identify and profile two groups that a split clearly contrasts. After finding the first split, the algorithm continues to search the remaining results for further splits, each defined by the best variable and split point. In decision analysis, a decision tree can represent decisions and decision making visually and explicitly. In data mining, a decision tree describes the data.
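The search for a first split, followed by a check against a test set, can be sketched as below. The text does not name a splitting criterion, so this sketch assumes Gini impurity; it also stops after one split (a "decision stump"), whereas a full tree would repeat the search within each resulting group. The function names and the small TRAIN and TEST sets are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows):
    """Return the (variable index, threshold) with the lowest weighted impurity.

    rows is a list of (features, label) pairs with numeric features.
    """
    best = None  # (impurity, variable index, threshold)
    for var in range(len(rows[0][0])):
        values = sorted({features[var] for features, _ in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [label for f, label in rows if f[var] <= threshold]
            right = [label for f, label in rows if f[var] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, var, threshold)
    return best[1], best[2]

def majority(labels):
    """Most common label in a group."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(train):
    """Learn one split from the training set and label each side by majority."""
    var, threshold = best_split(train)
    left = majority([label for f, label in train if f[var] <= threshold])
    right = majority([label for f, label in train if f[var] > threshold])
    return var, threshold, left, right

def predict_stump(stump, features):
    var, threshold, left, right = stump
    return left if features[var] <= threshold else right

def accuracy(stump, test):
    """Share of test rows whose label the stump predicts correctly."""
    hits = sum(predict_stump(stump, f) == label for f, label in test)
    return hits / len(test)

# Hypothetical data: features are (age, visits per month); label is the outcome.
TRAIN = [((25, 1), "skip"), ((30, 2), "skip"), ((35, 8), "buy"),
         ((40, 9), "buy"), ((45, 2), "skip"), ((50, 7), "buy")]
TEST = [((28, 3), "skip"), ((38, 6), "buy"), ((48, 1), "skip")]

if __name__ == "__main__":
    stump = fit_stump(TRAIN)
    print(stump, accuracy(stump, TEST))
```

On this toy data the search selects the second variable (visits) at a threshold of 4.5, because that split separates the two groups completely, while no split on age does.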
