Introduction to data mining

Document 1

Big data refers to data sets so large and complex that they cannot be captured, stored, visualized, or analyzed using traditional statistical methods. Data mining, moreover, draws on neural network analysis, decision tree optimization, and machine learning.

Big Data

Philanthropic organizations and companies deal with huge amounts of data about their customers. Data is collected for every transaction that occurs, and within just a few years the number of recorded transactions typically reaches dozens or hundreds of entries per year. Some companies store product data separately from customer information, which leads them to build relational databases that link the separate indexed databases. Analysis can begin by examining customer responses and customer purchase behavior and by raising specific questions about the data. The approach of asking questions to identify patterns is referred to as "Online Analytical Processing" (OLAP).
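As a rough illustration of asking one specific question of linked data, the sketch below totals sales per product category. The records, the field layout, and the function name `sales_by_category` are all hypothetical and chosen only for this example; a real OLAP tool would run comparable queries against a database rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, product_id, amount).
transactions = [
    ("c1", "p1", 20.0),
    ("c1", "p2", 35.0),
    ("c2", "p1", 20.0),
    ("c3", "p2", 35.0),
    ("c3", "p2", 35.0),
]

# Hypothetical product table, stored separately and linked by product_id.
products = {"p1": "books", "p2": "games"}


def sales_by_category(transactions, products):
    """Answer one specific question: what are total sales per category?"""
    totals = defaultdict(float)
    for _customer, product_id, amount in transactions:
        # Follow the product_id link from the transaction to the product table.
        totals[products[product_id]] += amount
    return dict(totals)


print(sales_by_category(transactions, products))
```

The point of the sketch is that the analyst chooses the question first (sales per category) and the query only answers that question.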

OLAP is used in finance, inventory, budgeting, marketing, and sales to pose specific queries about those areas. Another approach is the predictive model, which tries to predict a response from predictor variables. The role of data mining is similar to that of traditional statistical analysis, since both rely on modeling and data analysis. Software vendors often exaggerate the automatic capabilities of their tools in order to sell more software. Data mining can be used to identify useful patterns and trends that anticipate customer behavior, although the better users understand their business, the higher their chances of applying data mining successfully. Data mining is therefore not magic that can overcome poor data collection and poor data quality.

Some major myths about data mining include:

Myth 1: It gets answers to questions that were never asked.
Myth 2: It monitors data sets automatically to find desirable patterns.
Myth 3: It removes the need to understand the business.
Myth 4: It removes the need to collect high-quality data.
Myth 5: It removes the need for skillful data analysis techniques.

Successful Data Mining

The sheer volume of data held in a warehouse makes the whole data mining process challenging. A classification problem is one in which the variable under analysis is categorical. In a classification problem, the model estimates a probability for every class and predicts the most likely outcome. Both regression and classification problems are good examples of supervised problems. A supervised problem involves a training set, which contains original data that guides the data miner in creating the model. The analyst then uses a separate test set to check the predictions of the constructed model. Different algorithms use different procedures to find a split that divides the data into two clearly contrasting groups, which can then be profiled. After finding the first split, the algorithm searches the resulting groups for further splits, each time looking for the best variable and split point.
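The search for a single split, followed by a check against a test set, can be sketched as follows. This is a minimal illustration and not the procedure of any particular algorithm: it assumes Gini impurity as the measure of how well a split separates the classes, it finds only the first split, and the customer records (age, purchases last year, whether the customer responded) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows):
    """Return (variable_index, split_point) with the lowest weighted impurity.

    Each row is (features, label); a row goes left if its value <= split_point.
    Returns None if no split produces two non-empty groups.
    """
    best, best_score = None, float("inf")
    for f in range(len(rows[0][0])):
        for point in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= point]
            right = [y for x, y in rows if x[f] > point]
            if not left or not right:
                continue  # a split must produce two groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, point), score
    return best


def majority(labels):
    """Most common label in a non-empty list."""
    return max(set(labels), key=labels.count)


def fit_stump(rows):
    """Build a one-split model: best split plus the majority class per side."""
    f, point = best_split(rows)
    left = majority([y for x, y in rows if x[f] <= point])
    right = majority([y for x, y in rows if x[f] > point])
    return f, point, left, right


def predict(stump, x):
    f, point, left, right = stump
    return left if x[f] <= point else right


# Hypothetical data: features are (age, purchases_last_year).
train = [
    ((25, 1), "no"), ((30, 2), "no"), ((35, 8), "yes"),
    ((40, 9), "yes"), ((45, 1), "no"), ((50, 7), "yes"),
]
test = [((28, 6), "yes"), ((48, 2), "no")]

stump = fit_stump(train)  # built from the training set only
accuracy = sum(predict(stump, x) == y for x, y in test) / len(test)
print(stump, accuracy)
```

A full decision tree would apply `best_split` again inside each of the two resulting groups; the sketch stops after one split to keep the idea visible.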

In decision analysis, a decision tree represents decision making explicitly and visually. In data mining, decision trees can also describe the data clearly.

Neural Networks

An Artificial Neural Network, frequently referred to simply as a neural network, is a mathematical model that tries to mimic how the brain functions.

• Data preparation – This involves all the activities needed to create the final data set from the initial raw data.
• Modeling – This consists of choosing and applying the most suitable modeling techniques and tuning them to obtain optimal values. Several techniques can usually solve the same problem, and some have specific requirements for the data, so returning to the data preparation stage is often necessary.
• Evaluation – This involves assessing whether the model is of high quality from a data analysis point of view.
• Deployment – This involves presenting the organized results so that they provide insights and useful predictive information for making decisions.

Uploaded by: Elizabeth Davison