Introduction to data mining

Document Type: Research Paper

Subject Area: Statistics

Document 1

Big data refers to data sets so large and complex that they cannot be captured, stored, visualized, or analyzed with traditional statistical methods. Data mining addresses such data with techniques that include neural network analysis, decision tree optimization, and machine learning.

Big Data

Philanthropic organizations and companies deal with huge amounts of data about their customers. Data is collected for every transaction that occurs, so within just a few years the volume of recorded transactions becomes very large. Some companies keep product data separate from customer information, which leads them to build relational databases that link several indexed databases. Customer-specific data can also be bought from commercial vendors: some organizations purchase information about customers in order to predict their consumer behavior. The selling, buying, and sharing of information raises privacy and confidentiality issues, particularly when the data involves personal information.

A data warehouse is where disparate data is gathered and merged. Keeping a data warehouse running entails financial obligations that can reach millions for technical personnel, hardware, and software, so many companies try to get the most value out of the warehouse they have built. For instance, an analyst may investigate which individuals are likely to reply to direct messaging during a Red Cross campaign. Similarly, a financial institution may investigate which clients are likely to accept a new service that it intends to introduce.

The Goals of Data Mining

The key objective of data mining is to extract valuable information from huge databases.

Searching for information in huge databases can be compared to searching for a needle in a haystack. Analysis can begin with specific questions about the data, for example about customer response or customer purchase behavior. This approach of asking questions to identify patterns is referred to as "Online Analytical Processing" (OLAP). OLAP is used in finance, inventory, budgeting, marketing, and sales to raise specific queries about those areas. Another approach is the predictive model, which tries to predict a response by using predictor variables. Data mining resembles traditional statistical analysis in that it relies on modeling and data analysis. Software vendors often exaggerate the automatic capabilities of their tools in order to sell more software.
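The OLAP-style questioning described above can be illustrated with a small sketch. It assumes a hypothetical list of transaction records (the region names, products, and amounts are invented for illustration) and answers two specific questions: total sales per region and total sales per product.

```python
from collections import defaultdict

# Hypothetical transaction records: (region, product, amount).
transactions = [
    ("North", "Blanket", 120.0),
    ("North", "First-aid kit", 80.0),
    ("South", "Blanket", 200.0),
    ("South", "First-aid kit", 50.0),
    ("North", "Blanket", 30.0),
]


def roll_up(records, key_index):
    """Sum the amount column grouped by one dimension (an OLAP-style roll-up)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


# Each call answers one explicit question asked of the data.
sales_by_region = roll_up(transactions, 0)
sales_by_product = roll_up(transactions, 1)
print(sales_by_region)
print(sales_by_product)
```

Each query only answers the question that was asked. A predictive model, by contrast, would use such fields as predictor variables to estimate a response that has not yet been observed.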

Data mining can be used to identify useful patterns and to anticipate trends in customer behavior, although the better users understand their business, the higher the chances of applying data mining successfully. Data mining is therefore not magic that can overcome poor data collection and poor data quality. Some major myths about data mining include:

Myth 1: Getting answers to questions that are not asked
Myth 2: Monitoring datasets automatically to find desirable patterns
Myth 3: Removing the need to understand the business
Myth 4: Removing the need to collect high-quality data
Myth 5: Removing the need for skillful data analysis techniques

Successful Data Mining

The size of the data contained in a warehouse makes the whole process of data mining challenging. A classification problem is one in which the variable under analysis is categorical.

In a classification problem, the model predicts the most likely outcome by estimating a probability for every class. Both regression and classification problems are examples of supervised problems. A supervised problem involves a training set of original data that guides the data miner in building the model. The analyst then uses a separate test set to check the predictions of the constructed model. Different algorithms use different procedures, but each tries to identify and profile two groups that are clearly contrasted by a split. After finding the first split, the algorithm searches the resulting groups for further splits, selecting the best variable and split point each time. In decision analysis, a decision tree can be used to represent decisions and decision making explicitly and visually.
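As a minimal sketch of the split search and the training/test workflow described above, the following example looks for the single best split on a tiny data set. The data (age and number of prior donations, with a yes/no response) is hypothetical, and Gini impurity is used as the split criterion only as an assumption, since the text does not name one.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best split."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


def class_probabilities(labels):
    """Estimated probability of each class within one group."""
    n = len(labels)
    return {c: labels.count(c) / n for c in sorted(set(labels))}


def predict(row, split, rows, labels):
    """Predict the most probable class of the training group the row falls into."""
    feature, threshold, _ = split
    goes_left = row[feature] <= threshold
    group = [y for r, y in zip(rows, labels)
             if (r[feature] <= threshold) == goes_left]
    probs = class_probabilities(group)
    return max(probs, key=probs.get)


# Hypothetical training set: (age, prior donations) -> responded to the campaign.
train_rows = [(25, 0), (32, 1), (47, 3), (51, 4), (38, 0), (60, 5)]
train_labels = ["no", "no", "yes", "yes", "no", "yes"]
# Hypothetical test set, held back to check the model's predictions.
test_rows = [(29, 1), (55, 2)]
test_labels = ["no", "yes"]

split = best_split(train_rows, train_labels)
predictions = [predict(row, split, train_rows, train_labels) for row in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(split, predictions, accuracy)
```

A full decision tree algorithm would apply `best_split` again to each resulting group until a stopping rule is met. This sketch stops after the first split to keep the idea visible.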

In data mining, decision trees can describe data clearly.

Neural Networks

An Artificial Neural Network, frequently referred to as a neural network, is a numerical model that tries to mimic how the brain functions.

• Data preparation – Involves all the activities needed to create the final data set from the initial raw data.
• Modeling – Comprises choosing and applying the most suitable modeling techniques and tuning them to obtain optimal values. Different techniques can usually be applied to the same problem, so moving back to the data preparation stage is often necessary.
• Evaluation – Involves assessing whether the model built is of high quality from a data analysis point of view.
• Deployment – Involves presenting the organized results so that they provide insights and useful predictive information for making decisions.
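As a brief illustration of the artificial neural network idea introduced in this section, the sketch below computes the output of a single artificial neuron. The weights, bias, and sigmoid activation are assumptions chosen for illustration; in practice the weights are learned from training data during the modeling stage.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


# Hypothetical inputs and weights, not taken from any real model.
weak_signal = neuron_output([0.0, 0.0], [1.0, 1.0], 0.0)
strong_signal = neuron_output([1.0, 1.0], [1.0, 1.0], 0.0)
print(weak_signal, strong_signal)
```

A network combines many such units in layers, so that the output of one layer becomes the input to the next.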
