Big data indexing research
Document Type: Research Paper
Subject Area: Computer Science
To solve this challenge, indexing of big data is essential. An index is a list of tags, subjects and names that reference where data can be found. Indexing describes how information is arranged in storage systems and assists in its retrieval [1]. Data analysts need to obtain results quickly from huge volumes of data stored on heterogeneous machines. Numerous structures are available for processing massive amounts of information during search operations [2].
Index definition
In programming and computing, an index is an integer or other key that indicates the location of data, for example within a vector, table or database. In data analysis, an index also quantifies comparison or change, usually across multiple data points. Indexes help to simplify analysis by enabling analysts to communicate their findings to decision makers succinctly and clearly.
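The definition of an index as a key that indicates where data is located can be illustrated with a minimal sketch. The record layout and the `build_index` helper are hypothetical names chosen for illustration; they do not come from any particular database.

```python
from collections import defaultdict

def build_index(records, field):
    """Map each value of `field` to the positions of the records that hold it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return dict(index)

# Hypothetical sample data: the list position plays the role of a storage location.
records = [
    {"name": "alice", "city": "Oslo"},
    {"name": "bob", "city": "Lima"},
    {"name": "carol", "city": "Oslo"},
]

city_index = build_index(records, "city")

# The index answers "where are the Oslo records?" without scanning every record.
oslo_rows = [records[i] for i in city_index.get("Oslo", [])]
```

The lookup touches only the positions listed under the key, which is the property that makes indexing attractive when the underlying data is very large.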
For example, a product index can be 135 for people aged 18 to 24. This means that the product appeals more to the younger generation than to the older one. Secondly, data indexes are efficient because they provide context and comparison in a single number. By setting one set of values against another, an audience can easily notice any disparity between the sets (Gani et al, 241-284). Thirdly, indexes are well suited to making disparate comparisons. At a single glance, even a data novice can make an apples-to-oranges comparison. Once the data is juxtaposed with an index, insights and correlations jump off the page.
Note that in clustered indices with duplicate keys, the dense index points to the first record with the given key.
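Returning to the product index of 135 mentioned above, the arithmetic behind such a figure can be sketched as follows. The sketch assumes the common convention that 100 represents the overall average, which the text does not state explicitly; the function name and the sample counts are hypothetical.

```python
def affinity_index(segment_buyers, segment_size, total_buyers, total_size):
    """Return 100 * (purchase rate in the segment) / (purchase rate overall).

    Assumption: 100 is the baseline, so values above 100 mean the segment
    buys the product more often than the population as a whole.
    """
    segment_rate = segment_buyers / segment_size
    overall_rate = total_buyers / total_size
    return 100 * segment_rate / overall_rate

# Hypothetical counts: 27% of 18 to 24 year olds buy the product, against 20% overall.
young_index = affinity_index(
    segment_buyers=270, segment_size=1000,
    total_buyers=2000, total_size=10000,
)
print(round(young_index))
```

Under these assumed counts the rounded result matches the 135 of the example: the younger segment buys at 1.35 times the overall rate.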
Third is the sparse index, which in databases is a file of key and pointer pairs, one pair for every block in the data file. Each key in this file is associated with a pointer to a block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block. Fourth is the reverse key index which, as the name suggests, reverses the key value before entering it into the index. Unlike non-clustered indices, only a single clustered index can be created on a given database table.
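The sparse index and the reverse key index described above can be sketched in a few lines. This is an illustrative in-memory model, not a database implementation: the block size, the helper names and the sample keys are assumptions, and the keys are taken to be unique and already sorted.

```python
from bisect import bisect_right

def build_blocks(sorted_keys, block_size):
    """Split an already sorted key list into fixed-size blocks (the 'data file')."""
    return [sorted_keys[i:i + block_size]
            for i in range(0, len(sorted_keys), block_size)]

def build_sparse_index(blocks):
    """Keep one (lowest key, block number) pair per block."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]

def sparse_lookup(sparse_index, blocks, key):
    """Return (block number, offset) for `key`, or None if it is absent."""
    first_keys = [first for first, _ in sparse_index]
    # Find the last block whose lowest key is <= the search key.
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    block = blocks[block_no]
    # Only this one block has to be scanned.
    if key in block:
        return block_no, block.index(key)
    return None

def reverse_key(key):
    """Reverse the characters of the key before it is entered into an index."""
    return str(key)[::-1]

blocks = build_blocks([10, 20, 30, 40, 50, 60, 70], block_size=3)
sparse_index = build_sparse_index(blocks)
```

The sparse index holds one entry per block rather than one per record, so it stays small, at the cost of a short scan inside the selected block. Reversing keys spreads sequential values such as 1001, 1002 and 1003 across different parts of the index, because their reversed forms no longer share a common prefix.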
The advantage of clustered indices is that they can greatly increase the overall speed of retrieval. However, this usually holds only when the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected. Only a few data block reads are required because the physical records are stored in sorted order on disk, so the next row in the sequence lies immediately before or after the last one. The defining feature of a clustered index is therefore that the physical data rows are ordered in accordance with the index blocks that point to them.
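The benefit of keeping rows physically sorted can be sketched with a range query over an in-memory list. This is only a model of the idea: the sample rows and the `range_scan` helper are hypothetical, and a Python list slice stands in for a contiguous read of adjacent disk blocks.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows as (key, payload); sorting them models the clustered order on disk.
rows = sorted([(3, "c"), (1, "a"), (7, "g"), (5, "e"), (9, "i")])
keys = [key for key, _ in rows]

def range_scan(low, high):
    """Return all rows with low <= key <= high as one contiguous slice."""
    start = bisect_left(keys, low)    # first row with key >= low
    end = bisect_right(keys, high)    # one past the last row with key <= high
    return rows[start:end]

selected = range_scan(3, 7)
```

Because the matching rows are adjacent, the query locates the start of the range once and then reads forward, instead of following a separate pointer for every row.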
Index design involves several complex tradeoffs among lookup performance, index size and index update performance. In addition, indexes are used to police database constraints such as unique, primary key, foreign key and exclusion constraints. A unique index creates an implicit constraint on the underlying table. An exclusion constraint, on the other hand, ensures that for any newly inserted or updated record, a certain predicate holds for no other record.
Customers' interests are therefore observable, which gives analytics a close look at trending items. Moreover, trends can be found by surfing the internet to see which products people are talking about (Kim et al, 107-115). In stores, tools such as video object tracking on surveillance footage and phone cameras can be used to observe trending products, determining which areas of the store are visited frequently and which kinds of displays show a regular pattern.
Using these insights into customers’ habits, retailers learn what is in demand and what to get rid of. Here big data saves money by stopping spending on dead-end products and filling shelves with what customers really want. Big data is thus transforming the way business is done and affecting most other parts of our lives. The idea behind the common phrase "big data" is that everything we do leaves an increasing digital trace, or data, which we and others can use and analyze. Big data therefore refers to the ability to make use of ever-increasing volumes of data.
According to Eric Schmidt, the Executive Chairman of Google, humankind generated five exabytes of data from the dawn of civilization up to the year 2003; the same amount is now produced every two days, and the pace is increasing. This data includes Internet of Things data: with the advent of televisions, one is able to collect and process data. In addition, the large number of CCTV cameras now in use capture video images that are uploaded without limit and posted on sites such as YouTube. As for sensor data, sensors that collect and share data increasingly surround us. For instance, a simple device like the smartphone usually contains a global positioning sensor that can track exactly where you are at every second of the day.
To make this more effective, the phone also has an accelerometer that tracks the speed and direction in which you are travelling. Recently, sensors have been incorporated into many products and devices.
The other type of model is the global index, which is partitioned independently and is therefore placed away from the data on the nodes. This works well for queries, although keeping up with mutations can be challenging because network access is required when indexing the data. Query latencies are lower, yet global indexes are seldom found in distributed databases such as NoSQL systems. Neither Cassandra nor MongoDB comes with global indexes. However, Couchbase Server offers global indexes in the form of global secondary indexes. Research has, however, focused on artificial intelligence based cooperative indexing, which has the potential to accelerate the deployment and progress of the BD-MCC.
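The difference between an index kept alongside the data on each node and a global index kept separately can be sketched as follows. This is a toy in-memory model under stated assumptions: the node names, documents and helper functions are hypothetical and do not reflect the API of Cassandra, MongoDB or Couchbase Server.

```python
from collections import defaultdict

# Hypothetical cluster: each node holds its own documents, keyed by document id.
nodes = {
    "node_a": {1: {"city": "Oslo"}, 2: {"city": "Lima"}},
    "node_b": {3: {"city": "Oslo"}, 4: {"city": "Cairo"}},
}

def build_local_indexes(nodes, field):
    """One index per node, covering only the documents stored on that node."""
    local = {}
    for node_name, docs in nodes.items():
        index = defaultdict(list)
        for doc_id, doc in docs.items():
            index[doc[field]].append(doc_id)
        local[node_name] = dict(index)
    return local

def build_global_index(nodes, field):
    """A single index, stored apart from the data, covering every node."""
    index = defaultdict(list)
    for node_name, docs in nodes.items():
        for doc_id, doc in docs.items():
            # Building or updating this entry implies a network hop from the data node.
            index[doc[field]].append((node_name, doc_id))
    return dict(index)

def query_local(local_indexes, value):
    """Scatter-gather: every node has to be asked. Returns (hits, nodes contacted)."""
    hits = []
    for node_name, index in local_indexes.items():
        hits.extend((node_name, doc_id) for doc_id in index.get(value, []))
    return hits, len(local_indexes)

def query_global(global_index, value):
    """A single lookup in the index service. Returns (hits, lookups made)."""
    return global_index.get(value, []), 1

local_indexes = build_local_indexes(nodes, "city")
global_index = build_global_index(nodes, "city")
```

In this model the query against the global index needs one lookup regardless of cluster size, while the local variant contacts every node. The price is paid on writes, since each mutation on a data node must also reach the separately stored index.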
Big data indexing requirements
Timeliness and accuracy are generally taken to be the core requirements that determine the effectiveness of indexing methods. However, the six Vs (value, variability, veracity, variety, velocity and volume) plus complexity also come into play. Chang et al. regard these as what makes enormous data intricate enough to be treated as a fresh model. Indexing and managing extensive volumes of data has proved to be a challenge, as petabytes are now expected to grow to zettabytes in the coming days. Indexing techniques are crucial for dealing with the challenges of value, variability, veracity, variety, velocity, volume and complexity. When dealing with conversation data, photo and video image data and sensor data, the index architecture can be applied as illustrated in the above discussion, using modes such as the clustered method, the non-clustered method and the cluster mode.
Related work
Mercer kernel-based clustering in feature space. M. Girolami. The article presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high-dimensional feature space is possible.
B. Habbal, A. Hassan, S. Cottrell, R. Les, White, B. Journal of Intelligent & Fuzzy Systems, 32(5), 3259–3271.
Gani, Abdullah, et al. A Survey on Indexing Techniques for Big Data: Taxonomy and Performance Evaluation. Knowledge and Information Systems, vol. no. com/docview/902125567?accountid=45049.
Kim, Dongmin, Jinwoo Choi, and Chongwoo Woo. A Design and Development of Big Data Indexing and Search System Using Lucene. Journal of Internet Computing and Services 15. Web.
Wang, Jun, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang. Learning to Hash for Indexing Big Data—A Survey. Proceedings of the IEEE 104. Web. s11036-014-0547-2.
Zuze, Herbert, and Melius Weideman. Keyword Stuffing and the Big Three Search Engines. Online Information Review, vol. no.