Example Of Data Analytics Critical Thinking
Type of paper: Critical Thinking
Pages: 2
Words: 550
Published: 2021/01/07
Abstract
Big data is a popular term nowadays, appearing at almost every professional conference devoted to data analysis, predictive analytics, data mining, and CRM (customer relationship management). The term is used in fields where working with large volumes of data is highly relevant and where the velocity at which data flows into organizational processes is constantly increasing: economics, banking, manufacturing, marketing, telecommunications, web analytics, medicine, and others. This essay aims to clarify the new possibilities that the analysis of large data volumes offers for optimizing processes and solving problems, to highlight the challenges involved in using the data efficiently, and to indicate the steps necessary for protecting the data.
As a rule, discussions of Big Data concentrate on data stores (and the analyses based on them) whose volume is much larger than a few terabytes. Some data stores grow to thousands of terabytes, that is, to petabytes (1 petabyte = 1000 terabytes). Beyond petabytes, data accumulation is measured in exabytes (1 exabyte = 1000 petabytes); for example, it is estimated that in 2010 the manufacturing sector worldwide accumulated a total of 2 exabytes of new information (Manyika, J. et al., 2011). In some industries, data is collected and accumulated very rapidly.
Data volumes can be roughly classified as follows: Large data sets (from 1 gigabyte to hundreds of gigabytes); Huge data sets (from 1 terabyte to several terabytes); Big Data (from a few terabytes to hundreds of terabytes); and Extremely Big Data (from 1 to 10 petabytes).
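Because these ranges overlap ("several" versus "a few" terabytes) and leave a gap between hundreds of terabytes and 1 petabyte, expressing them in code requires picking exact cut-offs. The minimal Python sketch below assumes decimal units (as in 1 petabyte = 1000 terabytes), a 10-terabyte boundary between Huge data sets and Big Data, Big Data extending up to 1 petabyte, and exclusive upper bounds; the function name `classify_volume` and these boundaries are illustrative choices, not part of the classification itself.

```python
# Decimal (SI) units, consistent with 1 petabyte = 1000 terabytes.
GB = 10**9
TB = 1000 * GB
PB = 1000 * TB
EB = 1000 * PB

# (exclusive upper bound, category) pairs, checked in order.
# The 10 TB and 1 PB cut-offs are assumptions made to close the
# overlaps and gaps in the verbal classification.
CATEGORIES = [
    (1 * GB, "below the classified range"),
    (1 * TB, "Large data sets"),
    (10 * TB, "Huge data sets"),
    (1 * PB, "Big Data"),
    (10 * PB, "Extremely Big Data"),
]


def classify_volume(num_bytes: int) -> str:
    """Return the size category for a data volume given in bytes."""
    for upper_bound, label in CATEGORIES:
        if num_bytes < upper_bound:
            return label
    return "beyond the classified range"


if __name__ == "__main__":
    print(classify_volume(300 * TB))  # Big Data
    print(classify_volume(2 * EB))    # beyond the classified range
    print(2 * EB // TB)               # 2000000 (terabytes in 2 exabytes)
```

Under these assumptions, the 2 exabytes of new manufacturing data mentioned above (2,000,000 terabytes) falls outside the classification altogether.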
There are three types of problems related to Big Data (Purcell, B., 2013):
1. Storage and Management (data volumes of hundreds of terabytes or petabytes cannot easily be stored and managed with traditional relational databases).
2. Unstructured Information (most Big Data is unstructured, which raises the question of how to organize text, video, images, and so on).
3. Analysis of Big Data (how to analyze unstructured information, how to produce simple reports, and how to build and deploy in-depth predictive models based on Big Data).
Big Data is usually stored and organized in distributed file systems. In general, the information is spread across multiple hard drives in standard computers (Manyika, J. et al., 2011). A so-called map keeps track of where (on which computer and/or disk) each piece of information is stored. To provide fault tolerance and reliability, each piece of information is usually stored several times (Purcell, B., 2013).
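The storage scheme above can be illustrated with a toy in-memory model. This is a hedged sketch, not the API of HDFS or any other real system: it splits a file into blocks, copies each block to several nodes, and keeps a map from blocks to the nodes that hold them. The class and method names, the 4-byte block size, the replication factor of 3, and the round-robin placement rule are all assumptions chosen to keep the example small.

```python
from collections.abc import Collection


class ToyDistributedStore:
    """In-memory toy model of a distributed file system (hypothetical API).

    Files are split into fixed-size blocks, every block is copied to
    `replication` distinct nodes, and `block_map` plays the role of the
    map that records which nodes hold each block.
    """

    def __init__(self, nodes: list[str], block_size: int = 4, replication: int = 3):
        if replication > len(nodes):
            raise ValueError("replication factor exceeds the number of nodes")
        self.nodes = list(nodes)
        self.block_size = block_size
        self.replication = replication
        self.disks = {n: {} for n in self.nodes}  # node -> {block id: bytes}
        self.block_map = {}  # block id -> nodes holding a copy (the "map")
        self.files = {}      # file name -> ordered list of block ids
        self._cursor = 0     # rotates placement over the nodes

    def put(self, name: str, data: bytes) -> None:
        block_ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{name}#{offset // self.block_size}"
            # Round-robin placement onto `replication` distinct nodes.
            targets = [self.nodes[(self._cursor + k) % len(self.nodes)]
                       for k in range(self.replication)]
            self._cursor += 1
            for node in targets:
                self.disks[node][block_id] = data[offset:offset + self.block_size]
            self.block_map[block_id] = targets
            block_ids.append(block_id)
        self.files[name] = block_ids

    def get(self, name: str, failed: Collection[str] = ()) -> bytes:
        """Reassemble a file, reading each block from any node not in `failed`."""
        parts = []
        for block_id in self.files[name]:
            live = [n for n in self.block_map[block_id] if n not in failed]
            if not live:
                raise OSError(f"all copies of {block_id} are unavailable")
            parts.append(self.disks[live[0]][block_id])
        return b"".join(parts)
```

For example, after `store = ToyDistributedStore(["n1", "n2", "n3", "n4"])` and `store.put("log.txt", b"hello distributed world")`, the file can still be read with any two of the four nodes marked as failed, because every block has copies on three nodes.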
Most of the information collected in a distributed file system consists of unstructured data such as text, images, photos, or video. This has both advantages and disadvantages. The advantage is that very large volumes can be kept, so “all data” can be stored without worrying about how much of it is relevant for analysis and decision making. The disadvantage is that substantial subsequent processing of these huge amounts of data is then required to extract useful information (Purcell, B., 2013). Some of these operations may be simple (e.g., basic calculations), while others require more complex algorithms that must be specially designed to work effectively in a distributed file system. Thus, while the amount of data may grow geometrically, the ability to extract information from it and to act on that information is limited and levels off, approaching its limit asymptotically. It is therefore important that, alongside the data storage systems, methods and procedures are developed for building and updating models and for automating the decision-making process, so that such systems are useful and advantageous for the company (Manyika, J. et al., 2011).
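One widely used way of designing algorithms for distributed storage is the MapReduce pattern, named here as an illustration rather than as a method the essay prescribes: a "map" step computes a partial result on each block where it is stored, and a "reduce" step merges the partial results. The sketch below counts words in unstructured text this way. The function names are hypothetical, the map calls run sequentially instead of on separate machines, and blocks are assumed to be split at word boundaries.

```python
from collections import Counter
from functools import reduce


def map_block(text_block: str) -> Counter:
    """Map step: count words in one block, ideally on the node that stores it."""
    return Counter(text_block.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(blocks: list[str]) -> Counter:
    # In a real cluster the map calls would run in parallel on many machines;
    # here they run one after another to keep the sketch self-contained.
    partials = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partials, Counter())
```

Because merging counts is associative, partial results can be combined in any order, which is what allows the reduce step itself to be spread across machines.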
Analyzing unstructured data in a way that makes it usable is a major problem. A number of issues should be considered, one of which is that although the data sets can be very large, the information they contain has a considerably smaller dimension. For example, while data may be collected every second or every minute, many of the parameters remain stable over long time intervals; in other words, data recorded every second largely repeats the same information (Purcell, B., 2013). It is therefore necessary to perform “intelligent” data aggregation, which passes on to simulation and optimization only the data that carries the necessary information about the dynamic changes affecting operational efficiency.
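One simple form of such aggregation, shown here as an illustration rather than the specific method the essay has in mind, is a deadband filter: a reading is kept only when it differs from the last kept reading by more than a tolerance, so long stable stretches collapse into a single value. The function name and the tolerance used below are hypothetical.

```python
def deadband_filter(samples: list[tuple[float, float]],
                    tolerance: float) -> list[tuple[float, float]]:
    """Keep (timestamp, value) samples whose value differs from the last
    kept value by more than `tolerance`; the first sample is always kept."""
    kept: list[tuple[float, float]] = []
    for timestamp, value in samples:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((timestamp, value))
    return kept


# One reading per second: the level is stable, then shifts once.
readings = [(0, 20.0), (1, 20.0), (2, 20.1), (3, 20.0),
            (4, 25.0), (5, 25.05), (6, 25.0)]
print(deadband_filter(readings, tolerance=0.5))  # [(0, 20.0), (4, 25.0)]
```

Comparing against the last kept value, rather than the previous raw sample, means a slow drift is still recorded once it accumulates beyond the tolerance.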
According to Gorton, Greenfield, Szalay and Williams (2008), preventing, recognizing, and responding to cyber-attacks requires intrusion detection systems that process network packets at speeds of several gigabits per second. Ideally, such systems should issue a warning of a possible attack within seconds, or at worst within minutes, so that operators can protect their systems when attacks do occur.
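The timing requirement can be made concrete with a sliding-window rate check, one simple heuristic among many detection techniques and not the design discussed by Gorton et al.: a source is flagged when it sends more than a set number of packets within a short window. The class name, the thresholds, and the use of per-source packet counts as the signal are hypothetical, and a plain Python loop like this would not reach multi-gigabit packet rates; it only shows the windowing logic.

```python
from collections import defaultdict, deque


class RateAlarm:
    """Flag any source that sends more than `max_packets` packets within
    `window_seconds` (hypothetical thresholds; timestamps are assumed to
    arrive in non-decreasing order)."""

    def __init__(self, max_packets: int, window_seconds: float):
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        # source address -> timestamps of its packets still inside the window
        self.recent = defaultdict(deque)

    def observe(self, source: str, timestamp: float) -> bool:
        """Record one packet and return True if `source` should raise an alert."""
        times = self.recent[source]
        times.append(timestamp)
        # Slide the window: forget packets older than `window_seconds`.
        while timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_packets


alarm = RateAlarm(max_packets=3, window_seconds=1.0)
print([alarm.observe("192.0.2.5", t) for t in (0.0, 0.1, 0.2, 0.3)])
# [False, False, False, True]
```

Each packet costs amortized constant work, which is the property such a check would need if it were reimplemented in a faster language for high packet rates.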
In conclusion, distributed file system technology has made it possible to create and maintain stores capable of holding large volumes of data. In practice, Big Data analysis rarely involves computing statistics over all of the data. The value of Big Data lies in the ability to divide the data into “micro-segments” and to use data mining and predictive modeling methods to build a large number of models for small groups of observations.
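A minimal sketch of the micro-segment idea: observations are grouped by a segment key and a separate small model is fitted to each group. The (segment, x, y) row format, the `min_size` cut-off, and the choice of a one-variable least-squares line as the per-segment model are assumptions for illustration; any data mining or predictive modeling method could take its place.

```python
from collections import defaultdict


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # constant x: fall back to predicting the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_micro_segment_models(rows, min_size: int = 2) -> dict:
    """Fit one (slope, intercept) model per segment from (segment, x, y) rows,
    skipping segments with fewer than `min_size` observations."""
    groups = defaultdict(lambda: ([], []))
    for segment, x, y in rows:
        groups[segment][0].append(x)
        groups[segment][1].append(y)
    return {seg: fit_line(xs, ys) for seg, (xs, ys) in groups.items()
            if len(xs) >= min_size}


rows = [("north", 1, 3), ("north", 2, 5), ("north", 3, 7),
        ("south", 1, 10), ("south", 2, 8), ("east", 5, 1)]
print(fit_micro_segment_models(rows))
# {'north': (2.0, 1.0), 'south': (-2.0, 12.0)}
```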
References
Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer Society, 41(4), 78-80.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute. Retrieved from http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
Purcell, B. (2013). The emergence of “big data” technology and analytics. Journal of Technology Research. Retrieved from http://www.aabri.com/manuscripts/121219.pdf