What is Big Data analysis? How is Big Data used? The best books about Big Data technology

It was predicted that the total global volume of data created and replicated in 2011 would be about 1.8 zettabytes (1.8 trillion gigabytes), roughly nine times more than was created in 2006.

More complex definition

Nevertheless, big data involves more than just analyzing vast amounts of information. The problem is not that organizations create huge amounts of data, but that most of it is in a format that fits poorly with the traditional structured database format: web logs, video, text documents, machine code, or, for example, geospatial data. All of this is stored in many different repositories, sometimes even outside the organization. As a result, corporations may have access to a huge amount of their own data yet lack the tools needed to establish relationships between these data and draw meaningful conclusions from them. Add the fact that data is now updated more and more frequently, and you get a situation in which traditional methods of information analysis cannot keep up with huge volumes of constantly updated data, which ultimately paves the way for big data technologies.

Best Definition

In essence, the concept of big data involves working with information of huge volume and diverse composition, which is very frequently updated and located in different sources, with the goal of increasing work efficiency, creating new products, and boosting competitiveness. The consulting firm Forrester puts it succinctly: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business intelligence and big data?

Craig Bathy, Chief Marketing Officer and Chief Technology Officer of Fujitsu Australia, pointed out that business intelligence is a descriptive process of analyzing the results a business achieved in a given period, while the processing speed of big data makes the analysis predictive, capable of offering the business recommendations for the future. Big data technologies also make it possible to analyze more types of data than business intelligence tools, which allows the focus to go beyond structured data stores.

Matt Slocum of O'Reilly Radar believes that although big data and business intelligence have the same goal (finding answers to a question), they differ in three aspects.

  • Big data is designed to process larger amounts of information than business intelligence, and this, of course, fits the traditional definition of big data.
  • Big data is designed to process information that arrives and changes faster, which means deep exploration and interactivity. In some cases, results are generated faster than the web page loads.
  • Big data is designed to handle unstructured data whose uses we are only beginning to explore once we have learned to collect and store it, and we need algorithms and conversational tools to make it easier to find the trends contained within these arrays.

According to the Oracle Information Architecture: An Architect's Guide to Big Data white paper published by Oracle, we approach information differently when working with big data than when doing business analysis.

Working with big data is not like the typical business intelligence process, where simply adding up known values yields a result: for example, summing paid invoices yields annual sales. With big data, the result is obtained by refining the data through sequential modeling: first a hypothesis is put forward; then a statistical, visual, or semantic model is built; the model is used to check the correctness of the hypothesis; and then the next hypothesis is put forward. This process requires the researcher either to interpret visual meanings, to make interactive knowledge-based queries, or to develop adaptive machine-learning algorithms capable of producing the desired result. Moreover, the lifetime of such an algorithm can be quite short.

Big Data Analysis Techniques

There are many different methods for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not claim to be complete, but it reflects the most popular approaches across industries. It should also be understood that researchers continue to create new methods and improve existing ones. In addition, some of the techniques listed are not applicable exclusively to big data and can be successfully used on smaller arrays (for example, A/B testing or regression analysis). Naturally, the more voluminous and diverse the array analyzed, the more accurate and relevant the resulting data.

A/B testing. A technique in which a control sample is compared with other samples in turn. It makes it possible to identify the optimal combination of indicators to achieve, for example, the best consumer response to a marketing offer. Big data makes it possible to run a huge number of iterations and thus obtain a statistically significant result.
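As a minimal sketch of the statistics behind such a comparison, the two-proportion z-test below checks whether a variant's response rate differs significantly from the control's; the conversion counts are invented for illustration:

```python
import math

def ab_z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: does variant B's conversion rate
    differ significantly from control A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Control: 200 conversions out of 10,000 visitors; variant: 260 out of 10,000
z = ab_z_score(200, 10_000, 260, 10_000)
print(round(z, 2))  # 2.83 -- |z| > 1.96 means significant at the 5% level
```

With big data samples, even small lifts clear the significance threshold, which is exactly the "statistically significant result" the entry refers to.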

Association rule learning. A set of techniques for identifying relationships (association rules) between variables in large data arrays. Used in data mining.
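A sketch of the idea: a rule such as "bread → milk" can be scored by its confidence over a transaction log; the toy basket data below is invented:

```python
# Toy transaction log: sets of items bought together (hypothetical data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: the share of
    baskets containing the antecedent that also contain the consequent."""
    with_a = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_a if consequent <= t]
    return len(with_both) / len(with_a)

print(confidence({"bread"}, {"milk"}))  # 2 of 3 bread baskets contain milk
```

Real association-rule miners (Apriori, FP-Growth) do the same counting, but prune the combinatorial space of candidate rules efficiently.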

Classification. A set of techniques for predicting consumer behavior in a particular market segment (purchase decisions, churn, consumption volume, etc.). Used in data mining.
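One of the simplest classifiers of this kind is the nearest-neighbor rule; the sketch below, with invented spend/churn observations, predicts a label from the closest known example:

```python
def nearest_neighbor(labeled, point):
    """1-NN classifier: return the label of the closest known example."""
    return min(labeled, key=lambda item: abs(item[0] - point))[1]

# Hypothetical (monthly spend, outcome) observations for known customers
history = [(10, "churn"), (15, "churn"), (80, "stay"), (95, "stay")]
print(nearest_neighbor(history, 20))  # churn
print(nearest_neighbor(history, 70))  # stay
```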

Cluster analysis. A statistical method for classifying objects into groups by identifying common features that are not known in advance. Used in data mining.
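The classic algorithm here is k-means; the minimal one-dimensional sketch below, on made-up customer-spend values, shows how groups emerge without predefined labels:

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: assign each point to the nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups of customer spend values
print(kmeans_1d([1, 2, 3, 100, 101, 102], k=2))  # [2.0, 101.0]
```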

Crowdsourcing. A technique for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that allows you to analyze the comments of social network users and compare them with real-time sales results.

Data mining. A set of techniques for identifying the categories of consumers most receptive to a promoted product or service, identifying the characteristics of the most successful employees, and predicting consumers' behavioral models.

Ensemble learning. This method uses many predictive models at once, which improves the quality of the forecasts made.
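The simplest way to combine an ensemble is a majority vote over the individual models' predictions; the classifier outputs below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine labels from several weak models by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers vote on one customer ("churn"/"stay")
print(majority_vote(["churn", "stay", "churn"]))  # churn
```

Voting is only one combination scheme; bagging, boosting, and stacking are the usual next steps, but they all rest on the same idea of pooling many imperfect models.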

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can combine and mutate. As in natural evolution, the fittest solutions survive.
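A toy sketch of the idea, with all parameters invented: bitstrings play the role of chromosomes, fitness is the number of 1-bits, and the fittest half survives each generation:

```python
import random

def evolve(target_len=10, pop=20, gens=40, seed=1):
    """Toy genetic algorithm: evolve bitstrings toward all ones.
    Fitness = number of 1 bits; the fittest half survives and breeds."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(target_len)]
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=sum, reverse=True)   # survival of the fittest
        parents = population[:pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, target_len)   # single-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:               # occasional mutation
                i = rng.randrange(target_len)
                child[i] ^= 1
            children.append(child)
        population = parents + children
    return max(sum(ind) for ind in population)   # best fitness found

print(evolve())  # usually reaches the maximum fitness of 10
```

Real applications encode candidate solutions (routes, schedules, parameter sets) instead of bitstrings, but the select-crossover-mutate loop is the same.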

Machine learning. A branch of computer science (historically, the label "artificial intelligence" has been attached to it) that aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of natural-language recognition techniques borrowed from computer science and linguistics.

Network analysis. A set of techniques for analyzing links between nodes in networks. Applied to social networks, it makes it possible to analyze the relationships between individual users, companies, communities, and so on.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more indicators. Helps in making strategic decisions, for example, the composition of the product line introduced to the market, conducting investment analysis, etc.

Pattern recognition. A set of techniques with elements of self-learning for predicting consumers' behavioral models.

Predictive modeling. A set of techniques for creating a mathematical model of a predetermined, probable scenario. For example, analyzing a CRM database for conditions that might push subscribers to switch providers.

Regression. A set of statistical methods for identifying patterns between changes in a dependent variable and one or more independent variables. Often used for forecasting and prediction. Used in data mining.
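For a single independent variable, the least-squares fit has a closed form; the sketch below recovers a linear trend from invented ad-spend/sales pairs:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x (one independent variable)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Ad spend (x) vs. sales (y): recover the underlying trend y = 1 + 2x
a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```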

Sentiment analysis. Techniques for assessing consumer sentiment based on natural-language recognition technologies. They make it possible to isolate messages related to a subject of interest (for example, a consumer product) from the general information flow and then evaluate the polarity of each judgment (positive or negative), its degree of emotionality, and so on.
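In its crudest form, polarity can be scored against a sentiment lexicon. The miniature dictionary below is invented; real systems use large weighted lexicons plus the NLP techniques mentioned above:

```python
# Hypothetical miniature sentiment lexicon (word -> polarity score)
LEXICON = {"great": 1, "love": 1, "fast": 1,
           "bad": -1, "slow": -1, "broken": -1}

def polarity(text):
    """Sum word-level sentiment scores; the sign gives overall polarity."""
    words = text.lower().split()
    return sum(LEXICON.get(w, 0) for w in words)

print(polarity("love the camera but delivery was slow"))  # 1 - 1 = 0, mixed
```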

Signal processing. A set of techniques borrowed from radio engineering that aim to recognize a signal against background noise and analyze it further.

Spatial analysis. A set of techniques, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographic coordinates, the geometry of objects. Geographic information systems (GIS) often act as the source of big data in this case.

Statistics. The science of collecting, organizing, and interpreting data, including designing questionnaires and conducting experiments. Statistical methods are often used to make value judgments about the relationships between certain events.

Supervised learning. A set of techniques based on machine learning technologies that allow you to identify functional relationships in the analyzed data arrays.

Simulation. Modeling the behavior of complex systems, often used for forecasting and for working through various scenarios in planning.

Time series analysis. A set of methods, borrowed from statistics and digital signal processing, for analyzing data sequences that repeat over time. Obvious uses include tracking the stock market or disease incidence.
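A minimal example of such a method is the sliding-window (moving) average, shown here on an invented weekly series:

```python
def moving_average(series, window):
    """Smooth a time series with a simple sliding-window mean."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Weekly case counts with noise; the 3-point average exposes the trend
print(moving_average([10, 14, 12, 18, 16, 22], window=3))
```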

Unsupervised learning. A set of techniques based on machine learning technologies that allow you to identify hidden functional relationships in the analyzed data sets. Has common features with cluster analysis.

Visualization. Methods for graphical presentation of the results of big data analysis in the form of diagrams or animated images to simplify interpretation and facilitate understanding of the results obtained.


A visual presentation of the results of big data analysis is of fundamental importance for their interpretation. It is no secret that human perception is limited, and scientists continue to conduct research in the field of improving modern methods of presenting data in the form of images, diagrams or animations.

Analytical tools

As of 2011, some of the approaches listed in the previous subsection, or certain combinations of them, make it possible to put analytical engines for big data into practice. Among the free or relatively inexpensive open systems for big data analysis, we can mention:

  • Apache Hadoop;
  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest on this list is Apache Hadoop, open source software that has proven itself as a data analysis platform over the past five years. As soon as Yahoo opened the Hadoop code to the open source community, a whole new movement to create Hadoop-based products quickly emerged in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop; their developers include both startups and well-known global companies.

Markets for Big Data Management Solutions

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze big data is perceived as an unqualified boon. But is it really? What can unbridled data accumulation lead to? Most likely, to what psychologists call pathological hoarding, or syllogomania, figuratively known as "Plyushkin syndrome." In English, the compulsive urge to collect everything is called hoarding (from "hoard," a stockpile). Classifications of mental illness list hoarding as a mental disorder. In the digital age, digital hoarding is added to traditional material hoarding, and both individuals and entire enterprises and organizations can suffer from it.

World and Russian market

Big data landscape - Main providers

Almost all the leading IT companies have shown interest in tools for collecting, processing, managing, and analyzing big data, which is quite natural. First, they experience this phenomenon directly in their own businesses; second, big data opens up excellent opportunities for developing new market niches and attracting new customers.

A lot of startups have appeared on the market that do business on processing huge amounts of data. Some of them use ready-made cloud infrastructure provided by large players like Amazon.

Theory and practice of Big Data in industries

The history of development

2017

TmaxSoft forecast: the next "wave" of Big Data will require DBMS modernization

Businesses know that the huge amounts of data they accumulate contain important information about their business and customers. If a company can successfully apply this information, it will have a significant advantage over its competitors and will be able to offer better products and services than they do. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the storage capacity, data exchange processes, utilities, and applications needed to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, increasing the processing power needed to analyze ever-increasing volumes of data can require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could be used to develop new applications and services.

On February 5, 2015, the White House released a report discussing how companies use big data to set different prices for different buyers, a practice known as "price discrimination" or "differential pricing" (personalized pricing). The report describes the benefits of big data for both sellers and buyers, and concludes that many of the issues raised by the advent of big data and differential pricing can be addressed within existing anti-discrimination and consumer-protection laws and regulations.

The report notes that at this time, there is only anecdotal evidence of how companies are using big data in the context of individualized marketing and differentiated pricing. This information shows that sellers use pricing methods that can be divided into three categories:

  • studying the demand curve;
  • steering and differentiated pricing based on demographics;
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: In order to understand demand and study consumer behavior, marketers often conduct experiments in this area, during which customers are randomly assigned one of two possible price categories. “Technically, these experiments are a form of differential pricing because they result in different prices for customers, even if they are “non-discriminatory” in the sense that all customers have the same chance of “hitting” the higher price.”

Steering: This is the practice of presenting products to consumers based on their belonging to a certain demographic group. For example, a computer company website may offer the same laptop to different types of customers at different prices based on the information they provide about themselves (for example, depending on whether the user is a representative of government agencies, scientific or commercial institutions, or an individual) or their geographic location (for example, determined by the IP address of a computer).

Targeted behavioral marketing and customized pricing: In these cases, buyers' personal data is used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data collected by advertising networks and third-party cookies about user activity on the Internet to target their advertising materials. On the one hand, this approach lets consumers receive advertisements for goods and services of interest to them; on the other, it may worry consumers who do not want certain types of their personal data (such as information about visits to websites related to medical or financial matters) collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The report speculates that this may be because methods are still being developed, or because companies are reluctant to adopt (or prefer to keep quiet about) individual pricing, possibly fearing a backlash from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While acknowledging that there are transparency and discrimination issues associated with the use of big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also highlights the need for “ongoing scrutiny” when companies use confidential information in a non-transparent manner or in ways that are not covered by the existing regulatory framework.

This report is a continuation of the White House's efforts to examine the use of "big data" and discriminatory pricing on the Internet, and the resulting consequences for American consumers. It was previously reported that the White House Working Group on Big Data published its report on this issue in May 2014. The Federal Trade Commission (FTC) also addressed these issues during its September 2014 workshop on discrimination in relation to the use of big data.

2014

Gartner demystifies Big Data

A fall 2014 policy brief from Gartner lists and debunks a number of common myths about Big Data among CIOs.

  • Everyone implements Big Data processing systems faster than us

Interest in Big Data technologies is at an all-time high: 73% of the organizations surveyed by Gartner analysts this year are already investing in Big Data or planning to do so. But most of these initiatives are still at a very early stage, and only 13% of those surveyed have actually implemented such solutions. The hardest part is figuring out how to monetize Big Data and deciding where to start. Many organizations get stuck in the pilot phase because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it.

Some CIOs believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error does indeed affect the result less, analysts say, but there are also more errors in total. In addition, most of the analyzed data is external, of unknown structure or origin, so the probability of errors grows. Thus, in the world of Big Data, quality actually matters much more.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format with automatic schema generation as it is read. It is believed that this will allow the analysis of information from the same sources using multiple data models. Many believe that this will also enable end users to interpret any set of data in their own way. In reality, most users often want the traditional out-of-the-box schema where the data is formatted appropriately and there is agreement on the level of information integrity and how it should relate to the use case.

  • Data warehouses do not make sense to use for complex analytics

Many information management system administrators feel that it makes no sense to spend time creating a data warehouse, given that complex analytical systems use new types of data. In fact, many sophisticated analytics systems use information from a data warehouse. In other cases, new data types need to be additionally prepared for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation, and the required level of quality - such preparation can take place outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for storage or as critical elements of an analytical infrastructure. The underlying technologies of data lakes lack the maturity and breadth of functionality found in data warehouses. Therefore, leaders responsible for managing data should wait until the lakes reach the same level of development, according to Gartner.

Accenture: 92% of those who implemented big data systems are satisfied with the result

Among the main advantages of big data, respondents named:

  • "search for new sources of income" (56%),
  • "improving customer experience" (51%),
  • "new products and services" (50%) and
  • "an influx of new customers and maintaining the loyalty of old ones" (47%).

When introducing new technologies, many companies have faced traditional problems. For 51%, the stumbling block was security, for 47% - the budget, for 41% - the lack of necessary personnel, and for 35% - difficulties in integrating with the existing system. Almost all surveyed companies (about 91%) plan to soon solve the problem with a shortage of staff and hire big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the internet. 79% of respondents noted that companies that do not deal with big data will lose their competitive advantage.

However, the respondents disagreed on what exactly should be considered big data. 65% of respondents believe that these are “large data files”, 60% are sure that this is “advanced analytics and analysis”, and 50% that this is “data visualization tools”.

Madrid spends 14.7 million euros on big data management

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The cost of the project is 14.7 million euros, and the solutions to be implemented will be based on technologies for analyzing and managing big data. With their help, the city administration will manage the work with each service provider and pay accordingly, depending on the level of services.

This concerns the administration's contractors, who monitor the condition of streets, lighting, irrigation, and green spaces, clean up the territory, and remove and process garbage. In the course of the project, 300 key performance indicators of city services have been developed for specially assigned inspectors, on the basis of which 1,500 various checks and measurements will be carried out daily. In addition, the city will start using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

2013

Experts: The peak of fashion for Big Data

Without exception, all vendors in the data management market are currently developing technologies for Big Data management. This new technological trend is also being actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As Datashift found out, as of January 2013 the wave of discussion around "big data" had exceeded all conceivable dimensions. After analyzing the number of mentions of Big Data on social networks, Datashift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. That is equivalent to 260 posts per hour, with a peak of 3,070 mentions per hour.

Gartner: Every second CIO is ready to spend money on Big data

After several years of experiments with Big Data technologies and the first implementations in 2013, the adoption of such solutions will increase significantly, Gartner predicts. The researchers surveyed IT leaders around the world and found that 42% of those surveyed have already invested in Big Data technologies or plan to make such investments within the next year (data as of March 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is changing rapidly and demands new approaches to information processing. Many companies have already realized that big data is critical, and that working with it brings benefits unavailable through traditional sources of information and traditional methods of processing it. In addition, the constant airing of the "big data" topic in the media fuels interest in the relevant technologies.

Frank Buytendijk, a vice president at Gartner, even urged companies to temper their enthusiasm, as some worry that they are lagging behind competitors in mastering big data.

“There is no need to worry, the possibilities for realizing ideas based on big data technologies are virtually limitless,” he said.

Gartner predicts that by 2015, 20% of the Global 1000 companies will have a strategic focus on "information infrastructure."

In anticipation of the new opportunities that big data processing technologies will bring, many organizations are already organizing the process of collecting and storing various kinds of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data: email messages, multimedia, and other similar content. According to Gartner, the winners of the data race will be those who learn to deal with the widest variety of information sources.

Cisco poll: Big Data will help increase IT budgets

The Cisco Connected World Technology Report (Spring 2013), conducted in 18 countries by the independent analyst firm InsightExpress, surveyed 1,800 college students and an equal number of young professionals aged 18 to 30. The survey aimed to find out how ready IT departments are to implement big data projects and to understand the associated challenges, technological flaws, and strategic value of such projects.

Most companies collect, record and analyze data. However, according to the report, many companies face a range of complex business and information technology challenges in connection with Big Data. For example, 60 percent of those surveyed acknowledge that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent said that they are already getting real strategic benefits from the accumulated information.

More than half of the CIOs surveyed believe that Big Data projects will help increase IT budgets in their organizations, as there will be increased demands on technology, personnel, and professional skills. More than half of the respondents expected such projects to increase IT budgets in their companies as early as 2012, and 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies can affect the speed of distribution of Big Data solutions and the value of these solutions for business.

Companies collect and use data of various types, both structured and unstructured. Here are the sources from which survey participants receive data (Cisco Connected World Technology Report):

Nearly half (48 percent) of CIOs predict that the load on their networks will double over the next two years. (This is especially true in China, where 68 percent of those surveyed hold this point of view, and in Germany, 60 percent.) 23 percent of respondents expect network traffic to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosive growth in network traffic.

27 percent of those surveyed admitted that they need better IT policies and information security measures.

21 percent need more bandwidth.

Big Data opens up new opportunities for IT departments to create value and build close relationships with business units to increase revenue and strengthen a company's financial position. Big Data projects make IT departments a strategic partner of business departments.

According to 73 percent of respondents, it is the IT department that will become the main engine for implementing the Big Data strategy. At the same time, respondents believe that other departments will also be involved in the implementation of this strategy. First of all, this concerns the departments of finance (named by 24 percent of respondents), research and development (20 percent), operations (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Millions of new jobs needed to manage big data

Global IT spending will reach $3.7 trillion by 2013, up 3.8% from IT spending in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at a much faster pace, according to a Gartner report.

By 2015, 4.4 million IT jobs will be created to serve big data, of which 1.9 million will be in the United States. What's more, each such job will generate three additional non-IT jobs, so that in the United States alone, 6 million people will be working to support the information economy over the next four years.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both private and public educational systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that the role of cultivating qualified IT personnel should be taken directly by companies that are in dire need of them, as such employees will become a pass for them into the new information economy of the future.

2012

First skepticism about Big Data

Analysts from Ovum and Gartner suggest that for big data, the trendy topic of 2012, it may be time to let go of illusions.

The term "Big Data" at this time usually refers to the ever-growing volume of information coming online from social media, sensor networks, and other sources, as well as to the growing range of tools used to process that data and identify important business trends in it.

“Because of (or in spite of) the hype surrounding the idea of big data, vendors looked at this trend in 2012 with great hope,” said Tony Bayer, an analyst at Ovum.

Bayer said that DataSift conducted a retrospective analysis of big data references in

I first heard the term "Big Data" from German Gref, the head of Sberbank. He said they are now actively working on its implementation, because it will help them reduce the time they spend working with each client.

The second time I came across the concept was in a client's online store, where we worked to grow the assortment from a couple of thousand items to a couple of tens of thousands.

The third time, I saw that Yandex was looking for a big data analyst. That was when I decided to dig deeper into the topic and, at the same time, write an article about this term that so excites top managers and the Internet space.

VVV or VVVVV

I usually start any of my articles with an explanation of what kind of term it is. This article will be no exception.

However, I do this not primarily out of a desire to show how smart I am, but because the topic is really complex and requires careful explanation.

For example, you could read what big data is on Wikipedia, understand nothing, and then return to this article to grasp the definition and its applicability to business. So, let's start with a description, and then move on to business examples.

Big data means, literally, “big data.” Amazing, right? But that definition, one might say, is for dummies.

Important. Big data technology is an approach to processing very large volumes of data in order to obtain new information that is difficult to extract by conventional means.

Data can be both processed (structured) and fragmented (i.e. unstructured).

The term itself appeared relatively recently. In 2008, a scientific journal described this approach as necessary for dealing with amounts of information that are growing exponentially.

For example, the volume of information on the Internet that needs to be stored and, of course, processed grows by about 40% every year. Let that sink in: 40% more new information appears on the Internet, every year.

Printed documents are easy to understand, and so are the ways of processing them (scan them into electronic form, number them, file them in one folder). But what do you do with information that comes on completely different “carriers” and in completely different volumes:

  • Internet documents;
  • blogs and social networks;
  • audio/video sources;
  • measuring devices.

There are characteristics that determine whether information qualifies as big data; not every dataset is suitable for this kind of analytics.

These characteristics form the core of the big data concept, and they all fit into three Vs.

  1. Volume. Data is measured by the physical volume of the “documents” to be analyzed;
  2. Velocity. The data does not stand still; it constantly grows, which is why it needs to be processed quickly to obtain results;
  3. Variety. The data may not be uniform: it can be fragmented, structured or partially structured.

From time to time, however, a fourth V (veracity, the reliability and credibility of the data) and even a fifth V (in some readings viability, in others value) are added to the original three.

Somewhere I even saw 7 Vs used to characterize big data. But in my opinion this is like the marketing Ps, where new Ps are periodically added even though the original 4 are enough for understanding.


Who needs it?

A logical question arises: how can this information be used? (Remember, big data means hundreds and thousands of terabytes.)

Let's put it differently: the data is there, so why was big data invented? What is the use of big data in marketing and business?

  1. Conventional databases cannot store and process (I'm not even talking about analytics, just storing and processing) a huge amount of information.

    Big data solves this main problem: it successfully stores and manages very large volumes of information;

  2. It structures information coming from various sources (video, images, audio and text documents) into a single, understandable and digestible form;
  3. It produces analytics and accurate forecasts based on the structured and processed information.
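To make the second point concrete, here is a minimal Python sketch of what “structuring information from various sources into one form” means in practice. All field names, sources and values are invented for illustration; real pipelines use far richer schemas.

```python
# Illustrative only: normalize records from heterogeneous sources
# (CRM text records, image metadata, call transcripts) into one schema.
# Every field name here is hypothetical.

def normalize(record: dict) -> dict:
    """Map source-specific field names onto a single unified schema."""
    mapping = {
        "client_name": "customer",   # text documents / CRM
        "img_author": "customer",    # image metadata
        "spoken_by": "customer",     # audio transcripts
    }
    unified = {"customer": None, "source": record.get("source", "unknown")}
    for key, value in record.items():
        unified_key = mapping.get(key)
        if unified_key:
            unified[unified_key] = value
    return unified

raw = [
    {"source": "crm", "client_name": "Ivanov"},
    {"source": "photo", "img_author": "Petrov"},
    {"source": "call", "spoken_by": "Sidorov"},
]
rows = [normalize(r) for r in raw]  # now every row has the same shape
```

Once everything has the same shape, the analytics and forecasting step (point 3) becomes possible at all.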

Sounds complicated. Simply put, any marketer understands that if you study a large amount of information (about you, your company, your competitors, your industry), you can get very decent results:

  • A full understanding of your company and your business from the numbers side;
  • Insight into your competitors, which in turn makes it possible to get ahead of them;
  • New information about your customers.

And precisely because the technology delivers these results, everyone is rushing to adopt it.

Companies try to bolt big data onto their business to increase sales and reduce costs. Specifically, that means:

  1. Increasing cross-sells and up-sells through better knowledge of customer preferences;
  2. Search for popular products and reasons why they are bought (and vice versa);
  3. Product or service improvement;
  4. Improvement in the level of service;
  5. Increasing loyalty and customer focus;
  6. Fraud prevention (more relevant for the banking sector);
  7. Reducing excess costs.

The most common example given in all sources is, of course, Apple, which collects data about its users (phone, watch, computer).

It is precisely because of this ecosystem that the corporation knows so much about its users and can later turn that knowledge into profit.

You can read these and other examples of use in any other article except this one.

Let's go to the future

I will tell you about another project. Or rather, about a person who builds the future using big data solutions.

This is Elon Musk and his company Tesla. His main dream is to make cars autonomous: you get behind the wheel, turn on the autopilot from Moscow to Vladivostok and... fall asleep, because you don't need to drive at all; the car will do everything itself.

Sounds fantastic? But no! Elon simply acted more cleverly than Google, which controls its cars using dozens of satellites, and went another way:

  1. Each car sold is equipped with a computer that collects all the information.

    And "all" means all: about the driver, his driving style, the roads around, the movement of other cars. The volume of such data reaches 20-30 GB per hour;

  2. This information is then transmitted via satellite to a central computer, which processes it;
  3. Based on the big data that this computer processes, a model of an unmanned vehicle is built.

By the way, while Google is doing rather poorly and its cars keep getting into accidents, Musk is doing much better thanks to working with big data: his test models show very good results.

But... that's all economics. Why do we keep going on about profit? Much of what big data can solve has nothing to do with earnings and money.

Google's statistics, built precisely on big data, show an interesting thing.

Before doctors announce the beginning of an epidemic of a disease in a region, the number of search queries for the treatment of this disease increases significantly in this region.

Thus, proper study and analysis of this data can produce forecasts and predict the onset of an epidemic (and hence enable its prevention) much faster than official announcements and actions.
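The idea behind this kind of early warning can be sketched in a few lines of Python: flag a region when its weekly search volume for a treatment query rises far above its recent baseline. The query counts and the threshold below are invented for illustration.

```python
# Toy early-warning signal: z-score anomaly detection on weekly
# search-query counts for a disease treatment in one region.
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """True when `current` exceeds the historical mean
    by more than `threshold` standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (current - mu) / sigma > threshold

weekly_queries = [120, 115, 130, 125, 118, 122]   # invented baseline weeks
flag_normal = is_anomalous(weekly_queries, 131)   # ordinary fluctuation
flag_spike = is_anomalous(weekly_queries, 400)    # sudden spike: possible outbreak
```

Real systems weigh many more signals, but the principle is the same: the data moves before the official reports do.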

Application in Russia

Russia, however, is lagging a bit, as usual. The very definition of big data appeared in Russia no more than 5 years ago (I'm talking about ordinary companies now).

And that's despite it being one of the fastest growing markets in the world (drugs and weapons nervously watch from the sidelines): the market for software for collecting and analyzing big data grows by 32% every year.

To characterize the big data market in Russia, an old joke comes to mind: big data is like sex before 18.

Everyone talks about it, there is a lot of hype around it and little real action, and everyone is ashamed to admit that they are not doing it themselves.

Although the well-known research company Gartner announced back in 2015 that big data (like artificial intelligence, by the way) is no longer a rising trend, but a fully independent set of tools for analysis and for developing advanced technologies.

The most active niches where big data is used in Russia are banking / insurance (it's no coincidence that I started the article with the head of Sberbank), telecommunications, retail, real estate and... the public sector.

For example, I will tell you in more detail about a couple of sectors of the economy that use big data algorithms.

Banks

Let's start with banks and the information they collect about us and our actions. As an example, here are the top 5 Russian banks actively investing in big data:

  1. Sberbank;
  2. Gazprombank;
  3. VTB 24;
  4. Alfa Bank;
  5. Tinkoff bank.

It is especially pleasant to see Alfa Bank among the Russian leaders. If nothing else, it's nice to know that the bank you officially partner with understands the need to introduce new marketing tools into its company.

But I want to show examples of the successful implementation of big data at a bank I like for the unconventional outlook and actions of its founder.

I'm talking about Tinkoff Bank. Their main task was to develop a system for analyzing big data in real time to cope with a rapidly grown customer base.

The results: internal process times were reduced by at least 10 times, and for some processes by more than 100 times.

A small digression: do you know why I brought up the unconventional antics and actions of Oleg Tinkov?

In my opinion, they are exactly what turned him from a mid-level businessman, of whom there are thousands in Russia, into one of the most famous and recognizable entrepreneurs. To see for yourself, watch this unusual and interesting video:

Real estate

In real estate, things are much more complicated, and this is exactly the example I want to give you to show big data inside an ordinary business. The initial data:

  1. Large volume of text documentation;
  2. Open sources (private satellites transmitting earth change data);
  3. The vast amount of uncontrolled information on the Internet;
  4. Constant changes in sources and data.

On this basis you need to prepare a valuation of a land plot, say, near a village in the Urals. It would take a professional a week.

The Russian Society of Appraisers & ROSEKO, which actually implemented big data analysis in software, needs no more than 30 minutes of unhurried work. Compare: a week versus 30 minutes. A colossal difference.

And for dessert

Of course, huge amounts of information cannot be stored and processed on simple hard drives.

And the software that structures and analyzes the data is usually proprietary, a custom development every time. Still, there are tools on which all this beauty is built:

  • Hadoop & MapReduce;
  • NoSQL databases;
  • Tools of the Data Discovery class.

To be honest, I can't clearly explain how they differ from each other, since working with these things is taught at specialized technical universities.

Then why bring it up if I can't explain it? Remember how in every movie the robbers walk into a bank and see a huge number of metal boxes hooked up to wires?

It's the same with big data. For example, here is a model that is currently one of the market leaders.

Big data tool

In the maximum configuration the cost reaches 27 million rubles per rack. That is, of course, the deluxe version. My point is: estimate in advance what building big data in your business would cost.

Briefly about the main points

You may ask: why would you, a small or medium-sized business, work with big data?

I will answer with a quote: “In the near future, the companies in demand will be those that better understand their customers' behavior and habits and match them as closely as possible.”

But let's face it. To implement big data in a small business, it is necessary to have not only large budgets for the development and implementation of software, but also for the maintenance of specialists, at least such as a big data analyst and a system administrator.

And I'm not even mentioning that you need to have such data to process in the first place.

OK, so for small businesses the topic is barely applicable. But that doesn't mean you should forget everything you've read above.

Just study not your own data but the published analytics results of well-known foreign and Russian companies.

For example, the Target retail chain used big data analytics to find out that pregnant women, before the second trimester (weeks 1 to 12 of pregnancy), actively buy unscented products.

With this data, they send them discount coupons for unscented products with a limited expiration date.
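As a toy illustration of this Target-style logic (the product list, the matching pattern and the coupon format are all made up, not Target's actual model):

```python
# Hypothetical sketch: match a customer's purchase history against a
# pattern and issue a time-limited coupon. Everything here is invented.
from datetime import date, timedelta

UNSCENTED = {"unscented lotion", "unscented soap", "cotton balls"}

def maybe_issue_coupon(purchases, today=date(2024, 1, 1)):
    """Issue a 2-week coupon when 2+ distinct unscented items were bought."""
    hits = UNSCENTED & set(purchases)
    if len(hits) >= 2:
        return {"discount": "10%", "expires": today + timedelta(days=14)}
    return None

coupon = maybe_issue_coupon(["unscented lotion", "bread", "unscented soap"])
```

The real models are statistical rather than rule-based, but the business mechanics are exactly this: pattern, match, targeted offer with a deadline.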

And what if you're just a very small cafe? Simple: use a loyalty app.

And after some time and thanks to the accumulated information, you will be able not only to offer customers dishes relevant to their needs, but also to see the most unsold and most marginal dishes with just a couple of mouse clicks.
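What those “couple of mouse clicks” boil down to is a simple aggregation over the loyalty app's transaction log. A minimal sketch, with invented sales data and margins:

```python
# Illustrative only: find the least-sold dish and the highest-earning
# dish from a loyalty app's transaction log. Data is made up.
from collections import Counter

sales = ["latte", "latte", "cheesecake", "latte", "soup", "cheesecake"]
margin = {"latte": 120, "cheesecake": 200, "soup": 60}  # rubles per item

counts = Counter(sales)
least_sold = min(counts, key=counts.get)                   # candidate to drop
revenue = {dish: counts[dish] * margin[dish] for dish in counts}
top_margin = max(revenue, key=revenue.get)                 # dish to promote
```

Even at this scale, accumulated data answers questions a cafe owner would otherwise only guess at.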

Hence the conclusion. It is hardly worth implementing big data for small businesses, but using the results and developments of other companies is a must.

The constant acceleration of data growth is an integral part of today's realities. Social networks, mobile devices, data from measuring devices, business information are just a few of the types of sources that can generate huge amounts of data.

Currently, the term Big Data (Big data) has become quite common. Far from everyone is still aware of how quickly and deeply technologies for processing large amounts of data are changing the most diverse aspects of society. Changes are taking place in various areas, giving rise to new problems and challenges, including in the field of information security, where such important aspects as confidentiality, integrity, availability, etc. should be in the foreground.

Unfortunately, many modern companies resort to Big Data technology without creating the proper infrastructure for this, which could ensure reliable storage of the huge amounts of data that they collect and store. On the other hand, blockchain technology is currently rapidly developing, which is designed to solve this and many other problems.

What is Big Data?

In fact, the definition of the term lies on the surface: "big data" means the management of very large amounts of data, as well as their analysis. If you look more broadly, then this is information that cannot be processed by classical methods due to its large volumes.

The term Big Data itself (big data) appeared relatively recently. According to the Google Trends service, the active growth in the popularity of the term dates to the end of 2011.

In 2010, the first products and solutions directly related to the processing of big data began to appear. By 2011, most of the largest IT companies, including IBM, Oracle, Microsoft and Hewlett-Packard, were actively using the term Big Data in their business strategies. Gradually, information technology market analysts began active research on this concept.

Currently, this term has gained considerable popularity and is actively used in a variety of fields. However, it cannot be said with certainty that Big Data is some kind of fundamentally new phenomenon - on the contrary, large data sources have existed for many years. In marketing, they can be databases of customer purchases, credit histories, lifestyles, and more. Over the years, analysts have used this data to help companies predict future customer needs, assess risk, shape consumer preferences, and more.

Currently, the situation has changed in two aspects:

— More sophisticated tools and methods have emerged to analyze and compare different datasets;
— Analysis tools have been complemented by many new sources of data, driven by widespread digitization, as well as new methods of collecting and measuring data.

Researchers predict that Big Data technologies will be most actively used in manufacturing, healthcare, trade, public administration and in other very diverse areas and industries.

Big Data is not a specific array of data, but a set of methods for processing them. The defining characteristic for big data is not only their volume, but also other categories that characterize the labor-intensive processes of data processing and analysis.

The initial data for processing can be, for example:

— Internet user behavior logs;
— Internet of things;
- social media;
— meteorological data;
— digitized books of the largest libraries;
– GPS signals from vehicles;
— information about transactions of bank customers;
— data on the location of subscribers of mobile networks;
— information about purchases in large retail chains, etc.

Over time, the amount of data and the number of sources grow constantly, and against this background new methods of information processing appear while existing ones are improved.

Basic principles of Big Data:

— Horizontal scalability. Data arrays can be huge, so a big data processing system must expand dynamically as volumes grow.
— Fault tolerance. Even if individual pieces of equipment fail, the system as a whole must remain operational.
— Data locality. In large distributed systems data is spread across many machines; whenever possible, to save resources, data is processed on the same server where it is stored.

For the stable operation of all three principles and, accordingly, the high efficiency of storing and processing big data, new breakthrough technologies are needed, such as, for example, blockchain.
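Two of these principles, data locality and fault tolerance, can be modeled in a few lines of Python. This is a purely illustrative toy, not how Hadoop or any real scheduler actually works; all node and partition names are invented.

```python
# Toy model: process each data partition on a live node that stores it
# (data locality), falling back to a replica when a node is down
# (fault tolerance).

partitions = {                       # partition -> replica nodes holding it
    "p0": ["node1", "node2"],
    "p1": ["node2", "node3"],
}
stored = {                           # what each node holds locally
    ("node1", "p0"): [1, 2, 3],
    ("node2", "p0"): [1, 2, 3],
    ("node2", "p1"): [4, 5],
    ("node3", "p1"): [4, 5],
}
alive = {"node1": False, "node2": True, "node3": True}  # node1 has failed

def process_partition(pid):
    """Run the computation on the first live replica; dead nodes are skipped."""
    for node in partitions[pid]:
        if alive[node]:
            return sum(stored[(node, pid)])  # work happens where the data lives
    raise RuntimeError("all replicas lost")

total = sum(process_partition(p) for p in partitions)
```

The loss of node1 changes nothing for the caller: p0 is simply computed on its replica, which is the whole point of replication.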

What is big data for?

The scope of Big Data is constantly expanding:

— Big data can be used in medicine. So, it is possible to establish a diagnosis for a patient not only based on the data of the analysis of the medical history, but also taking into account the experience of other doctors, information about the ecological situation of the patient's area of ​​residence, and many other factors.
— Big Data technologies can be used to organize the movement of unmanned vehicles.
— By processing large amounts of data, it is possible to recognize faces in photographic and video materials.
- Big Data technologies can be used by retailers - trading companies can actively use data arrays from social networks to effectively set up their advertising campaigns, which can be maximally focused on a particular consumer segment.
— This technology is actively used in the organization of election campaigns, including for the analysis of political preferences in society.
— The use of Big Data technologies is relevant for income assurance (RA) class solutions, which include tools for detecting inconsistencies and in-depth data analysis that allow timely identification of probable losses or distortions of information that can lead to a decrease in financial results.
— Telecommunication providers can aggregate big data, including geolocation data; in turn, this information may be of commercial interest to advertising agencies, which may use it to display targeted and local advertising, as well as to retailers and banks.
— Big data can play an important role in deciding whether to open a retail outlet in a particular location, based on data about the presence of a powerful targeted flow of people.

Thus, the most obvious practical application of Big Data technology lies in the field of marketing. Thanks to the development of the Internet and the proliferation of all kinds of communication devices, behavioral data (such as the number of calls, shopping habits, and purchases) are becoming available in real time.

Big data technologies can also be effectively used in finance, sociological research and many other areas. Experts argue that all these possibilities of using big data are only the visible part of the iceberg, since these technologies are used in intelligence and counterintelligence, in military affairs, as well as in everything that is commonly called information warfare, to a much greater extent.

In general terms, the sequence of working with Big Data consists of collecting data, structuring the information received using reports and dashboards, and then formulating recommendations for action.

Let us briefly consider the possibilities of using Big Data technologies in marketing. As you know, for a marketer, information is the main tool for forecasting and strategizing. Big data analysis has long been successfully used to determine the target audience, interests, demand and activity of consumers. Big data analysis, in particular, makes it possible to display advertising (based on the RTB auction model - Real Time Bidding) only to those consumers who are interested in a product or service.
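The RTB model mentioned above can be sketched as a tiny second-price auction: advertisers bid on a single ad impression for a user profile, the highest bidder wins and pays the runner-up's price. Bidder names, prices and the targeting rule below are invented.

```python
# Minimal sketch of a Real Time Bidding auction (second-price rule).
# Not any real exchange's API; purely illustrative.

def run_auction(user_interests, bids):
    """bids: list of (advertiser, targeted_interest, bid_price)."""
    eligible = [b for b in bids if b[1] in user_interests]
    if len(eligible) < 2:
        return None                       # need at least two bids to price
    ranked = sorted(eligible, key=lambda b: b[2], reverse=True)
    winner = ranked[0][0]
    price_paid = ranked[1][2]             # winner pays the second-highest bid
    return winner, price_paid

result = run_auction(
    {"cars", "travel"},
    [("AutoAds", "cars", 2.50), ("TripCo", "travel", 1.80), ("PetShop", "pets", 3.00)],
)
```

Note that the highest absolute bid (PetShop) loses because it doesn't match the user's interests: targeting, not price alone, decides who even enters the auction.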

The use of Big Data in marketing allows businessmen to:

— get to know their consumers better and attract a similar audience online;
— evaluate the degree of customer satisfaction;
— understand whether a proposed service meets expectations and needs;
— find and implement new ways to increase customer trust;
— create projects that are in demand, etc.

For example, the Google Trends service can show a marketer a forecast of seasonal demand for a particular product, its fluctuations, and the geography of clicks. If you compare this information with the statistics collected by the corresponding plugin on your own site, you can draw up a plan for allocating the advertising budget by month, region, and other parameters.
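The budget-planning step described here amounts to proportional allocation. A hedged sketch with an invented seasonal demand index (in practice the index would come from Google Trends and your own site statistics):

```python
# Illustrative only: split a fixed advertising budget across months
# in proportion to a seasonal demand index. The numbers are invented.

def allocate_budget(total, demand_index):
    """Return month -> budget share proportional to demand."""
    norm = sum(demand_index.values())
    return {m: round(total * v / norm, 2) for m, v in demand_index.items()}

plan = allocate_budget(100_000, {"May": 20, "June": 50, "July": 30})
```

The same proportional split extends naturally to regions or any other parameter the statistics cover.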

According to many researchers, it is in the segmentation and use of Big Data that the success of the Trump campaign lies. The team of the future US president was able to correctly divide the audience, understand its desires and show exactly the message that voters want to see and hear. So, according to Irina Belysheva from the Data-Centric Alliance, Trump's victory was largely due to a non-standard approach to Internet marketing, which was based on Big Data, psycho-behavioral analysis and personalized advertising.

Trump's political technologists and marketers used a specially developed mathematical model that made it possible to deeply analyze and systematize the data of all US voters, enabling ultra-precise targeting not only by geography but also by voters' intentions, interests, psychotype, behavioral characteristics, and so on. After that, marketers organized personalized communication with each group of citizens based on their needs, moods, political views, psychological characteristics, and even skin color, using a tailored message for almost every individual voter.

As for Hillary Clinton, she used “time-tested” methods based on sociological data and standard marketing in her campaign, dividing the electorate only into formally homogeneous groups (men, women, African Americans, Hispanics, poor, rich, etc.).

As a result, the winner was the one who appreciated the potential of new technologies and methods of analysis. Notably, Hillary Clinton's campaign spending was twice that of her opponent:

Data: Pew Research

The main problems of using Big Data

In addition to the high cost, one of the main factors hindering the introduction of Big Data in various areas is the problem of choosing the data to be processed: that is, determining which data needs to be extracted, stored and analyzed, and which ones should not be taken into account.

Another problem of Big Data is ethical. In other words, a natural question arises: can such data collection (especially without the knowledge of the user) be considered a violation of privacy boundaries?

It's no secret that the information stored in Google and Yandex search engines allows IT giants to constantly improve their services, make them user-friendly and create new interactive applications. To do this, search engines collect user data about user activity on the Internet, IP addresses, geolocation data, interests and online purchases, personal data, email messages, etc. All this allows displaying contextual advertising in accordance with user behavior on the Internet. At the same time, users' consent is usually not asked for this, and the choice of what information about themselves to provide is not given. That is, by default, everything is collected in Big Data, which will then be stored on the sites' data servers.

From this follows the next important issue regarding the security of storage and use of data. For example, is an analytics platform that consumers automatically share their data with secure? In addition, many business representatives note a shortage of highly qualified analysts and marketers who are able to effectively operate large amounts of data and solve specific business problems with their help.

Despite all the difficulties with the implementation of Big Data, the business intends to increase investments in this area. According to a Gartner study, the leaders of industries investing in Big Data are media, retail, telecom, banking and service companies.

Prospects for interaction between blockchain technologies and Big Data

Integration of blockchain with Big Data has a synergistic effect and opens up a wide range of new opportunities for business, allowing companies to:

— get access to detailed information about consumer preferences, on the basis of which detailed analytical profiles can be built for specific suppliers, products and product components;
— integrate detailed data on transactions and statistics on the consumption of particular groups of goods by different categories of users;
— obtain detailed analytical data on supply and consumption chains, and control product losses during transportation (for example, weight loss due to shrinkage and evaporation of certain types of goods);
— counteract counterfeit products and increase the effectiveness of the fight against money laundering and fraud, etc.

Access to detailed data on the use and consumption of goods will largely unlock the potential of Big Data technology to optimize key business processes, reduce regulatory risks, and open up new opportunities for monetization and creating products that will best meet current consumer preferences.

As you know, representatives of the largest financial institutions are already showing significant interest in blockchain technology. According to Oliver Bussmann, IT manager of the Swiss financial holding UBS, blockchain technology can “reduce transaction processing time from several days to several minutes”.

The potential for analysis from the blockchain using Big Data technology is huge. Distributed registry technology ensures the integrity of information, as well as reliable and transparent storage of the entire transaction history. Big Data, in turn, provides new tools for effective analysis, forecasting, economic modeling and, accordingly, opens up new opportunities for making more informed management decisions.

The tandem of blockchain and Big Data can be successfully used in healthcare. As you know, imperfect and incomplete data on a patient's health greatly increase the risk of an incorrect diagnosis and incorrectly prescribed treatment. Critical data about the health of clients of medical institutions should be as secure as possible, immutable, verifiable and not subject to any manipulation.

The information in the blockchain meets all of the above requirements and can serve as high-quality and reliable source data for in-depth analysis using new Big Data technologies. In addition, with the help of the blockchain, medical institutions could exchange reliable data with insurance companies, justice authorities, employers, scientific institutions and other organizations in need of medical information.

Big Data and information security

In a broad sense, information security is the protection of information and supporting infrastructure from accidental or intentional negative impacts of a natural or artificial nature.

In the field of information security, Big Data faces the following challenges:

— Problems of data protection and ensuring their integrity;
— the risk of outside interference and leakage of confidential information;
— improper storage of confidential information;
- the risk of information loss, for example, due to someone's malicious actions;
— the risk of misuse of personal data by third parties, etc.

One of the main problems of big data that the blockchain is designed to solve lies in the field of information security. Ensuring compliance with all its basic principles, distributed ledger technology can guarantee the integrity and reliability of data, and due to the absence of a single point of failure, the blockchain makes information systems stable. Distributed ledger technology can help solve the problem of trust in data, as well as provide the possibility of a universal exchange of data.

Information is a valuable asset, which means that the main aspects of information security should be at the forefront. In order to survive in the competition, companies must keep up with the times, which means that they cannot ignore the potential opportunities and advantages that blockchain technology and Big Data tools contain.

Only the lazy don't talk about Big data, yet few understand what it is and how it works. Let's start with the simplest thing: terminology. In plain terms, Big data is a set of tools, approaches and methods for processing both structured and unstructured data so that it can be used for specific tasks and purposes.

Unstructured data is information that does not have a predetermined structure or is not organized in a particular order.

The term "big data" was coined by Nature editor Clifford Lynch back in 2008 in a special issue on the explosive growth of the world's information volumes. Although, of course, big data itself existed before. According to experts, data flows of more than 100 GB per day generally belong to the Big data category.


Today this simple term hides just two things: data storage and data processing.

Big data - in simple words

In the modern world, Big data is a socio-economic phenomenon, which is associated with the fact that new technological opportunities have appeared for analyzing a huge amount of data.


For ease of understanding, imagine a supermarket in which the goods are not in the order you are used to. Bread next to fruit, tomato paste next to frozen pizza, lighter fluid opposite the tampon rack, which also holds avocados, tofu and shiitake mushrooms. Big data puts everything in its place and helps you find nut milk, learn its cost and expiration date, and also see who besides you buys such milk and why it is better than cow's milk.

Kenneth Cukier: Big data is better data

Big data technology

Huge amounts of data are processed so that a person can get specific and necessary results for their further effective application.


In fact, Big data is a problem solver and an alternative to traditional data management systems.

Techniques and methods of analysis applicable to Big data according to McKinsey:

  • Crowdsourcing;
  • Data blending and integration;
  • Machine learning;
  • Artificial neural networks;
  • Pattern recognition;
  • Predictive analytics;
  • Simulation modeling;
  • Spatial analysis;
  • Statistical analysis;
  • Visualization of analytical data.

Horizontal scalability is the basic principle of big data processing: data is distributed across computing nodes and processed without performance degradation. McKinsey also included relational database management systems and Business Intelligence among the applicable technologies.

Technologies:

  • NoSQL;
  • MapReduce;
  • Hadoop;
  • Hardware solutions.
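Of these, the MapReduce model is the easiest to show in miniature. Here is a pure-Python imitation of its map and reduce stages on a word count, the classic example; the real Hadoop/MapReduce stack distributes these same stages across many nodes.

```python
# Minimal single-process imitation of the MapReduce programming model.
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts."""
    grouped = defaultdict(int)
    for word, count in pairs:
        grouped[word] += count
    return dict(grouped)

counts = reduce_phase(map_phase(["big data", "big deal", "data data"]))
```

Because the map stage touches each document independently, the documents can be partitioned across machines, which is exactly the horizontal scalability described in the next paragraph.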


For big data, there are traditional defining characteristics developed by the Meta Group back in 2001, known as the “three Vs”:

  1. Volume: the physical volume of the data.
  2. Velocity: the growth rate and the need for fast data processing to obtain results.
  3. Variety: the ability to simultaneously process different types of data.

Big data: application and opportunities

The volumes of heterogeneous and rapidly incoming digital information cannot be processed by traditional tools. Analyzing the data reveals subtle, otherwise imperceptible patterns that a person cannot see. This makes it possible to optimize every area of our lives, from public administration to manufacturing and telecommunications.

For example, some companies began using big data years ago to protect their customers from fraud: taking care of the client's money is taking care of your own money.

Susan Etlinger: What do we do with all this big data?

Solutions based on Big data: Sberbank, Beeline and other companies

Beeline has a huge amount of data about its subscribers, which it uses not only for internal work but also to create analytical products such as external consulting and IPTV analytics. Beeline segmented its database and protected clients from money fraud and viruses, using HDFS and Apache Spark for storage and Rapidminer and Python for data processing.

Or recall Sberbank and its older case, called AS SAFI: a system that analyzes photographs to identify bank customers and prevent fraud. Introduced back in 2014, the system uses computer vision to compare photos in its database with images captured by webcams at teller stands. The core of the system is a biometric platform. Thanks to it, cases of fraud decreased tenfold.

Big data in the world

According to forecasts, by 2020 humanity will have generated 40-44 zettabytes of information, and by 2025 that volume will grow tenfold, according to The Data Age 2025 report prepared by IDC analysts. The report notes that most of the data will be generated by businesses themselves, not ordinary consumers.

The study's analysts believe that data will become a vital asset and security a critical foundation of life. The authors are also confident that the technology will change the economic landscape and that the average user will interact with connected devices about 4,800 times a day.

Big data market in Russia

Typically, big data comes from three sources:

  • Internet (social networks, forums, blogs, media and other sites);
  • Corporate archives of documents;
  • Indications of sensors, instruments and other devices.

Big data in banks

In addition to the system described above, Sberbank's strategy for 2014-2018 speaks of the importance of analyzing super-large data sets for quality customer service, risk management, and cost optimization. The bank now uses Big Data to manage risks, fight fraud, segment and assess the creditworthiness of customers, manage personnel, predict queues at branches, calculate bonuses for employees, and other tasks.

VTB24 uses big data to segment customers and manage churn, generate financial statements, and analyze feedback on social networks and forums. To do this, it uses Teradata, SAS Visual Analytics, and SAS Marketing Optimizer solutions.

The term Big Data usually refers to any amount of structured, semi-structured, and unstructured data; the latter two, however, can and should be ordered for subsequent analysis. Big data does not equate to any particular volume, but in most cases the term implies terabytes, petabytes, or even exabytes of information. This amount of data can accumulate in any business over time or, when a company needs a lot of information, in real time.

Big Data Analysis

Speaking about Big Data analysis, we first of all mean the collection and storage of information from various sources: for example, data about customers who made purchases and their characteristics, information about launched advertising campaigns and estimates of their effectiveness, contact center data. All of this information can and should be compared and analyzed. But for that you need a system that collects and transforms the information without distorting it, stores it, and finally visualizes it. Agree that with big data, tables printed on several thousand pages will not help much in making business decisions.

1. The arrival of big data

Most services that collect information about user actions can export that data. So that it enters the company in a structured form, various tools are used, for example, Alteryx. This software automatically receives information and processes it, but most importantly, converts it into the desired form and format without distorting it.
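Alteryx does this step graphically, but the underlying idea, normalizing raw exports into one structured form without distorting the values, can be sketched in a few lines of Python. The field names and records below are invented for illustration.

```python
# Ingest a raw semicolon-separated export and normalize it into uniform
# dict records: consistent field names, proper types, untouched values.
import csv
import io
import json

raw_csv = ("Client;Amount;Date\n"
           "Ivanov;1500;2015-03-01\n"
           "Petrov;900;2015-03-02\n")

def load_export(text):
    """Convert a raw CSV export into a list of normalized records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    records = []
    for row in reader:
        records.append({
            "client": row["Client"].strip(),
            "amount": float(row["Amount"]),  # cast, don't round: no distortion
            "date": row["Date"],
        })
    return records

records = load_export(raw_csv)
print(json.dumps(records[0], ensure_ascii=False))
```

In a real pipeline the same transformation would be applied to every source system, so that downstream storage receives one consistent schema.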

2. Storage and processing of big data

Almost always, collecting large amounts of information raises the problem of storing it. Of all the platforms we studied, our company prefers Vertica: unlike other products, Vertica can quickly "give back" the information stored in it. Its disadvantage is slow writes, but in big data analysis retrieval speed comes to the fore. For example, when compiling reports using a petabyte of information, retrieval speed is one of the most important characteristics.
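Why a columnar store such as Vertica "gives back" analytical answers quickly can be shown with a toy comparison of the two storage layouts (the records are invented). A query that touches one field scans only that field's array in the columnar layout, instead of walking every field of every record.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"client": "A", "amount": 100, "region": "north"},
    {"client": "B", "amount": 250, "region": "south"},
    {"client": "C", "amount": 175, "region": "north"},
]

# Column-oriented layout: each field is stored as its own array.
columns = {
    "client": ["A", "B", "C"],
    "amount": [100, 250, 175],
    "region": ["north", "south", "north"],
}

# SUM(amount): the row layout touches every field of every record,
# while the columnar layout reads exactly one contiguous array.
total_rows = sum(r["amount"] for r in rows)
total_cols = sum(columns["amount"])
print(total_rows, total_cols)  # 525 525
```

At petabyte scale, skipping the untouched columns (plus the compression that same-typed arrays allow) is what makes analytical reads fast, at the cost of slower writes.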

3. Visualization of Big Data

And finally, the third stage of big data analysis is visualization. It requires a platform able to present all the collected information in a convenient visual form. In our opinion, one software product copes with this task best: Tableau. It is undoubtedly one of the best solutions today, able to visualize any information, turning the company's work into a three-dimensional model and collecting the actions of all departments into a single interdependent chain.
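Stripped to its essence, visualization means aggregating raw events into a small summary a chart can show. Tableau does this interactively; as a hedged illustration of the idea, this sketch renders a text bar chart from invented daily sales figures.

```python
# Aggregated daily sales (invented figures for illustration).
daily_sales = {"Mon": 12, "Tue": 30, "Wed": 21, "Thu": 45, "Fri": 39}

def bar_chart(data, width=20):
    """Scale values to `width` characters and render one bar per key."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label} | {bar} {value}")
    return "\n".join(lines)

print(bar_chart(daily_sales))
```

However it is drawn, the point is the same: a decision-maker sees five bars at a glance instead of paging through thousands of printed rows.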

Instead of a summary, we note that almost any company can now generate its own Big Data. Big data analysis is no longer a complex and expensive process; what is required of management is to formulate the right questions for the collected data, leaving practically no blind spots.
