Post published: 12 September 2020
What exactly is Big Data?

Note: This page has been translated from French into English by a machine translation tool.

Every action we perform leaves a digital trace… This sentence may seem surprising, but it is not far from the truth. As soon as we use a smartphone, tablet or computer with a network connection, what we do with it, or what results from it, is likely to be recorded. All of these measurements, constantly generated by the actions of billions of people, add up to a colossal amount of data collected every day. To this we can add industrial companies that use numerous sensors to record data in real time on their premises, and sites dedicated to scientific experiments or observations that generate very large quantities of measurements with instruments such as large telescopes. In a few years, all these devices in our pockets, on our desks and in our companies have generated more data than had been collected since the dawn of time.

Collecting data is one thing, but it must also be analyzed in order to extract useful information. The change of scale in the amount of data collected and stored has forced us to think about new ways to manage and process it. This sudden increase in the amount of data collected through networks and new information technologies led, a little more than twenty years ago, to the introduction of a new and more appropriate term for it. Big Data was born…

Definition of Big Data

The notion of Big Data is extremely broad, and it is difficult to give a precise definition. The term Big Data, rendered in French as “mégadonnées” or “données massives”, refers to the extremely large masses of data collected at every moment by many devices and in many fields, whose quantity in particular, but also diversity and speed of production, require tools specially designed to process them, analyze them and extract important information from them. The advantage of Big Data is that, thanks to the IT resources available today, very large amounts of data can be processed automatically to create value and extract knowledge that could not previously be brought to light for lack of resources.

Characteristics of Big Data

Big Data is commonly characterized by several “V’s”, the initial letter of three main terms to which two, three or even four others are sometimes added. Let’s start with the one that captures the very essence of Big Data, namely “Volume”. The term Big Data refers first of all to the quantity of data collected, and this volume grows almost exponentially because there are more and more of us, we have more and more devices capable of recording data, and technological progress increasingly favors it.

Next, Big Data is characterized by the “Variety” of the data. This is one of the reasons why relational database management systems (RDBMS), which are queried with the traditional SQL language, are no longer well suited to processing and analyzing Big Data. In addition to being very numerous, the collected data are very different from one another and sometimes seem to have no connection between them. They include text, but also sound, images and videos that constantly feed the databases. And this is one of the great strengths of Big Data! Having such variety in the collected data makes it possible to uncover relationships among some of them that a human carrying out a preliminary analysis could not have imagined intuitively.

The last characteristic of Big Data that is not debated is “Velocity”. As mentioned above, if the amount of data at our disposal doubles regularly, it is because we are collecting more and more data, but also because we are collecting it faster and faster. It therefore becomes necessary to put in place tools that can not only process these masses of data, but above all process them extremely quickly, to the point of doing so in real time.

From the previous characteristic a fourth can be derived, namely “Volatility”. The speed at which data is collected naturally leads to more and more frequent updates. In this fast-moving world, data changes very quickly and therefore becomes obsolete sooner. Analyzing it without delay, or even in real time, is all the more essential because this obsolescence can make it lose its value very quickly.

Let’s briefly mention the other characteristics that can be attributed to Big Data… First, “Veracity”, a concern that also sets Big Data apart from the relational databases mentioned above, for the simple reason that collecting data internally and in smaller quantities favors its reliability. As soon as data is collected massively from many, sometimes extremely heterogeneous, sources, the risk of obtaining very low-quality data increases. The more unstructured the data, the more complex it is to process; some is semi-structured and some is not structured at all. Confidence in the data is a sine qua non for drawing valid inferences from it. On the same theme, we can mention “Validity”, which refers to the importance of having data that, in addition to being accurate, is adequate and relevant to the inferences one wishes to draw.

Finally, as we briefly mentioned earlier, we must insist on “Value”. This is the last indispensable element, without which the Big Data process would have very little purpose. Value is what we seek to extract from this processing, in the form of information that can be exploited for commercial or scientific purposes. Efficient and effective data processing is therefore essential to ensure that the inferences drawn are reliable and accurate.

The sources of Big Data

We have talked about the colossal masses of data that are collected at every moment and that are the essence of Big Data. But where does this data actually come from? Even though we have already cited some of these sources, let’s take a closer look at the channels through which these massive data are acquired.

Let’s start with perhaps the most obvious source: social networks. Facebook and Twitter in particular were probably among the first organizations to enter the Big Data era. And for good reason: the messages, posts, comments and personal data published every day by the millions or even billions of users of these services constitute a colossal pool of information. For these companies, storing it is part of the service they provide to their users. It was therefore natural for them to look for ways to exploit it.

Another source of data is without a doubt companies. As mentioned above, the sensors that monitor the operating status of industrial machinery, the commercial and financial data generated by large companies, and the personal data of their customers all feed databases that grow hour by hour. Large retailers in particular produce colossal amounts of data from the thousands of sales receipts issued and recorded every minute.

But Big Data is also the enormous amount of data we all produce every day with our smartphones. We move around with these small devices always in our pockets. Because they are equipped with GPS chips and accelerometers, anyone with access to the data these phones transmit can know the position of the device, and therefore of its owner, at any time. Moreover, since smartphones are permanently connected to the telephone network, the approximate position of the device can also be determined from the relay antennas it connects to. For each call we make or SMS we send, the numbers involved, and the duration in the case of calls, are also recorded and kept by the operators.

But the internet in general also produces a huge amount of data. Let’s start with the queries made on search engines, which are a significant source of information. Traffic on websites is also continuously tracked, both by recording IP addresses, which identify the devices connected to the network, and by cookies stored on visitors’ browsing devices in order to follow their path across the web. It is possible to go even further by recording the time spent on each page, the areas of the screen over which the mouse passes and even the interactive elements that visitors click. Note that we make no distinction here between data whose collection legally requires your permission and data whose collection does not.

The list doesn’t stop there, but we have seen some of the main sources that feed Big Data. In the same vein as social networks, we should also mention online platforms for sharing images or videos, such as YouTube or Twitch. But all this data is not, and should not be, collected in vain. Let’s see what uses can be made of Big Data…

Big Data, what for?

As you will have understood, the purpose of Big Data is to exploit all the data collected in order to extract knowledge that can later be used for decision making. But what, more precisely, are the uses of Big Data? As with the data sources detailed above, we cannot list every sector and subject that already benefits, or could benefit in the future, from Big Data technologies, since here again the possible uses are numerous and vast. However, we can list a few of the main ones…

Let’s start with companies… If they collect data internally, as we have seen, but also seek data from outside, it is primarily for commercial and financial reasons. They collect data about their customers and their behavior above all to understand what interests them and how they act. The ultimate goal is to tailor the offer to what has the greatest potential to generate sales. They can go even further by drawing up customer profiles and addressing each one in a personalized manner, again in order to build customer loyalty and increase sales volume. Industrial companies, as we have discussed, use sensors to collect data on their production equipment. The aim is to monitor it in order to anticipate possible failures. Using Big Data technologies wisely is above all about staying one step ahead…

We should also detail the use of Big Data for the geolocation of smartphone users. For example, it makes it possible to send promotional offers available in a store as soon as a customer is located nearby. It can also be used to estimate how many people visit places that have no visitor counting system, unlike museums, for example, where the number of tickets sold can easily be counted. Geolocation also reveals the habits of consumers, making it possible to send them, for example, commercial offers on do-it-yourself products if they are regularly located in stores that sell this type of product. With geolocation systems, it is easy to know whether you move around often or very little. Here again, this reveals a little more about you and your habits.
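
To make the "customer located nearby" idea concrete, here is a minimal Python sketch. It assumes that a phone reports a (latitude, longitude) position and that store coordinates are known; the store names, coordinates and the 0.5 km radius are hypothetical values chosen for illustration. The distance is computed with the standard haversine formula, which is one common choice and not necessarily what any given service uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stores_nearby(position, stores, radius_km=0.5):
    """Return the names of the stores within radius_km of the given position."""
    lat, lon = position
    return [
        name
        for name, (store_lat, store_lon) in stores.items()
        if haversine_km(lat, lon, store_lat, store_lon) <= radius_km
    ]


# Hypothetical store coordinates, for illustration only.
stores = {
    "diy_store": (48.8570, 2.3530),
    "bookshop": (48.9000, 2.4000),
}
print(stores_nearby((48.8566, 2.3522), stores))  # -> ['diy_store']
```

A real system would also have to decide how often positions are sampled and how long a customer must stay nearby before an offer is sent; the sketch ignores both.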

In the previous section, we mentioned the data collected by recording the queries that Internet users make on search engines. These queries are a gold mine for identifying trends, such as what interests users and which subjects are in fashion. But that’s not all… This data can also provide early warning signals. For example, an upsurge in queries about the symptoms of certain viral illnesses, such as the flu, can be used to anticipate the arrival of an epidemic. But this brings us back once more to the commercial interests of companies, since analyzing the subjects you search for makes it possible to draw up your profile and then send you commercial offers related to them, on the assumption that they interest you.
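
As a rough illustration of how an upsurge in queries might be spotted, the Python sketch below flags a week when its query count is well above the preceding weeks. The weekly counts are made up, and the rule used (mean of the previous four weeks plus three standard deviations) is only one simple assumption among many possible detection methods.

```python
from statistics import mean, stdev


def find_upsurges(weekly_counts, window=4, threshold=3.0):
    """Return the indexes of weeks whose count is far above the preceding weeks.

    A week is flagged when its count exceeds the mean of the previous
    `window` weeks by more than `threshold` standard deviations.
    """
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        limit = mean(baseline) + threshold * stdev(baseline)
        if weekly_counts[i] > limit:
            flagged.append(i)
    return flagged


# Made-up weekly counts of searches for "flu symptoms".
counts = [100, 104, 98, 102, 101, 99, 180, 260]
print(find_upsurges(counts))  # -> [6, 7]
```

Here the last two weeks stand out against the stable weeks before them. Search volume alone does not prove that an epidemic is starting, so a signal like this would normally be cross-checked against other sources.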

Let’s continue with the data collected on social networks. Here again, commercial interests are omnipresent, and these networks provide valuable data on users. From profile data, to “likes” that reveal interests, to the categories of videos watched on Facebook Watch (where suggesting similar videos that are more likely to be viewed can increase advertising revenue), all of this behavioral data is used to target commercial offers as precisely as possible and maximize their effectiveness. Advertising networks, such as Google’s, use the same process for the ad inserts they place on websites as well as within services they offer themselves, such as Gmail. Every comment posted and every group joined is additional information that refines the profiling. Recommendation algorithms such as those of YouTube or Netflix work on the same principle, associating an account with the behavior of its owner. The aim is to increase consumption of the service and user loyalty by constantly improving the relevance of the suggested content, simply by analyzing content that has already been viewed.
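
The principle of recommending content based on what has already been viewed can be sketched in a few lines of Python. This is a deliberately naive illustration and not how YouTube or Netflix actually work: it assumes a hypothetical catalog in which each title has a single category, and it simply ranks unseen titles by how often their category appears in the viewing history.

```python
from collections import Counter


def recommend(history, catalog, limit=3):
    """Rank unseen titles by how often their category appears in the history.

    history: list of titles already viewed.
    catalog: dict mapping each title to its category.
    """
    category_counts = Counter(
        catalog[title] for title in history if title in catalog
    )
    seen = set(history)
    candidates = [
        title for title in catalog
        if title not in seen and category_counts[catalog[title]] > 0
    ]
    # Most-watched categories first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda title: (-category_counts[catalog[title]], title))
    return candidates[:limit]


# Hypothetical catalog and viewing history.
catalog = {
    "Deep Sea": "documentary",
    "Volcanoes": "documentary",
    "Polar Night": "documentary",
    "Laugh Track": "comedy",
    "Stand-Up Hour": "comedy",
    "Space Chase": "action",
}
history = ["Deep Sea", "Volcanoes", "Laugh Track"]
print(recommend(history, catalog))  # -> ['Polar Night', 'Stand-Up Hour']
```

The action title is never suggested because nothing in the history points to that category, which shows both the appeal and the limit of recommending only from past behavior.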

At this stage, it is easy to get the impression that Big Data only serves large commercial powers and is used partly to the detriment of consumers. But the uses of Big Data are much broader, starting with medicine. Collecting data about patients can, for example, lead to a better understanding of the characteristics that make it possible to identify the presence of a disease, or even to anticipate its occurrence. During natural disasters such as major fires or earthquakes, information shared on social networks or exchanged in private messages can also be analyzed in order to direct assistance and rescue services to where they will be most useful.

Big Data can also provide practical services, for example by exploiting traffic data to identify slowdowns or traffic jams and save other road users time by offering alternative routes. And there is a further benefit… Reducing traffic jams means reducing the time spent in the car and therefore engine running time, which lowers emissions of polluting particles and greenhouse gases into the atmosphere. Also on the environmental side, in very energy-intensive infrastructures such as data centers, Big Data can help identify periods when servers are less busy, and therefore when they could be put on standby, or even turned off altogether, to limit overall energy consumption. Big Data can also be used to optimize energy production, in particular by evaluating the split between renewable sources and fossil fuels according to needs and resources.

We have reviewed some of the potential of Big Data, but many other subjects are already under study and others will most likely be discovered in the more or less distant future. As we have already said, processing all this data requires very specific technologies that differ significantly from the systems used for traditional databases. So let’s take a closer look at what these technologies are.

Big Data technologies

It’s hard to talk about Big Data without talking about artificial intelligence and machine learning. We will not go into detail here about these principles, machine learning in particular, but will quickly discuss how they fit into Big Data analysis. If you want to know more about machine learning methods, we invite you to read our article Artificial intelligence.

Data in relational database systems are linked together by… relationships. It is therefore possible to extract data with a query language that uses these links. Big Data stores are much more massive and, above all, generally unstructured or only semi-structured, which means that there are few or no links between the tables or the data themselves. This is where machine learning comes in… Its algorithms are in charge of exploiting these masses of data to establish the missing relationships themselves, and to deduce correlations and new knowledge. This is the famous value we mentioned earlier.
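
To show what "using these links" means on the relational side, here is a small Python sketch based on the standard library's sqlite3 module. The two tables, their columns and their contents are invented for the example; the point is only that the query can combine them because each order explicitly refers to a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
""")

# The join follows the link from orders.customer_id to customers.id.
rows = conn.execute("""
    SELECT customers.name, SUM(orders.amount)
    FROM customers
    JOIN orders ON orders.customer_id = customers.id
    GROUP BY customers.id
    ORDER BY customers.name
""").fetchall()
print(rows)  # -> [('Alice', 55.5), ('Bob', 12.0)]
```

With unstructured data such as free text, images or video, no such `customer_id` column exists to join on, which is why the relationships have to be inferred rather than declared.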

In addition, databases are sometimes too large to be processed by a single machine, and it becomes necessary to coordinate the work of several computers or servers to perform a given task on the data, such as counting the occurrences of one or more strings in huge amounts of text. Tools such as Hadoop and its modules coordinate computing resources to speed up processing. To do this, they can, for example, divide a dataset into multiple smaller parts, each processed individually by one machine, also called a node. Once every node has finished its work, the coordination tool gathers the partial results and combines them into the result a single machine would have produced. In our simple example, this means adding up the totals obtained by each node on its part. The coordinating tool is also responsible for allocating new processing resources if it notices that one of the machines assigned to the task has failed. This is of course a simplified example, and there are many different ways of carrying out such processing; this one would not necessarily suit every task that can be performed on large datasets.
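
The split, count and add-up steps described above can be imitated on a single computer. The Python sketch below is not Hadoop code and does not use its API: worker threads stand in for nodes, the function names are invented for the example, and failure handling is left out entirely. It only shows the shape of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Divide the dataset into roughly equal parts, one per node."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_in_chunk(chunk, targets):
    """Work done by one node: count each target string in its own chunk."""
    totals = Counter()
    for line in chunk:
        for target in targets:
            totals[target] += line.count(target)
    return totals


def count_occurrences(lines, targets, n_nodes=3):
    """Coordinator: distribute the chunks, then add up the partial totals."""
    chunks = split_into_chunks(lines, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = pool.map(lambda chunk: count_in_chunk(chunk, targets), chunks)
        return sum(partials, Counter())


# Tiny made-up dataset standing in for a huge text corpus.
lines = [
    "big data is big",
    "data about data",
    "small sample",
    "big ideas",
    "no match here",
]
print(count_occurrences(lines, ["big", "data"]))  # -> Counter({'big': 3, 'data': 3})
```

Because the data is split between lines, no occurrence can straddle two chunks, so the combined totals match what a single pass over the whole dataset would give. Counting works well this way because partial totals can simply be added; tasks whose partial results cannot be merged so easily need a different decomposition.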

As you will have understood, the subject of Big Data is extremely vast, in terms of the sources of the ever-increasing amount of data, the applications that exploit it, and the tools that carry out the processing. Even if some people consider Big Data unethical because the mass of data collected also includes personal and private data, the benefits that can be derived from some of this data will certainly prove very important in fields such as ecology or medicine.

“Data is not information, information is not knowledge […]”

Clifford Stoll

Even if data is not knowledge, the economic model of digital companies today is based on acquiring it, which is why more and more of us are collecting it massively. It is therefore highly likely that, in the future, mastering the ability to collect data, combined with the ability to use it correctly and extract relevant information from it, will be one of the major challenges. In any case, the combination of Big Data and artificial intelligence should bring about a major revolution for all of us in the coming years.
