Although existing biologically inspired heuristic algorithms differ in the objects they draw inspiration from, they all simulate the process by which simple individuals collaborate to solve complex problems. Compared with existing methods, new bio-inspired algorithms place greater emphasis on parallelism and hybridization: the former means that these new algorithms should exhibit more effective optimization ability, while the latter refers to the combination of two or more biological heuristic algorithms, such as the immune genetic algorithm or genetic particle swarm optimization.
In addition, with the development of big data and cloud computing technologies, bio-inspired computation with intelligent emergence mechanisms will also be widely used in emerging areas such as human brain modeling, evolvable software and cloud service networks.
We define now the four aforementioned large families of bio-inspired computation methods and their connection to the above generic problems. Table 1 complements these explanations with an excerpt of the particular problems in the context of Big Data that such families can address, as well as the affected Big Data dimensions.
Neural networks are computational models inspired by brain modeling studies. A neural network consists of a set of units, called artificial neurons, connected together to transmit signals. The smallest unit of analysis of neural networks in the computational domain is the so-called neuron or perceptron. An important feature of neural networks is their ability to learn from their environment. Neural networks have been widely applied to supervised, unsupervised, hybrid and reinforcement learning [ 61 ].
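This learning mechanism can be illustrated with a single perceptron; the following minimal example (illustrative code, not tied to any cited work) learns the logical AND function from four labeled samples:

```python
def perceptron_train(samples, epochs=20, lr=0.1):
    """Train a single perceptron with the classic error-correction rule."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Step activation: fire if the weighted sum exceeds the threshold
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y
            # Adjust weights proportionally to the prediction error
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Learn the logical AND function (linearly separable, so training converges)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = perceptron_train(data)
```

After training, the learned weights separate the single positive sample from the rest, which is the essence of learning from the environment described above.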
For this reason, they have been extensively applied to modeling problems such as classification, regression or matching, as well as to simulation problems via unsupervised neural approaches such as Kohonen maps, auto-encoders, Hebbian learning and the like. Evolutionary Computation (EC) comprises a family of algorithms for global optimization inspired by biological evolution. Some recurrent ideas that have been used as inspiration up to now are, among others, the survival of the fittest, natural selection, reproduction, mutation, competition and symbiosis.
For properly emulating the processes involved in nature and the natural selection mechanism, candidate solutions are organized in a population, and a fitness function determines how well they are adapted to the environment in which solutions live. This fitness should be strictly related to the problem at hand, being proportional to the quality of the solution to that problem. The most representative EC techniques, which differ in the way in which they represent and evolve individuals, are as follows: (1) genetic programming, in which individuals are represented as executable programs [ 62 ]; (2) evolutionary programming, which is phenotype-oriented [ 63 ]; (3) evolutionary strategies, which can be deemed the evolution of evolution [ 64 ]; (4) differential evolution, a population-based search strategy in which the modification of individuals is based on the difference between them [ 65 ]; (5) genetic algorithms, population-based techniques based on the Darwinian theory of the evolution of species [ 8 ]; (6) cultural evolution, which adapts to the environment at faster rates than biological evolution [ 66 ]; and (7) co-evolution, distributed solvers in which multiple subpopulations evolve jointly [ 67 ].
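A bare-bones genetic algorithm illustrates the shared mechanics of selection, crossover, mutation and elitism; the sketch below (illustrative names and parameter values, not taken from the cited works) maximizes the classic OneMax fitness, i.e., the number of ones in a bitstring:

```python
import random

def genetic_algorithm(n_bits=10, pop_size=30, generations=100, seed=1):
    """Bare-bones GA maximizing the OneMax fitness (count of 1-bits)."""
    rng = random.Random(seed)
    fitness = sum  # OneMax: fitness is simply the number of ones
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)               # elitism: keep the best
        nxt = [elite[:]]
        while len(nxt) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability 1/n_bits per gene
            child = [bit ^ (rng.random() < 1.0 / n_bits) for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_algorithm()
```

With elitism, the best fitness never decreases across generations, which is why even such a simple scheme converges reliably on this toy problem.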
Up to now, EC has been applied in a wide spectrum of knowledge fields. For interested readers, we suggest the findings reported in works such as [ 68 , 69 , 70 ] for the analysis of recent research trends in some specific applications. Swarm Intelligence (SI) is a specific branch of Computational Intelligence also dedicated to the optimization of complex problems through the study and adaptation of the collective behavior of decentralized, self-organized agents.
This way, SI methods usually consist of a population (swarm) of simple agents, which evolve jointly over time through local interactions with one another and with their environment. Furthermore, although the interactions among individuals are determined beforehand, social interaction plays a key role in the resulting behavior of the swarm towards achieving a global objective. In other words, although every agent relies on local interactions impacting the resulting behavior of the swarm, the global performance of the group simultaneously determines the conditions under which individual agents perform.
As previously mentioned, a wide spectrum of inspirational sources has been embraced over the last couple of decades for producing SI methods. We can highlight among such sources the behavioral patterns of animals such as bees [ 7 ], cuckoos [ 51 ], fireflies [ 71 ], or cats [ 72 ].
Other inspiring motifs for SI methods are physical processes, such as the electromagnetic theory [ 73 ], optic systems [ 74 ], or general relativity [ 75 ]. Social human behaviors have also served as inspiration for modeling novel metaheuristics, with renowned examples such as anarchic societies [ 76 ].
One of the main features that make SI methods especially efficient for solving optimization problems is their ability to distribute the optimization tasks, decentralizing in this way the evolution of solutions.
This feature makes them particularly appealing for their implementation in Big Data ephemeral environments, in which computation resources are intermittently available.
Other acknowledged differences of this optimization paradigm with respect to EC are the behavioral mechanisms by which the swarm evolves towards the best solution of the problem at hand, which are driven by simple one-to-one interaction rules rather than by population-based selection and crossover operators (see Fig.).
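These one-to-one interaction rules can be made concrete with a minimal particle swarm optimizer; the sketch below (illustrative parameter values, not drawn from any cited work) minimizes the sphere function in two dimensions:

```python
import random

def pso_sphere(n_particles=30, n_iter=100, dim=2, seed=42):
    """Minimal PSO minimizing f(x) = sum(x_i^2); illustrative only."""
    rng = random.Random(seed)
    f = lambda x: sum(v * v for v in x)
    # Random initial positions, zero initial velocities
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                      # personal bests
    gbest = min(pbest, key=f)[:]                    # global best
    w, c1, c2 = 0.7, 1.5, 1.5                       # inertia and attraction weights
    for _ in range(n_iter):
        for i, x in enumerate(xs):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, personal memory and social attraction
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - x[d])
                            + c2 * r2 * (gbest[d] - x[d]))
                x[d] += vs[i][d]
            if f(x) < f(pbest[i]):
                pbest[i] = x[:]
                if f(x) < f(gbest):
                    gbest = x[:]
    return gbest, f(gbest)

best, value = pso_sphere()
```

Note that no selection or crossover appears anywhere: every improvement emerges solely from each particle's local memory and its attraction towards the swarm's best-known position.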
Fuzzy systems are specific mechanisms within Computational Intelligence which faithfully adapt to the human reasoning model and to the real world. This logic introduces a better understanding of clauses of the type "it is hot", "it is high" or "it is fast". In this context, the term fuzzy refers to the fact that the logic involved can deal with concepts that cannot be expressed as true or false, but rather as partially true.
For reaching this goal, the core concept of fuzzy systems is to understand the quality quantifiers used in inferences and human reasoning. In this way, fuzzy systems are often used as mechanisms inside other methods, but also as monolithic methods. Up to now, many real-world applications have benefited from these paradigms, mainly control optimization, prediction modeling and decision support [ 77 , 78 , 79 ].
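To make the notion of partial truth concrete, the sketch below (hypothetical membership functions and output values, for illustration only) grades a temperature reading against fuzzy sets and defuzzifies a fan-speed recommendation by a membership-weighted average:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temp):
    # Degrees of membership in the fuzzy sets "cool", "warm" and "hot";
    # breakpoints are hypothetical, chosen only to illustrate partial truth
    cool = tri(temp, -10, 5, 20)
    warm = tri(temp, 15, 25, 35)
    hot = tri(temp, 30, 45, 60)
    # Each rule maps a fuzzy set to a crisp fan speed; defuzzify by a
    # membership-weighted average of the rule outputs
    speeds = {0.0: cool, 50.0: warm, 100.0: hot}
    total = cool + warm + hot
    if total == 0:
        return 0.0
    return sum(s * mu for s, mu in speeds.items()) / total
```

A temperature of 22 degrees is partially "warm" and not at all "hot", so the recommended speed falls strictly between the two crisp rule outputs, which is exactly the behavior classical two-valued logic cannot express.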
This section is devoted to presenting and describing the main synergies between both paradigms studied in this paper: Big Data and bio-inspired computation. Several reviews and surveys have so far addressed this intersection from different perspectives, domains or applications. Table 2 summarizes the essential information of such works carried out during the last two years, including the period of time covered by the articles analyzed in it, the number of reviewed works, the proposal of a taxonomy to organize them, the phases of the Big Data life cycle covered, families of bio-inspired algorithms under scope and, finally, whether a critical analysis, challenges and research directions are given.
The comparison made in these terms with the present work reveals several aspects of improvement:
- A self-contained introduction to the concepts underlying Big Data and bio-inspired computation (Sect.).
- A significantly higher number of reviewed works, which nearly triples the amount of references considered in other comparable surveys.
- A more extensive taxonomy to classify the works under analysis in terms of the families of bio-inspired algorithms used in every reviewed work. In our case, we use a three-fold criterion when designing our taxonomy: Big Data infrastructure, Big Data technologies and Big Data life cycle phases.
- A critical analysis dissecting what has been done so far in the field, along with a set of future challenges that are tightly connected to bio-inspired computation and Big Data, avoiding generalistic formulations.
For this purpose, two separate perspectives have been used: (i) the adoption of bio-inspired computation for modifying different technologies of the Big Data stack, in terms of infrastructure and life cycle technologies; and (ii) the evolution of bio-inspired algorithms adapted to the Big Data life cycle and its features, such as programming models and Big Data volumes. To this end, we divide the analysis into three subsections.
The first two are associated with Big Data infrastructure, while Section 3. covers the third strand. Figure 5 summarizes the recent literature in the field, in which the combination of these technologies has reported remarkable performance and efficiency gains so far.
Generally, Big Data platforms can be deployed into two different kinds of infrastructures: on-premise or in the cloud [ 82 ]. Furthermore, a third approach hybridizing these two concepts is also possible.
The existence of these types makes it necessary to have tools for systematizing the deployment, used as a guide for the system administrator. It is at this specific point where the optimization capabilities of bio-inspired computation solvers acquire relevance, allowing for the automation of these tasks in an efficient fashion.
The main goal of the system administrator is to achieve a smart system management, which can lead to significant improvements in resource usage, such as provisioning; virtualization and allocation [ 41 ]; scheduling and optimization; balancing and reservation; and anomaly detection, among many others.
In this regard, it should be clarified that resources are conceived as the elements that make up the infrastructure, such as virtual machines, containers, network elements, physical servers or computer nodes. Analytical models are nowadays deployed on hybrid, volatile, highly scalable and rapidly reconfigurable resources. It is within this complex ecosystem of computation technologies where it becomes essential to ensure that systems and processes meet the aforementioned capabilities, paving the way for bio-inspired computation to become an enabler for this purpose.
To properly categorize the analysis of the study, we follow the previously mentioned classification, which is the most commonly used within the Big Data context, starting with on-premises infrastructures (Sect.). (Figure: Taxonomy of works related to the application of bio-inspired computation to the Big Data domain, classified as per the different application areas under consideration.) Briefly explained, on-premise refers to the software and technology located within the physical confines of an organization.
This concept opposes running the system remotely on hosted servers or in the cloud. Thus, by installing and running software on hardware located within the premises of the company, full physical access to the data is available.
Furthermore, the configuration, management and security of the computing infrastructure can be carried out directly in the system. Regarding the configuration and management, bio-inspired computation can resolve problems related to task allocation and resource scheduling.
In [ 83 ], for example, the authors present an approach based on distributed SI mechanisms that mimic the behavior of social insects to solve problems such as overlay management, routing, task allocation, and resource discovery. Through this approach, the authors of [ 83 ] construct an adaptive and robust management system for peer-to-peer networks.
The use of Graphics Processing Units (GPUs) and cluster-based parallel computing techniques is also a research trend, aiming at accelerating the process of extracting the correlations between items in sizeable data instances. In [ 84 ], for instance, the authors propose four different population-based metaheuristics for efficiently mining association rules, which benefit from intensive cluster computing and massive GPU threading.
In another vein, a special case of Big Data on-premise infrastructure is so-called High Performance Computing (HPC) [ 85 ], which refers to hardware and programming models specialized in solving highly complex problems, mainly via parallelization. In this sense, using HPC solutions requires new techniques for memory management.
An interesting recent survey on this matter was published by Pupykina et al. In the security context, referring to application-level security as well as advanced protection against malware, the paper presented by Mthunzi et al. is worth noting. The work of Rauf et al. is also interesting. Lastly, a totally different approach can be found in [ 88 ], where several management problems related to the increase in complexity and the need for energy are addressed in detail.
For achieving the planned objectives, a bio-inspired self-organized technique is proposed for the redistribution of load among servers in data centers. Reflecting on the activity noted so far on bio-inspired computation applied to the design, management and operation of on-premise Big Data infrastructures, we stress the lack of informed evidence on whether bio-inspired algorithms can meet realistic complexity scales of large computing farms.
Furthermore, even if resource utilization does not vary as dynamically as in other alternative shared computing environments, most works reviewed in this strand of literature do not report the latencies induced by the usage of bio-inspired methods. This criticism mostly refers to optimization methods: biologically inspired modeling solutions suited for deployment over Big Data infrastructure are far more mature than their optimization counterparts.
In a few words, Cloud Computing infrastructure can be defined as the collection of hardware and software elements needed to enable the remote management of the whole Big Data system. These elements include computing power, networking and storage. It also contemplates an interface for users to access their virtualized resources, like cloud management software, deployment software and platform virtualization.
In the Big Data context, the ability of Cloud Computing to offer fully scalable technical resources adapted to the needs of each project is crucial. Thanks to that, limitations of traditional physical servers are avoided. However, appropriate management tools are needed in order to efficiently take care of tasks such as resource virtualization or services deployment optimization.
In the current literature, works in this line of research can be classified into two main strands: (i) approaches related to resource provisioning and allocation in Cloud Computing environments, and (ii) tasks related to the deployment, planning and optimization of services and applications. On the one hand, the allocation and scheduling of multiple virtual resources, such as virtual machines (VMs), is a well-known research field in Cloud Computing.
In [ 89 ], for example, a Genetic Algorithm is proposed for the optimization of VM distribution across a federated cloud. A similar approach is followed by Rocha et al., in which energy efficiency and network quality of service are jointly optimized. More recent is the work presented in [ 91 ], which solves the same problem by means of an ant colony system.
In addition, the research introduced in [ 92 ] hybridizes a Firefly Algorithm with fuzzy logic for server consolidation and VM placement in cloud data centers. Also interesting is the study presented in [ 93 ], which focuses on Hadoop Big Data technology. In that work, authors implement a bio-inspired solver for optimizing the placement of VMs in OpenStack.
In [ 94 ], Pires et al. also address this problem. Additionally, in [ 95 ], Ant Colony Optimization is combined with dynamic forecast scheduling for solving the VM placement problem, showing remarkable efficiency in terms of less wasted resources and better load balancing.
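The VM placement problem tackled by these metaheuristics can be cast as a bin-packing objective. The sketch below (hypothetical capacities and demands, for illustration only) shows a fitness function of the kind such solvers minimize, together with a first-fit-decreasing greedy baseline:

```python
def placement_fitness(assignment, demands, capacity):
    """Fitness minimized by placement metaheuristics: number of active
    servers plus a heavy penalty for any overloaded server."""
    loads = {}
    for vm, server in enumerate(assignment):
        loads[server] = loads.get(server, 0) + demands[vm]
    overload = sum(max(0, load - capacity) for load in loads.values())
    return len(loads) + 1000 * overload

def first_fit_decreasing(demands, capacity):
    """Greedy baseline: place each VM (largest first) on the first server
    with enough remaining capacity, opening a new server when needed."""
    order = sorted(range(len(demands)), key=lambda i: -demands[i])
    loads, assignment = [], [None] * len(demands)
    for vm in order:
        for s, load in enumerate(loads):
            if load + demands[vm] <= capacity:
                loads[s] += demands[vm]
                assignment[vm] = s
                break
        else:
            assignment[vm] = len(loads)
            loads.append(demands[vm])
    return assignment

demands = [4, 3, 3, 2, 2, 1]   # hypothetical CPU demands per VM
assignment = first_fit_decreasing(demands, capacity=5)
```

A metaheuristic such as ACO or a Genetic Algorithm would explore alternative assignments and keep the one with the lowest fitness; the greedy baseline gives it a feasible starting point to improve upon.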
Finally, an interesting approach based on Cuckoo Search is proposed in [ 96 ] for data center resource provisioning in the cloud. On the other hand, task scheduling over distributed and virtual resources is a main concern which can affect the performance of Big Data systems. In [ 97 ], a meta-heuristic algorithm called Chaotic Social Spider Algorithm is developed for solving task scheduling problems in virtual machines.
The authors of this work focused on minimizing the overall makespan, while leveraging load balancing. Additionally, in the survey presented in [ 98 ], different bio-inspired approaches are analyzed for tackling the aforementioned problem. A work closer to Big Data technologies is conducted in [ 99 ], in which the authors theorize on how the MapReduce programming model performs the assignment of tasks in Cloud Computing environments.
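The makespan objective pursued by these schedulers is simply the completion time of the busiest machine; a minimal illustration with hypothetical task lengths and VM speeds:

```python
def makespan(assignment, task_lengths, vm_speeds):
    """Completion time of the most loaded VM for a task-to-VM assignment."""
    finish = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        finish[vm] += task_lengths[task] / vm_speeds[vm]
    return max(finish)

# Three tasks on two VMs (illustrative numbers): VM0 is twice as fast
ms = makespan([0, 1, 0], task_lengths=[6.0, 4.0, 2.0], vm_speeds=[2.0, 1.0])
# VM0 runs tasks 0 and 2: (6 + 2) / 2 = 4.0; VM1 runs task 1: 4 / 1 = 4.0
```

A bio-inspired scheduler would treat `assignment` as the candidate solution and this function (possibly combined with a load-balancing term) as the fitness to minimize.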
This analysis is carried out by resorting to assorted algorithms, including bio-inspired techniques. It is also worth mentioning that one of the key goals in cloud environments is the optimal use of resources, for which load balancing techniques are often applied. This has been a particularly profitable playground for bio-inspired optimization techniques, yielding extensive surveys such as the one in [ ], which provides a wide coverage of nature-inspired meta-heuristic techniques applied in the area of cloud load balancing.
In this line [ ] addresses the problem of load balancing in cloud environments by proposing a hybrid Cuckoo Search and Firefly Algorithm, showing a promising performance. An additional approach for load balancing is described in [ ], focused on both Fog and Cloud Computing environments. The authors compare the performance of several bio-inspired computation methods, including Cuckoo Search, Flower Pollination and Bat Algorithm.
Our review of the literature related to Cloud Computing infrastructure has revealed that in most cases, the conditions under which algorithmic proposals are validated are largely uncoupled from the constraints and computation budgets that such algorithms would encounter in practical settings.
This criticism refers, among other aspects, to the scales at which the proposals are evaluated. Furthermore, very scarce to null attention is paid to the efficiency of the bio-inspired algorithm itself, mainly due to the simplicity of the simulation settings under which algorithms are validated.
We advocate for a closer look at the implications of using bio-inspired algorithms, stepping aside from common practice and informing the community of bio-inspired methods that can truly be adopted under computation-intensive regimes. As mentioned, hybrid infrastructures comprise a blend of private clouds, public clouds and on-premise data centers.
Thus, Big Data systems and applications can be deployed on any of these environments, depending on several business strategies, such as the main objective of the system, its tactical requirements and the required outcome. This is the case for heterogeneous distributed systems, in which environments and resources such as cluster computing, grid computing, peer-to-peer computing, cloud computing and ubiquitous computing are mixed [ , ].
This particular scenario brings the necessity of efficiently managing a large variety of tools and software. This need motivates the development of new algorithmic schemes for event and task scheduling.
Thus, new methods for resource management should also be designed for increasing the performance of such systems. In [ ], for example, a valuable survey is presented revolving around the advances in scheduling algorithms, energy-aware models, self-organizing resource management, data-aware service allocation, Big Data management and performance analysis.
All this analysis is conducted from the perspective of bio-inspired computation. In [ ], a review of biological concepts and principles to solve service provisioning problems is presented, along with the proposal of a bio-inspired cost minimization mechanism for data-intensive scenarios where such problem emerges.
The proposed method utilizes bio-inspired mechanisms to search and find the optimal data service solution in Big Data environments, considering data management and service maintenance costs. Finally, in [ ], a preliminary work is presented on the deployment of evolutionary algorithms on Hybrid Big Data infrastructures.
To do that, the authors widen the functionality of the well-known ECJ tool [ ] for fulfilling their purpose. On a short reflective note, here we foresee an increasing prevalence of bio-inspired algorithms capable of bringing together multiple conflicting objectives.
Such objectives emerge as a result of the hybridization of different infrastructures, both private and public, which may have some goals in common. This paves the way towards a great opportunity for multi-criteria decision-making algorithms suited to deal with multiple conflicting objectives, such as multi-objective meta-heuristics.
Our examination of the literature uncovers that this is a niche of opportunity that should attract more efforts in the near future. We finish this subsection turning our attention towards a particularly significant element within the infrastructure: the network.
In fact, different computing models can configure their operation based on the network topology and the associated communication latency. Examples of these models are Fog [ ] and Edge Computing [ ].
In this area, there are multiple open opportunities and ample room for improvement by means of optimization techniques used for orchestrating the deployment of elements depending on the features and distribution of the network. It is in this specific stream where bio-inspired algorithms can emerge as an efficient approach for the aforementioned orchestration.
For instance, in [ ] a scheduling method for application modules in a fog computing environment is proposed, using bio-inspired solving schemes such as the Genetic Algorithm, Particle Swarm Optimization and Ant Colony Optimization to reduce energy consumption and execution time.
Another cornerstone task related to the infrastructure network is the security in communications. For this problem, bio-inspired algorithms can also be very useful, as shown in [ ].
In that paper, the authors propose an intrusion detection method which combines multiple classifiers to distinguish anomalous from typical activities in a computer system. Another axis of interest is the scalability of the network, which is also an aspect of utmost relevance in Big Data scenarios. In [ , ], for example, the authors propose and utilize a framework that supports simulation and testbed experiments to investigate the scalability and adaptability of ant routing algorithms in networking.
In this application area, there is a notable inertia towards the use of bio-inspired techniques for network security purposes. However, Big Data networks, stricto sensu, have so far not raised much interest in the use of bio-inspired computation to address inherent problems such as latency minimization, routing or network dimensioning.
We nevertheless envision that the extrapolation of the Big Data paradigm towards ephemeral computing will spawn further opportunities due to the intermittency of the network, the variability of task completion schedules and the uncontrolled availability of computation nodes. It is only under these circumstances that the complexity of governing ephemeral computing resources will require the flexibility and adaptability granted by bio-inspired computation.
The fast evolution and the emergence of new technologies in the Big Data stack, along with the adoption of this paradigm by a growing number of organizations, cause the appearance of new challenges and opportunities in this field. Usually, these challenges are associated with the development, management and operation of new functionalities. In this regard, one of the essential aspects related to the Big Data technology stack is the non-functional requirements that the solutions and tools need to consider.
Singh et al. enumerate six such criteria. Based on these six criteria, we can classify Big Data tools into three large groups [ ]: NoSQL databases, parallel and distributed programming models, and ecosystems of tools. We now analyze them in detail. In a nutshell, a NoSQL [ ] database provides a mechanism for the storage and retrieval of data which is modeled in means other than the traditional tabular relations used in relational databases. This kind of database presents different points of improvement which can be addressed through the application of bio-inspired algorithms.
Some of these applications are related to the horizontal scalability (choice of cluster topology), the availability and replication of the data (assignment of the replicas to the nodes), or the consistency level of the information (ensuring write optimization), among many others. In [ ], for example, the authors present a framework that allows Hadoop to manage the distribution of the data and its placement based on cluster analysis of the data itself.
This work is not directly related to NoSQL databases, but it arguably represents an interesting approach for optimal data distribution in physical storage using evolutionary clustering techniques. Also relevant is the paper presented by Nowosielski et al. In the specific context of data availability and replication, the work published in [ 14 ] presented an adaptive distributed database replication technique based on the application of an ant-colony-based algorithm.
Additional valuable research can be found in [ ], in which the Firefly Algorithm is applied to the positioning and optimization of traffic in a NoSQL database system, modeled with exponentially distributed service and vacation times.
Bio-inspired computation can also contribute to the design of the logical data schema. The research presented in [ ] is an example of this trend, proposing a design repository for storing and retrieving biological and engineering design strategies.
Another interesting investigation is shown in [ ], in which a data warehouse schema design is optimized by means of a Particle Swarm Optimization approach. In [ ], a mathematical model of column-oriented database performance was presented. The authors propose the use of the Flower Pollination Algorithm for optimizing the coefficients of regression equations. Furthermore, they highlight its accuracy and sophistication, which make it appropriate as a foundation for database performance optimization.
Another highly relevant field of study combining NoSQL databases and bio-inspired computing is the so-called query optimization [ ].
The work presented by Rani et al. belongs to this trend. The same author presents in [ ] a study revolving around distributed query processing optimization based on artificial immune systems, which is among the few references identified so far where immune systems have been utilized in Big Data scenarios.
Furthermore, there are situations in which bio-inspired techniques assist in the extraction of association rules over databases, as can be seen in [ ]. In that study, the authors showcase an approach for extracting association rules by applying a Bee Swarm Optimization meta-heuristic to a large database using the massively parallel threads of a GPU. An additional valuable approach is proposed in [ ] for association rule mining, in which the JAYA algorithm is applied to big database instances.
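The fitness guiding such rule-mining metaheuristics typically combines the support and confidence of a candidate rule; a minimal sketch (hypothetical weights and a toy transaction database) reads:

```python
def rule_fitness(antecedent, consequent, transactions, alpha=0.5, beta=0.5):
    """Score a candidate rule antecedent -> consequent by a weighted sum
    of its support and confidence (the weights are illustrative)."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    cover_a = sum(1 for t in transactions if a <= t)        # antecedent matches
    cover_ac = sum(1 for t in transactions if (a | c) <= t)  # full rule matches
    support = cover_ac / n
    confidence = cover_ac / cover_a if cover_a else 0.0
    return alpha * support + beta * confidence

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
score = rule_fitness(["bread"], ["milk"], transactions)
```

A swarm-based miner evaluates many candidate antecedent/consequent pairs against this score, which is an embarrassingly parallel workload and hence a natural fit for the GPU deployments described above.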
Finally, an additional viewpoint can be highlighted in this section, which evinces even further how bio-inspired optimization methods can take advantage of NoSQL technologies. This is the concrete proposal of Jordan et al. In this paper, the authors showcase how a system benefits from optimization knowledge persisted in a NoSQL database, serving as associative memory to better guide the optimizer through dynamic environments. This supports our claim that bio-inspired computation can not only benefit non-conventional databases, but can conversely leverage the storage capabilities of such databases to store historical information that the bio-inspired algorithm can retrieve and exploit when required.
This synergy is worth exploring further in prospective studies around recurrent evolving learning environments. The significant rise of distributed and parallel processing techniques has dramatically transformed the use case landscape, improving existing levels of processing performance. In this context, two clear approaches can be spotted: batch programming models and those adapted to real-time or streaming environments.
As in other situations discussed before, problems arising in these two scenarios can be tackled from the perspective of bio-inspired computation. On the one hand, regarding batch parallel programming models, two main challenges can be found: (i) improvements over existing programming models such as MapReduce [ ], or (ii) the development of new improved computing approaches based on bio-inspired computation techniques.
In the first case, we find interesting works such as [ ] and [ ]. In those studies, the former introduces improvements into the programming model regarding the efficient distribution of tasks, whereas the latter showcases more precise locations of the distributed data.
Another remarkable research work can be found in [ ], which provides a Big Data scheme based on Spark to handle highly imbalanced datasets; the authors successfully validated their approach over several datasets composed of up to 17 million instances. In [ ], Hans et al. follow a related line. It is also interesting to highlight the work presented in [ ], where the authors focus on the Cloud Computing paradigm with emerging programming models, such as Spark, to show how several parallel differential evolution algorithms can perform well in this setting.
Obtained outcomes demonstrate the existence of a competitive speedup against serial implementations, along with a remarkable horizontal scalability. Finally, we can find new programming models such as the one proposed in [ ], in which a new approach to deploy computing intensive runs of enterprise applications on Big Data infrastructures is presented.
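As a reference point for the batch programming models discussed in this strand, the core MapReduce idea can be sketched in a few lines (a simplified single-machine illustration, not any specific framework's API):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    # Apply the user-defined map function to every input record
    return [pair for record in records for pair in map_fn(record)]

def shuffle(pairs):
    # Group intermediate (key, value) pairs by key, as the framework would
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Apply the user-defined reduce function to each key's value list
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count expressed in the MapReduce model
def wc_map(doc):
    return [(word, 1) for word in doc.split()]

def wc_reduce(word, counts):
    return sum(counts)

docs = ["big data big compute", "data pipelines"]
result = reduce_phase(shuffle(map_phase(docs, wc_map)), wc_reduce)
# result == {"big": 2, "data": 2, "compute": 1, "pipelines": 1}
```

In a real cluster, the map and reduce phases run in parallel across nodes and the shuffle moves data over the network; it is precisely the task-assignment decisions behind this distribution that the bio-inspired approaches reviewed above aim to optimize.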
On the other hand, a streaming system can be referred to as real-time if it guarantees a response within tight deadlines. Depending on the specific context of the application, these deadlines can be a matter of minutes, seconds, or even milliseconds.
Nowadays, due to the velocity dimension of Big Data, these systems are cornerstones of the technology stack in the treatment of large volumes of data, and they can take advantage from the characteristics of bio-inspired computation, such as its speed and efficiency when solving complex problems. An additional example for supporting this statement can be found in [ ], in which a new approach to stream computing is introduced.
For achieving online optimization and scheduling, a particle swarm optimization algorithm hybridized with back-propagation and an immune clonal algorithm are used in that work. Lastly, we pause at the term Organic Computing [ ], which refers to systems that behave and interact with humans in a bio-inspired manner.