
1 Simulation and Modeling of Edge and Mist Computing: A Background

Charafeddine Mechalikh edited this page Oct 18, 2022 · 1 revision

1 Introduction

As IoT becomes more prevalent in our daily lives, we anticipate a rapid proliferation of connected devices. Along with this proliferation, edge computing and its associated models, including mist computing, fog computing, cloudlets, etc., are believed to be viable alternatives for handling the sheer amount of delay-sensitive and security-critical data generated by those devices.

This chapter outlines relevant concepts with regard to the modeling and simulation of edge computing. An overview of this computing paradigm, the rationale behind it, and its similarities and differences with respect to the many related computing models is given in Section ‎1.2. This overview also helps in understanding how these computing models may benefit both the current and future landscape of connected devices. Finally, the fundamental notions and definitions of simulation and modeling in general are discussed in Section ‎1.3.

1.2 Edge Computing and Its Related Computing Models

This section compares cloud computing with edge computing and its associated computing models to show their viability in various use cases. It also helps in understanding how they fit the present and upcoming landscape of connected devices. We discuss these computing models in chronological order and point out how some of them have led to the appearance of others.

1.2.1 Cloud Computing

Cloud computing has been a huge success on today’s Internet. It has opened up many opportunities for businesses and individuals by providing computing, storage, networking infrastructure, and services. It is defined by the National Institute of Standards and Technology (NIST) as a computing paradigm that offers omnipresent remote access to pooled resources upon demand (Mell & Grance, 2011). A cloud data center is a massive collection of virtualized resources that are highly accessible and reconfigurable to meet dynamic workloads, enabling the delivery of cloud services on a pay-as-you-go basis (Armbrust et al., 2010; Vaquero et al., 2009). Through this pricing model, users can easily access resources and services while paying by usage. These cloud-based resources are hosted in massive data centers provisioned by vendors like Amazon, Google, Microsoft, and IBM.

The cloud provides multiple service models from which developers may choose based on their application requirements. The first model is Infrastructure-as-a-Service (IaaS). It gives cloud users direct access to computing infrastructure for computation, networking, and storage (Dillon et al., 2010). A user who wants to build an IoT-based system to monitor and control greenhouses, for example, would turn to a cloud provider to acquire an IaaS offering for the system. The user can then configure the IaaS, which is often provided as a stand-alone virtual machine (VM), according to their needs. The infrastructure control that IaaS offers allows them to customize the hardware setup, such as RAM capacity and the number of CPU cores, as well as system-level software. The second model is Platform-as-a-Service (PaaS), which, in contrast, supports application development by providing full support for the software lifecycle, typically through middleware for managing and configuring software. For a user who is not interested in configuring the infrastructure, managing and customizing software and hardware can hamper productivity. Such a user could instead adopt a PaaS offering from any cloud provider. PaaS takes care of the low-level supporting processes and lets the user manage only the software specific to their IoT interactions. In addition, PaaS providers generally offer solutions for managing databases and scaling applications easily.


Figure ‎1.1 Cloud services classification based on the application stack

Finally, the last model is Software-as-a-Service (SaaS). Imagine that the same user is ready to invest extra money in complete software but is not interested in dealing with software issues such as socket management and database scalability. SaaS offers an environment that allows the user to host the application centrally without installing the software manually. These scenarios show that cloud services fit a multitude of use cases and users. The relationship between these service models and the supporting cloud infrastructure is shown in Figure ‎1.1, which indicates the layer of the application stack maintained by the cloud provider.


Figure ‎1.2 Cloud provisioning based on the application demand. Adapted from (Armbrust et al., 2010)

1.2.1.2 Cloud Resource Provisioning

Due to the variation in demand, fixing the amount of cloud resources leads to either over- or under-supply (see Figure ‎1.2). The principle of cloud computing lies in providing only the resources needed to satisfy the demand. This involves employing virtualization to deploy applications on demand and resource provisioning to control both software and hardware in the data centers. Resource provisioning is a major subject of research in cloud computing. The difficulty of estimating service usage by clients has led cloud vendors to adopt a pay-as-you-go payment strategy. Consequently, cloud providers have more flexibility in how they provide resources, while users pay only for what they use.
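To illustrate why a fixed amount of resources wastes capacity while pay-as-you-go provisioning does not, the following sketch compares a fixed-capacity deployment against elastic provisioning over a synthetic demand curve. All names and numbers are hypothetical, not taken from the text:

```python
# Illustrative sketch: a fixed-capacity deployment either idles (over-supply)
# or drops requests (under-supply), while elastic, pay-as-you-go provisioning
# bills only the resources actually consumed. Numbers are made up.

demand = [40, 55, 90, 120, 80, 60, 45]  # requested resource units per hour

def fixed_capacity_waste(demand, capacity):
    """Return (idle units, unmet units) for a fixed-capacity deployment."""
    over = sum(max(capacity - d, 0) for d in demand)   # paid for but idle
    under = sum(max(d - capacity, 0) for d in demand)  # demanded but refused
    return over, under

def elastic_cost(demand, unit_price):
    """Pay-as-you-go: the bill tracks actual usage."""
    return sum(demand) * unit_price

over, under = fixed_capacity_waste(demand, capacity=80)
print(f"fixed capacity of 80: {over} idle units, {under} unmet units")
print(f"elastic cost at 0.1/unit: {elastic_cost(demand, 0.1):.1f}")
```

Varying `capacity` shows the dilemma in Figure ‎1.2 directly: raising it reduces unmet demand but inflates idle capacity, and vice versa.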

1.2.1.3 Types of Cloud Deployments

Different kinds of clouds exist: private, community, public, and hybrid clouds (Mell & Grance, 2011). The public cloud designates the typical cloud computing model, in which services are provided by cloud vendors like Microsoft, Google, IBM, and Amazon. A cloud is termed a “community cloud” when it is operated by a group of users whose infrastructure and ownership are shared among many organizations within the community, without depending on any of the major cloud providers for the computing infrastructure. The private cloud, on the other hand, is intended to be used by a single entity and guarantees a high level of configurability and confidentiality. It is suitable for any organization that needs an infrastructure for its applications and is comparable to the conventional server farms owned by enterprises. In contrast to the public cloud, the private cloud is less popular, cost-prohibitive, hard to maintain, and usually lacks the pay-as-you-go pricing model. Nevertheless, it offers users full customization of security settings, middleware, network, and hardware. The hybrid cloud is simply a mix of the above-mentioned types. It offers clients a greater degree of control over the virtualized infrastructure. This combination of the capabilities of different deployment types can be achieved via proprietary or standardized technologies (Sotomayor et al., 2009).

The principle of cloud computing is giving users remote access to computing resources. Although it has contributed to the advent of accessible computing, the delay needed to access its applications can be too long for delay-intolerant applications. Moreover, the growth in the number of devices and the data they continually generate calls for bringing resources closer to the source of data. The increasing need for a high-bandwidth, geo-distributed, low-latency, and privacy-aware way to process data is driving a critical necessity for a new computing paradigm positioned close to connected devices. A paradigm that has been suggested by academia and industry to solve this issue is fog computing (Bonomi et al., 2012; OpenFog Consortium, 2017). The next subsection discusses it in detail.

1.2.2 Fog Computing

Unlike the remote cloud, this emerging computing model brings cloud applications and services close to IoT devices by allowing network nodes to compute, store, and manage data. As a result, these operations will no longer be limited to the cloud but rather will be performed all the way from edge devices to the distant cloud, through which data passes. Fog computing, as stated by the OpenFog Consortium (2017), is “a horizontal system-level architecture that distributes computing, storage, control, and networking functions closer to the users along a cloud-to-thing continuum”. The horizontal fog platform enables the distribution of computing functions from one location to another, as well as other functionalities for pooling, management, and security of distributed resources, unlike vertical platforms that only support siloed applications (Zhang, 2017). Apart from providing a horizontal architecture, the flexibility of fog computing allows it to respond to the data-driven needs of users and carriers, making it a strong driver for IoT.

1.2.2.1 Fog vs. Cloud

One of the recurring arguments for distinguishing fog from cloud computing is the support of delay-intolerant applications without sacrificing QoS. Because fog nodes are placed in the vicinity of IoT devices, latency can be significantly lower in comparison to the cloud. Although such an argument can justify the adoption of fog computing, delay-sensitive applications represent only one of the several reasons behind its necessity. Unlike centralized cloud data centers, fog nodes are typically deployed in large numbers and in a geographically distributed manner, which also enables:

  1. Context-awareness: the use of information of any kind to gain insight into the situation of a particular entity (whether it is a thing, a person, or a place). Such contextual knowledge provides outstanding support for the processing and storage of big data, facilitating their interpretation and, thus, the effective implementation of services.

  2. Location-awareness: where the location may determine how certain processes operate.

Another difference is that security in fog computing has to be ensured at the network periphery or at fog node locations. From a hardware perspective, fog computing uses small servers, access points, gateways, switches, or routers that offer reasonable availability of computing resources and consume less energy, while cloud computing employs large data centers that offer greater resource availability but at comparatively higher energy consumption (Jalali et al., 2016). This makes fog computing hardware significantly less cumbersome and more compact than cloud hardware, allowing it to be placed nearer to the user. Because fog computing aims to bridge the gap between IoT and the distant cloud, it is accessible throughout the network, from the core to the edge, unlike cloud computing, which is only accessible from the network core (Yousefpour et al., 2019). Furthermore, a continuous Internet connection is not a prerequisite for operating fog services: services can still run seamlessly even with a limited or totally absent Internet connection, sending important updates to the cloud as soon as connectivity is restored. In cloud computing, by contrast, devices must stay connected while the service is running.
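The latency advantage of proximity can be sketched with a back-of-the-envelope delay model: one-way delay as the sum of per-hop network latency, transmission time, and processing time. The formula and all the numbers below are illustrative assumptions, not measurements from the text:

```python
# Rough additive delay model (hypothetical numbers): a nearby fog node is
# one cheap hop away, while the cloud is many hops away but has a faster CPU.

def end_to_end_delay(hops, per_hop_ms, task_bits, bandwidth_bps, cpu_cycles, cpu_hz):
    network = hops * per_hop_ms / 1000     # propagation/queuing, in seconds
    transfer = task_bits / bandwidth_bps   # transmission time, in seconds
    processing = cpu_cycles / cpu_hz       # execution time, in seconds
    return network + transfer + processing

task = dict(task_bits=2e6, cpu_cycles=1e8)  # a small, latency-sensitive task
fog = end_to_end_delay(hops=1, per_hop_ms=2, bandwidth_bps=100e6, cpu_hz=2e9, **task)
cloud = end_to_end_delay(hops=10, per_hop_ms=10, bandwidth_bps=50e6, cpu_hz=10e9, **task)
print(f"fog:   {fog * 1000:.1f} ms")
print(f"cloud: {cloud * 1000:.1f} ms")
```

With these assumed figures the fog node wins despite its slower CPU, because network delay dominates for small tasks; making `cpu_cycles` very large tips the balance back towards the cloud, which mirrors the fog–cloud trade-off discussed above.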

1.2.2.2 Fog–Cloud Federation

There are clear distinctions and several trade-offs between fog and cloud computing, which can make it difficult, if not impossible, to choose between the two. In fact, fog and cloud are not meant to replace each other but rather to be mutually complementary (Jiang et al., 2019). Together, they can further enhance the services used by IoT devices. This collaboration can also improve data collection, processing, and storage capabilities. For example, in stream processing applications, the aggregation, filtering, and pre-processing of sensor data can take place at the fog level, while complex data analytics or archival results are forwarded to the cloud. This cooperation can be realized via a workload orchestrator, which provides a set of interoperable resources, deploys and schedules resources for application workflows, and controls QoS (Santoro et al., 2017; Sonmez et al., 2019; Yousefpour et al., 2019). Finally, using SDN gives fog providers more control over network configurations that involve many computing nodes carrying data from IoT devices to the cloud.
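As a rough illustration of what such an orchestrator decides, the sketch below routes delay-sensitive tasks to the fog tier while it has spare capacity and forwards everything else (including fog overflow) to the cloud. The class names, MIPS capacities, and the 50 ms threshold are all hypothetical, not part of any cited system:

```python
# Hypothetical fog-cloud placement policy: delay-sensitive tasks stay at the
# fog tier when it has free capacity; delay-tolerant tasks and any overflow
# go to the cloud. All names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline_ms: float  # maximum tolerated latency
    mips: int           # processing demand

class Orchestrator:
    def __init__(self, fog_capacity_mips, delay_threshold_ms=50):
        self.fog_free = fog_capacity_mips
        self.threshold = delay_threshold_ms

    def place(self, task):
        if task.deadline_ms <= self.threshold and self.fog_free >= task.mips:
            self.fog_free -= task.mips   # reserve fog capacity for the task
            return "fog"
        return "cloud"                   # delay-tolerant, or fog is full

orch = Orchestrator(fog_capacity_mips=1000)
tasks = [
    Task("sensor-filter", 20, 400),   # delay-sensitive, fits in the fog
    Task("video-archive", 5000, 800), # delay-tolerant, goes to the cloud
    Task("alarm", 10, 700),           # delay-sensitive, but fog is now loaded
]
placements = [(t.name, orch.place(t)) for t in tasks]
for name, where in placements:
    print(name, "->", where)
```

A real orchestrator would of course also track bandwidth, mobility, and QoS metrics; the point here is only the split of responsibilities between the two tiers.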

1.2.2.3 Fog RAN

To meet the growing demands of fast data applications and the massive access of IoT devices, a set of performance criteria has been defined for the fifth-generation (5G) mobile communication. Specifically, 5G is expected to handle up to one million connections per square kilometer, and the system capacity is expected to increase a thousandfold over the fourth-generation (4G) system to provide a consistent experience for diverse scenarios. Driven by the need to improve network architecture, a new computing model called F-RAN has emerged, which couples radio access networks (RANs) with fog computing. F-RAN uses fog computing resources to cache content at the network edge, allowing it to be quickly retrieved and reducing the front-haul load. F-RAN can be realized using mobile technologies related to 5G (Hung et al., 2015).

1.2.3 Mobile Computing

Advances in fog computing are largely due to the foundations laid by the rise of mobile computing (MC). Also referred to as “nomadic computing”, it covers scenarios in which computing is carried out by mobile and portable devices, including smartphones, tablets, or laptops (Yousefpour et al., 2019). It can enable the creation of ubiquitous context-aware applications like GPS navigation systems.

The principle of adaptation in an environment with limited computing capacity and unreliable network connectivity is the core of mobile computing. Most of these issues (i.e., user mobility, network heterogeneity, and limited bandwidth) were covered in early works prior to the turn of the millennium. Among the approaches used to overcome these difficulties are compression algorithms, transmission hardware and protocols, and robust caching (Forman & Zahorjan, 1994). Yet, the evolution of mobile devices and their requirements has made mobile computing unsuited to many of today's computing challenges. The introduction of cloud and, eventually, fog computing has taken computing beyond the confines of a local network. These computing models have broadened the reach and the scale of mobile computing: while mobile computing relies solely on resource-constrained mobile devices, cloud and fog computing utilize powerful hardware with virtualization capabilities. Fortunately, this gap has narrowed significantly over the past few years with the tremendous progress in mobile hardware and communication. Distributed applications can certainly exploit the distributed architecture of mobile computing. Nevertheless, it presents many obstacles, including resource constraints, the trade-off between interdependence and autonomy (which exists in every distributed architecture), communication latency, and the necessity for adaptation imposed by an evolving environment (Satyanarayanan, 1996). Due to these shortcomings, MC is unsuitable for modern IoT applications that demand high reliability and ultra-low latency.

1.2.4 Mobile Cloud Computing

Cloud maturity has led it to become a vital partner for mobile computing. Combined, they have given rise to mobile cloud computing (MCC). According to NIST, MCC is concerned with next-generation infrastructures that exploit the cooperation between mobile devices, IoT devices, and cloud computing (NIST, 2016). These are infrastructures in which data processing and storage take place beyond mobile devices, allowing the applications of mobile computing to address a far wider spectrum of mobile users instead of being confined to smartphone users (Dinh et al., 2013). These applications can therefore be partitioned so that computation-heavy tasks are forwarded to the cloud, enabling mobile devices to run CPU-intensive applications while saving energy and opening the doors to new kinds of CPU- and data-intensive applications such as e-health, augmented reality, and crowdsourcing (Ren et al., 2015; Sanaei et al., 2014).

1.2.6 Edge Computing

Like fog, edge computing moves storage and processing to the network edge, near connected devices. It should be noted that the edge is by no means situated on the IoT devices themselves; instead, it is one or sometimes a few hops away from them. The Open Edge Computing Initiative (2019) defined edge computing as a computing model that provides storage and computing resources close to users via small-scale data centers and through open, standardized mechanisms. While it may seem similar to fog computing in the sense that both push processing and storage to the network’s periphery, these computing models are quite different. The OpenFog Consortium stated that edge computing is sometimes mislabeled as “fog computing”. According to this consortium, fog computing is hierarchical and aims to achieve a seamless continuum of services by providing computing, storage, networking, and control along the data path from the things to the cloud, whereas edge computing views the network edge as an isolated system (see Figure ‎1.3).


Figure ‎1.3 The deployment location of edge computing and the related models

Edge computing has become vital in today’s IoT landscape, as it enables intelligent aggregation, filtering, and preprocessing of data through cloud services that are deployed near IoT devices (Reale, 2017). Latency, privacy, and connectivity are among the concerns that edge computing is best positioned to address. Owing to its location near the source of data, it guarantees extremely low latency compared to mobile cloud computing, provided its resources are sufficient; otherwise, latency can become even higher. On the other hand, despite being small, edge data centers each serve only a small portion of devices. This means that end devices neither wait for a service to be provided by a highly centralized platform nor are limited in resources as in traditional mobile computing. For these reasons, service availability in edge computing is expected to be higher than in the aforementioned paradigms. Added to this is the fact that the capabilities of edge computing can be enhanced through cooperation between edge data centers and the distant cloud (Garcia Lopez et al., 2015).

1.2.7 Multi-Access Edge Computing

Like MCC, which arose from the merging of cloud and mobile computing, combining mobile and edge computing has given rise to multi-access edge computing (MEC). MEC is defined by ETSI as a platform that brings computational capabilities and cloud services into the 4G or 5G RAN, near mobile subscribers (Giust et al., 2018). MEC was formerly known as “mobile edge computing” before the model expanded to encompass a wider spectrum of applications that exceed tasks specific to mobile devices, among them augmented reality, health monitoring, connected vehicles, and video analytics (Dong et al., 2020; Sonmez et al., 2019; Taleb et al., 2017). Through MEC, operators can embed edge computing functionalities into existing base stations using small data centers with virtualization capacity. This supporting hardware makes MEC resources moderate compared to those of cloud computing and mobile cloud computing. However, deploying small data centers near mobile subscribers has many strengths, such as the support for latency-critical applications. The distribution of its computing nodes (i.e., its small-scale data centers) also enables context- and location-aware applications to take advantage of real-time information and provide a personalized, adapted user experience. As with edge computing, MEC services can operate with a limited or completely absent Internet connection. Multi-access edge computing is also expected to profit significantly from the emerging 5G technology: 5G is regarded as a driver of MEC given its substantial reduction in latency, its increase in bandwidth, and its support for a wider range of mobile devices (Hu et al., 2015). MEC, therefore, enables efficient access to edge computing by a wide variety of mobile devices (Taleb et al., 2017).

In addition to 5G, MEC has embraced software-defined networking (SDN) and network function virtualization (NFV) functionalities. SDN makes it easy to manage virtual networking devices by means of software APIs (Kadiyala & Cobb, 2017). NFV, on the other hand, reduces the time to deploy networking services by means of a virtualized infrastructure. As a result of incorporating them, developers and network engineers can deploy customized orchestrators to coordinate resource provisioning (Mirkhanzadeh et al., 2018).

1.2.8 Cloudlet Computing

Cloudlet computing (CC) is essentially another branch of mobile computing. It shares multiple characteristics with MCC and MEC and actually solves several drawbacks of MCC. Cloudlets (or miniature clouds, as the term implies) are trusted, highly capable computers or clusters of computers that feature strong Internet connectivity (Satyanarayanan et al., 2009). In some literature, they are occasionally termed small or micro data centers (MDCs) (Bahl, 2015; Qiu et al., 2017). The MDC concept was first introduced in 2015 by Microsoft Research and has been referred to as an extension of conventional cloud computing data centers (Bahl, 2015).

Cloudlets form the middle layer of a three-layer architecture (i.e., they are located between mobile devices and the cloud). The intent is to enable mobile devices to offload computation to VM-based cloudlets that are typically one hop away (Hao et al., 2017), which also allows cloudlets to provide local services for mobile clients (Y. Li & Wang, 2014). Cloudlet carriers might be cloud service providers seeking to make their services instantly accessible in proximity to mobile devices. Networking and telecommunication companies, as well as mobile carriers (e.g., T-Mobile, Ericsson, etc.), can roll out cloudlets with virtualization capability in much smaller hardware footprints than the massive cloud data centers. Although this small footprint may seem like a drawback, cloudlets are meant to serve only the devices located in their area, which lowers latency and power consumption. Besides, computations can always be offloaded to the cloud if needed (for delay-tolerant applications or when an overload occurs). While both support mobility, fog computing remains, compared to cloudlets, a more generic option that handles heavy traffic and whose resources reside all along the thing-to-cloud continuum.

1.2.9 Mist Computing

Pushing computing further down towards the outer network edge (i.e., towards connected devices themselves) has introduced another computing paradigm, mist computing, which focuses on future autonomous systems (Davies, 2014; Preden et al., 2015). Mist computing takes place at the bottom layer of the IoT architecture and aims to leverage IoT devices such as wearables, mobile devices, and smart TVs for computation, storage, and networking. It can be seen as a generalization of mobile ad hoc cloud computing (MACC), because those devices are not necessarily mobile and the networking is not confined to ad hoc connections. Silva et al. (2017) examined how utilizing mobile devices for computing and caching decreases the burden on network infrastructures for video streaming applications. In their use case scenario, sporting event viewers are divided into multiple Wi-Fi Direct groups and share video sequences when feasible, allowing them to rely less on the host server as well as on the access points. This work provides a good illustration of mist computing, where edge devices behave like clients while also serving as tiny servers. Additional use cases of mist computing include, but are not limited to, the efficient deployment of virtualized instances on single-board computers (Morabito, 2017) and preserving user privacy by processing data locally (Salem & Nadeem, 2016).

1.2.10 Discussion

The above presentation of edge computing and its associated computing models highlights the value of understanding their key characteristics in the face of a rapidly evolving computing landscape. These are the characteristics that need to be considered in order to model such environments realistically (refer to Section ‎2.3.4). As their strengths and weaknesses show, some computing models are better suited than others to a given use case. Fog computing, however, fits a wide range of applications. Although it is potentially unsuitable for a handful of use cases, particularly disaster areas or dispersed network topologies in which computing at the extreme edge may be more appropriate (as in mist computing or MACC), the versatility of fog computing puts it in a prime position for numerous low-latency applications and data-driven computing. As a result of its extensive presence throughout the thing-to-cloud continuum, fog computing may be seen as a particularly general kind of computing in contrast to similar paradigms. It is no wonder that IEEE standards have embraced the OpenFog Reference Architecture (IEEE Standards Association, 2018). A summary of these characteristics is presented in Table ‎1.1.

Table ‎1.1 The main features of the above-mentioned computing models

| Attribute | Mist | MACC | MC | MEC | EC | Cloudlets | FC | MCC | CC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Virtualization support | | | | | | | | | |
| Extremely low latency | | | | | | | | | |
| Standardized | | | | | | | | | |
| Mobility support | | | | | | | | | |
| Location awareness | | | | | | | | | |
| Geo-distribution | | | | | | | | | |
| Heterogeneity support | | | | | | | | | |

It is noteworthy that the terms “fog”, “cloudlets”, “mist”, and “edge” have been used interchangeably in various papers, since all of them have the term “edge” in common. The telecommunication industry typically uses the term “edge” to refer to RANs, base stations, and Internet Service Providers (ISPs). In IoT, however, it is currently employed to refer to the local network in which IoT devices and sensors are situated (Reale, 2017). Simply put, the so-called “edge” represents the next hop up from IoT devices, such as IoT gateways and access points, but not these devices themselves. When computation is performed on the IoT devices themselves, on the other hand, it is called mist computing. The classification of all these computing models and their overlap is shown in Figure ‎1.4.


Figure ‎1.4 The classification of the discussed computing paradigms. Adapted from (Yousefpour et al., 2019).

Nonetheless, for the purposes of this dissertation, edge computing is employed as a concept that encompasses all the aforementioned computing models, as all of them seek to push computing more or less towards the edge of the network (refer to Figure ‎1.3). PureEdgeSim (or Pure Edge Simulator, as the name suggests) is quite versatile in modeling all sorts of distributed topologies and computing paradigms, even pure peer-to-peer ones.

1.3 Modeling and Computer Simulation

Despite the fast growth of the field of edge computing, there are hardly any commercial rollouts available for researchers to conduct experiments on (Varshney & Simmhan, 2017; Yousefpour et al., 2019). Furthermore, the setup, management, and maintenance of huge numbers of heterogeneous devices carry high operational costs. To aid edge computing research, various emulators (Coutinho et al., 2018; Hasenburg et al., 2021; Mayer et al., 2017) and models (Ahvar et al., 2019; Flores et al., 2018; Gupta et al., 2017; Hirsch et al., 2020; Jalali et al., 2016; Sarkar & Misra, 2016; Sonmez et al., 2018; Zeng et al., 2017) have been introduced in the last few years.

This section is intended to provide introductory knowledge on simulation and modeling. Section ‎1.3.1 introduces the different notions of simulation. Section ‎1.3.2 provides an overview of simulation history. Section ‎1.3.3 discusses the benefits of simulation over emulation as well as its drawbacks. The process of building a simulation model is described in Section ‎1.3.4. The concepts of validation and verification are addressed in Section ‎1.3.5. Section ‎1.3.6 presents the distinctions between analytical and numerical approaches before concluding with a discussion in Section ‎1.3.7.

1.3.1 Definitions

Simulation is the process of mimicking real-world systems. In other words, it serves to create a model on which various actions are performed to visualize the way a real-world system would behave under varying situations (Imagine That Inc., 2021). Consequently, it is possible to infer that the primary motivation for simulation is to obtain an understanding of systems or processes and to improve them. It is due to this aspect that simulation represents such an important and extensively adopted technique (mostly through computer systems) and is encountered in a variety of sectors ranging from aviation to manufacturing.

Accordingly, a system might be a natural or artificial, abstract or concrete entity that fits into a particular reality bounded by an environment (Wainer, 2017). A system may also be described as a group of components that work together to achieve a shared objective.

A model, on the other hand, is a depiction of a particular system that allows for a better understanding of it. It might be physical, like a vehicle, or conceptual, constructed solely in one’s mind. Models can also behave in different ways: a building construction plan and the simulation model of a computation offloading process, for instance, differ in this respect. The first is a static model that never changes over time, whereas dynamic models, like the second, change in response to circumstances and other variables.

In a discrete-event simulation, an event represents a change in the model’s state. A discrete-event model, as will be discussed later in Section ‎1.3.6.2, changes each time an event occurs and maintains its state until the following event. The state of the model may be described by the values of its characteristics at a given point in time. In an edge computing system, the state can be characterized by resource utilization, the number of devices, and other factors.
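The event/state mechanics described above can be sketched in a few lines: a priority queue holds future events keyed by time, and the model’s state (here, the number of busy edge nodes) changes only when an event fires. This is an illustrative toy loop, not PureEdgeSim’s actual engine:

```python
# Minimal discrete-event loop (illustrative): events live in a priority queue
# keyed by simulated time; the state only changes when an event is processed
# and then holds steady until the next event.

import heapq

events = []                 # priority queue of (time, order, action)
counter = 0                 # tie-breaker so same-time events stay ordered
state = {"busy_nodes": 0}   # the model's state between events

def schedule(time, action):
    global counter
    heapq.heappush(events, (time, counter, action))
    counter += 1

def task_arrival(t):
    state["busy_nodes"] += 1
    schedule(t + 3.0, task_finish)  # this task takes 3 simulated time units

def task_finish(t):
    state["busy_nodes"] -= 1

schedule(0.0, task_arrival)
schedule(1.0, task_arrival)

trace = []
while events:                       # the clock jumps from event to event
    t, _, action = heapq.heappop(events)
    action(t)
    trace.append((t, state["busy_nodes"]))
    print(f"t={t:.1f}  busy_nodes={state['busy_nodes']}")
```

Note that simulated time advances in jumps between events rather than in fixed steps, which is what makes discrete-event simulation efficient for long scenarios.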

1.3.2 Simulation History

The use of simulation methods has a long history, dating all the way back to the late 18th century. The development of simulation methods can be divided into several parts based on different perspectives, such as the use of computer software and languages, the kinds of models, or the application scenarios. Nevertheless, it is widely recognized that the Monte-Carlo approach started with Buffon’s "experiment of needles" in 1777. This experiment consisted of tossing needles on a plane ruled with equidistant parallel lines to estimate the value of π. Laplace later elaborated on it through numerous adjustments. Of course, the greatest leap forward in simulation came after World War II. For example, SimScript was created in 1963 on top of FORTRAN. It featured a basic interface and used forms to define the model, initialize it, and generate the report. Another language, SIMULA, created in the 1960s by Kristen Nygaard and Ole-Johan Dahl, is regarded as one of the remarkable languages in programming history (Goldsman et al., 2010).
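Buffon’s experiment translates directly into a small Monte-Carlo program: a needle of length l dropped on lines spaced d apart (with l ≤ d) crosses a line with probability 2l/(πd), so π can be estimated as 2lN/(d·hits). The sketch below is illustrative; the function name and default parameters are our own:

```python
# Monte-Carlo sketch of Buffon's needle: drop n needles of length `length`
# on a plane ruled with parallel lines spaced `spacing` apart (length <= spacing).
# Crossing probability is 2*length / (pi*spacing), so pi ~ 2*length*n / (spacing*hits).

import math
import random

def buffon_pi(n, length=1.0, spacing=2.0, seed=42):
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    hits = 0
    for _ in range(n):
        center = rng.uniform(0, spacing / 2)  # distance from needle center to nearest line
        angle = rng.uniform(0, math.pi / 2)   # needle orientation
        if center <= (length / 2) * math.sin(angle):
            hits += 1                          # the needle crosses a line
    return 2 * length * n / (spacing * hits)

print(buffon_pi(100_000))  # approaches 3.1415... as n grows
```

The slow (1/√n) convergence of this estimate is typical of Monte-Carlo methods, which is exactly why they only became practical with post-war computers.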

In fact, simulation has advanced even further, with the help of other technologies such as computer-aided design (CAD). Nowadays, simulation is applied in many fields, from manufacturing to aviation, with multiple simulators available on the market offering a variety of features. Some good examples of simulation software are Modelica, Matlab, Arena, Dymola, and Extendsim.

1.3.3 Simulation and Emulation

After providing a concise description of simulation and its motivation, this section examines the advantages and disadvantages of simulation in comparison to emulation. Emulators allow real-world applications to run in a virtual edge computing system. They imitate certain properties of edge computing, for instance by interpolating latencies or failure rates between virtualized computing nodes. This technique comes close to reality and yields empirical findings, but it has a significant disadvantage: running applications in emulated systems is computationally more costly than running them natively. The cloud’s scalability may help overcome this barrier to some degree; large-scale experiments involving large numbers of devices and applications, however, quickly become prohibitively expensive. Creating and using models is another way to explore edge computing systems. A model abstracts certain fundamental features of edge environments, allowing experiments to be efficient, scalable, and, above all, simpler to evaluate. The development and implementation of models may lead to a more thorough knowledge of the situation by determining the "driving" factors (Maria, 1997). However, modeling always represents a compromise between realism and simplicity, and the definition of a valid model is inextricably linked to the use case.

Simulation is the process of putting models into action. It can be effective for a number of reasons. To begin, processes are subject to change, foreseeable or not, and variation can always occur somewhere within them. Take the example of an edge computing environment: as soon as the number of devices or the distance between them increases or decreases, the execution of tasks can be affected. Furthermore, a lack of bandwidth, as an unforeseeable change, might have unintended consequences as well. Another reason to choose simulation over other techniques is that it does not require real-time execution: in a matter of minutes, an abstract simulation can recreate many hours of an edge computing system's behavior. In addition to these benefits, the following points indicate why simulation should be used (Robinson, 2004).

  1. Cost savings: For example, simulation helps avoid stopping processes to try out new ideas, or redesigning everything when an implementation lowers system performance. Such issues have the potential to significantly raise expenses and reduce client satisfaction.

  2. Time savings: The use of simulation shortens the time it takes to implement operational changes. For instance, in experiments involving real systems, even within a relatively limited area, it takes much longer to change the network topology or to get feedback on an enhancement.

  3. Control: Simulation makes it possible to control the process under different conditions, and hence to prepare for unexpected changes.

  4. System design: With simulation, it becomes possible to create a model with less time and cost and easily experiment with a series of scenarios under different conditions before rolling out a real system.

To conclude, such reasons compel the users to rely on simulation in their operations. It allows them to be prepared for the consequences of unforeseen changes, to assess the performance of the process under diverse situations, and to compare various scenarios. Nevertheless, modeling has a number of drawbacks as well: inaccurate or skewed assumptions might result in an erroneous outcome. This is especially true in the case of edge computing, where there are no comparable real-world infrastructure deployments (Kecskemeti et al., 2017; Svorobej et al., 2019).

1.3.4 The Creation of a Simulation Model

In accordance with Banks et al. (2000), a plan can be established for the purpose of creating a simulation model. The latter is composed of nine steps, which are summarized in Figure 1.5.

  1. Problem formulation: Defining the problem in-depth and formulating it in such a manner that readers can also understand it is crucial for starting a simulation.

  2. The definition of goals and plan development: Before starting the simulation, the goals must be established properly. Also, the strategy that guides the project should be prepared.

  3. Model creation: To prevent wasting time and resources, it is advisable to start with a basic plan and then enhance it gradually.

  4. Data collection: As long as the objective is to obtain correct output in the end, it is important to gather detailed data and properly utilize them as input. It is worthwhile to gather as much data as possible both before and throughout the model's development. For example, data for an edge computing process may include failure rate, some probability calculations, etc.

  5. Verification and validation: Perhaps the main drawback of simulation is its accuracy (Robinson, 2004). Although it offers numerous advantages, there is no assurance that all simulation findings are correct. This is where validation comes into play. Simply put, validation only asks whether this is the right model. It examines whether this model that has been created is consistent with the actual process. Conversely, verification asks whether the model was constructed the right way. This process will be discussed in-depth in Section ‎1.3.5.

  6. Experimental design: It is equally critical to experiment with a variety of scenarios and monitor the duration and number of runs; the greater the number of simulations and their duration, the greater the accuracy.

  7. Production runs and analysis: Throughout the simulation runs, several analyses must be performed to evaluate the performance of the model. In this stage, various statistical methods and theories are used.

  8. Simulation runs: As with the sixth step, it is helpful to determine whether more simulation runs are needed in order to obtain more accurate results. However, it is also important to note that more simulations take longer.

  9. Documentation and presentation: Not everyone is a specialist in simulation. In fact, many engineers may be unfamiliar with simulation models. This is why it is necessary to provide enough documentation to facilitate understanding by users.


Figure ‎1.5 The summary of model creation. Adapted from (Banks et al., 2000)

1.3.5 Validation and Verification

Validation and verification are major steps in the creation of a model. While the issues addressed by these two techniques seem to be quite similar, they are fundamentally distinct. In short, validation verifies whether the model in question accurately depicts the physical world (Wainer, 2017). A valid model is indistinguishable from the actual system in the experimental setting. Verification, however, checks if the model was created properly and a verified model behaves in line with the specified requirements (Davis, 1992). As shown below in Figure ‎1.6, there are many kinds of validation (Robinson, 2004):


Figure ‎1.6 The concepts of validation and verification in simulation projects. Adapted from (Landry et al., 1983)

  1. Conceptual model validation verifies that the model has all the required features of the actual system and checks whether these features are correct enough to achieve the goal (refer to the simulation requirements in Section ‎2.3.4).

  2. Data validation, on the other hand, tests if data is accurate and corresponds to the real process (see how energy models are parameterized in Section ‎3.3.4.2).

  3. Verification and white-box validation: The concept of verification may be thought of as a subset of validation. Although verification and white-box validation techniques are fundamentally distinct, they are performed continuously throughout the simulation. Yet, as already noted, verification is concerned with the model's structure rather than with matching it to the actual system. As such, it may be best to assess the model in small parts when verifying it. For instance, reducing the number of edge devices to an ‘irrelevant’ level can help in observing the change in the output value and testing the model's correctness (see Section 5.3.5). A practical way of verifying the model may also consist of discarding certain aspects of the process or comparing the results to initial expectations. In contrast, white-box validation examines the model extensively to determine whether it is accurate enough to meet the goal. In addition, verification requires just the modeler, whereas white-box validation also needs someone with knowledge of the actual process.

  4. Black-box validation: As the name suggests, this method is the inverse of white-box validation in that it tests the model from a macro perspective to ensure that it generally fulfills the system needs. There are two ways to achieve this: the model can be tested against the actual system, if applicable, or otherwise against another model. The first method, comparison with the actual process, relies on data obtained from that process, such as service time or energy usage, together with the data obtained from the simulation. Nevertheless, the reliability of such data cannot be guaranteed, and it may differ over another time span, so it is preferable to consider a longer period of time. The second method is employed when actual data are not available; in this case, a mathematical model can be used.

  5. Experimentation validation: This entails checking if the experiment is sufficiently correct, which involves the duration of execution, the number of repetitions, and sensitivity analysis.

  6. Solution validation: This one resembles the black-box validation, except that it just verifies the end outcome and attempts to determine whether it is consistent with the goals (see the simulation results in Section ‎5.2.2).

After discussing validation and verification, it is important as well to consider the concepts’ flaws. To begin, the model cannot be validated from every possible angle. In other words, being valid for a given purpose does not necessarily imply that the model is valid for others as well. A model may be validated in terms of testing; however, that will not guarantee the validity of its output. Another potential issue is the reliability of actual system data. This is a significant concern since, during validation, accurate data are needed to compare them with the model output. For instance, the number of passengers at an airport changes from one season to another. Consequently, it is important to understand that these data are only a sample and may differ from the data at hand. It is advantageous to repeat validation several times to obtain the most accurate outcome, and this brings up another issue, the lack of time, as validation and verification are time-consuming. The last issue to consider is validation itself; validation never provides us with absolute certainty. It is just confidence that we give to the model. However, as pointed out earlier, the more validation, the more accurate the model.

In summary, although it is not possible to obtain a model that is completely correct, these techniques are employed with the goal of producing the optimal one. As a result, it can easily be said that the comparison of findings with real tests, as well as their continuous development, lead to the most realistic model in the end, although it is not fully identical to the actual process.

1.3.6 Modeling Approaches

A simulation model is essentially a digital representation of the actual system (Imagine That Inc., 2021). It must be simplified to provide users with efficiency, reliability, and usability while concentrating on the crucial aspects. There are two main types of models: analytical and numerical. As previously stated, while starting the simulation, it is critical to employ the appropriate one based on the simulated system. This section goes through the prerequisites of each one of them.

Analytical Models

Analytical models describe the underlying environment in a strictly mathematical manner. All infrastructure and application characteristics are represented by a set of equations. For instance, a network connection's throughput q may be expressed as (Wiesner et al., 2021):

q = t · b · (1 − p),

where t denotes the time in seconds, b represents the bandwidth in bits per second, and p stands for the packet loss probability. Analytical models are highly abstract, allowing for a high-level comprehension of the issue. Parameters and their effect on the outcome can be effortlessly examined, individually or as a group. The drawback of analytical models is their lack of a sense of time or state inside the system, whereas numerical models excel in this regard.
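As a quick illustration, the throughput equation translates directly into a one-line function (Python is used for illustration; the names are our own):

```python
def analytical_throughput(t, b, p):
    """Total bits transferred in t seconds over a link of bandwidth b
    (bits per second) with packet loss probability p: q = t * b * (1 - p)."""
    return t * b * (1 - p)

# e.g. 10 s on a 1 Mbit/s link with 2% packet loss -> about 9.8 Mbit
q = analytical_throughput(10, 1_000_000, 0.02)
```

Evaluating the formula is instantaneous and deterministic, which is precisely the appeal of analytical models.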

Numerical Models

Numerical models bypass the above-mentioned time constraint by simulating the model over time using a time-stepping method. This method allows for the simulation of increasingly complicated scenarios, particularly those involving scheduling, mobility models, and heterogeneous infrastructure and applications.

Using the same example as above, we can develop a numerical alternative for modeling the throughput q of a network connection (Wiesner et al., 2021). A network simulator might, for instance, deliver packets of size s following a random distribution D, with a loss probability of p. The throughput is determined by executing this simulation for a time t and then counting the total number of transmitted bits.

It should be emphasized that, in the analytical model, p was simply a real number between 0 and 1, whereas, in the numerical one, it is an actual probability with which each packet is delivered or lost. As a result, the numerical model includes randomness. Similarly, D is a random distribution, which in this scope is typically a Poisson distribution. To minimize the impact of randomness, sensitive simulations should be performed many times and their outputs aggregated, as opposed to analytical models that, although more abstract, always produce exact results. This also demonstrates another drawback of numerical models: their outcome is more difficult to evaluate. Furthermore, at least within the context of network simulations, numerical model execution can be time-consuming, while analytical models are typically quite quick.
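A minimal numerical counterpart might look as follows. This is only a sketch: the packet size, the back-to-back sending pattern, and all names are illustrative assumptions rather than part of the cited model.

```python
import random

def simulated_throughput(t, bandwidth, p_loss, packet_size=12_000, seed=None):
    """Send packets back-to-back for t seconds and count the bits that
    actually arrive. Unlike the analytical model, each packet is lost
    with probability p_loss in an individual Bernoulli trial."""
    rng = random.Random(seed)
    time_per_packet = packet_size / bandwidth  # seconds per packet
    clock, delivered_bits = 0.0, 0
    while clock + time_per_packet <= t:
        clock += time_per_packet
        if rng.random() >= p_loss:  # this particular packet survives
            delivered_bits += packet_size
    return delivered_bits

# Aggregating several randomized runs approaches the analytical t*b*(1-p).
runs = [simulated_throughput(10, 1_000_000, 0.02, seed=i) for i in range(20)]
mean_throughput = sum(runs) / len(runs)
```

Each individual run deviates from the analytical value; only the average over many runs converges to it, which is exactly the randomness trade-off described above.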

Numerical models can be classified into three types: continuous, discrete event, and discrete rate. It is essential to choose the right one that suits the simulated system. This decision may be made using the real process. These types are explored in more detail below:

Continuous Simulation

As the term implies, continuous simulation is suitable for systems whose state changes continuously over time. Time intervals are predefined and identical. This sort of modeling employs an approximation approach that entails taking small intervals to detect changes in processes. As an example, consider the volume of a liquid in a tank: we observe the changes in state at short time intervals; the shorter these intervals, the more precise the result, since the approximation is more exact. It does, however, make the simulation run slower and take longer.
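The tank example can be sketched as a fixed-step (Euler) integration. The flow model and all names below are illustrative assumptions:

```python
def simulate_tank(inflow, outflow_rate, v0, t_end, dt):
    """Continuous simulation of a tank volume with fixed time steps:
    dV/dt = inflow - outflow_rate * V, approximated by Euler's method.
    A smaller dt gives a more precise result, but requires more steps."""
    v, t = v0, 0.0
    while t < t_end:
        v += (inflow - outflow_rate * v) * dt  # state change over one interval
        t += dt
    return v

# Approaches the steady-state volume inflow / outflow_rate = 4.0
final_volume = simulate_tank(inflow=2.0, outflow_rate=0.5, v0=0.0,
                             t_end=20.0, dt=0.1)
```

Halving `dt` brings the result closer to the exact solution at the cost of twice as many steps, mirroring the precision/run-time trade-off mentioned above.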

Discrete Event Simulation

Discrete event simulation (DES) utilizes a queue of events that occur at certain times. These events are handled chronologically, and every processed event may add new ones to the queue. The system is assumed not to change state between successive events; hence, the simulation can jump straight from one event to the next. This enables it to run faster than real time. When there are no more events in the queue, or a preset time limit is reached, the simulation terminates. While a DES scales approximately linearly with the number of events, it is notoriously difficult to parallelize owing to its sequential nature (Svorobej et al., 2019).
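A DES loop reduces to a priority queue of timestamped events. The sketch below (with a hypothetical "offload"/"result" scenario of our own) shows how the clock jumps from one event to the next and how handling an event may schedule new ones:

```python
import heapq

def run_des(initial_events, handler):
    """Minimal discrete event simulation: pop events in chronological
    order; the handler may return new events to schedule."""
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:  # terminates when no events remain
        time, name = heapq.heappop(queue)
        log.append((time, name))
        for event in handler(time, name):
            heapq.heappush(queue, event)
    return log

# Hypothetical edge scenario: a task is offloaded at t = 1.0 and its
# result comes back 0.5 time units later.
def handler(time, name):
    return [(time + 0.5, "result")] if name == "offload" else []

log = run_des([(1.0, "offload")], handler)
# log == [(1.0, 'offload'), (1.5, 'result')]
```

Note that the clock never advances in fixed increments; it skips directly between events, which is why a DES can run much faster than real time.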

Discrete Rate Simulation

Because it incorporates the characteristics of the preceding two simulation approaches, discrete rate simulation is considered a hybrid kind of simulation. It represents flows in the same way a continuous model does, but it only updates values in response to an event, like a DES. Continuous and discrete rate models both adhere to the First In First Out (FIFO) principle (Krahl, 2009). Time intervals, on the other hand, are event-dependent. This time-based distinction is depicted in Figure 1.7: discrete event and discrete rate simulation both exhibit variable time intervals, but their distinction derives from the types of processes on which they focus.


Figure ‎1.7 Types of modeling

1.3.7 Discussion

Through the previous sections, we have explored simulation concepts and approaches. As previously mentioned, it is essential to choose the proper modeling type based on the real system before beginning the simulation. While analytical models provide a high degree of abstraction, making them well-suited for large-scale experiments, they are unable to model edge computing environments for a variety of reasons. First, it is difficult to model an application scheduler in a meaningful manner using analytical models alone, since a scheduler reacts to a specific state and to state changes in the infrastructure. Additionally, it is far from easy to model heterogeneous infrastructures. This is why the existing edge computing analytical models presented later in Section 2.3.3 are oversimplified and make many assumptions, such as an even distribution of devices and virtual machines, or ignoring mobility altogether. This is where numerical models, particularly DES, shine.

Although analytical models also simulate the modeled environment during their execution, it is numerical models that are usually referred to as "simulators". This dissertation adheres to the following convention: the term "simulator" refers only to numerical models. Indeed, the simulators discussed in Sections 2.3.1 and 2.3.2 are built on a DES, and there are reasons for that. The major distinction between DES and continuous simulation is the time dimension. In a DES, time varies from event to event, and the intervals may be uneven. The state of the system changes only when an event occurs and is by no means directly influenced by time. An edge computing system is a suitable example of the discrete event approach: the system state only changes when computation is offloaded, which can happen at any given point in time; here, the offloading of workload represents the events. In such a scenario, DES may also be used for optimization: it can help lower cost and time, satisfy QoS requirements, solve task scheduling problems, improve resource utilization, and so on.

Aside from the temporal difference, DES models differ from continuous models in a few other respects. DES is more concerned with entities than with values that flow through a model. Furthermore, the flow is not required to be uniform, and every entity in the process has its own properties. As a result, DES entities may be sorted using a variety of principles like FIFO, Priority, Time-delayed, or any other sorting methods. Additionally, it allows monitoring every entity to determine service time, resource utilization, among other things.

In summary, DES may be used for designing and analyzing new processes, enhancing the performance of current ones, and proposing optimization policies or scheduling and planning algorithms (Wang et al., 1995). However, in contrast to analytical models, which always produce exact (although more abstract) results, sensitive DES simulations should be performed many times and their outputs aggregated to minimize the effect of randomness. This highlights a drawback of numerical models: their output is not as easily examined. Additionally, numerical models may be very time-consuming, while the execution of analytical ones is typically quick. At this stage, it seems that a hybrid approach combining both analytical and numerical modeling is the best option for modeling large-scale edge computing environments realistically. Nevertheless, as stated previously, modeling itself has a number of drawbacks as well: inaccurate or skewed assumptions might result in an erroneous outcome. This is particularly true in edge computing, as there are nearly no comparable real-world infrastructure deployments (Kecskemeti et al., 2017; Svorobej et al., 2019). Hence, we are building a simulator for an upcoming system for which no real data is available for comparison, which complicates the validation process. One way around this is to rely on an existing network simulator, but as we shall see later, such simulators model the network in such depth that performance is sacrificed. Because edge computing systems can be thought of as a kind of cloud close to the ground, the alternative, which we opted for, is to rely on a renowned and reliable cloud computing simulator. This results in a robust codebase for modeling computational tasks, which can subsequently be extended to model the heterogeneous and distributed resources of edge computing.

1.5 Conclusion

Throughout this chapter, we have presented a quick introduction to edge computing and simulation and outlined the relevant concepts. We started with an overview of this computing paradigm, the rationale behind it, and its similarities and differences with respect to the many related computing models. Then we discussed the fundamental notions and definitions of simulation and modeling in general. We also emphasized the key elements of our approach, such as the adoption of a hybrid model and the extension of a renowned and reliable cloud computing simulator. The following chapter will present the existing work and place the contributions of this dissertation in the proper context.