Communications of the ACM

Contributed Articles

ACE: Toward Application-Centric, Edge-Cloud, Collaborative Intelligence

In recent years, machine learning (ML), especially deep learning (DL), has been applied to various domains—for example, computer vision, speech recognition, and video analytics. Emerging intelligent applications (IAs), such as image classification based on deep convolutional neural networks (CNNs);21 traffic-flow prediction based on deep recurrent neural networks (RNNs);42 and game development based on deep generative adversarial networks (GANs)20 are demonstrating superior performance in terms of accuracy and latency. Such performance, however, requires tremendous computation and network resources to deal with the increasing size of ML/DL models and the proliferation of vast amounts of training data.27


Cloud computing is indisputably attractive to IA developers as the predominant high-performance computing (HPC) paradigm.5 Cloud providers typically offer services such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) to facilitate application implementation: resources such as high-performance computation, massive elastic storage, and reliable network services are allocated according to user requirements. Intuitively, mainstream IAs are deployed on the cloud to leverage centralized resources for computationally intensive artificial intelligence (AI) tasks, such as data processing, ML/DL model training, and inference. For instance, the distributed training of AlphaGo37 is a typical example of cloud intelligence (CI).

However, novel challenges to CI emerge as modern IAs rapidly proliferate and move into production, the most critical being high end-to-end service latency, high network-bandwidth overhead, and severe privacy-leakage threats.45 Instead of concentrating on the cloud, a growing body of work attempts to exploit heterogeneous resources distributed at the network edge to address such issues. Some IAs offload DL tasks to edge servers (for example, Nvidia's Jetson TX2 board)44 for privacy preservation and timely responses. Such edge offloading of relatively simple AI tasks, or edge intelligence (EI),13,32 alleviates the tension between the broadened requirements of modern IAs and the conventional CI paradigm.

The rapid development of EI and corresponding prototypes demonstrates that, due to edge devices' heterogeneous resource constraints, the cloud is still critical to modern production-level IAs with multi-faceted performance requirements.45 Increasingly, IA developers have begun focusing on efficiently leveraging edge resources under cloud coordination to collaboratively conduct AI tasks with optimized performance,1,38 an approach known as edge-cloud collaborative intelligence (ECCI). ECCI relies on pivotal interdisciplinary technologies of cloud and edge computing (supporting ECCI infrastructure and runtime) and ML/DL-based AI (introducing rich IA workloads).

Existing ECCI applications (for example, HOLMES16 for healthcare, EdgeRec10 for e-commerce, SurveilEdge41 for urban surveillance, and general solutions such as CLIO17 and SPINN22) are individually developed and deployed either by academic researchers or industrial communities, where both the application design and system implementation are highly developer dependent and scenario specific. For example, SurveilEdge41 is a typical ECCI application for real-time, intelligent, urban surveillance video query. In its prototypical implementation, developers depend on relatively higher edge computation capabilities (that is, x86 PCs) to support system scaling without carefully designing an ECC infrastructure management scheme. For ease of implementation, they hard-code the load-balancing policy with the video query workload for latency reduction. Additionally, to achieve intelligent video query, the entire solution is specifically designed to support CNN training and inference workloads, where dedicated service links (for example, message service links) among all application components are individually configured to achieve edge-cloud collaborations. Although it does not harm application performance, such developer-dependent design and implementation impedes others from migrating the application to general ECC infrastructures (for example, resource-constrained industrial IoTs) or pursuing customizable performance optimizations (for instance, joint optimization of latency and bandwidth consumption). Moreover, if others want to adopt SurveilEdge (or other existing applications) as the backbone for other applications driven by different DL models and deployed at different infrastructures, corresponding DL runtimes and different ECC services must be thoroughly designed and implemented by the adopters themselves. Such a non-generic approach severely hinders the proliferation of production-level ECCI applications.

Therefore, for the cost-efficient implementation of high-performance, production-level ECCI applications, it is necessary to construct a unified platform to handle both ever-increasing edge and cloud resources and emerging IA workloads with increasing scale and complexity. Particularly, to construct such a platform, the following four challenges must be explicitly addressed:

Support for unified management of hierarchical and heterogeneous infrastructures. The efficient implementation of ECCI applications requires unified management of not only infrastructures offered by traditional centralized cloud providers but also heterogeneous computation, storage, and network resources geographically dispersed at the edge. Developing and deploying ECCI-application components on edge devices is extremely inefficient due to the lack of a unified platform. Furthermore, it is infeasible to directly migrate IaaS and PaaS technologies in cloud computing to the management of inherently distributed edge resources.4

Support for user-transparent ECC services. ECCI application developers require services providing user-transparent edge-edge and edge-cloud collaborations. In most cases, components of existing ECCI applications are independently deployed on edge nodes, only interacting through services deployed on the cloud. This increases both bandwidth cost and response latency. A few existing edge services (for example, Dapr) can improve edge autonomy and application performance to a certain extent. However, due to the lack of links between edge and cloud services, they cannot provide user-transparent, collaborative services to developers.

Support for complex IA workloads. Efficient ECCI application implementations require comprehensive system-level support for complex IA workloads, such as ML/DL model training and inference, which existing cloud- and edge-computing platforms cannot provide. For instance, in edge-computing systems for IoT data processing, the message-driven communication solution for transmitting kilobyte-level sensor data cannot effectively handle the transmission of DL models as large as hundreds of megabytes. Moreover, most existing distributed ML/DL solutions are designed for datacenter networks with high bandwidth and low transmission latency. Such methods are inefficient in ECC systems with inherent constraints, such as prolonged and unstable end-to-end (E2E) communication latency.

Support for unified optimization of ECCI applications. Unified performance optimization mechanisms are important to efficient ECCI application implementations. For most existing edge-computing applications, resource-utilization efficiency highly depends on the developer's design, where effective optimizations require a profound understanding of system architectures and optimization theories.14 For existing ECCI applications, besides the multi-component development and cross-device deployment of inherently complex IA workloads, developers must also deal with the overall performance optimization across ECC infrastructures by themselves, not to mention the difficulties in application debugging, monitoring, and profiling caused by the distributed and heterogeneous environment. Such a requirement is quite challenging not only to developers of emerging ECCI applications, but also to those who want to migrate existing IAs to ECC infrastructures.

ECCI Application Patterns

Currently, there exists no commonly accepted abstraction of general ECCI application patterns, which are critical to improving the efficiency of ECCI application development and deployment. As the foundation of the unified platform, considering the subject of different application tasks, we extract four common patterns: ECC processing, ECC training, ECC inference, and hybrid collaboration.

ECC processing of data is the basis of various ECCI applications. Collaborative data-processing applications are often built as pipelines or directed acyclic graphs (DAGs). For example, the Steel framework31 deploys a streaming analytics pipeline of different data-processing tasks, such as filtering, anomaly detection, and storage, for ECC IoT anomaly-detection applications.

ECC training refers to conducting ML/DL model training based on edge-cloud collaborations. Unlike ECC processing, ECC training requires complex interactive and iterative data and control flows between edges and the cloud—for example, training data, model, and hyper-parameter exchanges. For instance, federated learning (FL) is a typical ECC training application that conducts ML training across devices to protect data privacy (for example, Gboard Mobile Keyboard11 and Apple QuickType Keyboard3) and to bridge data silos (for example, model training for bank fraud detection43).

ECC inference focuses on ML/DL model inference, where only forward propagation is conducted. ECC inference is generally achieved either through intra-model or inter-model collaborations. In intra-model solutions, a single DL model is decomposed, by layers, into two parts (neural network partitioning) deployed at edges and the cloud respectively for collaborative inference, for example, Neurosurgeon,19 SPINN,22 and JointDNN.9 In inter-model versions, however, multiple DL models with different functionality or performance are deployed at edges and the cloud respectively for collaborative inference, for example, VideoEdge18 and SurveilEdge.41
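
The intra-model idea can be illustrated with a minimal sketch of partition-point selection. It assumes that per-layer latencies on the edge and the cloud and per-layer output sizes have already been profiled, and it uses a deliberately simple latency model (edge compute, plus upload of the intermediate tensor, plus cloud compute, ignoring the return of the final result). The function name `best_partition` and all numbers are hypothetical; this is not the algorithm of Neurosurgeon, SPINN, or JointDNN.

```python
from typing import Sequence, Tuple


def best_partition(edge_ms: Sequence[float], cloud_ms: Sequence[float],
                   out_kb: Sequence[float], input_kb: float,
                   uplink_kbps: float) -> Tuple[int, float]:
    """Return (k, latency_ms): layers [0, k) run on the edge, layers [k, n) on the cloud."""
    n = len(edge_ms)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            transfer_kb = 0.0          # whole model on the edge, nothing uploaded
        elif k == 0:
            transfer_kb = input_kb     # whole model on the cloud, raw input uploaded
        else:
            transfer_kb = out_kb[k - 1]  # output of the last edge layer is uploaded
        transfer_ms = transfer_kb * 8.0 / uplink_kbps * 1000.0
        latency = sum(edge_ms[:k]) + transfer_ms + sum(cloud_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Illustrative (made-up) profile of a four-layer model over a 20 Mbps uplink.
k, latency = best_partition(
    edge_ms=[10, 20, 40, 80], cloud_ms=[1, 2, 4, 8],
    out_kb=[400, 50, 25, 1], input_kb=600, uplink_kbps=20_000)
print(k, round(latency, 1))
```

With these made-up numbers, splitting after the second layer wins because that layer's output is small enough to upload cheaply while the expensive later layers still run on the cloud.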

Hybrid collaboration combines at least two of the three ECCI application patterns mentioned or integrates additional CI/EI capabilities into ECCI applications. For example, ShadowTutor8 enables robust HD video semantic segmentation with significant throughput improvement and network data-transmission reduction. Here, cloud servers conduct both the inference of the heavy and general 'teacher' model and the training of the lightweight 'student' model. Mobile edge devices conduct the 'student' model inference.

ECCI Platform Design Principles

In this article, we aim to construct a unified platform for the scalable, reliable, robust, and efficient development and deployment of ECCI applications. Doing so requires the efficient management of heterogeneous ECC infrastructures, user-transparent ECC services, and customizable performance optimizations. The desired platform should be treated as ECCI-as-a-Service (ECCIaaS), like the concept of Machine Learning-as-a-Service (MLaaS). Specifically, we extract five essential design principles:

Principle One: An instance of an ECCI application should be an integrated entity that can be managed in a scalable way. This principle requires the unified management of typical edge and cloud infrastructures, including hardware nodes, such as edge gateways; clusters, such as Kubernetes; virtual machines; and third-party cloud services, such as Azure IoT Hub.28 Any operation of ECCI applications (for example, deployment and monitoring) should be carried out on large-scale collaborative infrastructures organized as a unity. ECCI applications should be able to provide continuously available services when the infrastructures are scaled or upgraded.

Principle Two: ECCI application components at edges and the cloud should be able to operate in both collaborative and autonomous manners. Unlike the datacenter network in the cloud, the edge-cloud network has limited capabilities (for example, bandwidth) and may perform unstably. While supporting collaborations with the cloud, edges should be able to cache data and provide partial services autonomously to mitigate the impact of network partitioning.

Principle Three: Orchestration is essential to ECCI applications. Besides edge-cloud separation, modularized ECCI application components have specific requirements for computation and storage resources as well as deployment locations. Moreover, multiple applications can be co-located on the same infrastructure. Therefore, component orchestration is necessary to ensure that all application resource and user requirements can be satisfied.

Principle Four: Provide in-app control of ECCI applications. In most cases, offloading computation to edges may not directly improve application performance. Here, in-app control optimization has been demonstrated to be effective in various aspects such as bandwidth saving30 and E2E latency reduction,34 which should be seriously considered for application performance enhancement.

Principle Five: Support as many types of ECCI application workloads as possible. ECCI application scenarios are ever-increasing, such as federated model training and ECC model inference. It is essential for the platform to support common application patterns and services, facilitating efficient development and deployment of a broadened spectrum of ECCI applications.

Application-Centric ECCI Platform

Driven by the five principles, the explicit design of our Application-Centric ECCI (ACE) platform is as follows.

Overview. In Figure 1, we illustrate the general ECCI application development and deployment procedure based on ACE. For application developers, this procedure comprises three major phases: user registration, application development, and application deployment.

Figure 1. The general architecture of ACE.

In the user registration phase, any ECCI application developer can register at ACE as a platform user. The user first requests the registration of an ECC infrastructure at ACE and registers all edge and cloud nodes to form an infrastructure according to operational instructions returned by ACE. Here, a node can be either a physical device or a virtual service, such as an edge gateway; a cloud server; a private or public cloud; and so on. The user can also choose to deploy different resource-level services based on service components provided by ACE on the infrastructure, which can be shared among all the user's ECCI applications.

Then, in the application development phase, the user implements applications in a modularized manner. Specifically, for each application, different components are separated according to user-defined business logic or functionality. Meanwhile, as required by ACE, the user deliberately decouples application control flows from workload flows for collaboration optimization and component reuse. All components are then implemented using the ACE software development kits (SDKs) and encapsulated into containers that can be deployed on edge or cloud according to a component's resource and user requirements. For each application, the user constructs a topology file describing component relations and the resource and user requirements of each component. All component images and corresponding topology files are then submitted to ACE.

Finally, in the application deployment phase, ACE determines a deployment plan for all components of a specific application according to the topology file, guaranteeing that all resource and user requirements are satisfied. According to the plan, the application can be deployed on the user's ECC infrastructure through ACE. All deployed applications are continuously monitored by ACE for maintenance, and corresponding users can upgrade, monitor, and remove their applications at any time.

To achieve the procedure above, we construct our ACE platform in a hierarchical manner with three layers—the platform layer, resource layer, and application layer. The general architecture of ACE is illustrated in Figure 1. Details of each layer are as follows:

Platform layer. This layer manages the ACE platform, all registered users, and user nodes and applications. It also offers platform-level services for users and their applications.

Platform management. Our platform-layer manager comprises a controller, orchestrator, API server, pub/sub service, monitoring service, and user interfaces:

The controller manages platform users, their nodes, and their applications—for example, registers and deletes users, shields failed nodes, and controls node component deployment.

The orchestrator determines a deployment plan for all components of each application based on the topology file, ensuring that resource (for example, computing) and user (for instance, location) requirements of all components are satisfied.

The API server provides uniform APIs for querying and manipulating the status of ACE entities (for example, users, nodes, applications) to other platform manager components (for instance, orchestrator, controller).

The pub/sub service provides a bi-directional data transmission channel between ACE and user nodes and applications (for example, delivering deployment instructions from the controller to user nodes).

The monitoring service collects the status, performance metrics, and runtime logs of ACE, user nodes, and applications.

The user interfaces enhance ACE's user-friendliness with a command-line interface (CLI) and web-based dashboard. For example, the dashboard with a drag-and-drop visual application editor can be used for handy application development.

Platform-level services. Platform-level services are not ACE's internal features. They can be implemented as requested to improve the efficiency of ECCI application development and deployment based on ACE.

Two typical examples include:

  • Image registry hosts ACE-provided images (for example, controller, orchestrator), generic runtime images (for instance, Python runtime), and user-provided customized application images.
  • Validation testbed allows users to develop, debug, and monitor ECCI applications on an SDN-based application validation testbed. For example, observing the impact of edge-cloud channel dynamics (for example, bandwidth, delay, jitter) on the testbed can help users understand the actual performance of an ECCI application in real-world networks.

Resource layer. This layer manages the ECC infrastructure of each user. It also provides resource-level services shared among applications deployed on the same infrastructure.

Infrastructure organization. Considering Principles One and Two, ACE requires all nodes of each user to be organized as several edge clouds (ECs) and one central cloud (CC) to host scalable ECCI applications and to enable autonomous operation of edge components. All edge nodes are divided into several groups according to a specific user's preferences—for example, in terms of nodes' geographical locations or resources. Each group is organized as an EC, serving all end nodes (for instance, IoT sensors and cameras) that can access the EC through local area network (LAN). All cloud nodes of the user are organized as a single CC, and it can interact with each EC through wide area network (WAN). For each EC and the CC, internal nodes are organized as a cluster (similar to Kubernetes ideally or a node pool for simple implementation).

Treating each EC and the CC as a resource-level operational unit allows ACE to effectively manage the infrastructure and deploy applications on it without explicitly managing potentially massive numbers of underlying nodes. Moreover, when cloud coordination is unavailable because of a CC failure or an edge-cloud communication failure, each EC as a cluster remains (partially) functional, enabling local area collaborations among edge components based on corresponding edge services.

On receiving the user's registration request, ACE assigns a unique infrastructure ID to the user and establishes a node information record for infrastructure organization. Meanwhile, ACE assigns a unique second-layer ID affiliated with the infrastructure ID to each EC and the CC claimed by the user, and corresponding node registration instructions are automatically generated. The user executes these instructions, returned by ACE, on the nodes. An agent is deployed on each node, informing ACE of the node information and the EC or CC to which the node belongs. ACE assigns a unique third-layer ID affiliated with the EC's or CC's ID to each node. The agent is also used for application deployment and application status collection.
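
The three-layer ID scheme can be pictured with a small sketch. The class `InfrastructureRegistry`, its method names, and the slash-separated ID format are all hypothetical and chosen only for illustration; the article does not specify how ACE encodes its IDs.

```python
class InfrastructureRegistry:
    """Hypothetical record of infrastructures, clusters (ECs and the CC), and nodes."""

    def __init__(self):
        self._infra_count = 0
        self.clusters = {}  # infrastructure ID -> list of second-layer (EC/CC) IDs
        self.nodes = {}     # second-layer ID -> list of third-layer (node) IDs

    def register_infrastructure(self) -> str:
        self._infra_count += 1
        infra_id = f"infra-{self._infra_count}"
        self.clusters[infra_id] = []
        return infra_id

    def register_cluster(self, infra_id: str, kind: str) -> str:
        """Assign a second-layer ID; kind is 'ec' or 'cc' (one CC per infrastructure)."""
        existing = self.clusters[infra_id]
        if kind == "cc":
            cluster_id = f"{infra_id}/cc"
            if cluster_id in existing:
                raise ValueError("an infrastructure has a single central cloud")
        elif kind == "ec":
            index = sum(1 for c in existing if "/ec-" in c) + 1
            cluster_id = f"{infra_id}/ec-{index}"
        else:
            raise ValueError("kind must be 'ec' or 'cc'")
        existing.append(cluster_id)
        self.nodes[cluster_id] = []
        return cluster_id

    def register_node(self, cluster_id: str) -> str:
        """Assign a third-layer ID affiliated with the node's EC or CC."""
        members = self.nodes[cluster_id]
        node_id = f"{cluster_id}/node-{len(members) + 1}"
        members.append(node_id)
        return node_id


registry = InfrastructureRegistry()
infra = registry.register_infrastructure()
cc = registry.register_cluster(infra, "cc")
ec1 = registry.register_cluster(infra, "ec")
pi = registry.register_node(ec1)
print(cc, ec1, pi)
```

Because each ID embeds its parent, the cluster and infrastructure of any node can be recovered from the node ID alone, which is one plausible reading of "affiliated with" in the text.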

Resource-level services. ECCI applications with the typical patterns previously discussed commonly require essential services such as small/big packet communication and data caching/storage.14,23,29 In a specific ECC infrastructure, existing services supporting ECCI applications are conventionally deployed on both ECs and the CC, serving EC and CC clients (that is, application components) respectively to ensure the autonomy of ECs. Each service is accessible to all applications deployed on the same infrastructure. However, due to the lack of links between edge and cloud services, conventional services require application developers to handle complex edge-cloud interactions. Taking conventional message services for small packet communication as an example, as shown in Figure 2, for edge-cloud unicast communications, all EC clients must directly access the message service at CC (Figure 2, Step 1) to communicate with CC clients. Here, the developer must handle the CC message service authorization for each EC client individually, which is quite expensive for large-scale ECCI applications.

Figure 2. Illustration of ACE-provided resource-level services.

Considering Principle Five, to facilitate efficient application development, ACE aims to provide E2E resource-level services with unified interfaces to both EC and CC clients. Therefore, long-lasting links between services on ECs and the CC must be established. Some conventional services support the direct establishment of such links (for example, service bridging for the message service). Specifically, as shown in Figure 2, ACE implements a resource-level message service, where the long-lasting link between EC and CC message services (Figure 2, Step 2) is established using MQTT topic-bridging.25 Here, edge-cloud interactions are conducted by an ACE-provided SDK, and each client only needs to interact with its local service through a dedicated interface. For other services, directly establishing long-lasting links is expensive or even infeasible. For example, the link between edge and cloud file services could be established using file synchronization, which imposes additional requirements on network condition, computation, and access authorization. Instead, ACE uses the resource-level message service to establish long-lasting links for other services. ACE implements a resource-level file service, whose control flow (for example, Figure 2, Steps 3 and 4) is separated from the data flow and handled by the resource-level message service. Furthermore, a well-known object storage service handles the data flow (for example, Figure 2, Steps 5 and 6) for transmission simplification. Note that, as shown in Figure 2, resource-level services use three types of links: ad hoc links (grey) for repetitive interactions, ad hoc links (orange) for one-off interactions, and long-lasting links (green) established once the service is deployed. In addition, resource-level services should provide basic operations for applications throughout their lifecycle, for example, temporary storage for intermediate models and data, and permanent storage for final trained models in the file service.
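
The separation of control flow and data flow in the file service can be sketched as follows. This is an in-memory illustration only: `MessageService`, `ObjectStore`, and `FileService` are hypothetical stand-ins, not the ACE SDK, an MQTT client, or a real object storage API. The point it shows is that only a small descriptor travels over the message service while the payload goes through object storage.

```python
import hashlib
from collections import defaultdict


class MessageService:
    """In-memory stand-in for the bridged EC/CC message service (small packets)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


class ObjectStore:
    """In-memory stand-in for an object storage service (large payloads)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class FileService:
    """Hypothetical file service: control via messages, data via object storage."""

    def __init__(self, messages, store):
        self.messages = messages
        self.store = store

    def send(self, topic, name, data):
        key = hashlib.sha256(data).hexdigest()
        self.store.put(key, data)                      # data flow
        self.messages.publish(                         # control flow
            topic, {"name": name, "key": key, "size": len(data)})
        return key

    def on_file(self, topic, handler):
        def _on_control(message):
            handler(message["name"], self.store.get(message["key"]))
        self.messages.subscribe(topic, _on_control)


messages, store = MessageService(), ObjectStore()
files = FileService(messages, store)
received = []
files.on_file("models/eoc", lambda name, data: received.append((name, len(data))))
files.send("models/eoc", "eoc.bin", b"\x00" * 1024)
print(received)
```

In a deployment matching the text, the message service would be the bridged EC/CC service, so a control message published at the CC would reach subscribers in an EC without each client authenticating against the remote side.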

Application layer. This layer supports user applications through the entire lifecycle.

ACE-supported ECCI application lifecycle. As a unified platform, ACE supports each application through its entire lifecycle, that is, designing, coding, building, testing, deploying, and monitoring. For designing, ACE provides a standard specification (the topology file) to achieve modularized development for ACE-organized ECC infrastructures. For coding, ACE provides the SDKs with access to resource-level services for application components and the user interface to access the essential integrated development environment (IDE). For building, ACE provides the image registry to enable efficient image management and distribution. For testing, ACE provides the validation testbed for application verification and evaluation. For deploying, ACE provides the orchestrator and the controller for automatic deployment. For monitoring, ACE provides the monitoring service, which collects the status of application components and the nodes where they are deployed. Such support from ACE enables users to develop and deploy basic ECCI applications efficiently. For applications with specific performance requirements (for example, the minimal E2E latency) or with advanced architectures (for instance, large-scale components with complex topology), ACE provides two additional forms of support: reusable development and deployment automation.

Reusable development. Considering Principles Four and Five, ACE requires developers to decouple and separate control and workload planes of all application components. The control plane conducts in-app control operations, component monitoring, and policy execution—for example, choosing the best partition point for intra-model inference solutions.9,19 The workload plane conducts computation, storage, and transmission instructed by the control plane—for example, deep feature compression module7 or hybrid collaboration pipeline for data processing and inference tasks.39 Such a separation allows ACE to construct a reusable in-app controller, enabling developers to concentrate on implementing ECCI workloads and efficiently contribute to the ACE-based ECCI ecosystem.

For the reusable in-app controller, ACE constructs a series of general in-app control operations (for example, start, filter, aggregate, and terminate), component monitoring operations, and a basic control policy. Determined by the ECC infrastructure, the controller is constructed at the resource level in an ECC manner. The CC controller conducts global coordination-related operations, and the EC controller coordinates components within the EC. Resource-level services support interactions between CC and EC controllers. For applications with specific performance requirements, developers can inherit the general in-app controller and override optimization methods as advanced control policies—for example, the rate control-based optimal edge-cloud bandwidth allocation.2
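
The inherit-and-override pattern might look like the sketch below. The class and method names are hypothetical, not the ACE SDK, and the proportional sharing rule in the subclass is a simple stand-in chosen for brevity; it is not the rate-control method of the cited work.

```python
from typing import Dict


class InAppController:
    """Hypothetical general controller with a basic bandwidth policy."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def terminate(self):
        self.running = False

    def allocate_bandwidth(self, demands_mbps: Dict[str, float],
                           capacity_mbps: float) -> Dict[str, float]:
        """Basic policy: split the edge-cloud link equally among components."""
        if not demands_mbps:
            return {}
        share = capacity_mbps / len(demands_mbps)
        return {name: share for name in demands_mbps}


class ProportionalController(InAppController):
    """Advanced policy supplied by the developer by overriding one method."""

    def allocate_bandwidth(self, demands_mbps, capacity_mbps):
        total = sum(demands_mbps.values())
        if total <= capacity_mbps:
            return dict(demands_mbps)  # every demand fits, grant it in full
        return {name: capacity_mbps * demand / total
                for name, demand in demands_mbps.items()}


demands = {"ec-1": 30.0, "ec-2": 10.0}
basic = InAppController().allocate_bandwidth(demands, capacity_mbps=20.0)
advanced = ProportionalController().allocate_bandwidth(demands, capacity_mbps=20.0)
print(basic, advanced)
```

Only the policy method changes in the subclass; the lifecycle operations are inherited unchanged, which is the reuse the separation of control and workload planes is meant to allow.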

Deployment automation. Considering Principle One, ACE needs to support efficient application deployment regardless of topology complexity and infrastructure scale. To achieve this, ACE constructs an automatic application deployment method only requiring the application topology file containing information such as application specification, component clarifications, parameters, relations, and deployment requirements. Such a manner prevents users from handling complex component-infrastructure mapping. Specifically, to deploy an application, the user submits the topology file through the user interface to ACE and triggers the orchestration process. According to component deployment requirements, the ACE platform-layer orchestrator binds each component with specific nodes in the infrastructure and resource-level services required, generating the deployment plan. When the user triggers the deployment process, the ACE platform-layer controller generates the instruction to deploy each component instance on the corresponding node according to the deployment plan and sends the instruction to the node agent for execution. Note that users can manage applications (for instance, update and delete) by modifying the topology file. For example, for updates, such as submitting an updated topology file, the user can trigger a thorough update—that is, ACE deletes the previous application and repeats the entire deployment process. An incremental update can also be triggered; ACE automatically deploys updated components according to the new topology file.
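
To make the orchestration step concrete, here is a minimal sketch that binds components to nodes with a greedy first-fit rule. The dictionary schema (`name`, `location`, `cpu`, `mem_mb`), the node names, and the resource figures are all hypothetical; the article does not give the topology file format or the orchestrator's actual algorithm.

```python
def orchestrate(components, nodes):
    """Return {component name: node ID} satisfying location and resource needs."""
    free = {n["id"]: {"cpu": n["cpu"], "mem_mb": n["mem_mb"]} for n in nodes}
    location = {n["id"]: n["location"] for n in nodes}
    plan = {}
    # Place the most demanding components first so they are not starved.
    ordered = sorted(components, key=lambda c: (c["cpu"], c["mem_mb"]), reverse=True)
    for comp in ordered:
        for node_id, capacity in free.items():
            if (location[node_id] == comp["location"]
                    and capacity["cpu"] >= comp["cpu"]
                    and capacity["mem_mb"] >= comp["mem_mb"]):
                capacity["cpu"] -= comp["cpu"]
                capacity["mem_mb"] -= comp["mem_mb"]
                plan[comp["name"]] = node_id
                break
        else:
            raise RuntimeError(f"no node satisfies component {comp['name']!r}")
    return plan


# Hypothetical infrastructure and topology, loosely shaped like the video query example.
nodes = [
    {"id": "cc-gpu", "location": "cc", "cpu": 8, "mem_mb": 32768},
    {"id": "ec1-pc", "location": "ec-1", "cpu": 4, "mem_mb": 8192},
    {"id": "ec1-pi", "location": "ec-1", "cpu": 1, "mem_mb": 1024},
]
components = [
    {"name": "coc", "location": "cc", "cpu": 4, "mem_mb": 8192},
    {"name": "eoc", "location": "ec-1", "cpu": 2, "mem_mb": 2048},
    {"name": "od", "location": "ec-1", "cpu": 1, "mem_mb": 512},
    {"name": "dg", "location": "ec-1", "cpu": 1, "mem_mb": 256},
]
plan = orchestrate(components, nodes)
print(plan)
```

A real orchestrator would also bind required resource-level services and could optimize placement rather than take the first fit; the sketch only shows how a deployment plan can follow mechanically from declared requirements.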

How It Works: Intelligent Video Queries Using ACE

To validate our platform in supporting efficient and high-performing ECCI application development and deployment, we first present the entire development and deployment process of an intelligent video-query application based on ACE, then compare the performance of the application implemented with ACE, CI, and EI, respectively.

Application development and deployment. Video query24,41 is one of the killer ECCI applications. To fulfill latency-sensitive, user-specific video query requests (for example, a query about the existence of a type of object in video streams from a geographic area), it generally uses edge and cloud resources to retrieve targeted objects from the video streams with a proper tradeoff between query accuracy and response latency under practical edge-cloud bandwidth limitations. In this section, we developed and deployed a video query application (based on Wang et al.41) using ACE.

User registration. As an ACE user, we first mounted all our nodes and organized our ECC infrastructure as instructed by ACE. Our infrastructure comprised a CC (one node, a GPU workstation) and three ECs (each with four nodes: an x86 Mini PC and three Raspberry Pis). Figure 3 provides node details. For each EC, all edge nodes were connected to an individual 100Mbps WLAN. Each EC was connected to the CC via WAN (campus network) with software-limited bandwidth (20Mbps uplink and 40Mbps downlink) and one-way delay (0ms and 50ms for ideal and practical networks, respectively). Each Raspberry Pi received the real-time video stream from a surveillance camera. We deployed the resource-level message service on the infrastructure.

Figure 3. ACE-based intelligent video query workflow.

Application development. Our application40 aimed to fulfill user-specific video query requests accurately and rapidly through edge-cloud collaboration under practical network limitations, that is, bandwidth and delay. We developed the following components: Data Generator (DG), providing the real-time video stream to the edge node; Object Detector (OD), rapidly extracting video frame crops potentially containing moving objects from the video stream; Edge Object Classifier (EOC), conducting lightweight, query-specific binary object classification; Cloud Object Classifier (COC), conducting accurate multi-class object classification; In-app Controller (IC), executing the control policy; and Result Storage (RS), saving all query results. OD on edge nodes was implemented using frame differencing41 (that is, cropping regions with salient pixel differences across frames) instead of an accurate but complex object detector, such as YOLOv3,33 for rapid crop extraction on resource-limited edge nodes. COC on CC was a ResNet15215 pre-trained on ImageNet ILSVRC1535 (4.49% Top-5 error rate). EOC was a MobileNetV236 rapidly trained on the fly by CC whenever there were user-specific queries. To form its query-specific training set, video frame crops containing different classes of objects were extracted on CC by a YOLOv3 pre-trained on COCO26 (57.9% mAP measured at 0.5 IOU) from historical video data uploaded at idle times by cameras at (or near) the queried area, then labeled by COC. The trained EOC (training details are in Wang et al.41) was then deployed on edge nodes in the queried area. We used real video clips from YouTube Live (see footnote c; 30 fps, 1920 x 1080 resolution, various durations) as historical video data and real-time video streams to query. For a motorcycle query task, EOC's training set had 14,000 crops extracted from the historical video data, that is, clips (170 hours total duration) from 14 surveillance cameras at or near the queried area. Another 6,433 'motorcycle' and 68,749 'non-motorcycle' crops were extracted as EOC's test set, on which EOC achieved an 11.06% error rate under 80% object identification confidence, tending to be less accurate than COC. Another three five-minute-long video clips were used as real-time video streams. Each node in the three ECs had one of the three clips.
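To make the frame-differencing idea behind OD concrete, the following is a minimal sketch, not the authors' implementation. It assumes grayscale frames stored as NumPy arrays, a classic three-frame differencing rule (a pixel counts as moving only if it differs from both the previous and the next frame), and a single bounding box around all moving pixels. The function names and the `diff_threshold` value are hypothetical.

```python
import numpy as np


def moving_region(prev_frame, frame, next_frame, diff_threshold=25):
    """Return (top, left, bottom, right) around the pixels of `frame` that
    differ from both neighbouring frames, or None if nothing moved.

    Assumption: frames are 2-D grayscale arrays of identical shape.
    """
    # Cast to a signed type so uint8 subtraction cannot wrap around.
    cur = frame.astype(np.int16)
    diff_prev = np.abs(cur - prev_frame.astype(np.int16)) > diff_threshold
    diff_next = np.abs(cur - next_frame.astype(np.int16)) > diff_threshold
    moving = diff_prev & diff_next
    if not moving.any():
        return None
    rows = np.flatnonzero(moving.any(axis=1))
    cols = np.flatnonzero(moving.any(axis=0))
    # Bottom and right are exclusive so the box can be used as a slice.
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


def extract_crop(frame, box):
    """Cut the boxed region out of a frame."""
    top, left, bottom, right = box
    return frame[top:bottom, left:right]
```

A real detector would typically split the mask into one box per moving object; the single-box version is kept here only to show why the approach is cheap enough for resource-limited edge nodes.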

The video query workflow after EOC's deployment is shown in Figure 3. For each edge node receiving the real-time video stream from DG (Figure 3, Step 1), OD selected three consecutive frames with a specific interval (for example, 0.5s) and rapidly extracted crops potentially containing moving objects. Such crops were classified by EOC (Figure 3, Steps 2 and 3) and the results were used by IC for crop scheduling based on the Basic Policy (BP) (Figure 3, Steps 4 and 5). If the object identification confidence of a crop was above 80%, a targeted object was identified (predicted as positive due to the lack of ground truth of the real-time video), and its metadata was sent to RS (Figure 3, Steps 3, 6, and 7). If confidence was below 10%, the crop was dropped. Otherwise, the crop was sent to COC (Figure 3, Steps 3, 6, and 8). If the Top-5 classification results of the crop on COC contained the targeted label, a targeted object was identified (that is, predicted as positive) and its metadata was sent to RS (Figure 3, Steps 8 and 7). Since BP may induce queue backlog at EOC and frequent reprocessing at COC, we constructed an Advanced Policy (AP) (Figure 3, Steps 4 and 10) based on BP as a customized IC to further reduce E2E Inference Latency (EIL). AP collected and estimated EILs of EOC (Figure 3, Steps 5 and 4) and COC (Figure 3, Steps 9, 11, and 4) to guide crop uploading from OD: for load balancing,41 each crop was always sent to the classifier with the lower estimated EIL (Figure 3, Steps 2, 6, and 8). AP also reduced the crops uploaded from EOC to COC by shrinking the identification confidence thresholds whenever either EOC's or COC's EIL deteriorated.
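The scheduling rules above can be sketched in a few lines. This is an illustrative reading of the text, not ACE's actual controller code: the 80% and 10% thresholds and the Top-5 check come from the description, while the function names, the string return values, and the tie-breaking rule in `pick_classifier` are assumptions. AP's threshold adaptation is omitted because the text does not specify how the thresholds are adjusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    accept: float = 0.80  # above this, EOC's positive result is trusted
    drop: float = 0.10    # below this, the crop is discarded at the edge


def basic_policy(confidence, thresholds=Thresholds()):
    """BP: decide what to do with a crop given EOC's confidence in [0, 1]."""
    if confidence > thresholds.accept:
        return "store"    # send metadata to RS
    if confidence < thresholds.drop:
        return "drop"
    return "upload"       # uncertain crop: let COC decide


def cloud_identifies_target(top5_labels, target_label):
    """COC side: the target is identified if it appears in the Top-5 labels."""
    return target_label in top5_labels


def pick_classifier(estimated_eil_edge, estimated_eil_cloud):
    """AP load balancing: route a crop from OD to the classifier with the
    lower estimated EIL (ties go to the edge, an assumption)."""
    return "EOC" if estimated_eil_edge <= estimated_eil_cloud else "COC"
```

For example, `basic_policy(0.5)` returns `"upload"`, and `pick_classifier(0.20, 0.09)` returns `"COC"`, which corresponds to a crop bypassing a backlogged edge classifier.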

Application deployment. As shown in Figure 4, we submitted a topology file to ACE, an extended YAML file containing meta information about both the application and all of its components. We use the deployment of component OD as an example. On receiving the topology file (Figure 4, Step 1), the orchestrator determined the node(s) (for example, Raspberry Pi 'ec-1-rpi1' on edge cloud 'EC-1') satisfying all requirements of OD: 'connections', implying OD's dependencies on components Local In-app Controller (LIC), EOC, and COC; 'resources', implying the CPU and memory required by OD; and 'labels', implying that OD should be deployed on edge nodes connected to cameras. Such decisions were recorded in the deployment plan ('instances'), a topology replica modified by the orchestrator. Note that, to manage nodes in an EC as a cluster, ACE can delegate node-level orchestration to the EC. On receiving the deployment plan (Figure 4, Step 2), the controller transformed the information of OD instances into specific deployment instructions in a standard Docker-compose YAML file, which was distributed to the node agent (for example, the container engine at 'ec-1-rpi1') for OD deployment.
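The node-selection step can be illustrated with a small filter over candidate nodes. This is a hypothetical sketch rather than ACE's orchestrator: the `Node` and `Component` classes and their field names are invented for illustration, only the 'resources' and 'labels' requirements are modeled, and the 'connections' requirement is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cluster: str
    free_cpu: float        # free CPU cores (assumed unit)
    free_mem_mb: int       # free memory in MB (assumed unit)
    labels: set = field(default_factory=set)


@dataclass
class Component:
    name: str
    cpu: float
    mem_mb: int
    labels: set = field(default_factory=set)


def eligible_nodes(component, nodes):
    """Return names of nodes with enough free resources that carry every
    label the component asks for."""
    return [
        node.name
        for node in nodes
        if node.free_cpu >= component.cpu
        and node.free_mem_mb >= component.mem_mb
        and component.labels <= node.labels  # subset test
    ]
```

With an OD component labeled `{"edge", "camera"}`, a camera-attached Raspberry Pi such as 'ec-1-rpi1' with enough free CPU and memory would be selected, while a cloud node without the 'camera' label would not.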

Figure 4. Automatic application deployment.

Impact of implementation paradigm on intelligent application performance. We compared the performance of our application implemented with different paradigms. For CI, COC was deployed on CC. For EI, EOCs were deployed on ECs. For ECCI based on ACE, two versions of the application, one with BP (ACE) and one with AP (ACE+), were deployed. Different system loads were simulated by varying the sampling interval of frame differencing in OD from 0.5s to 0.1s. Since all compared implementations used the same OD, we compared their video query performance using their object classification performance. In particular, we used F1-score12 (see footnote d), edge-cloud Bandwidth Consumption (BWC), and E2E Inference Latency (EIL; see footnote e) as evaluation metrics. We conducted the motorcycle query task under different system load and edge-cloud network delay (0ms for ideal and 50ms for practical) settings. Results are illustrated in Figure 5.
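As a reminder of how the F1-score metric is computed, here is a minimal sketch using the standard definition (the harmonic mean of precision and recall). The function name and the boolean-list input format are assumptions; in the evaluation described here, the reference labels would be COC's offline predictions rather than human annotations.

```python
def f1_score(predicted, reference):
    """F1 for a binary query: each entry is True if the crop is (predicted
    or known to be) the targeted object, False otherwise."""
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)
    if true_pos == 0:
        return 0.0  # avoids division by zero when nothing is found
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

A crop dropped at the edge counts as a negative prediction, so dropping a true motorcycle crop lowers recall and therefore F1.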

Figure 5. Intelligent video query performance.

When the system load increases, F1-scores of CI and EI remain largely unchanged; CI using only COC and EI using only EOC achieve the highest and lowest F1-scores under all system loads, respectively. ACE and ACE+, using COC and EOC collaboratively, achieve F1-scores slightly lower than CI but significantly higher than EI. Unlike in EI, where they are dropped, many crops that cannot be confidently classified by EOCs (with a confidence from 10% to 80%) are uploaded to COC in ACE and ACE+. Compared with CI, only a few crops are dropped by EOCs (with a confidence below 10%) in ACE and ACE+. Besides, the higher the system load, the better ACE+ performs compared with ACE. Under higher system loads, more crops are directly uploaded from ODs to COC by IC with AP for load balancing in ACE+, so fewer crops are dropped than by IC with BP in ACE. Furthermore, when the system load increases, ACE+ achieves higher F1-scores under practical than under ideal network delay. In ACE+, under practical network delay, fewer crops are uploaded from EOCs to COC (the confidence thresholds are shrunk to avoid higher EILs) and more are uploaded from ODs to COC for load balancing.

When the system load increases, BWCs increase for all except EI. ACE and ACE+ induce significantly lower BWCs than CI since a considerable number of objects are identified by EOCs. Furthermore, the higher the system load, the higher the BWC of ACE+ compared with ACE. In ACE+, some crops (more as the system load increases) are directly uploaded by IC with AP for load balancing, inducing higher BWCs; in ACE, IC with BP uploads only crops with identification confidence from 10% to 80%.

When the system load is low, CI induces the lowest EIL under different network delay settings, benefiting from COC's fast processing (the inference time of COC is about 32.3ms on CC, while that of EOC on an edge node is above 44ms). When the system load increases, CI's EIL increases significantly (unlike EI, ACE, and ACE+) due to the large queue backlog aggregated from all ODs (normal in large-scale, edge-cloud systems). The practical network delay also enlarges CI's EIL more noticeably (by significantly more than the 50ms network delay). Compared to CI, EILs of EI, ACE, and ACE+ are not obviously impacted by either system load (that is, low queue backlog at EOCs) or network delay (no/low uploading). ACE's EIL is only slightly higher than EI's since EOCs manage to identify most objects, and only a few crops are uploaded to COC. Furthermore, the higher the system load, the lower the EIL of ACE+ compared with ACE, because some crops (more as the system load increases) are directly uploaded to COC for load balancing by IC with AP in ACE+.

Compared with CI and EI, ACE-based video query manages to better fulfill query requests accurately and rapidly with efficient bandwidth consumption. ACE also makes customized optimization easier for developers, for example, EIL reduction with a customized AP.


Future of ACE

As a prototype for cost-efficient ECCI application development and deployment, ACE is still in its infancy. The construction of ACE reveals fundamental challenges which must be addressed and sheds light on the vision of an ACE-based ECCI ecosystem deserving exploration.

Challenges. Agile ECCI application orchestration is challenging but critical to improving ACE-based application performance. ACE's platform-layer orchestrator manages to allocate application components to proper nodes, satisfying basic (node-level) resource and user (edge/cloud deployment) requirements. However, fine-grained orchestration under more explicit constraints, which is important for full infrastructure use, is still hard to achieve. Furthermore, ACE's static application orchestration cannot adjust to application or infrastructure changes, so a dynamic orchestrator is also necessary.


Resource-contention prevention must be investigated further to ensure the performance of ECCI applications co-located on the same infrastructure. Currently, ACE achieves component-level resource isolation through containerization and supports inter-component resource allocation optimization through the customized in-app controller; application-level resource isolation and allocation, however, is still an open issue. Critical resources such as edge-cloud bandwidth should be allocated appropriately to co-located applications under ACE's coordination. It is also promising to integrate serverless technology6 for elastic resource allocation that cannot be directly achieved by container-based solutions.

Security is another critical issue. ACE currently contains no security module, although state-of-the-art encryption and authentication techniques can be directly integrated for fundamental secrecy. The actual challenge, however, is access control. In our design, ACE users have full access to their infrastructure and ECCI applications, and no user collaboration is currently supported. For specific applications (for example, federated learning) that must be developed and deployed collaboratively by multiple users, ACE is required to provide a fine-grained access-control mechanism. It must ensure each collaborator has limited access to the shared application and infrastructure without jeopardizing others' privacy.

Vision. ACE demonstrates the potential in supporting closed-loop DevOps of ECCI applications. ACE facilitates the cost-efficient, effective development and deployment of ECCI applications. Taking a step further, we believe it is viable to integrate proper operation and maintenance modules into ACE, aiming at the closed loop of continuous ECCI application development, deployment, monitoring, delivery, and testing. Such complete DevOps support will enable ACE to act as the foundation of the approaching ECCI ecosystem.

ACE promises to promote a broad spectrum of production-level ECCI applications. ECCI applications, especially high-performing ones, are difficult to design, develop, and deploy, which hinders such a paradigm from contributing to the rapidly expanding IA market. ACE supports the entire ECCI application lifecycle, helping general users to conduct unified and user-friendly application development and deployment. Besides, ACE can also ease the migration of existing IAs based on CI and EI to ECCI applications, satisfying specific practical requirements.

Conclusion

ML/DL-based IAs with harsher practical requirements pose challenges to conventional CI implementations. The emerging ECCI paradigm can support proliferating IAs, which, however, are currently developed and deployed individually, without generality. We envision systematic designs of a unified platform for cost-efficient development and deployment of high-performing ECCI applications, which guided us to construct the ACE platform for handling heterogeneous resources and IA workloads. Our initial experience shows that ACE helps developers and operators along the entire lifecycle of ECCI applications, where customizable optimizations can be efficiently conducted. More research is still required, and we have discussed both the challenges and visions of the newborn ACE.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under Grant 2020YFA0713900; the National Natural Science Foundation of China under Grants 61772410, 61802298, 62172329, U1811461, U21A6005, 11690011; the China Postdoctoral Science Foundation under Grants 2020T130513, 2019M663726; and the Alan Turing Institute.

References

1. Abdelzaher, T.F. et al. Five challenges in cloud-enabled intelligence and control. ACM Transactions on Internet Technology 20, 1 (2020), 3:1–3:19.

2. Alvar, S.R. and Bajic, I.V. Pareto-optimal bit allocation for collaborative intelligence. IEEE Transactions on Image Processing 30 (2021), 3348–3361.

3. Apple. Private federated learning. NeurIPS 2019.

4. Bagchi, S., Siddiqui, M-B., Wood, P., and Zhang, H. Dependability in edge computing. Communications of the ACM 63, 1 (2019), 58–66.

5. Bianchini, R. et al. Toward ML-centric cloud platforms. Communications of the ACM 63, 2 (2020), 50–59.

6. Castro, P., Ishakian, V., Muthusamy, V., and Slominski, A. The rise of serverless computing. Communications of the ACM 62, 12 (2019), 44–54.

7. Choi, H. and Bajic, I.V. Deep feature compression for collaborative object detection. In 2018 25th IEEE Intern. Conf. on Image Processing, 3743–3747.

8. Chung, J-W., Kim, J-Y., and Moon, S-M. ShadowTutor: Distributed partial distillation for mobile video DNN inference. 49th Intern. Conf. on Parallel Processing (August 2020), 8:1–8:11.

9. Eshratifar, A.E., Abrishami, M.S., and Pedram, M. JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services. IEEE Transactions on Mobile Computing 20, 2 (2021), 565–576.

10. Gong, Y. et al. EdgeRec: Recommender system on Edge in mobile Taobao. 29th ACM Intern. Conf. on Information and Knowledge Mgmt. (October 2020), 2477–2484.

11. McMahan, B. and Ramage, D. Federated learning collaborative. Google Research (April 6, 2017).

12. Goutte, C. and Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. European Conf. on Information Retrieval 2005: Advances in Information Retrieval. Springer, 345–359.

13. Greengard, S. AI on Edge. Communications of the ACM 63, 9 (2020), 18–20.

14. Harchol, Y. et al. Making edge-computing resilient. 2019 Master's thesis. EECS Dept., University of California, Berkeley.

15. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. 2016 IEEE Conf. on Computer Vision and Pattern Recognition. 770–778.

16. Hong, S. et al. HOLMES: Health OnLine Model Ensemble Serving for deep learning models in intensive care units. 26th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining. (August 2020), 1614–1624.

17. Huang, J. et al. CLIO: Enabling automatic compilation of deep learning pipelines across IoT and cloud. In Proceedings of the 26th Annual Intern. Conf. on Mobile Computing and Networking. (April 2020), 58:1–58:12.

18. Hung, C. et al. VideoEdge: Processing camera streams using hierarchical clusters. 2018 IEEE/ACM Symp. on Edge Computing, 115–131.

19. Kang, Y. et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In Proceedings of the 22nd Intern. Conf. on Architectural Support for Programming Languages and Operating Systems (April 2017), 615–629.

20. Kim, S.W. et al. Learning to simulate dynamic environments with GameGAN. 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 1228–1237.

21. Krizhevsky, A., Sutskever, I., and Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 6 (2017), 84–90.

22. Laskaridis, S. et al. SPINN: Synergistic progressive inference of neural networks over device and cloud. 26th Annual Intern. Conf. on Mobile Computing and Networking (2020), 37:1–37:15.

23. Li, S. and Lan, T. HotDedup: Managing hot data storage at network edge through optimal distributed deduplication. IEEE Conf. on Computer Communications (2020), 247–256.

24. Li, Y. et al. Reducto: On-camera filtering for resource-efficient real-time video analytics. In Proceedings of the ACM SIGCOMM (July 2020), 359–376.

25. Light, R. Mosquitto man.

26. Lin, T-Y. et al. Microsoft COCO: Common objects in context. In Proceedings of the European Conf. on Computer Vision, (2014).

27. Mayer, R. and Jacobsen, H-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Computing Surveys 53, 1 (2020), 3:1–3:37.

28. Microsoft. Azure IoT Hub.

29. Monga, S.K., Ramachandra, S.K., and Simmhan, Y. ElfStore: A resilient data storage service for federated edge and fog resources. IEEE Intern. Conf. on Web Services (2019), 336–345.

30. Nigade, V., Wang, L., and Bal, H. Clownfish: Edge and cloud symbiosis for video stream analytics. 2020 IEEE/ACM Symposium on Edge Computing, 55–69.

31. Noghabi, S.A., Kolb, J., Bodik, P., and Cuervo, E. Steel: Simplified development and deployment of edge-cloud applications. In Proceedings of the 10th USENIX Conference on Hot Topics in Cloud Computing (July 2018), 1–7.

32. Rastegari, M., Ordonez, V., Redmon, J., and Farhadi, A. Enabling AI at the edge with XNOR-Networks. Communications of the ACM 63, 12 (2020), 83–90.

33. Redmon, J. and Farhadi, A. YOLOv3: An incremental improvement. CoRR abs/1804.02767 (2018).

34. Ren, P. et al. Edge-assisted distributed DNN collaborative computing approach for mobile web augmented reality in 5G networks. IEEE Network 34, 2 (2020), 254–261.

35. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Intern. J. of Computer Vision 115, 3 (2015), 211–252.

36. Sandler, M. et al. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR abs/1801.04381 (2018).

37. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.

38. Song, M. et al. In-situ AI: Towards autonomous and incremental deep learning for IoT systems. 2018 IEEE Intern. Symp. on High Performance Computer Architecture, 92–103.

39. Ulhaq, M. and Bajic, I.V. ColliFlow: A library for executing collaborative intelligence graphs. NeurIPS 2020.

40. Wang, L. ACE-Evaluation.

41. Wang, S., Yang, S., and Zhao, C. SurveilEdge: Real-time video query based on collaborative cloud-edge deep learning. IEEE Conf. on Computer Communications (2020), 2519–2528.

42. Wang, Z., Su, X., and Ding, Z. Long-term traffic prediction based on LSTM encoder-decoder architecture. IEEE Trans. on Intelligent Transportation Systems 22, 10 (October 2021), 6561–6571.

43. WeBank and Swiss Re signed Cooperation MoU. (2019).

44. Zhang, D. et al. EdgeBatch: Towards AI-empowered optimal task batching in intelligent edge systems. 2019 IEEE Real-Time Systems Symp., 366–379.

45. Zhou, Z. et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. In Proceedings of the IEEE 107, 8 (August 2019), 1738–1762.

Authors

Luhui Wang is a Ph.D. candidate with the National Engineering Laboratory for Big Data Analytics, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Cong Zhao is an associate professor with the National Engineering Laboratory for Big Data Analytics, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Shusen Yang is a professor with the National Engineering Laboratory for Big Data Analytics, and the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Xinyu Yang is a professor with the School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

Julie McCann is a professor with the Department of Computing, Imperial College London, U.K.

Footnotes

a. See the Dapr Community website at

b. See the Kubernetes Community website at

c. See

d. Since real-time video streams to query were not labeled, we classified all crops extracted by OD during the entire query task with COC after the task was finished and treated COC's predicted labels as the query ground truth for F1-score calculation.

e. The time from when a crop is transmitted by OD to when its predicted label is given by EOC or COC.

©2023 ACM  0001-0782/23/01

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.

