With the recent proliferation of big data analytics (BDA), enterprises collect and transform data to perform predictive analyses on a scale that was not possible a few years ago. BDA methodologies involve the business, analytics, and technology domains. Each domain deals with different concerns at different abstraction levels, but current BDA development does not formally integrate these domains. Hence, deployment usually requires rewriting code for a specific IT infrastructure to obtain software aligned with functional and non-functional requirements. Moreover, previous surveys have reported a costly and error-prone transition between analytics development (data lab) and production environments. ACCORDANT is a domain-specific model (DSM) approach to design, deploy, and monitor performance Quality Scenarios (QS) in BDA applications, bridging the gap between the analytics and IT architecture domains.
This approach uses high-level abstractions to describe deployment strategies and QS, enabling performance monitoring. Our experimentation compares the effort of developing, deploying, and monitoring QS for BDA applications across use cases that combine performance QS, processing models, and deployment strategies. Our results show shorter (re)deployment cycles and the fulfillment of latency and deadline QS for micro-batch and batch processing.
ACCORDANT (An exeCutable arChitecture mOdel foR big Data ANalyTics) addresses architectural inputs and the functional and deployment viewpoints. Our proposal comprises a design and deployment process and a DSM framework that supports this process.
The ACCORDANT process comprises seven steps. 1) The business user defines 1.1) business goals and 1.2) QS, which guide the following steps. 2) The data scientist develops data transformations and builds and evaluates analytics models; the resulting models are exported as PMML files. 3) The architect designs the software architecture using the ACCORDANT Metamodel in terms of a Functional Viewpoint (FV) and a Deployment Viewpoint (DV); the FV model uses the PMML models to specify the software behavior. 4) The FV and DV models are interwoven to obtain an integrated model. 5) Software and infrastructure code is generated from the integrated model. 6) The generated code is executed to provision the infrastructure and install the software. 7) QS are monitored in operation for validation, and the design can be adjusted if necessary to achieve them.
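To make steps 4 and 5 more concrete, the following is a minimal Python sketch of joining functional and deployment models and emitting a descriptor from the result. It is not the ACCORDANT implementation: the classes `Component` and `Deployment`, the functions `weave` and `generate`, and all names and values in the usage lines are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Functional-viewpoint element (hypothetical shape)."""
    name: str
    pmml_file: str  # analytics model exported in step 2


@dataclass(frozen=True)
class Deployment:
    """Deployment-viewpoint element (hypothetical shape)."""
    component: str  # name of the FV component it hosts
    node: str
    replicas: int


def weave(fv, dv):
    """Step 4 (sketch): pair each deployment with the FV component it names."""
    by_name = {c.name: c for c in fv}
    return [(by_name[d.component], d) for d in dv]


def generate(integrated):
    """Step 5 (sketch): emit one placeholder descriptor line per woven pair."""
    return [
        f"{c.name}: model={c.pmml_file} node={d.node} replicas={d.replicas}"
        for c, d in integrated
    ]


# Illustrative input only; these names do not come from ACCORDANT.
fv = [Component("delay_predictor", "delay_tree.pmml")]
dv = [Deployment("delay_predictor", "worker-1", 2)]
print(generate(weave(fv, dv)))
```

In this sketch the integrated model is simply a list of (component, deployment) pairs; a real generator would presumably target concrete software and infrastructure artifacts instead of text lines.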
The ACCORDANT Metamodel is composed of three packages: architectural inputs, functional viewpoint, and deployment viewpoint. The functional viewpoint imports sensitivity points from the architectural inputs and uses PMML models from the PMML (Predictive Model Markup Language) package to describe the structure and behavior of the analytics. The deployment viewpoint package imports sensitivity points from the architectural inputs, and components and connectors from the functional viewpoint package. The details of the metamodels, languages, and generators are publicly available in the ACCORDANT GitHub Repository.
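The import relationships between the three packages can be illustrated with a small Python sketch that checks whether every cross-package reference resolves. The class names, field layout, and the `unresolved_imports` helper are assumptions made for this example and do not reflect the actual metamodel definitions in the repository.

```python
from dataclasses import dataclass


@dataclass
class ArchitecturalInputs:
    sensitivity_points: set[str]


@dataclass
class FunctionalView:
    # element name -> sensitivity point it imports (assumed one per element)
    components: dict[str, str]
    connectors: dict[str, str]


@dataclass
class DeploymentView:
    # deployed FV element (component or connector) -> sensitivity point
    deployed: dict[str, str]


def unresolved_imports(inputs, fv, dv):
    """Return (package, element, missing reference) for each broken import."""
    problems = []
    fv_elements = {**fv.components, **fv.connectors}
    for element, point in fv_elements.items():
        if point not in inputs.sensitivity_points:
            problems.append(("functional", element, point))
    for element, point in dv.deployed.items():
        if element not in fv_elements:
            problems.append(("deployment", element, element))
        if point not in inputs.sensitivity_points:
            problems.append(("deployment", element, point))
    return problems


# Illustrative models; every name here is hypothetical.
inputs = ArchitecturalInputs({"sp_latency"})
fv = FunctionalView({"estimator": "sp_latency"}, {"stream_in": "sp_latency"})
dv = DeploymentView({"estimator": "sp_latency", "stream_in": "sp_latency"})
print(unresolved_imports(inputs, fv, dv))
```

An empty result means the deployment view refers only to components and connectors defined in the functional view, and both views refer only to sensitivity points defined in the architectural inputs.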
Use Case | Description | Domain | Analytics Model | Processing Model | QS Metric |
---|---|---|---|---|---|
UC1 | Transport Delay Prediction | Transportation | Regression Tree | Stream | Update time, Latency |
UC2 | NMAC Risk Analysis | Avionics | K-means | Batch | Deadline |
UC3 | NMAC Detection | Avionics | Decision Tree | Micro-batch | Latency |
UC4 | El Niño/Southern Oscillation | Weather | Polynomial Regression | Batch | Deadline |
More information about the use cases is available in the ACCORDANT Use cases GitHub Repository.
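As an illustration of how the latency and deadline QS listed in the table might be checked during monitoring, here is a minimal Python sketch. The `QualityScenario` class, the `is_fulfilled` function, and every threshold and observation value are invented placeholders; they are not taken from the ACCORDANT experiments or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityScenario:
    use_case: str
    metric: str         # "latency" or "deadline", as in the table
    threshold_s: float  # assumed upper bound, in seconds


def is_fulfilled(qs, observations_s):
    """A QS holds here when every observed value stays within the threshold.

    With no observations there is nothing to validate, so this returns False.
    """
    return bool(observations_s) and max(observations_s) <= qs.threshold_s


# Placeholder thresholds and measurements, for illustration only.
latency_qs = QualityScenario("UC3", "latency", threshold_s=2.0)
deadline_qs = QualityScenario("UC4", "deadline", threshold_s=3600.0)

print(is_fulfilled(latency_qs, [0.8, 1.4, 1.9]))  # per-micro-batch latencies
print(is_fulfilled(deadline_qs, [4100.0]))        # one batch job duration
```

Treating both metrics as upper bounds keeps the sketch small; a real monitor might instead use percentiles or tolerate a bounded number of violations.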