With the recent proliferation of big data analytics (BDA), enterprises collect and transform data to perform predictive analyses at a scale that was not possible a few years ago. BDA methodologies involve the business, analytics, and technology domains. Each domain deals with different concerns at different abstraction levels, but current BDA development does not consider the formal integration among these domains. Hence, the deployment procedure usually requires rewriting code for specific IT infrastructures to obtain software aligned with functional and non-functional requirements. Moreover, previous surveys have reported a costly and error-prone transition between analytics development (data lab) and production environments. ACCORDANT is a domain-specific model (DSM) approach to design, deploy, and monitor performance Quality Scenarios (QS) in BDA applications, bridging the gap between the analytics and IT architecture domains.
This approach uses high-level abstractions to describe deployment strategies and QS, enabling performance monitoring. Our experimentation compares the effort of development, deployment, and QS monitoring of BDA applications in different use cases that combine performance QS, processing models, and deployment strategies. Our results show shorter (re)deployment cycles and the fulfillment of latency and deadline QS for micro-batch and batch processing.
Castellanos Camilo, Correal D., Juliana-davila R. (2018). Executing Architectural Models for Big Data Analytics. European Conference on Software Architecture (ECSA) 2018, Software Architecture (ISBN 978-3-030-00761-4), Madrid, Spain.
Lambda architecture has gained high relevance for big data analytics by offering mixed and coordinated data processing: real-time processing for fast data streams and batch processing for large, high-latency workloads. However, concrete implementations over cloud infrastructures and their cost comparisons have not yet been sufficiently analyzed. This paper presents a cost comparison of Lambda architecture implementations using Software as a Service (SaaS) to support IT decision makers who must implement streaming-analytics solutions. To this end, a transportation analytics case study is developed on three public cloud providers: Google Cloud Platform, Microsoft Azure, and Amazon Web Services Cloud. The evaluation compares deployment, configuration, development, and performance costs in a public-transportation delay-monitoring case study under various concurrency scenarios.
Preventing students from dropping out is considered very important in many educational institutions. In this paper, we describe the results of an educational data analytics case study focused on detecting dropout among Systems Engineering (SE) undergraduate students after 6 years of enrollment at a Colombian university. The original data are extended and enriched through a feature engineering process. Our experimental results show that simple algorithms achieve reliable levels of accuracy in identifying predictors of dropout. The results of Decision Trees, Logistic Regression, Naive Bayes, and Random Forest were compared in order to propose the best option. Watson Analytics is also evaluated to establish the usability of the service for non-expert users. The main results are presented with the aim of decreasing the dropout rate by identifying potential causes. In addition, we present findings related to data quality that can improve the student data collection process.