Context: Computational materials science (CMS) focuses on in silico experiments to compute the properties of known and novel materials, and many software packages are used in the community for this purpose. The NOMAD Laboratory offers to store the input and output files of these packages in its FAIR data repository. Since the file formats of these software packages are not standardized, parsers are used to provide the results in a normalized format. Objective: The main goal of this article is to report our experience and findings from applying grammar-based fuzzing to these parsers. Method: We constructed an input grammar for four common software packages in the CMS domain and performed an experimental evaluation of the capability of grammar-based fuzzing to detect failures in the NOMAD parsers. Results: With our approach, we identified three unique critical bugs affecting service availability, as well as several additional syntactic, semantic, logical, and downstream bugs in the investigated NOMAD parsers. We reported all issues to the developer team prior to publication. Conclusion: Based on the experience gained, we recommend grammar-based fuzzing for other research software packages as well, to increase trust in the correctness of the produced results.
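To illustrate the general approach, the following is a minimal grammar-based fuzzing sketch in Python. The toy key-value grammar and the `parse_output_file` entry point are hypothetical stand-ins for the CMS-specific grammars and NOMAD parser interfaces used in the study, not the actual artifacts.

```python
import random

# Toy grammar for a key-value output format; the real study uses
# hand-written grammars for four CMS codes (this one is illustrative only).
GRAMMAR = {
    "<file>":   [["<line>"], ["<line>", "<file>"]],
    "<line>":   [["<key>", " = ", "<value>", "\n"]],
    "<key>":    [["energy"], ["n_atoms"], ["lattice"]],
    "<value>":  [["<number>"], ["<number>", " ", "<value>"]],
    "<number>": [["<digit>"], ["-", "<digit>"], ["<digit>", ".", "<digit>"]],
    "<digit>":  [[c] for c in "0123456789"],
}

def expand(symbol: str, depth: int = 0, max_depth: int = 8) -> str:
    """Recursively expand a grammar symbol into a random input string."""
    if symbol not in GRAMMAR:
        return symbol                          # terminal symbol
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        options = [options[0]]                 # force the shortest rule to terminate
    return "".join(expand(s, depth + 1, max_depth)
                   for s in random.choice(options))

def fuzz_parser(parse_output_file, trials: int = 1000) -> None:
    """Feed grammar-generated inputs to a parser and report uncaught exceptions."""
    for i in range(trials):
        data = expand("<file>")
        try:
            parse_output_file(data)            # hypothetical parser entry point
        except Exception as exc:               # a crash is a candidate bug
            print(f"trial {i}: {type(exc).__name__}: {exc!r}")
```

A grammar-generated input is syntactically plausible, so it exercises deeper parser logic than purely random bytes would, which is what makes this style of fuzzing attractive for research software parsers.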
Machine learning-based code smell detection has been demonstrated to be a valuable approach for improving software quality and enabling developers to identify problematic patterns in code. However, previous research has shown that the code smell datasets commonly used to train these models are heavily imbalanced. While some recent studies have explored the use of imbalanced learning techniques for code smell detection, they have evaluated only a limited number of techniques, so their conclusions about the most effective methods may be biased and inconclusive. To thoroughly evaluate the effect of imbalanced learning techniques on machine learning-based code smell detection, we examine 31 imbalanced learning techniques with seven classifiers to build code smell detection models on four code smell datasets. We employ four evaluation metrics to assess detection performance, together with the Wilcoxon signed-rank test and Cliff’s δ. The results show that (1) not all imbalanced learning techniques significantly improve detection performance, but deep forest significantly outperforms the other techniques on all code smell datasets; (2) SMOTE (Synthetic Minority Over-sampling Technique) is not the most effective technique for resampling code smell datasets; (3) the best-performing imbalanced learning techniques and the top-3 data resampling techniques incur little time cost for code smell detection. Therefore, we provide some practical guidelines. First, researchers and practitioners should select appropriate imbalanced learning techniques (e.g., deep forest) to ameliorate the class imbalance problem; in contrast, blindly applying imbalanced learning techniques could be harmful. Second, data resampling techniques better than SMOTE should be selected to preprocess code smell datasets.
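As a minimal illustration of the kind of pipeline the study evaluates, the sketch below resamples a synthetic imbalanced dataset with SMOTE (via imbalanced-learn) and trains a random forest. The dataset shape, class ratio, and evaluation metric are placeholder choices, not the paper's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for an imbalanced code smell dataset (~10% positive class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Resample only the training split, then fit a classifier on the balanced data.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)

# MCC is a robust choice on imbalanced data; the paper uses several such metrics.
print("MCC:", matthews_corrcoef(y_test, clf.predict(X_test)))
```

Swapping the `SMOTE` object for another resampler, or the classifier for another model, is how the 31 techniques and seven classifiers can be compared under an otherwise identical protocol.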
Kubernetes, the most popular container orchestration framework, is widely used to manage and schedule the resources of microservices in cloud-native distributed applications. However, its native scheduler, Kube-scheduler, preferentially places microservices on nodes with abundant and balanced CPU and memory, considering each node in isolation; this can cause resource fragmentation and decrease resource utilization. In this paper, we propose DRS, a deep reinforcement learning enhanced Kubernetes scheduler. To improve resource utilization and reduce load imbalance, we first formulate the Kubernetes scheduling problem as a Markov decision process and carefully design the state, action, and reward. Then, we design and implement the DRS monitor, which collects six resource utilization metrics to construct a comprehensive global resource view. Finally, DRS automatically learns the scheduling policy through interaction with the Kubernetes cluster, without relying on expert knowledge about workload and cluster status. We implement a prototype of DRS in a Kubernetes cluster with five nodes and evaluate its performance. Experimental results highlight that DRS overcomes the shortcomings of Kube-scheduler and achieves the expected scheduling targets under three workloads. Compared with Kube-scheduler, DRS improves resource utilization by 27.29% and reduces load imbalance by 2.90× on average, with only 3.27% CPU overhead and 0.648% communication latency.
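A minimal, self-contained sketch of the MDP view of pod scheduling is given below. The two-resource state, node-selection action, and utilization-minus-imbalance reward are illustrative assumptions, not the exact state, action, and reward design of DRS.

```python
import numpy as np

class SchedulingEnv:
    """Toy MDP: state = per-node CPU/memory load, action = index of the chosen node."""

    def __init__(self, n_nodes: int = 5, seed: int = 0):
        self.n_nodes = n_nodes
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.load = np.zeros((self.n_nodes, 2))      # CPU and memory load in [0, 1]
        return self.load.copy()

    def step(self, action: int):
        # Place a new pod with a random CPU/memory demand on the chosen node.
        demand = self.rng.uniform(0.02, 0.15, size=2)
        self.load[action] = np.minimum(self.load[action] + demand, 1.0)
        utilization = self.load.mean()               # higher is better
        imbalance = self.load.std(axis=0).mean()     # lower is better
        reward = utilization - imbalance             # illustrative reward shaping
        done = bool((self.load > 0.95).any())        # stop when a node is nearly full
        return self.load.copy(), reward, done

# A learned policy (e.g., trained with DQN or PPO) would replace this random one.
env = SchedulingEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(int(env.rng.integers(env.n_nodes)))
```

In the real system, the monitor supplies the state from live cluster metrics and the chosen action is carried out by binding the pod to the selected node through the Kubernetes API.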
Recent Intel processors include a powerful set of instructions (AVX-512) capable of processing 512-bit registers with a single instruction. Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF-8 and UTF-16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF-8 to UTF-16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open-source library.
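For readers unfamiliar with the transcoding problem itself, the sketch below is a scalar reference implementation in Python of the UTF-8 to UTF-16 conversion that the vectorized algorithms accelerate; it assumes valid input and is in no way the SIMD algorithm from the paper.

```python
def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Scalar reference: decode UTF-8 bytes and emit UTF-16 code units."""
    units, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                                   # 1-byte sequence (ASCII)
            cp, n = b, 1
        elif b >> 5 == 0b110:                          # 2-byte sequence
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:                         # 3-byte sequence
            cp, n = b & 0x0F, 3
        else:                                          # 4-byte sequence (valid input assumed)
            cp, n = b & 0x07, 4
        for c in data[i + 1:i + n]:                    # fold in continuation bytes
            cp = (cp << 6) | (c & 0x3F)
        i += n
        if cp < 0x10000:                               # fits in a single UTF-16 unit
            units.append(cp)
        else:                                          # needs a surrogate pair
            cp -= 0x10000
            units.extend([0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)])
    return units

# Sanity check against Python's built-in codec.
text = "汉字 héllo 😀"
enc = text.encode("utf-16-le")
expected = [int.from_bytes(enc[j:j + 2], "little") for j in range(0, len(enc), 2)]
assert utf8_to_utf16_units(text.encode("utf-8")) == expected
```

The SIMD versions replace this byte-at-a-time loop with wide register operations that classify and expand many bytes per instruction, which is where the reported speedups come from.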
Android occupies a high market share, and its broad functionality makes Android security critical. Research reveals that many security issues are caused by insecure coding practices. As an indicator of poor design, code smells threaten the safety and quality assurance of Android applications (apps). Although previous works revealed specific problems associated with code smells, the field still lacks research reflecting Android-specific features. Moreover, cost and time constraints prevent developers from repairing numerous smells promptly. We conducted a study covering the definition, detection, and impact quantification of Android code smells (DefDIQ): (1) we define 15 novel Android code smells from a security programming perspective and provide suggestions on how to eliminate or mitigate them; (2) we implement DACS to automatically detect the custom code smells based on abstract syntax trees (ASTs); (3) we investigate the correlations of individual smells using the DACS detection results, select suitable code smells to construct fault-counting models, quantify their impact on quality, and thereby generate code smell repair priorities. We conducted experiments on 4,575 open-source apps, and the findings are: (i) Lin’s concordance correlation coefficient (CCC) between DACS and manual detection results reaches 0.9994, verifying its validity; (ii) the fault-counting model constructed with zero-inflated negative binomial (ZINB) regression is superior to the negative binomial (NB) model (AIC = 517.32, BIC = 522.12); some smells do indicate fault-proneness, and we identify such avoidable poor designs; (iii) different code smells differ in importance, and the constructed repair priorities provide a practical guideline for researchers and inexperienced developers.
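The general AST-based detection idea can be sketched in a few lines of Python using the standard `ast` module; this toy detector flags overly long functions in Python source, whereas DACS operates on Android code and detects the 15 custom smells, so treat it purely as an illustration of walking a syntax tree and reporting smell locations. The file name and threshold are hypothetical.

```python
import ast

def find_long_methods(source: str, max_statements: int = 20):
    """Toy AST-based smell detector: flag functions with too many statements."""
    tree = ast.parse(source)
    smells = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count statement nodes inside the function, excluding the def itself.
            n_stmts = sum(isinstance(n, ast.stmt) for n in ast.walk(node)) - 1
            if n_stmts > max_statements:
                smells.append((node.name, node.lineno, n_stmts))
    return smells

# Usage: report candidate smells with their location, e.g. for prioritised repair.
with open("example.py") as fh:                     # hypothetical input file
    for name, line, size in find_long_methods(fh.read()):
        print(f"LongMethod: {name} at line {line} ({size} statements)")
```

A real detector like DACS encodes each smell definition as a rule over the nodes and attributes of the parsed tree, which is what makes large-scale automatic detection across thousands of apps feasible.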