Publications | Alain Villagrasa Labrador

2025

MALVADA: A framework for generating datasets of malware execution traces

Razvan Raducu, Alain Villagrasa-Labrador, Ricardo J. Rodríguez, and 1 more author

SoftwareX, 2025

Malware attacks have been growing steadily in recent years, making more sophisticated detection methods necessary. These approaches typically rely on analyzing the behavior of malicious applications, for example by examining execution traces that capture their runtime behavior. However, many existing execution trace datasets are simplified, often resulting in the omission of relevant contextual information, which is essential to capture the full scope of a malware sample’s behavior. This paper introduces MALVADA, a flexible framework designed to generate extensive datasets of execution traces from Windows malware. These traces provide detailed insights into program behaviors and help malware analysts to classify a malware sample. MALVADA facilitates the creation of large datasets with minimal user effort, as demonstrated by the WinMET dataset, which includes execution traces from approximately 10,000 Windows malware samples.

A dataset of Windows malware execution traces

Razvan Raducu, Alain Villagrasa-Labrador, Ricardo J. Rodríguez, and 1 more author

Data in Brief, 2025

Malware continues to be a major cybersecurity concern, and its increasing volume and sophistication make effective detection methods essential. Behavior-based approaches rely on high-quality execution trace data to analyze how malicious software interacts with systems at runtime. Publicly available datasets often lack sufficient detail, cover a limited diversity of families, or provide only simplified API call sequences. In this paper, we present a dataset that addresses this gap by offering a large collection of richly detailed Windows malware execution traces generated in controlled environments. The traces were produced through automated dynamic analysis by executing the malware samples in a virtualized environment, specifically the CAPEv2 Sandbox running on Windows 10 virtual machines. The raw sandbox analysis reports were then processed with the MALVADA framework, a modular Python-based pipeline that filters, structures, labels, and standardizes execution traces. The resulting dataset consists of 31,844 JSON execution trace files, where each trace contains static metadata, dynamic behavioral information, and labelling fields. The dataset is suitable for reuse in multiple research contexts, including the development and benchmarking of malware detection methods, behavioral clustering, dynamic analysis of malicious software, and automated labelling studies. Its standardized JSON structure facilitates integration with existing data analysis and machine learning pipelines, as well as combination with other datasets for extended studies.