As cyberspace grows, so does the damage caused by malware, one of the main tools used by malicious agents. Machine learning algorithms have become established as important tools for detecting threats, and the models they produce depend on data for training and testing. Malware datasets have therefore become valuable in the deployment of modern anti-malware systems. However, these datasets suffer from problems with sample quality, and they often fail to keep pace with technological evolution and become obsolete. In addition, many of the datasets used in research are not publicly accessible. This paper proposes a quality assessment framework based on metrics focused on sampling and data temporality. The framework also incorporates criteria aligned with the FAIR principles (Findable, Accessible, Interoperable, Reusable), with the aim of encouraging the publication of more reliable and reusable datasets.