AMLA: Adaptive Meta-Learning Architecture for Automated Dataset Characterization, Predictive Algorithm Selection, and Feature Augmentation Advising
DOI: https://doi.org/10.64751/

Abstract
Algorithm selection remains a critical and persistent bottleneck in applied machine learning: practitioners routinely resort to exhaustive trial-and-error experimentation or subjective domain intuition, approaches that are computationally expensive, methodologically inconsistent, and inaccessible to non-specialist users. This paper presents the Adaptive Meta-Learning Architecture (AMLA), a unified, domain-agnostic framework that automates algorithm recommendation for structured tabular datasets. AMLA integrates three tightly coupled modules: (i) a Dataset Characterization Engine that extracts a multi-layered, 60-dimensional numerical fingerprint, termed Dataset DNA, encoding statistical, structural, information-theoretic, landmarking, and complexity features; (ii) a Predictive Algorithm Selector, a trained meta-learner that maps Dataset DNA vectors to ranked algorithm recommendations supported by SHAP-based explanations; and (iii) a Feature Augmentation Advisor that diagnoses structural weaknesses within a dataset and prescribes targeted transformations. Unlike existing AutoML systems, which operate as opaque black boxes relying on brute-force pipeline search, AMLA delivers interpretable, evidence-backed recommendations through a self-improving Meta-Knowledge Base seeded from OpenML community experiments and augmented by a local validation pipeline. Evaluated across 50 benchmark classification datasets, AMLA achieves a meta-learner Precision@1 of 72%, a 48-percentage-point improvement over a random-selection baseline and a 21-percentage-point improvement over the most-frequent-algorithm heuristic (both significant at p < 0.001, Wilcoxon signed-rank test). The system is deployed as a full-stack interactive web application built with Python, scikit-learn, XGBoost, FastAPI, and React. AMLA makes six original contributions to the meta-learning literature, including the Dataset DNA fingerprinting scheme, predictive feature gap analysis, and a closed-loop self-improvement mechanism, capabilities absent from existing open-source tooling.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.