Description: Explanatory models in the multimodal category are models designed to give clear, understandable explanations of their own predictions. They integrate several types of data, or modalities, such as text, images, and audio, to build a more complete view of a phenomenon or behavior than any single source provides. Their defining characteristic is that they combine these sources of information, which lets them capture more of the complexity in the data and supports better interpretation of results. In addition to generating a prediction, they explain the reasoning behind it, for example by indicating how much each modality contributed to the outcome. This is crucial in contexts where transparency and interpretability are essential. It is especially relevant in applications of artificial intelligence and machine learning, where understanding how and why a model reaches a conclusion can influence user trust and decision-making. In summary, multimodal explanatory models help people understand complex systems and support more effective interaction between humans and technology.
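
As a rough illustration of the idea of combining modalities and explaining the result, the sketch below uses a simple late-fusion linear classifier. It is not taken from any particular system: the weights, the bias, the feature values, and the function names (`modality_contributions`, `predict_with_explanation`) are all hypothetical, and real multimodal models are usually far more complex. Because the model is linear, each modality's share of the score can be read off directly, which gives a basic per-modality explanation.

```python
import math

# Hypothetical weights for three modalities (illustrative values only).
WEIGHTS = {
    "text": [0.8, -0.4],
    "image": [0.5, 0.3],
    "audio": [-0.2, 0.6],
}
BIAS = -0.1


def modality_contributions(features):
    """Return each modality's additive contribution to the logit."""
    return {
        name: sum(w * x for w, x in zip(WEIGHTS[name], values))
        for name, values in features.items()
    }


def predict_with_explanation(features):
    """Return the predicted probability and the modalities ranked by influence."""
    contributions = modality_contributions(features)
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Rank by absolute contribution so strong negative evidence is also surfaced.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Made-up feature vectors, one per modality.
sample = {"text": [1.0, 0.5], "image": [0.2, 0.1], "audio": [0.0, 0.5]}
probability, ranked = predict_with_explanation(sample)
print(f"predicted probability: {probability:.3f}")
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

The ranking is the explanation here: it shows which modality pushed the prediction most, and in which direction. For non-linear models this direct decomposition no longer holds, and a dedicated attribution method would be needed instead.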