Description: Neural compression is the process of reducing the size of neural network models so that they run efficiently on resource-limited devices such as mobile phones and IoT hardware. It relies on techniques that eliminate redundancy and optimize how parameters are represented while preserving model accuracy. Common methods include pruning, which removes unnecessary connections in the network, and quantization, which stores model weights at lower numerical precision with little loss in quality. By making models lighter, neural compression enables deployment at the edge (edge computing), where latency and bandwidth are critical constraints, and it contributes to sustainability by lowering the energy consumed during inference. As artificial intelligence becomes increasingly integrated into everyday life, neural compression is an essential tool for bringing AI to devices that previously could not support complex models.
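The two methods named above can be illustrated with a minimal NumPy sketch; the function names, the 50% sparsity target, and the int8 scheme are illustrative assumptions, not details from the source:

```python
import numpy as np

def prune(weights, sparsity):
    """Magnitude pruning (illustrative): zero out the smallest-magnitude
    fraction of weights, keeping the rest unchanged."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_int8(weights):
    """Symmetric 8-bit quantization (illustrative): map float32 weights
    to int8 codes plus one float scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

# Pruning: roughly half the connections are removed.
w_pruned = prune(w, sparsity=0.5)
print(np.mean(w_pruned == 0))

# Quantization: int8 storage is 4x smaller than float32.
q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)  # → 0.25

# Reconstruction error is bounded by half a quantization step.
w_hat = dequantize(q, scale)
print(np.max(np.abs(w - w_hat)))
```

In practice the two techniques compose: a pruned weight tensor can itself be quantized, and sparse int8 storage multiplies the savings of each method.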