Core image processing tasks, such as super-resolution, denoising, deblurring, pansharpening, and atmospheric correction, underpin optical remote sensing (RS) pipelines. Errors at this stage propagate through downstream applications, distorting land-cover maps, change-detection results, and climate records. Classical physics-based models capture sensor optics, radiometry, and geometry but struggle with complex noise and scene variability. In contrast, deep learning (DL) methods offer powerful data-driven solutions yet often act as black boxes, ignoring physical constraints and overfitting to spurious patterns. Hybrid DL (HDL) approaches bridge this gap by integrating physical models with neural architectures, combining interpretability with data adaptivity. This article surveys the emerging landscape of HDL methods in RS image processing, outlining their theoretical foundations, motivations, and design philosophies. We categorize fusion strategies, ranging from model-embedded schemes (e.g., plug-and-play (PnP) and unrolling) to model-guided learning (e.g., deep image prior (DIP) and unsupervised frameworks), and discuss how they enhance trust, robustness, and physical consistency in RS image analysis.
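
To make the model-embedded idea concrete, the minimal sketch below runs a plug-and-play proximal-gradient loop for a linear observation model y = Hx + n, alternating a data-fidelity gradient step with a denoising step. It is an illustrative sketch, not any surveyed method's implementation: the identity operator and the Gaussian filter are hypothetical placeholders standing in for a real sensor forward model and a pretrained CNN denoiser (e.g., a DnCNN-style network).

```python
# Minimal plug-and-play (PnP) proximal-gradient sketch for a linear
# inverse problem y = H x + n. The denoiser replaces the prior's
# proximal operator; the Gaussian filter below is a placeholder for a
# pretrained CNN denoiser (hypothetical stand-in).
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_pgd(y, H, Ht, denoise, step=0.5, iters=20):
    """Alternate a gradient step on ||Hx - y||^2 with a denoising step."""
    x = Ht(y)  # initialize from the back-projected observation
    for _ in range(iters):
        x = x - step * Ht(H(x) - y)   # data-fidelity gradient step
        x = denoise(x)                # plug in the denoiser as the prior
    return x

# Toy example: pure denoising (H = identity) of a noisy square image.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
y = clean + 0.2 * rng.standard_normal(clean.shape)
x_hat = pnp_pgd(y, H=lambda x: x, Ht=lambda x: x,
                denoise=lambda x: gaussian_filter(x, sigma=1.0))
```

In this scheme the physical forward model enters only through H and its adjoint Ht, so swapping in a blur kernel, a spectral response, or a pansharpening operator changes the task without retraining the denoiser; unrolling instead fixes the iteration count and learns the step sizes and denoiser end to end.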