Training job hangs for 10 minutes on iGPU
Created by: piotrgrubicki
- Install Geti (2.13.0 in my case) on a machine with an Arrow Lake CPU and no discrete GPU.
- Ensure the correct device is present in the CM (see #1400) so that training runs on the iGPU.
Every training job (checked with detection and classification) hangs for 10 minutes before the computation starts. Only the "training" step of the training pod is affected.
It appears to stall right after this log line:
2025-10-17T11:57:54.432659497+05:30 /interactive_ai/workflows/train/trainer/.venv/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Attaching the training log for DFine-X.
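To pin down where the pause occurs, the attached log can be scanned for the largest gap between consecutive timestamps. The snippet below is a minimal sketch, assuming each relevant line starts with an ISO 8601 timestamp in the same format as the log line quoted above. `parse_ts` and `largest_gap` are hypothetical helper names for illustration and are not part of Geti.

```python
import re
from datetime import datetime

# Assumed line prefix, e.g. 2025-10-17T11:57:54.432659497+05:30
TS_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?([+-]\d{2}:\d{2}|Z)"
)


def parse_ts(line):
    """Return a timezone-aware datetime for the line's leading timestamp, or None."""
    m = TS_RE.match(line)
    if not m:
        return None
    base, frac, tz = m.groups()
    # datetime stores microseconds only, so truncate nanosecond fractions.
    frac = (frac or "0")[:6].ljust(6, "0")
    tz = "+00:00" if tz == "Z" else tz
    return datetime.fromisoformat(f"{base}.{frac}{tz}")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause, or None."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a leading timestamp
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Calling `largest_gap(open(path).read().splitlines())` on the saved pod log (where `path` is wherever the log was saved) returns the two lines surrounding the longest pause, which should show what the trainer logs last before stalling and first after resuming.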