FGRPB1G mem alloc problem; low memory_bound for new LATeah3012L06 tasks on older iGPUs

Scrooge McDuck
Joined: 2 May 07
Posts: 856
Credit: 16202827
RAC: 6783
Topic 230053

The recently discussed computation errors in Gary's long "Validate error" thread are off topic there. So we should use this new thread instead.

@Giorgio: a link to a failing task is sufficient as these logfiles tend to be veeeery long...  sometimes...   ;-)

Looking at Giorgio's error logs for some of his failed tasks (for example this one), while still successfully computing these FGRPB1G tasks on my own Intel iGPU (a kind of device that seems to be despised by the professional crunchers around here), I have a strong sense of déjà vu:

A new batch of FGRPB1G tasks has been downloading for some days now, namely the "LATeah3012L06..." ones, which analyze a different raw data file ("LATeah3012L06.dat") than the previous "LATeah4021L..." ones. They also run 10 to 15 % faster on my (old) iGPU than the previous ones, so the analysis seems to be a little different, with different parameters... The new LATeah3012L06... workunits state their resource bounds (see the workunit list via 'boinccmd --get_state' or the task details in BOINC Manager) as follows:

   name: LATeah3012L06_796.0_0_0.0_33049719
   FP estimate: 5.250000e+14
   FP bound: 1.050000e+16
   memory bound: 429.15 MB
   disk bound: 19.07 MB
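
To make the units in that listing explicit, here is a minimal Python sketch that parses the excerpt above and converts the memory bound to bytes. It assumes the excerpt's "key: value" layout and that "MB" here means 2**20 bytes (my assumption, not something the listing states); the names parse_workunit and bound_in_bytes are made up for this example. Under that assumption the bound comes out very close to a round 450,000,000 bytes.

```python
# Excerpt copied from the post; real `boinccmd --get_state` output has many more fields.
WORKUNIT_TEXT = """\
   name: LATeah3012L06_796.0_0_0.0_33049719
   FP estimate: 5.250000e+14
   FP bound: 1.050000e+16
   memory bound: 429.15 MB
   disk bound: 19.07 MB
"""

MIB = 1024 * 1024  # assumption: the client's "MB" is 2**20 bytes


def parse_workunit(text):
    """Turn 'key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines that do not look like 'key: value'
            fields[key] = value
    return fields


def bound_in_bytes(value):
    """Convert a string like '429.15 MB' to bytes (hypothetical helper)."""
    number, unit = value.split()
    if unit != "MB":
        raise ValueError(f"unexpected unit: {unit}")
    return float(number) * MIB


wu = parse_workunit(WORKUNIT_TEXT)
mem_bytes = bound_in_bytes(wu["memory bound"])
print(f"{wu['name']}: memory bound = {mem_bytes:,.0f} bytes")
```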

So RAM allocation shouldn't exceed 429 MB. Intel's iGPU uses a reserved part of RAM as its VRAM. Giorgio's failed tasks log the following exit status (error code):

   Exit status:198 (0x000000C6) EXIT_MEM_LIMIT_EXCEEDED

So the memory allocation exceeded the memory_bound specified for this workunit. That happened before with the last batch of O3MD1 tasks (gravitational wave tasks, behemoths allocating ~3.4 GiB instead of the upper limit of 1.6 GiB according to their memory_bound). But that's not the reason for the task to fail here. BOINC clients don't strictly monitor and limit the memory allocation of science apps to the memory_bound limit. If there's sufficient memory to allocate, there won't be a problem. I have no such memory allocation problem with MY iGPU (Intel Core i7-4770 CPU with Intel HD Graphics 4600 (1297MB)). It seems my iGPU can allocate sufficient VRAM, which is not the case for Giorgio's computer. Giorgio's error log states:

   Using OpenCL device "Intel(R) HD Graphics 520" by: Intel(R) Corporation
   Max allocation limit: 841357312

That's a reserved amount of ONLY ~800 MB of the total 16 GiB RAM for Giorgio's iGPU (Intel Core i3-6006U CPU with Intel HD Graphics 520 (1604MB)). The GPU string of Giorgio's iGPU states "...HD Graphics 520 (1604MB)", but his VRAM isn't up to 1604 MB, only ~800 MB. I don't know whether this is further limited by Windows, whose (modern) Windows 11 GUI consumes a significant proportion of the available (reserved) VRAM; I use an ancient Windows with all 3D GUI features disabled. Or it may be some BIOS setting on Giorgio's computer limiting the VRAM for the iGPU. I don't know.
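
For anyone who wants to check the ~800 MB figure, here is a small Python sketch of the arithmetic. It assumes the "Max allocation limit" in the log is given in bytes and that the "(1604MB)" in the GPU string counts units of 2**20 bytes; both are my assumptions, the log doesn't say.

```python
MIB = 1024 * 1024  # assumption: both figures use 2**20-byte units

max_alloc_bytes = 841357312  # "Max allocation limit" from Giorgio's log
reported_mb = 1604           # size shown in the GPU string "(1604MB)"

# Convert the log value to MiB and compare it with the advertised size.
max_alloc_mib = max_alloc_bytes / MIB
fraction = max_alloc_mib / reported_mb

print(f"max allocation: {max_alloc_mib:.1f} MiB")
print(f"share of the reported {reported_mb} MB: {fraction:.0%}")
```

The limit works out to roughly half of the size in the GPU string. Whether that is a coincidence or a fixed ratio applied by the driver, I can't tell from the log alone.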

So the current batch of FGRPB1G (LATeah3012L06...) tasks requires up to ~925 MiB of RAM, or rather of reserved VRAM within RAM. Example from a successfully finished task on my own iGPU (HD 4600):

   Peak working set size (MB): 923.56
   Peak swap size (MB): 922.51

That clearly exceeds the memory_bound value that is set in the "LATeah3012.." workunits (during workunit generation). My iGPU can handle such a larger memory allocation; Giorgio's iGPU can't. More specifically: Giorgio's setup of CPU/GPU and Windows 11 can't. So tasks fail unexpectedly on some types of iGPUs (maybe only with modern desktop GUIs).
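
A rough Python sketch putting the three numbers from this post side by side: the workunit's memory bound, the peak working set from my finished task, and the allocation limit from Giorgio's log. It assumes all "MB" figures are in 2**20-byte units. Note also that it compares a peak measured on my machine with a limit from his, so take it as an indication only.

```python
MIB = 1024 * 1024  # assumption: all "MB" figures use 2**20-byte units

memory_bound_mb = 429.15           # "memory bound" of the workunit
peak_working_set_mb = 923.56       # peak working set of my finished task
max_alloc_mib = 841357312 / MIB    # "Max allocation limit" from Giorgio's log

# How far the observed peak exceeds the declared bound.
ratio_to_bound = peak_working_set_mb / memory_bound_mb

# How much the observed peak exceeds Giorgio's allocation limit.
shortfall_mib = peak_working_set_mb - max_alloc_mib

print(f"peak / memory bound: {ratio_to_bound:.2f}x")
print(f"peak minus max allocation limit: {shortfall_mib:.0f} MiB")
```

So the observed peak is a bit more than twice the declared bound, and also above the limit Giorgio's iGPU reports.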

I think some project admin may want to have a look at these memory_bound values. Remark: the same problem happened some months ago with the last batch of O3MD1 CPU tasks, which required far more RAM than previous ones. Their memory_bound value was also set too low.