Hey everyone.
I've done some checking around the forums and I think I saw a few other folks having the same problem, but I haven't seen any resolution yet.
Every single work unit that I process for gravitational waves turns into a computational error once it gets to the 2:14-2:16 mark.
My computer is pretty solid and I process all kinds of data for a number of BOINC projects. I usually let Einstein run on its own because it takes a bit more processing power, but does anyone have an explanation as to why this happens and/or how to fix it?
Thanks in advance and I'm sorry if this was answered before somewhere.
-Dan
Dan V wrote: Hey everyone ...
I believe those units require a GPU with 4GB of RAM to process correctly; no idea why they still get sent to GPUs that don't meet that. Try the Gamma-ray Pulsar tasks instead, they should do very well on your GPU.
Dan V wrote: Every single work unit that I process for gravitational waves turns into a computational error ...
The way to find out why this is happening is to examine what gets sent back to the project with the failed result. Go to your tasks list, pick one of the failed tasks, and click its Task ID. On the page that opens, look below the "Stderr Output" heading for words like "Error" or "failed". For the failure I linked to, here is the specific line, with the very long (and irrelevant) path string replaced with "(...)".
XLAL Error - XLALComputeECLFFT_OpenCL (...): Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
In other words, your GPU doesn't have enough memory to hold all the data needed for the Fast Fourier Transforms (FFTs) to be computed.
I am in the process of writing a very detailed guide about ways to process these tasks without getting compute errors. I am using a 4GB RX 570 GPU to experiment with. It has a feature called UMA (Unified Memory Architecture, I think) which seems to mean that if a task exceeds the available GPU memory, it can pinch some system RAM, with the penalty that the crunching rate slows quite a bit. However, the tasks don't fail. My guess is that your NVIDIA card doesn't have a similar feature - I really don't know, but someone else may comment.
The name of your failed task is "h1_1539.40_O2C02Cl4In0__O2MDFV2h_VelaJr1_1540.30Hz_411_1". In that name there are two frequency terms: 1539.40Hz and 1540.30Hz. The delta frequency (DF) between the two is 0.90Hz. There is a direct relationship between the DF and the number of large data files that need to be held in memory for the FFTs to be processed: the higher the DF, the more data files are needed. So by looking at the DF value, you can know in advance whether a task is likely to fail. I have no NVIDIA cards to test on, but I believe you may be able to process tasks where the DF value is 0.65 or less (maybe even 0.70 if you're lucky), since these need far fewer of the large data files in memory than your current failures.
In your 'in progress' list, you have some tasks with a DF of 0.70Hz. It would be interesting to know if these will run. Try one, and if it fails, abort the others with the same DF. If you get a few more, you should start to see some with a DF of 0.65Hz or less; you could then try one of those. The DF will eventually go down to as low as 0.30 for the frequency bin (1539.40Hz) that you are currently processing.
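If you'd rather not eyeball every name, the DF check is easy to script. Here is a minimal sketch in Python that pulls the two frequencies out of a task name and computes the DF. The name layout is only assumed from the single example above, and the 0.70Hz threshold is just the value suggested by the results in this thread, so treat both as guesses to adjust for your own card:

# Minimal sketch: parse the two frequency terms out of a GW task name and
# compute the DF, so tasks likely to fail on a small-memory GPU can be
# spotted before they run. The name layout is assumed from the single
# example above; the 0.70Hz threshold is what worked on the 3GB card here.
import re

DF_THRESHOLD_HZ = 0.70  # adjust for your own card after testing

def delta_frequency(task_name):
    """Return search frequency minus bin frequency, parsed from the task name."""
    m = re.match(r"^[A-Za-z0-9]+_(\d+\.\d+)_.*_(\d+\.\d+)Hz_", task_name)
    if m is None:
        raise ValueError("unrecognised task name: " + task_name)
    bin_freq, search_freq = (float(x) for x in m.groups())
    return round(search_freq - bin_freq, 2)

name = "h1_1539.40_O2C02Cl4In0__O2MDFV2h_VelaJr1_1540.30Hz_411_1"
df = delta_frequency(name)
if df <= DF_THRESHOLD_HZ:
    print(f"{name}: DF = {df:.2f} Hz, should fit in GPU memory")
else:
    print(f"{name}: DF = {df:.2f} Hz, likely to fail - consider aborting")

For the example task above it prints a DF of 0.90 Hz and flags it as a likely failure, which matches what actually happened.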
Be sure to give us a report on how things go :-).
EDIT: The GPU in question here had 3GB VRAM. If anyone else with a 3GB card wants to test this, please do. It could even be that a DF of 0.75 is OK; it would be good to know the cut-off point. I've seen DFs between 0.25 and 0.95. For any given frequency bin, well over half the issued tasks will have a DF of 0.75 or less. In general, the crunch time does not increase with the memory requirement, although it does vary. For example (from rather limited testing), tasks with a DF of 0.95 actually crunch noticeably faster than tasks with a DF of 0.70 or 0.75.
We need a 'better' server-side method to limit the tasks going to 3GB GPUs. Until then, those owners wishing to support GW GPU crunching could just allow those tasks with a 'good' DF to run and simply abort those that would fail if allowed to proceed. It would be really good to know a 'safe' DF for those 3GB cards.
Cheers,
Gary.
Just a quick followup on my previous message. Although Dan V hasn't responded, at least he did allow the remaining GW GPU tasks to run. Those with a DF value of 0.75Hz or above all failed, but those where the value was 0.70Hz all completed successfully and validated.
So now we know: if you have a 3GB NVIDIA card, you will be able to process the current VelaJr tasks with a DF of 0.70 or below. This is in line with what I guessed would be the case, as described in the previous message. I suspect this value may be frequency dependent, but it should work for frequency bins up to at least 1550Hz or so. To be safe at higher frequencies, the DF limit might need to be reduced to 0.65Hz.
Cheers,
Gary.
I had also been getting the out of GPU memory error:
CL_MEM_OBJECT_ALLOCATION_FAILURE.
See https://einsteinathome.org/task/966440317
But I think I should have 8 GB of memory available; I have:
Card name: NVIDIA GeForce MX250
Manufacturer: NVIDIA
Chip type: GeForce MX250
DAC type: Integrated RAMDAC
Device Type: Render-Only Device
Device Key: Enum\PCI\VEN_10DE&DEV_1D13&SUBSYS_85E5103C&REV_A1
Device Status: 0180200A [DN_DRIVER_LOADED|DN_STARTED|DN_DISABLEABLE|DN_NT_ENUMERATOR|DN_NT_DRIVER]
Device Problem Code: No Problem
Driver Problem Code: Unknown
Display Memory: 10086 MB
Dedicated Memory: 1983 MB
Shared Memory: 8102 MB
I think the problem is related to an HP / Windows 10 operating system update last week. I suspect the update replaced the driver for the graphics card, so I've installed the latest driver from Nvidia. That doesn't seem to have worked; since then, GW jobs don't even download to my laptop anymore. I probably messed something up but can't figure out what.
mcz wrote: ... Display Memory: 10086 MB ... Dedicated Memory: 1983 MB ... Shared Memory: 8102 MB ...
I don't know anything about this type of card but does the above imply that the device has 2GB but can access main system memory (shared) once the device memory is full? If so, perhaps memory accesses above 2GB will be painfully slow due to the limitations of the PCIe bus? As I say, I don't know - I'm just guessing.
Did you do a 'clean install', and did you make sure the OpenCL libs were installed as well? I don't use Windows at all so can't give instructions, but if you use the forum search function with the search term (exactly as written but minus the quotes) of "nvidia AND driver AND clean AND install", you're sure to find a whole bunch of previous examples of this problem and the instructions from the experienced Windows users who helped solve it. If you allow Microsoft to update your drivers, you're bound to lose the OpenCL compute capability.
Cheers,
Gary.
Gary,
Thanks for the info and pointers -- I've been learning a lot from following your hints. For one thing, I finally figured out how to successfully do a clean install of the NVIDIA drivers (it helps to log in as administrator first, ha!). I haven't figured out the OpenCL libs yet. I'm not sure whether the install fixed the problem -- several more GW tasks failed today, but this evening a couple finally completed successfully, the first in a couple of weeks. But I see that the memory call still thinks there's only 2 GB of memory available instead of the 8 GB "available", so maybe these only succeeded because they need less than 2 GB of memory. However, one of the successful jobs and one of the failed jobs had the same DF of 0.7 (the frequencies were 1646.9 and 1647.6 for both), so maybe they needed the same amount of memory, which would imply my problem is fixed. Not sure; here's hoping...
I also finally figured out that there's a 16-task limit per day, which explains why I wasn't getting tasks. Duh. A good idea, given that every job was failing.
Anyway, I appreciate the help. If I figure out anything more useful, I'll post later.
--Martin
There is an issue with the Nvidia implementation of their OpenCL 1.2 API. The API limits the amount of available RAM to about 25-27% of the total RAM on the card.
So an 8GB card only allows use of a little over 2GB of memory. This is what the Ricks-Lab application gpu-ls --clinfo reports for my RTX 2080 card, which has 8GB of memory.
Device OpenCL C Version: OpenCL C 1.2
Device Name: GeForce RTX 2080
Device Version: OpenCL 1.2 CUDA
Driver Version: 440.100
Max Compute Units: 46
SIMD per CU: None
SIMD Width: None
SIMD Instruction Width: None
CL Max Memory Allocation: 2092515328
Max Work Item Dimensions: 3
Max Work Item Sizes: 1024 1024 64
Max Work Group Size: 1024
Preferred Work Group Size: None
Preferred Work Group Multiple: 32
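For anyone who doesn't have gpu-ls installed, a short pyopencl script can report the same two values. This is only a sketch; it assumes the pyopencl package is installed (e.g. pip install pyopencl) and that a working OpenCL driver is present:

# Sketch: query each OpenCL device's total memory and the maximum single
# allocation the driver will permit (the value gpu-ls labels
# "CL Max Memory Allocation"). Assumes pyopencl and an OpenCL driver.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        total = device.global_mem_size         # CL_DEVICE_GLOBAL_MEM_SIZE
        max_alloc = device.max_mem_alloc_size  # CL_DEVICE_MAX_MEM_ALLOC_SIZE
        print(f"{device.name} ({platform.name})")
        print(f"  Global memory:    {total / 2**30:.2f} GiB")
        print(f"  Max single alloc: {max_alloc / 2**30:.2f} GiB"
              f" ({100 * max_alloc / total:.0f}% of total)")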
Keith,
Thanks for the info, interesting.
Looks like the clean install has made a difference, but it doesn't solve everything. 10 of the last 13 GW tasks (since that install) completed correctly, while 3 still had the insufficient memory error. But that's much better than getting 0 out of 13.
--Martin
Keith Myers wrote: There is an issue with the Nvidia implementation of their OpenCL 1.2 API. The API limits the amount of available RAM to about 25-27% ...
Are you sure it's a % limit of total RAM? It looks more like a hardcoded limit of 2 GiB.
It's as if they are still using 32-bit memory pointers/mapping somewhere in the OpenCL libs. The same thing happens when running 32-bit apps on a 64-bit CPU and 64-bit OS: all the RAM can be used in general, but one 32-bit app (process) can access only 2 GiB of RAM at most, regardless of the total installed RAM.
CL Max Memory Allocation: 2092515328 is <2 GiB (it's about 1.95 GiB).
No, it is a percentage. If you run an Nvidia card with 11GB of memory, like a 1080 Ti or similar, the API limits it at 25-27%, and that gives you almost 3GB of memory to play with.
[Edit] See also this discussion: "why is CL_DEVICE_MAX_MEM_ALLOC_SIZE never larger than 25% of CL_DEVICE_GLOBAL_MEM_SIZE only on NVIDIA?"
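For what it's worth, the numbers already in this thread can be used as a rough check: 2092515328 bytes is about 1.95 GiB, which is roughly 24% of the RTX 2080's 8 GiB, while an 11GB 1080 Ti reporting almost 3GB usable sits well above any hard 2 GiB ceiling. If that 1080 Ti figure is accurate, the limit behaves like a percentage rather than a 32-bit pointer cap.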