Does anyone know why I get these errors on Einstein GPU tasks on my laptop's Quadro FX 3800M?
The machine is admittedly strapped for RAM at the moment, and I'll upgrade that in the future, but why do some of the CUDA tasks fail with this error while others process successfully?
http://einsteinathome.org/host/10383741
Please help.
7.3.15
The window cannot act on the sent message.
(0x3ea) - exit code 1002 (0x3ea)
Activated exception handling...
[21:35:17][976][INFO ] Starting data processing...
[21:35:18][976][ERROR] Failed to enable CUDA thread yielding for device #0 (error: 2)! Sorry, will try to occupy one CPU core...
[21:35:18][976][ERROR] Couldn't acquire CUDA context of device #0 (error: 2)!
[21:35:18][976][ERROR] Demodulation failed (error: 1002)!
21:35:18 (976): called boinc_finish
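For what it's worth, the "(error: 2)" in those lines matches value 2 of the CUDA runtime's cudaError_t enum, cudaErrorMemoryAllocation (out of memory), while 1002 (0x3EA) is the application's own exit code rather than a CUDA code. A minimal Python sketch of that decoding (`decode_error` is a hypothetical helper and the mapping is an illustrative subset, not a full decoder):

```python
# Hedged helper to decode the numbers in the stderr output above.
# cudaError_t defines 2 as cudaErrorMemoryAllocation ("out of memory");
# 1002 (0x3EA) is the app's own exit code, not a CUDA runtime error.
def decode_error(code: int) -> str:
    cuda_errors = {
        0: "cudaSuccess",
        2: "cudaErrorMemoryAllocation (out of memory)",
    }
    return cuda_errors.get(code, "not a CUDA runtime code (app-specific?)")

print(decode_error(2))     # the "(error: 2)" lines in the log
print(decode_error(1002))  # the app's "demodulation failed" exit code
```

An out-of-memory failure on the very first CUDA calls would explain why both the thread-yield setup and the context acquisition report the same error number.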
CUDA errors - The window cannot act on the sent message. (0x3ea)
Are you leaving a CPU core free while the GPU crunches? Either that, or are you getting an error in Windows too?
I am not getting any Windows errors. And I am not presently leaving a CPU core free for the GPU; do you honestly believe that would cause this error?
This honestly seems more like either an application error, or a CUDA programming resource allocation error... doesn't it?
Looks like you got several of them last week, but got some good ones after the 7th.
http://einsteinathome.org/host/10383741/tasks&offset=0&show_names=1&state=5&appid=0
Never had the error myself but saw this http://www.errorfixes.net/1002-0x3ea.php
Thanks. I've done preliminary research as well. I'm fairly positive that the problem I'm having is not due to an operating system installation error.
It is possible that it could be caused by:
a) The system running out of memory (it only has 4 GB, but is set to run 8 CPU tasks alongside 1 non-CPU-intensive task and 1 GPU task), or
b) The GPU getting too hot (though in that case I'd expect different types of errors, not the same one every time), or
c) Some error in the application itself (since I haven't noticed any SETI errors on that same GPU).
Is there any way an Einstein application developer could chime in with his/her opinion here?
Thanks,
Jacob
Edit:
Hmm... Looks like this laptop has had similar problems with SETI and SETI Beta. So perhaps the error is in the drivers? Sometimes I restart the laptop and it runs better for a while.
http://setiathome.berkeley.edu/result.php?resultid=3447519609
http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=16477974
Edit:
Looks like 337.50 was actually released for this mobile Quadro GPU, so I'll be upgrading the drivers and testing with them. If the issue resurfaces, I'll report back, but the more I think about it, the more it sounds like a driver error.
RE: Thanks. I've done
The last SEVERAL Nvidia driver releases have NOT benefited crunchers, but HAVE benefited gamers instead. Some people are reporting a 10% decline in their RAC after upgrading.
Yes, I know. But this thread is about a specific error, not about performance. Also, the latest Quadro 337.50 drivers actually include quite a bit of new functionality that isn't meant for gaming.
I just wanted to chime in, with an update to this issue.
I upgraded the laptop's memory, boosting it from 4 GB to 20 GB of RAM. I believe this has fixed the problem, as I haven't had any task failures since the upgrade. Though I don't know why it wasn't working properly with just 4 GB of RAM.
RE: I just wanted to chime
Hmmm, since that isn't an option for most people, maybe you can work with someone to set up a test that recreates the problem and find a workaround through BOINC.
I'm not sure it's that easy to replicate/test.
I think what is happening is that Windows does not have enough free memory at the time that CUDA performs its mallocs (memory allocation requests), causing the CUDA app to fail.
It might have something to do with running VM tasks (from Test4Theory and Climate@Home) on that laptop; I'm not sure. Previously I had found/reported/fixed a memory issue with VM tasks (where BOINC was overcommitting memory while VM tasks were running), and this was fixed in the latest public release of BOINC and the latest alphas. But now I'm wondering if there might still be some lingering issue, like when a VM task is paused, or started and waiting to run.
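To make that overcommit hypothesis concrete, here's a small Python sketch of the check being described; the logic and all task sizes are hypothetical (this is not BOINC's actual algorithm), and `can_start_task` is a made-up helper:

```python
# Illustrative sketch of the memory-commitment check discussed above.
# All sizes are in MB and entirely hypothetical.
def can_start_task(new_task_mb, running_task_mb, paused_vm_mb,
                   total_ram_mb, count_paused_vms=True):
    committed = sum(running_task_mb)
    if count_paused_vms:
        # Conservative: a paused VM may still hold on to its RAM.
        committed += sum(paused_vm_mb)
    return committed + new_task_mb <= total_ram_mb

# 4 GB of RAM, eight CPU tasks at ~350 MB, one paused 1 GB VM,
# and a 512 MB GPU task waiting to start:
print(can_start_task(512, [350] * 8, [1024], 4096))                          # False
print(can_start_task(512, [350] * 8, [1024], 4096, count_paused_vms=False))  # True
```

If the client ignored paused VMs (the second call), it could start a GPU task whose CUDA mallocs then fail once memory is actually tight, which would be consistent with intermittent failures on a 4 GB machine.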
I doubt I'll pull the RAM back out to do additional testing, unless I happen to find tons of extra time (and patience) to do so.
Regards,
Jacob
RE: I'm not sure it's that
I was just thinking that BOINC should have recognized the problem and said so, instead of just retrying. Someone else might get frustrated and stop crunching, when it's just a minor problem with some units at some projects. And yes, it could be a long process to figure out what went wrong where, and to track and log it. I guess it could be a project problem too: the application not recognizing, as it starts up, that your GPU has too few resources available to crunch a unit. This may even go back to the previous problem we discussed of how BOINC itself handles GPUs in general. If so, then hopefully a fix is 'in the works'.
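The kind of client-side handling being suggested might look like this sketch; the logic, the `next_action` helper, and the transient-code list are all assumptions for illustration, not actual BOINC code:

```python
# Sketch: after several identical resource-related failures in a row,
# back off and tell the user instead of silently retrying forever.
TRANSIENT_EXIT_CODES = {1002}  # hypothetical: the app's demodulation-failed code

def next_action(recent_exit_codes, threshold=3):
    tail = recent_exit_codes[-threshold:]
    if len(tail) == threshold and all(c in TRANSIENT_EXIT_CODES for c in tail):
        return "back off and notify: repeated resource failure"
    return "retry"

print(next_action([1002, 1002, 1002]))  # enough identical failures: back off
print(next_action([1002, 0, 1002]))     # a success in between: keep retrying
```

Something along these lines would surface a clear message to the user after a streak of identical failures, rather than burning through the work cache.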