I have about 10, sometimes more, Pulsar Search #1 0.23 and Gravitational Wave S6GC search 1.01 (SSE2) tasks paused, while another 6 are running, on this host.
The ATI GPUs, 2x EAH5870, aren't being used at the moment.
Are these WUs meant to run on (NVidia) GPU(s), or is there some other explanation?
Any BOINC messages upon suspending them?
MrS
Scanning for our furry friends since Jan 2002
No, they were just paused: 10 WUs, and it is now crunching another 6.
Some were just running (22%) and some are almost finished (96%)?!
I've seen this before, but over the last few days they've been running alongside Rosetta. Could the high RAM use have some impact on this?
Well, I really haven't a clue...
Ehh, and I forgot to add the use of an NVidia GPU! (Although overwritten by BOINC (6.12.34; 64-bit) on another host.)
It's host 4274431, an i7-2600 with 2 ATI 5870 GPUs. It only does CPU tasks at the moment.
For the tasks that are running, are they running in high priority mode? When panic mode is invoked, BOINC will often suspend the tasks in progress and start a whole bunch of new tasks. It's quite counter-intuitive as the newly started tasks often seem to be in much less deadline stress than the 'closer-to-deadline' suspended tasks. I think BOINC actually does know what it is doing despite the apparent paradox.
If tasks are running in panic mode, you need to find out why.
Cheers,
Gary.
I agree with Gary's diagnosis, but not his prognosis.
Fred, your report reminds me of things I've seen when boincmgr thought I was in deadline trouble on a project. Sometimes it was right, and sometimes it was wrong because something had bumped the completion time estimate up way too high (such as suddenly thinking my host was slow because of one fouled-up result, or thinking I was not going to use it at a decent rate in the near future because of a few days' vacation).
But the really, really strange thing is to see just what you described--task after task started up, usually NOT from the ones needed soonest, and then shortly paused to run yet another. I've seen this get above several dozen, which is pretty odd.
I, personally, have dealt with such episodes with direct intervention, even though I'm usually on the "don't fiddle" side of the endless argument. My preferred intervention in this case is to put the unstarted jobs on the project in trouble on suspend, so that boincmgr can't continue the mad dash through the list. Then I put all save a few of the soonest due on suspend as well, then exit boincmgr and restart, thus avoiding the memory and possibly other overhead of all that paused work.
Oh, yes, I suspend work fetch just in case boincmgr should be tempted to add to the chaos--probably don't need to, but I do.
Sadly, this approach requires some continuing followup attention during recovery, taking some tasks out of suspend enough in advance that it always has work available, still selecting stuff due soon, until the available work as currently estimated by boincmgr is little enough not to pose a threat.
They are indeed running in high priority mode, alongside Rosetta and LHC work. Rosetta work can take more than 700 MByte of RAM per job.
That may have been the cause of the trouble.
There are still 11 Einstein WUs waiting, paused?!
With HT turned on, the host has 8 cores and 8 GByte of DDR3 1333 MHz DRAM. 90% of RAM can be used if the host is idle, but when you use it, memory use has to drop to 60%, as I've set it in my preferences.
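As a rough sketch of the arithmetic behind those settings, the snippet below estimates how many tasks of a given size fit under a memory limit. The function name `tasks_that_fit` is made up for illustration, and the 700 MByte figure is just the Rosetta number mentioned above treated as a flat cost per task. Real tasks vary, and this is not how BOINC itself decides what to run.

```python
def tasks_that_fit(total_ram_mb, allowed_fraction, task_ram_mb):
    """Number of equally sized tasks that fit under the allowed share of RAM."""
    budget_mb = total_ram_mb * allowed_fraction
    return int(budget_mb // task_ram_mb)


TOTAL_RAM_MB = 8 * 1024   # 8 GByte host
ROSETTA_TASK_MB = 700     # assumed flat cost per Rosetta task (illustrative)

# Memory limit while the host is idle (90%) and while it is in use (60%).
print(tasks_that_fit(TOTAL_RAM_MB, 0.90, ROSETTA_TASK_MB))
print(tasks_that_fit(TOTAL_RAM_MB, 0.60, ROSETTA_TASK_MB))
```

Under these assumptions, fewer 700 MByte tasks fit at the 60% limit than the host has logical cores, which would be one way for tasks to end up waiting.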
I've already noticed that copying large files (>1 GByte) between internal (SATA2) disks, or to USB 2.0 or 3.0, crawls when BOINC is running. I use a fast 8 GByte SD card as a ReadyBoost cache, which does help.
I've seen almost the full 8 GByte (7.6 GByte) in use, but only with 4 Rosetta WUs running alongside 4 Einstein WUs.
Maybe you were running out of memory while you were not using the machine (something else besides Windows wanted to squeeze into the 10% left by BOINC)? In that case the computations would start to crawl, which would make them take much longer than estimated and might have pushed your BOINC into panic mode, which, as others have already said, it's not very good at handling.
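To illustrate how slower-than-estimated tasks can tip a queue into deadline trouble, here is a deliberately crude sketch. It is not BOINC's actual scheduler: the function `tasks_missing_deadline`, the `slowdown` factor, and the sample queue are all invented for illustration, and it simply works through the tasks in deadline order with the work shared evenly across cores.

```python
def tasks_missing_deadline(tasks, n_cores, slowdown=1.0):
    """Return names of tasks projected to finish after their deadline.

    tasks: list of (name, estimated_hours_remaining, hours_until_deadline).
    Crude model: tasks are worked through in deadline order and the
    work is spread evenly over n_cores.
    """
    missed = []
    elapsed_hours = 0.0
    for name, remaining, deadline in sorted(tasks, key=lambda t: t[2]):
        elapsed_hours += remaining * slowdown / n_cores
        if elapsed_hours > deadline:
            missed.append(name)
    return missed


# Hypothetical queue: four 6-hour tasks with staggered deadlines.
queue = [
    ("wu_a", 6.0, 24.0),
    ("wu_b", 6.0, 30.0),
    ("wu_c", 6.0, 36.0),
    ("wu_d", 6.0, 48.0),
]

print(tasks_missing_deadline(queue, n_cores=1))                # normal speed
print(tasks_missing_deadline(queue, n_cores=1, slowdown=3.0))  # crawling
```

With the estimates as given, nothing in this made-up queue is late. Once everything runs three times slower, most of the queue is projected to miss its deadline, which is the kind of situation that would trigger high priority mode.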
MrS