Need HELP - getting far too many GPU WUs

Cruncher-American
Joined: 24 Mar 05
Posts: 70
Credit: 4502997165
RAC: 5582536
Topic 221701

I am a Newbie here, was doing SETI when they took it out from under me, so I decided to do Einstein.

I screwed around with Project prefs, using Generic. I changed "Run CPU versions of applications for which GPU versions are available: " from YES to NO, and started getting lots (unbounded, seemingly) of WUs for 2.07. I then changed it back to YES, where it is now, but it's still sending me lots of additional work, in the hundreds now.

WHAT can I do to stop this?

Thanks!

Cruncher-American
Joined: 24 Mar 05
Posts: 70
Credit: 4502997165
RAC: 5582536

I tried aborting the huge numbers of WUs, but they keep getting resent to me as "lost" WUs.

How can I stop this madness?

Must I reset the project? Or what?

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Hi! Welcome.

What are your work quota settings? In BOINC Manager, look under Options - Computing preferences...

"Store at least ___ days of work" and "Store up to an additional ___ days of work"

It would be good to start with extremely low values in there, until the project server learns how your computer will perform. Try something like "0.05".

This doesn't solve that 'lost tasks' problem, but additional forces are arriving to give help with that.

Cruncher-American
Joined: 24 Mar 05
Posts: 70
Credit: 4502997165
RAC: 5582536

@RICHIE - thanks. I am getting overwhelmed with work. The defaults in the Default Project prefs were not touched by me, I think they are 0, 5 for days of work, add'l days of work. Should I change them to 0, 0.5?

I have been running for about 4 days now, successfully (or so I thought!), so the server should know me (?). The time estimates are in the ballpark.

Cruncher-American
Joined: 24 Mar 05
Posts: 70
Credit: 4502997165
RAC: 5582536

@RICHIE - by the way, how do I delete a message or thread? It's not obvious at all

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Jon Ravin wrote:
@RICHIE - by the way, how do I delete a message or thread? It's not obvious at all

It's not possible for a user to delete them. But you could send a private message to a moderator and ask him to delete them. For example Gary Roberts, https://einsteinathome.org/account/88912

But yes, I encourage you to set those default quota settings much lower. In the beginning it can take some time until the task flow settles into a good balance.

You could even set 'No new tasks' for now, until the current tasks in queue are mostly crunched... and then open the flow again.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109397596682
RAC: 35748395

Jon Ravin wrote:
I screwed around with Project prefs, using Generic. I changed "Run CPU versions of applications for which GPU versions are available: " from YES to NO, and started getting lots (unbounded, seemingly) of WUs for 2.07. I then changed it back to YES, where it is now, but it's still sending me lots of additional work, in the hundreds now.

Jon Ravin wrote:
I am getting overwhelmed with work. The defaults in the Default Project prefs were not touched by me, I think they are 0, 5 for days of work, add'l days of work. Should I change them to 0, 0.5?

Hi Jon, Welcome to Einstein!

I've picked a couple of quotes to respond to.  First of all, I'm guessing that the work cache settings of 0, 5 days probably would have been what you were using at Seti.  They are certainly not default values for someone starting at Einstein with no previous history at some other project.  The first value of zero says that your BOINC client should allow your work on hand to fall to zero and at that point should request 5 full days of work in one hit in order to 'fill up to the max'.  For any normal project, that's pretty crazy.  It just means that when you first started (with zero work), your client would have valiantly kept requesting and requesting in an attempt to get to the full 5 days worth.  With your very powerful GPUs, that's a hell of a big bunch of tasks - probably thousands rather than hundreds.
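To put rough numbers on that, here is a minimal sketch of the arithmetic. The host figures (2 GPUs, one task each, a server estimate of 10 minutes per task) are purely hypothetical and not taken from Jon's machines, and the function name is made up for illustration.

```python
import math


def tasks_to_fill_cache(cache_days, est_minutes_per_task, concurrent_tasks):
    """Approximate number of tasks needed to hold `cache_days` of work.

    est_minutes_per_task is the *estimated* run time per task; if the
    estimate is too low, the client will ask for even more tasks.
    """
    minutes_of_work = cache_days * 24 * 60 * concurrent_tasks
    return math.ceil(minutes_of_work / est_minutes_per_task)


# Hypothetical host: 2 GPUs, 1 task per GPU, 10-minute estimate per task.
print(tasks_to_fill_cache(5.0, 10, 2))  # 5-day cache   -> 1440 tasks
print(tasks_to_fill_cache(0.1, 10, 2))  # 0.1-day cache -> 29 tasks
```

The point is only the scale: with fast GPUs and short estimates, a multi-day cache turns into a four-digit task count very quickly.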

If you think you are getting too many tasks, just select 'No new tasks' until you work out the cause.  It's pointless to abort tasks without using that setting because your client will just keep asking for replacements.
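If you prefer the command line to BOINC Manager, the same switch can be flipped with `boinccmd`. The sketch below only builds and runs the command; the project URL is an assumption, so check what your client actually has registered (for example with `boinccmd --get_project_status`). Depending on your install, `boinccmd` may also need to be run from the BOINC data directory or be given the GUI RPC password.

```python
import subprocess

# Assumption: the project URL as your client knows it. Verify it first.
PROJECT_URL = "https://einsteinathome.org/"

# Per-project operations used here; all three are standard boinccmd operations.
ALLOWED_OPERATIONS = {"nomorework", "allowmorework", "update"}


def project_command(operation, project_url=PROJECT_URL):
    """Build the boinccmd argument list for a per-project operation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", project_url, operation]


def run_project_command(operation, project_url=PROJECT_URL):
    """Run the command; raises CalledProcessError if boinccmd fails."""
    return subprocess.run(project_command(operation, project_url), check=True)


# Example (not executed here):
#   run_project_command("nomorework")     # same as 'No new tasks'
#   run_project_command("allowmorework")  # open the flow again
```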

The best work cache settings to use until you get a feel for how things work here is something like 0.1 days for the first setting and zero for the second.  Unless you really want your work on hand to oscillate up and down like crazy all the time, just use the minimum value to be exactly what you want.  Once you understand things, you can easily adjust that first setting upwards to give some protection against a work outage - something that's quite rare at this project.  You should be careful about going too high - the deadline is just 7 days for the GW GPU tasks.
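For anyone who sets preferences locally rather than on the website, the two cache values correspond to `work_buf_min_days` and `work_buf_additional_days` in the client's `global_prefs_override.xml`. Here is a rough sketch that builds such a fragment and refuses values that would exceed the 7-day GW GPU deadline mentioned above. The helper name is made up, and the script only prints the XML; where the file lives depends on your installation.

```python
import xml.etree.ElementTree as ET

GW_GPU_DEADLINE_DAYS = 7.0  # deadline quoted in this thread for GW GPU tasks


def cache_override_xml(min_days, additional_days):
    """Return a global_prefs_override.xml fragment for the work cache."""
    if min_days < 0 or additional_days < 0:
        raise ValueError("cache settings cannot be negative")
    if min_days + additional_days >= GW_GPU_DEADLINE_DAYS:
        raise ValueError("cache would be as long as the task deadline")
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


# The conservative starting point suggested above: 0.1 days plus zero extra.
print(cache_override_xml(0.1, 0.0))
```

After saving the file, the client has to re-read local preferences (or be restarted) before the new values take effect.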

There are lots of additional complications besides work cache size.  I'm going to attempt to explain those by creating a "Guide for Seti Refugees" sticky thread in the Getting Started forum.  There's quite a bit to cover but hopefully it will be posted within the next 12 hours or so.

As far as deleting messages is concerned, there has to be a really good reason for that to happen.  There is an 'edit window' (an hour or two after posting) if you want to get rid of something you regret posting.  In this particular case, I don't see a good reason for deleting anything in this thread so far.  The 'fault' is not yours - you appear to be a 'victim of circumstances' that is probably happening to many others as well.  If anything, the 'fault' belongs to the project because the estimates for the two main GPU searches create a particular problem that can severely bite the unsuspecting new recruit.  More about that in the above-mentioned "Guide".

So, this thread will remain for others to learn from - by seeing the problem and what was recommended to fix it.

Cheers,
Gary.

Cruncher-American
Joined: 24 Mar 05
Posts: 70
Credit: 4502997165
RAC: 5582536

@Gary - thanks for your lengthy comment; I changed the numbers to 0.2, 0.1 three days ago and things appear to have settled down - I have a few tasks (got the name straight now - NOT WUs!) waiting, about 20-40 on each machine; I can handle that.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109397596682
RAC: 35748395

Jon Ravin wrote:
... things appear to have settled down - I have a few tasks (got the name straight now - NOT WUs!)

I'm very happy you got it under control.  Thanks for letting us know!

You can call them what you want - it'll be understood.  In the state file the entry for a task is called a 'result' - when a task is crunched, information about what happened is inserted into the result block and returned.  A WU (workunit) is a group of identical tasks, sent to different computers, sometimes referred to as a 'quorum'. 

When you look at your hosts on the website, you see two links - one to 'details' and the other to 'tasks'.  If you look at your tasks, you see information about those particular copies of WUs that have been sent to you.  The WU is really a collective term for all the identical copies that exist - both the ones that succeed and the ones that fail.  The tasks list on the website gives you quick links to both tasks, so you can see the stderr.txt log for a task, and WUs, so you can see what happened to every task making up a quorum.
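If it helps to see the relationship spelled out, here is a toy model of it. The class and field names are invented for illustration and are not the actual BOINC schema; the only point is that one WU owns several task copies, each sent to a different host with its own outcome.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One copy of a workunit, sent to one host (a 'result' in the state file)."""
    host: str
    outcome: str = "in progress"  # later e.g. "success" or "error"


@dataclass
class Workunit:
    """The collective term: all identical copies, successful or not."""
    name: str
    tasks: list = field(default_factory=list)

    def send_copy(self, host):
        task = Task(host=host)
        self.tasks.append(task)
        return task

    def successes(self):
        return [t for t in self.tasks if t.outcome == "success"]


wu = Workunit("example_wu")           # hypothetical name
mine = wu.send_copy("my_host")        # the copy listed under *my* tasks
other = wu.send_copy("another_host")  # the copy someone else received
mine.outcome = "success"
other.outcome = "error"
print(len(wu.tasks), len(wu.successes()))  # prints: 2 1
```

So a host's tasks list shows only the copies sent to that host, while the WU page shows every copy in the group.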

Cheers,
Gary.
