No Valid Results in over a week

keputnam
Joined: 18 Jan 05
Posts: 47
Credit: 84110314
RAC: 0
Topic 223988

I recently got a new video card and added "Use AMD GPU" to my profile.

Since then, not a single

Gravitational Wave search O2 Multi-Directional GPU v2.07 () windows_x86_64

job has completed successfully.

They run to 48000 seconds of elapsed time (give or take 5 seconds) and then end with an unspecified "error while computing". I've looked at the logs, but nothing jumps out at me.

I have deselected that option for now.

Also, the project has sent me a TON of work that will never be completed by the deadline. I currently have 583 tasks pending, most of which will be cancelled by the server later today.

Any help greatly appreciated.

(Apologies if this comes up as a duplicate post. I posted the same comments a while ago, but don't see them on the boards.)



Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Hi!

Neither of your GPUs will be able to run tasks from 'Gravitational Wave search O2 Multi-Directional GPU' successfully.

"AMD Radeon HD 7800 Series (2048MB)" is not compatible with the GW GPU app at all. The chip is too old.
In addition to that, both of your GPUs have 2GB of memory. That is currently not enough memory for a GW GPU task.

You should direct both of your GPUs to run tasks from only 'Gamma-ray pulsar binary search #1 on GPUs'. Both of the GPUs should be able to run that work just fine.

keputnam

Thanks.

I've made the change.

I guess I should just go ahead and abort the others, since they will either time out or end with errors.


Richie

keputnam wrote:
I guess I should just go ahead and abort the others. since they will either time out or end with errors 

That would be alright. Cancelled tasks will disappear relatively quickly from your task list. In the future it would always be a good thing to set a somewhat lower work cache before trying out a new type of work (or a new type of hardware). But no harm done. It happens. I wish you a nice November from this side of the planet.

keputnam

The scheduler did it again.

Today I have downloaded over 800 WUs with an estimated 3 hr run time.

The deadline is 12/27. There is no way in HELL that I will complete all of those within the deadline.


WTF?


San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10209623455
RAC: 22866833

keputnam wrote:

...  downloaded over 800 WUs ...

My two cents (you probably know all the following, but I'll try anyway):

You have never mentioned the size of your caches.

I mean the settings in BOINC Manager under "Options", then under "Computing preferences ...", then check what is set in the section "Other" on the line "Store at least x" and on the line "Store up to ... x".

Try setting 0.01 for both, click "Save", and then under the same "Options" tab click on "Read local prefs file".

Maybe this will stop the wild over-downloading of tasks.

You can then slowly increase the cache size till you're happy with the amount of downloading.

 

Have a nice Sunday.

keputnam

My cache is 1.0/.5, which works fine with all the other projects I run.

I'll try cutting that in half and see what happens.

Thanks for the suggestion.

 


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117671965987
RAC: 35161739

keputnam wrote:
The scheduler did it again

No, it didn't.

keputnam wrote:
I have, today, downloaded over 800 WUs with an estimated 3hr run time

Either you didn't have a cache size of 1.0/0.5 at the time the work request was made, Or the estimate was NOT 3 hrs at the time of the work request (but much, much smaller), Or perhaps there is some sort of impact from the fact that you have 2 very dissimilar GPUs installed and BOINC seems confused about which is the better performer.

keputnam wrote:
There is no way in HELL that I will complete all of those within deadline.

Actually, you probably could if you properly configured your machine, but that would probably involve putting other projects on hold and I'm not suggesting you do that.

Please understand that the scheduler just responds to the request it is given and (if it can) delivers exactly what was asked for and nothing more.  If you consider 800 tasks to be manifestly excessive, it's up to you to figure out why your client made that excessive request.

Here are a couple more things for you to consider.  Your machine shows as having both an nvidia GT 1030 and an AMD 7800 series - perhaps 7850 or 7870.  Whatever it is, it would be more powerful than a GT 1030.  If you had bothered to check a random task that took 3 hours or if you had even just looked at BOINC Manager to see what app was running, you could have seen this

Gamma-ray pulsar binary search #1 on GPUs v1.22 (FGRPopencl-nvidia)

which shows that BOINC was using just the much poorer performing nvidia GPU and NOT your much better AMD GPU.

BOINC is supposed to detect and use the most capable GPU when you have a mixture like this.  For some reason it seems to be identifying the wrong one.  This is nothing to do with the Einstein scheduler.  It believes what BOINC tells it.  If you check BOINC configuration options for <use_all_gpus> you will see how to configure BOINC to use both.  There is also an <exclude_gpu> option where you could exclude the weaker GPU if you wished.  You do need to read the documentation before making complaints that are incorrect.
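For anyone who wants to see roughly what those two options look like, here is a minimal sketch that builds a cc_config.xml fragment with Python's standard library. Both options are read by the BOINC client from cc_config.xml in the BOINC data directory (not from app_config.xml). The function name `build_cc_config` and the project URL are placeholders made up for this illustration; the URL must be replaced with the project URL exactly as your own client shows it, and the `<type>` value should be checked against the BOINC client configuration documentation before use.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder URL for illustration only. Use the project URL
# exactly as your BOINC client lists it.
PROJECT_URL = "https://example.org/your_project/"


def build_cc_config(project_url, use_all_gpus=True, exclude_type=None):
    """Build a <cc_config> tree with <use_all_gpus> and, optionally,
    one <exclude_gpu> entry for the given project and GPU type."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    if exclude_type is not None:
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = project_url
        # GPU vendor type to keep away from this project, e.g. "NVIDIA" or "ATI"
        ET.SubElement(exclude, "type").text = exclude_type
    return root


root = build_cc_config(PROJECT_URL, exclude_type="NVIDIA")
if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))
```

The printed XML would go into cc_config.xml, after which the client has to re-read its config files (or be restarted) before the change takes effect. Passing `exclude_type="ATI"` instead would keep the AMD card away from this one project while leaving it free for others.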

To give you some idea of the performance difference, a HD 7850 GPU can crunch a GRP task in about 20 mins.  That's a hell of a lot better than 3 hrs.  At 20 mins, a 14 day deadline actually represents a total of 1008 tasks which is why (tongue in cheek) I mentioned that it would be possible to get rid of 800 without aborting them.
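The arithmetic behind that 1008 figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a single GPU crunching one task at a time, non-stop, with no other projects sharing it. The function name is made up for the example.

```python
def tasks_within_deadline(deadline_days, minutes_per_task):
    """Upper bound on tasks one GPU can finish before the deadline,
    assuming it runs one task at a time without interruption."""
    total_minutes = deadline_days * 24 * 60
    return int(total_minutes // minutes_per_task)


print(tasks_within_deadline(14, 20))   # HD 7850 at ~20 min per task -> 1008
print(tasks_within_deadline(14, 180))  # GT 1030 at ~3 hrs per task  -> 112
```

The same 14 days that would cover 1008 tasks at 20 minutes each only covers 112 tasks at 3 hours each, which is why 800 tasks looks impossible on the GT 1030.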

A final comment about your 1.0/0.5 day combination.  This means that initially BOINC will 'fill up' with 1.5 days' worth and then won't ask again until the remainder is less than 1 day.  Then it will take a big drink - back to 1.5 days total.  If the task estimates are way too low at that moment, that big drink could turn out to be an excessive number of tasks later, once the estimate is corrected. I always think it is safer to pick a suitable size for just the first setting (with the second at zero), which helps avoid any possibility of a 'big drink' event.
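The 'big drink' effect can be illustrated with a deliberately simplified model. This is not BOINC's actual work-fetch code: it assumes one GPU, one project, and that the client converts the shortfall in days into a task count using the current per-task estimate. The function name and the 180-second estimate are hypothetical numbers chosen for the illustration, not values taken from this host.

```python
SECONDS_PER_DAY = 86_400


def tasks_requested(buffered_days, min_days, extra_days, est_seconds_per_task):
    """Tasks fetched in one refill under the simplified model:
    nothing is requested until the buffer drops below min_days,
    then it is topped up to min_days + extra_days."""
    if buffered_days >= min_days:
        return 0  # low-water mark not reached yet, so no request
    shortfall_days = (min_days + extra_days) - buffered_days
    return round(shortfall_days * SECONDS_PER_DAY / est_seconds_per_task)


# Hypothetical: the estimate is a badly wrong 180 s instead of ~3 hrs.
# 1.0/0.5 setting, buffer has just dipped below 1 day: one large refill.
print(tasks_requested(0.99, 1.0, 0.5, 180))  # 245
# 1.5/0 setting, buffer has just dipped below 1.5 days: a small top-up.
print(tasks_requested(1.49, 1.5, 0.0, 180))  # 5
```

Both settings hold 1.5 days in total, but with the second value at zero each request is small, so a wrong estimate has a chance to be corrected before much work has been fetched.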

Cheers,
Gary.

keputnam

"Either you didn't have a cache size of 1.0/0.5 at the time the work request was made, Or the estimate was NOT 3 hrs"

Both are incorrect statements. Of course I can't prove it, but they are.

I don't want to use the ATI card here, because one of my other projects is using it, quite happily.

I have "NO AMD" set in preferences on the project page, and have had since I added the NVIDIA card. I tried adding the exclude_gpu option to app_config.xml, but for some reason I can't make it work. I'll try again in the morning.

BOINC is running FGRPopencl-nvidia.

I'll try setting the cache to .5/0, but I won't see any real effect until the project cancels the ones that won't complete before the 12/27 deadline.

I apparently nickel-and-dimed myself with that NVIDIA GPU, and plan on "gifting" myself a much more powerful one for Christmas.


Gary Roberts

keputnam wrote:

"Either you didn't have a cache size of 1.0/0.5 at the time the work request was made, Or the estimate was NOT 3 hrs"

 

Both are incorrect statements   Of course I can't prove it, but they are

Unfortunately, those two alternatives I suggested are valid descriptions of possible ways you could have been sent 800 tasks.  There is nothing "incorrect" about them.  It's up to you to work out if either was applicable to your problem.

What you don't seem to understand is that the scheduler cannot send you work you didn't ask for.  Your client must have asked for that amount of work for some reason.  Period.

It's really in your best interests to work out why this happened so that you don't get caught again in the future.  It could even be something really trivial like missing a decimal point - eg. instead of 1.0/0.5 you might have had 10/0.5.  I'm NOT saying this was the case but I am saying you should stop blaming the scheduler and try to find the real reason.

Ultimately, a full cache setting of 1.5 days should be fine if that's what you'd like to have.  Just set it at 1.5/0 rather than 1.0/0.5 if you want to avoid sudden bursts of downloading new work at odd times.  If you're fiddling with cache sizes, don't make changes in big steps.  Much safer to make a partial change and observe what happens.

keputnam wrote:
...  but won't see any real effect til the project cancels the ones that won't complete before deadline  12/27

The project doesn't cancel tasks - you should abort the excess so that you don't force your 'wingmen' to wait till possibly the new year for replacement tasks (issued after current deadlines expire) to eventually be returned.  If you are allowing the GT 1030 to crunch at ~3 hrs a task, you could finish around 8 tasks per day for the next 12 days, so say 100 tasks max.  If you have ~800 left, just choose the ~700 tasks with the earliest deadlines and abort them so that they can be reissued ASAP.  If your 'Chrissie Pressie' new GPU is nvidia, and you could install it soonish, you might get through the whole bundle, if you're quick :-).
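As a rough sketch of that keep/abort split: the snippet below works out how many tasks fit before the deadline and marks the earliest-deadline remainder for aborting. The task names and deadlines are invented for the example, and the function is only an illustration of the arithmetic; the actual aborting would still be done by hand in BOINC Manager.

```python
def split_keep_abort(tasks, days_left, hours_per_task):
    """tasks: list of (name, deadline_in_hours_from_now) tuples.
    Returns (keep, abort), aborting the earliest deadlines first."""
    capacity = int(days_left * 24 // hours_per_task)  # tasks that can still finish
    by_deadline = sorted(tasks, key=lambda task: task[1])
    n_abort = max(0, len(by_deadline) - capacity)
    return by_deadline[n_abort:], by_deadline[:n_abort]


# Hypothetical queue of 800 tasks, all due in roughly 12 days (288 hours)
queue = [(f"task_{i}", 288 + i * 0.01) for i in range(800)]
keep, abort = split_keep_abort(queue, days_left=12, hours_per_task=3)
print(len(keep), len(abort))  # 96 704
```

At 3 hours per task, 12 days gives room for 96 tasks, which matches the "say 100 tasks max" estimate and leaves about 700 to abort.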

What sort of GPU were you thinking of getting?  If you quote the model, people would quickly tell you the expected output in tasks per day.

Cheers,
Gary.

keputnam

"The project doesn't cancel tasks - you should abort the excess so that you don't force your 'wingmen' to wait till possibly the new year for replacement tasks (issued after current deadlines expire) "

 

Last time I tried that, I kept getting replacement WUs. Guess I could set No New Work until the new card is installed.

For my new NVIDIA card I was looking at a GTX 1050 Ti 4GB.

