Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062252642
RAC: 65313307

my test bench is using a vanilla version of the BOINC client that I compiled from the latest source (7.17.0). the only thing "edited" is the coproc file, and that was done for testing purposes (as I mentioned, no GPU <3GB), using YOUR method. you don't trust your own methods? not that the results would be any different with another working client.

nice try deflecting though lol. it's hysterical to see you try this hard to disagree with me when we are saying the same things LOL. do you have anything other than strawman arguments?

if you didn't catch the estimates that were pushed for tasks that ended up using more than 3GB, then you need to keep watching for them, which is what I'm doing. it's only the test bench running 1 GPU with 1 WU and a resource share of 0, so it only has 1 WU at a time on the host. that makes it easy to cross reference that the WU being processed is the same WU as in the log, since the log seems to get wiped on every connection. just yesterday they were sending out those big tasks that failed on my 3GB 1060. today they seem to be only sending out the <2GB ones. ah well. all I can do is wait I suppose.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062252642
RAC: 65313307

here we go, this is the kind of proof we're looking for. my hunch was right. the scheduler thinks the task only needs ~1800MB GPU ram, but this is one that actually tries to use like ~3200. either the scheduler is hard coded for this task type, or there's some bug in the code that estimates how much it needs.

screenshot proof with the scheduler log entry, nvidia-smi output showing full memory on a 3GB card, and corresponding WU task name shown. can be no doubt this is what is happening.
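
For anyone who wants to repeat this cross-check without screenshots, here is a minimal Python sketch. It assumes an NVIDIA card with `nvidia-smi` on the PATH, and it assumes you copy the scheduler's memory estimate out of the log by hand. The function names and the 200 MB tolerance are made up for illustration and are not part of BOINC or the project's scheduler.

```python
import subprocess


def parse_smi_memory(csv_line):
    """Parse one 'used, total' line (values in MiB) from nvidia-smi CSV output."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total


def read_gpu_memory(gpu_index=0):
    """Ask nvidia-smi for used and total memory (MiB) on one GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_smi_memory(out.strip().splitlines()[0])


def underestimated(estimate_mb, used_mb, tolerance_mb=200):
    """True when observed usage exceeds the estimate by more than the tolerance.

    tolerance_mb is an arbitrary illustrative margin, not a project value.
    """
    return used_mb > estimate_mb + tolerance_mb


# Example use while a single GW task is running on GPU 0:
#   used, total = read_gpu_memory(0)
#   print(used, total, underestimated(1800, used))
```

With the rough figures from this post (an estimate of ~1800MB against ~3200MB actually used), `underestimated(1800, 3200)` returns `True`.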


_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519320352
RAC: 14087

Ian&Steve C. wrote:
here we go, this is the kind of proof we're looking for. my hunch was right. the scheduler thinks the task only needs ~1800MB GPU ram, but this is one that actually tries to use like ~3200. either the scheduler is hard coded for this task type, or there's some bug in the code that estimates how much it needs. screenshot proof with the scheduler log entry, nvidia-smi output showing full memory on a 3GB card, and corresponding WU task name shown. can be no doubt this is what is happening.

Excellent work. Please post this somewhere as a bug report; I don't think there are any tech guys in this thread.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062252642
RAC: 65313307

i already posted the info for "the people who matter" (Bernd) in the technical news forum in the relevant GW thread where he was talking about the scheduler. hopefully he sees it there.

someone could PM him directly, but I think he'll probably see it sooner or later since he was posting in that thread.

_________________________________________________________________________

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592299016
RAC: 771430

I have a question about what is better for the project. I have a host with 2 GTX1060 3GB cards. Obviously all the high freq tasks bomb out in very short order, the rest process with no problem. It doesn't bother me because little time is wasted and a lot of good work is being done. 

Is the project better off with this host doing what it can or should I exclude that host from this search so the servers don't have to deal with the carnage? 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062252642
RAC: 65313307

personally I would just exclude that card from the GW search. on the tasks that failed to run on my 3GB 1060, it would sit there cycling over and over, not really doing anything, until it hit the timeout, all the while reporting 100%. very odd behavior. I think it's better for the project to not have to deal with all the resends, which might go out to another 3GB GPU and have the same problem all over again, then be resent again. the scheduler doesn't do a proper job of estimating the GPU memory required, as outlined in my previous post.

if you remove your 3GB card from the GW search, it's just one less card causing resends of failed tasks. there seem to be a lot of them. nearly all of the GW tasks i've processed recently have been resends.

_________________________________________________________________________

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592299016
RAC: 771430

That's one viewpoint and may very well be valid, but I want to hear what the project wants.

Edit: I looked, and that host shows 431 valid tasks, 217 errors, and 10 pending, which will all validate.
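
To put those counts in proportion, here is a quick Python sketch of the arithmetic. The helper name is hypothetical, and counting pending tasks in the total is just one reasonable choice.

```python
def error_rate(valid, errors, pending=0):
    """Fraction of listed tasks that errored out; pending tasks count toward the total."""
    total = valid + errors + pending
    return errors / total if total else 0.0


# Counts quoted above for the host with two 3GB GTX 1060 cards.
rate = error_rate(valid=431, errors=217, pending=10)
print(f"{rate:.1%}")  # prints 33.0%
```

So roughly one task in three on that host ends in an error.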

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062252642
RAC: 65313307

erroring out 1/3 of your tasks seems like a bit much. hosts like that are probably the reason there are so many GW tasks needing validation and so many resends. this inevitably delays the science results from getting back to the project.

but if you want to hear from someone actually at the project, you might need to reach out to them directly. apparently they don't post on the user forums too often.

_________________________________________________________________________

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

A GW GPU task has started on my GTX 1060 with its 3 GB of video RAM and looks OK in GPU-Z as well. Memory used is 1962 MB. The task has completed and is pending. 11 GB of system memory was used.

Tullio

Another task has completed and a third is running. Yesterday I had a cumulative Windows 1903 update on my PC, home edition.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Six of my GW tasks are completed and waiting for validation. Their wingman uses a GTX 750 Ti board with 2 GB video RAM. Since GPU-Z says that the project uses 1962 MB, no wonder they all fail on the 750 Ti board.

Tullio
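
The headroom reasoning in the post above can be sketched in a few lines of Python. This assumes "2 GB" means 2048 MB and that the GPU-Z figure uses the same units. The 200 MB reserve for the display and driver is an arbitrary illustrative number, not something measured or published by the project, and both function names are made up.

```python
def headroom_mb(card_mb, task_mb):
    """Memory left on the card once the task's footprint is subtracted."""
    return card_mb - task_mb


def likely_fits(card_mb, task_mb, reserve_mb=200):
    """Rough check: does the task leave at least reserve_mb free?

    reserve_mb is an assumed allowance for the desktop and driver.
    """
    return headroom_mb(card_mb, task_mb) >= reserve_mb


print(headroom_mb(2048, 1962))  # 86
print(likely_fits(2048, 1962))  # False: the 2 GB card has almost nothing to spare
print(likely_fits(3072, 1962))  # True: the 3 GB card has room for this task
```

Under these assumptions the 1962 MB task leaves only 86 MB on a 2 GB card, which is consistent with the wingman's failures but does not prove their cause.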
