My test bench is using a vanilla version of the BOINC client that I compiled from the latest source (7.17.0). The only thing "edited" is the coproc file, and that was done for testing purposes (as I mentioned, no GPU <3 GB), using YOUR method. You don't trust your own methods? Not that the results would be any different with another working client.
Nice try deflecting though lol. It's hysterical to see you try this hard to disagree with me when we are saying the same things LOL. Do you have anything other than strawman arguments?
If you didn't catch the estimates that were pushed for tasks that ended up using more than 3 GB, then you need to keep watching for them, which is what I'm doing. It's only the test bench running one GPU with one WU at a resource share of 0, so the host only has one WU at a time. That makes it easy to cross-reference the WU being processed against the WU in the log, since the log seems to get wiped on every connection. Just yesterday they were sending out those big tasks that failed on my 3 GB 1060; today they seem to be sending out only the <2 GB ones. Ah well, all I can do is wait I suppose.
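For anyone wondering what the "edited coproc file" amounts to: it means changing the GPU properties the client reports to the scheduler in coproc_info.xml. A rough sketch of the relevant fragment follows; the element names are recalled from the BOINC client source and should be checked against your own file, and the byte value shown is 3 GiB:

```xml
<coproc_cuda>
    <name>GeForce GTX 1060 3GB</name>
    <!-- totalGlobalMem is reported in bytes; 3221225472 = 3 GiB.
         Changing this value changes how much VRAM the scheduler
         believes the card has when matching tasks to it. -->
    <totalGlobalMem>3221225472</totalGlobalMem>
</coproc_cuda>
```

Note that the client may regenerate this file on restart, so the edit typically has to be reapplied (or the file made read-only) for testing.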
Here we go, this is the kind of proof we're looking for. My hunch was right: the scheduler thinks the task only needs ~1800 MB of GPU RAM, but this is one that actually tries to use ~3200 MB. Either the scheduler is hard-coded for this task type, or there's a bug in the code that estimates how much memory a task needs.
Screenshot proof with the scheduler log entry, nvidia-smi output showing full memory use on a 3 GB card, and the corresponding WU task name shown. There can be no doubt this is what is happening.
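To make the mismatch concrete, here is a trivial sketch of the fit check the scheduler would need to get right. The figures are taken from the log above; the function name and signature are my own illustration, not BOINC code:

```python
def fits_on_gpu(task_needs_mb: int, gpu_total_mb: int) -> bool:
    """Would this task fit in the GPU memory the card actually has?"""
    return task_needs_mb <= gpu_total_mb

# The scheduler estimated ~1800 MB, so it happily sends
# the task to a 3 GB (3072 MB) card...
print(fits_on_gpu(1800, 3072))   # True -> task gets scheduled

# ...but the task really tries to allocate ~3200 MB at runtime:
print(fits_on_gpu(3200, 3072))   # False -> allocation fails on a 3 GB card
```

The decision is only as good as the estimate fed into it, which is exactly where the log shows it going wrong.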
Excellent work. Please post this somewhere as a bug report; I don't think there are any tech guys in this thread.
If this page takes an hour to load, reduce posts per page to 20 in your settings so the tinpot 486 Einstein uses can handle it.
I already posted the info for "the people who matter" (Bernd) in the Technical News forum, in the relevant GW thread where he was talking about the scheduler. Hopefully he sees it there.
Someone could PM him directly, but I think he'll probably see it sooner or later since he was posting in that thread.
I have a question about what is better for the project. I have a host with two GTX 1060 3GB cards. Obviously all the high-frequency tasks bomb out in very short order; the rest process with no problem. It doesn't bother me, because little time is wasted and a lot of good work is being done.
Is the project better off with this host doing what it can, or should I exclude it from this search so the servers don't have to deal with the carnage?
Personally I would just exclude that card from the GW search. The tasks that failed to run on my 3 GB 1060 would sit there cycling over and over, not really doing anything until they hit the timeout, all the while reporting 100% progress. Very odd behavior. I think it's better for the project not to have to deal with all the resends, which might go out to another 3 GB GPU, hit the same problem all over again, and then be resent again. The scheduler doesn't do a proper job of estimating the GPU memory required, as outlined in my previous post.
If you remove your 3 GB card from the GW search, that's just one less card causing resends of failed tasks. There seem to be a lot of them; nearly all of the GW tasks I've processed recently have been resends.
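Excluding a card from one application can be done with an `<exclude_gpu>` entry in the client's cc_config.xml; this is standard BOINC client configuration, but the app name below is an assumption for the GW search and should be checked against the application list on the project site:

```xml
<cc_config>
  <options>
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>  <!-- use the exact project URL from your client -->
      <device_num>0</device_num>              <!-- which GPU to exclude -->
      <type>NVIDIA</type>
      <app>einstein_O2MD1</app>               <!-- assumed GW app name; verify before using -->
    </exclude_gpu>
  </options>
</cc_config>
```

Restart the client (or tell it to re-read the config files) for the exclusion to take effect. Alternatively, deselecting the GW search in the project's web preferences for that host's venue achieves much the same thing.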
That's one viewpoint and may very well be valid, but I want to hear what the project wants.
Edit: I looked, and that host has 431 valid tasks showing, 217 errors showing, and 10 pending which will all validate.
Erroring out 1/3 of your tasks seems like a bit much. Hosts like that are probably the reason there are so many GW tasks needing validation and so many resends, which inevitably delays the science results getting back to the project.
But if you want to hear from someone actually at the project, you might need to reach out to them directly; apparently they don't post on the user forums too often.
A GW GPU task has started on my GTX 1060 with its 3 GB of video RAM, and it looks OK in GPU-Z too. Memory used is 1962 MB. The task has completed and is pending. 11 GB of system memory used.
Tullio
Another task has completed and a third is running. Yesterday I had a cumulative Windows 1903 update on my PC, Home edition.
Six of my GW tasks are completed and waiting for validation. Their wingman uses a GTX 750 Ti board with 2 GB of video RAM. Since GPU-Z says the project uses 1962 MB, no wonder they all fail on the 750 Ti.
Tullio
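A quick headroom calculation shows why: on paper 1962 MB fits in 2048 MB, but some VRAM is always reserved by the driver and the desktop. The 100 MB reservation below is an assumption for illustration; the real figure varies by system and is often larger on a card driving a display:

```python
card_total_mb = 2048        # GTX 750 Ti: 2 GB of VRAM
task_needs_mb = 1962        # GW task footprint reported by GPU-Z
reserved_mb = 100           # assumed driver/desktop reservation; varies by system
usable_mb = card_total_mb - reserved_mb   # 1948 MB actually available
print(task_needs_mb > usable_mb)          # True -> the task cannot fit
```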