Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Since replacing the GTX 1050 Ti with a GTX 1650 I have not had a single failure, and I have many pending jobs. I had installed the GTX 1050 Ti because it had a DisplayPort connector and my terminal had a VGA monitor cable. I could convert VGA to DisplayPort, but not to the HDMI connector of the GTX 1650 board. So I bought a new monitor with an HDMI cable, and now everything works fine.

Tullio

[TA]Assimilator1
Joined: 22 Jan 05
Posts: 12
Credit: 189971366
RAC: 8560

My 2nd rig has errored all of its Gravitational Wave search O2 Multi-Directional GPU v2.07 () windows_x86_6 WUs (4 before I stopped it). They were taking ages, and then I got this: "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED". Bear in mind that my 2nd rig's GPU has nearly double the DP power of my main rig's, and the main rig is not having problems. The GPU driver was 19.9.1; I updated it to 20.4.2 and saw no change.

Seeing as updating the driver made no difference, until/if I can fix this, I'm going to disable that app as it's a total waste of time & power atm for that rig :(.

Any ideas anyone?
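
For anyone comparing exit statuses across failed tasks, here is a minimal Python sketch that pulls the decimal code, the hex code and the symbolic name out of a line like the one quoted above and checks that the two numbers agree. The function name `parse_exit_status` and the regular expression are illustrative assumptions, not part of BOINC; the only status line used is the one from this post.

```python
import re

# Matches text such as: 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
# (pattern is an assumption based on the single line quoted in this thread)
_EXIT_LINE = re.compile(
    r"(?P<dec>\d+)\s*\((?P<hex>0x[0-9A-Fa-f]+)\)\s*(?P<name>[A-Z_]+)"
)


def parse_exit_status(text):
    """Return (code, name) from an exit-status line, or None if absent.

    Raises ValueError if the decimal and hex values disagree, which
    would suggest the line was garbled when copied.
    """
    match = _EXIT_LINE.search(text)
    if match is None:
        return None
    dec_value = int(match.group("dec"))
    hex_value = int(match.group("hex"), 16)  # int() accepts the 0x prefix for base 16
    if dec_value != hex_value:
        raise ValueError(f"decimal {dec_value} != hex {hex_value}")
    return dec_value, match.group("name")


if __name__ == "__main__":
    line = 'then I got this "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED".'
    print(parse_exit_status(line))
```

The check confirms that 197 and 0x000000C5 are the same number, so the status reported is a time-limit abort and not a crash inside the app itself.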

Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RTX 3060Ti 8GB, Win10 64bit

2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win 7 64bit

Ken_g6 - TeAm Anandtech
Joined: 18 Apr 05
Posts: 2
Credit: 33303618
RAC: 7561

I had a weird experience with these. Linux with a 1060, on a pre-400 driver. I downloaded a bunch of GWs and they started running fine, then I got a bunch of pulsar work. I was curious how the pulsar work would run, so I suspended the GWs that hadn't started yet. The pulsar work also went fine, 2 at a time via an app_config.xml file. But when those finished and it went back to the GWs, nothing worked! I wonder if it is related to my driver?
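
For readers wondering what "2 at a time via an app_config.xml file" looks like, here is a rough Python sketch that writes such a file. It is not the poster's actual file. The app name `hsgamma_FGRPB1G` and the `cpu_usage` value are assumptions, so check the real app name in your client_state.xml before using anything like this. The helper `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Build app_config.xml text that runs `tasks_per_gpu` tasks per GPU.

    BOINC schedules GPU tasks by fractional usage, so 2 concurrent
    tasks means each task claims 0.5 of a GPU.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1.0 / tasks_per_gpu:g}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_per_task:g}"

    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two gamma-ray pulsar tasks per GPU, one CPU core reserved per task.
    print(build_app_config("hsgamma_FGRPB1G", tasks_per_gpu=2))
```

The output would go in the project's folder under the BOINC data directory, and the client has to re-read its config files (or be restarted) before the setting takes effect.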

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117791868571
RAC: 34678781

Ken_g6 - TeAm Anandtech wrote:
I wonder if it is related to my driver?

No, it's probably not. You have 3 computers listed, all with GTX 1060 3GB GPUs. Some tasks in the current VelaJr series require a lot of memory and will fail on 3GB cards. Others will be quite OK.

This problem has been raised quite a few times in recent weeks by quite a few different people with low memory cards.  3GB 1060s have been quite common in the mix.

If you take a look at this response I gave to an example of this problem, you'll find information about how to tell which tasks will be able to crunch fine and which ones will run out of memory and fail.  The example links I gave in that response will probably not work because the results pointed to will have been removed from the online database - unless you're very lucky :-).

I had a look at one of your hosts (the one with the highest RAC) and saw successfully crunched GW tasks and just two compute errors.  The information in the above link will tell you why those two needed more memory than your card had.  The remainder of the GW tasks on that host were aborted.  If you look at that full list, most of them would have been able to be crunched successfully.  Just 4 would have failed.

However, unless you're prepared to examine tasks as they are sent and abort those that will fail, your best course of action is to run only the gamma-ray pulsar search until the memory hungry VelaJr tasks have finished.  Even then, I'm sure there will be repeat sessions with these tasks in the future when the current batch is finished.  We have yet to start on LIGO observation run 3 (O3) data.  The current work is still O2.

Cheers,
Gary.

[TA]Assimilator1
Joined: 22 Jan 05
Posts: 12
Credit: 189971366
RAC: 8560

Err, so I assume that applies to my AMD HD 7870 XT too then? (It is 3GB, not 2 GB as these normally are, it's a dev sample version).

And I assume they take so long because the gfx card is having to swap the work out to system RAM? [edit] Maybe not, apparently E@H wasn't using UMA. Is it now?

What's velajr? [update] Seems to be a type of WU
(I'll read your link now)

Looking at an errored task of mine (not an aborted one), I saw nothing about running out of memory. I did see this :-
Error occured on Thursday, July 2, 2020 at 18:43:03.

E:\DC\BOINC\Data\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.07_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location fd910932 in module C:\Windows\system32\KERNELBASE.dll.

FD910932  C:\Windows\system32\KERNELBASE.dll:FD910932  DebugBreak

Whatever that means....
That WU does have a DF of 0.9Hz though. The other 2 WUs that finished at 21,16xs also have that exact same error.
The one that finished at 19,821s errored after I installed a newer driver, so I assume the driver was responsible. Its error was:

E:\DC\BOINC\Data\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.07_windows_x86_64__GW-opencl-ati.exe caused an Access Violation at location dbbb8e57 in module C:\Windows\system32\amdocl64.dll Writing to location 00000000.

Gary - would you rather I posted the above in the thread you linked?

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I have downloaded a bunch of GW GPU tasks after crunching many gamma-ray binary pulsar search tasks. They seem to work OK, fingers crossed.

Tullio 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117791868571
RAC: 34678781

[TA]Assimilator1 wrote:
Err, so I assume that applies to my AMD HD 7870 XT too then? (It is 3GB, not 2 GB as these normally are, it's a dev sample version).

Err, ... No :-).

As the "in reply to" information shows, I was only replying to Ken's comment.  In your case, I first saw your problem report in the Technical News section and I'd already given a full response there.  I had assumed (I know - it's bad to assume) that you would see the response there so wouldn't be expecting a further reply here :-).

To reiterate - your problem is that your card is GCN 1st gen.  BOINC lists it as "Tahiti" but in light of your current description of 7870 XT, I wondered if it might be Pitcairn.  I had a look here and sure enough, it is something called a Tahiti LE so I'm quite impressed that BOINC actually got it right :-).  This is all irrelevant, since the key point is that all of these variants are GCN 1st gen and so they just can't handle the GW GPU tasks, as I explained in the Technical News response.  Unfortunately, memory limitations aren't the issue in your case.

[TA]Assimilator1 wrote:
What's velajr? [update] Seems to be a type of WU

Yes, it's one of the fields in the full task name of current tasks.  E@H is trying to detect continuous GW emissions that are theorised to be emitted by massive, rapidly rotating neutron stars (pulsars).  There are three likely candidates, relatively nearby and the Vela pulsar is one of these.  The multi-directed search being used is targeting all three but the current (and most demanding) work is just looking at Vela - so you'll find that string in a task name.

[TA]Assimilator1 wrote:
Gary - would you rather I posted the above in the thread you linked?

No, not at all - because your problem isn't memory related.  I suggested it to Ken because his was.

Keith made a sensible suggestion about not posting your problem report in Technical News and I agree that it wasn't the best place to post it.  However, I answered it there because that's where I first saw it.

Technical News is used by staff to make announcements of all sorts.  When it is the announcement of a new or updated search, it is expected that anybody seeing an immediate problem should make the report about it there.  The Devs will be looking.  Once things have settled and become routine, they tend to stop paying attention.  At that point unless you're absolutely sure that the problem relates to the initial announcement, you're much better off posting a new report in the problems section where volunteers (not staff who are too busy) will give you the quickest response.

If there is an existing problem report about the same issue, by all means add your report to that thread - as long as you're sure it's the same issue.  If in doubt, the safest thing to do is to start a new thread.

I've been in the habit of starting 'Discussion' threads when major new or updated searches come along. This thread is an example of that. There are two particular problems with this thread. The first is that the GW search has morphed several times as an initially untested and immature app was progressively 'fixed' while the teething problems were slowly identified and dealt with. You can see evidence of that in the thread title, which I changed when the name of the search morphed at one point. The comments in the earlier pages essentially relate to a totally different scenario from what we see now.

The second problem is that some people don't care if what they post is relevant to the thread's purpose or not. It's supposed to be about helping new people find out how best to participate in and cope with any particular oddities that the search may have. Tips, etc, on how to best run the search on particular hardware. Things to avoid that are specific to this search, etc. Benefits or issues with running concurrent tasks. Performance data showing what hardware works best.

People destroy that usefulness if they use it as a general chat thread. I've already removed close to 100 such posts. Instead of doing that, I've decided to spend my available time answering questions about which I'm pretty sure I have some knowledge - like your question :-). I have a Tahiti GPU and (long ago) I've already tried running GW tasks on it. A definite fail!! :-)

Cheers,
Gary.

[TA]Assimilator1
Joined: 22 Jan 05
Posts: 12
Credit: 189971366
RAC: 8560

Gary Roberts wrote:
As the "in reply to" information shows, I was only replying to Ken's comment. In your case, I first saw your problem report in the Technical News section and I'd already given a full response there. I had assumed (I know - it's bad to assume) that you would see the response there so wouldn't be expecting a further reply here :-).

Hi, thanks for your reply :). I was keeping an eye on that news thread (somehow I missed your reply there, sorry about that). I was also PM'ed by someone suggesting I shouldn't put a tech post in a news thread; I hadn't noticed it was the news section :o. I also didn't know that thread was referring to CPU tasks (as Keith mentioned).
Anyway, that's why I posted here too. I was simply going to post wherever you posted, but then I missed your reply, doh! I initially thought you were indeed just replying to Ken, which puzzled me as I'd posted here 1st (2 pages at least), so I then thought you were doing a joint answer. You had in fact replied in the other thread, which I had missed, hence the ensuing confusion ;) :o, lol, sorry again for that.

Gary Roberts wrote (in the Technical News thread):
...a Tahiti series which does have very good DP capability. However, the GPU searches at Einstein don't need DP capability. The problem is that it's a GCN 1st generation card that just can't handle the current GW search at all. About a year ago, a number of us tested 1st gen cards and all found the same result - TIME LIMIT EXCEEDED - without even getting any significant "real" progress. There were a number of messages about this at the time.

If you put that GPU on the gamma-ray pulsar GPU search it will do very well.  I run a couple of 7950s running 2 concurrent tasks and each pair of tasks get completed in about 25 mins or so.  Just change your prefs to exclude the GW search and set the FGRPB1G GPU search for it.  To not affect your other machine, you might put them in different 'locations' - aka 'venues'.

(I added your reply here in case others looking at this issue want to see the whole story in one).
Anyway, yea the 7870 XT is a Tahiti GPU :), plus mine is a developer's sample with the full 3GB and memory bandwidth of the HD 7950/70. Its performance is virtually the same as a 7950 (which is what it was sold to me as on ebay!). I thought that E@H did use DP a lot, wasn't that the case in the past?


I haven't crunched E@H in a while, and I hadn't been here in years so I wasn't aware of those messages about GW and Tahiti, Google's 1st hits brought me here & the other thread. I've since disabled the grav wave search (good point about setting different locations and apps, I'll do that after the sprint is over, GW gives much less points :p). [edit] And now I can't find where the apps are listed again! :/ Nm, found it in preferences>preferences>project!, I wished they'd left the default BOINC setup :p

Odd that the Tahitis are particularly bad at a certain E@H app. I guess they're lacking a particular h/w function, and I guess that's not short or easy to explain ;).
And I also guess that it's not easy or possible to get BOINC to not send certain app's WUs to incompatible GPUs?

Thanks for your help :) Tahiti's still going strong (ish, ;)) after 8+yrs!

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I think that if they want to find a source of continuous GW emission they should look for a pulsar with a decreasing frequency. This would mean that a part of the rotational energy of the neutron star is being transferred to GW.

Tullio
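
The energy argument above can be written out as a short worked relation. This is a textbook sketch, not something taken from the E@H search setup: it assumes a rigid star with constant moment of inertia I, rotation frequency f and spin-down rate df/dt < 0. Other braking mechanisms (magnetic dipole radiation, for instance) also draw on the same energy, so the GW luminosity can only be bounded from above.

```latex
\begin{align}
E_{\mathrm{rot}} &= \tfrac{1}{2} I \Omega^{2} = 2\pi^{2} I f^{2},
  \qquad \Omega = 2\pi f \\
\dot{E}_{\mathrm{rot}} &= I \Omega \dot{\Omega} = 4\pi^{2} I f \dot{f} \\
L_{\mathrm{GW}} &\le -\dot{E}_{\mathrm{rot}} = 4\pi^{2} I f \,\lvert \dot{f} \rvert
  \qquad (\dot{f} < 0)
\end{align}
```

So a measured decrease in frequency sets a ceiling on how much power could be going into gravitational waves. It does not by itself show that any of it is.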

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230351493
RAC: 1161840

A few hours ago Bernd posted on the Technical News forum a notification that the GW GPU search is now suspended while the memory issue is investigated.
