The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,379,323,473
RAC: 2,871,445

cecht wrote:
Things are rolling now! As of now, 22 of my V1.07 tasks have been validated (81 pending, 0 invalid), BUT everything validated overnight (starting 15 Aug. CMT) was validated with a minimum quorum of 1, instead of the usual 2.

Nothing doing here.  I've currently got 34 pending 1.07 tasks, some each run at 2X, 3X, and 4X.  Zero invalid.  Zero valid.

Maybe tasks sent out more recently than any of mine that have been returned so far have minimum quorum 1 set (is this set at dispatch, or can they revise it later?).  Next time I switch back from GRP to GW I'll prioritize some of the most recently sent work.

Anyway, I've turned on a second 570 machine and downloaded some fresh work just now.  This one has a rather different CPU (just two hyperthreaded cores.)  Again Windows 10, and again an AMD 570.  Running 2X.

Surely sometime soon I'll get either a validation or an invalid result.

Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 1,837

Gary Roberts wrote:
Also, I just looked at your host and the GPU is identified as an RX 460 which is a bit slower again than the 560.  Are you running 2 tasks concurrently?

That makes sense; I don't know why I was thinking I had a 560. Must be getting senile. Yes, I am running both GPUs at a utilization factor of 0.5, i.e. two tasks per GPU.
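For anyone who would rather set this locally than through the project's GPU utilization factor preference, the BOINC client also reads an `app_config.xml` file from the project's folder in the BOINC data directory. Below is a minimal sketch that writes one with `gpu_usage` 0.5 (1 / 0.5 = two tasks per GPU). Assumptions: the function name `write_app_config` is hypothetical, `REPLACE_WITH_GW_APP_NAME` is a placeholder (copy the real `<name>` of the GW app from `client_state.xml`), and `cpu_usage` 1.0 is only an example value.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_app_config DIR [GPU_USAGE]
# Writes DIR/app_config.xml so each GPU task of the named app claims
# GPU_USAGE of a GPU (0.5 -> two tasks share one GPU).
# APP_NAME is a HYPOTHETICAL placeholder: take the real app name from client_state.xml.
write_app_config() {
  local dir=$1
  local gpu_usage=${2:-0.5}
  local app_name=${APP_NAME:-REPLACE_WITH_GW_APP_NAME}
  cat > "$dir/app_config.xml" <<EOF
<app_config>
  <app>
    <name>${app_name}</name>
    <gpu_versions>
      <gpu_usage>${gpu_usage}</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
}
```

Usage: `APP_NAME=<name from client_state.xml> write_app_config /path/to/einstein/project/folder`, then have the client re-read its configuration (BOINC Manager: Options -> Read config files) or restart it.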

The two NVIDIA tasks I had waiting in the pending queue validated overnight. The GT 1030 crunches at half the rate of the 460.

On another note: Ubuntu did an update yesterday, and I lost access to my desktop, a common occurrence from what I have read. I managed to reload the AMD drivers through the command line and get the box crunching again, but I'm still fighting the login loop trying to get to the desktop. Fortunately, I can still control BOINC using BOINC Tasks from this machine. I'll do some more investigating and deal with the desktop issue later.

Clear skies,
Matt
cecht
Joined: 7 Mar 18
Posts: 719
Credit: 793,362,086
RAC: 497,963

Matt White wrote:
On another note: Ubuntu did an update yesterday, and I lost access to my desktop, a common occurrence from what I have read. I managed to reload the AMD drivers through the command line and get the box crunching again, but I'm still fighting the login loop trying to get to the desktop. Fortunately, I can still control BOINC using BOINC Tasks from this machine. I'll do some more investigating and deal with the desktop issue later.

I had that login problem on Ubuntu 18.04 after I uninstalled amdgpu drivers in preparation for updating drivers. I got it fixed without reinstalling Ubuntu, but can't precisely recall the relevant steps. If you're feeling adventurous, it went something like this:
1) From the stalled Ubuntu login screen, I opened a terminal (text console) with Ctrl-Alt-F3 (or F1 or F2; see what works) and logged into my user account there.
2) From the Terminal, I installed an alternate to the GNOME Display Manager, lightDM, with
sudo apt-get update && sudo apt-get install lightdm
I think I was then given the option to log in with lightDM, and that login worked. At some point I switched from the Terminal back to the original Ubuntu login screen with Alt-Right Arrow, Ctrl-Alt-Right Arrow, or something like that, but I don't recall what I did once I got back there (i.e. whether a lightDM login was an option at that point). Sorry :(
3) Once logged in to my Desktop, I didn't particularly like the lightDM Display Manager. I think it was then that I went to Settings application ->Details ->Users and saw that my automatic login was unchecked. I moved the slider to auto login and everything was fine after a reboot. But BEFORE rebooting, I uninstalled lightDM with sudo apt-get remove lightdm. You may need to unlock the Settings app (upper right corner) to change that slider if it's greyed out. Everything has worked as expected since then.
4) Some web pages suggest getting out of the login loop by changing ownership of .Xauthority from root to you ($USER). I didn't have that file at the time, although it's in my home directory now (?). I did, however, change ownership of .ICEauthority with
~$ sudo chown craig.craig .ICEauthority
but am not sure whether that helped. You can also change .Xauthority the same way.
5) Good luck, LOL
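The ownership change suggested above (handing .Xauthority / .ICEauthority back to your own account) can be checked before running `sudo chown` blindly. A minimal sketch, assuming GNU `stat` as shipped with Ubuntu and that your primary group is what `id -gn` reports (cecht's `craig.craig` is the same idea with a specific user and group); the function names are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_foreign_auth_files [HOME_DIR]
# Prints each session auth file (.Xauthority, .ICEauthority) in HOME_DIR
# that exists but is NOT owned by the current user.
list_foreign_auth_files() {
  local home_dir=${1:-$HOME}
  local me f
  me=$(id -un)
  for f in "$home_dir/.Xauthority" "$home_dir/.ICEauthority"; do
    # GNU stat: %U is the owner's user name
    if [[ -e "$f" && "$(stat -c '%U' "$f")" != "$me" ]]; then
      echo "$f"
    fi
  done
}

# fix_auth_ownership [HOME_DIR]
# Hands each file reported by list_foreign_auth_files back to the
# current user and primary group (requires sudo).
fix_auth_ownership() {
  local f
  while IFS= read -r f; do
    sudo chown "$(id -un):$(id -gn)" "$f"
  done < <(list_foreign_auth_files "${1:-$HOME}")
}
```

Run `list_foreign_auth_files` first; if it prints nothing, file ownership is probably not what is causing the loop. `fix_auth_ownership` only touches the files that check reported.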

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,379,323,473
RAC: 2,871,445

archae86 wrote:
Surely sometime soon I'll get either a validation or an invalid result.

Well, the result is in, and the winner is that my first 1.07 task to get a reported final disposition was found to be invalid.  The comparison was with two Linux CPU runs on 1.01.  Based on elapsed time, this was a task which ran at 4X.

I currently have 51 1.07 tasks pending, most from the first, and a few from a second Windows 10 570 box.  I'm afraid the portents don't look very favorable to me at the moment.  The second box, which has only a two-core CPU, is running 2X, so if somehow 4X was an issue in my reported invalid task maybe it will fare better.

Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 1,837

Nothing invalid, yet, about 10 validated, the balance (about 12) pending. I have about 67 in the wu queue. At the pace my two boxes crunch these, that is going to take some time. I got a bit concerned when I saw another invalid task show up, but it turned out to be one of the old 1.01 CPU tasks.

Cecht, I solved my desktop issue by purging all the drivers and reinstalling. I purged the unused NVIDIA drivers while I was at it. Thanks for the input, though.

Somehow, in the process, the file BOINC writes in the user directory got corrupted. I deleted it, and BOINC Manager came back up, all nice and happy.

Clear skies,
Matt
Jim1348
Joined: 19 Jan 06
Posts: 380
Credit: 201,949,179
RAC: 5,942

On my RX 570 (Win7), I have two invalids and ten pending.

https://einsteinathome.org/host/12599270/tasks/5/0

And both the invalids were validated by Linux machines.  So it appears that there may still be a GPU-CPU validation problem.

solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 316

Matt White wrote:

Nothing invalid, yet, about 10 validated, ...

Those ran with a quorum of one, so that is probably no surprise. Same here.

On another issue: do you happen to know why the run time of your tasks is more than twice as long as the CPU time?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,946
Credit: 201,612,376
RAC: 47,920

A bit of information that might help clear up some misconceptions about validation:

- For the O2AS search we are using BOINC's "adaptive replication". This means that only one in ten results sent to "trustworthy" hosts is actually replicated for ultimate comparison (quorum of 2). In most cases, the result is accepted as it is (quorum of 1). To prevent cheating, for "adaptive replication" applications the BOINC web code only reveals the quorum for a workunit after validation.

- On E@H we (normally) ensure that a "Beta test" app version is "distrusted", i.e. a task being assigned to such an app version enforces a quorum of 2.

- Furthermore, a workunit should have at most one "Beta test" task. This ensures that "Beta test" (in O2AS: GPU) app versions are always validated by "established" (here: CPU) app versions. As the Beta test app versions usually run faster than the established versions, you'll have to wait a long time. Also, this restriction may limit the supply of workunits available for Beta test app versions if all available workunits already have a task assigned to a Beta test app (as we recently observed in BRP4).

- Due to an unexpected delay in updating the DB, some 1.04 and 1.07 app versions erroneously weren't treated as "Beta test", which led to ~300 tasks being accepted without comparison (quorum of 1) or after (successful) comparison to another GPU app version. This wasn't intentional and shouldn't happen again.
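To make the rules above concrete, here is a toy sketch of how the quorum for a single result would be picked. It is not BOINC or Einstein@Home server code: the function names are hypothetical, the 1-in-10 draw is passed in as a 0-9 number so the result is reproducible, and treating untrusted hosts as always replicated is an assumption based on how adaptive replication is generally described, not something stated above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# HYPOTHETICAL sketch of the quorum rules; NOT actual BOINC/E@H code.

# choose_quorum IS_BETA HOST_TRUSTED ROLL
#   IS_BETA       1 if the task was assigned to a "Beta test" app version, else 0
#   HOST_TRUSTED  1 if the host is currently "trustworthy", else 0
#   ROLL          integer 0-9, e.g. $((RANDOM % 10)); passed in for reproducibility
choose_quorum() {
  local is_beta=$1 trusted=$2 roll=$3
  if (( is_beta == 1 )); then
    echo 2    # Beta test app versions are "distrusted": always compared
  elif (( trusted == 0 )); then
    echo 2    # assumption: results from untrusted hosts are always replicated
  elif (( roll == 0 )); then
    echo 2    # about one in ten results from trusted hosts gets a wingman
  else
    echo 1    # otherwise the single result is accepted as it is
  fi
}

# can_add_beta_task BETA_TASKS_IN_WU
#   Succeeds only if the workunit has no Beta test task yet, so every Beta
#   (GPU) result is compared against an established (CPU) app version.
can_add_beta_task() {
  (( $1 == 0 ))
}
```

For example, `choose_quorum 0 1 $((RANDOM % 10))` prints 1 about nine times in ten for a CPU result from a trusted host, while `choose_quorum 1 1 $((RANDOM % 10))` always prints 2 for a Beta (GPU) result. Combined with the one-Beta-task-per-workunit limit, that is why GPU results sit in pending until a slower CPU wingman reports.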

BM

cecht
Joined: 7 Mar 18
Posts: 719
Credit: 793,362,086
RAC: 497,963

Matt White wrote:

...Cecht, I solved my desktop issue by purging all the drivers and reinstalling. I purged the unused NVIDIA drivers while I was at it. Thanks for the input, though.

Somehow, in the process, the file BOINC writes in the user directory got corrupted. I deleted it, and BOINC Manager came back up, all nice and happy.

Those two tricks are good to know. Thanks for sharing. I'm glad your system is healthy and happy again.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 719
Credit: 793,362,086
RAC: 497,963

archae86 wrote:
archae86 wrote:
Surely sometime soon I'll get either a validation or an invalid result.

Well, the result is in, and the winner is that my first 1.07 task to get a reported final disposition was found to be invalid.  The comparison was with two Linux CPU runs on 1.01.  Based on elapsed time, this was a task which ran at 4X.

I currently have 51 1.07 tasks pending, most from the first, and a few from a second Windows 10 570 box.  I'm afraid the portents don't look very favorable to me at the moment.  The second box, which has only a two-core CPU, is running 2X, so if somehow 4X was an issue in my reported invalid task maybe it will fare better.

I just looked at your v1.07 results for computer 10706295 and see that application listed as "Continuous Gravitational Wave search O2 All-Sky v1.07 () windows_x86_64".  Shouldn't that be "Continuous Gravitational Wave search O2 All-Sky v1.07 (GW-opencl-ati) windows_x86_64"?

Of my v1.07 (GW-opencl-ati) Linux tasks that have been validated by a Windows wingman, there has been one v1.07 (GW-opencl-nvidia) GPU validation; the rest were v1.01 CPU validations. Nothing from a Windows GW-opencl-ati machine yet. Hmmmmmm.

Ideas are not fixed, nor should they be; we live in model-dependent reality.
