The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 316

DF1DX wrote:
Version 1.04 is significantly faster than 1.03 yesterday.

Right, but two out of three tasks that I received with the first batch errored out after a few seconds. I don't know yet whether that's related to my Linux system or to the app.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,236
Credit: 44,741,177,177
RAC: 37,939,187

The app on the applications page is dated about 2 hours ago.  DF1DX has 4 tasks dated about an hour ago.  The OS is Linux and the GPU is a 1050Ti.  No sign of any failed tasks yet, so maybe it's something to do with your setup.  The 1.03 task times were about 27ksecs.  For DF1DX to call 1.04 "significantly faster", the tasks must have made some appreciable progress (and not died quickly), so perhaps the problem is at your end.

EDIT:  One other observation.  The 1.03 tasks from yesterday (4 of them) were all downloaded at pretty much the same time (just a minute or two apart) and then returned as a group within just a small number of minutes of each other.  It looks like all 4 were crunched concurrently.  It will be interesting to see if the same thing happens to the current batch.

Cheers,
Gary.

DF1DX
Joined: 14 Aug 10
Posts: 63
Credit: 1,325,240,339
RAC: 1,828,920

Not here.

My Linux host has been running for over an hour with 1.04. Estimated runtime 2.5 hours with 2 WUs simultaneously, Nvidia 1050Ti.

Version 1.03 took about 7.5 hours, but with 4 WUs at a time.
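Because the two versions were run at different concurrency, a fair comparison is effective hours per task. The sketch below assumes that all concurrent WUs on the GPU finish at roughly the same time, and it uses the 2.5-hour figure for 1.04, which is only an estimate at this point; the function name is hypothetical.

```python
def hours_per_task(elapsed_hours: float, concurrent_wus: int) -> float:
    """Effective wall-clock hours per task when WUs share one GPU concurrently."""
    return elapsed_hours / concurrent_wus


# Figures from DF1DX's GTX 1050 Ti host (1.04 runtime is still an estimate)
v103 = hours_per_task(7.5, 4)   # 1.03: 4 WUs at a time in about 7.5 h
v104 = hours_per_task(2.5, 2)   # 1.04: 2 WUs at a time in about 2.5 h
speedup = v103 / v104

print(f"1.03: {v103:.3f} h/task, 1.04: {v104:.3f} h/task, speedup {speedup:.2f}x")
```

Under these assumptions 1.04 delivers about 1.5 times the throughput of 1.03 on this card, even at lower concurrency.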

Edit: Gary was (as always) faster.

I just started the Radeon VII with 1.04. You need 4 WUs to reach a load of about 60%...

But since running more than 2 WUs at the same time leads to too high an error rate, I have to consider whether the calculations will ever be valid.

solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 316

May well be, I'll wait and see. :-)  The one 1.04 task that's still running is still doing fine. It will finish in about 2 hours from now and will then have taken about half of the time that 1.03 had taken. Devs did a good job! :-)

solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 316

My first 1.04 finished now and ran just great. No wingman yet. Got a second batch of these now, same as before: the first task errored out after a few seconds, the next task is running perfectly. Will switch now to my fallback system for comparison.

EDIT: On my Linux fallback system I got a batch of two tasks, no errors there - great!

On another observation, I'm noticing that the 1.04 tasks I downloaded in the last few minutes are running a lot faster than the previous tasks. I don't know yet whether that's due to variations in the task parameters (I guess not) or to some setting on my systems that I should be aware of. :-)

EDIT 2: Now I figured out those speed differences. I used to run one O2-500 GW task and one FGRP gamma task at a time - that is what the systems don't like. If I leave one GW task alone it runs *clearly* faster. :-)

solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 316

Well, eventually all my 1.04 tasks either errored out or ended up with a validate error. Before I could even post about it, Bernd had released the 1.05 beta. Got a batch of those now: running smoothly.

Jim1348
Joined: 19 Jan 06
Posts: 380
Credit: 201,949,179
RAC: 5,942

An RX 570 (Win7 64-bit, 18.9.3 drivers) finished a 1.05 work unit in 1 hour 25 minutes.  So that is an improvement in time.  The GPU load shows as 42% on GPU-Z; better than before.  But the power for the GPU is 42 watts, plus another 8 watts for the CPU core supporting it.  I think for power efficiency, it is about the same as a Haswell CPU core.
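The power readings can be turned into energy per work unit, which is the quantity a power-efficiency comparison actually needs. This sketch simply adds the GPU and supporting-CPU-core wattage and multiplies by the 1 h 25 min runtime; it assumes both draws are roughly constant over the task. Comparing against a Haswell CPU core would also need that core's per-task runtime and wattage, which the post does not give. The function name is hypothetical.

```python
def energy_per_task_wh(gpu_watts: float, cpu_watts: float, runtime_minutes: float) -> float:
    """Energy in watt-hours for one task, assuming constant power draw."""
    return (gpu_watts + cpu_watts) * runtime_minutes / 60.0


# RX 570 figures: 42 W GPU + 8 W for the supporting CPU core, 85-minute runtime
e = energy_per_task_wh(42, 8, 85)
print(f"{e:.1f} Wh per task")
```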

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,379,357,658
RAC: 2,870,784

My Windows 10 system with an AMD RX570 GPU and an Intel i5-9400F CPU and no other tasks ran a 1.05 task to completion in 1:12:54 elapsed time. It was running at 1X multiplicity. But it generated an immediate Validate Error.

Observations:

1. Far higher GPU load than with version 1.02 on this same system.  In the mid-section, from about 2 minutes to about 69 minutes elapsed time, GPU load averaged 44%.

2. CPU usage was slightly greater than one core, implying that of the five threads Process Lasso showed for this application, more than one was active at some moments.

3. GPU usage started very low, but abruptly climbed to near the steady state a little before two elapsed minutes.

4. GPU usage dropped to near-zero when reported progress reached 99%.  That state persisted for about 4 additional minutes.

5. Despite the huge speed-up since 1.02, this application still needs far longer per task than a system accustomed to the Einstein Gamma-Ray pulsar GPU tasks currently offered will estimate.  Therefore people running systems with mixed task types may see severe excess fetching unless they practice some form of limitation.

6. It appeared that I had to enable the "run test applications" option under Einstein Project Preferences|Beta settings in order to get work.
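A simplified model of why the fetching overshoots: the BOINC client requests roughly enough tasks to fill its work buffer according to its runtime estimate, and that estimate is scaled by the duration correction factor learned from the short Gamma-Ray tasks. In the sketch below the 1-day buffer and the 20-minute estimate are hypothetical values chosen for illustration; only the 1:12:54 runtime comes from the observation above. Real client scheduling is considerably more involved.

```python
import math


def tasks_fetched(buffer_s: float, est_runtime_s: float) -> int:
    """Tasks a client would request to fill its buffer (simplified model)."""
    return math.ceil(buffer_s / est_runtime_s)


BUFFER_S = 86_400                              # hypothetical 1-day work buffer
EST_RUNTIME_S = 20 * 60                        # hypothetical estimate inherited from Gamma-Ray tasks
ACTUAL_RUNTIME_S = 1 * 3600 + 12 * 60 + 54     # observed 1:12:54 at 1X

n = tasks_fetched(BUFFER_S, EST_RUNTIME_S)
queued_days = n * ACTUAL_RUNTIME_S / 86_400    # real work actually queued
print(f"fetched {n} tasks = {queued_days:.2f} days of real work for a 1-day buffer")
```

With these assumed numbers the host queues well over three days of work while believing it holds one, which is why some form of limitation is needed until the correction factor adapts.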

Obviously the validate error is very disheartening.

My short-term intention is to let Gamma-Ray work run on this system until my Task Duration Correction factor shrinks enough to request work, then have a try at running two 1.05 tasks at once.  Possibly the tasks/hour will be greatly improved.  Of course, if the Validate error is a consistent problem this is not useful work to run at the moment.

cecht
Joined: 7 Mar 18
Posts: 719
Credit: 793,365,086
RAC: 496,844

On my Linux system with two RX 570s, I'm seeing similar results to what Archae86 and Jim1348 report. My completion times are about 62-63 minutes with the GPUs running mining BIOS, no P-state mask, and the default 0.9 CPU & 1 GPU per task. Other than fluctuating GPU loads, it looks good, but my five completed tasks all have validation errors. Here is a plot of GPU usage over a few minutes, and CPU usage over one minute while running the 1.05 app. 


===================

Ideas are not fixed, nor should they be; we live in model-dependent reality.

DF1DX
Joined: 14 Aug 10
Posts: 63
Credit: 1,325,240,339
RAC: 1,828,920

I got the same results with my AMD Radeon VII on Windows with versions 1.04/1.05: immediate validate error.

(Weird: I can run this host (12779294) for days or weeks with Milkyway at 7(!) WUs at a time; with Gamma-Ray and 2 WUs I get a TDR error every few hours and have to reboot.)

Still no confirmation yet with 1.04 and nvidia (hosts 12241921 and 12756498).
