Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1176960561
RAC: 978062

Richie wrote:
GT 650M was reported to work under Mac

Unfortunately I have to correct that. While it worked in the beta, I got 4 tasks for my MacBook with that GPU yesterday and all of them failed. So the 2.0.3 tasks don't seem to work on that GPU on macOS...

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1176960561
RAC: 978062

Gary Roberts wrote:
Did you even try opening a command prompt (terminal window) and issuing the clinfo command?

Yes, but unfortunately I can't find any information on clinfo for macOS.

So I tried "man clinfo", but there is no entry for it.

Unfortunately it seems that macOS is indeed a dead end for most computing work. I'm going to switch to Windows anyway, because this is only the tip of the iceberg. Thanks a lot for your help and input.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

B.I.G wrote:
Richie wrote:
GT 650M was reported to work under Mac

Unfortunately I have to correct that. While it worked in the beta, I got 4 tasks for my MacBook with that GPU yesterday and all of them failed. So the 2.0.3 tasks don't seem to work on that GPU on macOS...

Ok, thanks for the info!

Richie wrote:
I saw a host with GT 630 2GB (Fermi chip, GF108) and it had successful validations under Windows.

And I must correct that one. It seems there have been GT 630 cards with both Fermi and Kepler chips. That's a nice one, Nvidia.

https://www.geforce.com/hardware/desktop-gpus/geforce-gt-630/specifications

https://en.wikipedia.org/wiki/GeForce_600_series#GeForce_600_.286xx.29_series

That 2GB model might have been a Kepler model. It could still be that Fermi cards are not compatible. I've got a couple of them somewhere but am not able to test them until next week.

QuantumHelos
Joined: 5 Nov 17
Posts: 190
Credit: 64239858
RAC: 0

In regard to people's GPU resource usage questions, here are some videos and photos with data: https://is.gd/EinsteinGPU

As will be seen, the CPU is also used a lot, so the use of https://is.gd/ProcessorLasso for optimisation is highly relevant.

****

Evidence found here would suggest the following power consumption ranges for the XBox S-One-X:

(Variable on component use)(+V)
Watching movies: variable 5watt to 30Watt+V
Lighter load GPU play (X360 games) 25Watt to 110Watt variable +V
Medium GPU play 60Watt to 130Watt+V
Heavy GPU play 80Watt to 220WATT+V maximum power on CPU + GPU

Steady frame count + DirectX12 & Vulkan GPU Systems do regulate power with their superb efficiency.

QE

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

After the recent change to tasks flavoured with V2_VelaJr1 I've got valids but some errors too. Currently:

11 validate errors, all for AMD (3 different hosts)
+
17 "error while computing" (3 different hosts), of which 4 are "exit time limit exceeded" and the others crashed in less than a minute with unknown error code 114. These runtime problems in particular were not happening with the late G2 tasks.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592299016
RAC: 771430

Yep validate errors are back

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

Richie wrote:

After the recent change to tasks flavoured with V2_VelaJr1 I've got valids but some errors too. Currently:

11 validate errors, all for AMD (3 different hosts)
+
17 "error while computing" (3 different hosts), of which 4 are "exit time limit exceeded" and the others crashed in less than a minute with unknown error code 114. These runtime problems in particular were not happening with the late G2 tasks.

At least part of that is caused by a lack of GPU memory. Some tasks take 1.5 GiB now, and there may be peaks that are even higher.
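For anyone wanting to sanity-check their own card, here is a minimal Python sketch of the arithmetic. It assumes every concurrent task needs the full 1.5 GiB at the same moment, which is a simplification; the function name and the optional `reserve_gib` parameter (memory kept back for the desktop or driver) are made up for illustration and are not part of BOINC or the Einstein@Home app.

```python
import math


def max_concurrent_tasks(vram_gib, per_task_gib=1.5, reserve_gib=0.0):
    """Rough upper bound on how many tasks fit in GPU memory at once.

    Assumes each task needs per_task_gib simultaneously (hypothetical
    worst case based on the 1.5 GiB figure quoted above).
    """
    usable = vram_gib - reserve_gib
    if usable <= 0 or per_task_gib <= 0:
        return 0
    return math.floor(usable / per_task_gib)


if __name__ == "__main__":
    # Card sizes mentioned in this thread: 1 GB, 2 GB and 4 GB.
    for vram in (1, 2, 4):
        print(f"{vram} GiB card: at most {max_concurrent_tasks(vram)} task(s)")
```

Under that assumption a 1 GiB card cannot hold even one such task, which would fit with older cards failing. Real usage probably varies over the run, so treat the result as a guide only.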

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117707015741
RAC: 35080070

For the last couple of weeks, I've been running O2MDF GPU tasks on a new build (amongst several others) - a Ryzen 2600 with a 4GB RX 570 GPU. I started with tasks directed at the G347.3 supernova remnant and found the machine would run very nicely with a 4x multiplicity and produce about 250 results per day. Having settled on 4x, the machine ran continuously for 17 days 16 hours until the transition to data directed at the Vela pulsar. There were 4329 results uploaded during that period with 0 invalids and 0 errors. That works out to 245 results per day.
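The per-day figure is easy to reproduce. A small Python sketch of the calculation, using only the numbers quoted above (the function name is just for illustration):

```python
def results_per_day(results, days, hours=0):
    """Average number of results per 24 hours over the given period."""
    elapsed_days = days + hours / 24
    return results / elapsed_days


if __name__ == "__main__":
    # 4329 results uploaded in 17 days 16 hours.
    rate = results_per_day(4329, days=17, hours=16)
    print(f"{rate:.0f} results per day")
```

The same function can be used to compare throughput before and after a change in multiplicity, as long as both periods are measured the same way.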

The transition to crunching Vela (frequencies just over 400Hz) happened at 6:45am 1/12/19 (UTC +10).  The big difference was a lot longer crunch time causing a drop to around 103 tasks for the first full day.  There were still no problems with errors or invalids - until just recently.

The machine moved from crunching tasks named h1_0461.20_......  to a higher frequency band, eg h1_0654.20_.......  and now the problems have started - 1 validate error and 3 TIME_LIMIT_EXCEEDED pretty quickly.  I actually saw the last one.  The % done was just under 98% and the elapsed time was 2h 8m 16s when the plug was pulled.  There was another task around 80% and approaching the 2hr mark so I suspended the entire cache of work except for that one and it was able to finish quickly without error.

I decided to transition back to 3x, and as a result, the TIME_LIMIT problem seems resolved.  There is a task about to finish now (98%) and the elapsed time is currently just over 1hr 20m.   There are two more following at lower % completed but with commensurately lower elapsed times.  They will be fine.
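To judge whether a running task is likely to beat the limit, a simple linear projection from elapsed time and % done is usually good enough. The Python sketch below makes two assumptions that are mine, not the project's: progress is roughly linear in time, and the limit on this host is about the 2h 8m 16s at which the earlier task was killed. The real limit is set per task by the server and will differ between hosts.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s elapsed time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def projected_total_seconds(elapsed_s, fraction_done):
    """Linear estimate of total run time from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def will_finish(elapsed_s, fraction_done, limit_s):
    """True if the projected total run time stays within the limit."""
    return projected_total_seconds(elapsed_s, fraction_done) <= limit_s


# Assumed limit: the elapsed time at which the 4x task was aborted.
ASSUMED_LIMIT_S = to_seconds(2, 8, 16)

# 4x: 98% done at the moment the limit was hit, so it could not finish.
at_4x = will_finish(to_seconds(2, 8, 16), 0.98, ASSUMED_LIMIT_S)

# 3x: 98% done after 1h 20m, projected to finish with plenty of margin.
at_3x = will_finish(to_seconds(1, 20), 0.98, ASSUMED_LIMIT_S)

if __name__ == "__main__":
    print("4x finishes in time:", at_4x)
    print("3x finishes in time:", at_3x)
```

If the projection is close to the limit, suspending other GPU tasks so that the one at risk gets the whole card is one way to rescue it, as happened with the 80% task above.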

It just remains to be seen about validate errors.  It would appear that the memory requirements and/or crunch time of Vela tasks might increase with frequency, something that wasn't so noticeable with the G347.3 directed data.  It would be nice if we could be warned about this sort of stuff.

I remember a post from Bernd about getting back a lot more GW results than expected and that this was causing issues for the servers.  I wonder if they have deliberately increased the work content of Vela tasks so as to reduce the numbers of results, perhaps without adjusting the value that controls TIME_LIMIT.  Perhaps this also leads to increased memory requirements.  If that's the case, it sure would be nice to get a bit of advance warning.

Cheers,
Gary.

Tom Grinnell
Joined: 13 Nov 05
Posts: 3
Credit: 59555915
RAC: 6759

My computer is 6 years old. The graphics card has only one gig of memory. I just started receiving the Gravitational wave search for the GPU last week. They were working fine. They finished in 1.5 hours. Now I am working on h1_0625.10_O2C02Cl4In0__O2MDFV2_VelaJr1_625.50. 100% failure rate over the last 5 days. They all abort after 3 hours with "exit time limit exceeded". I am going to turn off Gravitational waves for the GPU and go back to Gamma-ray pulsars. Too bad. I joined this to help look for gravitational waves.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

These Vela tasks really are challenging for many hosts to run well enough for the results to validate. I picked out only WUs that have had more than 5 hosts with problems so far.

https://einsteinathome.org/workunit/428858912 https://einsteinathome.org/workunit/428867328

https://einsteinathome.org/workunit/428880609 https://einsteinathome.org/workunit/428827983

https://einsteinathome.org/workunit/428928285 https://einsteinathome.org/workunit/428826224

https://einsteinathome.org/workunit/428905403 https://einsteinathome.org/workunit/428844213

https://einsteinathome.org/workunit/428870640 https://einsteinathome.org/workunit/428836736

https://einsteinathome.org/workunit/428838095 https://einsteinathome.org/workunit/428917860

https://einsteinathome.org/workunit/428981169 https://einsteinathome.org/workunit/428901856

https://einsteinathome.org/workunit/428928282 https://einsteinathome.org/workunit/428945911

https://einsteinathome.org/workunit/428852900 https://einsteinathome.org/workunit/428942977

https://einsteinathome.org/workunit/428883774 https://einsteinathome.org/workunit/428870563

https://einsteinathome.org/workunit/428939811 https://einsteinathome.org/workunit/428867226

https://einsteinathome.org/workunit/428863927 https://einsteinathome.org/workunit/428882211

https://einsteinathome.org/workunit/428894067 https://einsteinathome.org/workunit/428860984

https://einsteinathome.org/workunit/428853341 https://einsteinathome.org/workunit/428939840

https://einsteinathome.org/workunit/428870411 https://einsteinathome.org/workunit/428885317

https://einsteinathome.org/workunit/428901961 https://einsteinathome.org/workunit/428832633

https://einsteinathome.org/workunit/428867451 https://einsteinathome.org/workunit/428926864

https://einsteinathome.org/workunit/428857584 https://einsteinathome.org/workunit/428921184

The workunit ending in "736" takes the prize with 10 x "validate error" (plus 2 x "error while computing" from incompatible GPUs, with runtimes of 5 sec vs. 3+ hours). Then a mysterious lone rider with Windows 7 + NVIDIA GeForce GTX 1060 6GB (4095MB), driver 372.70, arrived. It must have broken some physical constant while crunching, because its status is "Completed, waiting for validation". The 14th host is now trying to finish the game. It will be interesting to see how it ends.
