GRP tasks not being generated for the last six hours?

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220564931
RAC: 970958
Topic 219557

I've noticed that my machines running GPU GRP tasks have had their queues drop overnight.

On review I notice that the last time either machine received tasks with _0 or _1 suffixes was somewhat over six hours ago, with just a few _2 and such (reissues) delivered since then.
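
For anyone not used to the naming convention: BOINC task names carry a trailing _N replica index, so with this project's initial replication of two, _0 and _1 are freshly generated copies and _2 and above are resends. A minimal sketch of that check, using made-up task names, in Python:

```python
# Classify task names by their trailing replica index (_N).
# _0 and _1 are the initial copies of a workunit; _2 and higher
# are reissues of failed or timed-out replicas.
def is_reissue(task_name: str) -> bool:
    replica = int(task_name.rsplit("_", 1)[1])
    return replica >= 2

# Hypothetical task names, for illustration only.
for name in ("LATeah1049L_1234_12345678_0", "LATeah1049L_1234_12345678_2"):
    print(name, "->", "reissue" if is_reissue(name) else "fresh")
```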

On the Einstein server status page the 14:20 UTC update shows zero tasks to send for FGRPB1G, and that work generator as "not running".
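
If you want to watch for this programmatically, here is a rough sketch, assuming the project exposes the stock BOINC server_status.php XML feed (the exact URL and XML layout are assumptions on my part):

```python
# Sketch: poll a BOINC server status feed and report work-generator state.
# Assumes the stock server_status.php?xml=1 endpoint and XML layout;
# the real Einstein@Home status page may differ.
import urllib.request
import xml.etree.ElementTree as ET

URL = "https://einsteinathome.org/server_status.php?xml=1"  # assumed URL

with urllib.request.urlopen(URL, timeout=30) as resp:
    root = ET.fromstring(resp.read())

# Stock BOINC lists each daemon with a command name and a status string.
for daemon in root.iter("daemon"):
    command = daemon.findtext("command", default="?")
    status = daemon.findtext("status", default="?")
    if "work" in command.lower():  # crude filter for workunit generators
        print(f"{command}: {status}")
```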

Something amiss?

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 401
Credit: 10140943455
RAC: 25947430

+1

mikey
Joined: 22 Jan 05
Posts: 12680
Credit: 1839083536
RAC: 3919

archae86 wrote:

I've noticed that my machines running GPU GRP tasks have had their queues drop overnight.

On review I notice that the last time either machine received tasks with _0 or _1 suffixes was somewhat over six hours ago, with just a few _2 and such (reissues) delivered since then.

On the Einstein server status page the 14:20 UTC update shows zero tasks to send for FGRPB1G, and that work generator as "not running".

Something amiss? 

Gary wrote in another thread:

It's nothing to do with tasks being "messed up". Where does that idea come from? It's entirely to do with the fact that there are two different applications: a long-standing and trusted app running on a CPU core, and a new application designed to perform parts of the calculations (hopefully much faster) by taking advantage of the parallel processing capabilities of a modern GPU.

The GPU app is under development, and testing is revealing that it's not always getting close enough to the answers of the trusted CPU app in certain circumstances. By analyzing the reasons for the discrepancies in the results, the Devs hope to achieve a GPU app that will both save time and almost always get close enough to the same results. It's an ongoing process, and the invalid results should lead to newer versions of the app that can achieve the required computational accuracy.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117533260175
RAC: 35343331

mikey wrote:
Gary wrote in another thread:

Mikey,
What I wrote in that "other thread" was in connection with a totally different science app - the test app for the O2AS GW search using GPUs.

Archae86 has started this thread because (for some as yet unexplained reason) the workunit generator for the Gamma Ray Pulsar search that uses GPUs (FGRPB1G) is currently not running and there are no tasks ready to send.

Even if you had simply confused the two searches, thinking they were one and the same, I still don't understand how a test app getting some inconclusive/invalid results could in any way explain the puzzling failure of a workunit generator to keep producing new tasks ready to send out.  I guess we'll get some information once the issue has been resolved.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18712606128
RAC: 6369877

The GRP work generators have been down for considerably longer than six hours.

https://einsteinathome.org/goto/comment/173245

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220564931
RAC: 970958

Finally not only are GPU GRP tasks being generated, but the rest of the plumbing is intact enough that all of my machines have reported old work and received fresh work.

The Radeon VII is off to the "sit and wait a while" punishment box as it reached its daily task download quota, but the other two are fully caught up.
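
For what it's worth, the quota behaviour is roughly "N downloads per host per day, then wait for the reset". A toy model of that gate, with a made-up limit and reset logic that is my assumption rather than the scheduler's actual code:

```python
# Toy per-host daily download quota. The limit and the fixed 24 h
# reset window are assumptions, not Einstein@Home's real scheduler logic.
from datetime import datetime, timedelta, timezone

class DailyQuota:
    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        self.granted = 0
        self.window_start = datetime.now(timezone.utc)

    def try_fetch(self, n: int = 1) -> bool:
        now = datetime.now(timezone.utc)
        if now - self.window_start >= timedelta(days=1):
            self.granted, self.window_start = 0, now  # quota day rolls over
        if self.granted + n > self.max_per_day:
            return False  # off to the punishment box
        self.granted += n
        return True

quota = DailyQuota(max_per_day=256)   # hypothetical limit
print(quota.try_fetch(300))           # False: over quota, wait for reset
```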

I was down to two tasks in queue on one machine when I got up (the plumbing and task availability had been OK for over two hours, but it was waiting out a deferral), so two days was a good cache size this time.
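
Back-of-the-envelope, with throughput and outage numbers that are pure guesses on my part, that cache sizing works out like this:

```python
# Rough check of the "two days was a good cache size" observation.
tasks_per_day = 100.0   # assumed per-host GPU throughput
buffer_days = 2.0       # work-buffer setting
outage_hours = 44.0     # assumed length of the outage

queued = tasks_per_day * buffer_days
consumed = tasks_per_day * outage_hours / 24.0
print(f"tasks left when work flows again: {queued - consumed:.0f}")
# ~17 tasks left after a 44 h outage: close to running dry, which
# matches being "down to two tasks in queue" above.
```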
