GW GPU Issue 22 oddity

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125
Topic 222965

I'm pretty new at running Einstein GW work on the GPU on one system.  I've been watching more closely than usual, and happened to spot something a bit odd.

Three separate times I've had a task with Issue number 22 take far less elapsed time and CPU time than adjacent work.  "Far less", like a quarter!  

These units have also thrown my queue size estimates into a tizzy.  Apparently the server knew they'd be shorties, and sent along a work content estimate that was way down from normal, but way down by even more than the actual observed effect.  So, paradoxically, the client saw my super-short elapsed time as considerably longer than its even shorter estimate, promptly bumped up my DCF, and with it the estimated elapsed times for my work in queue.  I think it pushed me into High Priority processing on at least one occasion.
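For anyone who hasn't watched this feedback loop in action, here is a rough Python sketch of how I understand a DCF-style correction to behave.  It is a simplification for illustration only, not the actual BOINC client code, and the update rules, thresholds, and numbers in it are my own assumptions:

# Simplified sketch of a DCF-style runtime correction.  The update rules,
# thresholds, and numbers are illustrative assumptions, not BOINC client code.

def estimated_elapsed(server_fpops_est, host_flops, dcf):
    """Client-side elapsed-time estimate for a task, in seconds."""
    return server_fpops_est / host_flops * dcf

def update_dcf(dcf, estimated, actual):
    """Raise DCF quickly on a big overshoot, let it drift down slowly otherwise."""
    ratio = actual / estimated
    if ratio > 1.1:                          # ran much longer than estimated: jump at once
        return dcf * ratio
    return 0.99 * dcf + 0.01 * dcf * ratio   # ran as expected or shorter: gentle decay

# A "shorty" whose server estimate was cut far more than its real speedup:
dcf = 1.0
est = estimated_elapsed(server_fpops_est=1e14, host_flops=2e12, dcf=dcf)  # ~50 s estimate
actual = 500.0                                                            # but it really takes ~500 s
dcf = update_dcf(dcf, est, actual)
print(dcf)  # DCF jumps ~10x, inflating the estimates for everything else in the queue

The point of the sketch: when the server's estimate for a task is cut far more than the task's real speedup, the finished task looks slow relative to its estimate, DCF jumps, and every other estimate in the queue inflates with it.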

The specific tasks on which I've directly observed this effect on my system are:

h1_1634.70_O2C02Cl4In0__O2MDFV2i_VelaJr1_1634.95Hz_22_0    
h1_1634.80_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.05Hz_22_0
h1_1634.85_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.10Hz_22_0

The longest elapsed time on any of these three was less than a third the elapsed time for any other Einstein GW task on my machine in the last week, so these are extreme outliers.
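For reference, here is how I read these task names; the field meanings are my own interpretation from watching the work, not anything taken from project documentation.  The first number appears to be the parent frequency, the VelaJr1_...Hz number the search frequency, the next-to-last field the issue number, and the trailing digit the usual BOINC replication suffix.  A quick sketch that pulls those fields out:

import re

# My reading of the GW task-name layout (an assumption, not project documentation):
# h1_<parent freq>_O2C02Cl4In0__O2MDFV2i_VelaJr1_<search freq>Hz_<issue>_<replica>
NAME_RE = re.compile(
    r"h1_(?P<parent_freq>[\d.]+)_.*_VelaJr1_(?P<search_freq>[\d.]+)Hz_(?P<issue>\d+)_(?P<replica>\d+)"
)

def parse_task(name):
    m = NAME_RE.match(name)
    return {k: (float(v) if "freq" in k else int(v)) for k, v in m.groupdict().items()}

for n in ("h1_1634.70_O2C02Cl4In0__O2MDFV2i_VelaJr1_1634.95Hz_22_0",
          "h1_1634.80_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.05Hz_22_0",
          "h1_1634.85_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.10Hz_22_0"):
    print(parse_task(n))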

I mention this as an oddity, not a complaint or serious effect observation.

 

Keith Myers
Joined: 11 Feb 11
Posts: 5009
Credit: 18892287507
RAC: 5992232

Went looking for similar tasks.  Didn't have any.  I have to say my crunch times are remarkably consistent across all task species.  The only difference in times is between slow cards and fast cards.  My GTX 1080Ti is about half as fast as my RTX cards.

 

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125

Keith Myers wrote:

Went looking for similar tasks.  Didn't have any.

While it is issue number 23 rather than 22, and the difference from the others is somewhat under a factor of 2 rather than the extreme variation I saw, I still think this issue 23 task from one of your machines rates as a clear outlier.

My issue 22 tasks were the highest issue number still in DF .25 in their parent frequencies.  Your task is in DF .25, so I'll guess it was the highest of that DF in that parent frequency.

I suspect this effect is showing up at the boundary between DF .25 and DF .30, and that the depth and position of the notch may vary meaningfully with parent frequency.  There may also be detectable effects at other DF boundaries.

 

GWGeorge007
Joined: 8 Jan 18
Posts: 3114
Credit: 5000140109
RAC: 1215789

I have been checking mine since I read your posts, and then I checked my log and found this curious bit:

<<snip>>

6/26/2020 4:07:16 AM | Universe@Home | Backing off 00:01:58 on upload of universe_bh2_190723_316_1008223992_20000_1-999999_225100_1_r964114263_3
6/26/2020 4:07:33 AM | Universe@Home | Backing off 00:01:54 on upload of universe_bh2_190723_316_1008223992_20000_1-999999_225100_1_r964114263_5
6/26/2020 4:07:34 AM | Einstein@Home | Backing off 00:01:53 on upload of h1_1651.50_O2C02Cl4In0__O2MDFV2i_VelaJr1_1651.95Hz_107_1_1
6/26/2020 4:07:35 AM | Einstein@Home | Backing off 00:01:32 on upload of h1_1651.50_O2C02Cl4In0__O2MDFV2i_VelaJr1_1651.95Hz_107_1_0
6/26/2020 4:07:39 AM |  | Project communication failed: attempting access to reference site
6/26/2020 4:07:39 AM | Universe@Home | Backing off 00:02:21 on upload of universe_bh2_190723_316_1008223992_20000_1-999999_225100_1_r964114263_2
6/26/2020 4:07:40 AM |  | Internet access OK - project servers may be temporarily down.

6/26/2020 4:12:08 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 4:31:40 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 5:19:43 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 5:31:46 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 6:27:08 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 6:31:52 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 7:29:29 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 7:31:57 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 8:32:00 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 8:39:56 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 9:32:02 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 9:48:45 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 10:32:08 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 10:51:07 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 11:26:08 AM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 11:32:14 AM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 11:52:58 AM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 12:30:50 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 12:32:17 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 1:32:20 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 1:32:26 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 2:32:27 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 2:33:40 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 3:32:30 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 3:34:33 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 4:32:37 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 4:36:00 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 4:42:15 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 4:45:29 PM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 5:32:39 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 5:42:55 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 6:32:45 PM | Universe@Home | Project requested delay of 11 seconds
6/26/2020 6:42:09 PM | Einstein@Home | Project requested delay of 60 seconds
6/26/2020 6:43:46 PM | Einstein@Home | [error] Error reported by file upload server: File uploads are temporarily disabled.
6/26/2020 6:43:46 PM | Einstein@Home | [error] Error reported by file upload server: File uploads are temporarily disabled.

6/26/2020 6:43:46 PM | Einstein@Home | Backing off 00:02:04 on upload of LATeah1064L18_468.0_0_0.0_35663446_0_0
6/26/2020 6:43:46 PM | Einstein@Home | Backing off 00:02:54 on upload of LATeah1064L18_468.0_0_0.0_35663446_0_1
6/26/2020 7:02:56 PM | Milkyway@Home | Project requested delay of 91 seconds
6/26/2020 7:32:50 PM | Universe@Home | Project requested delay of 11 seconds

<<snip>>

Aside from the lines shaded in red, I don't quite understand the "Backing off (time) on upload of..." and "Project requested delay of..." messages.

Any thoughts on this?

George

Proud member of the Old Farts Association

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125

archae86 wrote:
Apparently the server knew they'd be shorties, and sent along a work content estimate that was way down from normal, but way down by even more than the actual observed effect.

While I don't know a way to check up on the server-estimated relative work content for tasks already completed, it happens I have another of these in my queue.

It is task h1_1634.75_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.00Hz_22_2.  (yes, issue number 22 again)

At this minute, with the host DCF currently at 2.126 and the machine actually running 4X on medium-DF tasks, the task list shows an estimated elapsed time of 34:50 for each of the dozens of tasks from that parent frequency, from issue number 14 up through issue number 77, while for this one task the estimate shows 01:23!  So when I run that task, I'll miss the estimated ET by a big enough margin to trigger the immediate DCF jump for big misses.  Again.

Actually it is not just the near neighbors of this task that show an estimated ET of 34:50, but every single other one of the 350-odd waiting-to-run GW GPU tasks on this machine.  So the server knew, when it sent me this task, that there was something special about it.
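To put rough numbers on "way down by even more than the actual observed effect", here is a quick back-of-envelope check.  The observed speedup figure is only my reading of the earlier results (roughly a third to a quarter of normal), not a measured value for this particular task:

# Back-of-envelope comparison of the server's estimate cut vs. the observed speedup.
normal_est = 34 * 60 + 50          # 34:50 estimated elapsed time, in seconds
shorty_est = 1 * 60 + 23           # 01:23 estimate for the issue-22 task
server_ratio = normal_est / shorty_est
print(f"server estimate is cut by a factor of about {server_ratio:.0f}")   # roughly 25x

observed_speedup = 3.5             # my rough observation: a third to a quarter of normal time
print(f"so the estimate will be missed by roughly {server_ratio / observed_speedup:.0f}x")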

 

Keith Myers
Joined: 11 Feb 11
Posts: 5009
Credit: 18892287507
RAC: 5992232

91 seconds is the normal scheduler timer interval at Einstein.  When the project can't be contacted normally, the client goes into a hurry-up mode to try the connection again.

All normal events for the client.  You just hit the scheduler when it was overly busy, or you had congestion in your internet connection to the project.
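If you are curious about the "Backing off" messages themselves, they are just a retry delay that grows with consecutive failures and has some random jitter mixed in.  Here is a generic sketch of the pattern; the constants are illustrative, not the client's actual values:

import random

# Generic randomized exponential backoff, roughly the pattern behind the
# "Backing off hh:mm:ss on upload of ..." messages.  Constants are illustrative.
MIN_DELAY = 60            # seconds
MAX_DELAY = 4 * 3600      # cap the delay at a few hours

def next_backoff(failures):
    """Delay grows with consecutive failures, with random jitter, up to a cap."""
    ceiling = min(MAX_DELAY, MIN_DELAY * 2 ** failures)
    return random.uniform(MIN_DELAY, ceiling)

for n in range(5):
    print(f"failure {n}: retry in about {next_backoff(n):.0f} s")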

 

Keith Myers
Joined: 11 Feb 11
Posts: 5009
Credit: 18892287507
RAC: 5992232

You said to look for issue 22 tasks, which I did, and I did not find one.  Your link to my issue 23 task is dead now, as the task has already been purged.

I'd like to see one of these in play myself.  I'll keep watching out for any outliers.

 

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125

I tried looking at the task lists for a couple of my recent quorum partners.  I selected valid GW tasks only, and sorted in ascending order of run time.  Twice in a very few tries I found an issue 22 task as the shortest run time, with less than a quarter the run time of the next task up the list.

from computer 12798769

WU 466514854

from computer 12780800

WU 466149407

These completed on the 27th and the 29th of June, so they may be visible for a few days yet.

Both of these ran against the same quorum partner--me.  But the point is that these tasks had by far the shortest run times currently listed for a GW task on their respective computers.

But it does remain to be seen whether the project will be generating more of this sort or not.  These may have been especially extreme.

 

 

Keith Myers
Joined: 11 Feb 11
Posts: 5009
Credit: 18892287507
RAC: 5992232

Bingo.  I found two on my hosts.

969483879

969683495

Some of your 22/23 DF tasks.  I've looked at the task output and compared it to a normal task, and nothing jumps out at me as an unusual set of parameters.

The one thing that is consistent among these examples is that short run time equates to a non-standard credit award.

The second example of mine is not necessarily a short run, though; it is actually around the nominal value for my tasks.

[Edit]

But here is a 22 DF task that ran in the nominal amount of time, maybe in the lower percentile range, but was awarded the standard credit.

971216827

 

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125

Keith Myers wrote:

But here is a 22 DF task that ran in the nominal amount of time, maybe in the lower percentile range, but was awarded the standard credit.

971216827

Following your link, I see that as issue number 229 with a DF of .65.  Possibly an end-of-line viewing oddity split the 229 into two pieces?  In any case, it is no surprise for it to run normally.

I had not spotted the granted credit difference which you point out.  That is a handy way to look for these things: go to the task list, select GW GPU tasks only, select valid tasks, and sort by rising granted credit.  Any of these should be at the very beginning, and all the rest at 2000.

Doing that, I spot four of these currently visible on my host, all issue number 22, but with granted credits of:

 

100
120
740
780

The 740 and 780 cases have elapsed time and run time much closer to normal than the extreme outliers I focused on at first.
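The same check is easy to script if you export a task list to CSV.  Here is a sketch assuming a file named tasks.csv with columns named status, granted_credit, run_time and task_name; those names are mine, so adjust them to whatever an actual export contains:

import csv

# Flag valid GW GPU tasks granted less than the standard 2000 credits.
# "tasks.csv" and its column names are placeholders for however the list is exported.
with open("tasks.csv", newline="") as f:
    outliers = [row for row in csv.DictReader(f)
                if row["status"] == "valid" and float(row["granted_credit"]) < 2000]

for row in sorted(outliers, key=lambda r: float(r["granted_credit"])):
    print(row["granted_credit"], row["run_time"], row["task_name"])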

 

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7261171908
RAC: 1544125

I posted a speculative thought here that a particular sequence of tasks might shed light on this, but it has been overtaken by events: the tasks in question have all been returned and showed no symptoms.

So it is not the case that every parent-frequency series of tasks includes even a modest case of this effect near issue number 22 or the seam between DF .25 and DF .30.

I'm replacing that post with a new observation from actual results.

Using the method of looking for valid GW GPU results awarded less than 2000 credits, I've come across examples so far with issue numbers 21, 22, and 23.  It appears that the affected issue number is often the same when the parent frequencies are close.
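To test that impression more systematically, one could bucket the flagged task names by parent frequency and list the issue numbers that show up near each frequency.  Here is a sketch using the three names from my first post as stand-ins for a real outlier list; the field positions are my reading of the name layout:

from collections import defaultdict

# Group low-credit outlier tasks by parent frequency and list which issue
# numbers appear near each frequency.  The sample names below are just the
# three from my first post; a real check would use the full flagged list.
outliers = [
    "h1_1634.70_O2C02Cl4In0__O2MDFV2i_VelaJr1_1634.95Hz_22_0",
    "h1_1634.80_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.05Hz_22_0",
    "h1_1634.85_O2C02Cl4In0__O2MDFV2i_VelaJr1_1635.10Hz_22_0",
]

by_band = defaultdict(set)
for name in outliers:
    parts = name.split("_")
    parent_freq = float(parts[1])   # e.g. "1634.70"
    issue = int(parts[-2])          # e.g. "22"
    by_band[round(parent_freq)].add(issue)

for band, issues in sorted(by_band.items()):
    print(f"around {band} Hz: issue numbers {sorted(issues)}")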
