FGRP4 Observations and Problems

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 1135927797
RAC: 686261

I turned my beta-test work preference off, because looking at the Applications page, it seemed like this project was now in production because it had a non-beta application.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231601375
RAC: 1160659

rbpeake wrote:
I turned my beta-test work preference off, because looking at the Applications page, it seemed like this project was now in production because it had a non-beta application.


I don't know how to find the official beta-test status of a work unit, but the ones newly made a few hours ago are still being given deadlines 48 hours after "sent time" which I think is generally associated with beta testing on this project.

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 1135927797
RAC: 686261

Quote:

I don't know how to find the official beta-test status of a work unit, but the ones newly made a few hours ago are still being given deadlines 48 hours after "sent time" which I think is generally associated with beta testing on this project.


Thanks, I have work after enabling the beta work option!

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231601375
RAC: 1160659

Quote:
I don't know how to find the official beta-test status of a work unit
...


One other clue besides the 48-hour deadline, which I can see on the BoincTasks task page, is that this stuff all has task names whose leading characters are fgrp4_test.
For a full example, one of my six tasks from this latest burst, created a bit over half a day ago, has this task name: fgrp4_test_80.0_0_-5.21e-10_0
All six have 80.0 in that third position, where early releases of this fgrp3 work had either 48.0 or 16.0 in that position.
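
For anyone who wants to pull that third field out of a list of task names, here is a minimal sketch. It assumes the fields are simply separated by underscores, which matches the one example name above but is not confirmed by the project; the function name and the dictionary keys are made up for illustration.

```python
def parse_task_name(name):
    """Split a task name such as 'fgrp4_test_80.0_0_-5.21e-10_0' on underscores.

    Assumption: field 1 is the application, field 2 is 'test' for test work,
    field 3 is the numeric label discussed above. The meaning of the
    remaining fields is unknown, so they are returned untouched.
    """
    fields = name.split("_")
    return {
        "app": fields[0],
        "is_test": fields[1] == "test",
        "label": float(fields[2]),
        "rest": fields[3:],
    }


info = parse_task_name("fgrp4_test_80.0_0_-5.21e-10_0")
print(info["app"], info["is_test"], info["label"])
```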

Reviewing jobs Jeroen's hosts have processed, it appears strongly that this number is correlated with the execution time of the job.

The 16.0's all seem to have been issued early on, and awarded an anemic 2.58 cobblestones. Early issue 48.0's got awarded 24.95 or 28.95, while later issue ones got 238.81. 80.0's so far have received 607.84.

Possibly the powers that be are experimenting with different bundle sizes or some other parameter. Anyone here actually know? Inquiring minds wish to learn.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231601375
RAC: 1160659

While the previous batch of FGRP4 80.0s took a long time to get fully distributed, that was finished some hours ago. Now there is a new batch of 80.0s, but fewer in number this time (95 new ones, I think).

A distinguishing difference of this latest batch is a longer deadline. The previous batches, so far as I noticed, all came with deadlines 48 hours after "send" time. The newest batch has deadlines 6 days after send time. Because on arrival they get execution time estimates several times longer than is probably the truth, all three that I have received have gone promptly into high priority execution, despite the longer deadline.

I suspect there are not very many hosts currently requesting work which are set up to accept this type. My little flotilla has gotten 13 tasks out of these last two batches. Bernd's post in technical news recently suggests they are still looking into issues, and hence delaying transition into production longer than originally expected.

Assuming that some of those issues may have been exposed through this beta testing, we folks accepting beta may be performing a useful service here.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4371
Credit: 3222487718
RAC: 2042396

Quote:
Quote:
I don't know how to find the official beta-test status of a work unit
...

One other clue besides the 48-hour deadline, which I can see on the BoincTasks task page, is that this stuff all has task names whose leading characters are fgrp4_test.
For a full example, one of my six tasks from this latest burst, created a bit over half a day ago, has this task name: fgrp4_test_80.0_0_-5.21e-10_0
All six have 80.0 in that third position, where early releases of this fgrp3 work had either 48.0 or 16.0 in that position.

Reviewing jobs Jeroen's hosts have processed, it appears strongly that this number is correlated with the execution time of the job.

The 16.0's all seem to have been issued early on, and awarded an anemic 2.58 cobblestones. Early issue 48.0's got awarded 24.95 or 28.95, while later issue ones got 238.81. 80.0's so far have received 607.84.

Possibly the powers that be are experimenting with different bundle sizes or some other parameter. Anyone here actually know? Inquiring minds wish to learn.

I have two of those test_80.0_ WUs, and my own runtime prognosis indicates a time increase greater than the 80/48 ratio would suggest. Both are now at 80% with 30.5 hours of runtime so far, so my prognosis is about 38 hours in total. For some reason Boinc did not start crunching the first one until about 10 hours after it was downloaded, so it is very close to being returned late. They have both been running in High Priority mode all the time, but in Task Manager the priority for the executables is still marked as low. Boinc's original estimate was about 500 hours of runtime; its current estimate is still 14.5 hours to go.

So for this computer (i7-3770 with hyper-threading on, Win 7 64, Boinc 7.2.42) the task size cannot be increased while keeping the 48-hour deadline. At the very least, considerable optimization would be required. Of course the production tasks will have a longer deadline.
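
The arithmetic behind that prognosis can be sketched as below. It assumes progress is linear in time and that the task runs without interruption once started, which is a simplification; the function name is made up for illustration.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: total runtime if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


elapsed = 30.5        # hours of runtime so far
done = 0.80           # 80% complete
start_delay = 10.0    # hours between download and start of crunching (approximate)
deadline = 48.0       # hours after the task was sent

total = projected_total_hours(elapsed, done)
remaining = total - elapsed
turnaround = start_delay + total

print(f"projected total runtime: {total:.1f} h, remaining: {remaining:.1f} h")
print(f"turnaround including the start delay: {turnaround:.1f} h, deadline: {deadline:.0f} h")
```

With these figures the projected turnaround lands almost exactly on the 48-hour deadline, which is why the first task is so close to being late.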

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
They both have been running in High Priority mode all the time, but in task manager the priority for the executables is still marked as low.


High priority only affects the order of the tasks in the queue. Normally tasks are run in FIFO order (First In First Out), but if a task is in danger of missing its deadline, Boinc will switch to High Priority mode and crunch that task first. It has no other effect on how things are run, except that Boinc will not ask for more work for that resource until the "danger" has passed.
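
As a toy illustration of that ordering rule (not the real BOINC client scheduler, which is considerably more involved), tasks that look likely to miss their deadline are moved to the front and the rest stay in arrival order. The `Task` class, the `run_order` function and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: float        # hours since some reference time
    deadline: float       # hours since the same reference time
    est_remaining: float  # estimated hours of work left


def run_order(tasks, now):
    """Tasks in danger of missing their deadline go first (earliest deadline
    first); everything else keeps plain FIFO order by arrival time."""
    def at_risk(task):
        return now + task.est_remaining > task.deadline

    urgent = sorted((t for t in tasks if at_risk(t)), key=lambda t: t.deadline)
    normal = sorted((t for t in tasks if not at_risk(t)), key=lambda t: t.arrival)
    return urgent + normal


queue = [
    Task("older_task", arrival=0.0, deadline=336.0, est_remaining=5.0),
    # A wildly pessimistic initial estimate against a 48-hour deadline.
    Task("fgrp4_test_task", arrival=1.0, deadline=48.0, est_remaining=500.0),
]
order = [t.name for t in run_order(queue, now=1.0)]
print(order)
```

Note that only the order changes in this sketch; nothing about the operating-system process priority is touched, which matches what Task Manager shows.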

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4371
Credit: 3222487718
RAC: 2042396

Quote:
Quote:
They both have been running in High Priority mode all the time, but in task manager the priority for the executables is still marked as low.

High priority only affects the order of the tasks in the queue. Normally tasks are run in FIFO order (First In First Out), but if a task is in danger of missing its deadline, Boinc will switch to High Priority mode and crunch that task first. It has no other effect on how things are run, except that Boinc will not ask for more work for that resource until the "danger" has passed.

Thank you for the clarification.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7231601375
RAC: 1160659

An additional batch of test FGRP4 work was created today with creation dates I have seen around 30 Aug 2014 10:30 UTC or so.

The five units from this batch which I have received with creation dates between 10:30 and 10:40:38 all are 80.0 designated, while the single unit I received with a creation date of 10:48:10 is designated 112.0.

In the progression from 16.0 to 48.0 through 80.0, each step up in this number has corresponded with a considerable (but not simply proportional) increase in required execution time. Initial indications for my single 112.0 unit suggest it will require yet more time than the 80.0 units. I accidentally left throttling turned off yesterday, so for once simple comparison should give a decent relative execution time estimate. On the host in question two 80.0 units near 50% complete are indicating approximately 12:30 total run time, while the single 112.0 unit near 13% is indicating 15:42, which if borne out is a smaller increase than simple division of the labels could suggest.
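
To put numbers on that comparison, here is a small sketch. It assumes the quoted run times are in hours:minutes and that the early projection for the 112.0 unit holds up; the helper name is made up.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' run time string to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


runtime_ratio = to_minutes("15:42") / to_minutes("12:30")
label_ratio = 112.0 / 80.0

print(f"runtime ratio {runtime_ratio:.2f}, label ratio {label_ratio:.2f}")
```

On these figures the 112.0 unit would take roughly a quarter longer than the 80.0 units, against a label ratio of 1.4.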

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117830391729
RAC: 34735331

Quote:
The five units from this batch which I have received with creation dates between 10:30 and 10:40:38 all are 80.0 designated, while the single unit I received with a creation date of 10:48:10 is designated 112.0.

If we go back to the days when FGRP2 was around, a given task needed two other 'temporary' files to be present. One of these was the data file - example LATeah0024U.dat. This file was just under 1MB in size. The second file was a skygrid file - examples

  * skygrid_LATeah024U_0016.0.dat -- 8KB
  * skygrid_LATeah024U_0048.0.dat -- 72KB
  * skygrid_LATeah024U_0080.0.dat -- 199KB
  * skygrid_LATeah024U_0112.0.dat -- 340KB
  * skygrid_LATeah024U_0144.0.dat -- 644KB
  * ....
  * skygrid_LATeah024U_1200.0.dat -- 44.7MB

These temporary files would be deleted when the last task in the local work cache depending on them was returned. That was quite irrespective of the fact that a few minutes later you might get a further task that depended on the 'just deleted' files.

These files were in a regular progression with an increment of 0032.0 from 0016.0 to as high as 1424.0. You can see this in the above examples. I know all this because I became alarmed at the size of many of these files. I ended up caching them all and deploying them to all hosts on the LAN because the download bandwidth was out of control. Every host needed the same files and when resends arrived the same files would be needed again. So I wrote a pretty involved script to control it all and solved the problem that way.
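
The progression described above is regular enough to enumerate. The following is only a sketch of the naming pattern, taking the examples listed earlier at face value (increment of 32 from 0016.0 up to 1424.0); it is not the caching script itself, and the function name and defaults are made up for illustration.

```python
def skygrid_names(data_tag="LATeah024U", first=16, last=1424, step=32):
    """List skygrid file names for one data file, following the pattern
    skygrid_<tag>_<NNNN>.0.dat seen in the examples above."""
    return [
        f"skygrid_{data_tag}_{n:04d}.0.dat"
        for n in range(first, last + 1, step)
    ]


names = skygrid_names()
print(len(names), names[0], names[-1])
```

A list like this is the kind of thing a caching script could use to check which files are already held locally before anything is downloaded again.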

I did write about it at the time (probably two years ago) and Bernd did take it into consideration for the design of FGRP3 by making the creation of the skygrid occur locally as part of the app and not via a download. So FGRP3 only downloads a single data file - example, most recent was LATeah0112C.dat - and these are still just under 1MB in size. These files 'last' for a time, usually around a few days. The longest running one I've seen was LATeah0092C.dat which first arrived on 11th May and was eventually replaced by LATeah0093C.dat on 9th July - almost 2 full months. 0093C only lasted 3 days. I still cache and deploy these files so I have a record of when things change.

With the removal of the separate skygrid files, I think it's possible to see information about the skygrid in the task name itself. (This is supposition - I don't know for sure.) The sequence is slightly different - it seems to be steps of 16 rather than 32 - but I believe the numbers you are quoting are to do with the skygrid and not the task size per se. There is probably some experimentation with task size going on and (apart from 'short ends', which are always going to occur) the production run should have pretty much uniform duration tasks irrespective of the 0016.0, 0032.0, 0048.0, 0064.0, 0080.0, 0096.0, 0112.0, ... sequence. From past experience, the early members of this sequence seem to have a greater proportion of 'short ends', but you will see both full running and short varieties under the same numerical tag.

So, to cut a long story short, I don't think there will be any substantial difference between standard (full running) tasks labelled 0016.0 or 0048.0 or 0080.0, etc, once the production run starts up.

Cheers,
Gary.
