App_Info files for S5R3/S5R4 Dual Capability

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2769841021
RAC: 936695

RE: I'm looking at flushing

Message 83535 in response to message 83534

Quote:
I'm looking at flushing my queue of S5R3s, and I understand that I will surely never get any more S5R3s even though I use the dual capability app_info.xml.
Has anybody received S5R3 work with the dual app_info?
I mean someone who is crunching S5R4 and then suddenly gets a little S5R3 work.


Yes, I did - got about 5 S5R3 during my timing run, using dual capability app_info to keep the Windows power app available for use.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109956220620
RAC: 31312352

RE: I'm looking at flushing

Message 83536 in response to message 83534

Quote:
I'm looking at flushing my queue of S5R3s, and I understand that I will surely never get any more S5R3s even though I use the dual capability app_info.xml.
Has anybody received S5R3 work with the dual app_info?
I mean someone who is crunching S5R4 and then suddenly gets a little S5R3 work.

WARNING: The following thoughts are mostly conjecture on my part based on observation of a large number of hosts. I have no special "inside knowledge" other than my own observations.

In the last 24 hours, a number of my machines that I've directly observed have downloaded well in excess of 100 S5R3 resends in total, probably closer to 200. I've had several machines which were able to fill up 12 day caches on R3 resends only. One single dual cpu machine got 30, and probably could have got more if there had been space. There seem to be rather special conditions needed to get resends so I'll start a new thread to document and explain this as I see it. It's a complicated story so it might take me a while to finish it :-).

It's still very early days for the R4 run. There will probably be R3 resends for at least the next month or two and possibly longer. By now most hosts will be crunching R4 and most hosts will have been told by the scheduler to delete remaining R3 data. If you are in this category you probably won't get R3 resends just yet. That is because whilst there are plenty of resends, the scheduler is still able to dish them out to hosts that still do have the right data files. That, I believe, is the key.

This will change soon, I believe. Probably in a couple of days' time, there will be insufficient hosts left that still have R3 data files, and the scheduler won't be able to get rid of the resends to these "right" hosts within a comfortable time frame. I think there must be a "trigger" time interval that, if exceeded, causes the scheduler to pick the next host that comes along that doesn't prohibit R3 work and send it the whole kit and caboodle - including all the data files. This is when your host will be able to get some resends if it had previously been told to delete its R3 data files.

When that happens to you, if you leave the app_info.xml file in place, at least you will be crunching the resend with the app of your choice.

The behaviour I described above wasn't possible just a few days ago. I surmise that, at that time, there were more "right" hosts than resends. So on average somebody else got lucky and not you. Now, just a few days later, I have just tried 10 - 15 hosts that still have R3 data intact, and virtually all of them are able to snag multiple resends immediately. To me that means there are now more resends than available hosts. Soon, I believe, the scheduler will reach the "trigger" and others without the "right" data will be able to get resends.

Cheers,
Gary.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 197

probably a lot longer than

Probably a lot longer than just a few days. The last S5R2 WU was only cleared a few days ago. Personally, I'm shocked that the project didn't just direct the last few hundred onto their own clusters months ago to finish them out.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245207413
RAC: 13173

RE: probably a lot longer

Message 83538 in response to message 83537

Quote:
Probably a lot longer than just a few days. The last S5R2 WU was only cleared a few days ago. Personally, I'm shocked that the project didn't just direct the last few hundred onto their own clusters months ago to finish them out.


Apparently 11 workunits of S5R2 got stuck in the system (i.e. had no canonical result but also no results unsent or in progress), possibly due to a scheduler or validator crash. I'd been watching this for a while, but I didn't find the time to really look into it until a few days before S5R4.

BM


Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: einstein_S5R3

Message 83539 in response to message 83504

Quote:

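<app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>436</version_num>
    <api_version>6.1.0</api_version>
    <file_ref>
        <file_name>einstein_S5R3_4.46_windows_intelx86.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.46_windows_intelx86_0.exe</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.46_windows_intelx86_1.exe</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.26_windows_intelx86.pdb</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.36_windows_intelx86.pdb</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.46_graphics_windows_intelx86.exe</file_name>
        <open_name>graphics_app</open_name>
    </file_ref>
</app_version>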

Not that I'm complaining, but I've found a couple of typos in the "436" section. You have the "446" apps listed instead of "436" apps.

I didn't notice until after I switched over to the new file, and then had three R3's crash with compute errors.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056114931
RAC: 1607637

RE: Not that I'm

Message 83540 in response to message 83539

Quote:

Not that I'm complaining, but I've found a couple of typos in the "436" section. You have the "446" apps listed instead of "436" apps.

I didn't notice until after I switched over to the new file, and then had three R3's crash with compute errors.

Not a typo. The whole concept of the file is to direct all S5R3 work, however branded, onto a single set of files--naturally the 4.46 set.

Hence the requirement, mentioned in the thread, to download those.

Of course, you can certainly create your own revision, taking advantage of knowing that a particular host has never had, and now never will have, work branded above a particular level, and direct that work to a functional version of the application you happen to have on hand, thus avoiding the app download requirement.
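
As a rough sketch of the redirection idea (not the file posted in this thread), the fragment below assumes the usual BOINC anonymous-platform layout for app_info.xml. The version numbers 436 and 446 and the executable name come from the quoted section. The helper executables, .pdb files and graphics app are left out here purely to keep the example short, so treat it as an illustration rather than a working file.

```xml
<!-- Illustrative fragment only; a simplification, not the thread's actual file. -->
<app_info>
    <app>
        <name>einstein_S5R3</name>
    </app>
    <!-- The one executable that all S5R3 work is directed to. -->
    <file_info>
        <name>einstein_S5R3_4.46_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <!-- Work branded 4.36 is pointed at the 4.46 executable. -->
    <app_version>
        <app_name>einstein_S5R3</app_name>
        <version_num>436</version_num>
        <file_ref>
            <file_name>einstein_S5R3_4.46_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <!-- Work branded 4.46 uses the same executable. -->
    <app_version>
        <app_name>einstein_S5R3</app_name>
        <version_num>446</version_num>
        <file_ref>
            <file_name>einstein_S5R3_4.46_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

Every file named in such a fragment has to be present in the project directory already, which is why the 4.46 set must be downloaded first. Pointing the 436 block at files a host already has is the variation suggested above.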

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

RE: RE: probably a lot

Message 83541 in response to message 83538

Quote:
Quote:
Probably a lot longer than just a few days. The last S5R2 WU was only cleared a few days ago. Personally, I'm shocked that the project didn't just direct the last few hundred onto their own clusters months ago to finish them out.

Apparently 11 workunits of S5R2 got stuck in the system (i.e. had no canonical result but also no results unsent or in progress), possibly due to a scheduler or validator crash. I'd been watching this for a while, but I didn't find the time to really look into it until a few days before S5R4.

BM

So, now that we're seeing: "S5R2 0 results 0 workunits" does that mean the S5R2 validator and assimilator processes can be unceremoniously dumped now? :)
