Questions, comments and problems on new Fermi LAT gamma-ray pulsar search

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

Finished another unit on my

2 Sep 2011 18:41:30 UTC

Message 105852

Finished another unit on my Linux box, now awaiting validation. CPU time went from 53k s with version 22 to 48k s with version 23.
Tullio

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117480013861

RAC: 35443775

RE: Welcome back,

3 Sep 2011 3:49:44 UTC

Message 105853 in response to message 105850

Quote:

Welcome back, Gary!

Thank you! I haven't been too far away - just incapacitated. I still don't have the bulk of my hosts fired up yet but now that I'm getting back into the swing of things, I'll rectify that shortly. As I mentioned to you and Oliver when I talked with you in Hannover at the time of the open day, I'm really keen to do the gamma ray pulsar search work. I had shut down most of my machines for the duration when I left to attend the open day. I returned to Australia in early August and just had time to get about 30 machines set up with the 0.22 app before I had to go to hospital. (It's a long story - I'll relate it in another thread probably). So now that I'm out of hospital and am starting to feel alive again, the release of the 0.23 app spurred me on to bring that group of machines up to date since they were all set up with AP to do FGRP1 tasks only.

Quote:

I suspect that most of your machines are powered by AMD CPUs?

No, that whole group of machines is Intel only. Just over half of them are Q8400 quads, whilst the others are a mixture of Celeron dual cores (E1500s) and Pentium dual cores (E6300s and E6500s). I'll probably get around to firing up my AMD hosts again in the near future. Before I shut things down at the end of June, I had a RAC of around 240K. My RAC is building up again and it's heading past 100K at the moment. As others have mentioned, part of the reason for that current low RAC is the low FGRP1 credit allocation but in my case, it is also due to quite a number of machines that are yet to be restarted.

Quote:

I did see larger speedups on our Intel machines here, as well as in the DB of the runtime averaged over all hosts.

I may have been a bit conservative with my 15% speedup estimate. It may actually be a little bit better than that. The really pleasing aspect is the much larger speedup for Windows hosts. I'm sure the Windows users will welcome that!

Quote:

Anyway, preparing for "production" we increased the deadline (10 days) and are ramping up the FGRP1 share of the project (i.e. sending out more FGRP work).

All good stuff and most welcome! Just set the credit allocation to around 280 or so and everybody will be very happy :-).

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117480013861

RAC: 35443775

RE: Indeed, welcome back,

3 Sep 2011 6:47:33 UTC

Message 105854 in response to message 105851

Quote:

Indeed, welcome back, Gary.

Thank you very much, Richard!

Quote:

I'm only running Windows on Intel here, but I'd noticed that very significant speedup with 0.23 - I can pull out figures if it helps, but I'd guess you have plenty available.

Yes, no need to dig out figures unless you want to. Initially, I had supposed that I must have misconfigured my Windows machine when I saw how long the new tasks were taking. It's a lot closer to Linux performance now with the new version.

Quote:

I'm still seeing FGRP1 taking ~30% longer than S6Bucket on all machines (I'm running SSE2 minimum, mainly Core2 or above, so can take advantage of the optimisation).

That may be so on Windows but I don't think it's quite that bad on Linux, although I don't know for sure until I fire up some more hosts and let both tasks run side by side. Maybe we should encourage Bernd to go for at least 300 as the appropriate credit allocation for a FGRP1 task?

Cheers,
Gary.

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429459222

RAC: 75907

RE: Maybe we should

3 Sep 2011 15:49:12 UTC

Message 105855 in response to message 105854

Quote:

Maybe we should encourage Bernd to go for at least 300 as the appropriate credit allocation for a FGRP1 task?

Welcome back, Gary! +1 for the above.

Jeroen

Joined: 25 Nov 05

Posts: 379

Credit: 740030628

RAC: 0

I finished my first two FGRP

3 Sep 2011 17:35:50 UTC

Message 105856

I finished my first two FGRP work units last night on my Linux system. Both are awaiting validation, and the completion time is around 12.4k on my quad-core, which is 16-18% longer than the completion time of the S6Bucket work units.
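
As a rough cross-check of those figures, the S6Bucket completion time they imply can be backed out with a couple of lines of Python. This is only a sketch: the function and variable names are made up for illustration, and the unit is simply whatever the quoted 12.4k was measured in.

```python
# Back out the S6Bucket completion time implied by the quoted figures.
# All names are illustrative; the unit is whatever "12.4k" was quoted in.

def implied_reference_time(task_time, percent_longer):
    """Reference time t such that task_time is percent_longer % above t."""
    return task_time / (1.0 + percent_longer / 100.0)

fgrp_time = 12_400  # quoted FGRP completion time

# FGRP is said to be 16-18% longer than S6Bucket, so divide out each bound.
high = implied_reference_time(fgrp_time, 16)
low = implied_reference_time(fgrp_time, 18)

print(f"Implied S6Bucket completion time: {low:.0f} to {high:.0f}")
```

That puts the implied S6Bucket time at roughly 10.5k to 10.7k on this host.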

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

On my small dual core machine

3 Sep 2011 20:19:00 UTC

Message 105857 in response to message 105854

On my small dual-core machine with Windows XP, the crunching time dropped from ~42,000 seconds to ~26,000 seconds. That's a big step forward. The GW S6Bucket tasks need ~18,000 seconds.
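
To put those three timings in proportion, here is a small sketch of the arithmetic. The helper names are invented for this example; only the three run times come from the post.

```python
# Relative change between the run times quoted above (all in seconds).

def percent_reduction(old_seconds, new_seconds):
    """How much shorter the new run time is, as a percentage of the old one."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds

def percent_longer(task_seconds, reference_seconds):
    """How much longer a task runs than a reference task, in percent."""
    return 100.0 * (task_seconds - reference_seconds) / reference_seconds

fgrp_old = 42_000   # FGRP run time with the old app
fgrp_new = 26_000   # FGRP run time with the new app
gw = 18_000         # GW S6Bucket run time on the same machine

reduction = percent_reduction(fgrp_old, fgrp_new)
extra_vs_gw = percent_longer(fgrp_new, gw)

print(f"FGRP run time dropped by {reduction:.1f}%")
print(f"FGRP still takes {extra_vs_gw:.1f}% longer than S6Bucket")
```

On these numbers the new app is about 38% faster, while an FGRP task still runs about 44% longer than an S6Bucket task on this machine.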

Let's see what the i7s are doing.

By the way:
Why do the FGRP tasks not start with the same priority as the GW S6Bucket tasks? I have FGRP tasks with a finishing date of Sept. 8 waiting for execution, while GW S6Bucket tasks with a finishing date of Sept. 17 are being processed. On my laptop this leads to FGRP starting with high priority, but so late that the WU cannot be finished before the scheduled end. Does anybody know the reason and a solution?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117480013861

RAC: 35443775

RE: Why do the FGRP not

4 Sep 2011 4:21:54 UTC

Message 105858 in response to message 105857

Quote:

Why do the FGRP tasks not start with the same priority as the GW S6Bucket tasks? I have FGRP tasks with a finishing date of Sept. 8 waiting for execution, while GW S6Bucket tasks with a finishing date of Sept. 17 are being processed.

When normal priority applies, tasks for a given project will be processed in the order in which they are sent to you. Until recently, the FGRP1 tasks have had a 5 day deadline whilst the app was in its testing phase. As Bernd mentioned just a couple of messages earlier in this very thread, that testing deadline of 5 days is now being increased to 10 days in anticipation of the app now being ready for production.

If you have a mixture of 5 day deadline and 14 day deadline tasks in your work cache, BOINC will attempt to process them in the order received until such time as a short deadline task becomes at risk of missing its deadline. At that time high priority mode will kick in and the 'at risk' task will be processed out of normal order immediately. If you don't want this behaviour, make sure your work cache setting is significantly less than the shortest deadline for any tasks you have on board. That way BOINC will always get to a short deadline task well before it becomes 'at risk' and there would be no need for high priority to be invoked. With the announced increase to 10 days, this should pretty much cease to be an issue in future, unless you have a really excessive cache size.
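
To make that behaviour concrete, here is a toy model in Python. It is only a sketch of the rule as described in this post, not BOINC's actual scheduler code: it assumes a single core, and the `Task` fields, the `pick_next` function and all run times are invented for illustration (the 60 hour entries stand in for a deep queue of GW work).

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    received_hour: float   # arrival time, in hours on an arbitrary clock
    deadline_hour: float   # report deadline, same clock
    run_hours: float       # estimated compute time still needed

def pick_next(tasks, now):
    """Return (task, high_priority) for a single-core toy host.

    Normal mode: run the oldest received task first.
    High priority: if working through the queue in received order would
    make some task finish after its deadline, run the at-risk task with
    the earliest deadline instead.
    """
    fifo = sorted(tasks, key=lambda t: t.received_hour)
    at_risk = []
    queued_ahead = 0.0
    for task in fifo:
        projected_finish = now + queued_ahead + task.run_hours
        if projected_finish > task.deadline_hour:
            at_risk.append(task)
        queued_ahead += task.run_hours
    if at_risk:
        return min(at_risk, key=lambda t: t.deadline_hour), True
    return fifo[0], False

# Made-up cache contents: 14-day GW work plus one FGRP task.
gw_a = Task("GW-a", received_hour=0, deadline_hour=14 * 24, run_hours=60)
gw_b = Task("GW-b", received_hour=1, deadline_hour=14 * 24 + 1, run_hours=60)
fgrp_5d = Task("FGRP-5d", received_hour=2, deadline_hour=2 + 5 * 24, run_hours=10)
fgrp_10d = Task("FGRP-10d", received_hour=2, deadline_hour=2 + 10 * 24, run_hours=10)

big_cache_5d = pick_next([gw_a, gw_b, fgrp_5d], now=2)
small_cache_5d = pick_next([gw_a, fgrp_5d], now=2)
big_cache_10d = pick_next([gw_a, gw_b, fgrp_10d], now=2)

for label, (task, high) in [
    ("big cache, 5-day deadline", big_cache_5d),
    ("small cache, 5-day deadline", small_cache_5d),
    ("big cache, 10-day deadline", big_cache_10d),
]:
    print(f"{label}: run {task.name}, high priority = {high}")
```

In this model only the first case, a large cache combined with the 5 day deadline, pushes the FGRP task into high priority. Shrinking the cache or extending the deadline to 10 days leaves everything running in the order received.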

Quote:

On my laptop this leads to FGRP starting with high priority, but so late that the WU cannot be finished before the scheduled end. Does anybody know the reason and a solution?

Unless BOINC has been fooled into a very unrealistic view of your laptop's compute capabilities, BOINC should always start an 'at risk' task with sufficient time to complete it before the deadline arrives. Your computers are hidden so it isn't possible to look for examples of the behaviour you describe. As an example of how BOINC could be fooled, imagine your laptop had been running 24/7 and BOINC was used to that. If your laptop was then switched off for a day or two and then restarted, there might now be some 'at risk' tasks for which the remaining time to deadline is no longer sufficient, even though BOINC would immediately start them in high priority mode. The way to protect yourself is to keep the work cache setting well below the shortest deadline of any task in your cache. Also, don't load up with tasks if there is any risk that your machine will crunch for less time in the future than it has in the past.
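
A small numerical sketch of how that can go wrong follows. It is an illustration under assumed numbers, not a description of BOINC's internals: `on_fraction` (the share of wall-clock time the host actually computes) and both function names are hypothetical, and the 12 hour run time is made up. Only the 5 day deadline comes from this thread.

```python
def latest_safe_start(deadline_hour, cpu_hours, on_fraction):
    """Latest wall-clock hour at which a task can start and still finish,
    if the host computes for `on_fraction` of each wall-clock hour."""
    return deadline_hour - cpu_hours / on_fraction

def finishes_in_time(start_hour, deadline_hour, cpu_hours, on_fraction):
    """True if a task started at start_hour is done by its deadline."""
    return start_hour <= latest_safe_start(deadline_hour, cpu_hours, on_fraction)

deadline = 5 * 24   # 5-day deadline, in hours
cpu_hours = 12      # made-up run time for illustration

# What the client believes (24/7 operation) versus a host that is
# really only on half of the time.
believed_limit = latest_safe_start(deadline, cpu_hours, on_fraction=1.0)
actual_limit = latest_safe_start(deadline, cpu_hours, on_fraction=0.5)

# Starting at hour 105 looks safe under the 24/7 assumption only.
ok_if_always_on = finishes_in_time(105, deadline, cpu_hours, on_fraction=1.0)
ok_if_half_on = finishes_in_time(105, deadline, cpu_hours, on_fraction=0.5)

# Outage case: the machine was off and only comes back at hour 110.
# Even an immediate high priority start is then too late.
ok_after_outage = finishes_in_time(110, deadline, cpu_hours, on_fraction=1.0)

print(believed_limit, actual_limit)
print(ok_if_always_on, ok_if_half_on, ok_after_outage)
```

The gap between the believed and the actual latest start is the margin that a small work cache protects.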

Please realise that there are other scenarios that might explain your particular problem. If you post a link to the host showing the problem, it might be easier to diagnose the actual cause.

Cheers,
Gary.

robertmiles

Joined: 8 Oct 09

Posts: 127

Credit: 29287549

RAC: 22618

My laptop has a possible

4 Sep 2011 6:23:29 UTC

Message 105859

My laptop has a possible cause for BOINC getting an unrealistic picture of its capabilities, so you may want to check if yours has it as well. I've found that I must set it to maintain a 3-day queue of workunits, or it will stop downloading any CPU workunits at all. Also, there's a problem with maintaining a reasonable temperature inside the laptop. With just BOINC, it frequently reaches 100 C, at which point some program I have not identified automatically forces it into sleep mode. With TThrottle added, I now have a choice between two conditions - running the GPU at near full speed with the CPU at perhaps 10% of its normal function, or with the GPU shut off but closer to normal CPU speed. I currently switch back and forth between those two conditions, and don't really expect BOINC to follow that switching well.

Nigel Garvey

Joined: 4 Oct 10

Posts: 51

Credit: 32296391

RAC: 86963

RE: We are still tuning the

4 Sep 2011 7:32:36 UTC

Message 105860 in response to message 105843

Quote:

We are still tuning the validator, and I am working on a change to the application that should make it a little bit faster but with the side effect of narrowing down the differences between platforms.

BM

Belated thanks for your reply and thanks too for the new app. I've now completed my first task with the Mac version and have had an inconclusive validation against a Linux machine also running v0.23. That machine's result is marked "Validate error".

http://einsteinathome.org/workunit/104244631

My task finished in about 0.85 of the time taken by the last one I did with the old app, but I'm afraid I didn't keep a note of earlier timings.

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

RE: When normal priority

4 Sep 2011 7:37:08 UTC

Message 105861 in response to message 105858

Quote:

When normal priority applies, tasks for a given project will be processed in the order in which they are sent to you. Until recently, the FGRP1 tasks have had a 5 day deadline whilst the app was in its testing phase.

Gary: Thanks for the quick and detailed response. This is the answer. I thought tasks would be crunched in order of their scheduled finishing date. Processing in order of sending date explains everything.

Your analysis of why BOINC switches to the "at risk" mode too late is correct. My laptop runs for 9-10 hours on working days but is mostly switched off over the weekend. Those two "switched off" days are what is missing, even with a small cache for WUs of only one day. It is not a problem, and I will keep all that in mind until FGRP uses a normal schedule like the other WUs. I just wanted to understand how that can happen.
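
For anyone with a similar usage pattern, here is a rough sketch of how few crunching hours actually fit into a deadline window. The function name and the choice of a window starting on a Saturday are assumptions made for the example; the 9 hours per working day and the 5 and 10 day deadlines come from this thread.

```python
def available_hours(first_weekday, days, hours_per_workday):
    """Crunching hours in a window of `days` calendar days.

    first_weekday: 0 = Monday ... 6 = Sunday, the first day of the window.
    The machine is assumed to run only Monday to Friday.
    """
    total = 0.0
    for offset in range(days):
        weekday = (first_weekday + offset) % 7
        if weekday < 5:  # Monday..Friday
            total += hours_per_workday
    return total

SATURDAY = 5

# A window that opens on a Saturday, with the laptop on 9 h per working day.
five_day_window = available_hours(SATURDAY, days=5, hours_per_workday=9)
ten_day_window = available_hours(SATURDAY, days=10, hours_per_workday=9)

print(f"5-day deadline: {five_day_window:.0f} h of {5 * 24} wall-clock hours")
print(f"10-day deadline: {ten_day_window:.0f} h of {10 * 24} wall-clock hours")
```

With these assumptions a 5 day window holds only 27 crunching hours and a 10 day window 54, which is why the weekend gap matters far more under the short testing deadline.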

To robertmiles: The temperature in my laptop does not exceed 78 C when running 2 CPU tasks (with 80% CPU usage allowed) and one GPU task (with 95% GPU load) in parallel. That is not the problem.
