Questions, comments and problems on new Fermi LAT gamma-ray pulsar search

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Finished another unit on my Linux box, now awaiting validation. CPU time went from 53k s in version 22 to 48k s in version 23.
Tullio

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117480013861
RAC: 35443775

Quote:
Welcome back, Gary!


Thank you! I haven't been too far away - just incapacitated. I still don't have the bulk of my hosts fired up yet but now that I'm getting back into the swing of things, I'll rectify that shortly. As I mentioned to you and Oliver when I talked with you in Hannover at the time of the open day, I'm really keen to do the gamma ray pulsar search work. I had shut down most of my machines for the duration when I left to attend the open day. I returned to Australia in early August and just had time to get about 30 machines set up with the 0.22 app before I had to go to hospital. (It's a long story; I'll probably relate it in another thread.) So now that I'm out of hospital and am starting to feel alive again, the release of the 0.23 app spurred me on to bring that group of machines up to date since they were all set up with AP to do FGRP1 tasks only.

Quote:
I suspect that most of your machines are powered by AMD CPUs?


No, that whole group of machines is Intel only. Just over half of them are Q8400 quads, whilst the others are a mixture of Celeron dual cores (E1500s) and Pentium dual cores (E6300s and E6500s). I'll probably get around to firing up my AMD hosts again in the near future. Before I shut things down at the end of June, I had a RAC of around 240K. My RAC is building up again and it's heading past 100K at the moment. As others have mentioned, part of the reason for the current low RAC is the low FGRP1 credit allocation but in my case, it is also due to quite a number of machines that are yet to be restarted.

Quote:
I did see larger speedups on our Intel machines here, as well as in the DB of the runtime averaged over all hosts.


I may have been a bit conservative with my 15% speedup estimate. It may actually be a little bit better than that. The really pleasing aspect is the much larger speedup for Windows hosts. I'm sure the Windows users will welcome that!

Quote:
Anyway, preparing for "production" we increased the deadline (10 days) and are ramping up the FGRP1 share of the project (i.e. sending out more FGRP work).


All good stuff and most welcome! Just set the credit allocation to around 280 or so and everybody will be very happy :-).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117480013861
RAC: 35443775

Quote:
Indeed, welcome back, Gary.


Thank you very much, Richard!

Quote:
I'm only running Windows on Intel here, but I'd noticed that very significant speedup with 0.23 - I can pull out figures if it helps, but I'd guess you have plenty available.


Yes, no need to dig out figures unless you want to. Initially, I had supposed that I must have misconfigured my Windows machine when I saw how long the new tasks were taking. It's a lot closer to Linux performance now with the new version.

Quote:
I'm still seeing FGRP1 taking ~30% longer than S6Bucket on all machines (I'm running SSE2 minimum, mainly Core2 or above, so can take advantage of the optimisation).


That may be so on Windows but I don't think it's quite that bad on Linux, although I don't know for sure until I fire up some more hosts and let both tasks run side by side. Maybe we should encourage Bernd to go for at least 300 as the appropriate credit allocation for a FGRP1 task?

Cheers,
Gary.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429459222
RAC: 75907

Quote:
Maybe we should encourage Bernd to go for at least 300 as the appropriate credit allocation for a FGRP1 task?

Welcome back, Gary! +1 for the above.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I finished my first two FGRP work units last night on my Linux system. Both are awaiting validation, and completion time is around 12.4k s on my quad-core, which is 16-18% longer than the completion time of the S6Bucket work units.

Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0

On my small dual-core machine with Windows XP, the crunching time dropped from ~42,000 seconds to ~26,000 seconds. That's a big step forward. The GW S6Bucket tasks need ~18,000 seconds.
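
For anyone who wants to put numbers on that, here is a minimal Python sketch of the arithmetic. It uses only the three run times quoted above; the function names are made up for illustration.

```python
def percent_reduction(old_seconds, new_seconds):
    """How much shorter the new run time is, as a percentage of the old one."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


def percent_longer(task_seconds, reference_seconds):
    """How much longer a task runs than a reference task, in percent."""
    return 100.0 * (task_seconds - reference_seconds) / reference_seconds


# Approximate run times from the post above (seconds).
fgrp_old, fgrp_new, s6bucket = 42_000, 26_000, 18_000

print(f"FGRP run time reduction: {percent_reduction(fgrp_old, fgrp_new):.1f}%")
print(f"FGRP vs GW S6Bucket: {percent_longer(fgrp_new, s6bucket):.1f}% longer")
```

With these rounded figures the run time drops by about 38%, and the new FGRP tasks still take about 44% longer than the GW S6Bucket tasks on this host.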

Let's see what the i7s do.

By the way, why do the FGRP tasks not start with the same priority as the GW S6Bucket tasks? I have FGRP tasks with a deadline of Sept. 8 waiting for execution, while GW S6Bucket tasks with a deadline of Sept. 17 are being processed. On my laptop this leads to FGRP tasks starting in high priority mode, but so late that the WU cannot be finished before the deadline. Does anybody know the reason and a solution?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117480013861
RAC: 35443775

Quote:
Why do the FGRP tasks not start with the same priority as the GW S6Bucket tasks? I have FGRP tasks with a deadline of Sept. 8 waiting for execution, while GW S6Bucket tasks with a deadline of Sept. 17 are being processed.


When normal priority applies, tasks for a given project will be processed in the order in which they are sent to you. Until recently, the FGRP1 tasks have had a 5 day deadline whilst the app was in its testing phase. As Bernd mentioned just a couple of messages earlier in this very thread, that testing deadline of 5 days is now being increased to 10 days in anticipation of the app now being ready for production.

If you have a mixture of 5 day deadline and 14 day deadline tasks in your work cache, BOINC will attempt to process them in the order received until such time as a short deadline task becomes at risk of missing its deadline. At that time high priority mode will kick in and the 'at risk' task will be processed out of normal order immediately. If you don't want this behaviour, make sure your work cache setting is significantly less than the shortest deadline for any tasks you have on board. That way BOINC will always get to a short deadline task well before it becomes 'at risk' and there would be no need for high priority to be invoked. With the announced increase to 10 days, this should pretty much cease to be an issue in future, unless you have a really excessive cache size.
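
To make that ordering concrete, here is a toy Python sketch of the behaviour described above for a single core: tasks run in the order received, unless a later task would miss its deadline by waiting its turn, in which case it jumps the queue. This is a simplified illustration only, not the BOINC client's actual scheduling code; the `Task` fields, the `run_order` function and all the numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    received_day: float   # arrival time in days relative to now (negative = past)
    deadline_day: float   # deadline in days from now
    runtime_days: float   # estimated crunch time on one core, in days


def run_order(tasks):
    """Return [(task name, mode)] for one core under the toy rule above."""
    pending = sorted(tasks, key=lambda t: t.received_day)
    now, order = 0.0, []
    while pending:
        head = pending[0]
        # Tasks that would finish late if they waited behind the FIFO head.
        at_risk = [t for t in pending[1:]
                   if now + head.runtime_days + t.runtime_days > t.deadline_day]
        if at_risk:
            chosen = min(at_risk, key=lambda t: t.deadline_day)
            mode = "high priority"
        else:
            chosen, mode = head, "normal"
        pending.remove(chosen)
        now += chosen.runtime_days
        order.append((chosen.name, mode))
    return order


# Hypothetical cache: three 14 day GW tasks received first, then one
# FGRP task with a 5 day deadline that has already been waiting 2 days.
cache = [
    Task("GW-1", -3.0, 11.0, 1.0),
    Task("GW-2", -2.5, 11.5, 1.0),
    Task("GW-3", -2.2, 11.8, 1.0),
    Task("FGRP-1", -2.0, 3.0, 1.0),
]
for name, mode in run_order(cache):
    print(name, mode)
```

In this made-up cache the FGRP task is left waiting behind the first two GW tasks and is then pulled ahead of GW-3 in high priority mode. With a smaller cache it would simply have been reached in normal order.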

Quote:
On my laptop this leads to FGRP tasks starting in high priority mode, but so late that the WU cannot be finished before the deadline. Does anybody know the reason and a solution?


Unless BOINC has been fooled into a very unrealistic view of your laptop's compute capabilities, BOINC should always start an 'at risk' task with sufficient time to complete it before the deadline arrives. Your computers are hidden so it isn't possible to look for examples of the behaviour you describe. As an example of how BOINC could be fooled, imagine your laptop had been running 24/7 and BOINC was used to that. If your laptop was then switched off for a day or two and then restarted, there might now be some 'at risk' tasks for which the remaining time to deadline is no longer sufficient, even though BOINC would immediately start them in high priority mode. The way to protect yourself is to keep the work cache setting well below the shortest deadline of any task in your cache. Also, don't load up with tasks if there is any risk that your machine will crunch for less time in the future than it has in the past.
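
As a rough illustration of how a change in uptime can catch BOINC out, the Python sketch below compares the wall-clock time a task needs when the host crunches 24/7 with the time it needs when the host is only on for part of each week. The function names and every number here are hypothetical; the real client keeps its own statistics on how much of the time a host is available.

```python
def wall_clock_days_needed(cpu_days, on_fraction):
    """Wall-clock days needed for cpu_days of work when the host only
    crunches for on_fraction of the time (0 < on_fraction <= 1)."""
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    return cpu_days / on_fraction


def can_finish(cpu_days, days_to_deadline, on_fraction):
    """True if the task fits before its deadline at the given uptime."""
    return wall_clock_days_needed(cpu_days, on_fraction) <= days_to_deadline


# Hypothetical task: 0.75 days of crunching left, deadline in 2 days.
cpu_days, days_to_deadline = 0.75, 2.0

always_on = 1.0                 # what BOINC has been used to
part_time = (10 * 5) / (24 * 7) # e.g. 10 hours a day on 5 days out of 7

print(can_finish(cpu_days, days_to_deadline, always_on))   # fits at 24/7
print(can_finish(cpu_days, days_to_deadline, part_time))   # does not fit
print(f"{wall_clock_days_needed(cpu_days, part_time):.2f} days needed part time")
```

With these made-up figures the task is comfortable at 24/7 but needs about 2.5 wall-clock days once the host runs part time, so starting it in high priority mode with 2 days left is already too late.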

Please realise that there are other scenarios that might explain your particular problem. If you post a link to the host showing the problem, it might be easier to diagnose the actual cause.

Cheers,
Gary.

robertmiles
Joined: 8 Oct 09
Posts: 127
Credit: 29287549
RAC: 22618

My laptop has a possible cause for BOINC getting an unrealistic picture of its capabilities, so you may want to check whether yours does as well. I've found that I must set it to maintain a 3-day queue of workunits, or it will stop downloading any CPU workunits at all. Also, there's a problem with maintaining a reasonable temperature inside the laptop. With just BOINC, it frequently reaches 100 °C, at which point some program I have not identified automatically forces it into sleep mode. With TThrottle added, I now have a choice between two conditions: running the GPU at near full speed with the CPU at perhaps 10% of its normal function, or running with the GPU shut off but closer to normal CPU speed. I currently switch back and forth between those two conditions, and don't really expect BOINC to follow that switching well.

Nigel Garvey
Joined: 4 Oct 10
Posts: 51
Credit: 32296391
RAC: 86963

Quote:

We are still tuning the validator, and I am working on a change to the application that should make it a little bit faster but with the side effect of narrowing down the differences between platforms.

BM

Belated thanks for your reply and thanks too for the new app. I've now completed my first task with the Mac version and have had an inconclusive validation against a Linux machine also running v0.23. That machine's result is marked "Validate error".

http://einsteinathome.org/workunit/104244631

My task finished in about 0.85 of the time taken by the last one I did with the old app, but I'm afraid I didn't keep a note of earlier timings.

NG

Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0

Quote:

When normal priority applies, tasks for a given project will be processed in the order in which they are sent to you. Until recently, the FGRP1 tasks have had a 5 day deadline whilst the app was in its testing phase.

Gary: Thanks for the quick and detailed response. This is the answer. I thought tasks would be crunched in order of their deadline. Processing in the order in which they were sent explains everything.

Your analysis of why BOINC switches to the "at risk" mode too late is correct. My laptop runs for 9-10 hours on working days, but is mostly switched off over the weekend. These two switched-off days are what is missing, even with a small cache for WUs of only one day. It is not a problem and I will keep all that in mind until FGRP uses a normal schedule like the other WUs. I just wanted to understand how that can happen.

To robertmiles: The temperature in my laptop does not exceed 78 °C while running 2 CPU tasks (with 80% CPU usage allowed) and one GPU task (with 95% GPU load) in parallel, so that is not the problem.
