Estimated vs Actual Computation Size

Timmehhh
Joined: 22 Oct 12
Posts: 3
Credit: 1465354
RAC: 0
Topic 197980

When a task is downloaded it comes with an estimated computation size, usually in the tens or hundreds of thousands of GFLOPs. For example, Rosetta tasks are approximately 40,000 GFLOPs.

Is it possible to determine the actual number of GFLOPs performed? I figured an approximation would be to multiply the CPU time for the task by the floating-point Whetstone benchmark speed. But that is still just a benchmark, so it may not be accurate depending on the real work done. It also doesn't help with monitoring the GPU, for which BOINC does not provide a benchmark.
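Roughly what I have in mind is something like this (just a back-of-the-envelope sketch; the CPU time and benchmark figures are made up):

    # Rough estimate of work done: CPU time multiplied by the Whetstone
    # benchmark speed reported by BOINC. The figures below are made up.
    cpu_time_s = 14_800.0        # task CPU time in seconds
    whetstone_gflops = 3.2       # BOINC "measured floating point speed"

    estimated_gflop = cpu_time_s * whetstone_gflops
    print(f"Estimated work: {estimated_gflop:,.0f} GFLOP")  # = 47,360 GFLOP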

I figure that counting operations requires processing of its own, so doing so would increase the load on the CPU and skew the results, even if it were technically possible (which I assume it is not?).

Basically I want to track CPU and GPU performance based on real work done under real operating conditions (which I intend to vary), not on benchmarks, even if it has to be done somewhat manually.

And how accurate would Credit be for monitoring this? As far as I can tell, Credit would probably be the best way to compare work done by the CPU and the GPU, because from what I understand granted Credit is some sort of average of the Credit claimed for a particular task, which seems to me like a fairer real-world comparison of work done.

But some tasks apparently grant significantly different amounts of Credit for a similar amount of CPU time. Or is the discrepancy for most tasks not as large as I think? And the GPU tends to run different tasks, so would comparing Credits be invalid?

Now a little tangent on the whole Credits thing...
Credit is measured in Cobblestones, where one Cobblestone is 1/200 of a day of CPU time on a 1 GFLOPS computer, which I presume means 1 credit = 1 × 24 × 3600 / 200 = 432 GFLOPs. But when I multiply the claimed credit for a task done on my computer by this number, the result comes out 18.1% higher than the figure derived from the benchmark (and that percentage difference is essentially identical across similar tasks), which would suggest the CPU is doing more work than it possibly could. Maybe the discrepancy is related to how credit is calculated, or maybe the Rosetta tasks are better optimised for my CPU than the Whetstone benchmark is. Can anyone shed light on this for me?
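In numbers, the comparison I'm making looks roughly like this (the claimed credit is a hypothetical figure, carried on from the sketch above):

    # Convert claimed credit to GFLOP via the Cobblestone definition
    # (200 credits per day of work on a 1 GFLOPS machine), then compare
    # with the benchmark-based estimate. Figures are illustrative only.
    GFLOP_PER_CREDIT = 24 * 3600 / 200       # = 432 GFLOP per Cobblestone

    claimed_credit = 129.5                   # hypothetical claimed credit
    benchmark_estimate_gflop = 47_360.0      # cpu_time * Whetstone, as above

    credit_estimate_gflop = claimed_credit * GFLOP_PER_CREDIT
    difference = credit_estimate_gflop / benchmark_estimate_gflop - 1
    print(f"Credit-based estimate: {credit_estimate_gflop:,.0f} GFLOP")
    print(f"Difference vs benchmark: {difference:+.1%}")   # about +18.1%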

I just recently bothered to do something about my broken BOINC installation so it has aroused my curiosity.

Cheers

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2978230925
RAC: 782158

Estimated vs Actual Computation Size

It's an interesting group of questions, and it will be interesting to see what answers the Einstein message-board users come up with. Here are some starters.

As you've said, BOINC attempts to measure the speed of CPUs by benchmarking. But the Whetstone benchmark used is deliberately designed to resist cheating via compiler optimisations, so it comes up with a very conservative value. Modern science applications use SIMD optimisations whenever possible and appropriate, up to and including AVX, so real-world speeds can be significantly higher than the benchmark results. (Except on the Android platform, where the BOINC benchmark does use hardware optimisation facilities - and much grief that has caused projects which don't use the same optimisations in their science applications.)

For GPUs, as you say, BOINC doesn't attempt to benchmark the hardware, but it does report the theoretical maximum speed calculated from the manufacturers' specifications - or "Advertising FLOPs", as I tend to call them. The ratio of advertising flops to scientific computing flops is unknown, and probably depends on the skill of the individual programmer (and the degree to which the algorithm can be factored into parallel computations).
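For what it's worth, those peak figures are usually derived as nothing more than shader count × clock × 2 operations per clock (counting a fused multiply-add as two). Something like this, with made-up card specs:

    # Theoretical peak ("advertising") GFLOPS for a GPU: shader count times
    # clock speed times 2 FLOPs per shader per clock (one fused multiply-add).
    # The card specs below are made up for illustration.
    shaders = 2048
    clock_ghz = 1.5
    flops_per_clock = 2          # an FMA counts as two floating-point operations

    peak_gflops = shaders * clock_ghz * flops_per_clock
    print(f"Peak throughput: {peak_gflops:,.0f} GFLOPS")  # 6,144 GFLOPS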

The relationship between FLOPs and Credits is defined, but not widely respected. I believe that this particular project does make an honest effort to award credits in line with the formal definition, which means that the more common reverse derivation of FLOPs performed from the amount of BOINC credit granted stands some chance of realism. But at other projects, the definition is thrown out of the window, especially for GPU applications. I have one computer with two identical GPUs, and as far as possible, I run it with one project having exclusive use of one GPU, and a different project having exclusive use of the identical twin GPU. The first project is awarding its GPU about 17,000 credits per day, and the other project is awarding 350,000 credits per day. Pick the flops out of that pair.
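To put numbers on that last pair: if you take the 432-GFLOP-per-credit definition at face value, two identical cards come out with wildly different implied sustained speeds:

    # Implied sustained GFLOPS if the Cobblestone definition (432 GFLOP per
    # credit) is taken at face value, for two identical GPUs on different projects.
    GFLOP_PER_CREDIT = 432
    SECONDS_PER_DAY = 24 * 3600

    for project, credits_per_day in [("project A", 17_000), ("project B", 350_000)]:
        implied_gflops = credits_per_day * GFLOP_PER_CREDIT / SECONDS_PER_DAY
        print(f"{project}: {implied_gflops:,.0f} GFLOPS sustained")
    # project A: 85 GFLOPS sustained
    # project B: 1,750 GFLOPS sustained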

One project - SETI@Home - did for a while make an attempt to "count flops": not individually, but by making an assessment of how many flops were needed for each of the major calculation steps, and counting how many times each step was performed during a run. That approach - which continued into the early days of GPU processing - yielded stable, repeatable, and realistic credit awards, but BOINC decided not to adopt it for wider use - I think largely because they feared that smaller projects would lack the skill and stamina to maintain the "flops per calculation step" calibration for each application they deployed.
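A minimal sketch of that style of accounting would look something like this (the step names and per-step costs are invented for illustration, not SETI's real values):

    # SETI-style flop counting: each major calculation step has a
    # pre-calibrated flop cost, and the application simply counts how many
    # times each step runs. Step names and costs are invented examples.
    STEP_COST_FLOPS = {
        "fft_1024": 5 * 1024 * 10,   # ~5*N*log2(N) for a 1024-point FFT
        "chirp": 8 * 1024,
        "pulse_find": 12 * 1024,
    }

    step_counts = {name: 0 for name in STEP_COST_FLOPS}

    def record(step, times=1):
        """Called wherever the science code performs one of the steps."""
        step_counts[step] += times

    # ... the science code would call record("fft_1024") etc. as it runs ...
    record("fft_1024", 2000)
    record("chirp", 500)

    total_flops = sum(STEP_COST_FLOPS[s] * n for s, n in step_counts.items())
    print(f"Estimated work so far: {total_flops / 1e9:.3f} GFLOP")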

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 758300704
RAC: 1150285

I have spent some time thinking about these very questions myself, with little useful outcome.

Quote:
I figure that counting operations requires processing of its own, so doing so would increase the load on the CPU and skew the results, even if it were technically possible (which I assume it is not?).

Theoretically there is a way to count operations without (significantly) increasing the CPU load, and that is via the "hardware performance counters" built into modern CPUs. Without slowing down the CPU, they can count operations performed by the CPU (and other events, like cache misses, etc.) in the background. As those counters overflow rather quickly, some process has to run in the background to accumulate the counter values over time and relate them to the individual processes (and even to individual lines of code, if your code has debugging info built into it). Under Linux, the oprofile project

http://oprofile.sourceforge.net/doc/

is very useful for working with those performance counters. Under Windows, a tool like Intel's VTune is also excellent for experiments. If you are serious about counting actual operations and willing to invest some time in the learning curve of those tools, this is the way to go IMHO.
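As a very rough illustration of the idea (this uses Linux "perf" rather than oprofile, and the floating-point event names are specific to recent Intel CPUs, so treat it purely as a sketch):

    # Sketch: attach to a running science application and count retired
    # floating-point instructions for 60 seconds using Linux "perf".
    # The event names below exist on recent Intel CPUs but differ on other
    # hardware; the PID is a placeholder.
    import subprocess

    pid = 12345  # PID of the science application (hypothetical)
    events = ("fp_arith_inst_retired.scalar_double,"
              "fp_arith_inst_retired.256b_packed_double")

    result = subprocess.run(
        ["perf", "stat", "-e", events, "-p", str(pid), "--", "sleep", "60"],
        capture_output=True, text=True,
    )
    print(result.stderr)  # perf writes its counter summary to stderr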

The GPU is a different matter. There are also hardware performance counters built into modern GPUs, but working with them is far less straightforward and more intrusive with respect to the performance impact, AFAIK.

Cheers
HB

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1607891251
RAC: 688328

IMO fixing CreditNew is the answer, and this project, with its stable work units, could very well provide useful clues to help Dr A.
