Eric Kaiser wrote: I can ...
BTW, for BRP5 the current range of estimated runtimes is from 3.5 hrs up to 11 hrs.
RE: RE: I can observe
On all 11 GPUs I have running Einstein, the WU runtime variation for any given GPU is small (within 5%) for either BRP4 or BRP5 tasks.
Hmm, weird...
BRP4: This one had a runtime of 6,271 s and this one had a runtime of 1,292 s.
Yet the CPU time is only 231 s compared to 165 s.
The system runs cool (GPU 50°C, CPU 50°C) under full load and is stable, too.
I have no explanation for this wide range of runtimes.
Generally, the variation is not great if you provide more or less identical conditions. However, the database does not contain information on whether, how many, and which applications were running in parallel on the same GPU.
Since I now have a few more completed jobs, I can see that the computation time is almost exactly 10x longer on my Nvidia cards. I also think the scoring should include a bonus for the fact that the tasks take more time. Therefore, I think a little more than 5,000 per task would be fine. Of course, this is just my personal opinion.
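As a rough sketch of the arithmetic behind that suggestion (the BRP4 credit value and the bonus factor below are assumed placeholders, not project numbers), scaling the per-task credit by the runtime ratio plus a small long-task bonus lands in that region:

```python
# Illustrative credit scaling only -- the BRP4 credit and the bonus are assumptions.
brp4_credit = 500.0      # assumed credit per BRP4 task (placeholder)
runtime_ratio = 10.0     # "computation time is almost exactly 10x longer"
long_task_bonus = 0.05   # assumed 5% bonus for tying up the card much longer

brp5_credit = brp4_credit * runtime_ratio * (1.0 + long_task_bonus)
print(f"Suggested BRP5 credit per task: {brp5_credit:.0f}")  # 5250 -> "a little more than 5,000"
```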
Sebastian M. Bobrecki
In my case only 2 WUs from the same project run in parallel on my GPU. No WUs from other GPU projects.
When I switch to another GPU project I stop all other GPU projects. When Einstein is running, 10 CPU cores run WUs from CPU projects and 2 CPU cores support the Einstein GPU tasks.
So the conditions should be pretty much the same.
RE: ... So the conditions
Not necessarily. BRP requires a lot of data transfer between the GPU and RAM, so it's possible that some of the applications running on the CPU saturate the memory bandwidth to the point where it starts to make a difference.
OK. This might explain the variation in runtime, but it doesn't explain why the estimates for WUs that are still waiting to be executed also show this wide range.
I'm not sure whether it is worth digging deeper into this.
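For what it's worth, the client's runtime estimate is essentially the workunit's FLOP estimate divided by the app version's estimated speed, optionally scaled by the host's duration correction factor, so workunits sent out with a larger FLOP estimate show a proportionally larger estimate before they even start. A minimal sketch, with purely illustrative numbers:

```python
# Minimal sketch of how a BOINC-style runtime estimate comes about.
# estimated_seconds ~= workunit FLOP estimate / effective FLOPS of the app version,
# optionally scaled by the host's duration correction factor (DCF).

def estimated_runtime(rsc_fpops_est: float, app_flops: float, dcf: float = 1.0) -> float:
    return rsc_fpops_est / app_flops * dcf

# Two workunits with different FLOP estimates get very different estimates
# on the same GPU -- illustrative numbers only:
print(estimated_runtime(2.8e14, 1.75e11))  # ~1,600 s
print(estimated_runtime(1.1e15, 1.75e11))  # ~6,300 s
```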
RE: ... I'm not sure if it
I think it may be helpful to the crew in some way. After all, you probably know best what is happening on your computer(s).
RE: I think it may be in
Good point.
OK. Let me summarize some points that I have observed with BRP4 and BRP5:
Most of the BRP4 WUs have an estimated runtime of ca. 24 min. The effective runtime matches the estimate: ~1,600 sec +/- some 100 sec.
Sometimes individual WUs have a significantly longer estimate before they are started and during calculation. The effective runtime in these cases reflects the estimate, i.e. it is significantly higher than the rest (up to 6,200 sec).
BRP5 behaves the same as far as I can tell after 11 finished WUs (9 actually validated). The normal runtime seems to be around 13,000 sec for a single WU.
3 WUs had runtimes of ~25,000 sec, ~30,000 sec and ~41,000 sec. For these the estimate was also higher than normal. The estimate for a "normal" BRP5 WU is 3.5 hrs +/-.
I'm executing two WUs in parallel to reach ~90% GPU usage. I'm aware that the effective runtime of a single WU increases slightly during parallel execution. A single WU results in ~55% GPU usage.
If I remember correctly, without parallel processing a BRP4 WU took 20 min, i.e. 3 WU/hr. With parallel processing I achieve >= 4 WU/hr (see the sketch after this summary).
When 2 WUs are executing on the GPU, BOINC Manager restricts CPU WUs to 10 in parallel. That means 2 cores are used for the Einstein GPU tasks.
General settings in BOINC: 92% of CPU cores and 100% CPU time.
As far as I've seen, VRAM usage stays below 50% of the available capacity. Combined RAM usage of BOINC, Windows 7 Prof 64, Firefox, ... is around 6 GB. That means 58 GB remain free.
Hardware: i7-3930K @ 3.2 GHz, 64 GB RAM, Radeon HD 7850 @ 1050 MHz with 2 GB VRAM @ 1350 MHz. Under full load the temperature stays around 50°C, so I suppose there is no CPU clock throttling due to thermal problems.
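A quick sketch of the throughput arithmetic from the summary above (the function is just an illustration; the 1,200 s and 1,600 s runtimes are the rounded figures quoted there):

```python
# Throughput comparison: one BRP4 WU at a time vs. two running in parallel.
# Runtimes are the approximate values quoted in the summary above.

def wu_per_hour(runtime_sec: float, concurrent: int = 1) -> float:
    """Completed WUs per hour when `concurrent` WUs share the GPU,
    each taking `runtime_sec` wall-clock seconds."""
    return 3600.0 / runtime_sec * concurrent

print(wu_per_hour(1200))                 # single WU, ~20 min each -> 3.0 WU/hr
print(wu_per_hour(1600, concurrent=2))   # two at once, ~1,600 s each -> 4.5 WU/hr
```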
Here's a graph showing the effect of the BRP5 introduction at 4k/task on the daily credit for my 9 hosts with NVIDIA GPUs. Results for the individual hosts are plotted against the left axis; the topmost line is the total for all hosts, plotted against the right axis.
7 of the hosts are used exclusively on E@H; 2 are used on both E@H and A@H (75%/25%). Hosts with faster GPUs run 2 tasks at once (utilisation of 0.5); all others run one task per GPU.
It would be interesting to do a comparison against hosts with ATI and NVIDIA GTX 6xx GPUs, if anyone would like to mail over the job log file for an E@H-dedicated host.
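If anyone does want to send a job log over, something along these lines can pull the per-task elapsed times out of it for comparison. It assumes the usual BOINC client job_log_*.txt line format of a timestamp followed by space-separated key/value pairs (ue = estimated runtime, ct = CPU time, fe = FLOP estimate, nm = task name, et = elapsed time); the task-name prefixes are placeholders to adjust to whatever your BRP4/BRP5 result names start with:

```python
# Sketch: summarise elapsed times per task type from a BOINC job log file.
# Assumed line format: <unix_time> ue <est_s> ct <cpu_s> fe <flops_est> nm <task_name> et <elapsed_s> ...
import statistics
import sys

def parse_job_log(path):
    tasks = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 3:
                continue
            # fields[0] is the completion timestamp; the rest alternate key, value
            kv = dict(zip(fields[1::2], fields[2::2]))
            if "nm" in kv and "et" in kv:
                tasks.append((kv["nm"], float(kv["et"])))
    return tasks

if __name__ == "__main__":
    tasks = parse_job_log(sys.argv[1])
    for prefix in ("p2030", "PM"):  # placeholder name prefixes for BRP4 / BRP5 tasks
        times = [et for name, et in tasks if name.startswith(prefix)]
        if times:
            print(f"{prefix}: {len(times)} tasks, "
                  f"mean {statistics.mean(times):.0f} s, "
                  f"stdev {statistics.pstdev(times):.0f} s")
```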