Does Einstein@Home want to damage my PC (VGA cards)?

Sutaru Tsureku
Joined: 26 Oct 09
Posts: 24
Credit: 102568165
RAC: 17366
Topic 209890

I exaggerated a little in the topic title. ;-)

Here is the story...
My PC with two Xeon CPUs and four R9 Fury X VGA cards uses ~850 W when fully loaded with SETI@home.
https://einsteinathome.org/de/host/12566972

The same PC uses ~1,050 W when the VGA cards crunch Einstein@Home (with SETI@home on the CPUs).
 
During the weekly SETI@home maintenance over the past ~two weeks, I noticed the VGA card fans running much faster (while crunching Einstein@Home WUs on the VGA cards, because there were no SETI@home WUs left in BOINC)...

I looked at the power meter and the PC was drawing ~1,400 W (70 % of the PSU's maximum).
The VGA card temperatures also increased a lot.
 
Normally a WU from the "Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) windows_x86_64" app lasts ~560 seconds.
Example:
https://einsteinathome.org/de/task/679425282

Now they take only about half that time, ~270 seconds.
Examples:
https://einsteinathome.org/de/task/679452402
https://einsteinathome.org/de/task/679452404
https://einsteinathome.org/de/task/679453648
https://einsteinathome.org/de/task/679466036
 
 
 
OK, the PC has a 2,000 W PSU, but I prefer to run it at no more than ~50 % of its maximum, i.e. ~1,000 W.
 
Maybe other people build their PCs differently and run their PSU at an average of ~75 % of its maximum.
Would such a PSU then have been pushed to ~100 % in the situation I described at the top?
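
To put rough numbers on that headroom, here is a quick back-of-the-envelope sketch in Python; the wattages are just the approximate meter readings quoted above, not measurements from anyone else's build:

```python
# Rough PSU headroom check using the approximate meter readings quoted above.
PSU_MAX_W = 2000  # rated PSU capacity in watts

readings = {
    "SETI@home on CPUs + GPUs": 850,
    "SETI@home on CPUs, Einstein@Home on GPUs": 1050,
    "Einstein@Home 'short' GPU tasks": 1400,
}

for label, watts in readings.items():
    load_pct = watts / PSU_MAX_W * 100
    print(f"{label}: {watts} W -> {load_pct:.1f} % of the 2,000 W PSU")

# SETI@home on CPUs + GPUs: 850 W -> 42.5 % of the 2,000 W PSU
# SETI@home on CPUs, Einstein@Home on GPUs: 1050 W -> 52.5 % of the 2,000 W PSU
# Einstein@Home 'short' GPU tasks: 1400 W -> 70.0 % of the 2,000 W PSU
```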
 
Was it intentional to send out this kind of "short/hard" WU, or was it a mistake?
 
What will the future bring - more of these "short/hard" WUs?
 
Thanks.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110039253980
RAC: 22387097


Dirk Sadowski wrote:

My PC with two Xeon CPUs and four R9 Fury X VGA cards uses ~850 W when fully loaded with SETI@home.
https://einsteinathome.org/de/host/12566972

The same PC uses ~1,050 W when the VGA cards crunch Einstein@Home (with SETI@home on the CPUs).

You shouldn't expect the power requirements to be precisely the same.  It would be normal for different apps on different projects to have different power needs.  Whether the difference you quote is reasonable or not is another matter.  I have no experience crunching Seti GPU tasks so I can't comment.

Part of the story might be to do with how many concurrent tasks run on each GPU device at each project and how many of your CPU cores are running CPU tasks.  You need to supply those sorts of details for anyone to make some sort of guess as to whether or not the power difference is explainable.
 

Dirk Sadowski wrote:
During the weekly SETI@home maintenance over the past ~two weeks, I noticed the VGA card fans running much faster (while crunching Einstein@Home WUs on the VGA cards, because there were no SETI@home WUs left in BOINC)...

If the Einstein app uses more power, it will need faster running fans to dissipate the heat.
 

Dirk Sadowski wrote:

Normally a WU from the "Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) windows_x86_64" app lasts ~560 seconds.
Example:
https://einsteinathome.org/task/679425282

Now they take only about half that time, ~270 seconds.
Examples:
https://einsteinathome.org/task/679452402
https://einsteinathome.org/task/679452404
https://einsteinathome.org/task/679453648
https://einsteinathome.org/task/679466036

We are analysing data from the large area telescope (LAT) onboard the Fermi satellite.  The data received is broken into blocks (individual data files) which have names like LATeah0040L.dat (the current one).  These data files are distributed to all hosts participating when the first tasks depending on them are distributed.  There is usually a single data file in progress at any one time (apart from 'resend' tasks which might require a previous data file) and this file could be current for days to weeks as there are many tasks (sets of different parameters) to be distributed for each data block.

When tasks for a data file are created, the 'slicing and dicing' is done in such a way that the vast majority of tasks will be of a similar 'size' (equal squares if you like).  But, just like when cutting up a circular pie into a large number of small squares, there will always be a few squares at the edges that aren't full of pie.   Bernd used the term 'short ends'  some years ago when first asked about this.  These seem always to be found in the initial tasks for a brand new data block when first released.  They have less work content so crunch faster.

If you are lucky, you may get a couple when the data block changes but you have to be quick as they don't last for long.  If your host asks for a *lot* of new work at just the right time, you may get a whole bunch of these.  If you check this thread, you will see that others have noticed the same thing.

Taking the first example of a fast running task in your list, its full name is LATeah0040L_36.0_0_-1.5e-11_504_0.  The first parameter after the data block name (the 36.0) is a frequency value.  Currently, with tasks for new data blocks, this parameter starts at 4.0 and increments by 8.0, so the sequence is 4.0, 12.0, 20.0, 28.0, 36.0, 44.0, ....  I don't know if it's always this same sequence; it has been different in the past.  Quite a few of the tasks at these low frequencies seem to be 'short ends'.  They are quickly gone and everything reverts to normal for many days until there is a further change of the data block being used.
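
As a small illustration of that naming pattern (my own sketch for parsing the name quoted above, not anything taken from the project's code), the frequency field can be read straight out of the task name:

```python
# Sketch only: split an FGRP-style task name into its fields and read the
# frequency parameter (the value right after the data block name).

def frequency_of(task_name: str) -> float:
    parts = task_name.split("_")  # ['LATeah0040L', '36.0', '0', '-1.5e-11', '504', '0']
    return float(parts[1])        # the field immediately after the data block name

print(frequency_of("LATeah0040L_36.0_0_-1.5e-11_504_0"))  # 36.0

# The low-frequency tasks issued when a new data block starts, as described above:
print([4.0 + 8.0 * k for k in range(6)])  # [4.0, 12.0, 20.0, 28.0, 36.0, 44.0]
```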
 

Dirk Sadowski wrote:
Was it intentional to send out this kind of "short/hard" WU, or was it a mistake?

There's no need to assume that something has changed with the project or that a mistake has been made as this is quite normal.  Short running tasks are no harder on the hardware than standard running tasks.  They pose no additional danger to the health of your GPUs.

This artifact of the way FGRP style tasks are created has been around for many years.  I think where the actual work content is a specific fraction (eg 0.5, 0.33, etc) of the normal content, the credit award is adjusted accordingly.  I don't pay much attention these days but I seem to recall seeing half credit, one third credit, etc., outcomes in the past.

I've just noticed that LATeah0040L.dat has finished and there are now low frequency new tasks for LATeah0041L.dat.  One of my hosts has some and the first one has crunched faster than the high frequency (1204.0) final tasks for the previous data file (I promoted one task to find out :-) ).  Another of these new tasks is estimated to take about one fifth of the usual time whilst the others have the normal estimate.  I expect this one is a recognised 'short end' which will get about one fifth of the normal credit award, just based on the time estimate.
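
If that proportional scaling is right, a 'short end' should receive credit roughly in line with its size. Here is a hedged sketch of the idea; the credit and runtime figures are placeholders based on the numbers in this thread, not official project values:

```python
# Hedged sketch of the proportional credit idea described above.
# NORMAL_CREDIT is a hypothetical placeholder, not an official project figure.

NORMAL_CREDIT = 1000.0   # hypothetical credit for a full-sized FGRP task
NORMAL_RUNTIME = 560.0   # seconds, typical full task on the OP's host

def estimated_credit(runtime_s: float) -> float:
    """Scale credit by the task's fraction of a normal task's work content."""
    fraction = runtime_s / NORMAL_RUNTIME
    return NORMAL_CREDIT * fraction

print(round(estimated_credit(270.0)))  # ~482, i.e. about half credit
print(round(estimated_credit(112.0)))  # 200, i.e. about one fifth credit
```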

 

Cheers,
Gary.

mikey
Joined: 22 Jan 05
Posts: 11971
Credit: 1834042554
RAC: 225809


Dirk Sadowski wrote:

I exaggerated a little in the topic title. ;-)

Here is the story...
My PC with two Xeon CPUs and four R9 Fury X VGA cards uses ~850 W when fully loaded with SETI@home.
https://einsteinathome.org/de/host/12566972

The same PC uses ~1,050 W when the VGA cards crunch Einstein@Home (with SETI@home on the CPUs).
 
During the weekly SETI@home maintenance over the past ~two weeks, I noticed the VGA card fans running much faster (while crunching Einstein@Home WUs on the VGA cards, because there were no SETI@home WUs left in BOINC)...

Thanks.

I run a project called Xansons on one of my GPUs and, according to MSI Afterburner, it uses 99 % of the GPU, but when I use the same software to check a MilkyWay GPU work unit it's only using about 75 % of the GPU. You are probably seeing the same thing between SETI and Einstein: SETI is probably using less of the GPU's actual capabilities, so the fans run less or slower than when you are crunching at Einstein. Each project has its own programmers writing the software for its apps, so each one is slightly different from every other project's.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0


mikey wrote:
SETI is probably using less of the GPU's actual capabilities, so the fans run less or slower than when you are crunching at Einstein. Each project has its own programmers writing the software for its apps, so each one is slightly different from every other project's.

Actually, it's the opposite. SETI's apps are highly refined for GPUs, so they are more efficient (Raistmer spent more than a year refining the OpenCL applications: testing, retesting, modifying the apps, retesting - you get the idea). That said, you can run more of them per GPU, use less GPU/CPU than the Einstein apps, and overclock like crazy. With the Einstein apps I can run more than one per GPU, but they tend to be unstable, which forces me to adjust the GPU settings to make sure they don't crash.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232287015
RAC: 100812


Being able to OC highly on a GPU app means the app is not utilizing the GPU very efficiently, or maybe the workload is just not parallel enough for a GPU. This is not about 90 % vs 100 % GPU utilization but about how well the app exploits the parallel strength of a GPU.

Number-theory apps like PrimeGrid drop my boost clocks by about 200 MHz, or nearly 10 %, compared to the boost clocks when running SETI. They run hotter at lower clocks and at a higher GPU utilization percentage than SETI. E@H just has a lower GPU utilization, which could be due to the calculations requiring more CPU, PCIe data transfer, etc., but IMO it pushes GPUs harder than SETI.

A similar comparison is AVX on CPUs. Since AVX packs more work into each instruction and can end up doing more math per cycle, it runs hotter, and today's CPUs drop to lower boost clocks while running AVX apps than while running others. SETI would be the non-AVX app of the CPU world.

I'm guessing SETI is just not well suited to GPUs. SETI has a pretty poor RAC for GPUs, and the gap between CPU and GPU isn't anything crazy. Asteroids is similar.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0


Again, wrong assumptions here.  

Just because SETI's OpenCL app runs more efficiently than Einstein's OpenCL app doesn't mean the Einstein app is superior. SETI's OpenCL app does the majority of its calculation on the GPU rather than having to go back and forth to the CPU; it's only at the end that it needs the CPU to finish up. I believe Einstein does this as well, which is why we see it stall at 89.9 % and then jump to 100 %, with CPU usage skyrocketing at the end of each work unit.

Einstein makes heavier use of PCIe and, yes, it pushes GPUs harder because it isn't as efficient.

SETI hasn't updated any AVX code because the person who previously wrote it is MIA and no one else has the skills to update it, so don't hold your breath on that one. (In case it has escaped your notice, all of that code has been written by volunteers, as the staff have all moved on to new projects.)

mmonnin wrote:
I'm guessing SETI is just not well suited to GPUs. SETI has a pretty poor RAC for GPUs, and the gap between CPU and GPU isn't anything crazy. Asteroids is similar.

This is a ridiculous statement. If you were familiar with SETI, you would know that credit for work is an arbitrary value set by one of the project directors, and it is well known to be steadily decreasing as the efficiency and productivity of GPU work increases. (That's why it's referred to as CreditScrew by the people who crunch it.) If all you are after is credit, then you are better off running GPUGrid or one of the other projects that offer outrageous credit for work done.

 

Apologies to OP, we are WAY off topic

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110039253980
RAC: 22387097


Zalster wrote:
... SETI's OpenCL app does the majority of its calculation on the GPU rather than having to go back and forth to the CPU; it's only at the end that it needs the CPU to finish up. I believe Einstein does this as well, which is why we see it stall at 89.9 % and then jump to 100 %, with CPU usage skyrocketing at the end of each work unit.

It doesn't 'stall' at 89.997% - the %done stops increasing at that point because the primary pass through the data (which uses single precision to find 'candidate' signals) is complete.  There is a followup stage, which uses double precision, to sort/check the candidates into a 'toplist' of the 10 'best' candidates.  The %done estimate is not updated during the followup stage although checkpoints still appear to be saved.

If a GPU has some double precision capability (and probably most have these days), that followup stage will be done on the GPU.  If not, it will be done on the CPU and will take a lot longer.  Essentially, the design seems to budget about 10% of the total time for the followup stage without implementing any sort of continuing update of the progress during that time.  I don't know whether or not there's any increased CPU involvement while the toplist is being assembled.  If there is, it would most likely be by necessity rather than inefficient design.
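
To make that behaviour concrete, here is a toy sketch (my own illustration of what is described above, not the actual app's code) of a progress readout where only the primary pass is reported and roughly the last 10 % of the run is reserved for the followup stage:

```python
# Toy illustration (not the real app): the primary pass is mapped onto
# 0..89.997 % and the followup (toplist) stage is simply not reported,
# so the displayed progress appears to 'stall' near 90 %.

PRIMARY_SHARE = 0.89997  # fraction of the run covered by the reported primary pass

def displayed_progress(primary_done: float, followup_done: float) -> float:
    if primary_done < 1.0:
        return primary_done * PRIMARY_SHARE * 100  # climbs smoothly towards ~90 %
    if followup_done < 1.0:
        return PRIMARY_SHARE * 100                 # parked near 89.997 % during followup
    return 100.0                                   # jumps to 100 % at the very end

print(displayed_progress(0.50, 0.0))  # ~45.0
print(displayed_progress(1.00, 0.5))  # ~89.997
print(displayed_progress(1.00, 1.0))  # 100.0
```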

Zalster wrote:
Einstein makes heavier use of PCIe and, yes, it pushes GPUs harder because it isn't as efficient.

I don't think there's much point in claiming that one app is more efficient than the other.  It's a bit like saying that one lot of programmers don't know what they are doing or aren't really trying.  The truth is that the problems to be solved are different, with different data sets, different algorithms, different memory requirements and different degrees of parallelism that can be taken advantage of, etc.  Given time and experience and manpower, all algorithms tend to improve in efficiency.  I'm happy to celebrate that I've observed that over many years.

Zalster wrote:
Apologies to OP, we are WAY off topic

I'm not too sure about that :-).   The OP admitted he was doing a 'drive by' exaggeration :-).  I think he wanted this sort of discussion :-).  That's OK by me, as long as we keep it civil.

 

Cheers,
Gary.

tazzduke
Joined: 7 Aug 09
Posts: 9
Credit: 78302984
RAC: 3643


Greetings All

We have established that different projects use the GPU differently, depending on the type of application they run.

I have noticed the same, which is why I only crunch SETI or Einstein: the GFN GPU tasks over at PrimeGrid send my GPUs into the 70s unless I manually adjust the fans to keep them in the 60s (that's Celsius, by the way), and if I do that it becomes unbearable in the room where these PCs live.

A couple of other things come to mind: what cooling the OP has in place, and what software he uses to look after the GPUs (e.g. MSI Afterburner). I also noticed that he has four cards in the one machine, where a change in temperature outside the PC could make a significant difference.

Cheers

 
