Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519320352
RAC: 14087

Ian&Steve C. wrote:

what about those two tasks from just yesterday that errored out from not enough memory?

https://einsteinathome.org/host/12735373/tasks/6/0

it works fine... until it doesn't. that's why you shouldn't use cards with less than 4GB right now. unless you're fine trashing work every now and then and letting someone else process it?

I guess it depends on whether the Einstein servers want the most work done or care about network and disk bandwidth sending things out twice.

I've switched all mine to Gamma.  Four 3GB cards and one 4GB (but slower) card.

Can't the server work out what the RAM requirement is and send out only little tasks to smaller cards?

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Two errors on May 14, five on May 16. In the latter, all wingmen errored out.

On May 14, only one out of five wingmen completed a task. Is there something wrong with the tasks?

Tullio

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062362642
RAC: 65316448

Peter Hucker wrote:

Can't the server work out what the RAM requirement is and send out only little tasks to smaller cards?

it tries to. but as I showed in one of my previous posts, the scheduler has a bug and isn't always estimating the GPU RAM required properly. it doesn't seem to ever think a task needs more than about 1800MB, even when the task does need more, so those tasks get sent to any card with at least 1800MB of memory. that's the first problem.

the second problem is that they are looking at global (total) memory on the GPU, and not how much is actually available. BOINC records both values. by looking at this global value, these 1800MB tasks will be sent to 2GB GPUs, but if that GPU is driving a desktop environment (which is likely the case), then ~300MB of the GPU's memory is not available and the 1800MB tasks will still fail.
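
roughly, the difference is the check sketched below. the field names and numbers are made up just to illustrate the point; this is not the actual BOINC scheduler code, only a guess at the logic being described:

```cpp
// sketch only: illustrates the "total vs. available GPU memory" check described
// above. struct/field names and numbers are invented for illustration; this is
// NOT the actual BOINC scheduler code.
#include <cstdio>

struct GpuInfo {
    double total_mem_mb;      // global (total) VRAM the card reports
    double available_mem_mb;  // VRAM actually free (desktop use already deducted)
};

// current behaviour as described: compare the estimate against total memory
bool fits_by_total(const GpuInfo& gpu, double task_mb) {
    return task_mb <= gpu.total_mem_mb;
}

// proposed behaviour: compare against what is actually available
bool fits_by_available(const GpuInfo& gpu, double task_mb) {
    return task_mb <= gpu.available_mem_mb;
}

int main() {
    GpuInfo card{2048.0, 1750.0};   // a 2GB card driving a desktop (~300MB already in use)
    double task_mb = 1800.0;        // the scheduler's (under)estimated requirement

    std::printf("by total memory:     %s\n", fits_by_total(card, task_mb) ? "send task" : "skip card");
    std::printf("by available memory: %s\n", fits_by_available(card, task_mb) ? "send task" : "skip card");
    // checking total memory sends the task and it later fails on the card;
    // checking available memory would have skipped this card.
    return 0;
}
```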

 

they need to switch to looking at available memory, and fix the bug that's underestimating the RAM required. I posted this info in the technical news thread; they either haven't seen it or haven't had the time/resources to get to it. the best thing to do for now, if you're aware of the problem, is simply not to crunch GW tasks on GPUs with <4GB VRAM, as you have done.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062362642
RAC: 65316448

tullio wrote:

Two errors on May 14, five on May 16. In the latter, all wingmen errored out.

Tullio

no they didn't. all the GPUs without enough memory, including yours, are the ones that errored out. the two from May 14th each have a successful completion from a host with RTX 2070 Super GPUs, which have enough GPU memory for the task; each one is just waiting to be sent to another GPU with enough memory.

the tasks from today, May 16th, also have only been sent to GPUs without enough memory so far, and are waiting to be sent to a host that can actually process them.

they will succeed once sent to a proper host. The WU isn't "bad" as you're trying to imply.

 

the apparently large number of hosts with 2-3GB GPUs has to be the main reason so many resends are necessary and why so many tasks are still waiting for validation. the proper fix should be applied server side at the project, but until that is done, users with 2-3GB GPUs can help the situation by just removing them from GW for now.

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519320352
RAC: 14087

Ian&Steve C. wrote:
they need to switch to looking at available memory, and fix the bug that's underestimating the RAM required. I posted this info in the technical news thread; they either haven't seen it or haven't had the time/resources to get to it. the best thing to do for now, if you're aware of the problem, is simply not to crunch GW tasks on GPUs with <4GB VRAM, as you have done.

I just had a quick go at running GW on one of my new machines, and the RAM wasn't the problem in this case (I was lucky enough to get small ones).

CPU: Intel Xeon X5650 (x2)
GPU: AMD Radeon R9 280X (x2)

For the purposes of the test, I paused all CPU WUs.

Running 1 GW task per GPU
GPU RAM is at 1.7GB used per card out of 3GB (neither is used to display the screen)
One CPU core is maxed out per card/WU
Each card is running at only 25%!

How do the speeds of your cards (on your GW machine) compare to mine, and your CPU to mine?  Because it seems like I'd need a much more powerful CPU, but if I had better cards like yours, I'd need a CPU that hasn't been invented yet!

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062362642
RAC: 65316448

Peter Hucker wrote:

I just had a quick go at running GW on one of my new machines, and the RAM wasn't the problem in this case (I was lucky enough to get small ones).

CPU: Intel Xeon X5650 (x2)
GPU: AMD Radeon R9 280X (x2)

For the purposes of the test, I paused all CPU WUs.

Running 1 GW task per GPU
GPU RAM is at 1.7GB used per card out of 3GB (neither is used to display the screen)
One CPU core is maxed out per card/WU
Each card is running at only 25%!

How do the speeds of your cards (on your GW machine) compare to mine, and your CPU to mine?  Because it seems like I'd need a much more powerful CPU, but if I had better cards like yours, I'd need a CPU that hasn't been invented yet!

I see you're running Windows. Where are you getting the 25% value from? Is that GPU utilization?

but after inspecting the GW_nvidia binary a little, I think your issue is the lack of AVX instruction support on your CPU. Your CPUs are old, yes. but it looks like the CPU portion of the GW app uses AVX for some of its calculations, and your CPU doesn't have AVX so it's probably failing over to a slower method which is making the GPU wait around longer and not be used as much. My CPUs do support AVX. my gaming system's 9700K supports AVX2.
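
to illustrate what I mean by "failing over to a slower method": the usual pattern is a runtime check along the lines of the sketch below. this is only a guess at the pattern, not the GW app's actual code (the helper names are invented; __builtin_cpu_supports is a real GCC/Clang builtin on x86):

```cpp
// sketch of the general "use AVX if present, otherwise fall back" pattern.
// this is NOT the Einstein GW app's code, just an illustration of the idea.
#include <cstdio>

// plain scalar loop: works on any x86 CPU, including a Xeon X5650
static double sum_scalar(const float* x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += x[i];
    return s;
}

// stand-in for an AVX-optimised kernel (a real one would use AVX intrinsics
// or be compiled with -mavx); kept scalar here so the sketch stays portable
static double sum_avx_like(const float* x, int n) {
    return sum_scalar(x, n);
}

int main() {
    float data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double s;
    if (__builtin_cpu_supports("avx")) {
        s = sum_avx_like(data, 8);   // e.g. an i5-8600K or 9700K takes this path
    } else {
        s = sum_scalar(data, 8);     // e.g. a Xeon X5650 (no AVX) falls back here
    }
    std::printf("sum = %.1f\n", s);
    return 0;
}
```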

you should drop that card into your i5-8600k (it supports AVX and AVX2) system and see if the GPU utilization is better

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519320352
RAC: 14087

Ian&Steve C. wrote:
I see you're running Windows. Where are you getting the 25% value from? Is that GPU utilization?

The GPU utilization readings agree between Windows Task Manager (the Compute 1 graph), MSI Afterburner (a fan speed and overclocking tool), and GPU-Z.  Windows Task Manager also shows a full CPU core taken.  If I run two GW tasks simultaneously on a card (provided enough RAM is available on the card), two CPU cores are taken fully, and the GPU usage increases to 50%.
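
For reference, the usual way to run two tasks per card in BOINC is an app_config.xml in the project directory along the lines of the sketch below. The <name> value here is only a guess and would need to match the real GW app name from client_state.xml:

```xml
<!-- sketch of a BOINC app_config.xml for two GW tasks per GPU.
     the app name below is a guess; use the real name from client_state.xml -->
<app_config>
  <app>
    <name>einstein_O2MD1</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- half a GPU per task = two tasks per card -->
      <cpu_usage>1.0</cpu_usage>   <!-- each task still wants a full CPU core -->
    </gpu_versions>
  </app>
</app_config>
```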

Ian&Steve C. wrote:

but after inspecting the GW_nvidia binary a little, I think your issue is the lack of AVX instruction support on your CPU. Your CPUs are old, yes. but it looks like the CPU portion of the GW app uses AVX for some of its calculations, and your CPU doesn't have AVX so it's probably failing over to a slower method which is making the GPU wait around longer and not be used as much. My CPUs do support AVX. my gaming system's 9700K supports AVX2.

you should drop that card into your i5-8600k (it supports AVX and AVX2) system and see if the GPU utilization is better

Not easy to do.  My i5 is the main computer in the living room and I don't want it taken to bits or making noise.

But it does have a weaker card in it, a Radeon RX 560, and it only drives that to 70% utilization, so I doubt it would do well with the bigger cards.  Maybe even newer CPUs have more extensions the app likes, or a faster AVX implementation?

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47062362642
RAC: 65316448

I only suggested it as a test; I was just curious whether you'd see greater GPU utilization on the same GPU with a more powerful CPU pushing it. you don't necessarily have to leave it that way. but it's your hardware. there may be more factors involved anyway, since the AMD cards use a different app, and I don't have any AMD systems to be able to inspect the AMD/ATI GW linux binary.

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

I have compared the GW App on both an older Core2 Quad and a newer i7-8700 using an NV 970 in Ubuntu. The App is around 25% faster on the 8700, and it doesn't use a full CPU core on the newer CPUs running AMD GPUs. There isn't much difference with the GR App, but there is a large difference with the GW App between older and newer CPUs. A recent GW test on a 'new' AMD 570 on an i7-6700 showed CPU usage around 45% and GPU usage around 60-70% with higher spikes. All this is under Ubuntu.


Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519320352
RAC: 14087

Ian&Steve C. wrote:

I only suggested it as a test; I was just curious whether you'd see greater GPU utilization on the same GPU with a more powerful CPU pushing it. you don't necessarily have to leave it that way. but it's your hardware. there may be more factors involved anyway, since the AMD cards use a different app, and I don't have any AMD systems to be able to inspect the AMD/ATI GW linux binary.

I could.  I had one of those cards in there before; it's just a lot of hassle to get into it.  Five of my machines are easily accessible; that one is not.  Oh well, as far as I'm concerned, I'm happy as long as the cards are being fully utilised.  At the moment that's Gamma and Milkyway.  If I ever couldn't find a project I liked that fully used them, I'd build a more modern computer to run them.  I'll let people like yourself with more modern chips do the gravity and I'll take care of the gamma.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
