Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915735289
RAC: 2111172

For the first time in months, I finally got a batch of O2MDFV2g_VelaJr1 tasks, and boy, what a difference. They are much more GPU-intensive, and much less CPU-intensive, than prior GW tasks. Compared to my previous post, where running 2 GPUs @ 2x racked up a 175% load average and high CPU usage, the current VelaJr tasks show a mere 37% load average and about half the CPU usage. In contrast, GPU usage is now nearly pegged at 100% at 2 GPUs @ 2x, indicating that my RX 570s are the bottleneck, not my 4-thread Pentium CPU. At 1x, VelaJr task time is ~17 min, but at 2x it increases to ~20 min; again, the lack of an efficiency gain at higher task multiplicities indicates that GPU resources are limiting.
I thought I had figured out how to most economically upgrade my system for better GW task production, but that's proving to be a moving target.
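
For anyone wondering how the 1x/2x/3x multiplicities above get set, that's done with an app_config.xml in the Einstein@Home project folder. A minimal sketch, assuming the usual app names (the names are a guess on my part; check client_state.xml or the task names on your own host for the exact strings):

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>        <!-- gamma-ray pulsar GPU app; name assumed -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>        <!-- 0.5 = two tasks per GPU (2x), 0.33 = 3x -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MDF</name>         <!-- GW GPU app; name assumed -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

BOINC picks the file up after Options > Read config files in the Manager, or after a client restart.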

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47233352642
RAC: 65381236

Maybe it varies by the frequency? I noticed the GPU RAM usage varying quite a bit with different files of different frequencies and what another user was calling the "sequence" number at the end of the task ID. I can't say I've noticed any difference in CPU/GPU utilization, though, but that may be down to my Nvidia card and the associated app being used.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519371204
RAC: 15292

cecht wrote:
I, too, tried that, but no luck, or only partial luck. By excluding GW GPU tasks from one device in cc_config, I was able to have the excluded GPU run 3x pulsar tasks and the other GPU run 2x GW tasks, but only if I had my work queue already filled with both sets of tasks before setting the exclusion. As soon as the pulsar tasks were all completed, the server did not reload them; only GW GPU tasks continued to download. I tried all combinations of exclude_gpu and gpu_usage in the two config files, but no luck. It seems that the E@H download server is interpreting <exclude_gpu> differently than we understand it to work. Is it a task priority issue?

Works ok here, as long as I have both types of tasks in my queue.  There seems to be no way of getting the server to send me just gamma tasks if I have both allowed, but don't need any more gravity.
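
For reference, a minimal sketch of the kind of exclusion being discussed, placed in cc_config.xml in the BOINC data directory. The GW app name is an assumption on my part; the exact name is in client_state.xml, and the project URL should be copied exactly as your client shows it:

<cc_config>
  <options>
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>  <!-- project URL as listed in your client -->
      <device_num>0</device_num>              <!-- GPU to exclude; omit to exclude all GPUs -->
      <app>einstein_O2MDF</app>               <!-- assumed GW app name; omit to exclude all of the project's GPU apps -->
    </exclude_gpu>
  </options>
</cc_config>

Whether the scheduler then keeps the excluded card fed with gamma work is exactly the problem being described here.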

As far as I understand Tom M's problem, he had them in his local buffer, but they weren't being used.  This doesn't make sense.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519371204
RAC: 15292

Ian&Steve C. wrote:
Maybe it varies by the frequency? I noticed the GPU RAM usage varying quite a bit with different files of different frequencies and what another user was calling the "sequence" number at the end of the task ID. I can't say I've noticed any difference in CPU/GPU utilization, though, but that may be down to my Nvidia card and the associated app being used.

Agreed, I get anything from 1.5 to 3.5 GB used per gravity WU, which is a pity as most of my cards are 3 GB :-(

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

I seem to be having the same trouble as others with the O2MDF app. The VelaJr tasks are using around 3.2 GB of VRAM, and I have quite a few 3 GB GPUs. I tried seeing how it would work by excluding the app on a machine with a 3 GB and a 4 GB GPU. As with others, it only works if the tasks are pre-existing. Once you exclude the O2MDF app on the 3 GB GPU, you are never sent any FGRPB1G tasks again. It also seems that every time an O2MDF task finishes, the estimated run-time of the FGRPB1G tasks increases, which may have something to do with why it chooses not to send any.

Otherwise, it seems to work well even on the older machines, if only the server would send both types of tasks. Since the server refuses to keep both GPUs busy, the only choice is to avoid the GW tasks altogether. I'm also seeing the trouble with wingmen not being assigned to the GW tasks; perhaps if the server problem were fixed, more people would run the GW tasks? Then perhaps the tasks could have wingmen, and some points might be awarded.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519371204
RAC: 15292

TBar wrote:

I seem to be having the same trouble as others with the O2MDF app. The VelaJr tasks are using around 3.2 GB of VRAM, and I have quite a few 3 GB GPUs. I tried seeing how it would work by excluding the app on a machine with a 3 GB and a 4 GB GPU. As with others, it only works if the tasks are pre-existing. Once you exclude the O2MDF app on the 3 GB GPU, you are never sent any FGRPB1G tasks again. It also seems that every time an O2MDF task finishes, the estimated run-time of the FGRPB1G tasks increases, which may have something to do with why it chooses not to send any.

Otherwise, it seems to work well even on the older machines, if only the server would send both types of tasks. Since the server refuses to keep both GPUs busy, the only choice is to avoid the GW tasks altogether. I'm also seeing the trouble with wingmen not being assigned to the GW tasks; perhaps if the server problem were fixed, more people would run the GW tasks? Then perhaps the tasks could have wingmen, and some points might be awarded.

I don't have the same problem here, because I have my 3GB GPUs and 4GB GPUs on separate machines, so I just have some machines do only gamma and some do both.

But I notice you think it might be to do with the time estimate and the queue.  This is a known problem that I've reported in the past but they don't seem willing to sort it.  The time remaining on gravity and gamma changes wildly when it works on one of them for a while.  No idea why.  I assume one runs a lot faster than expected, based on whatever's written into the task by the server.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

Peter Hucker wrote:
But I notice you think it might be to do with the time estimate and the queue.  This is a known problem that I've reported in the past but they don't seem willing to sort it.  The time remaining on gravity and gamma changes wildly when it works on one of them for a while.  No idea why.  I assume one runs a lot faster than expected, based on whatever's written into the task by the server.

Yes, that appears to be what I'm seeing. The machine had run a few gamma tasks and the estimate was correct at around 20 minutes. When I received the first gravity task, the estimate was around 6 minutes; however, the actual run-time was the same as for the gamma tasks, around 20 minutes. As the machine ran the gravity tasks, the estimate for the gamma tasks increased to over an hour from the already correct 20 minutes.

I have since run some more gamma tasks and the estimate is back down to near 20 minutes; I had to remove the gravity tasks from the preferences to receive more gamma tasks. Now I've added the gravity tasks back to the preferences, but the server hasn't sent any. I just removed the gamma tasks from the preferences; perhaps the server will now send gravity tasks again? It will be interesting to see if the gravity estimates are back to 6 minutes. The correct estimate would be 20 minutes for both task types on this machine: https://einsteinathome.org/host/12825379/tasks/0/0?page=17

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

TBar wrote:

I seem to be having the same trouble as others with the O2MDF app. The VelaJr tasks are using around 3.2 GB of VRAM, and I have quite a few 3 GB GPUs. I tried seeing how it would work by excluding the app on a machine with a 3 GB and a 4 GB GPU. As with others, it only works if the tasks are pre-existing. Once you exclude the O2MDF app on the 3 GB GPU, you are never sent any FGRPB1G tasks again. It also seems that every time an O2MDF task finishes, the estimated run-time of the FGRPB1G tasks increases, which may have something to do with why it chooses not to send any.

Otherwise, it seems to work well even on the older machines, if only the server would send both types of tasks. Since the server refuses to keep both GPUs busy, the only choice is to avoid the GW tasks altogether. I'm also seeing the trouble with wingmen not being assigned to the GW tasks; perhaps if the server problem were fixed, more people would run the GW tasks? Then perhaps the tasks could have wingmen, and some points might be awarded.

It had been so long since I had done the Exclude that I had forgotten about that. Yes, you will end up with only one type of GPU task in your cache. What I ended up doing, a long time ago, was to set my cache to a high value, remove the cc_config, and restart BOINC. It downloaded a ton of work units, both kinds. Once it finished downloading, I would stop, place the cc_config back into the BOINC folder, and restart BOINC to allow the exclude to take effect. I would have to do this every couple of days.

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

It's starting to look as though the Einstein server is too old to properly execute the Exclude option. The only way I've been able to download any different tasks is to remove the other task type from the preferences and then drastically increase the cache setting. I currently have only gravity selected in the preferences, I have 16 gamma tasks on board estimated to take 23 minutes each, and the cache setting is one day. The server refuses to send any tasks, claiming none are needed.

So far, the only wingman I've seen for a gravity task is a machine running a 2 GB GPU, and that task failed rather quickly: https://einsteinathome.org/workunit/450248518

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519371204
RAC: 15292

TBar wrote:

Yes, that appears to be what I'm seeing. The machine had run a few gamma tasks and the estimate was correct at around 20 minutes. When I received the first gravity task, the estimate was around 6 minutes; however, the actual run-time was the same as for the gamma tasks, around 20 minutes. As the machine ran the gravity tasks, the estimate for the gamma tasks increased to over an hour from the already correct 20 minutes.

I have since run some more gamma tasks and the estimate is back down to near 20 minutes; I had to remove the gravity tasks from the preferences to receive more gamma tasks. Now I've added the gravity tasks back to the preferences, but the server hasn't sent any. I just removed the gamma tasks from the preferences; perhaps the server will now send gravity tasks again? It will be interesting to see if the gravity estimates are back to 6 minutes. The correct estimate would be 20 minutes for both task types on this machine: https://einsteinathome.org/host/12825379/tasks/0/0?page=17

I get similar results: the estimated time is correct if the machine has been running that type of task for several in a row (I haven't counted how many are required). Otherwise one goes way too high and one way too low, despite them both being similar in run time. As discussed in another thread somewhere, it's the server's estimate of how fast each type will run that is way out. Your BOINC client knows how long the last several gamma tasks took, and it believes the server when it says gravity will run faster. Requests to sort this have been ignored by the admins.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
