Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

My only wingman uses a 2 GB board. So what?

Tullio

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519315251
RAC: 13889

tullio wrote:

Watching GW GPU tasks on my BOINC manager I see a curious thing. Progress rises very rapidly up to about 14% in about 3 minutes, then falls back to 0.470 and rises more slowly.

Tullio

That happens on many projects, I wouldn't worry about it.  It's just the task going through two different stages and having a rubbish progress meter.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519315251
RAC: 13889

Ian&Steve C. wrote:

weird stuff happens when you run out of video memory and produce computation errors.

https://einsteinathome.org/host/12735373/tasks/6/0

you should remove this GPU from running GW tasks. these things will continue to happen until you do so. run Gamma Ray on the GPU if you still want to use it for Einstein. If you want to run GW tasks, run it on your CPU, or upgrade to a GPU with at least 4GB of memory.

Since most of his tasks are completing ok, I don't see a problem.  If they go wrong, they go wrong quickly, not wasting his GPU's time, and the server just hands it to someone else.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519315251
RAC: 13889

Ian&Steve C. wrote:

I don't know how you don't understand this.

it has nothing to do with your wingmen or youtube or windows updates or whatever other nonsense you are trying to distract with. the GPU is insufficient for certain GW tasks, and there is nothing you can do to prevent getting them except stopping GPU GW processing.

some tasks will run OK. some tasks will not. some tasks require less than 2GB of GPU memory; these will succeed. some tasks require more than 3GB of GPU memory; these are the ones that will fail. they come at random and unpredictable times. the reason you haven't seen failures today yet is simply because you haven't been sent the large tasks in a little while. that doesn't mean the issue is fixed. you WILL receive the large tasks again and you will produce errors again.

please do everyone a favor and just stop GW processing on that 3GB GPU. you are just making the already bad situation worse.

What you should be doing is hitting the developers with a clue stick so that large tasks are not sent to people with small cards.
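
To make the idea concrete, here is a rough sketch of what a server-side check of that kind might look like. It is only an illustration under stated assumptions: the function name, the `est_gpu_mem_mb` field, the headroom value, and the task sizes are all made up for the example (the sizes loosely follow the "less than 2GB" and "more than 3GB" figures quoted above), and nothing here reflects how the Einstein@Home scheduler is actually implemented.

```python
def eligible_tasks(tasks, gpu_mem_mb, headroom_mb=256):
    """Return the tasks whose estimated GPU memory fits on this card.

    Hypothetical helper: `est_gpu_mem_mb` is an assumed per-task estimate,
    and `headroom_mb` is an assumed reserve for the driver and desktop.
    """
    usable = gpu_mem_mb - headroom_mb
    return [t for t in tasks if t["est_gpu_mem_mb"] <= usable]


# Made-up task sizes for illustration only.
tasks = [
    {"name": "small", "est_gpu_mem_mb": 1800},
    {"name": "large", "est_gpu_mem_mb": 3300},
]

# A 3GB card would only be offered the small task in this sketch.
print([t["name"] for t in eligible_tasks(tasks, 3072)])
```

With a 4GB card (`gpu_mem_mb=4096`) both made-up tasks pass the same check, which is the behaviour being asked for: small cards keep getting the work they can finish.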

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3959
Credit: 47048132642
RAC: 65122858

tullio wrote:

My only wingman uses a 2 GB board. So what?

Tullio

So it will probably fail, and then be resent to someone else, and keep doing so until it lands on a host whose GPU has enough memory.

He's your only wingman "right now". but the nature of the validation process creates new tasks to be sent to additional hosts when the first two don't agree or one of them returns an error.
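
A toy model of that resend behaviour, as a sketch only. The assumptions are loud ones: a quorum of two matching results, copies handed out one host at a time (the real server sends the first two in parallel), and a task that errors exactly when the host GPU has less memory than the task needs. The `Host` class, `run_workunit`, and the memory figures are hypothetical names and numbers for illustration, not BOINC's actual code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    gpu_mem_gb: float


def run_workunit(required_gb, hosts, quorum=2):
    """Hand copies of one workunit to hosts in order until `quorum` succeed.

    Returns (hosts that returned a result, hosts that errored).
    Assumed failure rule: a copy errors when the GPU has too little memory.
    """
    succeeded, errored = [], []
    for host in hosts:
        if len(succeeded) >= quorum:
            break  # enough results to validate, no more copies needed
        if host.gpu_mem_gb >= required_gb:
            succeeded.append(host.name)
        else:
            errored.append(host.name)  # this copy gets resent to the next host
    return succeeded, errored


# Made-up hosts: a 3GB card, a 2GB card, then two larger cards.
hosts = [Host("A", 3), Host("B", 2), Host("C", 4), Host("D", 8)]
ok, bad = run_workunit(3.2, hosts)
print("validated by:", ok)
print("errored on:", bad)
```

In this made-up run the workunit needs four hosts instead of two before it can validate, which is the extra overhead being described.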

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519315251
RAC: 13889

Tom M wrote:

Preliminary results with an AMD Radeon 5700 on GW GPU tasks indicate almost no change in processing speed from 1 task to 3 tasks.

It runs from 18+ minutes to under 21 minutes.  So basically an R5700 can outproduce high-end Nvidia cards if you run 3 GPU tasks.

There may be memory issues with 3 GPU tasks, just like there are with a GTX 1060 3GB video card.

I am switching my Pulsar Search#1 box to run both P and GW GPU tasks since I expect the R5700 to end up on it, possibly by the weekend.

Tom M

AMD cards don't care if they run out of memory (they just use system memory).  Nvidias do.  I avoid Nvidias.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3959
Credit: 47048132642
RAC: 65122858

Peter Hucker wrote:

Ian&Steve C. wrote:

weird stuff happens when you run out of video memory and produce computation errors.

https://einsteinathome.org/host/12735373/tasks/6/0

you should remove this GPU from running GW tasks. these things will continue to happen until you do so. run Gamma Ray on the GPU if you still want to use it for Einstein. If you want to run GW tasks, run it on your CPU, or upgrade to a GPU with at least 4GB of memory.

Since most of his tasks are completing ok, I don't see a problem.  If they go wrong, they go wrong quickly, not wasting his GPU's time, and the server just hands it to someone else.

hard to say. when I was testing out my 1060 3GB to see the failure mode, what happened was the card started loading data into GPU memory, and filled up in the first 10-15 seconds or so. at that point, the task progress jumped from 0 to 100% and moved to "complete", but it kept trying to reload the data into the GPU memory over and over and wouldn't start a new task. I watched the GPU mem climb to full then crash to 0, over and over and over. it sat there for quite a while with this behavior (several minutes) until I manually aborted it. but the reported run time only logged 15 seconds or so, when in reality it had wasted minutes and minutes (and only that because I intervened) of the PC's time. it's probably similar for others. so you can't use the runtime to accurately gauge how much time it wasted on the afflicted hosts.

I think if you are aware that your system is producing errors and bad results for a project, and you know the reason, and you know how to fix it, you have the obligation to do so for the benefit of the project. not just let it keep pumping out errors because "some" are still succeeding. this same mindset plagued several projects where bad AMD drivers caused consistently incorrect computations on RX5700 (navi) cards, which validated with each other, but invalidated against everyone else. or when Nvidia changed something in their ~436 Windows drivers that only affected one type of WU at SETI. several people thought they would just leave their system generating tons of bad results choosing to ignore the bad results and only look at the ones that validated without understanding what was happening.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3959
Credit: 47048132642
RAC: 65122858

Peter Hucker wrote:

What you should be doing is hitting the developers with a clue stick so that large tasks are not sent to people with small cards.

I've posted the relevant information in the tech forum, but I can't make the required people read it, and/or they may have other priorities. 

there's a better chance of it getting more attention if more people bring it up, not just one man.

the squeaky wheel gets the grease. or something like that.

_________________________________________________________________________

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592162352
RAC: 773263

I am starting to think they are willing to accept the overhead. In my case I was still producing good results 2/3 of the time.

As an aside, my 3GB cards are now only doing pulsars; the 6GB card gets GWs.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519315251
RAC: 13889

Ian&Steve C. wrote:

hard to say. when I was testing out my 1060 3GB to see the failure mode, what happened was the card started loading data into GPU memory, and filled up in the first 10-15 seconds or so. at that point, the task progress jumped from 0 to 100% and moved to "complete", but it kept trying to reload the data into the GPU memory over and over and wouldn't start a new task. I watched the GPU mem climb to full then crash to 0, over and over and over. it sat there for quite a while with this behavior (several minutes) until I manually aborted it. but the reported run time only logged 15 seconds or so, when in reality it had wasted minutes and minutes (and only that because I intervened) of the PC's time. it's probably similar for others. so you can't use the runtime to accurately gauge how much time it wasted on the afflicted hosts.

Doesn't happen with my AMDs. :-P
They just use system memory and run slower.  No invalid results.

Ian&Steve C. wrote:

I think if you are aware that your system is producing errors and bad results for a project, and you know the reason, and you know how to fix it, you have the obligation to do so for the benefit of the project. not just let it keep pumping out errors because "some" are still succeeding. this same mindset plagued several projects where bad AMD drivers caused consistently incorrect computations on RX5700 (navi) cards, which validated with each other, but invalidated against everyone else. or when Nvidia changed something in their ~436 Windows drivers that only affected one type of WU at SETI. several people thought they would just leave their system generating tons of bad results choosing to ignore the bad results and only look at the ones that validated without understanding what was happening.

If I had a machine that produced 8 good results, then failed on 2, but quickly, I'd leave it as is.  It's doing good work.  If I got 1 good result and 9 failures, I'd consider it was wasting server bandwidth.  But no matter which of the above occurs, surely the programmers at Einstein can see the high failure rate, find the problem, and fix it at their end?
