BOINC GPU Tasks - Computation Errors

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 412
Credit: 10248763455
RAC: 20036508

Ian&Steve C. wrote:

...

It's officially supported for both. check the supported GPUs chart in the release notes for his driver (446.14)

 

http://us.download.nvidia.com/Windows/446.14/446.14-win10-win8-win7-release-notes.pdf

p.28-29

... oops, this is a new insight for me.

Just looking at the support list under "DOWNLOAD DRIVERS" is, I guess, then misleading.

Wonder why NVIDIA is doing this (probably because the list would be too long)?

Thanks Ian&Steve C.

Masse
Joined: 18 Mar 05
Posts: 5
Credit: 570591
RAC: 9

Sorry to say, but it still seems that you didn't actually read or understand what I initially wrote ... With Einstein tasks, the GTX 1650 performs as slow as the GT 710, despite twice as much memory, faster memory type and about 10 times higher flops performance. This issue doesn't occur with other projects, where it works as it should. Any day of the week, this is related to software and not to the hardware and its drivers ...

[ 4x E5-2680v4 + i9-10900K + Q9550S + A57 | 334GB | 4x RTX A2000 + M6000 + M40 + 3x GTX 1650 + UHD Graphics 630 + 2x Tegra X1 | Lubuntu 21.04 ]

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47206592642
RAC: 65458307

Masse wrote:

With Einstein tasks, the GTX 1650 performs as slow as the GT 710, despite twice as much memory, faster memory type and about 10 times higher flops performance.

 

looking at your completed/validated tasks (ignoring errors):

you have THIS TASK which completed in 14,310 seconds (almost 4hrs) on your GT 710

you have THIS TASK which completed in 2,547 seconds (42 minutes) on your GTX 1650

 

perhaps you could explain how these are the same? it's running nearly 6x faster on the GTX 1650 from what I can see
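As a quick check of the arithmetic, here is a minimal Python sketch using only the two run times quoted above. The variable names are made up for illustration.

```python
# Run times in seconds, as quoted above for the two validated tasks.
gt710_seconds = 14_310
gtx1650_seconds = 2_547

gt710_hours = gt710_seconds / 3600          # just under 4 hours
gtx1650_minutes = gtx1650_seconds / 60      # a little over 42 minutes
speedup = gt710_seconds / gtx1650_seconds   # between 5x and 6x

print(f"GT 710:   {gt710_hours:.1f} h")
print(f"GTX 1650: {gtx1650_minutes:.0f} min")
print(f"GTX 1650 finished {speedup:.1f}x faster")
```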

 

also keep in mind that Gravitational Wave GPU tasks are still pretty CPU bound. Xeons with low clock speed don't do all that great with GW GPU tasks with faster GPUs. probably not a bottleneck for your GT710 though.

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117780145295
RAC: 34772165

Masse wrote:
Sorry to say, but it still seems that you didn't actually read or understand what I initially wrote ... With Einstein tasks, the GTX 1650 performs as slow as the GT 710, despite twice as much memory, faster memory type and about 10 times higher flops performance. This issue doesn't occur with other projects, where it works as it should. Any day of the week, this is related to software and not to the hardware and its drivers ...

I've emphasised some pretty strong claims you are making.  Let's see how they stack up.

In your initial message, essentially you said 3 things - 1) all GPU tasks give errors, 2) Boinc was showing some wildly different 'progress' figures, and 3) two very different GPUs were performing at the very same speed, when they shouldn't.

I looked at your full list of GPU tasks on the website.  Currently, there are 249 in total.  The oldest in the list was sent on May 23, returned successfully on the same day with a run time of 2136 secs and a CPU time of 2081 secs.  If you click the Task ID link for that task you can actually see information about that task that was returned to the project.  One of the things showed the GPU being used:-

GPU Device used for Search/Recalc and/or semi coherent step: 'GeForce GTX 1650

A few entries down your tasks list was a failed task which showed a run time of 8498 secs with 0 CPU time.  When using the Task ID link for it, I found:-

[ERROR] Couldn't get OpenCL device from BOINC (-1)!

Boinc has the job of detecting your hardware.  The Einstein app can't work if Boinc doesn't properly detect the GPU.  Whilst Boinc isn't perfect, the most likely cause of this problem (since the GPUs aren't 'unusual') is that, for some odd reason, drivers and libraries aren't allowing Boinc to do a proper detection.

Another thing to realise is that the Einstein app creates regular progress points (called checkpoints) while running.  Until the first checkpoint is written (usually takes about a minute) Boinc 'fakes' the progress figure based on the estimated time it will take.  This can be wildly wrong initially.  In those cases, the first checkpoint contains the 'real' progress and the time it took, which allows Boinc to adjust the figures.  If you see tasks taking a long time with ever increasing fake progress, you can guess that something might be wrong.  If this has continued for a while, you could stop Boinc and then restart it.  If the fake progress disappears and the task restarts from zero (again with fake progress), then you know that checkpoints are not being written and that there is a real problem.
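The stop-and-restart test described above can be written down as a small rule of thumb. This is only a sketch of the reasoning, not anything Boinc itself provides: the function name and the 1% threshold are assumptions chosen for illustration.

```python
def checkpoints_seem_missing(progress_before: float,
                             progress_after: float,
                             threshold: float = 1.0) -> bool:
    """Guess whether a task is failing to write checkpoints.

    progress_before: percent shown just before stopping Boinc
    progress_after:  percent shown just after restarting it
    threshold:       hypothetical cutoff (percent) for "back at zero"
    """
    # Progress that was well above zero but falls back to (about) zero
    # after a restart suggests no checkpoint was ever saved.
    return progress_before > threshold and progress_after <= threshold


print(checkpoints_seem_missing(37.0, 0.0))   # restarted from zero
print(checkpoints_seem_missing(37.0, 35.2))  # resumed from a checkpoint
```

The first call models a task that lost all of its displayed progress, the second a task that resumed close to where it stopped.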

So, in summary, the true situation is not as you described it in your initial message.  1) Not all GPU tasks are failing, 2) Boinc's fake progress can't tell you what the problem really is, and 3) the two GPUs are definitely not performing at the same speed.

If we look at what has happened since the May 23/24 period, there are now examples of Boinc being able to detect your GT 710 - eg. this task ID.  It has a run time of 14,311 secs and a CPU time of 13,622 secs.  That task was sent on May 25 and returned on May 27.  Tasks crunched on the other GPU still have a run time around 2,000 secs (like that very first task) so there is really a 7 to 1 ratio.  Yet you still claim "as slow as the GT 710".

For Boinc to now detect your GT 710 properly, it seems highly likely that you have changed something at your end to do with drivers/OpenCL libraries.  Maybe you could think about that again and tell us what has changed.  It might be helpful to others having similar difficulties.  At least you should correct the false impression that the Einstein app trashes the performance of a GTX 1650.

As a final comment, you should realise that the only GPU app you are running here is the GW O3ASE test app, which is using simulated test data at the moment.  While testing, there can easily be failures.  When the real data eventually starts, there may be a 'tweaked' app and things might be different again.  In particular (based on past experience with real data) GPUs like a GT 710 are likely to fail on many (if not all) of the real tasks, simply because they don't have enough VRAM and would be woefully slow even if they did.  We don't know for sure yet, but it seems unlikely that memory requirements will suddenly decline when the real data arrives.  If you want to use the GT 710, you would be wise to use it for the gamma-ray pulsar GPU tasks only.
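One way to keep a particular GPU away from a particular app is the `exclude_gpu` option in Boinc's cc_config.xml. The Python sketch below only builds such a fragment as text. The project URL, the device number and the app name are all placeholders, not values taken from this thread: the URL must match the one your client shows for the project, the device number must match what Boinc reports for the GT 710 at startup, and the real short name of the GW app would need to be looked up.

```python
import xml.etree.ElementTree as ET


def build_exclude_gpu_config(project_url: str, device_num: int,
                             app_name: str) -> str:
    """Return a cc_config.xml document with one <exclude_gpu> entry."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    ET.SubElement(exclude, "type").text = "NVIDIA"
    ET.SubElement(exclude, "app").text = app_name
    return ET.tostring(root, encoding="unicode")


config_xml = build_exclude_gpu_config(
    project_url="https://einstein.phys.uwm.edu/",  # placeholder, check your client
    device_num=1,                                  # placeholder device index
    app_name="GW_APP_SHORT_NAME",                  # placeholder, not a real app name
)
print(config_xml)
```

The printed text would go into cc_config.xml in the Boinc data directory, after which the client has to re-read its config files or be restarted.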

Cheers,
Gary.

Masse
Joined: 18 Mar 05
Posts: 5
Credit: 570591
RAC: 9

It's simple: if two Einstein GPU tasks start at the very same time (one with GTX 1650 and one with GT 710) with the same type of data, and both end at the very same time and both fail, something must be wrong. If at the same time GPU tasks in other projects are working without issues, logically it has to be isolated to Einstein.

This is what's presented on screen in BOINC, regardless of what's in a log file. Ever since I started with computers back in the mid-1980s, I have come across several systems that generated flawed/faulty log files, so I never read them as if they were the book of law. Log files are not always perfect tools when searching for errors ...

Regarding the hardware, I run everything at default - no overclocking, no messing with the drivers or any other tweaks. I need the computer to run as stably as possible for other [far] more important things. I leave driver management to the hardware manufacturers, as I believe that in most cases they are capable of handling it. Other programmers may not always do it right ... Not to forget, Windows by itself generates enough issues to deal with, even though I'm using one of the most stable versions. If I had the option, I would have chosen a completely different OS, but I don't.

You blame it on the hardware/driver and even BOINC, but ignore why this doesn't happen with the other projects I am running (mostly MilkyWay), except for two separate cases. If it were a hw/drv case, it would have happened to all the others too, but it didn't, so it isn't. Again, it's simple. The error message you quoted (out of context?) declares that an error occurred, not why, so the OpenCL error doesn't necessarily relate to the driver, as it could be the other way around - a programmer's error ... The two separate cases are exceptions and not constantly occurring, as the Einstein GPU task issues have been. Have been, as something has happened now, so someone has done something and corrected it (and it's not me). All Einstein GPU tasks are ending properly (not stuck for several minutes at 99% either), with the GTX 1650 running faster than the GT 710 in fair proportion. You actually gave me an explanation for that -

- Your ending paragraph - nice admission. You should have started with that. It puts the issues in quite a different perspective ... Regarding limiting the GT 710 to certain tasks, I run BOINC almost 'as it is, out of the box' and don't have the time or interest to do anything beyond what is easily accessible in the BOINC settings (which are quite limited), especially when it is working without any problems with other projects, so no ... Messing with config files is 'not my cup of tea', especially mushy/sloppy XML files with no or hard-to-find documentation ... (It took some minutes before I realized that cc_config.xml was not part of a clean install of BOINC and had to be created manually. I don't like that kind of sloppy/lazy work.) Nor do I like searching in badly structured forums with poor search engines. That GW O3ASE is still in a test phase may not be obvious to all. At version 1.00 it should be stable enough, without the mentioned issues occurring. By experience and according to your information, it should be labeled 0.xx, as it still seems to be an actual beta version. At least what a beta version used to be once (before MS changed the concept ...).

[ 4x E5-2680v4 + i9-10900K + Q9550S + A57 | 334GB | 4x RTX A2000 + M6000 + M40 + 3x GTX 1650 + UHD Graphics 630 + 2x Tegra X1 | Lubuntu 21.04 ]

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 412
Credit: 10248763455
RAC: 20036508

... have a great weekend ...

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47206592642
RAC: 65458307

Masse wrote:

It's simple: if two Einstein GPU tasks start at the very same time (one with GTX 1650 and one with GT 710) with the same type of data, and both end at the very same time and both fail, something must be wrong.

looking at failed tasks is disingenuous, and is causing you to misinterpret the situation. if there's some problem with the system that's preventing the tasks from running at all, then of course they will fail in around the same amount of time. that's not indicative of the cards performing the same at all. that's why I specifically only looked at the VALID tasks in your logs: tasks that ran to completion without error. and they clearly show that the GTX1650 is performing MUCH faster than your GT710, as it should. so your claim that they "perform the same" is objectively false.
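To illustrate why the outcome filter matters, here is a small Python sketch. The four valid run times are the ones quoted in this thread. The two error rows are invented for illustration (the thread gives one failed run time of 8498 secs but does not say which GPU it was assigned to), and the record layout is an assumption, not the project's task-list format.

```python
from statistics import median

# (gpu, outcome, run_time_seconds); error rows are hypothetical.
tasks = [
    ("GTX 1650", "valid", 2136),
    ("GTX 1650", "valid", 2547),
    ("GT 710", "valid", 14310),
    ("GT 710", "valid", 14311),
    ("GTX 1650", "error", 8498),
    ("GT 710", "error", 8498),
]


def median_run_time(records, gpu, outcome):
    """Median run time for one GPU, restricted to one outcome."""
    times = [t for g, o, t in records if g == gpu and o == outcome]
    return median(times)


valid_ratio = (median_run_time(tasks, "GT 710", "valid")
               / median_run_time(tasks, "GTX 1650", "valid"))
error_ratio = (median_run_time(tasks, "GT 710", "error")
               / median_run_time(tasks, "GTX 1650", "error"))

print(f"valid tasks only:  GT 710 takes {valid_ratio:.1f}x as long")
print(f"failed tasks only: GT 710 takes {error_ratio:.1f}x as long")
```

Restricted to failed tasks the two cards look identical, which says nothing about their speed. Restricted to valid tasks the difference is obvious.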

but since you've made it clear that you do not want to do any work to solve your problems, then there's nothing anyone can help you with.

_________________________________________________________________________
