Is E@H selecting the "reached daily quota of ??? tasks"?

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405
Topic 223417

When I check my event log, I often see that my projects (whether E@H, M@H, etc.) have reached a daily quota of ??? tasks.


Fri 04 Sep 2020 10:40:38 AM CDT | Einstein@Home | update requested by user
Fri 04 Sep 2020 10:40:38 AM CDT | Einstein@Home | Sending scheduler request: Requested by user.
Fri 04 Sep 2020 10:40:38 AM CDT | Einstein@Home | Reporting 2 completed tasks
Fri 04 Sep 2020 10:40:38 AM CDT | Einstein@Home | Requesting new tasks for CPU and NVIDIA GPU
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | Scheduler request completed: got 0 new tasks
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | No work sent
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | No work is available for Binary Radio Pulsar Search (Arecibo)
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | No work is available for Gamma-ray pulsar search #5
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | No work is available for Gamma-ray pulsar binary search #1 on GPUs
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | No work is available for Gravitational Wave search O2 Multi-Directional
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | (reached daily quota of 68 tasks)
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | Project has no jobs available
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | Project requested delay of 30118 seconds
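
For anyone reading along, here is a minimal sketch of pulling the quota and back-off values out of event-log lines like the ones above, and of turning the requested delay into a clock time. The regular expressions, the helper name `parse_event_log`, and the fixed UTC-5 offset for CDT are assumptions made for illustration; none of this is part of BOINC itself.

```python
import re
from datetime import datetime, timedelta, timezone

LOG = """\
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | (reached daily quota of 68 tasks)
Fri 04 Sep 2020 10:40:40 AM CDT | Einstein@Home | Project requested delay of 30118 seconds
"""

CDT = timezone(timedelta(hours=-5))  # assumption: CDT is UTC-5

def parse_event_log(text):
    """Return (quota, delay_seconds, timestamp_of_delay_line) from log text."""
    quota = delay = stamp = None
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue
        when, _project, message = parts
        m = re.search(r"reached daily quota of (\d+) tasks", message)
        if m:
            quota = int(m.group(1))
        m = re.search(r"requested delay of (\d+) seconds", message)
        if m:
            delay = int(m.group(1))
            # Drop the trailing zone name and attach the assumed offset.
            naive = datetime.strptime(when.rsplit(" ", 1)[0], "%a %d %b %Y %I:%M:%S %p")
            stamp = naive.replace(tzinfo=CDT)
    return quota, delay, stamp

quota, delay, stamp = parse_event_log(LOG)
next_contact = stamp + timedelta(seconds=delay)
print(quota, delay)                            # 68 30118
print(timedelta(seconds=delay))                # 8:21:58
print(next_contact.astimezone(timezone.utc))   # 2020-09-05 00:02:38+00:00
```

Under these assumptions the delay runs out at 00:02:38 UTC on 5 Sep, just after midnight UTC. That would fit a quota counted per calendar day on the server, but this reading is a guess; the log does not say so.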


In this particular case I had at least two different "daily quotas" reached: the first one was 17 tasks, and this latest one was 68.

This computer is an old i7-990X with 6 cores & 12 threads, of which I am using 9 threads for BOINC.  It has 16GB of memory (BOINC may use 60% when the computer is active and 100% when it is not being used) and an RTX2060 GPU (w/6GB), and it runs constantly 24/7.  It does not shut down BOINC unless the non-BOINC CPU load is at 80% or higher.  I am running E@H only and have over 1,000 tasks (project files) in my queue.

What are the criteria for selecting the daily quota of tasks?

George

mikey
Joined: 22 Jan 05
Posts: 6,195
Credit: 548,407,364
RAC: 116,768

This usually kicks in when we have too many errors: the project throttles our workunits until we fix the problem.  As you can see, the return to normal is very quick, as you get 2 more today for every good one returned today than you did yesterday.  Units returning errors count against that, though.  If you abort units, I do not THINK they count against you.
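
To make that rule of thumb concrete, here is a toy model. It assumes the quota rises by 2 for each good result and falls by 1 for each error, clamped between a floor and a cap. The step sizes, floor, cap, and the function name `simulate_quota` are illustrative assumptions based only on the description above, not values taken from the Einstein@Home scheduler.

```python
def simulate_quota(results, start, cap, gain=2, loss=1, floor=1):
    """Replay 'ok'/'error' results and return the quota after each one.

    gain, loss, floor and cap are hypothetical knobs, not real server settings.
    """
    quota = start
    history = []
    for outcome in results:
        if outcome == "ok":
            quota = min(cap, quota + gain)
        elif outcome == "error":
            quota = max(floor, quota - loss)
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
        history.append(quota)
    return history

# A burst of errors drags the quota down; good results bring it back quickly.
print(simulate_quota(["error"] * 5 + ["ok"] * 4, start=20, cap=24))
# [19, 18, 17, 16, 15, 17, 19, 21, 23]
```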

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405

Thanks Mikey.  I didn't think about the errors I've had, which were several, though it was several days ago.  It all makes sense now.

George

mikey
Joined: 22 Jan 05
Posts: 6,195
Credit: 548,407,364
RAC: 116,768

No problem.  I've had a few problems at other projects when a drive crashed and I couldn't get into it when I rebuilt the PC.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,205
Credit: 43,111,226,498
RAC: 45,186,066

George wrote:
...  I didn't think about the errors I've had, which were several, though it was several days ago.

You probably should check that again because you currently have 361 compute errors as opposed to just 124 valid tasks showing in the tasks list for that machine.  Your most recent errors were only a short time ago so the problem continues.
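
For a sense of scale, the two counts quoted above work out to roughly three failures for every four finished tasks. A quick check (the variable names are only for illustration):

```python
errors, valid = 361, 124          # counts quoted in the post above
total = errors + valid            # other task states (pending, in progress) are ignored here
error_share = errors / total
print(f"{total} tasks, {error_share:.1%} errored")   # 485 tasks, 74.4% errored
```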

Most tasks seem to fail during startup around the 30-40 secs mark.  I saw one that failed after about 300 secs which is not long after startup.  The error message (which you can find by clicking the taskID link for any failed task on the website) showed the error as

clEnqueueNDRangeKernel failed: CL_INVALID_COMMAND_QUEUE

and also

Internal function call failed

neither of which means anything much to me.  However, it all points to a hardware issue early on where the stresses of getting started are causing some bit of hardware to misbehave.
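
For readers who want to decode such a message: in the OpenCL headers, `CL_INVALID_COMMAND_QUEUE` is the numeric status -36, which the specification describes as being returned when the command queue passed to the call is not valid. A small lookup sketch follows. The table covers only a handful of codes, and the helper `describe_cl_error` is an assumption for illustration, not part of any Einstein@Home application.

```python
import re

# A few status codes from the OpenCL headers (CL/cl.h); deliberately incomplete.
CL_STATUS = {
    "CL_DEVICE_NOT_AVAILABLE": -2,
    "CL_MEM_OBJECT_ALLOCATION_FAILURE": -4,
    "CL_OUT_OF_RESOURCES": -5,
    "CL_OUT_OF_HOST_MEMORY": -6,
    "CL_INVALID_VALUE": -30,
    "CL_INVALID_COMMAND_QUEUE": -36,
}

def describe_cl_error(line):
    """Split '<call> failed: <STATUS>' into (call, status name, numeric code or None)."""
    m = re.match(r"\s*(\w+) failed: (CL_\w+)\s*$", line)
    if not m:
        return None
    call, status = m.groups()
    return call, status, CL_STATUS.get(status)

print(describe_cl_error("clEnqueueNDRangeKernel failed: CL_INVALID_COMMAND_QUEUE"))
# ('clEnqueueNDRangeKernel', 'CL_INVALID_COMMAND_QUEUE', -36)
```

The status code does not say why the queue became invalid, so it is consistent with the hardware explanation but does not prove it.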

Are you overclocking?  Are you running concurrent tasks?  Have you tested for memory errors?  Have you checked operating temperatures or your cooling system?  How old is your PSU?  Is it properly rated to handle the GPU?  Is there any sign of bulging/swollen caps on the motherboard?  Those are the sorts of things I'd be investigating.  A good way to check hardware is to replace with known good stuff - eg. memory and PSU.  Really pay attention to CPU cooling.  Have you cleaned the heat sink?  Is the CPU fan running at full speed?

Good luck with finding what is causing the failures.

Cheers,
Gary.

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405

Are you overclocking? 

Yes, at 4.00 GHz vs 3.47 GHz standard.

Are you running concurrent tasks? 

Yes, I am using 9 threads for BOINC tasks 24/7, and 3 threads for either browser activities or watching movies on CD/DVD. Otherwise just BOINC.

Have you tested for memory errors? 

No, not yet. I’ve just downloaded MemTest86 and have yet to run it.

Have you checked operating temperatures or your cooling system? 

Yes, constantly. CPU holds 50-55 degrees C, GPU runs 48-64 degrees C.

How old is your PSU? 

About 2 ½ years, an EVGA SuperNOVA 650 G1, 80+ Gold 650W, Fully Modular, w/10 Year Warranty.

Is it properly rated to handle the GPU? 

Yes, no problem.

Is there any sign of bulging/swollen caps on the motherboard? 

Last time I checked was when I replaced the PSU, and no there wasn’t. I haven’t checked lately.

Those are the sorts of things I'd be investigating.  A good way to check hardware is to replace with known good stuff - eg. memory and PSU. 

I realize that but I don’t have any. Besides, the memory is old.

Really pay attention to CPU cooling.  Have you cleaned the heat sink? 

Yes, recently, when I swapped CPU coolers. I presently am using a Noctua NH-U12A.

Is the CPU fan running at full speed?

Yes, over 2000 RPM all the time.

Thank you Gary.  I will be in the process of checking memory tomorrow.  If I still feel that my computer is not living up to its standards anymore, I will tear it down and thoroughly inspect for blown caps and any other signs of failure.  I'm just trying to make it last as long as I can (as if 10+ yrs isn't enough!).

George

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405

As I said in the last post, my memory is old (the Kingston sticks, and I guess my grey matter is too!): 4 sticks of 4 GB each @ DDR3-800 that I pushed to 1600 MHz a few years ago.  When I tested the memory with MemTest86 v4, I let it run overnight for ~14:45 hrs, and here are the results.

I did not realize that I could also check the CPU cores as well as threads until after I had run the test.  So this test shows only the 6 cores in use.  As you can see, I've made 4 complete passes with no errors so I'm thinking that there is nothing wrong with the memory, even as old as it is.

Also, I have checked and rechecked the motherboard for signs of any blown caps and anything else that might be visible without physically removing the motherboard from the case and I didn't find anything suspicious.

As for validating my temps, look at these:

If you look at the GKrellM System Monitor in the upper right corner you'll see that I have been running BOINC again for ~6+ hrs and the CPU and GPU temps are as I said.  And this is with an ambient temp of 84 F and a relative humidity of 55%.  I don't think I have anything to be concerned about there.

I believe by your comments, Gary, that I may have been doing too much "running concurrent tasks" which may have caused the errors you noticed.  I don't otherwise notice anything wrong with my system.

Should I set my CPU usage to use 8 threads instead of 9 threads?

George

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,205
Credit: 43,111,226,498
RAC: 45,186,066

You mentioned that you were overclocking your CPU - 4.0GHz instead of 3.47GHz.  That's a pretty hefty overclock and a likely source of problems, particularly as stuff ages.  You also mention running DDR2-800 RAM at 1600.  I don't know that you could overclock DDR2 by that much.  In any case, that would also be a potential problem.

The Memtest screen shows no CPU overclock at all.  There is no memory frequency mentioned so I suspect the RAM was probably running at stock speeds as well.  Your RAM is fine at stock, as is your CPU most likely.  Could be a different story with either of those when overclocked at the levels you mention.

The easiest thing to do is to make sure you run BOINC at stock speeds for a while and see if the problem goes away.  If it does, you have identified the cause of the problem.

Your problems are with GPU tasks, so when I mentioned concurrent tasks I was talking about GPU tasks, i.e. multiple concurrent tasks sharing the one GPU.  I'm sorry I didn't make that more clear.

Cheers,
Gary.

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405

My ASUS Sabertooth X58 motherboard uses DDR3 memory, not DDR2; I just verified this by checking the memory QVL on the ASUS Support page.  Also, it does show a memory speed of over 1800 MHz as being accepted, so I don't think my memory is a problem.  If it were blanking out, or causing problems at the speed of 1600 MHz, I would think the testing I did would have shown it even at the stock speed of 800 MHz.

I did notice that my CPU was listed as running at stock frequency, not showing the boosted speed of 4.00 GHz.  I have had occasions where when running off a USB stick, which I was, it reverted back to stock settings.  But I do not know if that is the case here.  What I can tell you is that in Windows 10, under Task Manager, it shows the stock speed and the boosted speed, in which case I'm not sure if the speed shown is actually correct.  I have not checked if the speed is still set to 4.00 GHz since running the memory test.  Regardless, I haven't been running at 4.00 GHz for much more than 2 years.  I first joined BOINC on Oct 23, 2017, less than three years ago.

I will try running at 8 threads, one less than I am now, and see if that helps any.

George

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,205
Credit: 43,111,226,498
RAC: 45,186,066

George wrote:
My ASUS Sabertooth X58 motherboard uses DDR3 memory ...

Sorry about that.  I incorrectly interpreted what you had written and in my haste saw the '800' designation, which was standard for DDR2.  The common values for DDR3 were 1333 and 1600.  You have what would have been a fairly high end CPU in its day so I'd be surprised if it came with DDR3-800 - I don't even remember that designation :-).   I remember seeing DDR3-1066 but that was very early in the DDR3 cycle.  I would guess you most likely have DDR3-1600 and if that's what it's running at, you aren't overclocked at all.  The labels on the RAM sticks will tell you for sure what its rating is.

You're still getting compute errors but a lot less than previously.  Whatever you've changed seems to be making a difference.  The tasks run longer before crashing and the type of error is different.  The problem is with GPU tasks so reducing the CPU thread count by just one may not make much difference.  If you really want to test whether CPU load is causing GPU tasks to crash, just temporarily suspend all CPU task crunching for a period (say a day or two) and see if the GPU errors stop.

You need to properly determine what frequency you're running the CPU at.  Maybe it's running at what you had when running memtest and maybe that is what's giving you the better outcome at the moment.  Do you perform overclocking through BIOS/UEFI settings?  Those settings shouldn't change just by booting a USB stick.

In any case, you're the only one who can work out what's really causing the failures, and unfortunately there are still some of those.  At a quick count about 10 in the last 24 hours.  Good luck with finding what is causing that.

Cheers,
Gary.

George
Joined: 8 Jan 18
Posts: 96
Credit: 66,423,670
RAC: 210,405

I did recheck my BIOS settings (I don't have a UEFI) and found my CPU is still at 3.990 GHz with my RAM set to run at 1600 MHz.  I was mistaken on my initial RAM speed, which is factory clocked at 1066 MHz.  Sorry about that.
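
Put as percentages, using only the figures given in this thread (3.47 GHz stock CPU clock, 1066 MHz factory RAM speed), a rough calculation looks like this. The helper name `overclock_percent` is just for illustration.

```python
def overclock_percent(actual, stock):
    """Percentage by which `actual` exceeds `stock` (same units for both)."""
    return (actual / stock - 1.0) * 100.0

cpu = overclock_percent(3.990, 3.47)    # GHz, values quoted in the thread
ram = overclock_percent(1600, 1066)     # MHz, values quoted in the thread
print(f"CPU: +{cpu:.1f}%  RAM: +{ram:.1f}%")   # CPU: +15.0%  RAM: +50.1%
```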

And yes, you are correct.  This computer was a very fast computer when it was built; at one time it was one of the fastest, but not anymore by any means.

I think I may have figured it out.  In my Nvidia X Server Settings, I may have set the PowerMizer Graphics Clock settings too high which may be causing my hiccups.  I lowered it a bit and we'll see what happens.

Gary, if I didn't say so before, I'll say it now.  You are a gracious moderator, giving your knowledge to anyone who would kindly accept your advice, and you do it so often... it makes me wonder when you sleep or do anything else with your time.

Thank you for your advice, and I'll be sure to let you know how it goes after a few days.

George
