not getting work units anymore

PatrickDeGraeve
Joined: 7 Sep 05
Posts: 2
Credit: 387
RAC: 0
Topic 189879

Hello Everyone,

For a few days now I haven't been getting any work units.

Does anyone know how to solve this problem?

I'm running Mac OS X 10.4.2.

Bye,
Patrick

Tern
Joined: 27 Jul 05
Posts: 309
Credit: 93416196
RAC: 979

Quote:

For a few days now I haven't been getting any work units.

Does anyone know how to solve this problem?

Possibly... but need a lot more info first. What other projects are you running? Are they getting work? What is your resource share for each? Are you getting any errors on Einstein in the Messages tab?

Are you sure that Einstein is not set to "Suspend" or "No New Work" in the Projects tab? (You're running BOINC 4.43 which is confusing... click on each project and make sure they all have a button that says "Suspend" and one that says "No new work", which means they are allowing new work, and not a button that says "Resume" or "Allow new work"...) If this is set wrong, correcting it should get you work within a few hours.

If all that is okay, then quit BOINC Manager and relaunch it. Let it run for about 10 minutes, then look at the Work tab and verify you still don't have any Einstein WUs. In the Projects tab, select Einstein and hit "Update". Go to the Messages tab and you will see it trying to contact the Einstein servers. If it is "requesting 0 seconds", then you probably do NOT have a problem, other than being "ahead" on Einstein work relative to your resource shares while the Manager catches up on the other projects. If it is "requesting 8640 seconds" (or whatever number) and failing, then there will be some error message that we can start from.
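The check above boils down to finding the last "Requesting N seconds of work" line for the project. Here is a minimal sketch, assuming the message text is copied out of the Messages tab in the `date time|project|message` form that appears later in this thread. `classify_request` is a hypothetical helper for illustration, not part of BOINC.

```python
import re

# Matches lines such as:
# 9/12/2005 8:53:30 PM|Einstein@Home|Requesting 0 seconds of work, returning 0 results
REQUEST_RE = re.compile(r"\|(?P<project>[^|]*)\|Requesting (?P<secs>\d+) seconds of work")


def classify_request(lines, project):
    """Return a short diagnosis based on the last work request seen for `project`."""
    last_secs = None
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group("project") == project:
            last_secs = int(m.group("secs"))
    if last_secs is None:
        return "no request seen: hit Update and check the messages again"
    if last_secs == 0:
        return "requesting 0 seconds: probably not a problem, just ahead on shares"
    return f"requesting {last_secs} seconds: look for the error message that follows"


log = [
    "9/12/2005 8:53:30 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi",
    "9/12/2005 8:53:30 PM|Einstein@Home|Requesting 0 seconds of work, returning 0 results",
]
print(classify_request(log, "Einstein@Home"))
```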

PatrickDeGraeve
Joined: 7 Sep 05
Posts: 2
Credit: 387
RAC: 0

Message 16495 in response to message 16494

I'm only running SETI and Einstein, and SETI is getting WUs. The share is 50% for each, and in the Messages tab I'm getting the following:

sending scheduler request to http://einstein.phys.uwm.edu/einsteinathome_cgi/cgi
scheduler request to http://einstein.phys.uwm.edu/einsteinathome_cgi/cgi succeeded
request_reschedule_cpus : project op
schedule_cpus : must schedule

Both Suspend and No New Work are off, and I don't get the "requesting ... seconds" message after relaunching BOINC.

Tern
Joined: 27 Jul 05
Posts: 309
Credit: 93416196
RAC: 979

Message 16496 in response to message 16495

Quote:
Both Suspend and No New Work are off, and I don't get the "requesting ... seconds" message after relaunching BOINC.

I see that you got a WU today. I don't know why we didn't see a "requesting 0 seconds" message, but SETI being down for several days probably put you "in debt" to SETI: you did return Einstein WUs from the 9th to the 12th. (You said you hadn't gotten any for "a few days", but there was 1 returned on the 9th, 2 on the 10th, and 1 each on the 11th and 12th.) If you got "ahead" by these 5 Einstein WUs while SETI was down, then as soon as SETI came back up you would be expected to do about 10 SETI WUs (if they take half as long) before seeing another Einstein WU.
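The "about 10 SETI WUs" figure is simple proportion: the extra CPU time spent on the project that got ahead, scaled by the ratio of resource shares, is owed to the other project. Here is a minimal sketch under that simplified model. The 10 h and 5 h per-WU times are hypothetical; the post only says a SETI WU takes about half as long.

```python
def units_to_catch_up(ahead_units, ahead_hours_per_unit, other_hours_per_unit,
                      ahead_share=0.5, other_share=0.5):
    """How many WUs the other project must run before shares are back in balance.

    Simplified model, not BOINC's actual scheduler: the extra hours spent on the
    project that got ahead, scaled by other_share / ahead_share, are owed.
    """
    extra_hours = ahead_units * ahead_hours_per_unit
    owed_hours = extra_hours * other_share / ahead_share
    return owed_hours / other_hours_per_unit


# 5 Einstein WUs ahead at 50/50 shares; hypothetical 10 h per Einstein WU, 5 h per SETI WU.
print(units_to_catch_up(5, 10, 5))
```

With unequal shares the same call scales the answer accordingly. For example, if SETI's share were three times Einstein's, it would owe three times as much time.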

Assuming all is okay now!

Fred Garcia-Cartaya
Joined: 22 Mar 05
Posts: 6
Credit: 127197
RAC: 0

I posted this in the wrong thread ("Reached Daily Quota").

Reposted it here.

Hi
I have not received any WUs for three days. I am running only two projects: Climate Prediction and Einstein. I have tried what I have read here:
1st, resetting the project. 2nd, aborting all of the Climate WUs, only to receive more WUs from Climate alone.

I am also getting the following messages when I update:

9/12/2005 8:53:30 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
9/12/2005 8:53:30 PM|Einstein@Home|Requesting 0 seconds of work, returning 0 results
9/12/2005 8:53:31 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
9/12/2005 8:53:32 PM||request_reschedule_cpus: project op

I tried waiting it out, to no avail. What next?

Regards

Fred

Fred Garcia-Cartaya
Joined: 22 Mar 05
Posts: 6
Credit: 127197
RAC: 0

I got the suggestion from Bill Michael to suspend CPDN and update Einstein.

That worked.

Thanks

Fred

J D K
Joined: 27 Aug 05
Posts: 86
Credit: 103878
RAC: 0

A request for 0 seconds of work means one of three things:

The computer has too much work on hand already to meet deadlines reliably.

The project has enough work on hand, and the computer is not desperate for work.

The project has used too much CPU time for its resource share recently and is blocked from requesting more for a while.

In all of these cases, the best thing to do is to leave it alone, and it will fetch more work when it is ready.
"JM7"

JimHollandJr
Joined: 21 Feb 05
Posts: 11
Credit: 230264
RAC: 0

Message 16500 in response to message 16499

I am at the point where I call this a bug. I think on some systems the math used to work out the load that can be handled is just wrong.

I found a utility that allows you to reset these numbers, and both my computers got the work they needed. (At the time most other projects were down.) It seems like my single-CPU system (Dell Inspiron 8200, 1.7 GHz CPU, 1 GB memory) keeps about 3 days of work. My dual-CPU system (AMD Athlon(tm) MP 2800+, 1 GB memory) finishes its Einstein WUs and never gets new ones. The last message was
"9/26/2005 11:34:27 PM|Einstein@Home|Requesting 0 seconds of work, returning 0 results". Since I don't have Climate Prediction on this machine, I need many WUs from someone to keep it busy. I have tried some suggestions, like changing my network connection interval, with some short-term help.

I am tired of fighting this thing. If I run completely out of work, I will run the blanking utility, since that is the only way to fix the bug for a while.

This is a bug that some people THINK they know about.

Quote:

A request for 0 seconds of work means one of three things:

The computer has too much work on hand already to meet deadlines reliably.

The project has enough work on hand, and the computer is not desperate for work.

The project has used too much CPU time for its resource share recently and is blocked from requesting more for a while.

In all of these cases, the best thing to do is to leave it alone, and it will fetch more work when it is ready.
"JM7"


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110028846105
RAC: 22434618

Message 16501 in response to message 16500

Quote:
I am at the point where I call this a bug ...

Jim, you need to be careful what you say, because the many people who read these lists looking for help and advice might actually believe that you do know what you are talking about. In this particular case it's not a bug but rather exactly how the system is designed to work. Now, it is quite debatable whether or not the design should be different, but that is entirely another matter.

Quote:
I am tired of fighting this thing. If I run completely out of work, I will run the blanking utility, since that is the only way to fix the bug for a while.

By "blanking utility" I presume you mean the utility that allows you to easily defeat your own resource shares and zero all the long term debts. Is that correct??

On the assumption that it is and having looked at the results list for your MP 2800+, here is what I think is the situation that is troubling you. You don't give hard information in your message so I'll make a few guesses. You can correct me where I get it wrong. My comments are entirely about the MP 2800+, a very nice crunching box indeed.

1. You are attached to 4 or 5 projects with roughly equal resource shares (20-25% each).
2. Your most reliable project is EAH which is never out of work (although work doesn't always download).
3. Your other projects are often out of work (SETI, LHC, PPAH, etc.).
4. To try to get more work you have upped your "connect to network" setting. This is your biggest mistake.

When all projects have work you will not have problems, apart from work going a bit "stale" while it sits there waiting to be started. It could equally well just sit on the server, unless your machine has a problem connecting regularly. When several projects run out of work or have outages, poor old EAH will step in and take up the slack so your box won't actually stop crunching. However, EAH will be "rewarded" for its efficiency by building up a considerable long term debt (LTD). So when the other projects get work again, BOINC will step in and try to honour your resource shares by telling EAH to take a hike for a while. This is exactly how the system is designed to work, so how can it be called a bug? The problems develop because users decide this must be a fault, start meddling, and eventually end up creating a bigger mess.
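Here is a toy model of the long-term debt idea described above. Each interval, every attached project is credited its resource-share fraction of the available CPU time and debited the CPU time it actually used. In this sketch's sign convention, a project that soaks up the slack goes negative (ahead of its share) while the idle projects go positive (owed time). This is a hypothetical simplification for illustration, not BOINC's actual accounting, which has more rules.

```python
def update_debts(debts, shares, cpu_used, elapsed):
    """Return new long-term debts after one interval.

    debts    -- current debt per project (positive = owed CPU time)
    shares   -- resource share per project (any positive numbers)
    cpu_used -- CPU seconds each project actually got this interval
    elapsed  -- total CPU seconds available this interval
    """
    total_share = sum(shares.values())
    new = {}
    for name, debt in debts.items():
        entitled = elapsed * shares[name] / total_share
        new[name] = debt + entitled - cpu_used.get(name, 0.0)
    return new


# Four equal shares; only Einstein has work, so it crunches for the whole day.
shares = {"Einstein": 100, "SETI": 100, "LHC": 100, "PPAH": 100}
debts = {name: 0.0 for name in shares}
day = 86400.0
debts = update_debts(debts, shares, {"Einstein": day}, day)
print(debts)
```

After one such day Einstein sits well below zero and each of the others is owed a quarter of the day, so a client following this model would favour the others until the balance is restored. That is the "take a hike" behaviour, not a bug.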

Because one of the actions is usually to up the "connect to network" interval, the consequence will be that when you eventually force BOINC to get more EAH work it will download far too much and then BOINC will have to fight to prevent this excessive work from going past the deadline. This seems to have happened on your MP box on Sept 12 where about 10 EAH work units were downloaded on the one day and then all were processed between 20-22 Sept thus putting EAH back into a much worse LTD problem. So now BOINC is refusing to get more EAH work until the debt is repaid and you are grumbling about so-called bugs which don't exist.

You say you are tired of fighting the thing. Here's all you need to do:
1. Set your resource shares exactly how you would like them and then stick to them.
2. Set your connect to network interval to something sane, like 0.1 to 1.0 days, with 1.0 as the absolute max.
3. Let BOINC do its job and don't keep interfering.
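To see why a large "connect to network" interval leads to overfetching, here is a simplified model. It assumes the client tries to keep about interval × 86400 seconds of work per CPU. Under that assumption 0.1 days works out to 8640 seconds. The real calculation likely also weighs resource shares and how much of the time the computer is on. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def work_request_seconds(connect_interval_days, ncpus, queued_seconds=0.0):
    """Seconds of work to ask for so the queue covers one connect interval.

    Simplified assumption: target queue = interval * 86400 s per CPU,
    minus what is already queued, never negative.
    """
    target = connect_interval_days * SECONDS_PER_DAY * ncpus
    return max(0.0, target - queued_seconds)


def fits_before_deadline(queued_seconds, ncpus, days_to_deadline):
    """True if the queued work could finish in time, ignoring other projects."""
    return queued_seconds <= days_to_deadline * SECONDS_PER_DAY * ncpus


# A sane interval on one CPU asks for about 8640 s of work...
print(work_request_seconds(0.1, 1))
# ...while a large interval on a dual-CPU box asks for far more.
print(work_request_seconds(3, 2))
```

A large request that is filled in one go can leave more work queued than fits comfortably before the deadlines. The client then has to rush it, which deepens the debt imbalance.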

If projects run out of work, so be it: EAH will be there to take up the slack. When projects get work again, they will be allowed to catch up. If you subscribe to the philosophy that any project that doesn't use its share should lose that share, then add your name to the wish list, because many people have already made that comment and I'm sure I've seen JM7 say that it's under consideration or perhaps even under development. I don't really know, because I'm happily running a relatively old version of BOINC which works perfectly for me.

In the meantime if you want to manually implement this policy yourself then just use the utility you mention at any appropriate time to zero the LTDs and away you go. I should add the warning that I've never used the utility myself so I have no knowledge of any side effects from doing so. Others may wish to comment on that.

Quote:

This is a bug that some people THINK they know about.

Quote:

A request for 0 seconds of work means one of three things:

The computer has too much work on hand already to meet deadlines reliably.

The project has enough work on hand, and the computer is not desperate for work.

The project has used too much CPU time for its resource share recently and is blocked from requesting more for a while.

In all of these cases, the best thing to do is to leave it alone, and it will fetch more work when it is ready.
"JM7"

I interpret your comment as saying that volunteer developer JM7 is an idiot who THINKS he knows what he is talking about, but in fact he doesn't and you know better. What arrogance!! JM7 stated those three alternatives a long time ago, and he really would know because he wrote the code. I believe the last two sentences of that quote perfectly describe the situation on your MP box. I think an apology might be in order. If I am misinterpreting what you are saying, then I'm sorry, and perhaps you might like to clarify exactly what you mean.

Cheers,
Gary.

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

As of (4.72? 5.1?) a project that is not supplying work will not have its LTDebt increased. However, if the client is not asking for work because of some sort of overload, then the LTDebt will be increased after the current deferral is finished.

Example:
LHC is not supplying work: no LT Debt increase, 24 hour deferral.
Einstein requires extra CPU time and the client enters EDF (earliest deadline first) and NWF (no work fetch) for 48 hours.
After 24 hours, LHC is no longer considered to be refusing to supply work; rather, the client is too overloaded to ask. Therefore, at this point the LT Debt of LHC increases.
After 48 hours of Einstein running solo, CPDN gets its turn and will probably run solo for a couple of days while Einstein is not allowed to download work. LHC is still not supplying work, and therefore LHC is again not getting its LT Debt increased.
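The rule in this example can be sketched as a per-hour check for a project that has no work to give. This is a hypothetical model of the timeline above (24-hour LHC deferral, 48 hours of no work fetch), not the client's actual code, and the function name is made up.

```python
def ltd_increases(now_h, deferral_end_h, client_fetching_work):
    """Does an out-of-work project accrue long-term debt at hour `now_h`?

    Simplified from the description above:
    - while the project's own deferral is running it is "not supplying work",
      so its debt does not grow;
    - once the deferral has expired, if the client is too busy to ask
      (no work fetch), the debt grows;
    - if the client is asking again and the project still has no work,
      a new deferral would start, so the debt does not grow.
    """
    if now_h < deferral_end_h:
        return False
    return not client_fetching_work


LHC_DEFERRAL_END = 24  # LHC reports no work: 24 hour deferral
NWF_END = 48           # client stays in EDF / no work fetch for 48 hours

for hour in (12, 36, 60):
    fetching = hour >= NWF_END
    print(hour, ltd_increases(hour, LHC_DEFERRAL_END, fetching))
```

The three sample hours reproduce the three phases of the example: no increase during LHC's own deferral, an increase while the client is too overloaded to ask, and no increase once the client is asking again and LHC still has nothing.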

JimHollandJr
Joined: 21 Feb 05
Posts: 11
Credit: 230264
RAC: 0

Message 16503 in response to message 16501

Yes, I am a semi-expert; I code for a living, though work is a little slow right now.

There are logic bugs, where you get a syntax error.
Design flaws, where it does not do what you expected.
Then there are the "features" that don't please the user and don't function as they expected.

Yes, I have 4 projects on my dual system and 5 on my laptop. It seems to be doing well since I cleared the LTD, and it now has a few WUs from each project.

My dual AMD machine now has 23 SAH WUs, which will be done in 41.6 hours.

Now, if the LTD is the hidden driving force, then it should be explained to the user in the help or on the BOINC site. The "Requesting 0 seconds" message is no reason to make the user think there is a problem.

Now, the average user should have about 4 things to set up: 1) network connection, 2) which projects to run, 3) network connection time / cache size, 4) fiddling with the percentages to give each project.
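The four settings listed above map naturally onto a small settings record. Here is a minimal sketch with hypothetical field names, not BOINC's actual preference names. It also shows how resource shares become the percentages users see: each project's share divided by the total of all shares.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserSettings:
    # Hypothetical names for the four things a user sets up.
    network_enabled: bool = True                                      # 1) network connection
    resource_shares: Dict[str, float] = field(default_factory=dict)  # 2) projects, 4) shares
    connect_interval_days: float = 0.1                                # 3) interval / cache size

    def share_percentages(self) -> Dict[str, float]:
        """Each project's share as a percentage of the total of all shares."""
        total = sum(self.resource_shares.values())
        if total <= 0:
            return {}
        return {name: 100.0 * share / total
                for name, share in self.resource_shares.items()}


settings = UserSettings(resource_shares={"SETI@home": 100, "Einstein@Home": 100})
print(settings.share_percentages())
```

Because the percentages are relative, attaching a fifth project with an equal share quietly lowers every existing project from 25% to 20%.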

I am sure the average user could not follow the explanation of what is really happening nor why it is not a bug.
