scheduler got bird flu?
I would guess that the user concerned had some sort of software glitch that managed to create about 200 new CPUIDs (8 results per CPUID x 200 CPUIDs = 1600). I have a vague recollection of reading somewhere about a proposal to manage runaway CPUID creation by limiting the number that could be created at one go (maybe per day, I don't remember) to 200. Anyway, it looks suspiciously like that.
I'm guessing again, but the user in question now shows only one CPUID (the one you listed), so it looks like he might be on to it and has already merged all the phantom CPUIDs. I don't know of any other way a batch of phantoms could be reduced to one without user intervention.
If this is the case, the user is probably scratching his head and wondering how to get rid of (i.e. abort) all the excess that won't make it before the deadline. I imagine he might have a potential disk space problem as well: 200 large data files arriving in rapid succession might have caused a bit of strain :).
Cheers,
Gary.
You can't merge computers that have work outstanding, though, so I don't think it can be multiple merged computers.
But if it is, the person would only see the WUs from the computer they were all merged into. If you detach and reattach, you often can't use the WUs you had previously downloaded, because they are assigned to a differently numbered computer.
The WU limit of 8 per day will stop a single runaway CPU in its tracks. I remember seeing a comment from Bruce several months ago about runaway CPUID creation and the potential to "empty the store" if something wasn't done. I'm sure I saw a figure of 200 as a limit on CPUID creation. That would still allow a fairly sizeable farm to be added without restriction :).
OK, you have a good point about merging. However, the user in question has only one CPUID, created very recently, and all 1600+ results are now shown on it, so I can't explain it other than to say that some sort of merging seems to have occurred. As for detaching/reattaching, I have no experience of what happens there; I've never had to do it.
Actually, I've just noticed another problem for the user. His BOINC version is 4.19, which I don't think has a work unit abort. I think he will basically have to ditch the lot. With the 14-day deadline, hopefully he won't do anything rash but will come looking for help. Someone may know a solution to his problem; unfortunately I don't know of one off the top of my head.
Cheers,
Gary.
This is a typical example of a user who has set 'Connect to network about every X days' to a value much larger than 1.0. I guess it is set to the maximum of 10.
Why do I think so? Simple: check out the WUs. You always see a set of WUs downloaded within a short time, then a gap, then the next group.
I had a similar experience when I set the contact interval to 3. When I checked the next day, I already had 20 WUs waiting on my machine. Fortunately I could set the value back, so only one WU on my slowest machine was lost.
This is a very bad bug in the BOINC client.
And I think this has nothing to do with a farm of machines, merging or anything similar. I am sure anyone can reproduce this behavior just by setting the contact interval to any value reasonably larger than 1.0.
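To see why a long contact interval produces bursts of downloads, here is a minimal Python sketch of a deliberately simplified model. It is an assumption, not BOINC's actual scheduler code: the client asks for enough work to keep every CPU busy for the whole connect interval, and the server grants no more than what is left of the host's daily quota. The function name and all the numbers are hypothetical.

```python
import math


def results_to_request(connect_days: float, hours_per_result: float,
                       ncpus: int, quota_left: int) -> int:
    # Hypothetical, simplified model (not BOINC's real work-fetch logic):
    # the client wants enough results to keep every CPU busy for the
    # whole "connect to network about every X days" interval...
    per_cpu = math.ceil(connect_days * 24 / hours_per_result)
    wanted = per_cpu * ncpus
    # ...but the server never grants more than what is left of the
    # host's daily quota.
    return min(wanted, quota_left)


# Hypothetical host: one CPU, 5-hour results, a fresh daily quota of 8.
for days in (1.0, 3.0, 10.0):
    print(days, results_to_request(days, 5.0, 1, 8))
# 1.0 5
# 3.0 8
# 10.0 8
```

Under this model a bigger interval just uses up the day's quota sooner; on its own it does not let a single-CPU host receive more than the quota in one day.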
The person in question has actually been on the boards recently, and I've asked him to join this thread so we can ask him some questions. I only noticed him completely by chance.
I've had another look at his list of results. Notice that the completed result is dated 5 Oct 2005 11:43:47 UTC. To me, that means all the ones dated before it are ghosts and are not actually on his system. I think version 4.19 always does the oldest results first, so he shouldn't have any older than that. So maybe he doesn't have much of a problem after all. Version 4.19 doesn't have the handshaking that would download the ghost WUs, so he can just forget about them and let them expire.
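The oldest-first reasoning can be written down as a small filter. This is a minimal Python sketch, assuming (as suggested for 4.19) that the client crunches strictly in order of age; the completed result's date is the one from his list, while the result names and the other dates are made up for illustration.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def likely_ghosts(results, completed_date):
    # Assumes the client crunches strictly oldest-first: an unreported
    # result dated before the completed one would already have been
    # crunched if it were really on the host, so it is probably a ghost.
    return [name for name, sent, reported in results
            if not reported and sent < completed_date]


# The completed result's date comes from the thread; everything else is
# hypothetical example data.
completed = datetime(2005, 10, 5, 11, 43, 47, tzinfo=UTC)
results = [
    ("hypothetical_wu_a", datetime(2005, 10, 1, 9, 0, tzinfo=UTC), False),
    ("hypothetical_wu_b", datetime(2005, 10, 3, 14, 30, tzinfo=UTC), False),
    ("hypothetical_wu_c", completed, True),
    ("hypothetical_wu_d", datetime(2005, 10, 6, 8, 0, tzinfo=UTC), False),
]
print(likely_ghosts(results, completed))
# ['hypothetical_wu_a', 'hypothetical_wu_b']
```

Anything dated after the completed result (like `hypothetical_wu_d`) may still be genuinely queued on the host, so the filter leaves it alone.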
If you look through a few pages of his results, you will see many days with more than 8, so even if he had set his connect interval to 10, how could he get more than 8 per day? The server simply won't give him more than 8 per day unless it thinks he has multiple CPUs or multiple CPUIDs. For example, look at all the ones dated 01 October. I'd be shocked if the person concerned deliberately set the value as high as 10. I think it's best to ask the user to tell us what happened, and then maybe we can help.
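Checking a results list for this is just counting per day. A minimal Python sketch, assuming a quota of 8 results per CPU per day (the figure quoted in this thread) and using made-up send times; `days_over_quota` is a hypothetical helper, not anything on the project website. For each day over the cap it also reports the minimum number of single-CPU hosts needed to explain the count.

```python
import math
from collections import Counter
from datetime import datetime

DAILY_QUOTA = 8  # results per CPU per day, the figure quoted in this thread


def days_over_quota(sent_times, ncpus=1, quota=DAILY_QUOTA):
    """Map each day whose send count exceeds quota * ncpus to
    (count, minimum number of single-CPU hosts needed to explain it)."""
    per_day = Counter(t.date() for t in sent_times)
    cap = quota * ncpus
    return {day: (n, math.ceil(n / quota))
            for day, n in sorted(per_day.items()) if n > cap}


# Hypothetical send times: 24 results on 1 October, 5 on 2 October.
sent = ([datetime(2005, 10, 1, 0, m) for m in range(24)]
        + [datetime(2005, 10, 2, 0, m) for m in range(5)])
print(days_over_quota(sent))
# {datetime.date(2005, 10, 1): (24, 3)}
```

Run the other way, the same arithmetic gives the 8 x 200 = 1600 guess: 1600 results in one day on single-CPU hosts need at least 200 CPUIDs.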
Cheers,
Gary.
Thanks for raising this. It happened in the past but then appeared to cure itself without any intervention.
I've looked at my settings... I changed 'connect to network' from 3 to 1 day(s).
According to my account, that gives me 2 WUs per day to crunch.
The only problem is that whether I have the days set to 1 or 3, no WU reaches my machine. But as you can see from my record, I should have xx WUs stacked up waiting to be crunched.
Yes, you have an extremely bad problem with ghosts. How many results do you actually have at the moment?
Cheers,
Gary.
Actually, I think your problems may have sorted themselves out. You've just uploaded another successful result and the server has upped your allocation to 4/day. It was 2/day last time I looked.
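If I understand the BOINC server of this era correctly (treat this as an assumption, not something confirmed in this thread), a valid result doubles a host's daily quota up to the project maximum, and an error result knocks it down by one, never below one. A minimal Python sketch of that rule, which would account for the jump from 2/day to 4/day; the function name and the outcome labels are hypothetical.

```python
def update_daily_quota(quota: int, outcome: str, project_max: int = 8) -> int:
    # Assumed rule: a valid result doubles the host's daily quota,
    # capped at the project maximum (8 in this thread)...
    if outcome == "success":
        return min(quota * 2, project_max)
    # ...and an error result lowers it by one, never below one.
    if outcome == "error":
        return max(quota - 1, 1)
    # Anything else leaves the quota unchanged.
    return quota


q = 2  # the allocation seen before the latest upload
for outcome in ("success", "success", "success", "error"):
    q = update_daily_quota(q, outcome)
    print(outcome, q)
# success 4
# success 8
# success 8
# error 7
```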
Cheers,
Gary.
2 actual returned results followed by xxx ghost results
I live in hope Gary!
Ghost results are a bit of a mystery. They come about because the BOINC client asks for work and the server tries to send it, but the client never receives it. I think the client basically keeps asking for work and the server keeps sending more, which in bad cases like yours produces more ghosts, until your daily limit of 8 is reached. The server then tells the client to get lost, because it thinks the client has 8 when, in bad cases, the poor client actually has none. It's hard to know how your list on the server got to 1600, though. I'll ask Bruce to have a look at it, but I imagine his response might be something like "We've addressed these problems in later versions, so please upgrade".
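The ghost mechanism described above can be sketched as a toy simulation. This is a minimal Python model under stated assumptions, not BOINC code: the daily limit is 8, the server records a result as sent the moment it replies, and in the worst case every reply is lost before it reaches the client. `Host` and `request_work` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    quota: int = 8        # daily limit quoted in this thread
    sent_today: int = 0   # what the server has counted against the quota
    server_in_progress: list = field(default_factory=list)  # server's view
    client_has: list = field(default_factory=list)          # client's view


def request_work(host: Host, result_id: int, delivered: bool) -> bool:
    """One scheduler round trip; returns False once the quota is used up."""
    if host.sent_today >= host.quota:
        return False  # server: come back tomorrow
    host.sent_today += 1
    host.server_in_progress.append(result_id)  # server marks it as sent
    if delivered:
        host.client_has.append(result_id)  # only now does the client have it
    return True


host = Host()
rid = 0
while request_work(host, rid, delivered=False):  # worst case: every reply lost
    rid += 1
print(len(host.server_in_progress), len(host.client_has))  # 8 0
```

The server ends the day believing the host holds 8 results while the client holds none, which is exactly the "poor client" situation.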
I think the problem was addressed in 4.45, and certainly by 4.72. By all accounts we are getting near the release of 5.2.x, so support for 4.x will probably drop away fairly quickly. If your machine keeps crunching normally now, you might just wait for the expected new stable client.
Cheers,
Gary.