OK, I've done a reset on the project, but still the same problem.
I'm sorry, but I was just about to post that resetting is likely to be useless.
By chance, I've found one of my machines that has been on NNT (no new tasks) since before the problem started. It's been returning completed work quite happily and running down its cache completely oblivious to the bad sched_reply files and the associated error messages. As an experiment, I removed NNT and allowed it to request work - which of course failed dismally and the machine has now joined the others with this problem.
There is still something to be fixed on the servers, so people should take no action like resetting, which will just trash what remains of their cache without fixing anything.
Cheers,
Gary.
OK, I've now found another machine that last requested work at 00:27 UTC on 21st July, which was before the problem started. It has returned completed tasks but hasn't requested new work, so the messages log contains no errors yet.
I'll keep this one in reserve (NNT is now set) until there's something further to test.
Cheers,
Gary.
Just throwing my voice in, as I am also experiencing the same issue. My BOINC version is 6.10.18, which I have installed on two XP workstations. Off the top of my head I can't remember which version I am running on my Linux laptop at home.
The problem is only occurring on one of my XP workstations. The Linux box and the 2nd XP workstation are working just fine.
Very bizarre.
I too am getting this on one machine. Scheduler requests for other projects on the same machine seem to work fine. BOINC version is 6.10.58 and OS is Win7 x64.
Log snippet
Quote:
22/07/2010 9:46:26 PM Einstein@Home Sending scheduler request: To report completed tasks.
22/07/2010 9:46:26 PM Einstein@Home Reporting 12 completed tasks, not requesting new tasks
22/07/2010 9:46:50 PM [error] Task h1_1087.80_S5R4__701_S5GC1a: bad command line
22/07/2010 9:46:50 PM Einstein@Home [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax
22/07/2010 9:46:50 PM Einstein@Home [error] No close tag in scheduler reply
22/07/2010 9:47:51 PM Einstein@Home Sending scheduler request: To report completed tasks.
22/07/2010 9:47:51 PM Einstein@Home Reporting 12 completed tasks, not requesting new tasks
22/07/2010 9:48:14 PM [error] Task h1_1087.80_S5R4__701_S5GC1a: bad command line
22/07/2010 9:48:14 PM Einstein@Home [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax
22/07/2010 9:48:14 PM Einstein@Home [error] No close tag in scheduler reply
22/07/2010 9:48:25 PM climateprediction.net Sending scheduler request: To send trickle-up message.
22/07/2010 9:48:25 PM climateprediction.net Not reporting or requesting tasks
22/07/2010 9:48:25 PM GPUGRID update requested by user
22/07/2010 9:48:27 PM climateprediction.net Scheduler request completed
22/07/2010 9:48:32 PM GPUGRID Sending scheduler request: Requested by user.
22/07/2010 9:48:32 PM GPUGRID Reporting 2 completed tasks, not requesting new tasks
22/07/2010 9:48:34 PM GPUGRID Scheduler request completed
Sched_Reply snippet
As it shows, the command_line tag seems to have lost its close tag.
BOINC blog
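For anyone who wants to see which tag never gets closed, the client keeps the last scheduler reply on disk. A minimal sketch along those lines - the file name and path here are assumptions based on a default Windows BOINC data directory, so adjust for your own setup - just counts opening and closing tags in that file and reports any mismatch:

import re
from collections import Counter

# Path and file name are assumptions (default BOINC data directory on Windows);
# on Linux the file often lives under /var/lib/boinc-client/.
REPLY = r"C:\ProgramData\BOINC\sched_reply_einstein.phys.uwm.edu.xml"

opens, closes = Counter(), Counter()
with open(REPLY, encoding="utf-8", errors="replace") as f:
    for line in f:
        for slash, name in re.findall(r"<(/?)([A-Za-z_]\w*)[^>]*>", line):
            (closes if slash else opens)[name] += 1

for name in sorted(opens):
    if opens[name] != closes.get(name, 0):
        print(f"unbalanced tag <{name}>: opened {opens[name]}, closed {closes.get(name, 0)}")

On a reply showing this problem, the expectation is that command_line (and whatever encloses it) comes out unbalanced, matching the "No close tag in scheduler reply" message in the log above.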
Not really bizarre, since the problem seems to be related to the size of the whole ... block that the scheduler is attempting to create and insert into the sched_reply response being sent to your client. The number of data files that need to be handled increases with increasing frequency, so I bet that if you look at the actual frequency in the task names being done on different hosts you will see a pattern.
Those that are failing will probably have frequencies in the task names around the 1000.xx to 1200.xx range. All of mine are around 1140.xx. Those hosts that don't seem to be having problems will probably be doing tasks at much lower frequencies - something like 500.xx to 800.xx, for example. These numbers are only guesses at this stage; I don't know the real transition point.
As an example, take a look at MarkJ's log snippet (in the next message to yours) for a host showing the problem. The frequency visible there is high - 1087.80 Hz. Also take a look at the opening post in this thread - also a high frequency, 1105.75.
The really bizarre thing is why this suddenly started about 1.5 days ago when everything was fine with large frequencies before that. My hosts have done thousands of 'high frequency' tasks over the last month or so before this issue suddenly arose.
Cheers,
Gary.
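The frequency Gary refers to is the number embedded in the task name - 1087.80 in h1_1087.80_S5R4__701_S5GC1a, for example. A rough sketch for checking the pattern on one of your own hosts; both the state-file location and the h1_<freq>_ name pattern are assumptions based on the names quoted in this thread and may need adjusting:

import re

# client_state.xml location is an assumption (Windows default shown);
# the task-name pattern is also an assumption taken from names seen in this thread.
STATE = r"C:\ProgramData\BOINC\client_state.xml"

freqs = set()
with open(STATE, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = re.search(r"<name>h1_(\d+\.\d+)_", line)
        if m:
            freqs.add(float(m.group(1)))

if freqs:
    print(f"Einstein@Home task frequencies on this host: {min(freqs):.2f} to {max(freqs):.2f} Hz")
else:
    print("no matching task names found")

If the guess is right, hosts that print values up around 1000-1200 should be the ones hitting the error, while hosts printing lower values keep working.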
One of the few things we changed in the last few days was to add new download mirrors, probably lengthening the server reply quite a bit. Maybe we hit a limit there.
I shortened the list on the server again by removing two mirrors; please have another go at it.
BM
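Bernd's explanation can be eyeballed from the client side too: each extra mirror should appear as an extra <url> entry in the file_info blocks of the saved reply, making the whole reply longer. A rough sketch, again assuming the default reply file name and location used in the earlier example:

import os
import re

# File name and location are assumptions (default BOINC data directory on Windows).
REPLY = r"C:\ProgramData\BOINC\sched_reply_einstein.phys.uwm.edu.xml"

with open(REPLY, encoding="utf-8", errors="replace") as f:
    text = f.read()

files = len(re.findall(r"<file_info>", text))
urls = len(re.findall(r"<url>", text))
print(f"reply size: {os.path.getsize(REPLY)} bytes")
print(f"{files} file_info blocks, {urls} <url> entries")
if files:
    print(f"roughly {urls / files:.1f} download URLs per file")

A reply saved after the mirror list was trimmed should show noticeably fewer <url> entries, and fewer bytes overall, than one saved while the problem was occurring.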
Success! I had 2 hosts with the problem, and both just cleared out their pending result reporting, with the expected message about them having already reported. One that needed work has also just downloaded work.
Thanks for the quick help on this.
That fixed it for me too.
Thank you very much.
Worked for me also. All is well now...Thanks