I notice a task has completed and is stuck at 100% uploading, probably for a couple of days.
The host is https://einsteinathome.org/host/11905468
boinccmd reveals:
name: LATeah0031L_972.0_0_0.0_10386380_0
WU name: LATeah0031L_972.0_0_0.0_10386380
project URL: http://einstein.phys.uwm.edu/
report deadline: Tue Jun 27 17:27:57 2017
ready to report: no
got server ack: no
final CPU time: 82.684000
state: uploading
scheduler state: uninitialized
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: UNINITIALIZED
app version num: 0
checkpoint CPU time: 0.000000
current CPU time: 0.000000
fraction done: 0.000000
swap size: 0 MB
working set size: 0 MB
estimated CPU time remaining: 0.000000
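For anyone who wants to spot this condition without reading the whole listing by eye, here is a minimal Python sketch. It assumes the input is the plain `key: value` listing shown above (the kind printed by `boinccmd --get_tasks`). The function names `parse_tasks` and `stuck_uploads` and the second task in the sample are made up for illustration. The "stuck" test is only a heuristic, since a task that is genuinely in the middle of an upload looks the same.

```python
# Sketch only: parse a boinccmd-style task listing and flag tasks that are
# in the "uploading" state and not yet ready to report.

SAMPLE = """\
1) -----------
   name: LATeah0031L_972.0_0_0.0_10386380_0
   WU name: LATeah0031L_972.0_0_0.0_10386380
   project URL: http://einstein.phys.uwm.edu/
   ready to report: no
   state: uploading
2) -----------
   name: hypothetical_other_task_0
   WU name: hypothetical_other_task
   project URL: http://einstein.phys.uwm.edu/
   ready to report: no
   state: executing
"""


def parse_tasks(text):
    """Split 'key: value' lines into one dict per task."""
    tasks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # separators such as '1) -----------'
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "name":  # a 'name' line starts a new task record
            current = {}
            tasks.append(current)
        if current is not None:
            current[key] = value
    return tasks


def stuck_uploads(tasks):
    """Names of tasks reported as uploading and not ready to report."""
    return [
        t["name"]
        for t in tasks
        if t.get("state") == "uploading" and t.get("ready to report") == "no"
    ]


if __name__ == "__main__":
    print(stuck_uploads(parse_tasks(SAMPLE)))
```

A task that still shows up in this list after several scheduler contacts, with nothing pending under Transfers, is the case discussed in this thread.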
I don't see this task in my task list at https://einsteinathome.org/host/11905468/tasks
I don't see this task completed in the job_log_einstein.phys.uwm.edu.txt log file either. It may have errored, although the time looks OK.
There used to be a method for searching for a specific task by name (to find the WU number) on the old web site, but I can't see whether that exists on the new one. So I can't provide a link to the task.
I have been suspending processing on this host due to temps, but this should not affect tasks on the server side. Perhaps it is there and I can't see it in the portal.
The task template data exist here http://einstein3.aei.uni-hannover.de/EinsteinAtHome/download/32a/templates_LATeah0031L_0972_10386380.dat
It was downloaded Jun 13 17:28
So something is strange about this task. I have not yet done a "reset project", and it's not affecting other tasks.
I might have a backup of the event logs
So I guess the questions are
Why no matching task at E@H?
Is there a way to go from WU Name to WU ID?
and
what should i try next?
Cheers
edit: apologies for the double line spacing - but it doesn't look double spaced when i post, so perhaps i'll test out things at albert and post on this later.
Copyright © 2024 Einstein@Home. All rights reserved.
This is a rare case where the task is stuck in the "uploading" state. My guess is that the client didn't get the ACK from the upload server and for some reason it does not try again. Is it still available under "Transfers" tab in the Manager and can you try to "restart" transfer from there?
If you are more adventurous, you could try to reset the task state in client_state.xml to the last state before uploading, to trigger a new upload. I would need to look up the correct value in this case.
This task's link is: https://einsteinathome.org/task/655689706
Thanks Christian, sadly no it has "left the room"
OK worth a punt!
client_state.xml shows a lot, but i'm guessing it's around here i need to make an edit.
<result>
<name>LATeah0031L_972.0_0_0.0_10386380_0</name>
<final_cpu_time>82.684000</final_cpu_time>
<final_elapsed_time>1440.523876</final_elapsed_time>
<exit_status>0</exit_status> #here
<state>4</state> # or here i guess.
<platform>x86_64-pc-linux-gnu</platform>
<version_num>118</version_num>
<plan_class>FGRPopencl1K-ati</plan_class>
<final_peak_working_set_size>272289792</final_peak_working_set_size>
<final_peak_swap_size>17620443136</final_peak_swap_size>
<final_peak_disk_usage>318133</final_peak_disk_usage>
<stderr_out> ## (long stderr and stdout follow)
OK, strange that I could not see it (a Specsavers moment, I guess), but it definitely is in the list. Now that I'm sorting on WU ID, it was easy to find.
By "stuck at 100% uploading", are you referring to the tasks tab or the transfers tab? If it's the transfers tab, you have to somehow force the transfer to retry and complete (if that is even possible). If it's the tasks tab and the task is 100% completed and its status is 'uploading' (and there is nothing showing on the transfers tab), then it's a different situation and probably much easier to deal with.
I've seen quite a few of this latter case over the years. The task status was 'uploading' but there were no files to upload showing on the transfers tab and no ability to 'report'. After getting sick of seeing them eventually time out and be wasted, I decided to do something about it. Pretty much along the lines Christian mentions, my initial deduction was that the 'payload' had been received by the server which had allowed the transfers to be removed from the transfers tab but the client hadn't updated the state file to reflect that fact.
Not really knowing anything about this, my reasoning told me that if that were the case, all I would need to do would be edit the <result> block in the state file to 'simulate' the change of status from 'uploading' to 'ready to report'. My reasoning also said that this would probably be just by increasing a <status> value for that <result> by one and possibly inserting a <ready_to_report/> entry (or something like that) as well.
What I decided to do was stop BOINC when a further task had completed and uploaded and was 'ready to report'. I could then examine the state file and see exactly what the differences were between the two <results>. I don't remember precisely and although I have done this successfully quite a few times, I still do exactly the same each time I need to do it :-). I'm too lazy to write it down and I don't trust my memory any more :-). What I do remember is that the changes are quite minor and really obvious from the comparison.
If you have a task stuck as 'uploading' on the tasks tab, the above procedure should work.
Cheers,
Gary.
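The comparison Gary describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state file is treated as plain text rather than strict XML, the helper names (`result_block`, `simple_fields`, `diff_results`) are invented here, and the demo data is hypothetical, not copied from a real client_state.xml. It should be run against a copy of the file, with BOINC stopped.

```python
import re

# Hypothetical demo data: one task stuck in 'uploading', one 'ready to report'.
DEMO_STATE = """
<result>
    <name>stuck_task_0</name>
    <exit_status>0</exit_status>
    <state>4</state>
    <stderr_out>
<core_client_version>0.0.0</core_client_version>
</stderr_out>
</result>
<result>
    <name>reference_task_0</name>
    <exit_status>0</exit_status>
    <state>5</state>
    <stderr_out>
<core_client_version>0.0.0</core_client_version>
</stderr_out>
    <ready_to_report/>
</result>
"""


def result_block(state_text, task_name):
    """Return the text of the <result> block with the given <name>."""
    for match in re.finditer(r"<result>.*?</result>", state_text, re.DOTALL):
        if f"<name>{task_name}</name>" in match.group(0):
            return match.group(0)
    raise KeyError(f"no <result> block named {task_name!r}")


def simple_fields(block):
    """Collect <tag>value</tag> pairs and <flag/> entries, skipping stderr_out."""
    body = re.sub(r"<stderr_out>.*?</stderr_out>", "", block, flags=re.DOTALL)
    fields = {
        tag: value.strip()
        for tag, value in re.findall(r"<(\w+)>([^<\n]*)</\1>", body)
    }
    fields.update({tag: True for tag in re.findall(r"<(\w+)/>", body)})
    return fields


def diff_results(state_text, stuck_name, reference_name, ignore=("name", "wu_name")):
    """Map each differing tag to (value in stuck task, value in reference task)."""
    stuck = simple_fields(result_block(state_text, stuck_name))
    ref = simple_fields(result_block(state_text, reference_name))
    return {
        tag: (stuck.get(tag), ref.get(tag))
        for tag in sorted(set(stuck) | set(ref))
        if tag not in ignore and stuck.get(tag) != ref.get(tag)
    }


if __name__ == "__main__":
    for tag, (stuck_value, ref_value) in diff_results(
        DEMO_STATE, "stuck_task_0", "reference_task_0"
    ).items():
        print(tag, stuck_value, ref_value)
```

On a real file, per-task values such as CPU times will also show up as differences. The entries of interest are the ones that describe status rather than measurements.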
AgentB wrote: <exit_status>0</exit_status> #here
No, don't change this! That's the exit status of the task, and 0 = clean exit.
AgentB wrote: <state>4</state> # or here i guess.
Yes, exactly. This needs to be 5, which (I presume) means the upload was successful.
I'm pretty sure you will also need a <ready_to_report/> flag as well so you should check another 'ready to report' task to see exactly what to put and where. If you need me to, I can easily check one of mine. I've got plenty to choose from :-).
Cheers,
Gary.
Thanks Gary - it seems that needed to be added into the stderr section... the task is showing as in progress on the link Christian supplied, so it looks like I'll need to resend.
I 'll give that a try tomorrow.
AgentB wrote: ... needed to be added into the stderr section ...
No, you insert <ready_to_report/> immediately AFTER the closing tag </stderr_out> and immediately BEFORE the <completed_time>....</completed_time> entry.
I've just checked one of mine and that's what I see.
AgentB wrote: ... need to resend.
No you won't ... in fact that's the problem - there's nothing to resend. If there were, it would still be showing on the transfers tab. The green 'in progress' is only going to change when the upload server has both the upload (which it already has) and the 'report'. It's waiting for a report - that's the hold-up.
You just need to convince your client to do the report. If you stop BOINC and do the two edits, and then relaunch BOINC, the task can be reported because your client will see the <ready_to_report/> tag and do it pretty much immediately. Make sure you don't overlook the trailing slash :-).
Be careful and it should be fine :-). It's worked for me whenever I've seen this problem and done the edits.
Cheers,
Gary.
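For the record, the two edits described above could be scripted roughly as follows. This is a hedged sketch, not an official BOINC procedure. The state values 4 and 5 are taken from this thread and should be confirmed against a task that is already 'ready to report'. The function name `mark_ready_to_report` and the demo data are hypothetical. It works on text that has already been read from a copy of client_state.xml, with BOINC stopped, and it refuses to change anything if the block does not look as expected. Matching on the exact task name matters, because editing the wrong <result> block would report a task that has no uploaded files.

```python
import re

# Hypothetical demo data, not copied from a real client_state.xml.
DEMO_STATE = """
<result>
    <name>stuck_task_0</name>
    <exit_status>0</exit_status>
    <state>4</state>
    <stderr_out>
(stderr text)
</stderr_out>
    <completed_time>1.000000</completed_time>
</result>
<result>
    <name>other_task_0</name>
    <state>2</state>
</result>
"""


def mark_ready_to_report(state_text, task_name):
    """Return a copy of state_text in which the named <result> has
    <state>4</state> changed to <state>5</state> and <ready_to_report/>
    inserted immediately after </stderr_out>. Raises ValueError instead of
    guessing when the block is missing or looks unexpected."""
    for match in re.finditer(r"<result>.*?</result>", state_text, re.DOTALL):
        block = match.group(0)
        if f"<name>{task_name}</name>" not in block:
            continue
        if "<ready_to_report/>" in block:
            raise ValueError("task is already flagged ready to report")
        if block.count("<state>4</state>") != 1:
            raise ValueError("task is not in state 4; leaving it alone")
        if block.count("</stderr_out>") != 1:
            raise ValueError("expected exactly one </stderr_out>")
        block = block.replace("<state>4</state>", "<state>5</state>")
        block = block.replace("</stderr_out>", "</stderr_out>\n    <ready_to_report/>")
        # Splice the edited block back in; everything else is untouched.
        return state_text[: match.start()] + block + state_text[match.end():]
    raise ValueError(f"no <result> block named {task_name!r}")


if __name__ == "__main__":
    print(mark_ready_to_report(DEMO_STATE, "stuck_task_0"))
```

Writing the result to a new file and comparing it with the original before swapping it in keeps the change easy to undo.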
Thanks Gary for stepping in. This seems very reasonable. The server state is always "in progress" until the task is reported. The upload does not trigger a state change on the server (by design). So you will see whether it worked when the task vanishes from the Manager and the state on the server changes to done.
OK thanks both, so for the record in case it comes up again, i was missing two lines.
Check tasks "Ready to Report"
Update project. The task does not validate, but I expected that; it has now cleared.
The validator marked this as an error because it couldn't find the result files. We should have gone one step back instead of one step further, as it seems the result files had not been uploaded yet.
Also, in your latest post where you add two lines, you seem to add them to an O1Spot1 task and not the stuck FGRPB1G task. Was this just a copy and paste error?
I saw that too and came to the same conclusion at the time - he's probably just copied and pasted the excerpt from the wrong <result> block - perhaps the one immediately before or after the correct one.
Now that you mention that the uploaded files couldn't be found, perhaps he actually edited the wrong <result> block so that some other random task that hadn't even been crunched was 'reported' - hence the missing upload files. The 'wrong <result> block' theory is supported by the fact that a <completed_time> needed to be added. A task that really had been completed and was stuck in 'uploading' would have had that entry there already. However, if it were just a case of the wrong <result> block, the FGRPB1G task should still be showing in BOINC Manager with a status of 'uploading' so it's a bit of a mystery.
Do you have any way of easily checking if the upload files for the task he originally mentioned are still sitting on the upload server?
Cheers,
Gary.