Mystery task - stuck at 100% uploading

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0
Topic 208354

I notice a task has completed and is stuck at 100% uploading. It has probably been like that for a couple of days.

The host is https://einsteinathome.org/host/11905468

boinccmd reveals:

 

   name: LATeah0031L_972.0_0_0.0_10386380_0
   WU name: LATeah0031L_972.0_0_0.0_10386380
   project URL: http://einstein.phys.uwm.edu/
   report deadline: Tue Jun 27 17:27:57 2017
   ready to report: no
   got server ack: no
   final CPU time: 82.684000
   state: uploading
   scheduler state: uninitialized
   exit_status: 0
   signal: 0
   suspended via GUI: no
   active_task_state: UNINITIALIZED
   app version num: 0
   checkpoint CPU time: 0.000000
   current CPU time: 0.000000
   fraction done: 0.000000
   swap size: 0 MB
   working set size: 0 MB
   estimated CPU time remaining: 0.000000
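
A quick way to spot tasks in this state from the command line is to filter the boinccmd output. This is only a sketch: `boinccmd --get_tasks` is the standard client command, but the helper name `uploading_tasks` is made up for illustration, and it assumes the output keeps the `name:` / `state:` layout shown above.

```shell
# Print the names of tasks whose state is "uploading".
# Reads `boinccmd --get_tasks` style output on stdin.
# Lines such as "WU name:" or "scheduler state:" are ignored because
# the patterns only match when "name:" / "state:" start the line.
uploading_tasks() {
    awk '
        /^[[:space:]]*name:/  { name = $2 }
        /^[[:space:]]*state:/ { if ($2 == "uploading") print name }
    '
}

# Usage on a live host (needs a running client):
#   boinccmd --get_tasks | uploading_tasks
```

A task that stays in this list for days while nothing is shown under Transfers is the situation described in this thread.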

 

I don't see this task in my task list at https://einsteinathome.org/host/11905468/tasks

I don't see this task completed in the job_log_einstein.phys.uwm.edu.txt log file either. It may have errored, although the time looks OK.

There used to be a way to search for a specific task by name (to find the WU number) on the old web site, but I can't see whether that exists on the new one. So I can't provide a link to the task.

I have been suspending processing on this host due to temps, but this should not affect tasks on the server side. Perhaps it is there and I can't see it in the portal.

 

The task template data exist here: http://einstein3.aei.uni-hannover.de/EinsteinAtHome/download/32a/templates_LATeah0031L_0972_10386380.dat

It was downloaded Jun 13 17:28.

So something is strange about this task. I have not yet tried "reset project", and it's not affecting other tasks.

I might have a backup of the event logs.

So I guess the questions are:

Why is there no matching task at E@H?

Is there a way to go from WU name to WU ID?

What should I try next?

 

Cheers

edit: apologies for the double line spacing - but it doesn't look double spaced when i post, so perhaps i'll test out things at albert and post on this later.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 188504810
RAC: 213100

This is a rare case where the task is stuck in the "uploading" state. My guess is that the client didn't get the ACK from the upload server and for some reason it does not try again. Is it still available under the "Transfers" tab in the Manager, and can you try to "restart" the transfer from there?

If you are more adventurous you could try to reset the task state in client_state.xml to the last state before uploading to trigger a new upload. I would need to look for the correct value in this case.

The link to this task is: https://einsteinathome.org/task/655689706

AgentB

Christian Beer wrote:
Is it still available under the "Transfers" tab in the Manager, and can you try to "restart" the transfer from there?

Thanks Christian. Sadly no, it has "left the room".

Quote:
If you are more adventurous you could try to reset the task state in client_state.xml to the last state before uploading to trigger a new upload. I would need to look for the correct value in this case.

OK, worth a punt!

client_state.xml shows a lot, but I'm guessing it's around here that I need to make an edit.

<result>
    <name>LATeah0031L_972.0_0_0.0_10386380_0</name>
    <final_cpu_time>82.684000</final_cpu_time>
    <final_elapsed_time>1440.523876</final_elapsed_time>
    <exit_status>0</exit_status>  #here
    <state>4</state>      # or here i guess.
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>118</version_num>
    <plan_class>FGRPopencl1K-ati</plan_class>
    <final_peak_working_set_size>272289792</final_peak_working_set_size>
    <final_peak_swap_size>17620443136</final_peak_swap_size>
    <final_peak_disk_usage>318133</final_peak_disk_usage>
<stderr_out> ## (long stderr and stdout follow)

Quote:
The link to this task is: https://einsteinathome.org/task/655689706

OK, strange that I could not see it (a Specsavers moment, I guess), but it definitely is in the list. Now that I'm sorting on WU ID it was easy to find.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117691095854
RAC: 35079695

AgentB wrote:
I notice a task has completed and is stuck at 100% uploading. It has probably been like that for a couple of days.

By that, are you referring to the tasks tab or the transfers tab? If it's the transfers tab, you have to somehow force the transfer to retry and complete (if that is even possible). If it's the tasks tab and the task is 100% completed and its status is 'uploading' (and there is nothing showing on the transfers tab), then it's a different situation and probably much easier to deal with.

I've seen quite a few of this latter case over the years.  The task status was 'uploading' but there were no files to upload showing on the transfers tab and no ability to 'report'.  After getting sick of seeing them eventually time out and be wasted, I decided to do something about it.  Pretty much along the lines Christian mentions, my initial deduction was that the 'payload' had been received by the server which had allowed the transfers to be removed from the transfers tab but the client hadn't updated the state file to reflect that fact.

Not really knowing anything about this, my reasoning told me that if that were the case, all I would need to do would be edit the <result> block in the state file to 'simulate' the change of status from 'uploading' to 'ready to report'.  My reasoning also said that this would probably be just by increasing a <status> value for that <result> by one and possibly inserting a <ready_to_report/> entry (or something like that) as well.

What I decided to do was stop BOINC when a further task had completed and uploaded and was 'ready to report'.  I could then examine the state file and see exactly what the differences were between the two <result> blocks.  I don't remember precisely and, although I have done this successfully quite a few times, I still do exactly the same each time I need to do it :-).  I'm too lazy to write it down and I don't trust my memory any more :-).  What I do remember is that the changes are quite minor and really obvious from the comparison.
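
The comparison described above can be sketched in shell. This is an illustration under assumptions, not a documented BOINC tool: the helper name `result_block` is hypothetical, and it assumes each task sits in its own `<result>` ... `</result>` block with a `<name>` line, as in the excerpt posted earlier in this thread.

```shell
# Print the <result> block whose <name> equals $2 from the state file $1.
# Each block is buffered and printed only if it contains the wanted name.
result_block() {
    awk -v want="<name>$2</name>" '
        /<result>/   { buf = ""; inblock = 1; hit = 0 }
        inblock      { buf = buf $0 "\n"; if (index($0, want)) hit = 1 }
        /<\/result>/ { if (inblock && hit) printf "%s", buf; inblock = 0 }
    ' "$1"
}

# Usage, with BOINC stopped (STUCK_NAME and GOOD_NAME are placeholders
# for the stuck task and a task that is already 'ready to report'):
#   result_block client_state.xml STUCK_NAME > /tmp/stuck.txt
#   result_block client_state.xml GOOD_NAME  > /tmp/good.txt
#   diff /tmp/stuck.txt /tmp/good.txt
```

Most of the diff will be the stderr text and timings; the lines that matter are the few status-related ones.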

If you have a task stuck as 'uploading' on the tasks tab, the above procedure should work.

 

Cheers,
Gary.

Gary Roberts

AgentB wrote:

    <exit_status>0</exit_status>  #here

No, don't change this!  That's the exit status of the task, and 0 means a clean exit.
   

Quote:
<state>4</state>      # or here i guess.

Yes, exactly.  This needs to be 5 which (I presume) means the upload was successful.

I'm pretty sure you will also need a <ready_to_report/> flag as well so you should check another 'ready to report' task to see exactly what to put and where.  If you need me to, I can easily check one of mine.  I've got plenty to choose from :-).

 

Cheers,
Gary.

AgentB

Gary Roberts wrote:
AgentB wrote:
Quote:
<state>4</state>      # or here i guess.
 

Yes, exactly.  This needs to be 5 which (I presume) means the upload was successful.

I'm pretty sure you will also need a <ready_to_report/> flag as well so you should check another 'ready to report' task to see exactly what to put and where.  If you need me to, I can easily check one of mine.  I've got plenty to choose from :-).

Thanks Gary - that seems to need adding into the stderr section...  the task is showing as in progress on the link Christian supplied, so it looks like I'll need to resend.

I'll give that a try tomorrow.

Gary Roberts

AgentB wrote:
Thanks Gary - that seems to need adding into the stderr section...

No, you insert <ready_to_report/> immediately AFTER the closing tag </stderr_out>
and immediately BEFORE the <completed_time>....</completed_time> entry.

I've just checked one of mine and that's what I see.

Quote:
the task is showing in progress on the link Christian supplied so it looks like i'll need to resend.

No you won't ... in fact that's the problem - there's nothing to resend.  If there were, it would still be showing on the transfers tab.  The green 'in progress' is only going to change when the upload server has both the upload (which it already has) and the 'report'.  It's waiting for a report - that's the hold-up.

You just need to convince your client to do the report.  If you stop BOINC and do the two edits, and then relaunch BOINC,  the task can be reported because your client will see the <ready_to_report/> tag and do it pretty much immediately.  Make sure you don't overlook the trailing slash :-).
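
For the record, the two edits can also be scripted so that only the intended `<result>` block is touched. This is a sketch under assumptions, not an official procedure: the function name `mark_ready_to_report` is made up, it relies on the block layout shown in the excerpts in this thread, and it writes the edited copy to stdout rather than changing the state file. It only makes sense when the result files really have reached the upload server.

```shell
# Write an edited copy of state file $1 to stdout. Only the <result>
# block named $2 is changed: <state>4</state> becomes <state>5</state>,
# and <ready_to_report/> is added right after </stderr_out> unless the
# block already has one. Everything else is passed through untouched.
mark_ready_to_report() {
    awk -v want="<name>$2</name>" '
        /<result>/ { inblock = 1; n = 0; hit = 0; ready = 0 }
        inblock {
            lines[++n] = $0
            if (index($0, want)) hit = 1
            if (index($0, "<ready_to_report/>")) ready = 1
            if (index($0, "</result>")) {
                for (i = 1; i <= n; i++) {
                    line = lines[i]
                    if (hit) sub(/<state>4<\/state>/, "<state>5</state>", line)
                    print line
                    if (hit && !ready && index(line, "</stderr_out>"))
                        print "    <ready_to_report/>"
                }
                inblock = 0
            }
            next
        }
        { print }
        END { if (inblock) for (i = 1; i <= n; i++) print lines[i] }
    ' "$1"
}

# Usage, with BOINC stopped and a backup already taken (TASK_NAME is a
# placeholder for the stuck task):
#   mark_ready_to_report client_state.xml TASK_NAME > /tmp/client_state.new
#   diff client_state.xml /tmp/client_state.new   # review before copying back
```

Matching on the task name first is the point of the exercise: it avoids flagging a neighbouring block by accident.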

Quote:
I'll give that a try tomorrow.

Be careful and it should be fine :-).  It's worked for me whenever I've seen this problem and done the edits.

 

Cheers,
Gary.

Christian Beer

Thanks Gary for stepping in. This seems very reasonable. The server state is always "in progress" until the task is reported. The upload does not trigger a state change on the server (by design). So you will know it worked when the task vanishes from the Manager and the state on the server changes to done.

AgentB

OK thanks both, so for the record in case it comes up again, i was missing two lines.

cd /var/lib/boinc-client                    # BOINC data directory
sudo systemctl stop boinc-client.service    # stop BOINC via systemd
ps -u boinc                                 # check that all BOINC processes have stopped
sudo cp client_state.xml backup.client_state.xml
sudo gedit client_state.xml
    # changed <state> from 4 to 5
    # added the 2 missing lines after the </stderr_out> closing tag

</stderr_out>
    <ready_to_report/>                                      #line 1
    <completed_time>1497988428.890994</completed_time>      #line 2
 <wu_name>h1_1409.40_O1C02Cl1In0C__O1Spot1Hi_GalCent_1409.90Hz_114</wu_name>

# saved the file and started BOINC
sudo systemctl start boinc-client.service   # start BOINC via systemd

Check that the task shows "Ready to Report".

Update the project. The task does not validate, but I expected that; it has now cleared.

Server state: Over
Outcome: Validate error
 
Christian Beer

The validator marked this as an error because it couldn't find the result files. We should have gone one step back instead of one further, as it seems that the result files were not uploaded yet.

Also, in your latest post where you add two lines, you seem to add them to an O1Spot1 task and not the stuck FGRPB1G task. Was this just a copy and paste error?

Gary Roberts

Christian Beer wrote:
Also, in your latest post where you add two lines, you seem to add them to an O1Spot1 task and not the stuck FGRPB1G task. Was this just a copy and paste error?

I saw that too and came to the same conclusion at the time - he's probably just copied and pasted the excerpt from the wrong <result> block - perhaps the one immediately before or after the correct one.

Now that you mention that the uploaded files couldn't be found, perhaps he actually edited the wrong <result> block so that some other random task that hadn't even been crunched was 'reported' - hence the missing upload files.   The 'wrong <result> block' theory is supported by the fact that a <completed_time> needed to be added.  A task that really had been completed and was stuck in 'uploading' would have had that entry there already.  However, if it were just a case of the wrong <result> block, the FGRPB1G task should still be showing in BOINC Manager with a status of 'uploading' so it's a bit of a mystery.
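
One way to test the 'wrong <result> block' theory is to list which blocks in the state file (or its backup) actually carry the flag. This is a rough sketch: the helper name `ready_results` is hypothetical, and it assumes `<name>`, `<wu_name>` and `<ready_to_report/>` each sit on their own line inside the block, as in the excerpts posted above.

```shell
# For every <result> block in state file $1 that contains
# <ready_to_report/>, print the task name and WU name, tab-separated.
ready_results() {
    awk '
        /<result>/ { inblock = 1; name = ""; wu = ""; ready = 0 }
        inblock && /<name>/    { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
        inblock && /<wu_name>/ { wu = $0; gsub(/.*<wu_name>|<\/wu_name>.*/, "", wu) }
        inblock && /<ready_to_report\/>/ { ready = 1 }
        /<\/result>/ { if (inblock && ready) print name "\t" wu; inblock = 0 }
    ' "$1"
}

# Usage:
#   ready_results client_state.xml
```

If the FGRPB1G task name is not in the output but an O1Spot1 one is, the flag went into the wrong block.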

Do you have any way of easily checking if the upload files for the task he originally mentioned are still sitting on the upload server?

Cheers,
Gary.
