25 + hours on a Perseus wu

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138752
RAC: 558524

Excellent, this helped a lot!!

The problem is that the init_data.xml file contains an opening tag


that is closed nowhere in the file, so the XML is not well-formed and the parser is correct in rejecting this file.
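
For anyone who wants to check a file for this kind of problem themselves, here is a minimal sketch of a well-formedness test in Python, using only the standard library. The element names in the two sample strings are made up for illustration and are not the actual contents of init_data.xml; the helper name `is_well_formed` is hypothetical as well.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for unclosed or mismatched tags, among other syntax errors.
        return False


# Hypothetical sample documents (NOT real init_data.xml content).
good = "<outer><inner><value>4</value></inner></outer>"
bad = "<outer><inner><value>4</value></outer>"  # <inner> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

To test a real file, read its contents and pass the text to the function. A command-line tool such as xmllint reports the same class of error.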

Unfortunately, this kind of problem is not easily handled from the science app's perspective; it requires a fix in the BOINC client.

EDIT:
The bug was already fixed last year. From the checkin_notes_2012 document in the source code:

David  29 June 2012
    - client: add missing end tag for .  Doh!

/EDIT

It is exactly the same problem (occurring in a different context) reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=924441

Cheers
HB

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

OK, so upgrading to a newer BOINC version should most likely take care of the problem with running in standalone mode.

So just to close out the long run time issue that started all this off:

The debate about just how fast a 7970 can blast through a Perseus got me looking a lot closer at my pair.

One thing which became apparent in a hurry is that how fast the app will run is highly dependent on what other load is on the machine at the time.

So in a nutshell, if you are running a lot of other projects with BOINC in a more or less default configuration, there are times when Perseus can get slowed down almost to a crawl, most likely because it is waiting for the CPU to get around to shuffling data in and out of the graphics card.

In my case, when the machine starts running one of MilkyWay's multi-threaded CPU nBody tasks, it can bring Perseus to a virtual standstill.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454483971
RAC: 8487

Current status with respect to E&H: back in the game (at least I hope so)

What I was forced to do:

dump FC17 (no way around the BOINC problem with their version 7.0.29)
rebuild the node with ubuntu 12.04 LTS
reinstall NVIDIA drivers - went as advertised ~5 steps, reboot, recognized, done
looked at Ubuntu's BOINC - version 7.0.27 (older than Fedora's)
downloaded the new 7.0.65 from the BOINC website (tested on Ubuntu) - IT FAILED to run because of library issues, like it did on Fedora
installed the BOINC client/manager from the Ubuntu distro 7.0.27
rejoined E&H
work came in and is running.
ran xmllint against a GPU job's init_data.xml file - CLEAN

So I meekly state that all is well. I will start looking at the tasks, etc. after they complete to ensure all is clean.

Will pursue SETI once I feel that all is well.

"ageless", I believe, had suggested a move to FC 18/19 but could not guarantee a clean install. The issue seems to have been fixed in FC 18, but FC is just too dynamic: bugs don't get fixed in the current release, only in the next one. This pretty much implies you are toast unless you are willing to download the source and compile. Therefore I feel that Fedora is a bad choice for "number crunching". I have been running Ubuntu on a laptop for other work and have been quite satisfied with its performance, stability, etc., so I decided to install it on a standalone box for crunching.

Total migration time ~2.5 hours. Not bad since all worked and went as advertised. I am still in shock over the ease of installing the NVIDIA drivers.

Thanks for everyone's help and suggestions.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Good job. Glad to hear your host is back in the game. :-)

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454483971
RAC: 8487

Quote:
Good job. Glad to hear your host is back in the game. :-)

You can't imagine how happy I am. :>)

Had a scare a bit ago, though. I was forced to restart the Ubuntu node. When I did, BOINC reported no usable GPUs and would not download any GPU work. This was corrected by bouncing the client with "service boinc-client restart": GPU work was downloaded immediately, and within a short time a GPU WU started processing. I thought I was going to throw myself under the bus.

I don't mean to drag this thread out, but if it benefits someone else, that would be a good thing.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138752
RAC: 558524

Thanks for the feedback; it is always good to have confirmation of a suggested solution in a thread about a given problem, so that others can look it up later.

Cheers
HB
