Concept: Large WU branch of E@H?

peanut
peanut
Joined: 4 May 07
Posts: 162
Credit: 9,644,812
RAC: 0
Topic 193435

I might be an oddball, but I actually liked the days when the tasks at E@H were monsters. It made keeping track of them easier since there were fewer things to track. I usually just leave my computers on 24/7.

I was wondering if E@H had ever thought of having a separate or sub-project that 24/7 crunchers could sign up for. That project would have monster WUs that could feed the cruncher's hunger for data without piling up monstrous numbers of WUs to track.

Just an idea I had while sipping the morning coffee.

Elphidieus
Elphidieus
Joined: 20 Feb 05
Posts: 240
Credit: 18,085,340
RAC: 9,198

Concept: Large WU branch of E@H?

Having a hard time with your V8.....?

I thought I would be the oddball who would prefer those countless WUs as short as S@H.

peanut
peanut
Joined: 4 May 07
Posts: 162
Credit: 9,644,812
RAC: 0

I see some V8s at SETI and

I see some V8s at SETI and they have thousands of tasks in progress and completed. That's why I put my V8 here; I thought I would be able to keep track of big WUs easier. If I put the V8 on SETI, I would be completely barraged with WUs and there would be no way I could monitor them very well.

As a side question: Anyone know of a project that has monster WUs? I'd like something that takes about a day to complete on the newer Intel Core 2 quad or duos.

xxxxx
xxxxx
Joined: 10 Jan 08
Posts: 1
Credit: 2,852
RAC: 0

Have you tried 'climate

Have you tried 'climate prediction'?

Erik
Erik
Joined: 14 Feb 06
Posts: 2,815
Credit: 2,645,600
RAC: 0

Or Rosetta. You can set the

Or Rosetta. You can set the length of CPU time per WU. If I recall correctly, up to 24 hrs.

-edit- in 2 hour increments, from 2 to 24 hours

Screenshot

Brian Silvers
Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

RE: I thought I would be

Message 76822 in response to message 76819

Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

peanut
peanut
Joined: 4 May 07
Posts: 162
Credit: 9,644,812
RAC: 0

RE: RE: I thought I would

Message 76823 in response to message 76822

Quote:
Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

It is not a real "need" but rather a result of something I started when I did not have the V8 Mac Pro. I wrote some AppleScripts and little Java programs to keep a spreadsheet of nearly all the tasks I finish on Einstein or SETI. It basically automates what a person would do by reading the normal status web pages provided by the projects. It ran quickly before my V8. Now my automated reader takes 2 hours to get through all the results from my 2 Mac minis and the V8 Pro. There is probably a better way to do what I do, but I don't know how yet. If you look at my website under my user page you can get an idea of what I track (click on peanut). The site is on my home PC and my IP changes randomly, so it is kind of hit or miss as far as being available. I have a script that updates my IP, but it runs only once per hour. And then the update has to filter through the DNS system, so my site is not always accessible.

Thanks to the others who suggested projects with large WUs. You saved me time in digging for other projects to consider.

Brian Silvers
Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

RE: RE: RE: I thought I

Message 76824 in response to message 76823

Quote:
Quote:
Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

It is not a real "need" but rather a result of something I started when I did not have the V8 Mac Pro. I wrote some AppleScripts and little Java programs to keep a spreadsheet of nearly all the tasks I finish on Einstein or SETI. It basically automates what a person would do by reading the normal status web pages provided by the projects. It ran quickly before my V8. Now my automated reader takes 2 hours to get through all the results from my 2 Mac minis and the V8 Pro. There is probably a better way to do what I do, but I don't know how yet.

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you issue a start(), as calling run() yourself won't spawn a new thread; it always has to be via start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent on writing to disk (do more in memory), etc...
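To make the Thread-vs-Runnable distinction above concrete, here is a minimal sketch. The PageFetcher class and the URL are illustrative only, not taken from peanut's actual scripts:

```java
// Minimal sketch of the two ways to get code onto a thread in Java.
// PageFetcher is a hypothetical stand-in for one page-scraping task.
public class PageFetcher implements Runnable {
    private final String url;

    public PageFetcher(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        // In a real scraper this would download and parse one results page.
        System.out.println("fetching " + url);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new PageFetcher("http://example.invalid/results"));
        t.start();   // start() spawns a new thread, which then calls run()
        // t.run();  // calling run() directly would just execute on the current thread
        t.join();
    }
}
```

The alternative is to extend Thread directly and override run(), but implementing Runnable is usually preferred since it leaves the class free to extend something else.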

DanNeely
DanNeely
Joined: 4 Sep 05
Posts: 1,360
Credit: 3,068,025,977
RAC: 3,040,844

RE: You could make it

Message 76825 in response to message 76824

Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you issue a start(), as calling run() yourself won't spawn a new thread; it always has to be via start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent on writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely, it'd be like putting a bandaid on a skull fracture. Creating a spreadsheet with a few thousand rows from scratch shouldn't take more than a few minutes. Even then, it shouldn't need to be recreated at all. The existing output from the last run should be saved and the most recent updates appended to the end. Given that it suddenly went from quick to taking forever, my suspicion is that there's something going on that is either factorial or exponential in order.
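The save-and-append approach suggested above could look something like this sketch; the class and method names are hypothetical, not from peanut's scripts:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: keep the rows collected on previous runs and
// append only tasks not seen before, instead of rebuilding everything.
public class ResultLog {
    private final Set<String> seen = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    // Returns true if the task was new and its row was appended.
    public boolean append(String taskId, String row) {
        if (!seen.add(taskId)) {
            return false; // already recorded on an earlier run, skip it
        }
        rows.add(row);
        return true;
    }

    public int size() {
        return rows.size();
    }
}
```

With this shape, each run only touches the handful of new results rather than the whole history.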

Peanut: roughly how long (in lines of code) is your collection of scripts? If it's fairly short, and AppleScript is easy enough to read (I've never seen it before), I might be able to give an eyeball sanity check for anything that would clobber performance.

Brian Silvers
Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

RE: RE: You could make

Message 76826 in response to message 76825

Quote:
Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you issue a start(), as calling run() yourself won't spawn a new thread; it always has to be via start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent on writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely it'd be like putting a bandaid on a skull fracture.

Would you care to elaborate on that instead of, what feels like to me, beating me up over it? IOW, your dismissal of that as an idea seems quite RUDE!

Looking at his web site, it seems like it is basically a crawl of all of the tasks, sorted by descending WUID, then going into each task to obtain which computer(s) are associated with it. Another way it *COULD* be done is to break out the WUs by each host, enabling threads. Data merging could happen on the back side of all of it, once all data has been collected from the web pages.

IOW, threads would SPEED the data retrieval process from the web pages, as he could have the MacPro spinning out there on its own while still allowing other threads to be running. I would think it would speed up the data retrieval, which absolutely has to be the slowest aspect of the process if you are plowing through it sequentially, as you are if you just click on "My Results". The more threads you have running, up to a certain point, the quicker the critical path can complete. Someone I worked with at a prior job noted the same thing: diminishing returns above a certain number of threads, but definitely an improvement with threads. That specific task was sweeping 5000+ systems to find the database revision on each box as well as other metrics, such as disk space.
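The per-host split with a merge at the end could be sketched with an ExecutorService along these lines; fetchTasksForHost is a hypothetical placeholder for the actual page-scraping code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the one-thread-per-host idea: fetch each host's task list
// concurrently, then merge the results on the back side.
public class PerHostCrawler {

    // Placeholder: the real version would walk that host's task pages.
    static List<String> fetchTasksForHost(String hostId) {
        return List.of(hostId + "-task1", hostId + "-task2");
    }

    public static List<String> crawl(List<String> hostIds) {
        ExecutorService pool = Executors.newFixedThreadPool(hostIds.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String host : hostIds) {
                futures.add(pool.submit(() -> fetchTasksForHost(host)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get()); // block until that host's crawl finishes
            }
            return merged;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A fixed-size pool also gives a natural knob for the diminishing-returns point mentioned above: cap the pool rather than spawning one thread per host once the host count grows.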

As for "previous results" being stored, that's going to assume that there's a database of some kind behind the scenes, but honestly, since he has pending status and statistical info for the other host, even if the task switched from PENDING to credit granted, the way he's doing it still would require going into each task. As such, overhead to compare whether or not a value had changed would cause it to be slower than just flat out bulk importing the data fresh every time it runs.

So, unless you would like to elaborate on how specifically that thought process of doing threads based on task lists of each hostID (perhaps 2 for the MacPro itself, along the lines of top-half/bottom-half) is incorrect, I wouldn't mind at least a "gee, I hadn't thought about that", if not a full apology.

Respectfully,

Brian

DanNeely
DanNeely
Joined: 4 Sep 05
Posts: 1,360
Credit: 3,068,025,977
RAC: 3,040,844

RE: RE: RE: You could

Message 76827 in response to message 76826

Quote:
Quote:
Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you issue a start(), as calling run() yourself won't spawn a new thread; it always has to be via start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent on writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely it'd be like putting a bandaid on a skull fracture.

Would you care to elaborate on that instead of, what feels like to me, beating me up over it? IOW, your dismissal of that as an idea seems quite RUDE!

Processing several thousand records should take minutes at most. That it's taking hours means there's something seriously wrong with the implementation. A portion of the algorithm taking exponential time would be one possibility, and given his comment about the time suddenly going from fast to horrible, it seems the most likely one. Beyond the trivial cases that are O(2^N), in any language with managed code something that abuses the memory manager can do the same. Postfixing to an immutable object (.NET strings qualify, and IIRC so do Java's) or expanding a large array a few bytes at a time will absolutely murder performance once you go beyond nontrivial datasets. With an exponential algorithm, throwing hardware at it (all that multithreading can do) is a stopgap measure at best and will be overwhelmed in short order as the size of the task continues to grow. Unless AppleScript is just hideously inefficient, this is a problem that should only take a few minutes to compute. That it's taking hours is an indication that the fundamental algorithm is badly flawed.

Quote:


Looking at his web site, it seems like it is basically a crawl of all of the tasks, sorted by descending WUID, then going into each task to obtain which computer(s) are associated with it. Another way it *COULD* be done is to break out the WUs by each host, enabling threads. Data merging could happen on the back side of all of it, once all data has been collected from the web pages.

IOW, threads would SPEED the data retrieval process from the web pages, as he could have the MacPro spinning out there on its own while still allowing other threads to be running. I would think it would speed up the data retrieval, which absolutely has to be the slowest aspect of the process if you are plowing through it sequentially, as you are if you just click on "My Results". The more threads you have running, up to a certain point, the quicker the critical path can complete. Someone I worked with at a prior job noted the same thing: diminishing returns above a certain number of threads, but definitely an improvement with threads. That specific task was sweeping 5000+ systems to find the database revision on each box as well as other metrics, such as disk space.

You'd quickly hit personal bandwidth limits, even assuming E@H doesn't have something to detect bots (either a search spider or a crude DDoS attack) thrashing its servers and throttle you on their end. And downloading every WU's webpage, instead of just the ones that could've been updated, would be a clear case of bad implementation.

Quote:

As for "previous results" being stored, that's going to assume that there's a database of some kind behind the scenes, but honestly, since he has pending status and statistical info for the other host, even if the task switched from PENDING to credit granted, the way he's doing it still would require going into each task. As such, overhead to compare whether or not a value had changed would cause it to be slower than just flat out bulk importing the data fresh every time it runs.

Scraping thousands of webpages instead of storing data in a database (a CSV would be more than adequate here, no need for an SQL-based implementation) would be a broken implementation. Once credit is granted to all participants, the record will not change, and there's no benefit to downloading it repeatedly. All you're doing is wasting your hardware's time and E@H's bandwidth.
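The only-refetch-what-can-change rule could be sketched like this; the class name and the state strings are hypothetical, standing in for whatever the scraped status pages actually report:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of caching final records: once a task's credit is granted it
// can never change, so it gets cached (a CSV line on disk would do)
// and only tasks still in flux are re-fetched on the next run.
public class ResultCache {
    private final Map<String, String> finalStates = new HashMap<>();

    public void record(String taskId, String state) {
        if ("granted".equals(state)) {
            finalStates.put(taskId, state); // terminal state, never refetch
        }
    }

    public boolean needsRefetch(String taskId) {
        return !finalStates.containsKey(taskId);
    }
}
```

With this filter, a run's download count shrinks to the pending tasks plus any newly issued ones, regardless of how large the completed history grows.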

Quote:

So, unless you would like to elaborate on how specifically that thought process of doing threads based on task lists of each hostID (perhaps 2 for the MacPro itself, along the lines of top-half/bottom-half) is incorrect, I wouldn't mind at least a "gee, I hadn't thought about that", if not a full apology.

See my first para. The performance levels he's getting for this small of a dataset are indicative of a badly broken algorithm that is unable to scale well with the dataset. Adding threads won't fix the fundamental problem any more than a bandaid will cure a dented skull. Either the codebase, or possibly the applescript engine has a serious problem. I asked about the size of, and possibly looking at, his codebase because that's the only way to tell which.
