Concept: Large WU branch of E@H?

peanut
Joined: 4 May 07
Posts: 162
Credit: 9644812
RAC: 0
Topic 193435

I might be an oddball, but I actually liked the days when the tasks at E@H were monsters. It made keeping track of them easier since there were fewer things to track. I usually just leave my computers on 24/7.

I was wondering if E@H had ever thought of having a separate project or sub-project that 24/7 crunchers could sign up for. That project would have monster WUs that could feed crunchers' hunger for data without piling up monstrous numbers of WUs to track.

Just an idea I had while sipping the morning coffee.

Elphidieus
Joined: 20 Feb 05
Posts: 245
Credit: 20603702
RAC: 0

Having a hard time with your V8...?

I thought I'd be the oddball, preferring countless WUs as short as S@H's.

peanut
Joined: 4 May 07
Posts: 162
Credit: 9644812
RAC: 0

I see some V8s at SETI and they have thousands of tasks in progress and completed. That's why I put my V8 here; I thought big WUs would be easier to keep track of. If I put the V8 on SETI, I would be completely barraged with WUs and there would be no way I could monitor them very well.

As a side question: does anyone know of a project that has monster WUs? I'd like something that takes about a day to complete on the newer Intel Core 2 Quads or Duos.

xxxxx
Joined: 10 Jan 08
Posts: 1
Credit: 2852
RAC: 0

Have you tried 'climate prediction'?

Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

Or Rosetta. You can set the length of CPU time per WU. If I recall correctly, up to 24 hrs.

-edit- in 2-hour increments, from 2 to 24 hours

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: I thought I would be

Message 76822 in response to message 76819

Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

peanut
Joined: 4 May 07
Posts: 162
Credit: 9644812
RAC: 0

RE: RE: I thought I would

Message 76823 in response to message 76822

Quote:
Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

It is not a real "need" but rather a result of something I started before I had the V8 Mac Pro. I wrote some AppleScripts and little Java programs to keep a spreadsheet of nearly all the tasks I finish on Einstein or SETI. It basically automates what a person would do by reading the normal status web pages provided by the projects. It ran quickly before my V8. Now my automated reader takes 2 hours to get through all the results from my 2 Mac minis and the V8 Pro. There is probably a better way to do what I do, but I don't know how yet. If you look at my website under my user page, you can get an idea of what I track (click on peanut). The site is on my home PC and my IP changes randomly, so it is kind of hit or miss as far as being available. I have a script that updates my IP, but it runs only once per hour, and then the update has to filter through the DNS system, so my site is not always accessible.

Thanks to the others who suggested projects with large WUs. You saved me time in digging for other projects to consider.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: RE: RE: I thought I

Message 76824 in response to message 76823

Quote:
Quote:
Quote:
I thought I would be able to keep track of big WUs easier.

I don't understand your need to "track" things... Could you explain what you are needing to "track"?

It is not a real "need" but rather a result of something I started before I had the V8 Mac Pro. I wrote some AppleScripts and little Java programs to keep a spreadsheet of nearly all the tasks I finish on Einstein or SETI. It basically automates what a person would do by reading the normal status web pages provided by the projects. It ran quickly before my V8. Now my automated reader takes 2 hours to get through all the results from my 2 Mac minis and the V8 Pro. There is probably a better way to do what I do, but I don't know how yet.

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you call start(): calling run() yourself just executes it on the current thread, and a new thread is only launched by start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent writing to disk (do more in memory), etc...
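For reference, a minimal Java sketch of the start() versus run() distinction. HostScraper and StartVsRun are hypothetical names; the real page fetching is left out, and each task only records which thread executed it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-host task. Real page fetching is omitted; it only records
// which thread executed it, so the difference between start() and run() shows.
class HostScraper implements Runnable {
    private final String hostName;
    private final Map<String, String> whoRanMe;

    HostScraper(String hostName, Map<String, String> whoRanMe) {
        this.hostName = hostName;
        this.whoRanMe = whoRanMe;
    }

    @Override
    public void run() {
        whoRanMe.put(hostName, Thread.currentThread().getName());
    }
}

class StartVsRun {
    // Launch one thread per host with start(), then join() them all before returning.
    static Map<String, String> demo(List<String> hosts) throws InterruptedException {
        Map<String, String> whoRanMe = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (String host : hosts) {
            Thread t = new Thread(new HostScraper(host, whoRanMe), "worker-" + host);
            t.start(); // creates a new thread, which then invokes run()
            threads.add(t);
        }
        for (Thread t : threads) {
            t.join(); // wait for every worker before using the merged results
        }
        return whoRanMe;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(List.of("mini1", "mini2", "macpro")));
        // Calling run() directly compiles and works, but it runs on the caller's thread.
        Map<String, String> direct = new ConcurrentHashMap<>();
        new HostScraper("mini1", direct).run();
        System.out.println(direct); // {mini1=main} when launched from a normal main()
    }
}
```

Each host's work shows up under its own "worker-..." thread name, while the direct run() call shows the calling thread's name.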

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 197

RE: You could make it

Message 76825 in response to message 76824

Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you call start(): calling run() yourself just executes it on the current thread, and a new thread is only launched by start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely, it'd be like putting a band-aid on a skull fracture. Creating a spreadsheet with a few thousand rows from scratch shouldn't take more than a few minutes. Even then, it shouldn't need to be recreated at all: the output from the last run should be saved and the most recent updates appended to the end. Based on it suddenly going from quick to taking forever, my suspicion is that something in it is factorial or exponential in order.
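A minimal Java sketch of the "save the last output and append" idea. It assumes, hypothetically, one CSV row per task with the task ID in the first column; IncrementalCsv and appendNewRows are made-up names. It only appends tasks it has never seen, so it does not update rows whose status changes later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IncrementalCsv {
    // Read the task IDs already saved (assumed to be the first CSV column).
    static Set<String> knownTaskIds(Path csv) throws IOException {
        Set<String> ids = new HashSet<>();
        if (Files.exists(csv)) {
            for (String line : Files.readAllLines(csv, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    ids.add(line.split(",", 2)[0]);
                }
            }
        }
        return ids;
    }

    // Append only rows whose task ID is not already saved; returns how many were added.
    static int appendNewRows(Path csv, List<String> scrapedRows) throws IOException {
        Set<String> known = knownTaskIds(csv);
        List<String> fresh = new ArrayList<>();
        for (String row : scrapedRows) {
            String id = row.split(",", 2)[0];
            if (known.add(id)) { // add() returns false if the ID was already present
                fresh.add(row);
            }
        }
        // CREATE + APPEND: make the file on the first run, otherwise add to its end.
        Files.write(csv, fresh, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return fresh.size();
    }
}
```

The existing file is read once per run into a hash set, so each run costs roughly time proportional to the number of rows rather than rebuilding the spreadsheet from scratch.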

Peanut: roughly how long (lines of code) is your collection of scripts? If it's fairly short and AppleScript is easy enough to read (I've never seen it before), I might be able to give it an eyeball sanity check for anything that would clobber performance.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: RE: You could make

Message 76826 in response to message 76825

Quote:
Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you call start(): calling run() yourself just executes it on the current thread, and a new thread is only launched by start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely, it'd be like putting a band-aid on a skull fracture.

Would you care to elaborate on that instead of what feels to me like beating me up over it? IOW, your dismissal of that idea seems quite RUDE!

Looking at his web site, it seems like it is basically a crawl of all of the tasks, sorted by descending WUID, then going into each task to find out which computer(s) are associated with it. Another way it *COULD* be done is to break out the WUs by each host, enabling threads. Data merging could happen on the back side of all of it, once all data has been collected from the web pages.

IOW, threads would SPEED UP retrieving the data from the web pages, as he could have the MacPro's results being crawled on their own while other threads run alongside. Data retrieval absolutely has to be the slowest aspect of the process if you are plowing through it sequentially, the way you would be if you just clicked through "My Results". Up to a certain point, the more threads you have running, the sooner the critical path can complete. Someone I worked with at a prior job noted the same thing: diminishing returns above a certain number of threads, but definitely an improvement with threads. That task was sweeping 5000+ systems to find the database revision on each box as well as other metrics, such as disk space.
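The per-host split described above could be sketched in Java with a fixed-size thread pool. This is a hypothetical illustration, not peanut's code: fetchHost stands in for the actual page scraping (not shown), and Result is a made-up minimal record. Results are merged only after every host's fetch finishes, then sorted by descending WU ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PerHostCrawl {
    // Hypothetical minimal scraped record: workunit ID plus the host that ran it.
    record Result(long wuId, String host) {}

    static List<Result> crawl(List<String> hosts,
                              Function<String, List<Result>> fetchHost,
                              int maxThreads) throws InterruptedException, ExecutionException {
        // The pool size caps how many hosts are being fetched at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<List<Result>>> futures = new ArrayList<>();
            for (String host : hosts) {
                futures.add(pool.submit(() -> fetchHost.apply(host)));
            }
            // Merge on the back side: get() blocks until that host's fetch is done.
            List<Result> merged = new ArrayList<>();
            for (Future<List<Result>> f : futures) {
                merged.addAll(f.get());
            }
            merged.sort((a, b) -> Long.compare(b.wuId(), a.wuId())); // descending WU ID
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps depends on time spent waiting on the network being the real bottleneck; the pool size also bounds how hard the project's servers get hit.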

As for "previous results" being stored, that assumes there's a database of some kind behind the scenes. But honestly, since he has pending status and statistical info for the other host, even if a task switched from PENDING to credit granted, the way he's doing it would still require going into each task. As such, the overhead of comparing whether or not a value had changed would make it slower than just bulk importing the data fresh every time it runs.

So, unless you would like to elaborate on how specifically that thought process of doing threads based on task lists of each hostID (perhaps 2 for the MacPro itself, along the lines of top-half/bottom-half) is incorrect, I wouldn't mind at least a "gee, I hadn't thought about that", if not a full apology.

Respectfully,

Brian

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 197

RE: RE: RE: You could

Message 76827 in response to message 76826

Quote:
Quote:
Quote:

You could make it multithreaded, if it isn't already... If in Java, you'd need to either extend Thread or implement the Runnable interface, then make sure you call start(): calling run() yourself just executes it on the current thread, and a new thread is only launched by start()... If you've done that, then the only thing I can think of is using more ArrayLists, cutting down the amount of time spent writing to disk (do more in memory), etc...

Multithreading would be the wrong way to go here. Or more precisely, it'd be like putting a band-aid on a skull fracture.

Would you care to elaborate on that instead of what feels to me like beating me up over it? IOW, your dismissal of that idea seems quite RUDE!

Processing several thousand records should take minutes at most. That it's taking hours means something is seriously wrong with the implementation. A portion of the algorithm taking exponential time would be one possibility, and given his comment about the time suddenly going from fast to horrible, it seems the most likely. Beyond the obvious cases that are O(2^N), in any language with managed memory, code that abuses the memory manager can blow up in a similar way. Appending to an immutable object (.NET strings qualify, and so do Java's) or expanding a large array a few bytes at a time copies everything accumulated so far on every append; the total work grows at least quadratically and will absolutely murder performance on nontrivial datasets. With an algorithm that scales this badly, throwing hardware at it (which is all multithreading can do) is a stopgap measure at best and will be overwhelmed in short order as the task keeps growing. Unless AppleScript is just hideously inefficient, this is a problem that should only take a few minutes to compute. That it's taking hours is an indication that the fundamental algorithm is badly flawed.
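To illustrate the immutable-string case, a minimal Java sketch (ReportBuilder and its methods are hypothetical names) contrasting repeated += on a String with a StringBuilder. Both produce identical output; the difference is how much copying happens along the way. naiveCharsCopied is a rough cost model, not a measurement.

```java
import java.util.List;

class ReportBuilder {
    // Slow pattern: Java Strings are immutable, so each "+=" allocates a new
    // String and copies everything accumulated so far. For n rows that is on
    // the order of n^2 characters copied in total.
    static String concatNaively(List<String> rows) {
        String out = "";
        for (String row : rows) {
            out += row + "\n";
        }
        return out;
    }

    // Fast pattern: StringBuilder appends into a buffer whose capacity grows
    // geometrically when full, so total copying stays linear in the output size.
    static String concatWithBuilder(List<String> rows) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    // Approximate characters copied by the naive loop when every row plus its
    // newline has length L: L + 2L + ... + nL = L * n(n+1)/2.
    static long naiveCharsCopied(int n, int rowLength) {
        return (long) rowLength * n * (n + 1) / 2;
    }
}
```

With 5,000 rows of 80 characters, the naive loop copies about 80 × 5000 × 5001 / 2 ≈ 1 billion characters to build a 400 KB result, which is the kind of growth that turns minutes into hours as the dataset grows.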

Quote:


Looking at his web site, it seems like it is basically a crawl of all of the tasks, sorted by descending WUID, then going into each task to find out which computer(s) are associated with it. Another way it *COULD* be done is to break out the WUs by each host, enabling threads. Data merging could happen on the back side of all of it, once all data has been collected from the web pages.

IOW, threads would SPEED UP retrieving the data from the web pages, as he could have the MacPro's results being crawled on their own while other threads run alongside. Data retrieval absolutely has to be the slowest aspect of the process if you are plowing through it sequentially, the way you would be if you just clicked through "My Results". Up to a certain point, the more threads you have running, the sooner the critical path can complete. Someone I worked with at a prior job noted the same thing: diminishing returns above a certain number of threads, but definitely an improvement with threads. That task was sweeping 5000+ systems to find the database revision on each box as well as other metrics, such as disk space.

You'd quickly hit personal bandwidth limits, even assuming E@H doesn't have something that detects bots (either a search spider or a crude DDoS attack) thrashing its servers and throttles you on their end. And downloading every WU's web page instead of just the ones that could've been updated would be a clear case of bad implementation.
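One way to keep a multithreaded scraper from hammering the servers is a single shared limiter that enforces a minimum gap between requests, no matter how many threads are running. A minimal Java sketch; PoliteLimiter is a hypothetical name, and the interval is an assumed value, not a documented E@H limit.

```java
// Hypothetical helper; the interval passed in is an assumption, not a documented E@H limit.
class PoliteLimiter {
    private final long minIntervalMillis;
    private long nextAllowedAt = Long.MIN_VALUE; // no request reserved yet

    PoliteLimiter(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Reserve the next request slot at time nowMillis and return how long (ms)
    // the caller must wait for it. synchronized so all threads share one budget.
    synchronized long reserve(long nowMillis) {
        long start = Math.max(nowMillis, nextAllowedAt);
        nextAllowedAt = start + minIntervalMillis;
        return start - nowMillis;
    }

    // Block the calling thread until its reserved slot arrives
    // (nanoTime is monotonic, unlike the wall clock).
    void acquire() throws InterruptedException {
        long waitMillis = reserve(System.nanoTime() / 1_000_000);
        if (waitMillis > 0) {
            Thread.sleep(waitMillis);
        }
    }
}
```

If every worker calls acquire() before fetching a page, adding threads no longer raises the request rate above one per interval, which also caps how much threading can speed up the crawl.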

Quote:

As for "previous results" being stored, that assumes there's a database of some kind behind the scenes. But honestly, since he has pending status and statistical info for the other host, even if a task switched from PENDING to credit granted, the way he's doing it would still require going into each task. As such, the overhead of comparing whether or not a value had changed would make it slower than just bulk importing the data fresh every time it runs.

Scraping thousands of web pages instead of storing the data in a database (a CSV would be more than adequate here; there's no need for an SQL-based implementation) would be a broken implementation. Once credit is granted for all participants, the record will not change, and there's no benefit to downloading it repeatedly. All you're doing is wasting your hardware's time and E@H's bandwidth.
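Only re-downloading records that can still change might look something like this minimal Java sketch. StoredWu and its creditGrantedForAll flag are hypothetical stand-ins for whatever the CSV stores; how that flag is derived from the result pages is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IncrementalRefresh {
    // Hypothetical saved row: what the CSV remembers about one workunit.
    record StoredWu(long wuId, boolean creditGrantedForAll) {}

    // Decide which workunit pages are worth downloading on this run.
    static List<Long> idsToFetch(Map<Long, StoredWu> stored, List<Long> listedNow) {
        List<Long> toFetch = new ArrayList<>();
        for (Long id : listedNow) {
            StoredWu row = stored.get(id);
            if (row == null || !row.creditGrantedForAll()) {
                // Never seen before, or still pending: it may have changed.
                toFetch.add(id);
            }
            // Otherwise credit is granted for all participants; the record is final, so skip it.
        }
        return toFetch;
    }
}
```

Once most results are final, each run downloads only the new and still-pending pages instead of every page on "My Results".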

Quote:

So, unless you would like to elaborate on how specifically that thought process of doing threads based on task lists of each hostID (perhaps 2 for the MacPro itself, along the lines of top-half/bottom-half) is incorrect, I wouldn't mind at least a "gee, I hadn't thought about that", if not a full apology.

See my first para. The performance levels he's getting for this small a dataset are indicative of a badly broken algorithm that is unable to scale well with the dataset. Adding threads won't fix the fundamental problem any more than a band-aid will cure a dented skull. Either the codebase or possibly the AppleScript engine has a serious problem. I asked about the size of, and possibly looking at, his codebase because that's the only way to tell which.