Now that the workunits have finally been evac'd off my old PCs, I have built a proper cluster using OSCAR on Fedora 8.
Right now I have 13 systems [12 P4, 1 P3] as a cluster. I'll toss on one more node tomorrow.
I have another dozen systems [2 P4, but they need more SDRAM], and 10 more P3s of varying speeds. But I'm power & ethernet limited at the moment. I need to go buy a spool of ethernet cable and some jacks, and I'll have to put together more power.
But it'll all be running BOINC when it isn't running someone's program. Which will be faaaar more often than not. I wonder if it'd be worthwhile to schedule BOINC as an /actual/ cluster process as opposed to the regular idle configuration that gets killed when an actual program starts...
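If I ever do try that, OSCAR bundles a batch scheduler [TORQUE/PBS, if I remember right], so the rough shape would be a job script along these lines - just a sketch, assuming my /opt/BOINC layout, and the client binary name varies by version:

    #!/bin/sh
    # boinc.pbs - sketch only: run BOINC as a real batch job, so the
    # scheduler decides when it yields instead of relying on niceness
    #PBS -N boinc
    #PBS -l nodes=1
    cd /opt/BOINC
    ./boinc >> boinc.log 2>&1    # may be ./boinc_client on 5.x clients

The catch is that a normal batch job runs to completion, so something would still have to stop and requeue it when real work shows up.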
Very impressive!!! Are you planning to use the SSE enabled Power User app? Might be a pain installing on so many nodes unless this is somehow automated, though.
Quote:
But it'll all be running BOINC when it isn't running someone's program. Which will be faaaar more often than not. I wonder if it'd be worthwhile to schedule BOINC as an /actual/ cluster process as opposed to the regular idle configuration that gets killed when an actual program starts...
Well, BOINC itself runs as a "normal" process; only the science app that BOINC executes runs with maximum niceness. The logic for this is hard-wired into BOINC, so there's no way to change it, I'm afraid.
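You can check this on any host; the science app should show up at nice level 19 while the client itself stays at 0:

    # NI column is the nice level
    ps -eo pid,ni,comm | egrep 'boinc|einstein'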
Maximum niceness isn't good enough, I've found. E@H remains a significant resource [CPU] drain on systems that aren't that great, as well as on systems that are good [my X2] when they're running CPU-intensive tasks.
My lazy solution is this: Whenever someone wants to run a program, I'll just have them drop in the relevant commands to start and stop BOINC.
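In practice that's just a pair of one-liners they can paste in [a sketch - binary name and paths are from my install]:

    # before their run: stop BOINC everywhere
    cexec 'killall -q boinc'
    # ...their program runs...
    # afterwards: bring BOINC back up
    cexec 'cd /opt/BOINC && ./boinc >> boinc.log 2>&1 &'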
It makes sense too because I'm short on RAM on some of the machines [SDRAM is a PAIN to find], so I'm running 256MB on some of them.
I will be using the SSE-enabled stuff as soon as I make a file marking what has what. I can deploy /really easily/ too, and it's worth talking about for a minute.
I'm using OSCAR. It's basically an amalgamation of clustering utilities wrapped up in a nice tight little package. One of the utilities it includes is called cexec.
The cexec command lets me run a command on an arbitrary number of hosts, on all the hosts from a list, and so on. That, plus the automatic NFS mounting of the server node's user home directory, keeps things simple.
One cexec command copies the already-extracted BOINC install over NFS to /opt/BOINC on each machine.
Another cexec command runs a shell script I made to attach to Einstein, shut down before the first unit starts, and restart with certain command-line options [stdout ---> file, return results immediately].
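Roughly like so [a sketch: the account key is obviously a placeholder, the script name is just what I call it, and the client options are the 5.x-era ones - check ./boinc --help; older clients prompt for the URL and key instead]:

    # push the extracted tree from the NFS-shared home dir to every node
    cexec 'cp -a ~/boinc /opt/BOINC'
    # then run the per-node script
    cexec '~/einstein_setup.sh'

and einstein_setup.sh itself is more or less:

    #!/bin/sh
    cd /opt/BOINC
    ./boinc --attach_project http://einstein.phys.uwm.edu/ MY_ACCOUNT_KEY
    sleep 30; killall -q boinc      # shut down before the first unit starts
    ./boinc --return_results_immediately > boinc.log 2>&1 &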
I need to make an init.d script for BOINC, but that'll be simple.
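Something like this should do [a minimal sketch; the chkconfig header is there because this is Fedora]:

    #!/bin/sh
    # /etc/init.d/boinc - minimal sketch
    # chkconfig: 345 90 10
    # description: BOINC core client
    case "$1" in
      start) cd /opt/BOINC && ./boinc >> boinc.log 2>&1 & ;;
      stop)  killall -q boinc ;;
      *)     echo "usage: $0 {start|stop}"; exit 1 ;;
    esac

Then a chkconfig --add boinc on each node [via cexec, naturally].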
Next, I'll deploy custom apps to each machine after separating the dozen nodes into classes [SSE, non-SSE, etc] as needed.
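Building the what-has-what file is basically a one-liner, since cexec tags each node's output with its hostname [nodes.sse is just what I'm calling the file]:

    # class nodes by CPU capability
    cexec 'grep -qw sse /proc/cpuinfo && echo sse || echo no-sse' > nodes.sse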
I'm learning Linux crazy fast as a bonus, and I'm learning how to manage a computer cluster. I still have cluster-specific bugs to work out [Ganglia and MPI aren't cooperating], but that's lower priority since there's no active need for this cluster. This is my personal project that has ballooned slightly, but I've managed to spend only 22.50 so far [on a 24-port switch!].
I still have a dozen machines to deploy [3x dual-CPU P3 {900, 800, 666MHz}, 1 slow P4, another P3 900, and 7 or 8 P3s {450, 500}]. The pain is that the crap I haven't put online is old enough that PXE isn't quite an option [or is first-gen PXE], so I have to burn a CD and install that way.
At least I can use that dual-CPU 666 now - there was a cute issue with the Dell PowerEdge: for /some/ goddamn reason it was decided that there would be /no/ IDE controllers on the system. I'm stuck with PXE [which I don't know if it works on that heap, since I've been ignoring it] or hoping to hell that the ancient SCSI CD-ROM will read a burned CD.
You might want to consider using my setup, since I understand you run a setup similar to mine. The advantages are homogeneity in software, crazy-easy configuration [cexec & NFS], a slight decrease in resource usage [no GUI, less crap installed], ease of deployment, and monitoring ability. It takes a while to get your head around how the system operates, but it's pretty sweet, and I haven't even begun to take full advantage of what's there.
Hmm, OSCAR sounds great. If I had enough PCs and time, I would try to set up a cluster like that. Maybe that would be a solution for running a superhost for crunching.
It's an Alpha DS10 @ 600MHz running Linux.
WU progress 3.746% after 1h 52min...
First WU done in 176,809.78 sec [about 49 hours]...
Could I kindly ask you to use an app version number for self-compiled Apps that's not in the range of version numbers currently in use for official or Beta Apps? I'd suggest the generic versions 400 or 500. If you want to distinguish app versions yourself, everything below 400 is unused (apart from my own experiments usually in the range 0.01-0.50).
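For reference, a self-compiled app reports whatever <version_num> its app_info.xml declares, so "400" would look roughly like this [a sketch - the app and file names here are placeholders, not the real ones]:

    cat > app_info.xml <<'EOF'
    <app_info>
      <app>
        <name>einstein_S5R3</name>
      </app>
      <file_info>
        <name>einstein_S5R3_4.00_i686-pc-linux-gnu</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>einstein_S5R3</app_name>
        <version_num>400</version_num>
        <file_ref>
          <file_name>einstein_S5R3_4.00_i686-pc-linux-gnu</file_name>
          <main_program/>
        </file_ref>
      </app_version>
    </app_info>
    EOF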
Today a "DL380 G2" comes to my sweet home and is still waiting for setup for Boinc. Now i'm thinking about where to place or hang especially with his incredible noise from 10 aircoolers...
Spec:
2x 1.4GHz P3s with 512KB L2 cache
768MB registered SDRAM with ECC
3x 9.1GB Ultra2 SCSI/SCA2/LVD HDD
Today a "DL380 G2" comes to my sweet home and is still waiting for setup for Boinc. Now i'm thinking about where to place or hang especially with his incredible noise from 10 aircoolers...
Spec:
2x 1.4GHz P3s with 512KB L2-Cache
768MB registered SDRam with ECC
3x 9.1GB Ultra2 SCSI/SCA2/LVD HDD
With four SGI 1200's in my living room, you don't have to tell me about noise. (Each one sounds like a Cessna getting ready for take-off.)
Nice vintage box!
Yeah...I've got two similar Dual PIII 1.4GHz 1U servers with a total of 16 fans, and the noise is deafening :-). I keep them in an unused room where they don't bother anyone.
If you just use that server for E@H, I guess it would not harm to pull out one of the disks [or even all three] and replace them with a single bigger, more modern HD that's less noisy, less power-hungry, and less heat-generating?
Quote:
With four SGI 1200's in my living room, you don't have to tell me about noise. (Each one sounds like a Cessna getting ready for take-off.)
In your living room? Oh no, I'm not single anymore. My girlie would kill me...
Quote:
Nice vintage box!
Yeah...I've got two similar Dual PIII 1.4GHz 1U servers with a total of 16 fans, and the noise is deafening :-). I keep them in an unused room where they don't bother anyone.
If you just use that server for E@H, I guess it would not harm to pull out one of the disks [or even all three] and replace them with a single bigger, more modern HD that's less noisy, less power-hungry, and less heat-generating?
CU
Bikeman
The "DL380 G2" is a 2HE server for 19"-racks with 400watt redundant psu. The 3 disks are not the problem, they are changed now from raid5 to raid1 with hotspare. But i'm buying whole thing for 120Euronen without any driver or OS. Cool, ftp.compaq.com is still alive!!!
With the latest Linux beta app 4.38, E@H will again use CPU feature detection (e.g. SSE support) to switch between "optimized" and "stock" versions of the science client.
So I guess it's a good idea to test this especially with vintage computers: the older, the better! Of particular interest would be hosts so old that they don't even support the CPUID instruction (hosts that old will probably not finish an E@H task within the deadline, but they still shouldn't crash the app).
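A quick way to see what the detection should find on a given box [GNU grep assumed]:

    # list the SSE variants the kernel reports for this CPU
    grep -m1 '^flags' /proc/cpuinfo | grep -ow 'sse2\|sse' || echo 'no SSE here'

On something genuinely pre-CPUID there may be hardly any flags reported at all, which is exactly the case worth testing.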