The SINGLE gpu horse race at e@h

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 303
Credit: 11476906702
RAC: 13422710

Ian&Steve C. wrote:

Boca Raton Community HS wrote:
Nice! Is it in an old(er) blade server? Just curious on how you are cooling/implementing it.


the GPUs have an SXM2 interface, not normal PCIe. you need an adapter board to convert that to PCIe. and then you put a heatsink on top. I'm water cooling mine right now.

 

That is so awesome. I have heard of this adapter but I was not sure that they actually existed in real life. Two questions:

1. Did you have to modify the adapter or GPU to make it actually compatible?

2. Is installing an SXM2 GPU onto the "socket" as difficult as I have read about?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4153
Credit: 50040438223
RAC: 42351282

Tom M wrote:
Is it even feasible to air cool it?



you can. just have to be mindful of what application you're running and how much power it will use.

I had it up to like 440W on some very specific AI/ML tensor loads, and that probably won't be an easy thing to cool with air cooling. but I think it would probably be fine on an appropriately sized 3U or 4U cooler. I'm not sure what air cooler actually fits this thing, though, since the bolts are non-standard (for SXM2). they are wider apart at 36mm, when SXM2 heatsink bolts are usually 32mm apart. some users just took their normal air coolers and modified them with a file to widen the bolt holes.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4153
Credit: 50040438223
RAC: 42351282

Boca Raton Community HS wrote:

That is so awesome. I have heard of this adapter but I was not sure that they actually existed in real life. Two questions:

1. Did you have to modify the adapter or GPU to make it actually compatible?

2. Is installing an SXM2 GPU onto the "socket" as difficult as I have read about?



I didn't modify anything in order to bolt the GPU to the adapter. as you can see in my picture, there is no I/O mounting bracket, so it's just kind of sitting in the slot there. it's not secured in any way and would be easy to bump loose. I made some crude brackets with 1" aluminum angle iron, just to be able to zip tie them to my mining rack later.

installing the SXM2 module isn't really that hard. it's secured by 8 screws, but they can be a little finicky if the mounting pressure is off. plus my boards are all used engineering test boards anyway and probably have their own set of idiosyncrasies. a low-torque screwdriver is probably appropriate, but I've been winging it and adjusting manually as necessary.

_________________________________________________________________________

[AF>EDLS]zOU
Joined: 5 May 15
Posts: 80
Credit: 389160059
RAC: 200324

I'm running out of tasks on my ARC750 GPU.... so that means I'm maxing out :D 



20158    Einstein@Home    2/6/2025 7:35:32 PM    Requesting new tasks for Intel GPU    
20160    Einstein@Home    2/6/2025 7:35:34 PM    Scheduler request completed: got 0 new tasks    
20162    Einstein@Home    2/6/2025 7:35:34 PM    (reached daily quota of 960 tasks)  

Tom M
Joined: 2 Feb 06
Posts: 6884
Credit: 9793670264
RAC: 3394237

[AF>EDLS]zOU wrote:

I'm running out of tasks on my ARC750 GPU.... so that means I'm maxing out :D 



20158    Einstein@Home    2/6/2025 7:35:32 PM    Requesting new tasks for Intel GPU    
20160    Einstein@Home    2/6/2025 7:35:34 PM    Scheduler request completed: got 0 new tasks    
20162    Einstein@Home    2/6/2025 7:35:34 PM    (reached daily quota of 960 tasks)  

The daily quota is tied to the number of CPU cores BOINC reports to the project, so a host that reports fewer cores than its GPU needs hits the limit sooner. The quota also drops after a flood of computation errors. If it was a flood of computation errors, just wait out the 24-hour timeout.

A workaround is to use the <ncpus> parameter in the cc_config.xml file.

The thing you then have to limit is the flood of additional CPU tasks trying to run. This usually means lowering the boincmgr available threads to the number of real threads you want to make available.

It is common for an 8c/16t system to use the ncpus parameter to report that the system has 64 threads.

HTH

A Proud member of the O.F.A.  (Old Farts Association).

Tom M
Joined: 2 Feb 06
Posts: 6884
Credit: 9793670264
RAC: 3394237

Here is what your cc_config.xml file, which is in your hidden BOINC (ProgramData) folder, might look like.

<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <rec_half_life_days>1</rec_half_life_days>
    <use_all_gpus>1</use_all_gpus>
    <save_stats_days>90</save_stats_days>
    <max_file_xfers>8</max_file_xfers>
    <max_file_xfers_per_project>4</max_file_xfers_per_project>
    <ncpus>64</ncpus>
  </options>
</cc_config>

Note the <ncpus> line.

Your boincmgr or profile setting for CPU threads needs to be set to "12.5%" or lower.  That represents 8 threads / 64 threads.

Generally I prefer at least 1 thread left free so your system can run other tasks besides Boinc easily.
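
A rough sketch of that percentage arithmetic in Python, in case it helps. The function name `cpu_percent_setting` and the `keep_free` parameter are made up for illustration, and the 8 real threads are just the example figure from above, not a measured value for any particular host.

```python
def cpu_percent_setting(real_threads: int, reported_ncpus: int, keep_free: int = 0) -> float:
    """Return the CPU percentage that maps the reported ncpus back to real threads.

    real_threads   -- hardware threads the machine actually has (assumed: 8)
    reported_ncpus -- value placed in <ncpus> in cc_config.xml (assumed: 64)
    keep_free      -- real threads to leave idle for non-BOINC work
    """
    if reported_ncpus <= 0:
        raise ValueError("reported_ncpus must be positive")
    usable = real_threads - keep_free
    if usable < 1:
        raise ValueError("at least one thread must remain usable")
    return 100.0 * usable / reported_ncpus


print(cpu_percent_setting(8, 64))     # 8 of 64 reported threads
print(cpu_percent_setting(8, 64, 1))  # 7 of 64, leaving one real thread free
```

With those example numbers the first call gives 12.5 and the second gives 10.9375, so leaving one thread free means setting the percentage a little below 12.5.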

This should help.

Keep talking to us.  We are volunteers helping volunteers.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).

[AF>EDLS]zOU
Joined: 5 May 15
Posts: 80
Credit: 389160059
RAC: 200324

that's my cc_config


<cc_config>
  <options>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
  </options>
</cc_config>

Tom M
Joined: 2 Feb 06
Posts: 6884
Credit: 9793670264
RAC: 3394237

[AF>EDLS]zOU wrote:

that's my cc_config


<cc_config>
  <options>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
  </options>
</cc_config>

Add this inside the <options> section: <ncpus>64</ncpus>

Restart BOINC.  Then set the CPU usage limit in boincmgr via the local menu to 12.5 (% of CPU threads), and then maybe you won't get stopped by that daily task limit message when you are running those GPU tasks.
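
If editing the file by hand feels error-prone, the <ncpus> addition can be sketched as a small Python script using the standard library's `xml.etree.ElementTree`. This is only an illustration: the helper name `add_ncpus` is hypothetical, the value 64 is the example figure from this thread, and the script works on a string rather than on the real cc_config.xml so it cannot damage anything. `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def add_ncpus(config_xml: str, ncpus: int) -> str:
    """Return cc_config XML with <ncpus> set inside <options>, keeping other settings."""
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create <options> if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    # Reuse an existing <ncpus> entry rather than adding a second one.
    node = options.find("ncpus")
    if node is None:
        node = ET.SubElement(options, "ncpus")
    node.text = str(ncpus)

    ET.indent(root, space="  ")  # pretty-print; Python 3.9+
    return ET.tostring(root, encoding="unicode")


# The config posted above, used here as sample input.
current = """<cc_config>
<options>
<allow_remote_gui_rpc>1</allow_remote_gui_rpc>
</options>
</cc_config>"""

print(add_ncpus(current, 64))
```

The existing <allow_remote_gui_rpc> line is kept and <ncpus> lands next to it inside <options>. You would still need to save the result over the real file yourself and restart BOINC afterwards.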

HTH.

A Proud member of the O.F.A.  (Old Farts Association).

mikey
Joined: 22 Jan 05
Posts: 12940
Credit: 1884476265
RAC: 30306

Tom M wrote:

Your boincmgr or profile setting for CPU threads needs to be set to "12.5%" or lower.  That represents 8 threads / 64 threads.

Generally I prefer at least 1 thread left free so your system can run other tasks besides Boinc easily.

This should help.

Keep talking to us.  We are volunteers helping volunteers.

Tom M

Tom, what do you mean by the setting of 12.5%? I understand how you got there, just not where to put it. Do you mean Project resource share? I saw another post that said through the 'local menu', and I don't see that in my BOINC either, and I'm using ver 8.04. Yes, I always adjust the BOINC Manager manually, PC by PC, so if that's the 'local menu' then I still don't see where to put the 12.5% setting.
