Einstein disk usage growing

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Keith Myers wrote:

San-Fernando-Valley wrote:

Keith Myers wrote:

So Gary, why does my computer need so many data files?  It is not as if I have a huge cache of work.  I limit the amount of work to only 25 cpu tasks and only 125 gpu tasks at any time.

I am wondering:  Hasn't that been explained?

No, it hasn't been explained.  Why is my computer an outlier?  Bill only uses 1.1GB for his 101 tasks.

Why does my computer need 33X more space for 45 more tasks?

Please reread Gary's answer to Bill:

Gary Roberts wrote:
bill wrote:
... but I suspect that Einstein has old tasks ...
Absolutely NO! There are no "old tasks" stored on your computer. The space is occupied by data - BIG data! You may very well need that data for more tasks in the future. When the project deems that the data is really and truly finished with, the scheduler will issue a delete request.

The amount of data and the space it takes has nothing to do with the number of tasks you have in your cache at any given time. The task is actually only search parameters downloaded in the scheduler reply and inserted into client_state.xml.

A GW task needs a number of data files (10, 15, 20? I'm not sure for this search) and those data files can then be used to run many tasks. The scheduler tries to match tasks to the data files you already have to limit the number of new data files you have to download (and the server needs to send out).

As Gary wrote, the scheduler will send a message to delete data files that are no longer needed once all possible tasks that use them are done.

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767293330
RAC: 7116827

I understood the answer.  I just am unhappy that my hosts are outliers compared to the majority: they got 33X more variety of tasks than everyone else, which requires 33X more disk space.

Still does not solve my problem of running out of disk space.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3963
Credit: 47182942642
RAC: 65402641

it seems to be a consequence of having run GW tasks for a long time with leftover files that aren't needed by current work.

 

on my system that has been running GW tasks for several months, my einstein directory is like 26GB (20 tasks)

on my system that has been running GW only for a few weeks, the directory is only 4GB (14 tasks)

on my system that's only running GR tasks, the directory is <1GB (14 tasks)
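
if you want to check your own numbers, here's a rough Python sketch - the path is an assumption for a default Linux BOINC install, so adjust it for your setup:

```python
# rough sketch: total up the Einstein project directory and show free space
# on the partition holding it. the path is the default for a Linux
# BOINC install (boinc-client package) - adjust for your setup.
import os
import shutil

PROJECT_DIR = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"

total = 0
for root, _dirs, files in os.walk(PROJECT_DIR):
    for name in files:
        try:
            total += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass  # file vanished or unreadable - skip it

free = shutil.disk_usage(PROJECT_DIR).free
print(f"einstein data: {total / 1e9:.1f} GB, free on partition: {free / 1e9:.1f} GB")
```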

 


Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767293330
RAC: 7116827

Well I was getting bugged by the new reminder messages about switching projects to https.  So I removed and rejoined both Einstein and Milkyway with the new https addresses.

That solved the issue of how much space Einstein was using.  I still have to do something about resizing the partition, though, because the problem is going to come back.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117768095357
RAC: 34794211

Sorry for the delayed response to the further questions/comments that have arisen since my earlier comment.

It's late autumn here, the weather is nice and cool, and there are no storms about, so why there was a sudden and complete power outage is a bit of a mystery.  The power was off for around 2 mins and then it came back on.  It's taken quite a while to recover and get the farm back up and running again.

There were several disk corruptions that needed fixing and there are still 3 machines that refuse to even boot.  The nice thing with Linux is that it's very easy to take the disk from a non-booting machine and install it in something vaguely similar and have it boot up and process the outstanding work.  That's happening at the moment so there will be no loss of work resulting from the non-booting behaviour.  Having taken care of the cached tasks, I now have to spend some time to work out what is ailing those 3 machines and if the faulty hardware can be salvaged/repaired.

I'll take some of the questions/comments that have arisen in this thread and respond to them as best I can.  I emphasise that I'm not speaking on behalf of the project in any way.  They can answer for themselves if they wish.  You should take the following comments as just my personal opinion.  I do as much research and testing as I am able and I try not to give wrong information.  I'm just a volunteer like everyone else posting here but I do try to experiment and to understand the reasons for the observed behaviour.

Keith Myers wrote:
So Gary, why does my computer need so many data files?  It is not as if I have a huge cache of work.

As others have responded, the cache size is essentially irrelevant and has no significant bearing on the volume of stored data files.  There are several real factors, all of which compound.  I'll try to list those I understand.  Please ask if anything is not clear.

  1. Locality scheduling is now working with a hugely enlarged data set size.  Prior to the GPU app, the standard data set for a task had approx 6 h1_ files and 6 l1_ files (approx 12 in total).  With the initial GPU app, that had grown to something like 16-24 files in total.  With the current VelaJr tasks, I'm seeing numbers from 32 to 72 data files per single task.  The number can easily be seen by examining a <workunit> ... </workunit> block in the state file (see the sketch after this list).  Each required data file is listed separately in <file_ref> ... </file_ref> blocks within the workunit block.
  2. More data means more memory to store it while the task is running.  Ian&Steve C. has already pointed out to the Staff that, despite some attempts to get the server not to send tasks to GPUs that can't handle them, the process is not working properly.  There are huge numbers of failures resulting in resends.  Very recently, I watched a test machine of mine get well over 100 consecutive resends without a single primary (_0 or _1) task in the mix.  I kept upping the work cache size just to get a feel for how many resends there were.  Some of those resends had _4 and _5 extensions, indicating the existence of several earlier failures for each one.  If resends keep happening at these sorts of rates, how can the data for that particular frequency bin ever be finished with?
  3. For data files to be deleted, there must be some script/app that trawls the database to work out whether a particular data file is still needed by any workunit quorum that is not yet completed.  Either the staff haven't run that script/app recently, or perhaps there are very good reasons we don't know about as to why data file deletion is not appropriate at the moment.
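
For anyone who wants to check their own machine, here's a minimal sketch of the count mentioned in point 1.  It assumes client_state.xml parses as well-formed XML and that BOINC lives in the default Linux location - adjust the path for your own setup.

```python
# minimal sketch of the check in point 1: count the <file_ref> entries
# inside each <workunit> block of client_state.xml.  Assumes the state
# file parses as well-formed XML; the path is the default for a Linux
# BOINC install - adjust for your setup.
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"

root = ET.parse(STATE_FILE).getroot()
for wu in root.iter("workunit"):
    name = wu.findtext("name", default="?")
    if name.startswith("h1_"):       # GW workunits are named h1_<freq>_...
        print(name, len(wu.findall("file_ref")), "data files")
```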

There are probably other reasons.  The project would certainly be under pressure from all the new Seti arrivals and Bernd seems to have the entire job of dealing with all that.  I think we need to cut them some slack under the circumstances.  Seti shutting down probably came as a bit of a shock.  It has probably complicated things and robbed them of the time to do the usual maintenance things.

Keith Myers wrote:
No, it hasn't been explained.  Why is my computer an outlier?

It's not an outlier at all.  The space needed is time-dependent.  Bill mentioned 4GB on one machine.  His time crunching GW is probably quite short.  Yours is a lot longer, as is mine.  My test machine is currently storing 35GB.  I have a 120GB SSD and the partition holding BOINC is 90GB.  Partition sizes can be adjusted with GParted launched from a freshly booted live USB, without corrupting the installed system.  It's quite possible to shrink one partition and use the space released to grow/move another.  I've done that on quite a few machines without issue.

Ian&Steve C. wrote:
it seems to be a consequence of having run GW tasks for a long time with leftover files that aren't needed by current work

Exactly!  The size grows over time.  A project reset will give temporary relief if the space used must be reduced.  I'm aware that you (Ian&Steve C.) already fully understand all this.  For the benefit of the general readership, I'll try to explain why keeping data, if possible, is important.

It's also true that much of that stored data doesn't relate to current primary tasks, but it certainly applies to resends.  For the 100+ resend tasks I mentioned earlier (belonging to 6 different 'frequency bins'), there wasn't a single data file that needed to be downloaded.  I had processed primary tasks earlier (and had none left) and was therefore an easy candidate for the scheduler to choose when the resends turned up.  That right there is one obvious benefit of keeping data, if possible.

By resetting the project you remove yourself from the pool of compatible systems that can immediately take the copious resends.  If everyone did that, the scheduler (after trying for a while) would get sufficiently desperate to start picking hosts at random, perhaps giving them the 'Christmas present' of a download of 72 (or more) large data files for just a single resend task.  That will certainly be happening when a particular frequency bin has just the occasional resend remaining to be completed - i.e. towards the end of the whole VelaJr series.  At the moment, there are huge numbers of resends.  In the bunch I observed, there were lots of tasks for each different frequency bin.

If the disk space can be afforded, it's much more neighbourly to keep the data you already have so that the scheduler can more efficiently distribute the flood of resends.

Keith Myers wrote:
I understood the answer.  I just am unhappy that my hosts are outliers compared to the majority ...

Your hosts are no different to any others that have participated to the same extent for the same length of time.  There is nothing special about you that is causing you to be singled out :-).

On a completely different note, if anyone is interested in my experiments on the current VelaJr GW GPU task behaviour, there are some things I've discovered.  For the last three weeks, I've been closely monitoring a Ryzen 5 2600 machine with a 4GB RX 570 GPU.  Of the tasks processed over that time (~2.5K tasks), around 70% have been done at a multiplicity of x3 and around 30% have been done at x2.   None (to my recollection) have needed to be crunched as singles.  None have failed to start and crunch correctly.

I have developed a system that allows me to know in advance what will likely fit in the available memory, and therefore which tasks can be run at x3 as opposed to those that can't.  I choose to process the two categories separately.  Yes, this is micromanaging.  I'm choosing to do it on a single machine just to better understand the problem.

The basic technique I've been using is to gradually increment the work cache size until I have around 2.5-3 days' worth.  I then set the cache size to 0.05 to prevent further work.  I select out and suspend all tasks that will run at x3.  I set the multiplicity to x2 and process all the x2 tasks that remain.  When they are done, I set x3 and resume processing of the balance.  At a convenient point near the end of the x3 tasks, I rinse and repeat.  Of course, it's a bit more involved than that.
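
Here's a hypothetical sketch of how that sorting step might be scripted.  The data-file count is only a rough proxy for memory need and the threshold shown is just a placeholder to be calibrated by observation.

```python
# hypothetical sketch of the sorting step, using the number of data files a
# workunit references as a rough proxy for GPU memory need.  The threshold is
# a placeholder to be tuned by watching your own GPU, not a published value;
# the state file path is the default for a Linux BOINC install.
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"
MAX_FILES_FOR_X3 = 50      # placeholder - tune for your card's memory

root = ET.parse(STATE_FILE).getroot()
x3_group, x2_group = [], []
for wu in root.iter("workunit"):
    name = wu.findtext("name", default="?")
    if not name.startswith("h1_"):             # only look at GW work
        continue
    n_files = len(wu.findall("file_ref"))
    (x3_group if n_files <= MAX_FILES_FOR_X3 else x2_group).append(name)

# task names are the workunit name plus an _0/_1/... suffix, so these lists
# can be matched against what BOINC Manager shows when suspending/resuming
print("candidates for x3:", x3_group)
print("hold back for x2:", x2_group)
```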

One particular trick I use for x2 tasks is to combine any 'very high required memory' type of task with one having a much smaller memory requirement and process each such pair first.  It seems to work a treat.   I mentioned above a batch of 100+ resends.  Four of those are in the 'very high mem' category.  As I write, they are processing right now, each paired up with a lower mem counterpart and none are failing.  There is now just one to go.  The crunch times are surprisingly good - better than I expected.
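
And a similarly hypothetical sketch of the pairing idea, assuming some per-task memory estimate is available (the data-file count above would do as a crude stand-in):

```python
# hypothetical sketch of the pairing trick: order the x2-only tasks by an
# assumed memory estimate and pair the heaviest with the lightest so each
# concurrent pair stays within GPU memory.
def pair_heavy_with_light(mem_by_task):
    ordered = sorted(mem_by_task, key=mem_by_task.get)   # lightest ... heaviest
    pairs = []
    while len(ordered) >= 2:
        pairs.append((ordered.pop(), ordered.pop(0)))    # (heaviest, lightest)
    return pairs

# made-up file counts, purely for illustration:
print(pair_heavy_with_light({"h1_A": 72, "h1_B": 40, "h1_C": 66, "h1_D": 44}))
# -> [('h1_A', 'h1_B'), ('h1_C', 'h1_D')]
```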

I have started writing up the whole experiment in much more detail so as to properly document my findings.  I expect to be able to publish it in Crunchers Corner in a day or two.  It will be ready when it's ready :-).

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767293330
RAC: 7116827

That's a lot of micromanaging, Gary.  I have been accused of being a micromanager myself, but I would never go to the extent you are to babysit which tasks get loaded on which card at which multiplicity.

I know what I need to do with the disk partitions, but not how to do it.  I apparently don't have a typical partition scheme: the UEFI boot partition is sandwiched between the two Linux partitions.  I have to shrink the partition at the end of the disk, which should let me move the UEFI partition snug up against the start of that shrunken partition, and then grow the first (daily driver + BOINC) partition up to the new beginning of the UEFI partition.

But attempting to do that is not working.  The UEFI partition won't move.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3963
Credit: 47182942642
RAC: 65402641

I agree with Keith here. Unless you've scripted something to automate the process, that's just WAY too much work to manually do on a long term basis.

but I'd like to see more quantitative results on how fast the tasks actually run with 1x/2x/3x etc. I didn't see anything that showed they actually process faster by doing what you're doing.

it's possible that the AMD cards/apps handle multiple tasks better than the nvidia hardware, but I only observed slower run times (fewer WUs completed per unit time) by running 2x and 3x on my nvidia cards with the GW tasks. 1x produces the highest throughput.


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117768095357
RAC: 34794211

Ian&Steve C. wrote:
Unless you've scripted something to automate the process, that's just WAY too much work to manually do on a long term basis.

The original intention was to find reliable behaviour patterns that could be scripted and applied to a bunch of hosts.  Hence the need to experiment extensively on one particular candidate host.  What I'm doing manually now is actually quite manageable.  Once the x3 and x2 groups are decided on, the machine looks after itself until the changeover.  Mixing and matching particular tasks is only being done to really understand the factors involved.  For the last 24 hours, x3 tasks have been crunched without intervention.

Ian&Steve C. wrote:
but I'd like to see more quantitative results on how fast the tasks actually run with 1x/2x/3x etc. I didn't see anything that showed they actually process faster by doing what you're doing.

Each time I've tested this there has been a significant improvement on AMD RX 570 GPUs.  It's a while since I've checked, though, and the behaviour is frequency dependent to some extent.  The frequency is currently in the 1400-1500 range, so I've interrupted the steady x3 processing to get some fresh data for a higher frequency that I've not tested previously.  The tasks currently running are named h1_1447.95_....  They belong to the 1447.95 frequency bin and there are plenty of them to play with.  Because I download them in batches, I tend to get a whole lot of similar tasks, many with consecutive sequence numbers.

The last 3 groups of x3 tasks (9 tasks in total) took between 35 and 36.5 mins per group to crunch.  That's around 12 mins on a per-task basis.  These particular tasks were getting towards the upper limit (in terms of the number of data files each task references) for tasks that will reliably crunch at x3.  There were more tasks to follow referencing the same number of data files, so I allowed the next 3 in the batch to be crunched singly instead of at the intended x3.  The 3 consecutive times were 23.9, 22.8 and 22.3 mins respectively.  That's a total of 69 mins to crunch the three, as opposed to roughly 36 mins when run concurrently - a gain in throughput of a bit over 90%.

Whilst I was in the "micromanaging mood", I decided to do the same experiment for the batch of x2 tasks waiting in the wings.  They belonged to a different frequency bin - 1447.50Hz.  These happened to be just past what I regarded as 'safe' to run at x3, so I expected them to run fairly well at x2.  I ran a concurrent pair first and they both completed in very much the same time of 28.1 mins - so just over 14 mins per task.  I ran the very next 2 in the series as singles.  One took 23.2 mins and the other 23.6 mins.  So once again a big improvement from running the pair concurrently (46.8 mins worth of single-task crunching done in 28.1 mins, a gain of around 65%), but not as big an improvement as I see when tasks can reliably run at x3.
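
To put those times in throughput terms, here's just the arithmetic on the numbers quoted above - nothing measured beyond those times.

```python
# arithmetic only, using the run times quoted above (RX 570, ~1447-1448 Hz tasks)
def tasks_per_hour(n_tasks, minutes):
    return n_tasks / (minutes / 60.0)

print("x3:", tasks_per_hour(3, 36.0))                          # ~5.0 tasks/hour
print("x1 (same bin):", tasks_per_hour(3, 23.9 + 22.8 + 22.3)) # ~2.6 tasks/hour
print("x2:", tasks_per_hour(2, 28.1))                          # ~4.3 tasks/hour
print("x1 (x2 bin):", tasks_per_hour(2, 23.2 + 23.6))          # ~2.6 tasks/hour
```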

Ian&Steve C. wrote:
it's possible that the AMD cards/apps handle multiple tasks better than the nvidia hardware, but I only observed slower run times (less WU completed per unit time) by running 2x and 3x on my nvidia cards with the GW tasks. 1x produces the highest throughput.

I don't use nvidia at all so I can't comment.  A conspiracy theorist would declare that nvidia cripples OpenCL performance to make it look bad in comparison to CUDA.  I don't know about that, but I've always found I get better results from this project with AMD cards that actually cost quite a bit less than their nvidia equivalents.  Australia seems to cop a pretty big 'nvidia tax', which makes it hard for me to justify using them.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18767293330
RAC: 7116827

Quote:
I don't use nvidia at all so I can't comment.  A conspiracy theorist would declare that nvidia cripples OpenCL performance to make it look bad in comparison to CUDA.

I think it probably boils down to Nvidia locking the OpenCL API at level 1.2 for compatibility reasons, while AMD is constantly pushing the envelope with OpenCL 2.0 and now 2.1.

Curious to see the fallout of the decision for OpenCL 3.0 to revert to OpenCL 1.2 baseline for all vendors and leave it up to the vendors to implement specific advanced instructions.

Did you see the Khronos Group announcement Gary?
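
If anyone wants to see what their own driver reports, here's a quick sketch using pyopencl, assuming that package is installed (clinfo shows the same information from a terminal):

```python
# quick check of what OpenCL version each driver actually reports, assuming
# the pyopencl package is installed (clinfo prints the same from a terminal)
import pyopencl as cl

for platform in cl.get_platforms():
    print(platform.name, "-", platform.version)
    for device in platform.get_devices():
        print("   ", device.name, "-", device.version)
```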

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117768095357
RAC: 34794211

Keith Myers wrote:

Curious to see the fallout of the decision for OpenCL 3.0 to revert to OpenCL 1.2 baseline for all vendors and leave it up to the vendors to implement specific advanced instructions.

Did you see the Khronos Group announcement Gary?

I presume you're referring to the 27th April "provisional 3.0 spec" announcement.  Does "provisional" mean it's not quite decided yet? :-)  Also, by "vendors", I presume you mean hardware manufacturers?   That might lead to cheaper hardware limited to 1.2 compliance only, with the more specialised capabilities not included in those cheaper models.

It actually sounds like it might be useful for us.  I'm guessing that most OpenCL use relevant to BOINC projects only needs 1.2 anyway.   If you don't need the 'extras', why should you add unnecessary bloat to your system?

As I interpreted the announcement, it's not a matter of "reverting" but more one of "modularising".  If you need extra functionality, add it in the form of additional modules.  A lot of those should already exist with what has been developed over the years since 1.2 was set as the standard.  Instead of some monolithic 2.x or 3.0 package, you could choose the 1.2 base package and add modules as needed.

I imagine this might make it easier to debug both the library modules themselves and the apps that need the extra functionality contained in those modules.  Of course, that's just speculation since I've never tried delving into any of that stuff.  I'm not really concerned as long as the app I want to use will run efficiently.

Cheers,
Gary.
