Thu Oct 11 09:46:42 2001
I now have another question about caching, and I'm wondering how this works in 2k, Linux, whatever.
In short, the argument goes like this:
The 8 MB cache on the WD 1000BB is supposed to give this drive a great performance boost over any other ATA drive; in fact, it supposedly makes it even faster than X15 SCSI drives, since the WD algorithms are optimized to take advantage of the 8 MB of cache on the new drive.
I have no clue how the OS and the algorithms in the hard drive's firmware interrelate, or how the firmware uses the drive cache. Could someone explain what happens?
Thanks in advance, Pizza.
Socrates
Thu Oct 11 12:21:11 2001
Thu Oct 11 14:34:02 2001
Caches can be used for two things: speeding up reads and speeding up writes.
Speeding up writes is fairly self-explanatory -- the cache operates faster than the drive mechanism, so short writes can be accepted at the full speed of the interface rather than that of the mechanism; the drive then commits them to disk at its leisure. (This assumes a write-back cache; often, especially in RAIDs, you change the cache to write-through behaviour, where it writes to the disk straight away, to ensure data integrity.) So for writes (and it shouldn't really matter whether they're sequential or random), a larger cache means a faster disk.
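The gap between the two policies can be sketched with a toy latency model (the timings below are invented for illustration, not measured from any real drive):

```python
# Hypothetical timings, in microseconds (assumptions, not real drive specs).
INTERFACE_US = 10      # moving a short write over the bus into the cache
MECHANISM_US = 5000    # seek + rotation + media write for the same data

def write_latency(policy):
    """Latency the host sees for one short write, under each cache policy."""
    if policy == "write-back":
        # Acknowledged once the data is in cache; the mechanism catches
        # up later, off the critical path.
        return INTERFACE_US
    if policy == "write-through":
        # Acknowledged only after the mechanism has committed the data,
        # which is what RAID controllers force for integrity.
        return INTERFACE_US + MECHANISM_US
    raise ValueError(policy)

print(write_latency("write-back"))     # 10
print(write_latency("write-through"))  # 5010
```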
This is why RAID controllers often have large caches: they can cover up RAID 5's slower write times. (RAID 5 writes are slower because changing even a single byte requires updating a minimum of two spindles, the one holding the data and the one holding the parity, which lengthens average seek times.) Writes go to the (normally battery-backed) controller cache rather than to the drives (which have their own caches set to write-through).
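The parity arithmetic behind that write penalty is plain XOR; a small sketch (block contents chosen arbitrarily) shows why updating one data block needs the old data and the old parity as well:

```python
def xor_blocks(a, b):
    """XOR two equal-length blocks of bytes."""
    return bytes(x ^ y for x, y in zip(a, b))

# A 3-disk RAID 5 stripe: two data blocks and one parity block.
d0 = bytes([0x11] * 4)
d1 = bytes([0x22] * 4)
parity = xor_blocks(d0, d1)

# To rewrite d0 without reading the rest of the stripe:
#   new_parity = old_parity XOR old_d0 XOR new_d0
# which is why the controller must read old data + old parity, then
# write new data + new parity (two reads, two writes, two spindles).
new_d0 = bytes([0x99] * 4)
new_parity = xor_blocks(xor_blocks(parity, d0), new_d0)

# The incrementally updated parity matches a full recompute.
assert new_parity == xor_blocks(new_d0, d1)
print(new_parity.hex())  # bbbbbbbb
```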
With reads, the situation is a bit more complicated. If there were no other factors coming into play, the read performance would be limited by the drive mechanism. However, there are a few things that make this no longer the case.
All transfers have to go through the cache (both read and write) -- it's also used as a buffer joining the high-speed SCSI bus to the lower-speed internal bus. If the cache is empty, no read can occur; if the cache is full, no write can occur.
Drive controllers can't burst unlimited amounts of information over their bus. Each transfer has a fixed upper limit (and it's not that big, of the order of a few hundred kilobytes). In between these transfers, the drive is inactive.
Further, programs often don't read large sequential blocks of data as fast as possible. They'll tend to read a bit of a file, then process that, then read a bit more, then process it, and so on.
So on-drive caches can be used to take advantage of these things.
Making the cache bigger reduces the situations where a read or write can't take place because the cache is empty or full (respectively).
When reading, rather than only reading enough to satisfy a single transfer, they can read until their buffer is filled. If they then get another transfer that's in the cache already, they can service it from the cache -- so they get better read performance. There are various algorithms for calculating how best to read ahead. The OS does a lot of work in this area too, and will direct the drive to read bits of a file that the user hasn't yet requested, if it thinks there's a good chance that the user /will/ request them.
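A minimal sketch of the read-ahead idea (the four-block depth is an arbitrary stand-in for the drive's real heuristics, and eviction is omitted for brevity):

```python
# Toy read-ahead: a miss costs one mechanical access but pulls in the next
# few blocks too, so a sequential reader mostly hits in cache afterwards.
READ_AHEAD = 4  # assumed read-ahead depth, in blocks

class ReadAheadCache:
    def __init__(self):
        self.blocks = set()
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
            # One mechanical access brings in the block plus the next few.
            self.blocks.update(range(block, block + READ_AHEAD))

cache = ReadAheadCache()
for b in range(16):              # a sequential scan of 16 blocks
    cache.read(b)
print(cache.misses, cache.hits)  # 4 12
```

Three out of every four sequential requests are serviced from cache; only every fourth has to wait for the mechanism.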
Another way the cache can be optimized is by partitioning it. If you have, say, a 4 Mbyte cache, and you're doing long sequential reads, you might prefer to have two ~2 Mbyte chunks. Because each transfer must be serviced by the cache -- and by a single segment -- larger segments permit larger single transfer operations (you can transfer ~2 Mbytes at a time; if the cache were split into 4x ~1 Mbyte, you'd only be able to transfer ~1 Mbyte at a time). This would favourably improve your data:SCSI arbitration ratio.
On the other hand, if you were doing lots of random access, with smaller reads, you might prefer more segments. Each segment contains an amount of "context" information (describing where it's located on the disk), and more segments mean that you can cache data from more areas on the disk (albeit in smaller amounts).
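That trade-off can be caricatured in a few lines (segment behaviour here is drastically simplified; real firmware heuristics are much cleverer):

```python
from collections import OrderedDict

CACHE_BLOCKS = 8  # total cache size in blocks (arbitrary)

def hits(requests, segments):
    """Count hits with the cache split into `segments` LRU segments,
    each caching a contiguous run starting at the block that missed."""
    seg_size = CACHE_BLOCKS // segments
    lru = OrderedDict()  # segment start block -> placeholder
    count = 0
    for blk in requests:
        start = next((s for s in lru if s <= blk < s + seg_size), None)
        if start is not None:
            count += 1
            lru.move_to_end(start)       # mark segment recently used
        else:
            lru[blk] = None              # new segment anchored at the miss
            if len(lru) > segments:
                lru.popitem(last=False)  # evict least-recently-used segment
    return count

sequential = list(range(8))
scattered = [0, 100, 200, 300] * 5       # four hot spots, far apart

print(hits(sequential, segments=2), hits(sequential, segments=4))  # 6 4
print(hits(scattered, segments=2), hits(scattered, segments=4))    # 0 16
```

Fewer, larger segments win on the sequential run; more, smaller segments win once the accesses scatter across the disk.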
So WD presumably tune this kind of thing depending on how you use the disk. They have smart read-ahead algorithms (you read ahead differently if the file is randomly accessed than if it's sequentially accessed), and they have smart partitioning mechanisms.
The OS does a level of caching above this, and makes similar decisions. Some OSes (e.g. Windows) let you hint how you're going to use the file when you open it (you can specify whether to optimize the caching for sequential transfers or for random access), which lets them pick a caching mechanism appropriate to what you're doing. The OS has a larger cache to play with (it uses system memory), but does much the same thing.
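On Unix-like systems the equivalent hint is posix_fadvise; this sketch (using a throwaway temp file) just shows the shape of the call, not any measurable effect:

```python
import os
import tempfile

# Create a scratch file to advise on.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 4096)
    # offset=0, length=0 means "the whole file". SEQUENTIAL asks the kernel
    # to read ahead aggressively; RANDOM tells it read-ahead would be wasted.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
finally:
    os.close(fd)
    os.unlink(path)
```

(On Windows the same idea is expressed at file-open time, via the sequential/random hints the post mentions.)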
Thu Oct 11 20:56:04 2001
How do caching algorithms and firmware work with different OSes?
Could you write a firmware package for, say, XP, that would optimize the disk cache for that OS, say for average usage by a workstation?
If you do this, do you sacrifice performance in *nix-based OSes?
How do the different firmware packages interact with the different OSes? When Seagate makes a Cheetah, what determines which OS they optimize the firmware for?
*nix, 2k, XP, etc.?
Can they do this, or is it beyond the capabilities of the firmware to optimize for a certain OS?
In other words, if you look at the seek patterns of the different OSes, could you optimize the firmware to read ahead in a pattern that's consistent with 2k, and how would that differ from Linux?
Thanks again, for all your help,
Socrates
Thu Oct 11 23:28:37 2001
Why haven't they, in these days of cheap RAM?
Socrates
Fri Oct 12 00:52:04 2001
from Socrates posted at 9:56 pm on Oct. 11, 2001
Ok: Thanks, that really helped.
How do caching algorithms, firmware, work with different OSes?
Could you write a firmware package for, say, XP, that would optimize the disk cache for that OS, say for average usage by a workstation? If you do this, do you sacrifice performance in *nix-based OSes?
How do the different firmware packages interact with the different OSes? When Seagate makes a Cheetah, what determines which OS they optimize the firmware for?
Some drives (e.g. some IBM drives) let you specify how the cache is carved up -- so you can pick, say, 4 x ~1 Mbyte or 2 x ~2 Mbyte or whatever -- depending on what you're going to use the drive for.
But in principle, the drive should work it out for itself, and will have some clever heuristic for doing so. Or, it'll just pick a good set of defaults and stick with them.
*nix, 2k, XP, etc.? Can they do this, or is it beyond the capabilities of the firmware to optimize for a certain OS?
In other words, if you look at the seek patterns of the different OSes, could you optimize the firmware to read ahead in a pattern that's consistent with 2k, and how would that differ from Linux?
Another question. This WD is probably using the 8 MB cache to store operating functions. Couldn't Windows do the same thing, by increasing the Largecachefile setting from 4 MB to something like 16 MB? Why haven't they, in these days of cheap RAM?
Fri Oct 12 01:16:45 2001
For a workstation, you don't have much but OS functions and the occasional pagefile hit.
My system, as a workstation, just loads the files of the programs I use, and then, unless a program calls for pagefile memory, everything just sits there.
Right now my memory-usage history looks like I have a dead computer, with almost no CPU usage, 1-2%.
So these IPEAK tests and such, maybe they call for data that is likely to be stored in the buffer, but what would that be, if the OS is pretty much in RAM?
Perhaps they have segmented the buffer into many very small partitions, ensuring that any system call will be either in the buffer or in the cache?
Don't know..
Socrates
seeking wisdom
Fri Oct 12 01:54:12 2001
from Socrates posted at 2:16 am on Oct. 12, 2001
OK:
If 2k is currently holding 12700k of physical memory for system cache (according to Task Manager), then what is being stored in the buffer that would have such a large impact on performance?
But in that situation, it won't have a huge impact on performance -- only when doing something with the disk.
For a workstation, you don't have much but OS functions and the occasional pagefile hit. My system, as a workstation, just loads the files of the programs I use, and then, unless a program calls for pagefile memory, everything just sits there.
Right now my memory-usage history looks like I have a dead computer, with almost no CPU usage, 1-2%.
So these IPEAK tests and such, maybe they call for data that is likely to be stored in the buffer, but what would that be, if the OS is pretty much in RAM?
The disk cache will contain the last read/written data.
The OS cache will contain those files that were last read/written; it's just that it can store more of them.
Perhaps they have segmented the buffer into many very small partitions, ensuring that any system call will be either in the buffer or in the cache?
The details of the disk buffer aren't [generally] controllable by the OS -- it's possible, I think, that one could write some way of controlling the disk buffer through software, but its configuration should be considered static.
If your system isn't hitting the disk that often -- that is, heavy I/O is limited to, say, OS boots, and program loads -- then the disk speed isn't a critical factor in the performance; it just doesn't use the disk all that much (particularly not if you rarely reboot and rarely quit applications). A fast disk will speed up those I/O-limited tasks -- but they're sufficiently rare that it's probably not worth the money.
If, however, you do lots of disk I/O (for instance, using databases, or video files, or are extremely RAM limited so do lots of paging) then speeding the disk is more valuable, as it becomes a serious issue.
Fri Oct 12 01:56:48 2001
But, with just 128 MB of RAM, the system can't take all the physical memory it wants for system cache. So perhaps the high scores of the WD are due to the buffer caching system functions that, on machines with larger RAM allocations, would be in the physical-memory system cache.
Perhaps the buffer picks up either kernel functions or OS functions that, on a system with sufficient RAM, would be in the system cache.
So rather than the drive being all that fast, it's just that the test machine is really susceptible to being influenced by the drive's cache size and excellent algorithms.
Perhaps the selection of 128 MB of RAM makes the machine highly susceptible to drive-cache influences, since many of the OS or kernel functions are now in the buffer.
Perhaps a better reflection of the performance of the drive would be a system with 64 MB of ram.
Since 32 MB would be about the total size of the kernel on 2k, perhaps that would be the best size to test with.
From these observations, I strongly suspect that if you take the WD drive and put it in a system with 384-512 MB of RAM, the caching functions that make for such a huge increase in performance under IPEAK and WinBench would be superseded by the system using physical RAM for system cache.
Is this what you are saying, Davin and Eugene, with the following quote:
"On the other, with today's high-level benchmarks, 256 megs of RAM would drown out a significant amount of the hard disk's contribution towards performance, perhaps excessively so. We eventually decided on just 128 megs. "
Pizza: Am I close?
Socrates
Fri Oct 12 03:38:23 2001
On a machine with little RAM, it becomes more likely that the OS won't cache something it needs, so will have to go to the drive -- which might (a) be already caching what it needs, and (b) will have higher burst reads thanks to the cache.
But with ample RAM, it's more likely that any read will be serviced from a file already in RAM. Burst performance will still be improved by the drive's cache, but the random access that crops up from normal usage (which is dominated by seek times, not transfer rates) won't receive the same kind of advantage (comparing large disk cache to small disk cache), because it just won't have to hit the disk so often. By reducing this kind of I/O -- seek-intensive, with small transfers -- you'll generally get far more performance than by making bursts much faster.
If you're benching real-world performance of the drive, you'll tend to get more differentiation between the drives if you use a machine with relatively little memory. It will page to disk more (which means more disk I/O) and it will cache less, so the faster disk will win doubly. Paging is the kind of thing where the cache will probably help more, as it tends to be somewhat sequential.
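A crude two-level model (all sizes invented; both caches are naive lists) shows how more RAM starves the drive's cache of hits:

```python
import random

DISK_CACHE_BLOCKS = 64  # fixed on-drive cache size, in blocks (arbitrary)

def disk_cache_hits(os_cache_blocks, working_set=256, n_requests=10_000):
    """Count reads serviced by the drive's cache; the OS cache is checked first."""
    rng = random.Random(1)         # fixed seed for repeatability
    os_cache, disk_cache = [], []  # LRU list, FIFO list
    hits = 0
    for _ in range(n_requests):
        blk = rng.randrange(working_set)
        if blk in os_cache:        # serviced from RAM: the drive never sees it
            os_cache.remove(blk)
            os_cache.append(blk)   # LRU touch
            continue
        if blk in disk_cache:      # OS miss, drive-cache hit
            hits += 1
        else:
            disk_cache.append(blk)
            if len(disk_cache) > DISK_CACHE_BLOCKS:
                disk_cache.pop(0)
        os_cache.append(blk)
        if len(os_cache) > os_cache_blocks:
            os_cache.pop(0)
    return hits

starved = disk_cache_hits(os_cache_blocks=16)   # RAM-starved test rig
roomy = disk_cache_hits(os_cache_blocks=256)    # working set fits in RAM
print(starved, roomy)
assert starved > roomy  # with ample RAM, the drive cache barely matters
```

With the whole working set in RAM, the drive's cache never gets a look-in; on the starved machine it services a large share of the reads.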
(Edited by DrPizza at 4:44 am on Oct. 12, 2001)
Fri Oct 12 03:56:19 2001
Sincerely,
Socrates
Fri Oct 12 04:17:00 2001
Use 32 MB of RAM and set the pagefile high, encouraging the system to page? This would put a premium on cache size and function?
Still, I don't see a perfect solution.
Any ideas?
Socrates
Fri Oct 12 06:07:43 2001
If it's theoretical performance, there's no problem -- you can tell Windows not to use its own buffering at all, so it gets bottlenecked solely on the drive/interface. I would imagine that any competent drive performance benchtest will do this.
If it's real-world performance, well... if you want to test true real-world performance, you have to give the machine a reasonable amount of RAM and so on (in which case many requests will be serviced from RAM anyway). The situation is such that realistic real-world tests won't highlight the drive's performance, except for very specific tests (e.g. boot speed, hibernate speed, video capture performance, etc.) that are really I/O intensive. If you really cripple the machine you'll stress the drive more, but it's not real world because most machines now have an adequate amount of memory.
I would be inclined to do realistic real-world tests (even if it won't stress the drives that much), and then synthetic benchmarks (which, if properly written, will take measures to ensure that OS caching is turned off for their test dataset).
Fri Oct 12 07:12:16 2001
They have managed to pick a RAM size that is hypersensitive to a certain drive characteristic: buffer size.
It makes the tests worthless as drive tests, since they skew in favour of the cache rather than drive performance, and in a real-world situation there would be enough RAM to diminish the drive cache's impact. Weird.
Guess you should be designing their test beds, because they think they are making great choices.
I have to really take the time to thank you for the effort you have put into educating me. Considering our prior conflicts on Ars, I'm not sure that, if I were in your position, I would be so helpful. I'm amazed and thankful that we can have these dialogues, and I really value your knowledge and expertise.
Thanks again
Socrates