[BBLISA] accounting for I/O
Daniel Feenberg
feenberg at nber.org
Thu Sep 1 15:05:14 EDT 2016
Apparently heavy random I/O overloaded our fileserver last week, and
response times were very slow. We solved the problem by adding spindles,
but we are curious to know which process was doing the random I/O. Perhaps
we could approach that user with an offer to help improve their turnaround
time by changing the code. Our users are mostly inexperienced students, so
suboptimal code is certainly a possibility. Most usage is sequential
access to very large files, which does not load the fileserver much at
all, so this has been a new experience for us.
We can easily track bytes/second, but a process doing random I/O may move
very few bytes/second while still occupying much of the fileserver's
capacity, so identifying the processes doing the most reads and writes
(by volume) hasn't been fruitful. During the period of overload, few disks
showed read or write throughput beyond the kilobytes/second range, yet
iostat revealed that several disks were continuously at 100% utilization.
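To make the mismatch concrete, here is a minimal sketch, assuming the
fileserver runs Linux, that derives roughly what iostat reports
(operations per second, KB/s, average request size and percent busy)
from two snapshots of /proc/diskstats. The function names, the 5-second
interval and the 50% threshold are illustrative choices, not part of any
existing tool.

```python
"""Sketch: per-disk IOPS, throughput and busy % from two /proc/diskstats snapshots.

Assumes the Linux /proc/diskstats layout; sector counts there are always
in 512-byte units. Only the first 14 fields are used, which are the same
on older and newer kernels.
"""
import time

SECTOR_BYTES = 512  # /proc/diskstats reports 512-byte sectors


def parse_diskstats(text):
    """Return {device: (ios_completed, sectors_transferred, io_ticks_ms)}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or malformed lines
        name = parts[2]
        reads, sectors_read = int(parts[3]), int(parts[5])
        writes, sectors_written = int(parts[7]), int(parts[9])
        io_ticks_ms = int(parts[12])  # milliseconds the device had I/O in flight
        stats[name] = (reads + writes, sectors_read + sectors_written, io_ticks_ms)
    return stats


def disk_load(before, after, interval_s):
    """Compare snapshots taken interval_s seconds apart.

    Returns {device: (iops, kb_per_s, avg_kb_per_io, busy_pct)}.
    """
    result = {}
    for name, (ios1, sec1, ticks1) in after.items():
        if name not in before:
            continue  # device appeared between snapshots
        ios0, sec0, ticks0 = before[name]
        ios = ios1 - ios0
        kb = (sec1 - sec0) * SECTOR_BYTES / 1024
        busy_pct = 100.0 * (ticks1 - ticks0) / (interval_s * 1000)
        avg_kb = kb / ios if ios else 0.0
        # Clamp: sampling jitter can push io_ticks slightly past the interval.
        result[name] = (ios / interval_s, kb / interval_s, avg_kb, min(busy_pct, 100.0))
    return result


def main(interval_s=5, min_busy_pct=50.0):
    """Print disks that were busy for at least min_busy_pct of the interval."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    for dev, (iops, kbps, avg_kb, busy) in sorted(disk_load(first, second, interval_s).items()):
        if busy >= min_busy_pct:
            print(f"{dev}: {iops:.0f} IO/s, {kbps:.0f} KB/s, {avg_kb:.1f} KB/IO, {busy:.0f}% busy")
```

For example, if a disk completes 1500 reads of 8 sectors each (4 KB)
over 10 seconds and has I/O in flight the whole time, disk_load reports
150 IO/s, 600 KB/s, 4 KB per request and 100% busy. Assuming a single
rotating disk tops out somewhere in the low hundreds of random requests
per second, that is a saturated disk moving well under 1 MB/s.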
A program such as iostat will tell us which physical disk is busy, lsof
will tell us which file is open by which process, and netstat and nfsstat
will give aggregate statistics over all processes, but I can't find a
program that will tell us which process is occupying the fileserver's
attention with expensive requests.
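One partial approach, assuming the jobs run on Linux machines, is the
per-process counters in /proc/<pid>/io: syscr and syscw count read and
write system calls, and rchar and wchar count the bytes those calls
moved. The sketch below ranks processes by calls per second and bytes
per call. It is a rough heuristic, not an existing tool, and the function
names are hypothetical. Many calls with few bytes each is consistent with
small random I/O, but the counters also include pipes and sockets, record
no file offsets (so they cannot distinguish random from sequential
access), must be collected on the machines where the jobs run rather than
on the fileserver, and typically need root to read other users'
processes.

```python
"""Sketch: rank processes by read/write system calls per second via /proc/<pid>/io.

Heuristic only: syscr/syscw also count reads and writes on pipes and
sockets, and nothing here observes seeks or file offsets.
"""
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)  # int() tolerates the leading space
    return fields


def snapshot():
    """Return {pid: io_fields} for every process we are allowed to read."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                result[int(entry)] = parse_proc_io(f.read())
        except OSError:  # process exited, or permission denied
            continue
    return result


def rank_by_calls(before, after, interval_s):
    """Return [(calls_per_s, bytes_per_call, pid)], busiest first."""
    rows = []
    for pid, now in after.items():
        then = before.get(pid)
        if then is None:
            continue  # process started between snapshots
        calls = (now["syscr"] - then["syscr"]) + (now["syscw"] - then["syscw"])
        nbytes = (now["rchar"] - then["rchar"]) + (now["wchar"] - then["wchar"])
        if calls:
            rows.append((calls / interval_s, nbytes / calls, pid))
    return sorted(rows, reverse=True)


def main(interval_s=5, top=10):
    """Print the processes issuing the most read/write calls per second."""
    first = snapshot()
    time.sleep(interval_s)
    second = snapshot()
    for rate, per_call, pid in rank_by_calls(first, second, interval_s)[:top]:
        print(f"pid {pid}: {rate:.0f} calls/s, {per_call:.0f} bytes/call")
```

A process streaming a large file in 1 MB reads would show a handful of
calls per second at about a megabyte each, while one doing 4 KB reads
scattered through a file would show hundreds of calls per second at 4096
bytes each; the second kind is the one worth a closer look with lsof.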
We couldn't replace all the disks with SSDs, but we might be able to
provide SSDs for some files if we could identify the culprits.
Daniel Feenberg