Mar 12, 2019
Abstract
At FP Complete, we strive to build systems that endure the
direst of situations. An unexpected shutdown (like a kernel panic,
or unplugging the power cord) in a machine should not affect the
durability of confirmed writes in programs we develop.
As a developer, you'll likely want options with regard to
guaranteed durable writes; durability is the property that ensures
that once an API confirms a write, every subsequent read will
reflect its changes, even after a crash.
Durable writes are essential for our customers; many of them
work in the financial and medical tech industries, and any saved
piece of data (whether in a filesystem or a database) must have
durability guarantees after we perform a write operation.
Otherwise, companies lose money and lives may be at risk.
In this blog post, we will demonstrate that, although our
high-level Haskell APIs tell us that they persisted our file writes
in the filesystem, a catastrophic failure like an unexpected
shutdown may cause us to lose writes that we thought were
committed. We will then demonstrate how using low-level C APIs
(the ones popular RDBMSs use) offers better guarantees of
write durability; finally, we’ll show you some prior art we
have implemented in the rio package, which you can use today to
get durable file writes.
Status Quo of File APIs in Haskell
When performing writes in Haskell programs, we often rely on
functions like writeFile or withBinaryFile in our Haskell codebase;
these functions return as soon as the OS kernel confirms the writes
happened. However, the kernel does not typically store writes on
physical disks right away. Instead, it stores file writes in a cache
buffer first, which helps to improve runtime performance of writes.
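As a minimal sketch of this status quo (the function name writeNonDurable is hypothetical, used only for illustration), the write below returns once the handle has been flushed and closed. Nothing in it asks the kernel to push its cache to the physical disk.

```haskell
import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- | Write a message with the standard base API (hypothetical helper).
-- When this returns, GHC has flushed and closed the handle, but no fsync
-- has been issued, so the bytes may still live only in the kernel's cache.
writeNonDurable :: FilePath -> String -> IO ()
writeNonDurable path msg =
  withBinaryFile path WriteMode $ \h -> hPutStr h msg
```

Reading the file back right away will succeed, since reads are served from the same cache; the risk only shows up if the machine goes down before the kernel writes the data out.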
Often, this behavior is acceptable. Not every piece of data must
be durable, and if you are dealing with large files, making them
durable by default might be an expensive operation. That said, the
durability aspect of filesystem writes must be a conscious decision
rather than an afterthought.
Improving write durability in Haskell
How can we improve the durability situation in our Haskell APIs?
First, we need to make use of C functions that will sync writes to
the file system. Once writes have been performed to a Handle, we
need to be able to call the lower-level function fsync on the
Handle’s internal file descriptor. However, an fsync on the file
Handle alone won’t do. We must also call fsync on the file
descriptor of the containing directory, to sync the entry for our
file in the filesystem metadata. A great reference to learn
interesting and important details about fsync is Xavier Roche’s
excellent blog post; it documents what behavior you should expect
on different operating systems and filesystem formats. (You should
check it out, it’s cool, we’ll wait here.)
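The two fsync calls described above can be sketched with nothing but base. This is not the rio implementation: the names c_fsync, fsyncHandle, fsyncDirectory, and writeDurably are hypothetical. The sketch assumes a POSIX system and a base version that exports handleToFd from GHC.IO.Handle.FD, and it leans on the internal module System.Posix.Internals, whose interface may change between GHC releases.

```haskell
import Control.Exception (bracket)
import Foreign.C.Error (throwErrnoIfMinus1, throwErrnoIfMinus1_)
import Foreign.C.Types (CInt (..))
import GHC.IO.FD (fdFD)
import GHC.IO.Handle.FD (handleToFd)
import System.IO (Handle, IOMode (WriteMode), hFlush, hPutStr, withBinaryFile)
import System.Posix.Internals (c_close, c_open, o_RDONLY, withFilePath)

-- Direct binding to the C fsync(2) call (hypothetical name c_fsync).
foreign import ccall safe "fsync"
  c_fsync :: CInt -> IO CInt

-- | Flush GHC's own buffer, then fsync the descriptor behind the Handle.
fsyncHandle :: Handle -> IO ()
fsyncHandle h = do
  hFlush h
  fd <- handleToFd h
  throwErrnoIfMinus1_ "fsync (file)" (c_fsync (fdFD fd))

-- | Open a directory read-only, fsync it, and close it again.
fsyncDirectory :: FilePath -> IO ()
fsyncDirectory dir =
  withFilePath dir $ \cDir ->
    bracket
      (throwErrnoIfMinus1 "open (directory)" (c_open cDir o_RDONLY 0))
      c_close
      (\dirFd -> throwErrnoIfMinus1_ "fsync (directory)" (c_fsync dirFd))

-- | Write a file, sync its contents, then sync the directory entry.
writeDurably :: FilePath -> FilePath -> String -> IO ()
writeDurably dir name msg = do
  withBinaryFile (dir ++ "/" ++ name) WriteMode $ \h -> do
    hPutStr h msg
    fsyncHandle h
  fsyncDirectory dir
```

The order matters in this sketch: the file's contents are synced before the handle is closed, and the directory is synced last so that the file's name is persisted as well. A call such as writeDurably "." "report.txt" "hello" would exercise both steps.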
Developers at FP Complete implemented a new family of functions
that use the strategies mentioned above in the rio
library:
withBinaryFileDurable
withBinaryFileDurableAtomic
ensureFileDurable
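As a rough usage sketch, assuming the rio package is installed and that these functions are imported from its RIO.File module, the three functions could be called as follows. The wrapper names saveDurably, saveAtomically, and syncExisting are hypothetical and exist only for illustration.

```haskell
import RIO.File
  ( ensureFileDurable
  , withBinaryFileDurable
  , withBinaryFileDurableAtomic
  )
import System.IO (IOMode (WriteMode), hPutStr)

-- | Durable write: same shape as withBinaryFile, but the file and its
-- containing directory are synced before the call returns.
saveDurably :: FilePath -> String -> IO ()
saveDurably path msg =
  withBinaryFileDurable path WriteMode $ \h -> hPutStr h msg

-- | Durable and atomic write: as we understand the rio documentation, an
-- interrupted write should leave either the old or the new contents.
saveAtomically :: FilePath -> String -> IO ()
saveAtomically path msg =
  withBinaryFileDurableAtomic path WriteMode $ \h -> hPutStr h msg

-- | Sync a file that some other code has already written and closed.
syncExisting :: FilePath -> IO ()
syncExisting = ensureFileDurable
```

Because the durable variants keep the argument order of withBinaryFile, switching an existing call site over is mostly a matter of changing the function name.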
To implement these functions, we used internal APIs from
GHC.IO.Handle.FD and System.Posix.Internals. We found a few
constraints in the existing GHC API that are worth mentioning:
A Handle cannot be built from directory paths
This limitation makes total sense, as we can only open
directories in ReadMode, and the Handle API doesn’t make sense for
directory operations. We made the C file descriptor of directories
an internal implementation detail of the high-level functions
exported by the rio library, so the way API users deal with files
stays unaffected.
There are no fsync and openat foreign imports in the GHC filesystem API
In our module, we added a few foreign imports to accommodate
lower-level APIs for reliable writes. We use a combination of C and
internal types from the GHC API to offer a high-level API for users
of our library.
You can take a look at the source code if you are curious.
We are open to feedback, as the usage of low-level APIs can get
somewhat tricky at times. We hope that at some point in the
future we can make these functions (or a version of them) part of
the standard Haskell base library, so that users of the rio library
are not the only ones who can take advantage of this functionality.
Testing the durability of our file system
Being aware of the durability aspects of our filesystem is nice
and dandy, but should we really be concerned? Is this
issue something that could happen frequently, or are we paranoid?
This question is a fair one. We believe this kind of data loss
happens more often than you might expect, and we can easily
reproduce it using a virtual machine.
When we execute the following program:
In the scenario where we execute this program with the
normal
sub-command and it finishes without errors:
Our program writes new files in the given directory path
normal; however, we may lose all of these writes if
a hard restart happens right after our program finishes. The
following screencast shows how our program creates 100 files in an
empty directory, each file containing a small message. Once our
program finishes, we list the files in the specified directory to
check that everything is in order. Then we perform a hard reset on
the VirtualBox machine. Once the machine has booted again, our
precious files are no longer present in the file system:
The same algorithm using our durable-flavored API doesn’t have
this problem after a hard restart; when executing the program above
with the durable sub-command:
The written files will be kept on disk thanks to the usage of
fsync. The following screencast shows this exercise
on a VirtualBox machine:
Summary
In this blog post, we covered some nuances of the durability of
file systems and also learned about the importance of using
fsync in our lower-level APIs.
If your system:
Runs in a virtualized environment that can disappear at any time (e.g., cloud providers and spot instances)
Is responsible for storing file contents from third parties
Belongs to a business domain where write durability is a crucial concern
then it is good practice to make sure file writes are guaranteed to
be durable in catastrophic scenarios. FP Complete offers auditing
services where we can help you discover this and various other
challenges.