Class SftpFileSystem
is an implementation of java.nio.file.FileSystem
and
lets client code treat an SFTP server like any other file system. It is a
remote file system, though, and that has some effects that clients have
to aware of. Because operations are remote operations involving network
requests and answers, the performance characteristics are different than
most other file systems.
An SftpFileSystem
needs an SSH session to be able to talk to the SFTP server.
There are two ways to create an SftpFileSystem
:
-
If you already have an SSH
ClientSession
, you can create the file system off that session usingSftpClientFactory.instance().createSftpFileSystem()
. The file system remains valid until it is closed, or until the session is closed. When the file system is closed, the session will not be closed. When the session closes, so will the file system. -
You can create an
SftpFileSystem
with asftp://
URI using the standard Java factoryjava.nio.file.FileSystems.newFileSystem()
. This will automatically create anSshClient
with default settings, and the file system will open an SSH session itself. This session has heartbeats enabled to keep it open for as long as the file system is open. The file system remains valid until closed, at which point it will close the session it had created. If the session closes while the file system is not closed, the next operation on such a file system will open a new session.
Most operations on an SftpFileSystem
, or on streams returned, produce SFTP
requests over the network, and wait for a reply to be received. This works
internally by using SftpClient
to talk to the SFTP server. An SftpClient
is the client-side implementation of the SFTP protocol over an SSH channel
(a ChannelSession
).
The SSH channel and the SftpClient
are tightly coupled: the channel is
opened when the SftpClient
is initialized, and the channel is closed when
the SftpClient
is closed, and when the channel closes, so is the SftpClient
.
For SftpFileSystem
it would be rather inefficient to use a new SftpClient
for each new operation. That would create a new channel every time, and tear
it down after the operation. But channels have a setup cost, and the SFTP
protocol also has to initialized, both of which involve exchanging messages
over the network. This is not efficient if one wants to perform multiple
operations, such as transferring multiple files with java.nio.file.Files.copy()
.
The SftpFileSystem
thus employs a pool of SftpClient
s. This pool is
initially empty. The first operation will create an SftpClient
, initialize
it, and then perform its operation. But then it will add the still open
SftpClient
to the pool for use by subsequent operations instead of closing
it. The next operation can then simply grab this already initialized SftpClient
with its open channel and perform its operation.
The pool is limited by a maximum size of SftpModuleProperties.POOL_SIZE
(by
default 8). The pool can grow to this size if there are that many threads that
perform operations on the SftpFileSystem
concurrently.
SftpClient
s in the pool need to be closed at some point. Consider an application
that has a burst of file transfers and uses 8 threads to perform them. Afterwards,
the pool will contain 8 SftpClient
s: that's 8 open SSH channels, each with an SFTP
subsystem at the server's end. If the application then does only little, like
transferring a few files sequentially over the next few hours, until the next burst
(which may never come), then we don't want to keep all 8 channels open and consuming
resources not only in the client but also on the server side.
(This assumes that the whole SSH session remains open for that long, which can be accomplished by using heartbeats on the session.)
The SftpFileSystem
handles this by expiring inactive clients from the pool. If a
client has been in the pool for SftpModuleProperties.POOL_LIFE_TIME
(default is 10
seconds), it is removed from the pool and closed. (If it was in the pool for that
time, this means it was idle for that time: no operation was performed on it.) If
no operation on the SftpFileSystem
occurs at all for this time, it's possible that
the pool is emptied, and the next operation has to create and initialize a new
SftpClient
and channel.
If an application doesn't want this, it can define SftpModuleProperties.POOL_CORE_SIZE
,
which must be smaller than POOL_SIZE
. By default, it is zero. If greater than zero,
that many SftpClient
s are kept in the pool (and that many channels are kept open)
even if they are idle.
It should be noted that the SFTP server may also decide to close channels whenever
it wants. This will close the channel and the SftpClient
on the client side. If it
happens while a client-side operation is ongoing, the operation will fail with an
exception; it it happens on an idle SftpClient
in the pool, the SftpClient
is
simply removed from the pool.
If the whole SSH session is closed, the SftpFileSystem
is closed. When a
SftpFileSystem
is closed, all SftpClient
s in the pool are closed, and no new
clients will be added to the pool.
If there are more then POOL_SIZE
threads using the same SftpFileSystem
, it is
possible that all POOL_SIZE
clients are already in use when a thread tries to
do a file system operation. In this case, a new SftpClient
is created, which
will be closed after the operation. This is a sign of the POOL_SIZE
being too
small for the application, or that the application is badly designed. Using too
many threads for remote file operations is not a good idea in SFTP: all traffic
in the end goes over a single network connection anyway. Using a limited number
of threads may bring some speedup compared to strictly sequential operations
because the handling of the data received is offloaded to these threads, while
the next message can already be sent or received. But copying 1000 files using
1000 threads and SSH channels is nonsense; it's far better to handle that many
files in batches with a smaller number of threads (maybe 8).
In any case, SftpModuleProperties.POOL_SIZE
should be large enough to accommodate
the number of threads the client application is going to use for operations on
the SftpFileSystem
. If there are more threads, performance may degrade.
Versions of Apache MINA sshd <= 2.10.0 tried to mitigate this performance drop
for such "extra" threads by keeping the SftpClient
in a ThreadLocal
, so that
such "extra" threads could re-use the SftpClient
. This mechanism has been removed
because it sometimes caused memory leaks. The mechanism was also flawed because
there were use cases where it just could not work correctly.
Design your application such that is uses a small maximum number of threads that
perform operations on a SftpFileSystem
instance. Set SftpModuleProperties.POOL_SIZE
such that it is >= the maximum number of threads that operate concurrently on the
file system. The default pool size is 8.