SMB Documentation

Axel Mahr edited this page Jan 17, 2024 · 10 revisions

The SMB Work Flow of pcapFS

With this wiki entry, we want to document how pcapFS handles SMB traffic internally. We give a detailed explanation of the whole SMB "work flow" of pcapFS and highlight assumptions made on the way as well as SMB scenarios where pcapFS has its weaknesses.

Overview

The functionality of pcapFS regarding SMB currently includes the creation of SMB control files (one SMB control file per underlying TCP connection) which contain information about all transferred SMB messages. PcapFS also creates so-called SMB server files. These are the server-side files which are accessed during a captured SMB connection. These files are either accessed directly by SMB2_CREATE messages or we know from the context that the respective files exist, e.g. through conducted search queries (SMB2_QUERY_DIRECTORY and SMB2_QUERY_INFO messages). Knowing these files, pcapFS reconstructs, as far as possible, the known parts of the server-side directory hierarchy. For now, only file metadata (file name, timestamps, file size) is set for SMB server files.

Details on how everything works are given in the sections below. Let's start with a quick overview of the SMB-related source code files and their respective responsibilities:

  • smb.cpp/smb.h is the entry point for everything regarding SMB. It is also responsible for creating the SMB control files, which log all transferred SMB messages.
  • smb_packet.cpp/smb_packet.h are responsible for parsing SMB packets. One SMB packet consists of one SMB header and one SMB message body.
  • smb_messages.h parses the SMB message body of an SMB packet w.r.t. the message type.
  • smb_structs.h and smb_constants.h define structs, enums, response codes etc. needed for parsing and memorizing relevant SMB-related information.
  • smb_utils.cpp/smb_utils.h contain functions frequently needed for handling SMB traffic.
  • serverfile.cpp/serverfile.h define a super-class for virtual files representing server-side files which are accessed via protocols like SMB, NFS, etc.
  • smb_serverfile.cpp/smb_serverfile.h inherit from the serverfile super-class and represent server-side files ("real" files and directories) accessed via SMB.
  • smb_manager.cpp/smb_manager.h is the connecting piece between the parts responsible for parsing SMB traffic and the SMB server files. Per SMB server endpoint, it memorizes and manages all server files as well as all SMB-related mappings that need to be tracked.

SMB Parsing

Here, we give a detailed explanation of the SMB parsing process. The parsing is designed on the basis of Microsoft's documentation of SMB version 2 and 3, so some familiarity with that documentation is assumed.

There are multiple ways of how SMB traffic is embedded in network packets. SMB can be realized on top of raw TCP, NetBIOS over TCP, QUIC and RDMA. PcapFS currently only supports SMB over TCP and SMB over NetBIOS over TCP. When such communication is detected, smb.cpp initiates the parsing of the "SMB packets" contained in the respective TCP payload. One SMB packet consists of one SMB header and one SMB message body.

Depending on the SMB version, different SMB headers are used which can be distinguished by different magic numbers. PcapFS focuses on SMB packets containing the SMB2 header, which is the standard header for SMB version 2 and 3. When the SMB2 header is detected, the whole SMB message body is parsed in a detailed manner. The main SMB-related functionality of pcapFS - especially regarding SMB server files - is built upon information extracted from packets containing this header. For all other header types, the only information extracted is, if possible, the message type (which is then documented in the respective control file). In the remainder of this page, we assume packets with an SMB2 header.
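To illustrate the distinction by magic number, here is a minimal sketch that classifies a packet by the 4-byte protocol identifier at its start. The identifier values are taken from Microsoft's SMB documentation; the enum and function names are hypothetical and are not pcapFS's actual identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only.
enum class SmbHeaderType { Smb1, Smb2, Smb2Transform, Smb2Compressed, Unknown };

// Classifies an SMB packet by the 4-byte protocol identifier at its start.
SmbHeaderType classifyHeader(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    if (len < 4 || data[1] != 'S' || data[2] != 'M' || data[3] != 'B') {
        return SmbHeaderType::Unknown;
    }
    switch (data[0]) {
        case 0xFF: return SmbHeaderType::Smb1;            // SMB version 1 header
        case 0xFE: return SmbHeaderType::Smb2;            // SMB2 header (SMB version 2 and 3)
        case 0xFD: return SmbHeaderType::Smb2Transform;   // encrypted SMB3 message
        case 0xFC: return SmbHeaderType::Smb2Compressed;  // compressed SMB3 message
        default:   return SmbHeaderType::Unknown;
    }
}
```

In such a scheme, only the `Smb2` case would lead to detailed body parsing; every other result would be handled by the reduced "message type only" path described above.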

SMB packets can be chained together. By this, multiple SMB packets can be embedded in the payload of one TCP packet. Offsets into the virtual TCP files differ for chained and unchained packets. Therefore, we need to carefully consider chained packets and distinguish chained packets which are the last part of a chain from other chained packets. This is done by taking the chainOffset field and related operations flag of the SMB2 header into account.
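The following sketch shows one way to split a TCP payload into chained SMB2 packets. It assumes the buffer starts at the first SMB2 header (any NBSS header already stripped) and that the chainOffset mentioned above corresponds to the NextCommand field at byte offset 20 of the 64-byte SMB2 header. The struct and function names are hypothetical, and pcapFS's real offset bookkeeping may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSmb2HeaderSize = 64;
constexpr std::size_t kFlagsOffset = 16;                  // Flags field of the SMB2 header
constexpr std::size_t kNextCommandOffset = 20;            // NextCommand, called chainOffset here
constexpr std::uint32_t kRelatedOperations = 0x00000004;  // SMB2_FLAGS_RELATED_OPERATIONS

struct ChainedPacket {
    std::size_t offset;  // start of the packet within the buffer
    std::size_t length;  // header plus body (including padding for non-final packets)
    bool related;        // related-operations flag is set
    bool lastInChain;    // chainOffset is zero
};

// Reads a 32-bit little-endian integer.
std::uint32_t readLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walks the chain; stops at the last packet or at the first malformed offset.
std::vector<ChainedPacket> splitChain(const std::vector<std::uint8_t>& buf) {
    std::vector<ChainedPacket> packets;
    std::size_t pos = 0;
    while (buf.size() - pos >= kSmb2HeaderSize) {
        const std::uint8_t* hdr = buf.data() + pos;
        const std::size_t remaining = buf.size() - pos;
        const std::uint32_t next = readLe32(hdr + kNextCommandOffset);
        const bool related = (readLe32(hdr + kFlagsOffset) & kRelatedOperations) != 0;
        if (next == 0) {  // unchained packet or last part of a chain
            packets.push_back({pos, remaining, related, true});
            break;
        }
        if (next < kSmb2HeaderSize || next > remaining) {
            break;  // malformed chain offset
        }
        packets.push_back({pos, next, related, false});
        pos += next;
    }
    return packets;
}
```

The `lastInChain` flag captures the distinction drawn above: only the final packet of a chain has to derive its length from the remaining payload, whereas every other chained packet takes it directly from the chain offset.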

For parsing the message bodies of SMB packets, pcapFS provides one dedicated class for almost every message type. These classes represent the respective messages and give easy access to the information contained in them. This information is needed to describe the message in the control file as well as to create and update SMB server files. Parsing may fail if the message or field sizes are not correct, the structureSize field does not equal the value mandated by the documentation, or the SMB message type is unknown. In that case, a generic SmbMessage class is instantiated and no further information is extracted. One special message type is the Error Response, which is sent by an SMB server when an error occurs while handling the client's request message. PcapFS identifies an Error Response message by detecting a structureSize value of 9 and a non-zero status code in the SMB2 packet header.
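The Error Response check can be sketched as follows. This is a minimal illustration that assumes the packet buffer starts at the SMB2 header, with the status code at byte offset 8 of the header and the structureSize in the first two bytes of the message body; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSmb2HeaderSize = 64;
constexpr std::size_t kStatusOffset = 8;  // Status field of the SMB2 header
constexpr std::uint16_t kErrorResponseStructureSize = 9;

// Reads a 16-bit little-endian integer.
std::uint16_t readLe16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Reads a 32-bit little-endian integer.
std::uint32_t readLe32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// True if the response carries an Error Response body:
// structureSize of 9 combined with a non-zero status code.
bool isErrorResponse(const std::uint8_t* packet, std::size_t len) {
    assert(packet != nullptr || len == 0);
    if (len < kSmb2HeaderSize + 2) {
        return false;  // too short to contain a structureSize field
    }
    const std::uint32_t status = readLe32(packet + kStatusOffset);
    const std::uint16_t structureSize = readLe16(packet + kSmb2HeaderSize);
    return structureSize == kErrorResponseStructureSize && status != 0;
}
```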

For each SMB message body, its message length is calculated. This is necessary for chained SMB packets, where the length is needed to compute the correct offsets. When the packets are not chained, the size field from the NBSS header can be used instead.

When detecting an SMB2_CREATE, SMB2_QUERY_DIRECTORY or SMB2_QUERY_INFO message, the SMB manager is invoked for updating the state of SMB server files according to the message. This is explained in the corresponding section below.

But before that, we need to look more closely at the information - especially the mappings - that has to be tracked. Some information must be managed per SMB connection, while other information must be managed globally for all SMB connections to the same SMB server.

What is managed per SMB connection?

While parsing all SMB packets (including the contained SMB messages), pcapFS manages one smbContext struct per SMB connection. smbContext holds information which needs to be memorized along one SMB connection. This includes amongst other things:

  • a reference to the underlying virtual TCP file
  • information which needs to be remembered between request and response of SMB2_CREATE, SMB2_QUERY_DIRECTORY, SMB2_QUERY_INFO and SMB2_TREE_CONNECT messages (file name, fileId, fileInfoClass, ...)
  • treeId-treename mapping: Each SMB server can have multiple separate directory trees which are identified by the treeId field of the SMB2 header. So, the treeId indicates which directory tree the respective SMB message refers to. The name of a tree corresponds to the name of its root directory (this is also the root directory of the directory tree derived by pcapFS, which contains the known SMB server files located there). The mapping is only unique within one SMB connection. This means that in a scenario with distributed access to the same SMB server over multiple TCP connections at the same time, the same treeId may refer to different trees in different connections. Thus, this mapping needs to be managed per connection. (Other mappings, e.g. fileId-filename, are the same over multiple simultaneous connections to the same SMB server, so they must be memorized globally.) The treeId-treename mapping is extracted from SMB2_TREE_CONNECT requests and responses. Because of that, the tree name cannot be determined for a given treeId if the corresponding SMB2_TREE_CONNECT request and response were not captured. When this is the case and the corresponding SMB message accesses a file which will be created as an SMB server file by pcapFS, we cannot determine which tree it belongs to, i.e., in which derived directory tree to put it. This results in the creation of a derived tree with the generic tree name "treeid_x", with x being the treeId number. The accessed file is then inserted there. Again, in a scenario with accesses to the same SMB server over multiple TCP connections at the same time, this approach can lead to the same files being stored redundantly in different derived directory trees (with different tree names according to their treeIds).
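The per-connection treeId-treename mapping and its "treeid_x" fallback can be sketched like this. `SmbContextSketch` and its members are hypothetical stand-ins and do not reproduce the real smbContext struct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, reduced stand-in for the per-connection context.
struct SmbContextSketch {
    std::unordered_map<std::uint32_t, std::string> treeNames;  // treeId -> tree name

    // Called once an SMB2_TREE_CONNECT request/response pair has been seen.
    void onTreeConnect(std::uint32_t treeId, const std::string& treeName) {
        treeNames[treeId] = treeName;
    }

    // Returns the known tree name, or the generic "treeid_x" name if the
    // SMB2_TREE_CONNECT exchange for this treeId was not captured.
    std::string treeNameFor(std::uint32_t treeId) const {
        const auto it = treeNames.find(treeId);
        if (it != treeNames.end()) {
            return it->second;
        }
        return "treeid_" + std::to_string(treeId);
    }
};
```

Because each connection owns its own context, the same treeId can map to different tree names in two simultaneous connections without conflict.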

What is globally managed?

The SMB manager is - besides managing SMB server files - also responsible for managing two mappings which need to be maintained globally because they pertain to all connections to the same SMB server, or to be more precise, all connections to the same tree of the same SMB server. Such a tree is identified by the ServerEndpointTree struct, which consists of the SMB server's IP address and port as well as the respective tree name. Per ServerEndpointTree, the following mappings are memorized by the SMB manager:

  • fileId-filename mapping: Each server-side file is addressed using its fileId instead of its file name. To interact with a server-side file, the client usually has to open it first. For that, the client sends an SMB2_CREATE request containing the file name. The server responds with a newly created fileId (file handle) for the file, which is valid until the file is closed with an SMB2_CLOSE message. Having the fileId, the client does whatever it wants to do with the file (obtain metadata information, read, write, ...). The fileId-filename mapping is memorized globally because the same fileId can be used over different connections to the same ServerEndpointTree, i.e., it is possible that in one connection, the fileId is obtained by an SMB2_CREATE response and in another simultaneous connection to the same tree, this fileId is used. For pcapFS, this behavior may lead to undetected SMB server files in the following scenario: Assume we have two SMB connections to the SMB server at the same time, and the fileId is obtained by an SMB2_CREATE response in the connection which started later. When the connection which started earlier then uses this file handle, pcapFS does not know which file name it belongs to. This is because pcapFS does not parse the whole capture file chronologically but connection-wise. Thus, the SMB connection which uses the fileId is parsed before the SMB connection from which the corresponding fileId-filename mapping can be derived.
  • filename-FilePtr mapping: By this mapping, the derived server-side files (SMB server files) are managed. The FilePtr is a pointer to a virtual SMBServerFile (this can also be a directory) which later becomes a real file in the resulting server-side directory hierarchy derived by pcapFS. More on that in the following section.
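The two global mappings, keyed per tree of an SMB server, could be organized roughly as in the sketch below. All type and member names carrying the `Sketch` suffix (and `TreeState`) are hypothetical simplifications; the only facts taken from above are that a tree is identified by server IP, port and tree name, and that each tree has a fileId-filename and a filename-FilePtr mapping. A fileId is modeled as the 16-byte SMB2 file handle.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

using FileId = std::array<std::uint8_t, 16>;  // SMB2 file handle (16 bytes)

// Hypothetical stand-in for the ServerEndpointTree struct.
struct ServerEndpointTreeSketch {
    std::string serverIp;
    std::uint16_t serverPort;
    std::string treeName;

    // Ordering so the struct can serve as a std::map key.
    bool operator<(const ServerEndpointTreeSketch& other) const {
        return std::tie(serverIp, serverPort, treeName) <
               std::tie(other.serverIp, other.serverPort, other.treeName);
    }
};

// Hypothetical, reduced server file.
struct ServerFileSketch {
    std::string name;
    bool isDirectory = false;
};
using FilePtrSketch = std::shared_ptr<ServerFileSketch>;

// Everything the manager remembers for one tree of one server.
struct TreeState {
    std::map<FileId, std::string> fileIdToName;       // fileId-filename mapping
    std::map<std::string, FilePtrSketch> nameToFile;  // filename-FilePtr mapping
};

class SmbManagerSketch {
public:
    // Returns the state for a tree, creating it on first access.
    TreeState& stateFor(const ServerEndpointTreeSketch& tree) { return trees_[tree]; }

private:
    std::map<ServerEndpointTreeSketch, TreeState> trees_;
};
```

Two connections that resolve to the same server IP, port and tree name end up sharing one `TreeState`, which is what allows a fileId learned in one connection to be looked up from another.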

Management of SMB Server Files

Now that we roughly know how pcapFS handles SMB parsing and which information needs to be memorized at which abstraction layer, it remains to be explained how SMB server files are extracted and how the server-side directory hierarchies are derived. All of this is done by the SMB manager. Currently, SMB server files are created/updated via three different SMB message types: SMB2_CREATE response, SMB2_QUERY_INFO response and SMB2_QUERY_DIRECTORY response. When one of these message types is detected, the SMB manager takes over right after the message is parsed. Depending on the file metadata contained in the message, different file properties can be set or updated. When the SMB manager encounters a message regarding a file which was previously unknown for the respective tree, a new SMBServerFile is created and its metadata is set according to the information contained in the respective SMB message. SMBServerFile is a specialized ServerFile for SMB. A ServerFile differs from other virtual files in that it contains more timestamps and a pointer to its parent directory, which is also a ServerFile. So, starting from an SMBServerFile, a cascade of parent directory pointers can be followed until the root directory of the corresponding tree (whose name is obtained by the treeId-treename mapping) is reached. By that, pcapFS can easily build up the respective directory hierarchy at the mount point. For each newly created SMBServerFile, its parent directories can be determined because the file name for the SMBServerFile (as it has to be derived from the fileId-filename mapping) luckily always includes its absolute path, beginning with the first subdirectory of the corresponding tree (there is one exception when handling SMB2_QUERY_DIRECTORY responses, see below for further information).

Let's look closer at what pcapFS does for each of the three mentioned message types.

SMB2_CREATE response

First of all, the fileId-filename mapping of the file requested via the SMB2_CREATE message is updated for the tree it belongs to. When the file is not yet present as an SMB server file in the filename-FilePtr mapping, a new SMBServerFile is created and its metadata is initialized with the file information contained in the SMB2_CREATE response (namely timestamps, file size and the information whether it is a directory or not). In order to set the pointer to the file's parent directory, pcapFS iterates backwards through the file's absolute path and recursively creates SMB server files for all parent directories which are not yet represented by an SMBServerFile instance. If the file requested via the SMB2_CREATE message is already present as an SMB server file in the filename-FilePtr mapping, its metadata is updated if the file's lastChangeTime contained in the SMB2_CREATE response's message body is newer.
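The create-or-update logic, including the recursive creation of missing parent directories, might look like the following sketch. It assumes backslash-separated paths relative to the tree root and plain integer timestamps; `ServerFileSketch`, `getOrCreate` and `applyCreateResponse` are hypothetical names and not the SMB manager's real interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical, reduced server file.
struct ServerFileSketch {
    std::string path;                          // absolute path within the tree
    bool isDirectory = false;
    std::uint64_t lastChangeTime = 0;
    std::uint64_t size = 0;
    std::shared_ptr<ServerFileSketch> parent;  // nullptr: directly below the tree root
};
using FilePtr = std::shared_ptr<ServerFileSketch>;
using FileMap = std::map<std::string, FilePtr>;  // filename-FilePtr mapping

// Returns the file for `path`, creating it and all missing parent directories.
FilePtr getOrCreate(FileMap& files, const std::string& path, bool isDirectory) {
    const auto it = files.find(path);
    if (it != files.end()) {
        return it->second;
    }
    auto file = std::make_shared<ServerFileSketch>();
    file->path = path;
    file->isDirectory = isDirectory;
    const std::string::size_type sep = path.rfind('\\');
    if (sep != std::string::npos) {
        // Everything before the last separator is the parent directory.
        file->parent = getOrCreate(files, path.substr(0, sep), true);
    }
    files[path] = file;
    return file;
}

// Creates the file if unknown; otherwise updates it only if the response is newer.
void applyCreateResponse(FileMap& files, const std::string& path, bool isDirectory,
                         std::uint64_t lastChangeTime, std::uint64_t size) {
    const bool known = files.count(path) != 0;
    const FilePtr file = getOrCreate(files, path, isDirectory);
    if (!known || lastChangeTime > file->lastChangeTime) {
        file->isDirectory = isDirectory;
        file->lastChangeTime = lastChangeTime;
        file->size = size;
    }
}
```

Parent directories created implicitly start with a lastChangeTime of zero in this sketch, so the first response that actually describes them overwrites the placeholder metadata.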

SMB2_QUERY_INFO response

The detection of an SMB2_QUERY_INFO response initiates SMB server file creations/changes only if the underlying query info type is SMB2_0_INFO_FILE and the file info class is FileAllInformation, FileBasicInformation or FileNetworkOpenInformation. All other info types/classes don't contain (enough) needed file information. SMB2_QUERY_INFO messages address files via their fileIds. Hence, the fileId-filename mapping for the requested file needs to be known in advance for all file info classes except for FileAllInformation which also contains the file name. So, when the fileId-filename mapping is not known for the requested file's fileId and we have a FileAllInformation info class, pcapFS is still able to establish that mapping through the file name contained in FileAllInformation. The creation/updating process for the SMBServerFile instance corresponding to the file requested via the SMB2_QUERY_INFO message works the same as explained above for an SMB2_CREATE response.
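The decision described above can be condensed into a small function. The numeric values are the info type and file information class codes from Microsoft's documentation; the enum `QueryInfoAction` and the function `decideQueryInfoAction` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kInfoTypeFile = 0x01;  // SMB2_0_INFO_FILE

enum class FileInfoClass : std::uint8_t {
    FileBasicInformation = 4,
    FileAllInformation = 18,
    FileNetworkOpenInformation = 34,
};

// Hypothetical outcome of inspecting an SMB2_QUERY_INFO response.
enum class QueryInfoAction {
    Ignore,              // nothing usable for SMB server files
    UpdateKnownFile,     // fileId already resolves to a file name
    LearnNameAndUpdate,  // FileAllInformation supplies the missing file name
};

QueryInfoAction decideQueryInfoAction(std::uint8_t infoType, std::uint8_t fileInfoClass,
                                      bool fileIdKnown) {
    if (infoType != kInfoTypeFile) {
        return QueryInfoAction::Ignore;
    }
    switch (static_cast<FileInfoClass>(fileInfoClass)) {
        case FileInfoClass::FileAllInformation:
            return fileIdKnown ? QueryInfoAction::UpdateKnownFile
                               : QueryInfoAction::LearnNameAndUpdate;
        case FileInfoClass::FileBasicInformation:
        case FileInfoClass::FileNetworkOpenInformation:
            // These classes carry no file name, so the mapping must already exist.
            return fileIdKnown ? QueryInfoAction::UpdateKnownFile : QueryInfoAction::Ignore;
        default:
            return QueryInfoAction::Ignore;
    }
}
```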

SMB2_QUERY_DIRECTORY response

In contrast to the other two message types, SMB2_QUERY_DIRECTORY responses can contain information for multiple files, i.e., all files in the current directory which match the search patterns specified in the SMB2_QUERY_DIRECTORY request. The file info classes which are relevant for pcapFS are:

  • FileDirectoryInformation,
  • FileFullDirectoryInformation,
  • FileIdFullDirectoryInformation,
  • FileBothDirectoryInformation,
  • FileIdBothDirectoryInformation,
  • FileIdExtdDirectoryInformation.

They contain all timestamps, the file name and relevant file attributes for every matching file of the requested directory. The directory is addressed using its fileId and, for creating/updating the SMB server files corresponding to the files listed in the SMB2_QUERY_DIRECTORY response, we need to know the name (i.e., the absolute path) of the directory. This is no issue when the fileId-filename mapping for that directory is already known. However, pcapFS also needs to handle the case that the name for the fileId is not known. This case might not occur very often because before an SMB2_QUERY_DIRECTORY request, the corresponding directory has to be accessed via SMB2_CREATE and, with every SMB2_CREATE response, the respective fileId-filename mapping for that directory gets memorized. Nevertheless, it can happen that the SMB2_CREATE request and SMB2_QUERY_DIRECTORY request for the same directory are chained together. Then, the client specifies the fileId fffff...f in the SMB2_QUERY_DIRECTORY request, indicating that it refers to the directory accessed via the SMB2_CREATE request right before. In that case, only looking at the SMB2_QUERY_DIRECTORY messages, pcapFS does not know the name of the requested directory (since the fileId fffff...f doesn't resolve to a known file name). Thus, pcapFS takes the name specified in the last SMB2_CREATE request (only if this SMB2_CREATE referred to a directory) as the directory name. With that, pcapFS is able to assemble the absolute path for every file listed in the SMB2_QUERY_DIRECTORY response and can create/update the corresponding SMBServerFile instances. If the name specified in the last SMB2_CREATE request is not available (i.e., it is empty), pcapFS puts the respective SMB server files into the root directory of the current tree. This is done because it is common for SMB to have SMB2_CREATE messages with an empty file name field when they refer to the root directory. So, in this case, pcapFS would put the SMB server files in the correct directory. But this may also be wrong when, e.g., the SMB2_CREATE messages regarding the directory requested via the SMB2_QUERY_DIRECTORY messages are not captured and the directory is not the root directory, or when the SMB2_CREATE messages are transmitted in a later-starting simultaneous connection to the same tree (this corresponds to the issue mentioned above when explaining the fileId-filename mapping).
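The fallback order for determining the directory name can be sketched as follows. The sketch assumes that the name from the most recent SMB2_CREATE request is kept as a string which is empty when it is unavailable or did not refer to a directory, and that paths are backslash-separated; `resolveDirectory` and `joinPath` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using FileId = std::array<std::uint8_t, 16>;  // SMB2 file handle (16 bytes)

// Returns the directory path relative to the tree root; "" denotes the root itself.
// `lastCreateDirName` is the name from the most recent SMB2_CREATE request
// (empty if unavailable or if that request did not refer to a directory).
std::string resolveDirectory(const std::map<FileId, std::string>& fileIdToName,
                             const FileId& fileId, const std::string& lastCreateDirName) {
    const auto it = fileIdToName.find(fileId);
    if (it != fileIdToName.end()) {
        return it->second;  // normal case: fileId-filename mapping is known
    }
    // Unknown fileId, e.g. the all-0xFF handle used in chained requests:
    // fall back to the last SMB2_CREATE name; an empty name means the tree root.
    return lastCreateDirName;
}

// Builds the absolute path of a listed file from its directory and file name.
std::string joinPath(const std::string& directory, const std::string& fileName) {
    return directory.empty() ? fileName : directory + "\\" + fileName;
}
```

The empty-string fallback is exactly the point where the misplacement described above can occur: a directory whose SMB2_CREATE exchange was never seen is indistinguishable from the tree root.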

Creation of derived server-side directory hierarchies

Now that we know the absolute path for every SMB server file through the cascade of parent directory pointers, it is pretty easy to derive the resulting directory hierarchies for all SMB server endpoint trees and incorporate them into the directory layout. For that, pcapFS flips the cascade of parent directory pointers for every SMB server file and then inserts each resulting tree at the mount point's subdirectory where the underlying SMB connection satisfies the property corresponding to the directory.
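Flipping the parent pointers amounts to building a parent-to-children index, which can then be traversed top-down. The sketch below illustrates the idea under simplified assumptions; `ServerFileSketch`, `ChildIndex`, `flipParentPointers` and `render` are hypothetical names, and the insertion into the actual mount point layout is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, reduced server file.
struct ServerFileSketch {
    std::string name;                          // last path component
    std::shared_ptr<ServerFileSketch> parent;  // nullptr: directly below the tree root
};
using FilePtr = std::shared_ptr<ServerFileSketch>;

// Maps each directory (nullptr stands for the tree root) to its direct children.
using ChildIndex = std::map<FilePtr, std::vector<FilePtr>>;

// Turns the child-to-parent pointers into a parent-to-children index.
ChildIndex flipParentPointers(const std::vector<FilePtr>& files) {
    ChildIndex children;
    for (const FilePtr& file : files) {
        children[file->parent].push_back(file);
    }
    return children;
}

// Appends the hierarchy below `dir` to `out`, one indented line per entry.
void render(const ChildIndex& children, const FilePtr& dir, std::size_t depth,
            std::string& out) {
    const auto it = children.find(dir);
    if (it == children.end()) {
        return;  // a plain file or an empty directory
    }
    for (const FilePtr& child : it->second) {
        out += std::string(depth * 2, ' ') + child->name + "\n";
        render(children, child, depth + 1, out);
    }
}
```

Calling `render` with `nullptr` as the directory starts at the tree root, regardless of the order in which the files were discovered.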

Memorizing SMB server file hierarchies in the index file

...

Summary of SMB-related issues

...

TODOs

  • write file contents into SMB server files
  • create multiple versions of SMB server files for each point in time where file content/metadata was changed