Chapter 13 The Virtual Filesystem #77

jason--liu opened this issue Jul 13, 2021 · 0 comments
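The Virtual Filesystem (sometimes called the Virtual File Switch, or more commonly simply the VFS) is the kernel subsystem that implements the file and filesystem-related interfaces provided to user-space programs.

Common Filesystem Interface

The VFS is the glue that enables system calls such as open(), read(), and write() to work regardless of the filesystem or the underlying physical medium. Together, the VFS and the block I/O layer provide the abstractions, interfaces, and glue that allow user-space programs to issue generic system calls to access files through a uniform naming policy on any filesystem, which itself may reside on any storage medium.

Filesystem Abstraction Layer

Such a generic interface for any type of filesystem is feasible only because the kernel implements an abstraction layer around its low-level filesystem interface. The abstraction layer works by defining the basic conceptual interfaces and data structures that all filesystems support. In fact, nothing in the kernel needs to understand the underlying details of a filesystem except the filesystem itself. For example: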

ret = write (fd, buf, len);

The flow is shown in the figure below:
image
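The dispatch idea behind that write() call can be sketched in user space. This is a toy model, not kernel code: every name in it (toy_file, toy_file_ops, ramfs_like_write, toy_sys_write) is hypothetical, and the "medium" is just a byte array. The point is only that the generic entry point calls through a table of function pointers and never looks at what is behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct toy_file;

struct toy_file_ops {
	long (*write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
	const struct toy_file_ops *f_op; /* filled in by the "filesystem" */
	char data[64];                   /* stands in for the physical medium */
	size_t pos;                      /* current write offset */
};

/* One filesystem's write method: copy into the backing buffer. */
static long ramfs_like_write(struct toy_file *f, const char *buf, size_t len)
{
	size_t room = sizeof(f->data) - f->pos;

	assert(buf != NULL);
	if (len > room)
		len = room;              /* short write when the buffer is full */
	memcpy(f->data + f->pos, buf, len);
	f->pos += len;
	return (long)len;
}

static const struct toy_file_ops ramfs_like_ops = {
	.write = ramfs_like_write,
};

/* Generic entry point: knows nothing about the filesystem behind f. */
static long toy_sys_write(struct toy_file *f, const char *buf, size_t len)
{
	if (!f || !f->f_op || !f->f_op->write)
		return -1;               /* no write method available */
	return f->f_op->write(f, buf, len);
}
```

A caller would initialize a toy_file with `.f_op = &ramfs_like_ops` and call toy_sys_write(); swapping in a different operations table changes the behavior without touching the caller.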

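Unix Filesystems

Historically, Unix has provided four basic filesystem-related abstractions: files, directory entries, inodes, and mount points. In Unix, filesystems are mounted at a specific mount point in a global hierarchy known as a namespace. This makes all mounted filesystems appear as entries in a single tree.
A file is an ordered string of bytes. Files are organized in directories. A directory is analogous to a folder and usually contains related files. Directories can be nested to form paths, and each component of a path is called a directory entry.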
image
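To make "each component of a path is a directory entry" concrete, here is a minimal user-space sketch that splits an absolute path into its components. The function name split_path and the two size limits are assumptions made for illustration; the kernel's real path walk is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16
#define MAX_NAME 32

/*
 * Split an absolute path into its components, counting "/" as the first.
 * Returns the number of components stored, or -1 if the path is not
 * absolute or does not fit into the output array.
 */
static int split_path(const char *path, char out[][MAX_NAME], int max)
{
	int n = 0;

	assert(out != NULL);
	if (!path || path[0] != '/' || max < 1)
		return -1;
	strcpy(out[n++], "/");

	while (*path) {
		size_t len;

		while (*path == '/')          /* skip separators */
			path++;
		if (!*path)
			break;
		len = strcspn(path, "/");     /* length of this component */
		if (len >= MAX_NAME || n >= max)
			return -1;
		memcpy(out[n], path, len);
		out[n][len] = '\0';
		n++;
		path += len;
	}
	return n;
}
```

For the path /bin/vi this yields the three entries /, bin, and vi.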

VFS Objects and Their Data Structures

The VFS is object-oriented. It has four primary object types:

  • The superblock object, which represents a specific mounted filesystem.
  • The inode object, which represents a specific file
  • The dentry object, which represents a directory entry, which is a single component of a path.
  • The file object, which represents an open file as associated with a process.

Each of these primary objects contains an operations object, which describes the methods that the kernel can invoke on that object:

  • The super_operations object, which contains the methods that the kernel can invoke on a specific filesystem, such as write_inode() and sync_fs()
  • The inode_operations object, which contains the methods that the kernel can invoke on a specific file, such as create() and link()
  • The dentry_operations object, which contains the methods that the kernel can invoke on a specific directory entry, such as d_compare() and d_delete()
  • The file_operations object, which contains the methods that a process can invoke on an open file, such as read() and write()

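Each registered filesystem is represented in the kernel by a file_system_type structure. Each mount point is represented by a vfsmount structure, which holds information about the mount such as its location and mount flags. Finally, each process has an fs_struct describing its filesystem context and file structures describing its open files.

Superblock Object

The superblock object is implemented by each filesystem and stores the information that describes that specific filesystem. It is usually stored in a special sector on disk.

Filesystems that are not disk-based (a virtual memory–based filesystem, such as sysfs, for example) generate the superblock on-the-fly and store it in memory.

For a filesystem such as sysfs, the superblock therefore lives in memory. The superblock object is represented by struct super_block: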

#include <linux/fs.h>
struct super_block {
	struct list_head s_list; 			/* list of all superblocks */
	dev_t s_dev; 						/* identifier */
	unsigned long s_blocksize; 			/* block size in bytes */
	unsigned char s_blocksize_bits; 	/* block size in bits */
	unsigned char s_dirt; 				/* dirty flag */
	unsigned long long s_maxbytes; 		/* max file size */
	struct file_system_type *s_type; 	/* filesystem type */
	struct super_operations *s_op; 		/* superblock methods */
	struct dquot_operations *dq_op; 	/* quota methods */
	struct quotactl_ops *s_qcop;		 /* quota control methods */
	struct export_operations *s_export_op; /* export methods */
	unsigned long s_flags; 				/* mount flags */
	unsigned long s_magic; 				/* filesystem’s magic number */
	struct dentry *s_root; 				/* directory mount point */
	struct rw_semaphore s_umount; 		/* unmount semaphore */
	struct semaphore s_lock; 			/* superblock semaphore */
	int s_count; 						/* superblock ref count */
	int s_need_sync; 					/* not-yet-synced flag */
	atomic_t s_active; 					/* active reference count */
	void *s_security; 					/* security module */
	struct xattr_handler **s_xattr; 	/* extended attribute handlers */
	struct list_head s_inodes; 			/* list of inodes */
	struct list_head s_dirty; 			/* list of dirty inodes */
	struct list_head s_io; 				/* list of writebacks */
	struct list_head s_more_io; 		/* list of more writeback */
	struct hlist_head s_anon; 			/* anonymous dentries */
	struct list_head s_files; 			/* list of assigned files */
	struct list_head s_dentry_lru; 		/* list of unused dentries */
	int s_nr_dentry_unused; 			/* number of dentries on list */
	struct block_device *s_bdev; 		/* associated block device */
	struct mtd_info *s_mtd; 			/* memory disk information */
	struct list_head s_instances; 		/* instances of this fs */
	struct quota_info s_dquot; 			/* quota-specific options */
	int s_frozen; 						/* frozen status */
	wait_queue_head_t s_wait_unfrozen; 	/* wait queue on freeze */
	char s_id[32]; 						/* text name */
	void *s_fs_info; 					/* filesystem-specific info */
	fmode_t s_mode; 					/* mount permissions */
	struct semaphore s_vfs_rename_sem; /* rename semaphore */
	u32 s_time_gran; 					/* granularity of timestamps */
	char *s_subtype; 					/* subtype name */
	char *s_options; 					/* saved mount options */
};

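The code for creating, managing, and destroying superblock objects lives in fs/super.c.
A superblock object is created and initialized via the alloc_super() function. When a filesystem is mounted, it invokes this function, reads its superblock off the disk, and fills in its superblock object.

Superblock Operations

The most important item in the superblock object is s_op, a pointer to the superblock operations table, struct super_operations, which is defined as follows: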

#include <linux/fs.h>

struct super_operations {
	struct inode *(*alloc_inode)(struct super_block *sb);
	void (*destroy_inode)(struct inode *);
	void (*dirty_inode) (struct inode *);
	int (*write_inode) (struct inode *, int);
	void (*drop_inode) (struct inode *);
	void (*delete_inode) (struct inode *);
	void (*put_super) (struct super_block *);
	void (*write_super) (struct super_block *);
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*freeze_fs) (struct super_block *);
	int (*unfreeze_fs) (struct super_block *);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*clear_inode) (struct inode *);
	void (*umount_begin) (struct super_block *);
	int (*show_options)(struct seq_file *, struct vfsmount *);
	int (*show_stats)(struct seq_file *, struct vfsmount *);
	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
};

Each item in this structure is a pointer to a function that operates on a superblock object. Some commonly used functions are listed below.
image
image
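All of these functions are invoked by the VFS in process context. All except dirty_inode() may block if needed. Some of these functions are optional and may be NULL; if the associated pointer is NULL, the VFS either calls a generic function or does nothing, depending on the operation.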
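The "NULL means generic function or nothing" rule can be illustrated with a small user-space sketch. It is a toy under stated assumptions: toy_sb, toy_super_ops, toy_generic_sync, and toy_journal_sync are hypothetical names, and which real operations fall back to which generic helper is not something this sketch claims to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operations table with optional methods; all names are hypothetical. */
struct toy_sb;

struct toy_super_ops {
	int (*sync_fs)(struct toy_sb *sb, int wait); /* may be NULL */
	void (*put_super)(struct toy_sb *sb);        /* may be NULL */
};

struct toy_sb {
	const struct toy_super_ops *s_op;
	int synced;    /* 0 = never, 1 = generic path, 2 = fs-specific path */
	int released;  /* set by a filesystem's put_super, if it has one */
};

/* Generic fallback used when the filesystem supplies no sync_fs. */
static int toy_generic_sync(struct toy_sb *sb, int wait)
{
	(void)wait;
	sb->synced = 1;
	return 0;
}

/* A filesystem-specific implementation, for contrast. */
static int toy_journal_sync(struct toy_sb *sb, int wait)
{
	(void)wait;
	sb->synced = 2;
	return 0;
}

/* Caller side: dispatch if the method exists, otherwise fall back. */
static int toy_sync(struct toy_sb *sb, int wait)
{
	assert(sb != NULL);
	if (sb->s_op && sb->s_op->sync_fs)
		return sb->s_op->sync_fs(sb, wait);
	return toy_generic_sync(sb, wait);
}

/* Caller side for a method where doing nothing is the default. */
static void toy_put_super(struct toy_sb *sb)
{
	assert(sb != NULL);
	if (sb->s_op && sb->s_op->put_super)
		sb->s_op->put_super(sb);
}
```

The check lives in the caller, so a filesystem only fills in the methods it actually needs to customize.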

Inode Object

The inode object represents all the information the kernel needs to manipulate a file or directory.

For Unix-style filesystems, this information is simply read from the on-disk inode. If a filesystem does not have inodes, however, the filesystem must obtain the information from wherever it is stored on the disk. Filesystems without inodes generally store file-specific information as part of the file.

In the kernel, an inode is represented by struct inode:

struct inode {
	struct hlist_node 				i_hash; 				/* hash list */
	struct list_head 				i_list; 				/* list of inodes */
	struct list_head 				i_sb_list; 				/* list of superblocks */
	struct list_head 				i_dentry; 				/* list of dentries */
	unsigned long 					i_ino; 					/* inode number */
	atomic_t 						i_count; 				/* reference counter */
	unsigned int 					i_nlink; 				/* number of hard links */
	uid_t 							i_uid; 					/* user id of owner */
	gid_t 							i_gid; 					/* group id of owner */
	kdev_t 							i_rdev;					 /* real device node */
	u64 							i_version; 				/* versioning number */
	loff_t 							i_size; 				/* file size in bytes */
	seqcount_t 						i_size_seqcount; 		/* serializer for i_size */
	struct timespec 				i_atime; 				/* last access time */
	struct timespec 				i_mtime; 				/* last modify time */
	struct timespec 				i_ctime; 				/* last change time */
	unsigned int 					i_blkbits; 				/* block size in bits */
	blkcnt_t 						i_blocks; 				/* file size in blocks */
	unsigned short 					i_bytes; 				/* bytes consumed */
	umode_t 						i_mode;					/* access permissions */
	spinlock_t 						i_lock; 				/* spinlock */
	struct rw_semaphore 			i_alloc_sem; 			/* nests inside of i_sem */
	struct semaphore 				i_sem; 					/* inode semaphore */
	struct inode_operations 		*i_op; 					/* inode ops table */
	struct file_operations 			*i_fop;					 /* default inode ops */
	struct super_block 				*i_sb; 					/* associated superblock */
	struct file_lock 				*i_flock; 				/* file lock list */
	struct address_space 			*i_mapping; 			/* associated mapping */
	struct address_space 			i_data; 				/* mapping for device */
	struct dquot 					*i_dquot[MAXQUOTAS]; 	/* disk quotas for inode */
	struct list_head 				i_devices; 				/* list of block devices */
	union 				{
		struct pipe_inode_info 		*i_pipe; 				/* pipe information */
		struct block_device 		*i_bdev; 				/* block device driver */
		struct cdev 				*i_cdev; 				/* character device driver */
	}; /* an inode can represent only one of these three at a time, hence the union */
	unsigned long 					i_dnotify_mask; 		/* directory notify mask */
	struct dnotify_struct 			*i_dnotify; 			/* dnotify */
	struct list_head 				inotify_watches; 		/* inotify watches */
	struct mutex 					inotify_mutex; 			/* protects inotify_watches */
	unsigned long 					i_state; 				/* state flags */
	unsigned long 					dirtied_when; 			/* first dirtying time */
	unsigned int 					i_flags; 				/* filesystem flags */
	atomic_t 						i_writecount; 			/* count of writers */
	void 							*i_security; 			/* security module */
	void 							*i_private; 			/* fs private pointer */
};

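An inode represents each file on a filesystem, but the inode object is constructed in memory only when the file is accessed.

Inode Operations

The inode operations table describes the filesystem-implemented functions that the VFS can invoke on an inode.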

struct inode_operations {
	int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
	struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
	int (*link) (struct dentry *,struct inode *,struct dentry *);
	int (*unlink) (struct inode *,struct dentry *);
	int (*symlink) (struct inode *,struct dentry *,const char *);
	int (*mkdir) (struct inode *,struct dentry *,int);
	int (*rmdir) (struct inode *,struct dentry *);
	int (*mknod) (struct inode *,struct dentry *,int,dev_t);
	int (*rename) (struct inode *, struct dentry *,
					struct inode *, struct dentry *);
	int (*readlink) (struct dentry *, char __user *,int);
	void * (*follow_link) (struct dentry *, struct nameidata *);
	void (*put_link) (struct dentry *, struct nameidata *, void *);
	void (*truncate) (struct inode *);
	int (*permission) (struct inode *, int);
	int (*setattr) (struct dentry *, struct iattr *);
	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
	ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
	ssize_t (*listxattr) (struct dentry *, char *, size_t);
	int (*removexattr) (struct dentry *, const char *);
	void (*truncate_range)(struct inode *, loff_t, loff_t);
	long (*fallocate)(struct inode *inode, int mode, loff_t offset,
						loff_t len);
	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
					u64 len);
};
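TODO: describe the functions

Dentry Object

The VFS often needs to perform directory-specific operations, such as path name lookup. To facilitate this, the VFS employs the concept of a directory entry (dentry). A dentry is a specific component in a path. Resolving a path is a time-consuming operation, and the dentry object makes the process much easier.

In the path /bin/vi, the components /, bin, and vi are all dentry objects. The first two are directories and the last is a regular file. Dentries might also include mount points. In the path /mnt/cdrom/foo, the components /, mnt, cdrom, and foo are all dentry objects.

The dentry object is represented by struct dentry:
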
struct dentry {
	atomic_t d_count; 			/* usage count */
	unsigned int d_flags; 		/* dentry flags */
	spinlock_t d_lock; 			/* per-dentry lock */
	int d_mounted; 				/* is this a mount point? */
	struct inode *d_inode; 		/* associated inode */
	struct hlist_node d_hash; 	/* list of hash table entries */
	struct dentry *d_parent; 	/* dentry object of parent */
	struct qstr d_name; 		/* dentry name */
	struct list_head d_lru; 	/* unused list */
	union {
		struct list_head d_child; 	/* list of dentries within */
		struct rcu_head d_rcu; 		/* RCU locking */
	} d_u;
	struct list_head d_subdirs; /* subdirectories */
	struct list_head d_alias; /* list of alias inodes */
	unsigned long d_time; 			/* revalidate time */
	struct dentry_operations *d_op; /* dentry operations table */
	struct super_block *d_sb; 		/* superblock of file */
	void *d_fsdata; 				/* filesystem-specific data */
	unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* short name */
};

Unlike the previous two objects, the dentry object does not correspond to any sort of on-disk data structure.

Dentry State

A valid dentry object can be in one of three states: used, unused, or negative.

A used dentry corresponds to a valid inode (d_inode points to an associated inode) and indicates that there are one or more users of the object (d_count is positive). A used dentry is in use by the VFS and points to valid data and, thus, cannot be discarded.
An unused dentry corresponds to a valid inode (d_inode points to an inode), but the VFS is not currently using the dentry object (d_count is zero). Because the dentry object still points to a valid object, the dentry is kept around (cached) in case it is needed again.
A negative dentry is not associated with a valid inode (d_inode is NULL) because either the inode was deleted or the path name was never correct to begin with.

When memory is tight, both unused and negative dentries can be reclaimed.
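The three states reduce to two checks, on d_inode and d_count. The sketch below encodes just that rule in user space; toy_dentry, toy_classify, and toy_reclaimable are hypothetical names, and the struct keeps only the two fields the definitions above rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the two fields that define the dentry states. */
struct toy_inode {
	unsigned long i_ino;
};

struct toy_dentry {
	int d_count;               /* usage count */
	struct toy_inode *d_inode; /* NULL for a negative dentry */
};

enum toy_state { TOY_USED, TOY_UNUSED, TOY_NEGATIVE };

/* Classify a dentry following the three-way split described above. */
static enum toy_state toy_classify(const struct toy_dentry *d)
{
	assert(d != NULL);
	if (!d->d_inode)
		return TOY_NEGATIVE;   /* no valid inode behind the name */
	return d->d_count > 0 ? TOY_USED : TOY_UNUSED;
}

/* Unused and negative dentries may be freed under memory pressure. */
static int toy_reclaimable(const struct toy_dentry *d)
{
	return toy_classify(d) != TOY_USED;
}
```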

Dentry Cache

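Resolving every dentry object in a path name one by one is time-consuming, so the kernel caches dentry objects in the dentry cache, or dcache.
The dentry cache consists of three parts: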

  • Lists of “used” dentries linked off their associated inode via the i_dentry field of the inode object.
  • A doubly linked “least recently used” list of unused and negative dentry objects.
  • A hash table and hashing function used to quickly resolve a given path into the associated dentry object

The hash table is represented by the dentry_hashtable array, and the hash value is determined by d_hash(). The table is searched with d_lookup(): if a matching object is found in the dcache, it is returned; otherwise NULL is returned.

As an example, assume that you are editing a source file in your home directory, /home/dracula/src/the_sun_sucks.c. Each time this file is accessed (for example, when you first open it, later save it, compile it, and so on), the VFS must follow each directory entry to resolve the full path: /, home, dracula, src, and finally the_sun_sucks.c. To avoid this time-consuming operation each time this path name is accessed, the VFS can first try to look up the path name in the dentry cache. If the lookup succeeds, the required final dentry object is obtained without serious effort. Conversely, if the dentry is not in the dentry cache, the VFS must manually resolve the path by walking the filesystem for each component of the path. After this task is completed, the kernel adds the dentry objects to the dcache to speed up any future lookups.

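The dcache also provides the front end to an inode cache, the icache. As long as a dentry is cached, the corresponding inode is cached too.

Dentry Operations

The dentry operations are represented by struct dentry_operations.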
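A rough user-space sketch of the hash-table part of the dcache follows. It assumes a dentry is keyed by its parent and its name, uses a deliberately simple hash, and returns NULL on a miss. The names toy_d_add and toy_d_lookup are hypothetical and only echo the kernel's d_lookup(); the hash function is not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_BUCKETS 8

/* Toy dentry: identified by (parent, name), chained in a hash bucket. */
struct toy_dentry {
	const struct toy_dentry *parent;
	const char *name;
	struct toy_dentry *next;   /* next entry in the same bucket */
};

static struct toy_dentry *toy_hashtable[TOY_HASH_BUCKETS];

/* Simple string hash mixed with the parent pointer. */
static unsigned int toy_hash(const struct toy_dentry *parent, const char *name)
{
	unsigned int h = (unsigned int)((uintptr_t)parent >> 4);

	while (*name)
		h = h * 31u + (unsigned char)*name++;
	return h % TOY_HASH_BUCKETS;
}

/* Insert a dentry at the head of its bucket. */
static void toy_d_add(struct toy_dentry *d)
{
	unsigned int b;

	assert(d != NULL && d->name != NULL);
	b = toy_hash(d->parent, d->name);
	d->next = toy_hashtable[b];
	toy_hashtable[b] = d;
}

/* Return the cached dentry for (parent, name), or NULL on a cache miss. */
static struct toy_dentry *toy_d_lookup(const struct toy_dentry *parent,
				       const char *name)
{
	struct toy_dentry *d = toy_hashtable[toy_hash(parent, name)];

	for (; d; d = d->next)
		if (d->parent == parent && strcmp(d->name, name) == 0)
			return d;
	return NULL;
}
```

On a miss, a caller would walk the filesystem for that component and then add the result with toy_d_add() so that the next lookup hits the cache.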

#include <linux/dcache.h>

struct dentry_operations {
	int (*d_revalidate) (struct dentry *, struct nameidata *);
	int (*d_hash) (struct dentry *, struct qstr *);
	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
	int (*d_delete) (struct dentry *);
	void (*d_release) (struct dentry *);
	void (*d_iput) (struct dentry *, struct inode *);
	char *(*d_dname) (struct dentry *, char *, int);
};

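TODO: describe the functions

File Object

The final primary VFS object is the file object, which represents a file opened by a process. The object is created in response to the open() system call and destroyed in response to the close() system call. Because multiple processes can open the same file, there can be multiple file objects for the same file.

The object points back to the dentry (which in turn points back to the inode) that actually represents the open file. The inode and dentry objects, of course, are unique.

The real file behind a file object can therefore be found through its dentry and inode. In the kernel, a file object is represented by struct file: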

#include <linux/fs.h>
struct file {
	union {
		struct list_head fu_list; 	/* list of file objects */
		struct rcu_head fu_rcuhead; /* RCU list after freeing */
	} f_u;
	struct path f_path; 			/* contains the dentry */
	struct file_operations *f_op; 	/* file operations table */
	spinlock_t f_lock; 				/* per-file struct lock */
	atomic_t f_count; 				/* file object’s usage count */
	unsigned int f_flags; 			/* flags specified on open */
	mode_t f_mode; 					/* file access mode */
	loff_t f_pos; 					/* file offset (file pointer) */
	struct fown_struct f_owner; 	/* owner data for signals */
	const struct cred *f_cred; 		/* file credentials */
	struct file_ra_state f_ra; 		/* read-ahead state */
	u64 f_version; 					/* version number */
	void *f_security; 				/* security module */
	void *private_data; 			/* tty driver hook */
	struct list_head f_ep_links; 	/* list of epoll links */
	spinlock_t f_ep_lock; 			/* epoll lock */
	struct address_space *f_mapping; /* page cache mapping */
	unsigned long f_mnt_write_state; /* debugging state */
};

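Like the dentry object, the file object does not correspond to any on-disk data.

The file object points to its associated dentry object via the f_dentry pointer (the dentry member of f_path in the structure above). The dentry in turn points to the associated inode, which reflects whether the file itself is dirty.

File Operations

The operations of the file object are represented by struct file_operations: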
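The relationship "many file objects, one dentry, one inode" can be sketched as follows. This is a toy model with hypothetical names (toy_open, toy_close, a fixed table of toy_file slots); it only shows that each open produces a separate object with its own offset while the dentry and inode behind it are shared.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode {
	long i_size;
};

struct toy_dentry {
	struct toy_inode *d_inode;
};

/* One of these per open; several may point at the same dentry. */
struct toy_file {
	struct toy_dentry *f_dentry;
	long f_pos;   /* per-open file offset */
	int in_use;
};

#define TOY_MAX_FILES 8
static struct toy_file toy_files[TOY_MAX_FILES];

/* Create a new file object for the dentry, or return NULL if the table is full. */
static struct toy_file *toy_open(struct toy_dentry *d)
{
	int i;

	assert(d != NULL);
	for (i = 0; i < TOY_MAX_FILES; i++) {
		if (!toy_files[i].in_use) {
			toy_files[i].f_dentry = d;
			toy_files[i].f_pos = 0;
			toy_files[i].in_use = 1;
			return &toy_files[i];
		}
	}
	return NULL;
}

/* Destroy the file object; the dentry and inode are left untouched. */
static void toy_close(struct toy_file *f)
{
	assert(f != NULL);
	f->in_use = 0;
}
```

Opening the same dentry twice gives two distinct toy_file objects, so moving the offset of one does not affect the other.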

#include <linux/fs.h>
struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
	ssize_t (*aio_read) (struct kiocb *, const struct iovec *,
							unsigned long, loff_t);
	ssize_t (*aio_write) (struct kiocb *, const struct iovec *,
							unsigned long, loff_t);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int,
					unsigned long);
	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *, fl_owner_t id);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*aio_fsync) (struct kiocb *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*sendpage) (struct file *, struct page *,
					int, size_t, loff_t *, int);
	unsigned long (*get_unmapped_area) (struct file *,
					unsigned long,
					unsigned long,
					unsigned long,
					unsigned long);
	int (*check_flags) (int);
	int (*flock) (struct file *, int, struct file_lock *);
	ssize_t (*splice_write) (struct pipe_inode_info *,
			struct file *,
			loff_t *,
			size_t,
			unsigned int);
	ssize_t (*splice_read) (struct file *,
				loff_t *,
				struct pipe_inode_info *,
				size_t,
				unsigned int);
	int (*setlease) (struct file *, long, struct file_lock **);
};

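TODO: describe the functions

Data Structures Associated with Filesystems

In addition to the fundamental VFS objects, the kernel uses other standard data structures to manage data related to filesystems. The first describes a specific variant of a filesystem, such as ext3, ext4, or UDF. The second describes a mounted instance of a filesystem. In the kernel, a filesystem type is represented by struct file_system_type: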

#include <linux/fs.h>
struct file_system_type {
	const char *name; 		/* filesystem’s name */
	int fs_flags; 			/* filesystem type flags */
	/* the following is used to read the superblock off the disk */
	struct super_block *(*get_sb) (struct file_system_type *, int,
					char *, void *);
	
	/* the following is used to terminate access to the superblock */
	void (*kill_sb) (struct super_block *);
	
	struct module *owner; 			/* module owning the filesystem */
	struct file_system_type *next; 	/* next file_system_type in list */
	struct list_head fs_supers; 	/* list of superblock objects */
	
	/* the remaining fields are used for runtime lock validation */
	struct lock_class_key s_lock_key;
	struct lock_class_key s_umount_key;
	struct lock_class_key i_lock_key;
	struct lock_class_key i_mutex_key;
	struct lock_class_key i_mutex_dir_key;
	struct lock_class_key i_alloc_sem_key;
};

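The get_sb() function reads the superblock from the disk and populates the superblock object when the filesystem is loaded. There is only one file_system_type per filesystem, regardless of how many instances of it are mounted on the system, or whether it is mounted at all. A mount point is represented by the vfsmount structure: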
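The "one type object per filesystem, however many mounts" rule can be sketched with a small registry. Everything here is hypothetical (toy_fs_type, toy_register_fs, toy_mount, and the mounts counter, which the real file_system_type does not have); it is meant only to show a single type object being reused by every mounted instance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
	const char *name;
	int mounts;                /* mounted instances; 0 if never mounted */
	struct toy_fs_type *next;  /* next registered type */
};

static struct toy_fs_type *toy_file_systems;

/* Find a registered type by name, or return NULL. */
static struct toy_fs_type *toy_find_fs(const char *name)
{
	struct toy_fs_type *p;

	for (p = toy_file_systems; p; p = p->next)
		if (strcmp(p->name, name) == 0)
			return p;
	return NULL;
}

/* Returns 0 on success, -1 if a type with that name already exists. */
static int toy_register_fs(struct toy_fs_type *fs)
{
	assert(fs != NULL && fs->name != NULL);
	if (toy_find_fs(fs->name))
		return -1;
	fs->next = toy_file_systems;
	toy_file_systems = fs;
	return 0;
}

/* Mounting creates a new instance but reuses the single type object. */
static struct toy_fs_type *toy_mount(const char *name)
{
	struct toy_fs_type *fs = toy_find_fs(name);

	if (fs)
		fs->mounts++;
	return fs;
}
```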

#include <linux/mount.h>
struct vfsmount {
	struct list_head mnt_hash; 		/* hash table list */
	struct vfsmount *mnt_parent; 	/* parent filesystem */
	struct dentry *mnt_mountpoint; 	/* dentry of this mount point */
	struct dentry *mnt_root; 		/* dentry of root of this fs */
	struct super_block *mnt_sb; 	/* superblock of this filesystem */
	struct list_head mnt_mounts; 	/* list of children */
	struct list_head mnt_child; 	/* list of children */
	int mnt_flags; 					/* mount flags */
	char *mnt_devname; 				/* device file name */
	struct list_head mnt_list; 		/* list of descriptors */
	struct list_head mnt_expire; 	/* entry in expiry list */
	struct list_head mnt_share; 	/* entry in shared mounts list */
	struct list_head mnt_slave_list; /* list of slave mounts */
	struct list_head mnt_slave; 	/* entry in slave list */
	struct vfsmount *mnt_master; 	/* slave’s master */
	struct mnt_namespace *mnt_namespace; /* associated namespace */
	int mnt_id; 					/* mount identifier */
	int mnt_group_id; 				/* peer group identifier */
	atomic_t mnt_count; 			/* usage count */
	int mnt_expiry_mark; 			/* is marked for expiration */
	int mnt_pinned; 				/* pinned count */
	int mnt_ghosts; 				/* ghosts count */
	atomic_t __mnt_writers; 		/* writers count */
};

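The complicated part of maintaining the list of all mount points is the relationship between the filesystem and all the other mount points; the various linked lists in vfsmount keep track of this information. The vfsmount structure also stores the flags, if any, specified at mount time in its mnt_flags field. The flags are defined in <linux/mount.h>.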
image

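Data Structures Associated with a Process

Each process on the system has its own list of open files, root filesystem, current working directory, mount points, and so on. Three data structures tie together the VFS layer and the processes on the system: files_struct, fs_struct, and namespace.
A process's open files and file descriptors are kept in files_struct, which is pointed to by the files field of the process descriptor.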

struct files_struct {
	atomic_t count; /* usage count */
	struct fdtable *fdt; /* pointer to other fd table */
	struct fdtable fdtab; /* base fd table */
	spinlock_t file_lock; /* per-file lock */
	int next_fd; /* cache of next available fd */
	struct embedded_fd_set close_on_exec_init; /* list of close-on-exec fds */
	struct embedded_fd_set open_fds_init; /* list of open fds */
	struct file *fd_array[NR_OPEN_DEFAULT]; /* base files array */
};

The fd_array array points to the list of open file objects.

Because NR_OPEN_DEFAULT is equal to BITS_PER_LONG, which is 64 on a 64-bit architecture, this includes room for 64 file objects. If a process opens more than 64 file objects, the kernel allocates a new array and points the fdt pointer at it.

The second process-related structure is fs_struct, which contains filesystem information related to the process:

#include <linux/fs_struct.h>
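The embedded-array-then-grow behavior can be sketched in user space. The assumptions are loud ones: the table is doubled on overflow (the kernel's actual growth policy is not stated in these notes), the names toy_files, toy_expand, and toy_fd_install are hypothetical, and the bitmaps and locking of the real files_struct are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_OPEN_DEFAULT 64

struct toy_file {
	int id;
};

struct toy_files {
	struct toy_file **fd;      /* current table: fd_array or a heap copy */
	unsigned int max_fds;
	struct toy_file *fd_array[TOY_NR_OPEN_DEFAULT]; /* embedded base array */
};

static void toy_files_init(struct toy_files *fs)
{
	memset(fs, 0, sizeof(*fs));
	fs->fd = fs->fd_array;
	fs->max_fds = TOY_NR_OPEN_DEFAULT;
}

/* Double the table, moving off the embedded array on first growth. */
static int toy_expand(struct toy_files *fs)
{
	unsigned int new_max = fs->max_fds * 2;
	struct toy_file **n = calloc(new_max, sizeof(*n));

	if (!n)
		return -1;
	memcpy(n, fs->fd, fs->max_fds * sizeof(*n));
	if (fs->fd != fs->fd_array)
		free(fs->fd);          /* only heap tables are freed */
	fs->fd = n;
	fs->max_fds = new_max;
	return 0;
}

/* Install f at the lowest free descriptor; returns the fd or -1. */
static int toy_fd_install(struct toy_files *fs, struct toy_file *f)
{
	unsigned int i;

	assert(fs != NULL && f != NULL);
	for (i = 0; i < fs->max_fds; i++)
		if (!fs->fd[i])
			break;
	if (i == fs->max_fds && toy_expand(fs) < 0)
		return -1;
	fs->fd[i] = f;
	return (int)i;
}
```

The first 64 descriptors cost no allocation at all, which is the motivation for embedding the base array in the structure.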
struct fs_struct {
	int users; 		/* user count */
	rwlock_t lock; 	/* per-structure lock */
	int umask; 		/* umask */
	int in_exec; 	/* currently executing a file */
	struct path root; /* root directory */
	struct path pwd; /* current working directory */
};

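root is the root directory and pwd is the current working directory.
The last structure is namespace, defined in <linux/mnt_namespace.h> and pointed to by the mnt_namespace field of the process descriptor. It gives each process its own view of the filesystems mounted on the system.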

struct mnt_namespace {
	atomic_t count; 		/* usage count */
	struct vfsmount *root; 	/* root directory */
	struct list_head list; 	/* list of mount points */
	wait_queue_head_t poll; /* polling waitqueue */
	int event; 				/* event count */
};

The list member specifies a doubly linked list of the mounted filesystems that make up the namespace.

For most processes, the process descriptor points to unique files_struct and fs_struct structures. For processes created with the clone flag CLONE_FILES or CLONE_FS, however, these structures are shared. Consequently, multiple process descriptors might point to the same files_struct or fs_struct structure. The count member of each structure provides a reference count to prevent destruction while a process is still using the structure.

The namespace structure works the other way around. By default, all processes share the same namespace. (That is, they all see the same filesystem hierarchy from the same mount table.) Only when the CLONE_NEWNS flag is specified during clone() is the process given a unique copy of the namespace structure. Because most processes do not provide this flag, all the processes inherit their parents’ namespaces. Consequently, on many systems there is only one namespace, although the functionality is but a single CLONE_NEWNS flag away.
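The two opposite defaults (private file table unless asked to share, shared namespace unless asked for a copy) can be modeled with reference counts. This is a user-space sketch with hypothetical names (toy_task, toy_clone, TOY_CLONE_FILES, TOY_CLONE_NEWNS); it does not call clone(), and a private file table here starts empty, whereas a real one would be a copy of the parent's.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CLONE_FILES 0x1 /* share the parent's file table */
#define TOY_CLONE_NEWNS 0x2 /* give the child its own namespace */

struct toy_files { int count; }; /* reference count only */
struct toy_ns    { int count; };

struct toy_task {
	struct toy_files *files;
	struct toy_ns *ns;
};

/* Fill in child from parent; returns 0 on success, -1 on allocation failure. */
static int toy_clone(const struct toy_task *parent, struct toy_task *child,
		     int flags)
{
	struct toy_files *files = NULL;
	struct toy_ns *ns = NULL;

	assert(parent != NULL && child != NULL);
	if (!(flags & TOY_CLONE_FILES)) {   /* default: private file table */
		files = malloc(sizeof(*files));
		if (!files)
			return -1;
		files->count = 1;
	}
	if (flags & TOY_CLONE_NEWNS) {      /* opt-in: private namespace */
		ns = malloc(sizeof(*ns));
		if (!ns) {
			free(files);
			return -1;
		}
		ns->count = 1;
	}
	if (!files) {                       /* share and take a reference */
		files = parent->files;
		files->count++;
	}
	if (!ns) {
		ns = parent->ns;
		ns->count++;
	}
	child->files = files;
	child->ns = ns;
	return 0;
}

/* Drop one reference; free the structure when the last user is gone. */
static void toy_put_files(struct toy_files *f)
{
	if (--f->count == 0)
		free(f);
}
```

The count keeps a shared structure alive until its last user drops it, which is the role the count member plays in the paragraph above.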
