俞一凡 Yifan Yu 1600012998
I completed this lab with 1 challenge, reported at the end of this report.
Host Machine: MacBook Pro (Retina 13-inch Early 2015)
OS: macOS Mojave 10.14.1
CPU: 2.7 GHz Intel Core i5
Memory: 8 GB
Virtual Machine: Oracle VM VirtualBox 6.0
Memory: 1.5 GB
OS: Ubuntu 16.04 LTS (32bit x86)
Gcc: Gcc 5.4.0
Make: GNU Make 4.1
Gdb: GNU gdb 7.11.1
The main new component for this part of the lab is the file system environment, located in the new fs
directory. New source files for this lab:
fs/fs.c
: Code that mainipulates the file system's on-disk structure.fs/bc.c
: A simple block cache built on top of our user-level page fault handling facility.fs/ide.c
: Minimal PIO-based (non-interrupt-driven) IDE driver code.fs/serv.c
: The file system server that interacts with client environments using file system IPCs.lib/fd.c
: Code that implements the general UNIX-like file descriptor interface.lib/file.c
: The driver for on-disk file type, implemented as a file system IPC client.lib/console.c
: The driver for console input/output file type.lib/spawn.c
: Code skeleton of the spawn library call.
The main purpose of this lab is to implement a much simpler but powerful enough file system to provide the basic features: creating, reading, writing, and deleting files organized in a hierarchical directory structure.
Before starting, following the instruction on the Lab webpage and get clear of the following concepts of On-Disk File System Structure:
- Sectors and Blocks
- Superblocks
- File Meta-data
- Directories versus Regular Files
--
The file system functions needs to implement are: reading blocks into the block cache and flushing them back to disk; allocating disk blocks; mapping file offsets to disk blocks; and implementing read, write, and open in the IPC interface.
The file system environment in our operating system needs to be able to access the disk, so we first need to implement the disk access functionality.
It is easy to implement disk access in user space this way as long as we rely on polling, "programmed I/O" (PIO)-based disk access and do not use disk interrupts. The x86 processor uses the IOPL bits in the EFLAGS register to determine whether protected-mode code is allowed to perform special device I/O instructions such as the IN and OUT instructions.
Exercise 1.
i386_init
identifies the file system environment by passing the typeENV_TYPE_FS
to your environment creation function,env_create
. Modifyenv_create
inenv.c
, so that it gives the file system environment I/O privilege, but never gives that privilege to any other environment.Make sure you can start the file environment without causing a General Protection fault. You should pass the "fs i/o" test in
make grade
.
Only need to do this:
// LAB 5: Your code here.
if (type == ENV_TYPE_FS)
e->env_tf.tf_eflags |= FL_IOPL_3;
--
Question 1. Do you have to do anything else to ensure that this I/O privilege setting is saved and restored properly when you subsequently switch from one environment to another? Why?
No. EFLAGS
register will be saved and restored in struct Trapframe
during environment switching as done in earlier labs.
--
In our file system, a simple "buffer cache" (a block cache) will be implemented with the help of the processor's virtual memory system. The code for the block cache is in fs/bc.c
.
The file system will be limited to handling disks of size 3GB or less. A large, fixed 3GB region of the file system environment's address space is reserved, from 0x10000000 (DISKMAP
) up to 0xD0000000 (DISKMAP+DISKMAX
), as a "memory mapped" version of the disk. The diskaddr
function in fs/bc.c
implements this translation from disk block numbers to virtual addresses (along with some sanity checking).
Since it would take a long time to read the entire disk into memory, we'll implement a form of demand paging, wherein we only allocate pages in the disk map region and read the corresponding block from the disk in response to a page fault in this region.
After the block cache is initialized, simply store pointers into the disk map region in the super
global variable. After this point, we can simply read from the super
structure as if they were in memory and our page fault handler will read them from disk as necessary.
Exercise 2. Implement the
bc_pgfault
andflush_block
functions infs/bc.c
.bc_pgfault
is a page fault handler, just like the one your wrote in the previous lab for copy-on-write fork, except that its job is to load pages in from the disk in response to a page fault. When writing this, keep in mind that (1)addr
may not be aligned to a block boundary and (2)ide_read
operates in sectors, not blocks.The
flush_block
function should write a block out to disk if necessary.flush_block
shouldn't do anything if the block isn't even in the block cache (that is, the page isn't mapped) or if it's not dirty. We will use the VM hardware to keep track of whether a disk block has been modified since it was last read from or written to disk. To see whether a block needs writing, we can just look to see if thePTE_D
"dirty" bit is set in theuvpt
entry. (ThePTE_D
bit is set by the processor in response to a write to that page; see 5.2.4.3 in chapter 5 of the 386 reference manual.) After writing the block to disk,flush_block
should clear thePTE_D
bit usingsys_page_map
.Use
make grade
to test your code. Your code should pass "check_bc", "check_super", and "check_bitmap".
In pc_bgfault
:
// ...
// Allocate a page in the disk map region, read the contents
// of the block from the disk into that page.
// Hint: first round addr to page boundary. fs/ide.c has code to read
// the disk.
//
// LAB 5: you code here:
addr = (void *) ROUNDDOWN(addr, PGSIZE);
if ((r = sys_page_alloc(0, addr, PTE_W | PTE_U | PTE_P)) < 0)
panic("in bc_pgfault, sys_page_alloc: %e", r);
if ((r = ide_read(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
panic("in bc_pgfault, ide_read: %e", r);
// ...
flush_block
:
void
flush_block(void *addr)
{
uint32_t blockno = ((uint32_t)addr - DISKMAP) / BLKSIZE;
int r;
if (addr < (void*)DISKMAP || addr >= (void*)(DISKMAP + DISKSIZE))
panic("flush_block of bad va %08x", addr);
// LAB 5: Your code here.
if (!va_is_mapped(addr) || !va_is_dirty(addr))
return;
addr = (void *) ROUNDDOWN(addr, PGSIZE);
if ((r = ide_write(blockno * BLKSECTS, addr, BLKSECTS)) < 0)
panic("in flush_block, ide_write: %e", r);
if ((r = sys_page_map(0, addr, 0, addr, uvpt[PGNUM(addr)] & PTE_SYSCALL)) < 0)
--
After fs_init
sets the bitmap
pointer, we can treat bitmap
as a packed array of bits, one for each block on the disk, indicating whether the block is free.
Exercise 3. Use
free_block
as a model to implementalloc_block
infs/fs.c
, which should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changedbitmap
block to disk withflush_block
, to help file system consistency.Use
make grade
to test your code. Your code should now pass "alloc_block".
Use free_block
as a model to write alloc_block
:
int
alloc_block(void)
{
// The bitmap consists of one or more blocks. A single bitmap block
// contains the in-use bits for BLKBITSIZE blocks. There are
// super->s_nblocks blocks in the disk altogether.
// LAB 5: Your code here.
uint32_t blockno = (super->s_nblocks - 1) / BLKBITSIZE + 3;
for (; blockno < super->s_nblocks; blockno++) {
if (bitmap[blockno/32] & (1<<(blockno%32))) {
bitmap[blockno/32] &= ~(1<<(blockno%32));
flush_block((void *)DISKMAP + (2 + blockno / BLKBITSIZE) * BLKSIZE);
return blockno;
}
}
return -E_NO_DISK;
--
Exercise 4. Implement
file_block_walk
andfile_get_block
.file_block_walk
maps from a block offset within a file to the pointer for that block in thestruct File
or the indirect block, very much like whatpgdir_walk
did for page tables.file_get_block
goes one step further and maps to the actual disk block, allocating a new one if necessary.Use make grade to test your code. Your code should pass "file_open", "file_get_block", and "file_flush/file_truncated/file_rewrite", and "testfile".
Very similar to the operations walking page directory and page tables, e.g. pgdir_walk
. See fs/fs.c
for code.
--
Since other environments can't directly call functions in the file system environment, we'll expose access to the file system environment via a remote procedure call, or RPC, abstraction, built atop JOS's IPC mechanism. Graphically, a call to the file system server (say, read) looks like
Regular env FS env
+---------------+ +---------------+
| read | | file_read |
| (lib/fd.c) | | (fs/fs.c) |
...|.......|.......|...|.......^.......|...............
| v | | | | RPC mechanism
| devfile_read | | serve_read |
| (lib/file.c) | | (fs/serv.c) |
| | | | ^ |
| v | | | |
| fsipc | | serve |
| (lib/file.c) | | (fs/serv.c) |
| | | | ^ |
| v | | | |
| ipc_send | | ipc_recv |
| | | | ^ |
+-------|-------+ +-------|-------+
| |
+-------------------+
Exercise 5. Implement serve_read in
fs/serv.c
.
serve_read
's heavy lifting will be done by the already-implementedfile_read
infs/fs.c
(which, in turn, is just a bunch of calls tofile_get_block
).serve_read
just has to provide the RPC interface for file reading. Look at the comments and code inserve_set_size
to get a general idea of how the server functions should be structured.Use
make grade
to test your code. Your code should pass "serve_open/file_stat/file_close" and "file_read" for a score of 70/150.
int
serve_read(envid_t envid, union Fsipc *ipc)
{
struct Fsreq_read *req = &ipc->read;
struct Fsret_read *ret = &ipc->readRet;
if (debug)
cprintf("serve_read %08x %08x %08x\n", envid, req->req_fileid, req->req_n);
// Lab 5: Your code here:
struct OpenFile *o;
int r;
if ((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
r = file_read(o->o_file, (void *)ret->ret_buf, req->req_n, o->o_fd->fd_offset);
if (r >= 0)
o->o_fd->fd_offset += r;
return r;
}
--
Exercise 6. Implement serve_write in
fs/serv.c
and devfile_write inlib/file.c
.
Use make grade
to test your code. Your code should pass "file_write", "file_read after file_write", "open", and "large file" for a score of 90/150.
int
serve_write(envid_t envid, struct Fsreq_write *req)
{
if (debug)
cprintf("serve_write %08x %08x %08x\n", envid, req->req_fileid, req->req_n);
// LAB 5: Your code here.
struct OpenFile *o;
int r;
if ((r = openfile_lookup(envid, req->req_fileid, &o)) < 0)
return r;
r = file_write(o->o_file, (void *)req->req_buf, req->req_n, o->o_fd->fd_offset);
if (r >= 0)
o->o_fd->fd_offset += r;
return r;
}
--
spawn
(see lib/spawn.c
) creates a new environment, loads a program image from the file system into it, and then starts the child environment running this program. The parent process then continues running independently of the child. The spawn
function effectively acts like a fork
in UNIX followed by an immediate exec
in the child process.
The reason that spawn
is implemented rather than a UNIX-style exec
is that spawn
is easier to implement from user space in "exokernel fashion", without special help from the kernel.
Exercise 7.
spawn
relies on the new syscallsys_env_set_trapframe
to initialize the state of the newly created environment. Implementsys_env_set_trapframe
inkern/syscall.c
(don't forget to dispatch the new system call insyscall()
).Test your code by running the
user/spawnhello
program fromkern/init.c
, which will attempt to spawn/hello
from the file system.Use
make grade
to test your code.
static int
sys_env_set_trapframe(envid_t envid, struct Trapframe *tf)
{
// LAB 5: Your code here.
// Remember to check whether the user has supplied us with a good
// address!
struct Env *env;
if (envid2env(envid, &env, 1) == -E_BAD_ENV)
return -E_BAD_ENV;
user_mem_assert(curenv, (const void *) tf, sizeof(struct Trapframe), PTE_U | PTE_W);
env->env_tf = *tf;
env->env_tf.tf_cs = GD_UT | 3;
env->env_tf.tf_eflags |= FL_IF;
env->env_tf.tf_eflags &= ~FL_IOPL_MASK;
return 0;
}
Remeber to dispatch this new system call in syscall()
.
--
The UNIX file descriptors are a general notion that also encompasses pipes, console I/O, etc. In JOS, each of these device types has a corresponding struct Dev
, with pointers to the functions that implement read/write/etc. for that device type. lib/fd.c
implements the general UNIX-like file descriptor interface on top of this. Each struct Fd
indicates its device type, and most of the functions in lib/fd.c
simply dispatch operations to functions in the appropriate struct Dev
.
lib/fd.c
also maintains the file descriptor table region in each application environment's address space, starting at FDTABLE
. This area reserves a page's worth (4KB) of address space for each of the up to MAXFD
(currently 32) file descriptors the application can have open at once. At any given time, a particular file descriptor table page is mapped if and only if the corresponding file descriptor is in use. Each file descriptor also has an optional "data page" in the region starting at FILEDATA
, which devices can use if they choose.
File descriptor state will be shared across fork and spawn. Right now, on fork, the memory will be marked copy-on-write, so the state will be duplicated rather than shared. On spawn, the memory will be left behind, not copied at all. We will change fork to know that certain regions of memory are used by the "library operating system" and should always be shared. This is what PTE_SHARE
bit in inc/lib.h
is for.
Exercise 8. Change
duppage
inlib/fork.c
to follow the new convention. If the page table entry has thePTE_SHARE
bit set, just copy the mapping directly. (You should usePTE_SYSCALL
, not0xfff
, to mask out the relevant bits from the page table entry.0xfff
picks up the accessed and dirty bits as well.)Likewise, implement
copy_shared_pages
inlib/spawn.c
. It should loop through all page table entries in the current process (just like fork did), copying any page mappings that have thePTE_SHARE
bit set into the child process.
In lib/fork.c
, just add this condition to duppage
:
if (pte & PTE_SHARE) {
if ((r = sys_page_map(0, va, envid, va, pte & PTE_SYSCALL)) < 0)
return r;
}
In lib/spawn.c
:
static int
copy_shared_pages(envid_t child)
{
// LAB 5: Your code here.
int r;
for (uintptr_t va = 0; va < UTOP; va += PGSIZE) {
if ((uvpd[PDX(va)] & PTE_P) && (uvpt[PGNUM(va)] & PTE_SHARE) &&
(uvpt[PGNUM(va)] & PTE_P) && (uvpt[PGNUM(va)] & PTE_U)) {
int perm = uvpt[PGNUM(va)] & PTE_SYSCALL;
if ((r = sys_page_map(0, (void *)va, child, (void *)va, perm)) < 0)
return r;
}
}
return 0;
}
Run relevant tests and make grade
, the results meet expectations.
For Lab 4's sfork
challenge, I redefined the thisenv
global symbol to be a macro, which always dereference the top of the user stack (USTACKTOP-4
) as a struct Env *
, and always put the address of the pointer to current Env
's at the top of its stack.
For this to work with Lab 5, I have to change the corresponding place in spawn()
, specifically these two places:
// -4 to work with lab4 sfork's thisenv setting
string_store = (char*) UTEMP + PGSIZE - string_size - 4;
and
// -4 to work with lab4 sfork's thisenv setting
assert(string_store == (char*)UTEMP + PGSIZE - 4);
--
For the shell to work, we need a way to type at it. In QEMU, input typed in the graphical window appear as input from the keyboard to JOS, while input typed to the console appear as characters on the serial port. kern/console.c
already contains the keyboard and serial drivers that have been used by the kernel monitor since lab 1, now attach these to the rest of the system.
Exercise 9. In your
kern/trap.c
, callkbd_intr
to handle trapIRQ_OFFSET+IRQ_KBD
andserial_intr
to handle trapIRQ_OFFSET+IRQ_SERIAL
.
In trap_dispatch()
:
// ...
// Handle keyboard and serial interrupts.
// LAB 5: Your code here.
if (tf->tf_trapno == IRQ_OFFSET + IRQ_KBD) {
kbd_intr();
return;
}
if (tf->tf_trapno == IRQ_OFFSET + IRQ_SERIAL) {
serial_intr();
return;
}
// ...
--
Exercise 10. The shell doesn't support I/O redirection. It would be nice to run
sh <script
instead of having to type in all the commands in the script by hand, as you did above. Add I/O redirection for<
touser/sh.c
.Test your implementation by typing
sh <script
into your shellRun make
run-testshell
to test your shell. testshell simply feeds the above commands (also found infs/testshell.sh
) into the shell and then checks that the output matchesfs/testshell.key
.
In user/sh.c
's runcmd()
, simply:
case '<': // Input redirection
// Grab the filename from the argument list
if (gettoken(0, &t) != 'w') {
cprintf("syntax error: < not followed by word\n");
exit();
}
// Open 't' for reading as file descriptor 0
// (which environments use as standard input).
// We can't open a file onto a particular descriptor,
// so open the file as 'fd',
// then check whether 'fd' is 0.
// If not, dup 'fd' onto file descriptor 0,
// then close the original 'fd'.
// LAB 5: Your code here.
if ((fd = open(t, O_RDONLY)) < 0) {
cprintf("open %s for read: %e", t, fd);
exit();
}
if (fd) {
dup(fd, 0);
close(fd);
}
break;
Challenge! Implement Unix-style
exec
.
Because exec
will cover the address space of the current process, we cannot implement exec
without the help of kernel. My exec
will read the ELF program file into memory, then kernel will take the rest work to finish ELF loading.
exec
is quite similar to spawn
, basically does the following steps (see lib/exec.c
for code):
- Open the ELF file with filename
prog
- Do sanity check on the ELF header
- Set up
void *nextpg = ROUNDUP((void *)end, PGSIZE)
end
points to the end of current program's.bss
segment, andnextpg
always points to next available virtual page
- alloc a page at
UTEMP
to store the mapping of segments and the stack (struct SegmentInfo
ininc/syscall.h
), which will be passed to the kernel - Set up
Trapframe
- Set up the stack, and store the stack page
- Set up segments in the ELF file
- Call
sys_env_load_elf
system call and pass theTrapframe
and the mapping to the kernel
sys_env_load_elf
sets the Trapframe
of the current environment and maps pages according to the mapping. Now loading is finished and the current process will start running the new program.
For testing, I provide the test program testexec
, which simply run call exec
and pass the arguments it received. Run make run-icode-nox
and type testexec echo hello world
, the result is the same as running echo hello world
.
Also, Lab 4's sfork
's conflict with spawn
also exists here with exec
, fix it as described above after Ex.8.