《MIT 6.828 Homework 2: Shell》解题报告

Homework 2的网站链接：MIT 6.828 Homework 2: shell

题目

下载sh.c文件，在文件中添加相应代码，以支持以下关于shell的功能：
- 实现简单shell命令，比如cat/echo/grep/ls/sort/uniq/wc等
- 实现I/O重定向
- 实现管道
Optional challenge exercises:
- Implement lists of commands, separated by ";"
- Implement sub shells by implementing "(" and ")"
- Implement running commands in the background by supporting "&" and "wait"

解答

我关于Homework 2的实现代码见whl1729/mit_6.828。简单说下我的完成情况。

已完成

已实现cat/echo/grep/ls/rm/sort/uniq/wc等命令的基本功能。所谓基本功能是指实现了对应命令的默认选项，而还不支持更多选项。比如，真实shell环境中的sort支持-n（按数值排序）和-r（逆序）、grep支持正则表达式等，这些我都还不支持。
已实现I/O重定向
已实现管道

未完成

尚未支持简单命令的更多选项。
还没做challenge的功能。

已发现并解决的bug

Bug 1：执行`ls | sort | uniq`命令时始终只打印一行信息

详细现象：执行各个普通命令及大部分多级管道命令时功能都正常，唯独当执行sort命令后接其他命令时有异常。比如当前目录下有5个文件，执行ls | sort是正常的，可以显示排序后的5行内容，但再后接uniq或grep等命令时，始终只打印一行信息。
定位过程：
- 一开始怀疑是多级管道的实现有问题，比如某个文件描述符忘记close、两个进程之间读写时序没同步等，将各种可能的情况尝试后问题仍存在。
- 由于只有sort命令后接其他命令时存在问题，接下来怀疑是sort的实现有问题。首先怀疑排序算法有bug，但注释后问题仍然存在。
- 将sort函数内的大部分代码注释掉，只保留读输入和写输出的几行代码，并参考其他简单命令的实现，最终发现是write(fileno(stdout), plines[pos], LINE_NUM));这一行出错，修改为write(fileno(stdout), plines[pos], strlen(plines[pos]);就可以了。
问题根因：write函数原型是ssize_t write(int fd, const void *buf, size_t count);，我以为如果count > strlen(buf)，最多只会写strlen(buf)个字符，然而事实上并非如此！如果不出现信号中断、存储空间不够或count大于系统限制的字节数RLIMIT_FSIZE等情况（一般不会出现），write函数会写够count个字符！因此，我的sort函数每次调用write时，除了写文件名，还写了大量的null字符，于是下一个进程在第一次read的时候，就读到了文件结束符，导致永远只能打印一行。
反思：这个问题定位了3天，累计约10个小时。定位效率太低了！原因是自己没搞懂原理，盲目乱试，最后还是靠不断注释代码来找到原因的。教训是定位问题前先把基本原理搞清楚，包括系统知识、函数用法等；定位过程中要合情推理，有依有据地排查。总之，还需要多总结反思、提高定位效率。

Bug 2：执行`cat < y | sort`命令时提示"unknown command"

定位过程：由于我之前测试管道命令时没试过同时使用重定向，我很快就怀疑到自己的代码可能不支持解析“管道+重定向”的情形，在梳理了一遍命令解析的过程后（见下文“函数调用关系”中parsecmd的笔记）发现已经支持。然后在检查管道的实现代码时，发现自己在执行每条命令时默认其为普通命令，没考虑到重定向的场景。修改后问题解决。

    if (cmds[pos]->type == ' ')

    {

        runecmd(cmds[pos]);

    }

    else

    {

        runrcmd((struct redircmd *)cmds[pos]);

    }

Bug 3: 通过`./a.out < p.sh`执行'ls > y'命令时y文件无内容

详细现象：当我先运行./a.out，然后再输入ls > y时，y文件内容正常；而当我先新建p.sh文件、在里面输入ls > y并报错，然后执行./a.out < p.sh，发现y文件为空。
定位过程：出现问题后，怀疑是ls的实现有问题。ls的实现是参考stackflow的问题How do you get a directory listing in C?来写的，主要代码如下。我很快就怀疑到可能是puts函数没输出到正确的地方，于是改用write，结果问题就解决了。然而，根据puts的Manual Page中的信息“puts() writes the string s and a trailing newline to stdout.”，puts也是将字符串输出到stdout，使用puts和write有啥区别？我还不知道根因是什么？

// the code in stackflow:

    while (ep = readdir(dp))

        puts(ep->d_name);

// the code after my modification:

    while (ep = readdir(dp))

    {

        len = strlen(ep->d_name);

        ep->d_name[len++] = '\n';

        ep->d_name[len] = 0;

        if (write(fileno(stdout), ep->d_name, len) < 0)

        {

            fprintf(stderr, "failed to write %s when ls\r\n", ep->d_name);

        }

    }

尚未解决的bug

Bug 4: 先执行管道命令再执行`rm y`时会重复删除y导致报错

详细现象：通过'./a.out < t.sh'来执行t.sh脚本中的多条shell命令时，如果脚本中的rm y命令前面没有任何管道命令，则能正常删除文件y并无报错；如果rm y前面含有管道命令，比如ls | uniq，也会删除掉文件y，但是会报错，remove函数返回错误码-1.如果先运行./a.out，再依次输入ls | grep和rm y这两条命令，则不会报错。
定位过程：
- 将errno的值及其含义打印出来：err=2(No such file or directory)，居然是文件不存在。但是我在执行前明明已经创建了该文件。
- 在函数入口增加打印，发现remove函数居然被调用了2次。
- 怀疑是部分文件描述符忘记close，在runpcmd函数退出前增加遍历close所有文件符，但问题依然存在。
- 进一步发现执行完管道命令后，后面的所有命令都会被执行2次，而且是执行完一遍所有后面的命令后再重新执行第2遍。问题的关键是第2遍执行的命令是从哪里读取的？究竟是怎样导致重复的？折腾了两天还是定位不出来，暂时放弃了，等以后有空再研究。

sh.c源码剖析

函数调用关系

peek: 跳过输入字符串*ps中开头的空白字符，然后读取一个非空字符，判断它是否在指定字符串toks中出现，若出现则返回非零值，否则返回0
gettoken: 读取一个单词或"<|>"这几个字符，将其起始和结束位置分别保存在q和eq中，读取后ps偏移到下一个非空字符串的起始处。如果读取结果为"<|>"或结束符，则返回对应字符的值，否则返回字符'a'的值（代表解析结果是普通字符串）。
parsecmd

parsecmd

  |- parseline 解析本行的命令

       |- parsepipe

            |- parseexec 解析普通命令或重定向命令

                 |- execcmd 申请struct execcmd所需内存并初始化

                 |- parseedirs 解析重定向命令并返回解析结果

                      |- peek 判断下一个非空字符是否为"<"或">"，若是，则继续执行下面的操作，否则直接返回

                      |- gettoken 获取字符"<"或">"

                      |- gettoken 获取文件名

                      |- redircmd 将命令名、重定向符和文件名组成redircmd结构体并返回

                 |- peek 若接下来的非空字符为"|"，则结束解析；否则继续解析

                 |- gettoken 读取一个单词（即命令名或输入选项），并保存在cmd中

                 |- parseredir 判断下一个非空字符是否为"<"或">"，若是则解析重定向命令，否则继续解析普通命令

            |- peek 判断下一个字符是否为"|"，若是则继续下面的操作，否则直接返回parseexec的解析结果

            |- gettoken 跳过字符"|"

            |- pipecmd 将当前已解析的命令cmd和下一个命令组成pcmd，注意此处递归调用parsepipe，最终得到一棵右倾斜树

  |- peek 解析完成后，判断本行后面是否还有多余字符，若有则报错

关于代码的疑问

Q：没理解peek函数返回时为什么还要&&一下？

A：理解了。peek函数检查下一个非空白字符是否在目标字符串中，因此首先需要跳过接下来的空白字符后。跳过之后，判断下一个字符是否为字符串结束符0，若是则返回0，否则看下一个字符是否在toks出现，若是则返回非零值，若否则返回0.综上，相与的目的是为了让遇到字符串结束符和字符没在toks出现时都返回0.如果把字符串结束符作为第二个输入参数传给strchr，strchr会返回非零值，因为任何一个字符串都含有字符串结束符。

int peek(char **ps, char *es, char *toks)

{

  char *s;

  s = *ps;

  while(s < es && strchr(whitespace, *s))

    s++;

  *ps = s;

  return *s && strchr(toks, *s);

}

Q: 使用管道时为什么要fork一个子进程？直接在原进程操作不可以吗？
Q: 以下代码为什么一定要close p[1]？

pipe(p);

if(fork() == 0) {

    close(0);

    dup(p[0]);

    close(p[0]);

    close(p[1]);

    exec("/bin/wc", argv);

} else {

    close(p[0]);

    write(p[1], "hello world\n", 12);

    close(p[1]);

}

读代码时的笔记

isatty: test whether a file descriptor refers to a terminal. tty stands for TeleTYpewriter.
- HEADER FILE: #include <unistd.h>
- SYNOPSIS: int isatty(int fd);
- DESCRIPTION: The isatty() function tests whether fd is an open file descriptor referring to a terminal.
- RETURN VALUE: isatty() returns 1 if fd is an open file descriptor referring to a terminal; otherwise 0 is returned, and errno is set to indicate the error.
- ERRORS: EBADF -- fd is not a valid file descriptor. EINVAL -- fd refers to a file other than a terminal. POSIX.1 specifies the error ENOTTY for this case.
The function fileno() examines the argument stream and returns its integer file descriptor.
stdin: Originally I/O happened via a physically connected system console (input via keyboard, output via monitor), but standard streams abstract this. When a command is executed via an interactive shell, the streams are typically connected to the text terminal on which the shell is running, but can be changed with redirection or a pipeline. More generally, a child process will inherit the standard streams of its parent process.
char * strchr (char * str, int character);: Returns a pointer to the first occurrence of character in the C string str. If the character is not found, the function returns a null pointer.

写代码时的笔记

open
- man open
- header file: sys/types.h, sys/stat.h, fcntl.h
- synopsis: int open(const char *pathname, int flags);
- flags: The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR.
- return: the return value of open() is a file descriptor. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
read
- man read
- header file: unistd.h
- synopsis: ssize_t read(int fd, void *buf, size_t count);
- return: On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. On error, -1 is returned, and errno is set appropriately.
write
- man write
- header file: unistd.h
- synopsis: ssize_t write(int fd, const void *buf, size_t count);
- return: On success, the number of bytes written is returned (zero indicates nothing was written). On error, -1 is returned, and errno is set appropriately.
sort: Why does the UNIX sort utility ignore leading spaces without the option -b?
How do you get a directory listing in C?
char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.

调试时的笔记

xv6书中说“every process has a private space of file descriptors starting at zero”，那么不同进程的文件描述符对应的文件对象是相互独立的吗？比如进程a的文件描述符1对应stdout，进程b的文件描述符1也对应stdout，这两个stdout是不同的对象还是相同的对象？它们会相互影响吗？如果进程a和b分别写stdout时会有问题吗？

答：写了个测试代码如下所示，最终父进程和子进程打印的fd均为3，可见这两个fd对应的对象是不同文件。

    int fd = -1;

    if (fork() == 0)

    {

        fd = open("child.txt", O_RDWR | O_CREAT);

        printf("child fd=%d.\r\n", fd);

    }

    else

    {

        fd = open("parent.txt", O_RDWR | O_CREAT);

        printf("parent fd=%d.\r\n", fd);

    }

在xv6书中则有如下代码，用来解释“Although fork copies the file descriptor table, each underlying file offset is shared between parent and child.”。最终打印结果是“hello world”。这说明父进程和子进程的stdout都是对应同一个终端。

    if(fork() == 0) {

        write(1, "hello ", 6);

        exit();

    } else {

        wait();

        write(1, "world\n", 6);

    }

《MIT 6.828 Homework 2: Shell》解题报告的更多相关文章

《MIT 6.828 Homework 1: boot xv6》解题报告
本作业的网站链接:MIT 6.828 Homework 1: boot xv6 问题 Exercise: What is on the stack? While stopped at the abov ...
《MIT 6.828 Lab1: Booting a PC》实验报告
<MIT 6.828 Lab1: Booting a PC>实验报告本实验的网站链接见:Lab 1: Booting a PC. 实验内容熟悉x86汇编语言.QEMU x86仿真器.P ...
《MIT 6.828 Lab 1 Exercise 12》实验报告
本实验的网站链接:MIT 6.828 Lab 1 Exercise 12. 题目 Exercise 12. Modify your stack backtrace function to displa ...
《MIT 6.828 Lab 1 Exercise 11》实验报告
本实验的网站链接:MIT 6.828 Lab 1 Exercise 11. 题目 The above exercise should give you the information you need ...
《MIT 6.828 Lab 1 Exercise 10》实验报告
本实验的网站链接:MIT 6.828 Lab 1 Exercise 10. 题目 Exercise 10. To become familiar with the C calling conventi ...
《MIT 6.828 Lab 1 Exercise 8》实验报告
本实验的网站链接:MIT 6.828 Lab 1 Exercise 8. 题目 Exercise 8. Read through kern/printf.c, lib/printfmt.c, and ...
《MIT 6.828 Lab 1 Exercise 7》实验报告
本实验链接:mit 6.828 lab1 Exercise 7. 题目 Exercise 7. Use QEMU and GDB to trace into the JOS kernel and st ...
《MIT 6.828 Lab 1 Exercise 4》实验报告
本实验链接:mit 6.828 lab1 Exercise 4. 题目 Exercise 4. Read about programming with pointers in C. The best ...
《MIT 6.828 Lab 1 Exercise 3》实验报告
本实验的网站链接:mit 6.828 lab1 Exercise 3. 题目 Exercise 3. Take a look at the lab tools guide, especially th ...

随机推荐

Educational Codeforces Round 74 (Rated for Div. 2)【A,B,C【贪心】,D【正难则反的思想】】
A. Prime Subtractiontime limit per test2 secondsmemory limit per test256 megabytesinputstandard inpu ...
路由器配置——PAP与CHAP认证
一.实验目的:掌握PAP与CHAP认证配置二.拓扑图: 三.具体步骤配置: (1)R1路由器配置: Router>enable --进入特权模式 Router#configure termi ...
Vue(一)
什么是Vue? 用于构建用户界面的渐进式框架
pymysql基本使用
导入模块 import pymysql 连接(看一下password有没有,没有的话会怎么样,有很多参数,有些参数也是可以不传的) conn = pymysql.connect(user = &quo ...
PostMan的详细介绍
无论是接口调试还是接口测试,postman都算的上很优秀的工具,好多接口测试平台.接口测试工具框架的设计也都能看到postman的影子,我们真正了解了这款工具,才可以在这个基础上进行自己的设计和改造. ...
MySQL数据分析—（4）关系数据库的三个逻辑框架
(一)前面课程逻辑梳理任何一门数据的软件也好,数据构架也好,或者说是数据学科也好,最终都是要解决实际问题的,大家说是不是? 前面jacky讲为什么要引入数据库的时候,举了一个案例,大家还记的吗?大家 ...
ubuntu 14.04 安装ntp
安装 sudo apt-get install ntp 修改ntp.conf配置文件 sudo vi /etc/ntp.conf 修改为如下内容 # Enable this if you want s ...
Codeforces 1239D. Catowice City
传送门如果选择 $i$ 当陪审团成员,那么 $i$ 认识的猫一定不能参加又因为总人数和猫数要为 $n$ ,那么 $i$ 认识的猫的主人也一定要当陪审团成员(不然总数不够) 所以可以考虑这样构图, ...
Js 之移动端图片上传插件mbUploadify
一.下载 https://pan.baidu.com/s/1NEL4tkHoK4ydqdMi_hgWcw 提取码:vx7e 二.Demo示例 <div class="weui_uplo ...
powderdesinger显示中英文表名
菜单->Tool->Model Options->Name Convention->右侧display中选择显示name还是code.不支持同时显示,但可以选择显示code, ...