diff --git a/01.html b/01.html index c1dbf005..b29447fc 100644 --- a/01.html +++ b/01.html @@ -2,100 +2,749 @@ - GitHub rate limit reached - Grip - +
-
-
-

GitHub Rate Limit Reached

-

- The GitHub API rate limit - - has been reached for the hour. -

- -
- - -
-

What?

-

- GitHub imposes a limit of 60 requests/hour when using their API without authentication. -

- -

Why?

-

- This prevents people from anonymously abusing GitHub's system. -

-

- As for Grip, it's built to appear as close to GitHub as possible. Using - GitHub's API allows Grip to immediately and accurately reflect any updates - from GitHub, without the delay of busy maintainers or requiring you to upgrade. -

- -

Ok, fine. Where do I go from here?

-

- Until the offline renderer is complete, you can run Grip using
- the --user and --pass arguments to use basic auth,
- giving you 5,000 requests/hour. Run grip -h for details. -

-

- I do apologize for the inconvenience. If you need help, or have ideas on improving this - experience, please reach out joe@joeyespo.com -

+
+
+
+
+
+
+
+ + + +
+ +
+
+

+ 01.md +

+
+
+ +
+
+

Introduction

+
    +
  • All information on http://mff.devnull.cz/c-prog-lang +
      +
    • see the References section for recommended materials
    • +
    +
  • +
  • Make sure you DO SUBSCRIBE TO THE MAILING LIST (see the seminar http page +above).
  • +
  • Getting credits - see the seminar web page above.
  • +
  • What is C.
  • +
  • Popularity of C: +TIOBE_index on wikipedia, direct +link to the current TIOBE_index +
  • +
  • C89, C99, C11 (ie. 2011), C17 (only fixes issues found in C11) standards, +and upcoming C23. Due to time constraints, we will focus on C99.
  • +
  • Why is it worth learning C? +
      +
    • Helps with better understanding of computers
    • +
    • Lingua franca of programming
    • +
    • Lots of important code in C (Linux and Android kernel, Apple iOS +kernel +(Darwin, +also see PureDarwin), +major parts of Windows kernel, OpenSSH, OpenSSL, +Apache, NGINX, cURL, etc. etc.)
    • +
    • Given the above, C programmers will be needed virtually forever
    • +
    • Fast, very portable + +
    • +
    • Battle proven
    • +
    • Great cost/benefit ratio wrt spent time learning the language
    • +
    • Still cool and fun
    • +
    +
  • +
  • Objectives of the seminar +
      +
    1. You should be able to write and understand non-trivial C code (we +focus on C99).
    2. +
    3. You should be able to recognise whether C is appropriate for solving +a specific problem.
    4. +
    5. You should understand why it may be so easy to get burned when +working in C. + +
    6. +
    +
  • +
  • We are here to help you understand concepts, see the big picture, and learn +new stuff. SO, IF YOU HAVE A QUESTION, ASK. +
      +
    • Ideally use the mailing list so that it's beneficial for others, too.
    • +
    +
  • +
  • Please do read the C style document and DO USE the C style. The link is on +the seminar page, or you can get it here: +https://devnull-cz.github.io/unix-linux-prog-in-c/cstyle.html +
  • +
  • Source code files are in +https://github.com/devnull-cz/c-prog-lang/tree/master/src +
  • +
+

Objective of the first class

+
    +
  • The objective of today's class is to provide all the basic building blocks so +that you can write code for moving a star (*) in a loop from left to right, +and back, on a terminal line, as a home assignment.
  • +
  • To see what we mean, compile it via cc src/moving-star.c (clone the git repo +first, see +Introduction +for more info) and run it via ./a.out.
  • +
  • Obviously, do not look at our code until you have written your own.
  • +
  • Now, let's move on!
  • +
+

First C program: "Hello, world"

+
    +
  • 👀 hello-world1.c +will compile with warnings
  • +
  • use gcc hello-world1.c to compile it, ./a.out to run it +
      +
    • or gcc -o hello-world1 hello-world1.c if you want a specific output
    • +
    +
  • +
  • we recommend the ViM editor to edit your files +(use ":syntax on" for syntax highlighting if not the default settings) + +
  • +
  • C runtime system is very small, printf is not part of it, that's why you need +an include file to provide the function prototype (ie. the return type and its +parameters)
  • +
  • fixed code with no warnings: 👀 hello-world2.c +
  • +
+

Basics

+
    +
  • +

    to get the source code for the examples, do the following on the command line. +However, do not look at it until you write your own version. Just compile and +run it first to see what each program does.

    +
     git clone https://github.com/devnull-cz/c-prog-lang.git
    +
    +
  • +
  • +

    each program must have a main() function

    +
      +
    • well, there are exceptions: see +freestanding environment +if interested. Another exception is implementation defined. It is +out of scope for this seminar though.
    • +
    +
  • +
  • +

    string literals (AKA string constants): 👀 printf.c

    +
  • +
  • +

    use the return statement to return a value from a function

    +
      +
    • in the main() function, return exits the program
    • +
    • in the shell, use echo $? to see the return value of the most +recently run program in the foreground
    • +
    • only the least significant byte taken as an unsigned integer (0-255) +is relevant
    • +
    • 👀 return.c +
    • +
    • if you do not use return from main() and the ending } is reached, +the program returns 0 (as of C99; in C89 the returned value would be undefined).
    • +
    +
  • +
  • +

    you must declare a variable before you can use it

    + +
  • +
  • +

    printf() can use conversion specifications, each starts with %

    +
      +
    • +int i; printf("%d\n", i); +
        +
      • a character like d is called a conversion specifier +
      • +
      +
    • +
    • see man 3 printf for the gory details
    • +
    • number of conversions must match the number of arguments +
        +
      • 👀 printf2.c +
      • +
      • the compiler may or may not warn you but it will let you do it +(use -Werror to treat warnings as errors). With gcc, use +the -Wall option to show all warnings.
      • +
      • it will print garbage for conversions without a matching +argument -- whatever is on the stack (x86 32 bit) or in a +specific register (x86 64 bit) is printed.
      • +
      +
    • +
    +
  • +
  • +

    you can declare and initialize a variable at the same time

    +
      +
    • int i = 13;
    • +
    • +13 is called an initializer +
    • +
    • you can initialize a variable with another variable, and so on
    • +
    +
  • +
+
	int i = 13;
+	int j = i;
+	int k = i + j;
+
    +
  • +

    arithmetic

    +
      +
    • +== is for equality, = for an assignment +
        +
      • memory was precious in the past, and programs usually had more +assignments than comparisons
      • +
      +
    • +
    • ++, -, /, * +
    • +
    • +++, -- +
        +
      • +int i; i = 13; printf("%d\n", i++); will print 13 and then +increment i +
      • +
      • +int i; i = 13; printf("%d\n", ++i); will first increment i +then print 14 +
      • +
      • +++i is an expression, not a variable, so you cannot assign to it +
          +
        • this will not compile: ++i = 13; +
        • +
        +
      • +
      +
    • +
    • 👀 arithmetics.c +
    • +
    +
  • +
  • +

    save for an assignment, anywhere you can use a variable, you can use an +expression (e.g. you cannot do i + 1 = j)

    +
     printf("%d\n", 100 * 2);
    +
  • +
  • +

    if both operands are ints, the result is an int

    +
      +
    • +printf("%d\n", 9 / 5) will print 1 +
    • +
    +
  • +
  • +

    while loop

    + +
  • +
  • +

    if statement

    + +
  • +
  • +

    floating point numbers

    +
      +
    • 👀 float.c +
    • +
    • see the optional minimum field width and precision
    • +
    • experiment!!!
    • +
    +
  • +
  • +

    🔧 print out a table for inch to centimeter conversion for 1-9 inches, +use ints only (not floats)

    +
      +
    • 👀 inches-to-cm.c +
    • +
    • use the \t escape sequence in printf to print tabs
    • +
    • like this:
    • +
    +
     printf("\tX\tY\n");
    +
      +
    • example output:
    • +
    +
     Inches	Centimeters
    + 1	2
    + 2	5
    + 3	7
    + 4	10
    + 5	12
    + 6	15
    + 7	17
    + 8	20
    + 9	22
    +
    +
  • +
  • +

    🔧 use floats for the conversion code

    +
      +
    • 👀 inches-to-cm2.c +
        +
      • '\t' in a string will print a tab
      • +
      • +5 is the minimum field width
      • +
      • +.2 is the precision
      • +
      • see the printf(3) man page for details
      • +
      +
    • +
    • example output:
    • +
    +
     Inches	Centimeters
    + 1	 2.54
    + 2	 5.08
    + 3	 7.62
    + 4	10.16
    + 5	12.70
    + 6	15.24
    + 7	17.78
    + 8	20.32
    + 9	22.86
    +
    +
  • +
  • +

    🔧 print a Fahrenheit to centigrade (Celsius) table. Use floats.

    +
      +
    • the formula to convert F to C is: (F - 32) × 5/9. E.g. 72F is 22.22C.
    • +
    • 👀 fahr-to-cent.c +
    • +
    • example output:
    • +
    +
       0	-17.78
    +  20	 -6.67
    +  40	  4.44
    +  60	 15.56
    +  80	 26.67
    + 100	 37.78
    + 120	 48.89
    + 140	 60.00
    + 160	 71.11
    + 180	 82.22
    + 200	 93.33
    + 220	104.44
    + 240	115.56
    + 260	126.67
    + 280	137.78
    +
    +
  • +
+

Assignment for the first class

+
  /*
+   * Implement a moving star that zigzags on the same line between some
+   * boundary (say 50 characters wide).
+   *
+   * You will only need:
+   *
+   *	- while loop
+   *	- if statement (do not use else)
+   *	- printf()
+   *	- use a character '\r' to return to the beginning of a line
+   *
+   *	- use "poll(NULL, 0, <ms>);" to sleep <ms> milliseconds, do not worry
+   *	  about not understanding what exactly it does.  To make the compiler
+   *	  not complain, use "#include <poll.h>".  Alternatively, you can use
+   *	  "sleep(1)" (#include <unistd.h>) but it is too slow then.  For
+   *	  example:
+   *
+   *		poll(NULL, 0, 50);
+   *
+   *    - you will also need "fflush(stdout)" after each line is printed.  As
+   *      standard output is buffered in the C library, the text will generally
+   *      not be printed by printf() until a new line is printed, which will
+   *      never be the case here. So, the fflush() call makes sure all buffered
+   *      text is printed out.
+   *
+   * We expect something like this, with the star moving between those two
+   * column-like barriers:
+   *
+   * |                                            *     |
+   */
+
+

When you have a working solution, you can check out our code at +src/moving-star.c.

+ +
+
+
+ + + +
+
+
+
- +
\ No newline at end of file diff --git a/02.html b/02.html index c1dbf005..06ec86ec 100644 --- a/02.html +++ b/02.html @@ -2,100 +2,1186 @@ - GitHub rate limit reached - Grip - +
+
+
+
+
+
+
+
+ + + +
+ +
+
+

+ 02.md +

+
+
+ +
+
+

Warm-up

+

🔧 Convert units of measurement

+

Print a conversion table from ells (the rope +used by Frodo and Sam in The Lord of the Rings was some 30 ells long) to inches +and centimeters, i.e. a table with 3 columns separated by tabs.

+

Print the centimeter value as a float with 2 digits of precision. The cm value +will be the last column.

+

Every 10 rows, print a separator line (a sequence of - characters, say 20 of them). +The separator will immediately follow the table header and then will appear +every 10 rows (see the sample output). Use a while loop to print the line of - characters.

+

Print 30 numeric rows.

+

Sample output:

+
Ell	Inches	Centimeters
+--------------------
+1	45	114.30
+2	90	228.60
+3	135	342.90
+4	180	457.20
+5	225	571.50
+6	270	685.80
+7	315	800.10
+8	360	914.40
+9	405	1028.70
+10	450	1143.00
+--------------------
+11	495	1257.30
+12	540	1371.60
+13	585	1485.90
+14	630	1600.20
+15	675	1714.50
+16	720	1828.80
+17	765	1943.10
+18	810	2057.40
+19	855	2171.70
+20	900	2286.00
+--------------------
+21	945	2400.30
+22	990	2514.60
+23	1035	2628.90
+24	1080	2743.20
+25	1125	2857.50
+26	1170	2971.80
+27	1215	3086.10
+28	1260	3200.40
+29	1305	3314.70
+30	1350	3429.00
+
+

🔑 ell-in-cm.c

+

Source code management

+
    +
  • Keep all of your code somewhere. Use a distributed source code management +(SCM) system, ie. Git or +Mercurial. +
      +
    • You could even keep your repo in your home directory in the Linux lab as +some of those machines are accessible via SSH from anywhere or use services +like Gitlab or Github and +such.
    • +
    • We do recommend you never use centralized SCMs like +Subversion or +CVS, unless you +have to (e.g. working on existing legacy software), as those are things of +the past century.
    • +
    +
  • +
+

Comments

+
    +
  • +

    /* One line comment */

    +
  • +
  • +

    Multiline comment:

    +
  • +
+
  /*
+   * Multiline comment.  Follow the C style.
+   * Multiline comment.  Follow the C style.
+   */
+
+
    +
  • +

    // One line comment from C99+

    +
  • +
  • +

    Use comments sparingly.

    +
      +
    • Not very useful:
    • +
    +
    /* Increment i */
    +++i;
    +
      +
    • Produce meaningful comments, not like this:
    • +
    +
    /* Probably makes sense, but maybe not */
    +if (...)
    +         do_something();
    +
  • +
  • +

    Pick a reasonable style and stick to it. Mixing one line comments using both +// and /* */ is not the best style.

    +
  • +
  • +

    In general, you can always figure out what the code does, it is just a matter +of time to get there, but it may be impossible to figure out why the code +works that way. The reason might be historical, related to some other +decisions or existing code, purely random, or something else. If not clear, +commenting code on the why is extremely important. See also Chesterton's +fence principle.

    +
  • +
+

Preprocessor

+

The main purposes of the preprocessor are: string replacement, file inclusion, +general code template expansion (macros), and managing conditional compilation.

+
    +
  • +

    String replacement:

    +
      +
    • Basic defines: #define FOO or #define FOO 1 +
    • +
    • A define without a value is still meaningful for conditional compilation.
    • +
    +
  • +
  • +

    Including files:

    +
      +
    • +#include "foo.h" (start in current directory and then continue the +search in system paths) or #include <foo/bar.h> (just system paths) +
        +
      • Some compilers display the include search paths (e.g. clang with -v).
      • +
      • Use the -I compiler option to add search paths to the list.
      • +
      +
    • +
    +
  • +
  • +

    Conditional compilation:

    +
      +
    • +#if, #ifdef, #ifndef, #else, #endif +
        +
      • +#if can be used with expressions: +
        #if MY_VERS >= 42
        +...
        +#endif
        +
      • +
      +
    • +
    • Also useful for header guards (to avoid including same header file +multiple times): +
      #ifndef FOO_H
      +#define FOO_H
      +...
      +#endif
      +
    • +
    • Can be used e.g. for debug code: +
      #ifdef DEBUG
      +... // here can be anything (where valid):
      +    // statements, variable declarations/definitions, function definitions, ...
      +#endif
      +
        +
      • Then the compiler can be run with -DDEBUG to enable the code.
      • +
      +
    • +
    +
  • +
  • +

    Macros: for more complicated code snippets, e.g. #define IS_ZERO(a) a == 0

    +
      +
    • +

      The argument will be replaced with whatever is given.

      +
    • +
    • +

      Use parens for #define to prevent problems with macro expansion:

      +
        +
      • #define X (1 + 1)
      • +
      • Same for more complicated macros: +#define MUL(a, b) ((a) * (b)) +
      • +
      +
    • +
    +
  • +
+

👀 mul.c

+

To see the result of running the preprocessor on your code, use cpp or +the -E option of the compiler.

+

🔧 Task: reimplement fahr-to-cent.c using defines instead of literal numbers

+

🔑 fahr-to-cent_defines.c

+

Expressions

+
    +
  • +

    Every expression has a value.

    +
  • +
  • +

    A logical expression has a value of either 0 or 1, and its type is always +an int.

    +
  • +
+
1 > 10	... 0
+10 > 1	... 1
+
+printf("%d\n", 1 < 10);
+--> 1
+/* Yes, "equal to" in C is "==" as "=" is used for an assignment */
+printf("%d\n", 100 == 101);
+--> 0
+
+
    +
  • +

    Even constants are expressions (more on that later), e.g. 1.

    +
  • +
  • +

    As the while statement is defined in C99 6.8.5 as follows:

    +
  • +
+
while (expression) statement
+

...and given that a constant is also an expression, a neverending while loop +can be written for example as follows. It is because it will loop until the +expression becomes 0. That is never happening in this case.

+
while (1) {
+	...
+}
+
+

You could also write while (2), while (1000), or while (-1) and it would +still be a neverending loop but that is not how C programmers do it.

+
    +
  • Note that the statement from the spec definition can be a code block as you +can see in the example code above, more on that later.
  • +
+

The break statement

+
    +
  • The break statement will cause a jump out of the innermost enclosing while loop (well, +any kind of loop but we only introduced the while loop so far).
  • +
+
int finished = 0;
+while (1) {
+	if (finished)
+		break;
+	/* not finished work done here */
+	call_a_function();
+	k = xxx;
+	...
+	if (yyy) {
+		...
+		finished = 1;
+	}
+	/* more work done here */
+	...
+}
+
+
    +
  • There is no break <level> to say how many levels to break as might be found +e.g. in a unix shell.
  • +
+

Basic operators

+
    +
  • An equality operator is == since a single = is for an assignment.
  • +
+
int i = 13;
+if (i == 13) {
+	// will do something here
+}
+
    +
  • Logical AND and OR:
  • +
+
if (i == 13 && j < 10) {
+	// ...
+}
+
+if (i == 1 || k > 100) {
+	// ...
+}
+
    +
  • +

    You do not need extra ()'s as || and && have lower priority than == and +<, >, <=, >=, and !=. We will learn more about operator priority in +later lectures.

    +
  • +
  • +

    Non-equality is !=

    +
  • +
+
if (i != 13) {
+	// ...
+}
+

The comma operator

+

Useful to perform expression evaluations in one place. The first part is evaluated, then the second part. +The result of the expression is the result of the second part, e.g.:

+
while (a = 3, b < 10) {
+...
+}
+
+

The cycle will be controlled by the boolean result of the second expression.

+

This is not limited just to 2 expressions, you can add more comma operators. It +is left associative.

+

Note that the comma used in a variable declaration (int a, b = 3;) or a +function call is not the comma operator.

+

🔧 What will be returned? 👀 comma.c

+

This is handy for cycle control expressions.

+

The boolean type

+

There is a new _Bool type as of C99. Put 0 as false, and a non-zero (stick +to 1 though) as a true value.

+

The keyword name starts with an underscore followed by an uppercase letter as such identifiers were always reserved +in C while bool, true, and false never were. So, some older code might +actually use those names, and if C99 had just put them in the language, it would have +broken that code, which is generally not acceptable in C.

+

If you are certain that none of bool, true, and false is already used in the code +for its own purposes, you can use those macros by including <stdbool.h>.

+

In that case, the macro bool expands to _Bool. true expands to 1 and +false to 0 and both may be also used in #if preprocessing directives.

+

See C99 section 7.16 for more information.

+

See 👀 bool.c

+

Numbers and types

+
    +
  • For example, the 1, 7, and 20000 integer literals are always integers of +type int if they fit in. +
      +
    • The range of an int is [-2^31, 2^31 - 1] on 32/64 bit CPUs, that means 4 +bytes of storage. However, an int may be stored in only two bytes as +well. The range would be [-2^15, 2^15 - 1] then. You will likely never +encounter such old platforms unless you look for them.
    • +
    • A larger number will automatically become a long int, then a long long int if the number literal does not fit a (signed) long. That means if +an unsigned long long type is stored in 8 bytes, one cannot use a decimal +constant of 2^64 in the code and expect it to hold such a value:
    • +
    +
  • +
+
$ cat main.c
+int
+main(void)
+{
+	unsigned long long ull = 18446744073709551616;
+}
+
+$ gcc -Wall -Wextra -Wno-unused main.c
+main.c: In function ‘main’:
+main.c:4:34: warning: integer constant is too large for its type
+    4 |         unsigned long long ull = 18446744073709551616;
+      |                                  ^~~~~~~~~~~~~~~~~~~~
+
+
    +
  • +

    If you printed ull (using %llu as for unsigned long long int), you +would probably get 0. More on that later.

    +
  • +
  • +

    Hexadecimal numbers start with 0x or 0X. Eg. 0xFF, 0Xaa, 0x13f, +etc. In contrast to decimal constants, one can use a hexadecimal constant for +2^64 - 1, which is 0xFFFFFFFFFFFFFFFF, when unsigned long long is stored +in 8 bytes. More on that later.

    +
  • +
  • +

    Octal numbers start with 0. Eg. 010 is 8 in decimal. Also remember the +Unix file mask (umask), eg. 0644.

    +
  • +
  • +

    'A' is called a character constant and is always of type int. See man ascii for their numeric values. The ASCII standard defines characters with +values 0-127.

    +
  • +
  • +

    Note when we say a character, we mean a value that represents a character +from the ASCII table. A character is not the same thing as char.

    +
  • +
  • +

    Types float, double

    +
      +
    • If you man 3 printf, you can see that %f is of type double. You +can use:
    • +
    +
  • +
+
float pi = 3.14;
+printf("%f\n", pi);
+
  • floats are automatically converted to doubles if used as arguments +in functions with a variable number of arguments (known as variadic +functions), e.g. printf()
+
+
    +
  • +

    char (1 byte), short (usually 2 bytes), long (4 or 8 bytes), long long +(usually 8 bytes, and can not be less). It also depends on whether your +binary is compiled in 32 or 64 bits.

    +
      +
    • 🔧 See what code your compiler emits by default (i.e. without +using either -m32 or -m64 options) +
        +
      • Use the file command to display the information about the +binary.
      • +
      +
    • +
    +
  • +
  • +

    See also 5.2.4.2 Numerical limits +in the C spec. +For example, an int must be at least 2 bytes but the C spec does not prevent +it from being 8 bytes in the future.

    +
  • +
  • +

    chars and shorts are automatically converted to int if used as arguments +in variadic functions, and also if used as operands in many operators. More +on that later.

    +
  • +
  • +

    As 'X' is an int but within 0-127 (see above on the ASCII standard), it is +OK to do the following as it is guaranteed to fit even when the char type is +signed:

    +
  • +
+
char c = 'A';
+
    +
  • In printf, the type of each argument must match the type expected by +the conversion specifier. Note that e.g. integers and floating point numbers +have different representations, and printing an integer as a double (and vice +versa) will lead to unexpected consequences. More on that later.
  • +
+
printf("%f\n", 1);
+
+$ ./a.out
+0.000000
+
+

👀 print-int-as-double.c

+

Signedness

+
    +
  • Each integer type has a signed and unsigned variant. By default, the +numeric types are signed aside from the char which depends on the +implementation (of the C compiler). If you need an unsigned type, use the +unsigned reserved word. If you need to ensure a signed char, use signed char explicitly.
  • +
+
signed int si;	// not used though, just use 'int si'
+unsigned int ui;
+unsigned long ul;
+unsigned long long ull;
+...
+
    +
  • +

    For ints, you do not even need to use the int keyword, ie. signed i, +unsigned u are valid but it is recommended to use int i and unsigned int u anyway.

    +
  • +
  • +

    You can use long int and long long int or just long and long long, +respectively. The latter is mostly used in C.

    +
  • +
  • +

    char and short int are converted to int in variadic functions (we will +talk more about integer conversions later in semester). That is why the +following is correct as the compiler will first convert variable c to int +type, then put it on the stack (common argument passing convention on +IA-32) +or in a register up to certain number of arguments (common x86-64 +calling convention).

    +
  • +
+
/* OK */
+char c = 127;
+printf("%d\n", c);
+
+/* OK */
+short sh = 32767;
+printf("%d\n", sh);
+

Modifiers for printf()

+
    +
  • +

    l for long, eg. long l; printf("%ld\n", l);

    +
  • +
  • +

    ll for long long, eg. long long ll; printf("%lld\n", ll);

    +
  • +
  • +

    u is unsigned, x is unsigned hexa, X is unsigned HEXA

    +
  • +
+
unsigned int u = 13;
+printf("%u\n", u);
+
+unsigned long long llu = 13;
+printf("%llu\n", llu);
+
+unsigned int u = 13;
+printf("%x\n", u);
+// --> d
+printf("%X\n", u);
+// --> D
+
    +
  • The following is a problem though if compiled in 32 bits as you put 4 bytes on +the stack but printf will take 8 bytes. Older compilers may not warn you at +all!
  • +
+
/* DEFINITELY NOT OK.  Remember, 13 is of the "int" type. */
+printf("%lld\n", 13);
+
+$ cc -m32 wrong-modifier.c
+wrong-modifier.c:6:19: warning: format specifies type 'long
+long' but the argument has type 'int' [-Wformat]
+	printf("%lld\n", 13);
+		~~~~     ^~
+		%d
+1 warning generated.
+$ ./a.out
+2026120757116941
+
+
    +
  • When compiled in 64 bits, it is still as incorrect as before but it will +probably print 13 anyway as 13 is assigned to a 64 bit register (because +of commonly used calling convention on x86-64). +So, if you use that code successfully in 64 bits you might be +surprised if the code is then compiled in 32 bits and "suddenly gets broken". +It was broken from the very beginning.
  • +
+
$ cc -m64 wrong-modifier.c
+wrong-modifier.c:6:19: warning: format specifies type 'long
+long' but the argument has type 'int' [-Wformat]
+	printf("%lld\n", 13);
+		~~~~     ^~
+		%d
+1 warning generated.
+$ ./a.out
+13
+
+

👀 wrong-modifier.c

+

Suffixes

+
    +
  • +

    You can explicitly specify integer constants with different integer types +using suffixes:

    +
      +
    • +13L and 13l is a long +
    • +
    • +13LL and 13ll is a long long (Ll and lL is illegal)
    • +
    • +13u and 13U is an unsigned int +
    • +
    • +13lu and 13LU is an unsigned long +
    • +
    • +13llu and 13LLU is an unsigned long long +
    • +
    +
  • +
  • +

    So, 0xFULL and 0XFULL is an unsigned long long 15 :-)

    +
  • +
+
printf("%llu\n", 0xFULL);
+// --> 15
+printf("%lld", 13LL);	/* OK */
+// --> 13
+/* NOT OK as long may be 4 bytes while long long is 8+ bytes */
+printf("%ld", 13LL);
+// --> ??
+
    +
  • Escape sequences \ooo and \xhh (not \Xhh) are character sized bit +patterns, either specified as octal or hexadecimal numbers, and representing a +single character. They can be used both in string literals and in character +constants.
  • +
+
printf("\110\x6F\154\x61");	// Used in a string literal.
+printf("%c\n", '\x21');		// Used in a character constant.
+// -> Hola!
+

getchar()

+
    +
  • +getchar function reads one character from the process standard input and +returns its value as an int. +
      +
    • When it reaches end of input (for example, by pressing Ctrl-D in the +terminal), it returns EOF +
    • +
    • +EOF is a define, usually set as -1. That is why getchar returns +an int instead of a char as it needs an extra value for EOF.
    • +
    • +getchar needs #include <stdio.h> +
    • +
    • You can verify that EOF is part of <stdio.h>, search for "getchar" +here: https://pubs.opengroup.org/onlinepubs/9699919799 +
    • +
    +
  • +
+

🔧 Task: write code that reads characters from the standard input and prints them out.

+

It should work like this:

+
$ cat /etc/passwd | ./a.out > passwd
+$ diff passwd /etc/passwd
+$ echo $?
+0
+
+
    +
  • Remember, we said above that an assignment is just an expression, so it has a +value. So, you can do the following:
  • +
+
if ((c = getchar()) == EOF)
+	return (0);
+

instead of:

+
c = getchar();
+if (c == EOF)
+	return (0);
+

However, do not abuse it as you may create hard to read code. Note the +parentheses around the assignment. The = operator has a lower priority than +the == operator. If the parens are not used, the following would happen:

+

if (c = getchar() == EOF) would be evaluated as:

+

if (c = (getchar() == EOF)), meaning that c would be either 0 or 1 based +on whether we read a character or the terminal input is closed.

+

We will learn more about operator priority later in the semester.

+

🔑 getchar.c

+

The sizeof operator

+
    +
  • The sizeof operator computes the byte size of its argument which is either +an expression or a type name +
      +
    • This is not a function so you can use it without parens: sizeof foo unless +its argument is a type name, in that case parens are required. However, for +better readability parentheses are usually used.
    • +
    +
  • +
+
sizeof (1);	// OK
+sizeof 1;	// OK but "sizeof (1)" is better.
+sizeof 1 + 1;	// See?
+sizeof (int);	// OK
+sizeof int;	// Syntax error.
+
    +
  • Its type is size_t which is an unsigned integer according to the +standard. However, the implementation (= compiler) can choose whether +it is an unsigned int, an unsigned long int, or an unsigned long long int.
  • +
  • In printf(), the z modifier modifies u to size_t, so this is the +right way to do it:
  • +
+
printf("%zu\n", sizeof (13));
+// --> 4
+
    +
  • +

    You may see code using %u, %lu, %llu for sizeof values. However, +that will only work based on a specific compiler and the architecture and +may not work using a different combination. Always use %zu for arguments +of type size_t.

    +
  • +
  • +

    The expression within the sizeof operator is never evaluated (the +compiler should warn you about such code). Only the size in bytes needed to +store the value if evaluated is returned.

    +
  • +
+
int i = 1;
+printf("%zu\n", sizeof (i = i + 1));
+// --> 4
+printf("%d\n", i);
+// --> 1
+
    +
  • 🔧 Try sizeof on various values and types in printf(), compile with +-m32 and -m64 and see the difference
  • +
+
sizeof (1);
+sizeof (char);
+sizeof (long);
+sizeof (long long);
+sizeof ('A');
+sizeof ('\075');
+sizeof (1LL);
+// ...
+
    +
  • We will get there later in the semester but if you are bored, try to figure out +why the following is going to print 1 4 4 1:
  • +
+
char c;
+printf("%zu\n", sizeof (c));
+// --> 1
+printf("%zu\n", sizeof (c + 1));
+// --> 4
+printf("%zu\n", sizeof (+c));
+// --> 4
+printf("%zu\n", sizeof (++c));
+// --> 1
+

The sizeof operator is usually evaluated during compilation time however +this is not universally true. For Variable Length Arrays (VLAs) it has to +happen during runtime. The VLAs will be explained later.

+

Integer constants

+
    +
  • +

    An integer constant can be a decimal, octal, or hexadecimal constant.

    +
  • +
  • +

    All of these are equal:

    +
  • +
+
printf("%c\n", 0101);
+// --> A
+printf("%c\n", 0x41);
+// --> A
+printf("%c\n", 65);
+// --> A
+
    +
  • Technically, 0 is an octal constant, not a decimal constant, since an octal +constant always begins with 0. The following will generate an error:
  • +
+
printf("%d\n", 099);
+
main.c: In function ‘main’:
+main.c:6:17: error: invalid digit "9" in octal constant
+    6 |  printf("%d\n", 099);
+      |                 ^~~
+
+
    +
  • If you use a larger number than one that fits within a byte as an argument for the %c +conversion, the higher bits are trimmed. The rule here is that the int +argument is converted within printf to unsigned char (not just char!), +then printed as a character (= letter). More on the integer conversion in +upcoming lectures. See also +Numbers +on what happens with char or short when passed as argument to a variadic function. +
      +
    • Also note the existence of h and hh modifiers. See the printf() +man page for more information.
    • +
    +
  • +
+
printf("%c\n", 65 + 256 + 256 + 256 * 100);
+// --> still prints A
+
    +
  • Assignment is also an expression, meaning it has a value (the value that was assigned), so the following is legal and all variables a, b, and c will be set to 13 (the assignment operator is right associative).
  • +
+
int a, b, c;
+a = b = c = 13;
+

🔧 Task: print ASCII table

+

Print an ASCII table with hexadecimal values like the one in the ascii(7) man page in OpenBSD, except that for non-printable characters you print NP (non-printable).

+

To determine whether a character is printable you can use the isprint() function.

+

Use just while and if (without else).

+

Sample output:

+
00 NP	01 NP	02 NP	03 NP	04 NP	05 NP	06 NP	07 NP	
+08 NP	09 NP	0a NP	0b NP	0c NP	0d NP	0e NP	0f NP	
+10 NP	11 NP	12 NP	13 NP	14 NP	15 NP	16 NP	17 NP	
+18 NP	19 NP	1a NP	1b NP	1c NP	1d NP	1e NP	1f NP	
+20  	21 !	22 "	23 #	24 $	25 %	26 &	27 '	
+28 (	29 )	2a *	2b +	2c ,	2d -	2e .	2f /	
+30 0	31 1	32 2	33 3	34 4	35 5	36 6	37 7	
+38 8	39 9	3a :	3b ;	3c <	3d =	3e >	3f ?	
+40 @	41 A	42 B	43 C	44 D	45 E	46 F	47 G	
+48 H	49 I	4a J	4b K	4c L	4d M	4e N	4f O	
+50 P	51 Q	52 R	53 S	54 T	55 U	56 V	57 W	
+58 X	59 Y	5a Z	5b [	5c \	5d ]	5e ^	5f _	
+60 `	61 a	62 b	63 c	64 d	65 e	66 f	67 g	
+68 h	69 i	6a j	6b k	6c l	6d m	6e n	6f o	
+70 p	71 q	72 r	73 s	74 t	75 u	76 v	77 w	
+78 x	79 y	7a z	7b {	7c |	7d }	7e ~	7f NP	
+
+

🔑 ascii-hex.c

+

🔧 Home assignment

+

Note that home assignments are entirely voluntary but writing code is the only +way to learn a programming language.

+

🔧 Count digit occurrence

+

If unsure about the behavior, compile our solution and run it.

+
    +
  • Read characters until EOF and count the occurrences of each digit 0-9. Only use what we have learned so far. You may end up with longer code than otherwise necessary but that is OK.
  • +
+
$ cat /etc/passwd | ./a.out
+0: 27
+1: 37
+2: 152
+3: 38
+4: 39
+5: 43
+6: 34
+7: 35
+8: 29
+9: 31
+
+

🔑 count-numbers.c

+
    +
  • Variant: instead of printing occurrences, print * characters to get a +histogram. Use log() (see math(3)) to trim the values down.
  • +
+

🔧 To upper

+

Convert lowercase letters in the input to uppercase. Use the fact that a-z and A-Z occupy two contiguous ranges of the ASCII table.

+

Use the else branch:

+
	if (a) {
+		...
+	} else {
+		...
+	}
+
+

Expected output:

+
	$ cat /etc/passwd  | ./a.out
+	##
+	# USER DATABASE
+	#
+	# NOTE THAT THIS FILE IS CONSULTED DIRECTLY ONLY WHEN THE SYSTEM IS RUNNING
+	# IN SINGLE-USER MODE.  AT OTHER TIMES THIS INFORMATION IS PROVIDED BY
+	# OPEN DIRECTORY.
+	#
+	# SEE THE OPENDIRECTORYD(8) MAN PAGE FOR ADDITIONAL INFORMATION ABOUT
+	# OPEN DIRECTORY.
+	##
+	NOBODY:*:-2:-2:UNPRIVILEGED USER:/VAR/EMPTY:/USR/BIN/FALSE
+	...
+	...
+
+

🔑 to-upper.c

+ +
+
+
+ + + +
+
+
+
- +
\ No newline at end of file diff --git a/03.html b/03.html index c1dbf005..d947cd0b 100644 --- a/03.html +++ b/03.html @@ -2,100 +2,1061 @@ - GitHub rate limit reached - Grip - +
+
+
+
+
+
+
+
+ + + +
+ +
+
+

+ 03.md +

+
+
+ +
+
+

Warm-up

+

Count words

+

Print out the number of words read from the standard input. Ideally, use your getchar-based code from your repository to start with.

+

Always try to reuse code you have already written! Then do not forget to +upload it to your +source code management repository.

+

A word is defined as a group of any characters surrounded by whitespace characters. Those are: a tab, a space, and a newline.

+
    +
  • +

    Write your own check for a whitespace character, do not use library functions +for that.

    +
  • +
  • +

    Check correctness of your implementation with wc -w <file>

    +
  • +
  • +

    What happens if the number of words exceeds the type that stores the count?

    +
  • +
+

🔑 words.c uses the library function isspace() to check for white space, for simplicity.

+

🔑 words2.c is even simpler than the first solution while not using the library function.

+

Example:

+
$ cat /etc/passwd | ./a.out
+70
+
+

Array Intro

+

You define an array like this <element-type> array[<some-number>], for +example:

+
int array[5];
+

The integer value n in [n] specifies the number of array elements.

+
    +
  • An array subscript (also called an index) always starts from 0 and ends +with n - 1 +
  • +
  • A subscript may be any integer expression
  • +
  • The type of array elements is called an element type +
  • +
+

So, in array int a[3], elements will be accessible as a[0] .. a[2]. Each element is of type int, and therefore you can do e.g. printf("%d\n", a[2]);.

+
    +
  • +

    0 was chosen as the first subscript so that it was easier to work with +pointers, and also for array access efficiency - we will get to pointers +later.

    +
  • +
  • +

    What is not possible to do with arrays in C (limitations are important +knowledge):

    +
      +
    • Associative arrays
    • +
    • Array subscripts returning a sub-array (like found e.g. in Python)
    • +
    • Assigning to an array as a whole
    • +
    +
  • +
  • +

    As with integer and floating point variables, we may initialize an array during its definition. In general, the construct for any variable initialization (not just arrays) is known as an initializer.

    +
  • +
+
short array[3] = { 1, 2, 3 };	// "{ 1, 2, 3 }" is the initializer
+

If the array size is omitted the compiler will compute the size from the number +of initializers. So, you can just do the following:

+
short array[] = { 1, 2, 3 };
+char array[] = { 'h', 'e', 'l', 'l', 'o' };
+

If you need your array to contain only the elements from the initializer, +omitting the array size is the way to go to avoid errors in modifying the +initializer while forgetting to update the array size.

+
    +
  • +

    The sizeof operator applied to an array always yields the array size in bytes. It will not yield the array size in elements.

    +
      +
    • To get the number of elements in an array, you must divide the array +size in bytes by the size of its element. Always use 0 subscript, +see below on why.
    • +
    +
  • +
+
int a[5];
+
+printf("Elements in array: %zu\n", sizeof (a) / sizeof (a[0]));
+

The above is the correct approach that is immune to changing the array +declaration (i.e. the type of elements). As arrays may not be empty, there is +always an element [0] (see below). Do not use the following:

+
sizeof (array) / sizeof (int)
+

🔧 Declare an array of chars without setting the array size, initialize the array during declaration (see above), then print each element of the array on a separate line using a while loop.

+

Arrays introduced so far are not dynamic and can not be resized.

+
    +
  • Try to perform an out-of-bounds access. What is the threshold for behavior change on your system?
  • +
  • Why is it not faulting for the one-off error?
  • +
+

👀 array-out-of-bounds.c

+
$ ./a.out
+Number of array elements: 1024
+One-off error (using index 1024)... OK
+Assigning to index 4096... Segmentation Fault
+
+

You can also try to locate where it crashed. For more information, see +some info on debugging.

+

You do not need to initialize all elements. With this kind of initialization, you always start from subscript 0, and there are no gaps:

+
short array[4] = { 1, 2, 3 };
+

In the example above, the elements not explicitly initialized are set to 0 so the value of array[3] will be initialized to 0.

+
    +
  • +

    I.e. int array[100] = { 0 }; will have all values set to 0

    +
  • +
  • +

    The initialization is done in code added by the compiler. That code is part +of the runtime system of the language.

    +
  • +
  • +

    Using = {} is not allowed by the C specification before C23 (it is allowed in C++) but it is generally accepted. Not with -Wpedantic though:

    +
  • +
+
cc -Wpedantic test.c
+test.c:1:13: warning: use of GNU empty initializer extension
+      [-Wgnu-empty-initializer]
+int a[10] = {};
+	    ^
+1 warning generated.
+
+

Note: global variables without an explicit initializer are always initialized to zero.

+

There is a partial array initialization where the initializers are called +designated initializers in the C spec:

+
char array[128] = { [0] = 'A', [2] = 'f', [4] = 'o', [6] = 'o' };
+
    +
  • A subscript is in square brackets
  • +
  • The [subscript] is known as a designator. Increasing ordering is not +required but expected.
  • +
  • The rest of elements will be initialized to zero
  • +
  • If you do not specify the array size, it is derived from the highest initialized index (that index plus one)
  • +
  • You can combine designators with fixed order initializers, and you always +start with the next index. For example:
  • +
+
  /* a[5] is 'e' etc.  a[0]..a[3] are NUL characters (= zero bytes). */
+  char a[128] = { [4] = 'h', 'e', 'l', 'l', 'o' };
+
    +
  • You cannot specify a repetition or setting multiple elements to the same value +(there is a gcc extension for that but let's not go there).
  • +
+

👀 array-designated-initializers.c

+

Note that the code file right above mentions in a comment that a missing = is +a GCC extension. With a non-GCC compiler it does not compile:

+
$ cc array-designated-initializers.c
+"array-designated-initializers.c", line 15: syntax error before or at: 'A'
+cc: acomp failed for array-designated-initializers.c
+
+

Once an array is declared, its elements cannot be assigned at once. So, you can +only do things as follows:

+
int array[4];
+
+array[0] = 1;
+array[1] = 2;
+array[2] = array[3] = 3;
+// ...
+

You cannot assign an array to another array; it has to be done element by element.

+
    +
  • Likewise for a comparison
  • +
+

Arrays cannot be declared as empty (int a[0]).

+
    +
  • This is explicitly forbidden by the standard, see +C99 +6.7.5.2 Array declarators.
  • +
  • GCC accepts that though. Do not use it like that.
  • +
+

👀 empty-array.c

+

This might be a bit confusing though:

+
$ gcc empty-array.c
+$ ./a.out
+4
+0
+
+

Even if a compiler supports an empty array declaration, sizeof(a) / sizeof(a[0]) construction is always correct to compute a number of array +elements. Plus, remember the compiler does not do any array access when +computing sizeof(a[0]) as the expression is not evaluated (see +the sizeof operator +), the compiler only uses the argument to determine the size.

+

Function introduction

+

Functions can be used to encapsulate some work, for code re-factoring, etc.

+

A function has a single return value and multiple input parameters.

+
    +
  • If you also need to extract an error code with the value or get multiple +return values, that needs to be done via passing data via reference (more on +that when we have pointers).
  • +
  • I.e. this is not like Go that returns an error along with the data.
  • +
+

A function declaration only declares an API, without the function body. In C, it is called a function prototype and it consists of a type, a function name, and a parameter list in parentheses. For example:

+
int digit(int c);
+int space(int c);
+float myfun(int c, int i, float f);
+

When defining the function, its body is enclosed in {} (just like the main function).

+
int
+return_a_number(void)
+{
+	return (1000);
+}
+

When we declare or define a function, the function has parameters. When we +call such a function though, we pass in arguments. So, for the above +mentioned function digit, if called as digit(7), we passed argument 7 as +the parameter c. Note that in the past these were also called, respectively, +formal parameters and actual parameters.

+

Also as with the main function, you can use void instead of the argument +list to say the function accepts no arguments. You could just use () to +indicate the function has no arguments but that is an old way of doing things +and in that case the compiler will not verify the number of arguments when +calling the function (you can check it out for yourself). So, always use +(void) if the function has no arguments.

+
void
+hola(void)
+{
+	printf("Hola!\n");
+}
+

A function call is an expression so you can use it as such:

+
printf("%d\n", return_a_number());
+

The default return value type is int but a compiler will warn if it is missing. You should always specify the type in C99 and later. You may use void, which means the function returns no value.

+

Write your function declarations at the beginning of a C file or include those +into a separate header file.

+

If a compiler hits a function whose prototype is unknown, warnings are issued. Remember 👀 hello-world1.c? See also here:

+
$ cat test.c
+int
+main(void)
+{
+	fn();
+}
+
+float
+fn(void)
+{
+	return (1.0);
+}
+$
+$ gcc test.c
+test.c:4:2: warning: implicit declaration of function 'fn' is
+invalid in C99
+      [-Wimplicit-function-declaration]
+	fn();
+	^
+test.c:8:1: error: conflicting types for 'fn'
+fn(void)
+^
+test.c:4:2: note: previous implicit declaration is here
+	fn();
+	^
+1 warning and 1 error generated.
+
+

Parameter names may be omitted in prototypes, i.e.:

+
int myfn(int, int);
+

A variable defined in function parameters or locally within the function shadows (hides) a global variable of the same name. Each identifier is visible (= can be used) only within a region of program text called its scope. See C99, section 6.2.1 Scopes of identifiers for more information.

+

Within a function body, you may use its parameters as any other local variables.

+
int
+myfn(int i, int j)
+{
+	i = i * j;
+	return (i);
+}
+

As arguments are always passed by value in C, the variable value is not +changed in the caller:

+
/* myfn defined here */
+
+int
+main(void)
+{
+	int i = 3;
+
+	printf("%d\n", myfn(i, i));	// will print 9
+	printf("%d\n", i);		// will print 3
+}
+

Local variables are stored on the stack.

+

Argument passing depends on bitness and architecture. E.g. 32-bit x86 puts them on the stack, while the 64-bit x86-64 ABI used on Unix-like systems puts the first 6 integer arguments in registers and the rest on the stack.

+

Functions can be recursive. 👀 recursive-fn.c

+
/*
+ * Print 0..9 recursively using a function call myfn(9).
+ */
+#include <stdio.h>
+
+void
+myfn(int depth)
+{
+	if (depth != 0)
+		myfn(depth - 1);
+	printf("%d\n", depth);
+}
+
+int
+main(void)
+{
+	myfn(9);
+}
+

🔧 Print 9..0 using a recursive function.

+

solution 👀 recursive-fn2.c

+

Arrays and functions

+

Arrays in C are not first-class objects; rather, an array is an aggregation of elements. An array type is one kind of aggregate type, see C99 spec 6.2.5 Types (paragraph 21) for more information.

+

An array cannot be returned from a function. A pointer to an array can be +but more on that later.

+

Returning an array is not allowed for efficiency reasons, as copying the whole array (by value) would be too expensive.

+
    +
  • Watch for array sizes if used within functions.
  • +
  • Arrays as local variables may significantly increase the stack size (and the stack size is limited in threaded environments).
  • +
+

👀 func-large-array.c

+

The following may or may not happen in your environment:

+
$ cc func-large-array.c
+$ ./a.out
+Segmentation fault: 11
+
+

Strings

+

A contiguous sequence of non-zero bytes (chars) terminated by a zero byte is +called a string (C99 7.1.1).

+
char foo[] = { 'b', 'a', 'r', 0 };
+

Note that a string does not have to be declared as an array as shown above, it +may be just a piece of memory that meets the definition of a string above.

+

The crucial fact is that a string is always terminated by a null (zero) byte, +sometimes also called NUL. That is how C knows where the string ends. There +is no metadata involved with a string, e.g. its length. To figure out the +length of a string, the string must be sequentially searched for the first NUL.

+

The string length is the number of bytes preceding the null character.

+

It also means the size of such a character array is one byte more than the number of non-zero characters in the string, because of the terminating zero (\0). The NUL belongs to the string though.

+
char foo[] = { 'b', 'a', 'r', '\0' };
+
+printf("%zu\n", sizeof (foo));		// prints 4
+

Using '\0' suggests it is a terminating null character but it is just a zero +byte. So, you could use 0, as follows, but it is not generally used:

+
char foo[] = { 'b', 'a', 'r', 0 };
+
    +
  • To print a string via printf(), you use the %s conversion specifier:
  • +
+
char foo[] = { 'b', 'a', 'r', 0 };
+
+printf("%s\n", foo);	/* will print "bar" without the quotes */
+

🔧 What happens if you forget to specify the terminating zero in the above per-char initializer and try to print the string?

+

👀 array-char-nozero.c

+

Note that the following array of characters contains three strings:

+
char foo[] = { 'a', '\0', 'b', 0, 'c', 0 };
+

And a character array does not need to be or contain a string at all:

+
char not_a_string[] = { 'h', 'i' };
+

A special case of initializing a character array that is to represent a string is using double quotes. The terminating zero will be the last character of the array.

+
char mys[] = "This is a string";
+
+int i = 0;
+
+/* Now print the string from the array one character at a time. */
+while (mys[i] != '\0')
+	printf("%c", mys[i++]);
+printf("\n");
+
+/* Normally, you would of course do the following. */
+printf("%s\n", mys);
+

👀 init-array-from-string.c

+

Note that the above really creates a local variable on the stack with its memory +initialized from the string when the code is executed.

+
$ cc init-array-from-string.c
+$  strings a.out | grep bar
+foobar
+
+

Function return values and ABI

+

Let's look at the following code. Obviously, the function does not return its +value while it definitely should have. What happens if we use the function +return value anyway?

+

👀 why-it-works.c

+
#include <stdio.h>
+
+int
+addnum(int n1, int n2)
+{
+        int n = n1 + n2;
+}
+
+int
+main(void)
+{
+        printf("%d\n", addnum(1, 99));
+}
+

It simply cannot work, right? Oh, wait..., it does!?!

+
$ cc why-it-works.c
+$ ./a.out 
+100
+
+

So what happened???

+

Well, it does not really work. It "works" by accident. Let's look at it more closely. To establish a common environment, I will be using the Linux lab at Malá Strana which you all have access to, so that you can try and see.

+

Before I begin, let me give you more information:

+
    +
  • It depends on a system and its version, and a compiler and its version +
      +
    • you can also try the clang compiler later on the same machine
    • +
    +
  • +
  • We will be using gcc, which generates 64-bit binaries by default on the Linux distro installed on u-pl3.ms.mff.cuni.cz (and other machines in the lab).
  • +
  • 64 bit binaries on x86 use X86-64 ABI +
  • +
  • +ABI is not +API +
  • +
  • By this ABI (it's law!), the first two function integer arguments are passed +through the general purpose edi and esi registers
  • +
  • Integer function return value is passed back to the caller via the general +purpose eax register
  • +
  • The stack on x86 grows down
  • +
+

Let's compile the code and disassemble it. All we need are the main and +addnum function. See my notes inline.

+
$ cc why-it-works.c
+$ objdump -d a.out
+...
+<removed non-relevant stuff>
+...
+0000000000001145 <addnum>:
+    1145:	55                   	push   %rbp
+    1146:	48 89 e5             	mov    %rsp,%rbp
+
+    	- initialize a frame pointer for this function.
+
+    1149:	89 7d ec             	mov    %edi,-0x14(%rbp)
+    114c:	89 75 e8             	mov    %esi,-0x18(%rbp)
+
+	- here we put both function arguments on a stack.  As we already
+	  learned, function arguments may be used within a function as local
+	  variables, so they need to be on a stack.
+
+	- why we put them to -0x14 and -0x18 offsets from the base?  That's just
+	  what this gcc version does.  They are 4 bytes apart as we work with 4
+	  byte integers.
+
+    114f:	8b 55 ec             	mov    -0x14(%rbp),%edx
+    1152:	8b 45 e8             	mov    -0x18(%rbp),%eax
+
+	- moved the values of our local variables to general purpose registers
+
+    1155:	01 d0                	add    %edx,%eax
+
+    	- sum up the values.  As we add edx to eax, the result is in eax.
+
+    1157:	89 45 fc             	mov    %eax,-0x4(%rbp)
+
+	- put the result to the local variable "n" which happens to be at offset
+	  -0x4 from the frame pointer.
+
+	- now, see above, register eax is used in x86 64 ABI for the function
+	  return value.  So, by accident, we have the "right" value in the
+	  register that is supposed to hold the return value!!!
+
+    115a:	90                   	nop
+    115b:	5d                   	pop    %rbp
+    115c:	c3                   	retq   
+
+000000000000115d <main>:
+    115d:	55                   	push   %rbp
+    115e:	48 89 e5             	mov    %rsp,%rbp
+    1161:	be 63 00 00 00       	mov    $0x63,%esi
+    1166:	bf 01 00 00 00       	mov    $0x1,%edi
+
+	- put the values 1 and 99 to the registers representing the 1st and the
+	  2nd argument in x86 64 bit ABI
+
+    116b:	e8 d5 ff ff ff       	callq  1145 <addnum>
+
+	- called our function from main()
+
+    1170:	89 c6                	mov    %eax,%esi
+
+	- x86 64 ABI expects the function return value in the register eax, so
+	  we just put it to the input register esi representing the 2nd argument
+	  for the printf() function.
+	- and, given that we happened to have the right value in eax, the code
+	  looks like it works.
+
+    1172:	48 8d 3d 8b 0e 00 00 	lea    0xe8b(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
+    1179:	b8 00 00 00 00       	mov    $0x0,%eax
+    117e:	e8 bd fe ff ff       	callq  1040 <printf@plt>
+    1183:	b8 00 00 00 00       	mov    $0x0,%eax
+    1188:	5d                   	pop    %rbp
+    1189:	c3                   	retq   
+    118a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
+
+

What if we increment variable n, does it change anything?

+
int
+addnum(int n1, int n2)
+{
+	int n = n1 + n2;
+	++n;
+}
+
$ gcc main.c 
+$ ./a.out
+100
+
+

Apparently not, let's see the disassembled code:

+
0000000000001145 <addnum>:
+    1145:	55                   	push   %rbp
+    1146:	48 89 e5             	mov    %rsp,%rbp
+    1149:	89 7d ec             	mov    %edi,-0x14(%rbp)
+    114c:	89 75 e8             	mov    %esi,-0x18(%rbp)
+    114f:	8b 55 ec             	mov    -0x14(%rbp),%edx
+    1152:	8b 45 e8             	mov    -0x18(%rbp),%eax
+    1155:	01 d0                	add    %edx,%eax
+    1157:	89 45 fc             	mov    %eax,-0x4(%rbp)
+    115a:	83 45 fc 01          	addl   $0x1,-0x4(%rbp)
+
+    	- OK, the compiler just directly adds 1 to the memory address holding
+	  local variable "n".  As it does not change the register eax, it still
+	  "works".
+
+    115e:	90                   	nop
+    115f:	5d                   	pop    %rbp
+    1160:	c3                   	retq
+
+

So, let's involve another local variable.

+
int
+addnum(int n1, int n2)
+{
+	int n = n1 + n2;
+
+	int i = 13;
+	n = i + n;
+}
+
$ gcc main.c
+$ ./a.out
+13
+
+

Bingo! It no longer "works". What happened?

+
0000000000001145 <addnum>:
+    1145:	55                   	push   %rbp
+    1146:	48 89 e5             	mov    %rsp,%rbp
+    1149:	89 7d ec             	mov    %edi,-0x14(%rbp)
+    114c:	89 75 e8             	mov    %esi,-0x18(%rbp)
+    114f:	8b 55 ec             	mov    -0x14(%rbp),%edx
+    1152:	8b 45 e8             	mov    -0x18(%rbp),%eax
+    1155:	01 d0                	add    %edx,%eax
+
+    	- this was summing up our two function arguments
+
+    1157:	89 45 f8             	mov    %eax,-0x8(%rbp)
+
+	- the result went to the local variable "n"
+
+    115a:	c7 45 fc 0d 00 00 00 	movl   $0xd,-0x4(%rbp)
+
+	- we initialized local variable "i" (0xd in hexa is 13 in decimal)
+
+    1161:	8b 45 fc             	mov    -0x4(%rbp),%eax
+
+	- See?  Now we used eax for further arithmetic operations.  We just put
+	  the current value of local variable "i" to the register.
+
+    1164:	01 45 f8             	add    %eax,-0x8(%rbp)
+
+	- and we added the register value to the present value of local variable
+	  "n".  And as 13 was left in eax, that served as the function return
+	  value.
+
+    1167:	90                   	nop
+    1168:	5d                   	pop    %rbp
+    1169:	c3                   	retq
+
+

🔧 Try the clang compiler and figure out what happened there.

+
$ clang main.c
+main.c:7:1: warning: control reaches end of non-void function [-Wreturn-type]
+}
+^
+1 warning generated.
+$ ./a.out
+0
+
+

What we should take away from this situation:

+
    +
  • Code that appears to work is not necessarily correct.
  • +
  • Ideally, if possible, test on different architectures (like x86, SPARC, ARM, +etc).
  • +
  • Using different compilers and different systems may help as well. For +example, if you develop in a Linux distro with gcc, testing it also on a +macOS laptop would be worth it (C compiler in macOS Xcode IDE is clang).
  • +
  • If something that did work before magically stops working, be ready for breakage like this. Even if something has worked for ages, it does not necessarily mean the code was correct.
  • +
  • Always use -Wall -Wextra GCC options when building your code. More on that +later in the seminar. See that clang warns by default which is a good +thing.
  • +
+

Do you want more information on x86 ABI and how it really works behind the scenes? Search for Solaris Crashdump Analysis Frank Hofmann for more details.

+

You probably do not want all the gory +details +though...

+

🔧 Home assignment

+

Note that home assignments are entirely voluntary but writing code is the only +way to learn a programming language.

+

🔧 Count digit occurrence (use arrays)

+

Write a simple program counting the digit occurrence as in one of the previous classes but now use an array for the counts.

+

🔧 Count digits, white space characters, and the rest

+

Print the total number of decimal digits, white space characters (tab, space, newline), and anything else from the input. I.e. the program prints out three numbers.

+

Obviously, reuse the code you wrote for digit occurrence counting: count-digit-occurence.md

+
    +
  • write the program using direct comparisons like this:
  • +
+
c >= '0' && c <= '9'
+c == ' ' || c == '\t' || c == '\n'
+
    +
  • put both the checks into separate functions
  • +
  • then write a new version of the program and use: +
      +
    • +isspace() from C99
    • +
    • +isdigit() vs isnumber() - the latter detects digits + possibly +more characters (depending on locale setting)
    • +
    +
  • +
+

🔧 Count letter occurrence

+
    +
  • Count occurrences of all ASCII letters on standard input and print out +a histogram
  • +
  • This will be case insensitive (i.e. implement mytolower(int) first)
  • +
  • The output will look like this:
  • +
+
$ cat /some/file | ./a.out
+a: ***
+b: *
+c: *****************************
+...
+z: *******
+$
+
+
    +
  • Use a function to print a specific number of stars.
  • +
  • Use a function to do all the printing.
  • +
  • Declare array(s) to be as efficient as possible. +
      +
    • That is, having the lowest possible size.
    • +
    • Only use global variables if necessary.
    • +
    +
  • +
+ +
+
+
+ + + +
+
+
+
- +
\ No newline at end of file