Thursday, January 12, 2012

Programming Trivia: Fun with Fork


What is the output of the following program?

// fork.cpp

#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>


int main() {
int x = 1;
while (x <= 100) {
if (fork() != 0)
break;
printf("%d\n", x);
x = x+1;
}
waitpid(-1, &x, 0);
return 0;
}

Let us say I run the program as

$ ./fork

How about

$ ./fork > file.txt

Think hard before actually running the program ... the output had me stumped for a while until I realized I was wrong!
Feel free to discuss the solution in comments.

Update:
Time for Solution. So I assume that the output for the first case is clear. If not Rob explains it nicely in the comments below. What happens when we redirect the output to a file?

printf() is a C library function that uses buffering on top of the system call write() to optimize the number of expensive calls to the kernel. The C library would be dumb if it made a call to the kernel for every character output. So what is the size of the buffer? It depends ... If the output file descriptor is a terminal then it is line buffered, that is, the buffer is flushed after a new line is seen. However if the output is being sent to a file, then it is block buffered that is the size of block transfers between disk and main memory. So once we redirect it to the file the '\n' in the printf() call does not flush the buffer. Where is the buffer located ... In the process's address space of course. On invoking fork, the child process receives a copy of the parent process's address space. This includes the partially filled buffer containing the output of printf(). Now this child process calls printf() and appends to this buffer. Before exiting all buffers are flushed so the child process sends its own output appended to its' parent's output. Voila we have the output of the parent process appearing twice in the output file. This effect is chained across forks ... !

One must not depend on this buffering to implement any functionality in a program though. However flushing stdio buffers before a fork could be a harmless (and often beneficial) addition to your code!

8 comments:

  1. only the output of the main thread is redirected to the file.txt, rest on console, is it ?

    ReplyDelete
  2. @Saurabh No. Try running the program that is not what happens.

    ReplyDelete
  3. /* SPOILERS AHEAD */
    It will print 1 to 100. Child processes get a copy of the memory space of the parent, so when the first process spawns a child, x will be 1 for both. fork() returns the PID of the child to the spawning process, which will cause it to break from the while loop and hit the waitpid(). Meanwhile the child will get a return of 0 from fork() and continue to print x then increment x. The child will follow the while loop and spawn a new child that has the new x. The original child will break due to the PID returned from fork() and the grandchild will run with x equal to 2, print, increment, repeat. This will continue until the child that gets 100 breaks from the loop and returns (as it has no children to wait on). This will propogate up the tree of processes until as each returns due to its children finishing and the program is complete.

    ReplyDelete
  4. @Rob Try running ./fork and ./fork > file.txt

    ReplyDelete
  5. Ahhh my bad. I missed it. That's bizarre!

    ReplyDelete
  6. printf("%d\n", x);
    fflush(NULL);
    x = x+1;

    this will get what we're expecting.

    ReplyDelete
    Replies
    1. How does this work?

      How can we explain the previous behaviour on redirecting the output to a file?

      Delete
  7. @Swagat Yeah that's what. It is somewhat counter intuitive at first.

    ReplyDelete