Rohit Banga's Blog: January 2012

Thursday, January 12, 2012

Programming Trivia: Fun with Fork

What is the output of the following program?

// fork.cpp

#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

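int main() {
    int x = 1;
    while (x <= 100) {
        if (fork() != 0)    // parent (or failed fork): leave the loop
            break;
        printf("%d\n", x);  // child: print and carry on
        x = x + 1;
    }
    waitpid(-1, &x, 0);     // wait for a child, if any; x is reused for the status
    return 0;
}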

Let us say I run the program as

$ ./fork

How about

$ ./fork > file.txt

Think hard before actually running the program ... the output had me stumped for a while until I realized I was wrong!

Feel free to discuss the solution in comments.

Update:
Time for the solution. I assume that the output for the first case is clear. If not, Rob explains it nicely in the comments below. What happens when we redirect the output to a file?

printf() is a C library function that buffers output on top of the write() system call to reduce the number of expensive calls into the kernel. The C library would be dumb if it made a call to the kernel for every character output. So when is the buffer flushed? It depends ... If standard output is a terminal, it is line buffered, that is, the buffer is flushed whenever a newline is seen. However, if the output is being sent to a file, it is block (fully) buffered: the buffer is flushed only when it fills up, and its size is typically tied to the block size used for transfers between disk and main memory. So once we redirect the output to a file, the '\n' in the printf() call no longer flushes the buffer.

Where is the buffer located ... In the process's address space, of course. On invoking fork(), the child process receives a copy of the parent process's address space. This includes the partially filled buffer containing the output of printf(). Now the child process calls printf() and appends to its copy of this buffer. When a process exits normally, all its buffers are flushed, so the child process sends its own output appended to its parent's output. Voila, we have the output of the parent process appearing twice in the output file. This effect is chained across forks ... ! On a typical system, where the whole output fits in one buffer, the file ends up holding 1 to 100, then 1 to 99, and so on down to a lone 1.
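The duplication can be reproduced in isolation. Here is a minimal sketch, assuming a POSIX system; fork_with_pending_output() is a hypothetical helper written for this illustration, not part of the puzzle. It leaves one line pending in a stdio buffer, forks, and then reads back what actually reached the file.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: leaves "parent\n" pending in a stdio buffer, forks,
// and returns everything that finally reaches the file.
std::string fork_with_pending_output() {
    FILE* out = tmpfile();           // a regular file, so fully buffered
    if (out == NULL) return "";

    fputs("parent\n", out);          // stays in the user-space buffer for now

    pid_t pid = fork();
    if (pid == 0) {
        fputs("child\n", out);       // appended to the child's copy of the buffer
        fclose(out);                 // flushes that copy, as a normal exit would
        _exit(0);                    // skip exit-time flushing of other streams
    }
    if (pid > 0) waitpid(pid, NULL, 0);

    fflush(out);                     // the parent's copy still holds "parent\n"
    rewind(out);

    std::string contents;
    char buf[256];
    std::size_t n;
    while ((n = fread(buf, 1, sizeof buf, out)) > 0)
        contents.append(buf, n);
    fclose(out);
    return contents;
}
```

Printing the result of fork_with_pending_output() from main() should show the line "parent", then "child", then "parent" again: the child flushes the inherited line together with its own, and the parent later flushes its own copy.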

One must not depend on this buffering to implement any functionality in a program, though. However, flushing stdio buffers before a fork could be a harmless (and often beneficial) addition to your code!
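As a rough sketch of that advice, here is a variant of the puzzle's loop that flushes before every fork. The names print_chain() and chain_output() are hypothetical, introduced only for this illustration. Unlike the original, it writes to a stream passed in by the caller, waits for its own child by pid, and makes the forked processes leave with _exit() so that only the original process returns.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical variant of the puzzle's loop: writes to `out` and
// flushes before each fork.
void print_chain(FILE* out, int limit) {
    const pid_t root = getpid();
    pid_t child = -1;
    int x = 1;
    while (x <= limit) {
        fflush(out);                 // nothing pending is left for the child to inherit
        child = fork();
        if (child != 0) break;       // parent (or failed fork) stops here
        fprintf(out, "%d\n", x);     // child prints and carries on
        x = x + 1;
    }
    if (child > 0) waitpid(child, NULL, 0);
    if (getpid() != root) {
        fflush(out);                 // only the last child still has a line pending
        _exit(0);                    // forked processes never return to the caller
    }
}

// Hypothetical harness: runs the chain against a temporary file and
// returns the file's contents.
std::string chain_output(int limit) {
    FILE* tmp = tmpfile();           // a regular file, so fully buffered
    if (tmp == NULL) return "";
    print_chain(tmp, limit);
    rewind(tmp);

    std::string contents;
    char buf[256];
    std::size_t n;
    while ((n = fread(buf, 1, sizeof buf, tmp)) > 0)
        contents.append(buf, n);
    fclose(tmp);
    return contents;
}
```

With the flush in place, chain_output(5) should contain each of the numbers 1 to 5 exactly once, even though the stream is fully buffered.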

Tuesday, January 10, 2012

Private Tweet Channels - A Revenue Channel

Every time I see a website offering an awesome service to millions of users, I wonder what its revenue model is. I have been surprised (at least initially) by Google, Facebook, Twitter, Stackoverflow and also Quora. It takes an amazing amount of engineering and money to scale to that large a user base. Anyone seeing these free services for the first time would wonder - is this really free? After having seen a bunch of these sites, one gets used to the reality that if a site succeeds, it will figure out a way to make money. Websites typically use a combination of advertisements and premium service offerings to make money.

While advertisements are a common way of making money, they cannot be sufficient on their own. I am not an economist, but I feel so because every company will invest only a fixed amount in advertisements. If more and more awesome websites start making revenue out of advertisements, then the advertising revenue will be divided among all the players - say Facebook, Google, Twitter, Stackoverflow. Fiercer competition will ensue on customized advertising, engaging the research community to come up with state-of-the-art algorithms to target potential customers. The bottom line is that, between competition, limits on customer spending and upper limits on advertising revenue, advertisements alone are unlikely to be sufficient. An economist may give counterarguments, and I would be glad to hear those opinions as well.

That said, it is imperative for companies to build up premium services as an alternate source of revenue. Google has Apps for enterprises, paid storage and, I guess, some form of enterprise search, and many more things will follow. Twitter has started to offer promoted tweets, but that is a way of marketing products and businesses - again advertising-based revenue. Stackoverflow has StackApps, which could be used to build revenue - I am not sure if it is already making non-advertising revenue directly. Facebook must have plans for enterprise social networks, much like Salesforce's Chatter.

A new possibility exists with the way Twitter is being used - building up Private Tweet Channels, much like Stackoverflow allows QA sites focused on a particular topic. Very often Twitter is used by businesses as a medium to take live questions during live chat shows, and live tweets are streamed on TV channels. This video gives a feel of the use case.

Twitter can really work out premium services for voting campaigns initiated by television channels. Private Tweet Channels for use within an enterprise, or amazing analytics for tweets, could easily work out as a saleable service. This is more so the case because of the already established popularity of Twitter. Also, TV channels would be willing to pay Twitter an extra amount only if irresistible features are offered in the premium pack.

Googling for Private Tweet Channels, I came across the following video. It seems like a hack to support private tweet groups on Twitter or, to phrase it better, Jugaad.

Friday, January 6, 2012

What goes into the cloud and what runs on my browser?

I previously blogged about Google Chrome. Using a cloud-based OS has both advantages that users will enjoy and drawbacks. The most pressing advantage may be the ability to prevent software piracy: a user can be made to pay for logging in to the OS and using any of the services that reside in the cloud. Updates to cloud-based systems and upgrades to the data formats used to store files on Google Docs can all happen transparently. Not to mention the ability to access documents from anywhere.

At the same time, we are seeing the advent of high-performance desktop/mobile processors and fast graphics cards - so how do we make the most of that performance on laptops and netbooks when everything is moving to the cloud? The browser sure will consume an awesome amount of memory and CPU cycles, especially for people like me who have 100 tabs open in Chrome most of the time.

As an aside, I hate to see my browser crashing ... I feel the industry as a whole (both software and microprocessor) should seriously consider the browser performance bottleneck.

It makes sense to run computationally intensive tasks using the graphics card on my PC while moving the tasks involving "data" to the cloud. It does not matter whether FastKat2 runs on my PC or in the cloud as long as it runs fast. But it does matter whether I use online document managers (Google Docs) or process/store documents on my PC. Here data is involved.

While the earliest computers involved a dumb terminal accessing a mainframe, the new generation of PCs is going to be a hybrid of those and the desktops of today. A first design decision for any major application would be: what part runs on the cloud and what runs on my browser?

It should not be surprising if programming models come up that allow easy engineering of code that separates the cloud portion from the browser portion. Well, there are many such models in the parallel programming world ... guess what, they could be adapted to suit segregation of code for cloud and browser.

Have a nice weekend!
