Rohit Banga's Blog

Thursday, January 12, 2012

Programming Trivia: Fun with Fork

What is the output of the following program?

// fork.cpp

#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int x = 1;
    while (x <= 100) {
        if (fork() != 0)
            break;
        printf("%d\n", x);
        x = x + 1;
    }
    waitpid(-1, &x, 0);
    return 0;
}

Let us say I run the program as

$ ./fork

How about

$ ./fork > file.txt

Think hard before actually running the program ... the output had me stumped for a while until I realized I was wrong!

Feel free to discuss the solution in comments.

Update:
Time for the solution. I assume that the output for the first case is clear. If not, Rob explains it nicely in the comments below. What happens when we redirect the output to a file?

printf() is a C library function that buffers output on top of the write() system call, so that it makes fewer expensive calls into the kernel. The C library would be dumb if it called the kernel for every character of output. So when is the buffer flushed? It depends ... If standard output is a terminal, it is line buffered: the buffer is flushed whenever a newline is seen. If the output goes to a file, it is fully (block) buffered: the buffer is flushed only when it fills up, and its size is typically chosen to match the block size used for transfers between disk and main memory. So once we redirect the output to a file, the '\n' in the printf() call no longer flushes the buffer.

Where is the buffer located? In the process's address space, of course. On fork, the child process receives a copy of the parent process's address space. This includes the partially filled buffer holding the not-yet-written output of earlier printf() calls. The child process then calls printf() and appends to this buffer. When a process exits, its buffers are flushed, so each child writes out its own output appended to everything it inherited from its parent. Voila, the output of the parent process appears again in the output file. This effect is chained across forks, so the k-th child ends up writing the numbers 1 through k!
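The chained effect can be reproduced in a controlled way. The sketch below is not the program above: lines_after_fork_chain() is a hypothetical helper that runs the same loop as fork.cpp for n iterations, but prints to a temporary file obtained from tmpfile() (a regular file, so fully buffered by default) instead of stdout. Each child ends with fclose() followed by _exit(), which stands in for the flush that a normal exit performs. It assumes a POSIX system, that no fork fails, and that the stdio buffer is large enough to hold all n lines.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the post): runs the puzzle's fork chain for
// n iterations against a fully buffered temporary file and returns how many
// lines end up in that file. Returns -1 if the temporary file cannot be made.
long lines_after_fork_chain(int n) {
    assert(n >= 0);
    FILE* out = std::tmpfile();   // regular file => fully buffered by default
    if (out == nullptr) return -1;

    bool in_child = false;        // false only in the process that called us
    pid_t child = -1;             // pid of the child this process forked
    int x = 1;
    while (x <= n) {
        pid_t pid = fork();
        if (pid != 0) {           // parent (or failed fork): leave the loop
            child = pid;
            break;
        }
        in_child = true;          // the child carries on with a COPY of the buffer
        std::fprintf(out, "%d\n", x);   // appended to the inherited buffer
        x = x + 1;
    }

    if (child > 0) {              // wait for the rest of the chain to finish
        int status = 0;
        waitpid(child, &status, 0);
    }
    if (in_child) {
        std::fclose(out);         // flushes inherited lines plus this child's own
        _exit(0);
    }

    // Only the original caller reaches this point; it never wrote anything.
    std::rewind(out);
    long lines = 0;
    for (int c; (c = std::fgetc(out)) != EOF; ) {
        if (c == '\n') ++lines;
    }
    std::fclose(out);
    return lines;
}
```

Under those assumptions the k-th child flushes k lines, so the file should hold 1 + 2 + ... + n lines: 10 for n = 4, and 5050 for the puzzle's n = 100.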

One must not depend on this buffering to implement any functionality in a program, though. On the other hand, flushing stdio buffers before a fork could be a harmless (and often beneficial) addition to your code!
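As a minimal sketch of that advice, assuming a POSIX system: fork_flushed() is a hypothetical wrapper (the name is mine) that calls fflush() with a null pointer, which flushes every open output stream, and then forks. bytes_after_one_fork() is an equally hypothetical demo that leaves six bytes sitting in a temporary file's buffer, forks once, and reports how large the file ends up.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical wrapper: empty all stdio output buffers, then fork, so the
// child starts with nothing pending that it could write a second time.
pid_t fork_flushed() {
    std::fflush(nullptr);   // a null stream means "all open output streams"
    return fork();
}

// Hypothetical demo: buffer "hello\n" in a temporary file, fork once (with or
// without flushing first), and return the final size of the file in bytes.
long bytes_after_one_fork(bool flush_first) {
    FILE* out = std::tmpfile();          // fully buffered regular file
    if (out == nullptr) return -1;

    int rc = std::fputs("hello\n", out); // 6 bytes, still in the buffer
    assert(rc >= 0);
    (void)rc;

    pid_t pid = flush_first ? fork_flushed() : fork();
    if (pid == 0) {
        std::fclose(out);                // child flushes whatever it inherited
        _exit(0);
    }
    if (pid > 0) {
        int status = 0;
        waitpid(pid, &status, 0);
    }

    std::fseek(out, 0, SEEK_END);        // flushes the parent's buffer, seeks to end
    long size = std::ftell(out);
    std::fclose(out);
    return size;
}
```

Without the flush, both processes write their copy of the buffer and the file should come to 12 bytes; with fork_flushed() the data is written once before the fork and the file should stay at 6 bytes.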

Tuesday, January 10, 2012

Private Tweet Channels - A Revenue Channel

Every time I see a website offering an awesome service to millions of users, I wonder what its revenue model is. I have been surprised (at least initially) by Google, Facebook, Twitter, Stackoverflow and also Quora. It takes an amazing amount of engineering and money to scale to that large a user base. Every newcomer who sees these free services for the first time must wonder: is this really free? After having seen a bunch of these sites, one gets used to the reality that if a site succeeds, it will figure out a way to make money. Websites typically use a combination of advertisements and premium service offerings to make money.

While advertisements are a common way of making money, they cannot be sufficient. I am not an economist, but I feel so because every company will invest only a fixed amount in advertising. If more and more awesome websites start making revenue out of advertisements, then the advertising revenue will be divided among all the players - say Facebook, Google, Twitter, Stackoverflow. Fiercer competition will ensue on customized advertising, engaging the research community to come up with state-of-the-art algorithms to target potential customers. The bottom line is that, given the competition, the limits on customer spending and the upper limits on advertising revenue, advertisements alone are unlikely to be sufficient. An economist may give counterarguments, and I would be glad to hear those opinions as well.

That said, it is imperative for companies to build up premium services as an alternate source of revenue. Google has Apps for enterprises, paid storage and, I guess, some form of enterprise search, and many more things will follow. Twitter has started to offer promoted tweets, but that is a way of marketing products and businesses - again, advertising-based revenue. Stackoverflow has StackApps, which could be used to build revenue - I am not sure if it is already making non-advertising revenue directly. Facebook must have plans for enterprise social networks, much like Salesforce's Chatter.

A new possibility exists with the way Twitter is being used: building up Private Tweet Channels, much like Stackoverflow allows Q&A sites focused on a particular topic. Very often Twitter is used by businesses as a medium to take live questions on live chat shows, and live tweets are streamed on TV channels. This video gives a feel of the use case.

Twitter could really work out premium services for voting campaigns initiated by television channels. Private Tweet Channels for use within an enterprise, or amazing analytics for tweets, could easily work as a saleable service, all the more so because of Twitter's already established popularity. Of course, TV channels would be willing to pay Twitter extra only if irresistible features are offered in the premium pack.

Googling for Private Tweet Channels, I came across the following video. It seems like a hack to support private tweet groups on Twitter or, to phrase it better, Jugaad.

Friday, January 6, 2012

What goes into the cloud and what runs on my browser?

I previously blogged about Google Chrome. A cloud-based OS has both advantages that users will enjoy and drawbacks. The most pressing advantage may be the ability to prevent software piracy: a user can be made to pay for logging in to the OS and using any of the services that reside in the cloud. Updates to cloud-based systems and upgrades to the data formats used to store files on Google Docs can all happen transparently. Not to mention the ability to access documents from anywhere.

At the same time we are seeing the advent of high-performance desktop/mobile processors and fast graphics cards - so how do we make the most of that performance on laptops and netbooks when everything is moving to the cloud? The browser sure will consume an awesome amount of memory and CPU cycles, especially for people like me who have 100 tabs open in Chrome most of the time.

As an aside, I hate to see my browser crashing ... I feel the industry as a whole (both software and microprocessor) should seriously consider the browser performance bottleneck.

It makes sense to run computationally intensive tasks using the graphics card on my PC while moving the tasks involving "data" to the cloud. It does not matter whether FastKat2 runs on my PC or in the cloud, as long as it runs fast. But it does matter whether I use online document managers (Google Docs) or process/store documents on my PC. Here data is involved.

While the earliest computers involved a dumb terminal accessing a mainframe, the new generation of PCs is going to be a hybrid of those and the desktops of today. A first design decision for any major application would be: what part runs on the cloud and what runs on my browser?

It should not be surprising if programming models come up that allow easy engineering of code that separates the cloud portion from the browser portion. Well, there are many such models in the parallel programming world ... guess what, they could be adapted to suit the segregation of code for cloud and browser.

Have a nice weekend!

Tuesday, December 20, 2011

A Possible Extension to Gmail's Undo Send Feature

Gmail introduced this really cool feature that many find useful. It allows you to undo an email that you sent by mistake. It is like allowing you to put the bullet back into the gun after the shot has been fired. But the catch here is that you can undo a message only within a time window of 10 seconds after clicking on send. Gmail does not actually send the message but holds it on its servers for the first 10 seconds, during which you have the option to reconsider the contents, the recipient list, etc.

A way to extend this functionality could be to allow a much longer time window for undoing the send. As long as the recipient is a Gmail user and has not yet checked his mailbox (that includes access using POP3, IMAP or forwarding to a non-Gmail address), I guess it is fair to allow the user to undo the message. No one has actually seen the message. It is between Gmail and me. Neither the recipient nor the cool desktop/mobile apps have checked the inbox, so what's the harm ... although it does encourage typing sloppy messages on the first go (but that is loosely equivalent to sloppy code being written because better debugging tools are available).

I agree it could be a convoluted and risky feature to implement correctly or, for that matter, to explain to users. But wouldn't it be cool to just drop a letter in your friend's mailbox while he/she is on vacation and pick it up before he/she gets back? What do you think?

Also imagine the prospect of fooling an eavesdropper on the network into believing that an email with the "contents" has been sent, while actually undoing it later over a different network (one that you know is not being sniffed).

Thursday, September 1, 2011

MPI Introduction

Sunday, October 3, 2010

Twitter as a Resource Proxy

Twitter is one intriguing medium that constrains you to a mere 140 characters while continuing to draw millions of users worldwide. Twitter sports a high signal-to-noise ratio, with users constrained to write pithy messages.

This limitation imposed by Twitter somehow attracts more and more users to announce lots and lots of things in terse messages. With Facebook leading the social networking revolution (I presume it is), it is imperative for Twitter to look for other options to make itself more desirable for users.

If you have not seen the new Twitter you may want to read this. Here's what is worth observing. Among other things, Twitter allows you to view pictures and videos posted on a different website within your Twitter page. Forget visiting twitpic and yfrog. All the content will be accessible from within Twitter. If you tagged your tweets with a location, then a Google Maps image will be inlaid along with the tweet. That is UI Magic.

The distinguishing trait of Twitter that attracts people will still not be violated. It is only that in 140 characters you will be expected to write enough information to identify the source of the content.

While currently only pictures and videos are supported, it is likely that in the future Twitter will allow you to render custom HTML within a frame (or whatever) on the Twitter home page. Thus you could post entire blogs without being bogged down by the 140-character limit. It might also be possible to decorate content before rendering it on Twitter.

Facebook currently generates a thumbnail view of the link that you post. But what I am talking about is completely different: viewing the content within the Twitter page ... But wait, isn't this what Google Reader does? The only difference is that Google Reader is a reader first and a social networking portal second.

With Google Chrome trying to move all your computing needs to the cloud, making your PC a mere viewer, Twitter could go one step further by making its website the one-stop tab (within Chrome of course) to collate information from various sources and relate tweeple with the information.

To sum it up, looking forward, it appears Twitter will strategize by acting as a proxy for content hosted on remote servers without actually burdening its own servers.

Write an SMS/Tweet ... 3 Marks

At school, I remember that one of the many types of writing exercises we practised was writing a telegram. Mostly we wrote letters and essays, but writing a telegram did form a small part of the curriculum.

However, with the advent of the Internet and the mobile phone revolution in India, things like telegrams are so passé. Most people in India currently use SMS to communicate short messages. I am not sure if the school curriculum has been adapted to account for these developments.

Imagine questions like

"Write an SMS to tell your boss that you will be late for the meeting." (2 Marks).

"Compose a tweet to announce the launch of your new social networking website. Invent details to market your website. (Character Limit: 140)" (3 Marks).

It might seem quaint at first sight, but it would only be meaningful to include such questions in the exam in place of those concerning snail mail.