Tuesday, March 3, 2009

How does Google store information about a web page?

Does anyone know how Google stores information about web pages or about a particular site?

I guess it stores the data in a special form so that it can be matched quickly against search keywords.

I imagine it stores the URL, the text on the page, image names and alt text, link text and link titles, and the meta and title tags in a form that allows the relevancy of a search word to be calculated quickly. Let us treat all of that as one unit. There are many units like this (one for each web page), and all the units belonging to one domain are in turn treated as one larger unit. Google may also maintain a language dictionary on a separate server.
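To make the idea of a "unit" concrete, here is a minimal Python sketch. It only illustrates the guess above and is not how Google actually stores pages. The names PageUnit and group_by_domain, the field names, and the example.com URLs are all made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class PageUnit:
    """One 'unit': the fields the post guesses are kept for a page (hypothetical)."""
    url: str
    title: str = ""
    text: str = ""
    meta: dict = field(default_factory=dict)
    image_names: list = field(default_factory=list)
    image_alts: list = field(default_factory=list)
    link_texts: list = field(default_factory=list)
    link_titles: list = field(default_factory=list)


def group_by_domain(pages):
    """Collect the page units into one larger unit per domain."""
    domains = defaultdict(list)
    for page in pages:
        domains[urlparse(page.url).netloc].append(page)
    return dict(domains)


# Made-up pages, for illustration only.
pages = [
    PageUnit(url="http://example.com/health", title="Health tips"),
    PageUnit(url="http://example.com/stress", title="Stress control"),
    PageUnit(url="http://example.org/", title="Another site"),
]
site_units = group_by_domain(pages)
```

Here site_units maps each domain name to the list of page units found under it, so the two example.com pages end up together in one group.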

For a clearer picture, imagine a library with a lot of books and a librarian. If we ask the librarian for a book on "health", he may show us many books and recommend a few that are popular. If we ask for "women health", he may show us fewer books than for "health", because "women health" is a bit more specific. If we ask for a book about "men mental health", he may even recommend a particular title and author, because the request is more specific still. The same book may also be categorized under "stress control", because a few of its pages were written about that.

Let's consider how the books are stored in the library. The librarian arranges them so that each one is easy to find. He puts the books on racks and sticks labels on the racks, and he sticks smaller labels on specific areas (subcategories).
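The labels on the racks behave much like what is usually called an inverted index: a lookup from a keyword to everything filed under it. The Python sketch below is only an illustration of the analogy, not a description of any real search engine. The book titles, the labels, and the function names build_index and search are invented for the example.

```python
from collections import defaultdict


def build_index(books):
    """Map each label (keyword) to the set of book titles filed under it."""
    index = defaultdict(set)
    for title, labels in books.items():
        for label in labels:
            index[label.lower()].add(title)
    return index


def search(index, query):
    """Return the books that carry every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical shelf: book title -> labels stuck on its rack.
books = {
    "General Health Guide": ["health"],
    "Women's Health Handbook": ["women", "health"],
    "Men's Mental Health": ["men", "mental", "health", "stress", "control"],
}
index = build_index(books)
```

With this shelf, search(index, "health") finds all three books, while search(index, "women health") and search(index, "men mental health") each narrow down to a single book. search(index, "stress control") also finds "Men's Mental Health", just as the same book can sit under more than one category. Ranking by popularity is left out to keep the sketch short.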

A good librarian is one who stores the books so that they can be found and retrieved quickly.
In this picture, the books are websites and the librarians are search engines.

Search engines are just trying to mimic human behavior. They keep developing their algorithms and injecting them into the search stream.

