The Naked Scientists

The Naked Scientists Forum

Author Topic: How do search engines work ? & how many web pages are there ?  (Read 4147 times)

Offline neilep

  • Withdrawnmist
  • Naked Science Forum GOD!
  • *******
  • Posts: 20602
  • Thanked: 8 times
It amazes me how search engines bring up results so quickly !

Does it have something to do with these spider-things that they send out ?...how do they know which key words to keep and log ?

Even so, there must be one hell of a lot of web pages/sites out there...(maybe even more than a hundred !  ;) eh ?)

So, how do search engines work ?...and how many pages/sites are out there ?



 

another_someone

  • Guest
There is a lot of work, and a few patents, involved in all of this.

The spiders are the starting point - they just walk through the Internet following every link from one page to another, and so accumulate a huge database of all the web pages that are somehow linked to from some other web page.
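To make the link-following idea concrete, here is a minimal sketch, assuming a made-up in-memory "web" rather than real network requests. The TOY_WEB dictionary and the crawl function are invented for illustration and are not how any particular search engine's spider is written.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "forum"],
    "about": ["home"],
    "forum": ["topic1", "topic2"],
    "topic1": ["forum"],
    "topic2": ["topic1"],
    "orphan": ["home"],  # no page links here, so the spider never reaches it
}

def crawl(start, links):
    """Breadth-first walk: follow every link once, return the set of pages found."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = crawl("home", TOY_WEB)
print(sorted(found))
```

Note that "orphan" is never discovered: a page that nothing links to is invisible to a spider that only follows links.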

The next part is trying to work out what the web pages are relevant to, so that when someone enters a query, you have some hope of finding the appropriate page.

In the early days, they just let the web page designer list within the page the keywords that should trigger it in searches.  Not surprisingly, web page designers grossly misused this feature, or alternatively just neglected to include the appropriate information.

Next was trying to read what is actually displayed in the page, but this can be deceptive as a word in one context can have a very different meaning to the same word in another context.
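Reading the displayed text is commonly done by building a table from each word to the pages that contain it, which is also part of why lookups can be fast. The following is a rough sketch under that assumption; the PAGES data and the build_index and search functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical page texts, standing in for what a spider has collected.
PAGES = {
    "page_a": "used cars for sale",
    "page_b": "railway cars and stations",
    "page_c": "garden tools for sale",
}

def build_index(pages):
    """Map each word to the set of pages whose text contains it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "cars")))
print(sorted(search(index, "cars sale")))
```

A query for "cars" alone returns both the car sales page and the railway page, which is exactly the context problem: the word by itself does not say which kind of car is meant.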

What Google was innovative in doing was not only looking at what the web page displayed, but also looking at what is linked to the web page (on the assumption that web pages that link to each other are probably talking about the same subject) - so if you think two web pages are talking about cars, and they are linked to a page about automobiles, you might guess that they are all talking about motor cars; whereas if two pages that talk about cars link to a page about railway stations, then they are probably talking about railway cars.
 

Offline neilep

  • Withdrawnmist
  • Naked Science Forum GOD!
  • *******
  • Posts: 20602
  • Thanked: 8 times
THANK YOU George,

I remember the early days...you used to see pages and pages of words to help the search engines find them.

Thank you for your answer.

I assume you have not been able to answer how many pages there are out there because you are still counting them for me....which is very nice of you.

Ta
 

Offline NewBill

  • Jr. Member
  • **
  • Posts: 43
I'll admit that Google has become my exclusive search engine, discounting those links sent or referred to me personally.  Google uses a page ranking system.  The more pages that link to a given page, the higher its ranking: a popularity contest, if you like.
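The "popularity contest" can be sketched as simply counting inbound links. This is a deliberate simplification for illustration: Google's actual ranking is more involved than a raw count, and the LINKS data and rank_by_inbound_links function below are made up.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": ["c"],
}

def rank_by_inbound_links(links):
    """Order pages by how many distinct other pages link to them (most first)."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each linking page only once
            if target != source:         # ignore a page linking to itself
                counts[target] += 1
    # Ties are broken alphabetically so the order is predictable.
    return sorted(counts, key=lambda page: (-counts[page], page))

ranking = rank_by_inbound_links(LINKS)
print(ranking)
```

Here "c" comes first because three other pages link to it, while "a" and "b" come last because nothing links to them.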

This is a little worrisome in my opinion.  Popularity is a homogenizing process, in somewhat the same way that it can seem as if the only children's book character is Harry Potter.  I should say that I also like and consume the Harry Potter stories.
 

another_someone

  • Guest
Quote
I assume you have not been able to answer how many pages there are out there because you are still counting them for me....which is very nice of you.

What do you mean by a page?

Just look at this forum for instance - what would you consider a page - there are so many different ways of looking at the information.  For instance, one can look at the list of most recent posts from a user, but the same posts can be viewed by topic - are these two different pages?  Then I can select one of my posts to be edited, thus bringing up an edit screen - that is another page.  But, if I had chosen a different post to edit, then that would be a different edit page; thus we have at least as many pages as we have posts on this site.

Each of our users on this forum has several pages dedicated to them, some including historical statistics, others including profile information, etc.

Technically, as far as a web browser is concerned, every time it receives information from a different address (the address including the parameters that may be fed to some script or other), it has a different page - yet the range of parameters you can give a web server is almost limitless, even for a single server.
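A small sketch of how one script can yield many distinct addresses, assuming a hypothetical forum at forum.example. The base address, the parameter names (action, msg, user, id) and the page_url helper are all invented for illustration and are not this forum's real URLs.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://forum.example/index.php"  # hypothetical address

def page_url(**params):
    """Build an address from the base script plus its query parameters."""
    return BASE + "?" + urlencode(sorted(params.items()))

# One edit page per post: five posts give five distinct addresses.
edit_pages = {page_url(action="edit", msg=n) for n in range(1, 6)}

# Related content reached by user or by topic has different addresses too.
by_user = page_url(action="profile", user="neilep")
by_topic = page_url(action="topic", id=42)

print(len(edit_pages))
print(by_user)
print(parse_qs(urlsplit(by_topic).query))
```

All of these addresses point at the same script, yet a browser (or a spider) treats each one as a separate page, which is why counting "pages" quickly gets out of hand.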

Interestingly, you actually have a problem of recursion, since every time you do a search on a search engine, the search engine generates its own web pages to serve to your browser, thus creating new web pages.

Of course, not all web pages are publicly accessible, and so not all web pages are available to search engines (even on our web site, some pages are available to search engines, whereas others are not).

In terms of the number of pages that are indexed by a search engine (as distinct from the number of pages that exist):

http://seed.scit.wlv.ac.uk/engines.html
Quote
Google developed from a research project at Stanford. It has a notably clean and simple interface and, thanks to a well-thought out ranking scheme, seems to have a happy knack of getting the right answers. Has the largest set of indexed pages, currently around 1.3 × 10^9.
« Last Edit: 30/10/2006 00:07:26 by another_someone »
 

Offline NewBill

  • Jr. Member
  • **
  • Posts: 43


Quote
What do you mean by a page?

Web pages bear no relationship to paper.  A single HTML document is considered a page.  It can print to many paper pages.

When you click a link you are taken to another HTML document (page).  A web site, like this one, consists of many web documents.  This forum document is assembled from a database of many comments into many pages.  The number of comments displayed per page is settable in the forum software; usually it is 20 or so.  Then you can select the next or previous page.  If you save a given document while you are viewing it, you will get a sense of what a web page is.
 

Offline science_guy

  • Hero Member
  • *****
  • Posts: 701
  • I'm right there... inside neilep's head!
Quote
thus we have at least as many pages as we have posts on this site.

Let's see, neilep makes at least one-third...

I know the answer!  The number of web pages issssss       A LOT.
 
