The good news about the Internet and its most visible component, the 
World Wide Web, is that there are hundreds of millions of pages 
available, waiting to present information on an amazing variety of 
topics. The bad news about the Internet is that there are hundreds of 
millions of pages available, most of them titled according to the whim 
of their author, almost all of them sitting on servers with cryptic 
names. When you need to know about a particular subject, how do you know
which pages to read? If you’re like most people, you visit an Internet 
search engine.
Internet search engines are special sites on the Web that are designed 
to help people find information stored on other sites. There are 
differences in the ways various search engines work, but they all 
perform three basic tasks: they search the Internet, or select pieces of 
the Internet, based on important words; they keep an index of the words 
they find, and where they find them; and they allow users to look for 
words or combinations of words found in that index.
Early search engines held an index of a few hundred thousand
 pages and documents, and received maybe one or two thousand inquiries 
each day. Today, a top search engine will index hundreds of millions of 
pages, and respond to tens of millions of queries per day. In this 
article, we’ll tell you how these major tasks are performed, and how 
Internet search engines put the pieces together in order to let you find
 the information you need on the Web.
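The three tasks can be sketched in a few lines of Python. The page contents below are invented for illustration; a real engine would fetch them over the network in the first task.

```python
from collections import defaultdict

# Invented stand-ins for crawled pages (task 1 normally fetches these).
pages = {
    "example.com/a": "heart disease and cardiac health",
    "example.com/b": "heart shaped candy for valentine day",
}

# Task 2: build an inverted index mapping each word to the pages it appears on.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# Task 3: let a user look up a word in that index.
print(sorted(index["heart"]))
```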
Looking at the Web
Searches per day, top five engines: Google, 250 million; Overture, 167 million; Inktomi, 80 million; LookSmart, 45 million; FindWhat, 33 million.
When most people talk about Internet search engines, they really mean 
World Wide Web search engines. Before the Web became the most visible 
part of the Internet, there were already search engines in place to help 
people find information on the Net. Programs with names like “gopher” 
and “Archie” kept indexes of files stored on servers connected to the 
Internet, and dramatically reduced the amount of time required to find 
programs and documents. In the late 1980s, getting serious value from 
the Internet meant knowing how to use gopher, Archie, Veronica and the 
rest.
Today, most Internet users limit their searches to the Web, so we’ll 
limit this article to search engines that focus on the contents of Web 
pages.
An Itsy-Bitsy Beginning
Before a search engine can tell you where a file or document is, that 
file or document must be found. To find information on the 
hundreds of millions of Web pages that exist, a search engine employs 
special software robots, called spiders, to build lists of the words 
found on Web sites. When a spider is building its lists, the process is 
called Web crawling. (There are some disadvantages to calling part of 
the Internet the World Wide Web- a large set of arachnid-centric names 
for tools is one of them.) In order to build and maintain a useful list 
of words, a search engine’s spiders have to look at a lot of pages.
How does any spider start its travels over the Web?
The usual starting points are lists of heavily used servers and very 
popular pages. The spider will begin with a popular site, indexing the 
words on its pages and following every link found within the site. In 
this way, the spidering system quickly begins to travel, spreading out 
across the most widely used portions of the Web.
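That seed-and-follow process is essentially a breadth-first traversal of the link graph. A minimal sketch, with a hypothetical in-memory link graph standing in for real fetched pages:

```python
from collections import deque

# Hypothetical link graph: each URL maps to the links found on its page.
links = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}

def crawl(seeds):
    """Visit every reachable page exactly once, starting from the seeds."""
    seen, queue, visited = set(seeds), deque(seeds), []
    while queue:
        url = queue.popleft()
        visited.append(url)  # a real spider would index the page here
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl(["popular.example"]))
```

Starting from the popular seed, the crawl spreads outward one layer of links at a time, which is why heavily linked pages are reached early.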
“Spiders” take a Web page’s content and create key search words that 
enable online users to find pages they’re looking for. Google began as 
an academic search engine. In the paper that describes how the system 
was built, Sergey Brin and Lawrence Page give an example of how quickly 
their spiders can work. They built their initial system to use multiple 
spiders, usually three at one time. Each spider could keep about 300 
connections to Web pages open at a time. At its peak performance, using 
four spiders, their system could crawl over 100 pages per second, 
generating around 600 kilobytes of data each second. Keeping everything 
running quickly meant building a system to feed necessary information to
 the spiders. The early Google system had a server dedicated to 
providing URLs to the spiders. Rather than depending on an Internet 
service provider for the domain name server (DNS) that translates a 
server’s name into an address, Google had its own DNS, in order to keep 
delays to a minimum.
When the Google spider looked at an HTML page, it took note of two 
things: The words within the page and where the words were found.
Words occurring in the title, subtitles, meta tags and other positions 
of relative importance were noted for special 
consideration during a subsequent user search. The Google spider was 
built to index every significant word on a page, leaving out the 
articles “a”, “an” and “the”. Other spiders take different approaches.
These different approaches usually attempt to make the spider operate 
faster, allow users to search more efficiently, or both. For example, 
some spiders will keep track of the words in the title, sub-headings and
 links, along with the 100 most frequently used words on the page and 
each word in the first 20 lines of text. Lycos is said to use this 
approach to spidering the Web.
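A rough sketch of that selective strategy; the rule of thumb is Lycos’s, but the code below is only an illustration of the idea, not their implementation:

```python
from collections import Counter

def selective_index_words(title, body, top_n=100, first_lines=20):
    """Keep the title words, the top_n most frequent body words, and
    every word in the first first_lines lines, rather than indexing
    the entire page."""
    words = body.lower().split()
    frequent = {w for w, _ in Counter(words).most_common(top_n)}
    leading = {w for line in body.lower().splitlines()[:first_lines]
               for w in line.split()}
    return set(title.lower().split()) | frequent | leading

print(sorted(selective_index_words("Heart Health", "cardiac care\ncardiac diet")))
```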
Other systems, such as AltaVista, go in the other direction, indexing 
every single word on a page, including “a”, “an”, “the” and other 
“insignificant” words. The push to completeness in this approach is 
matched by other systems in the attention given to the unseen portion of
the Web page, the meta tags.
The Problem with Keyword Searching
Keyword searches have a tough time distinguishing between words that are
 spelled the same way, but mean something different (i.e. hard cider, a 
hard stone, a hard exam, and the hard drive on your computer). This 
often results in hits that are completely irrelevant to your query. Some
 search engines also have trouble with so-called stemming—i.e. if you 
enter the word “big”, should they return a hit on the word, “bigger?” 
What about singular and plural words? What about verb tenses that differ
from the word you entered by only an “s” or an “ed”?
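A deliberately crude suffix-stripper shows why stemming is tricky: it catches the easy plural and past-tense cases, but it can never connect “big” to “bigger”, and it knows nothing about meaning.

```python
def naive_stem(word):
    """Strip a few common English suffixes; crude on purpose.
    The length check avoids mangling very short words."""
    for suffix in ("ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(naive_stem("attacks"))  # the plural maps back to "attack"
print(naive_stem("bigger"))   # but "big" is never recovered
```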
Search engines also cannot return hits on keywords that mean the same, 
but are not actually entered in your query. A query on heart disease 
would not return a document that used the word “cardiac” instead of 
“heart”.
Concept-Based Searching
(This information is outdated, but might have historical interest for researchers.)
Excite used to be the best-known general-purpose search engine site on 
the Web that relies on concept-based searching. It is now effectively 
extinct. Unlike keyword search systems, concept-based search systems try
 to determine what you mean, not just what you say. In the best 
circumstances, a concept-based search returns hits on documents that are
 “about” the subject/theme you’re exploring, even if the words in the 
document don’t precisely match the words you enter into the query.
How did this method work?
There are various methods of building clustering systems, some of which 
are highly complex, relying on sophisticated linguistic and artificial 
intelligence theory that we won’t even attempt to go into here. Excite 
used a numerical approach. Excite’s software determines meaning by 
calculating the frequency with which certain important words appear. 
When several words or phrases that are tagged to signal a particular 
concept appear close to each other in a text, the search engine 
concludes, by statistical analysis, that the piece is “about” a certain 
subject.
For example, the word heart, when used in the medical/health context, 
would be likely to appear with such words as coronary, artery, lung, 
stroke, cholesterol, pump, blood, attack, and arteriosclerosis. If the 
word heart appears in a document with other words such as flowers, 
candy, love, passion and valentine, a very different context is 
established and a concept-oriented search engine returns hits on the 
subject of romance. (This ends the outdated section on concept-based 
searching.)
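The co-occurrence idea can be reduced to counting signal words. The word lists below come straight from the article’s heart example; real concept-based systems used far more sophisticated statistics than a simple count.

```python
# Signal words for two senses of "heart", taken from the example above.
MEDICAL = {"coronary", "artery", "lung", "stroke", "cholesterol",
           "pump", "blood", "attack", "arteriosclerosis"}
ROMANCE = {"flowers", "candy", "love", "passion", "valentine"}

def concept_of(text):
    """Classify a text by which signal-word set it overlaps more."""
    words = set(text.lower().split())
    med, rom = len(words & MEDICAL), len(words & ROMANCE)
    return "medical" if med >= rom else "romance"

print(concept_of("heart attack risk rises with blood cholesterol"))
print(concept_of("a heart of candy and flowers for my valentine"))
```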
Refining Your Search
Most sites offer two different types of searches- “basic” and “refined”.
 In a “basic” search, you just enter a keyword without sifting through 
any pull down menus of additional options. Depending on the engine, 
though, “basic” searches can be quite complex. Search refining options 
differ from one search engine to another, but some of the possibilities 
include the ability to search on more than one word, to give more weight
 to one search term than you give to another, and to exclude words that 
might be likely to muddy the results. You might also be able to search 
on proper names, on phrases, and on words that are found within a 
certain proximity to other search terms.
Some search engines also allow you to specify what form you’d like your 
results to appear in, and whether you wish to restrict your search to 
certain fields on the Internet (i.e. Usenet or the Web) or to specific 
parts of web documents (i.e. the title or URL).
Many, but not all, search engines allow you to use so-called Boolean 
operators to refine your search. These are the logical terms AND, OR, 
NOT, and the so-called proximal locators, NEAR and FOLLOWED BY.
Boolean AND means that all the terms you specify must appear in the 
documents, i.e. “heart” AND “attack”. You might use this if you wanted 
to exclude common hits that would be irrelevant to your query.
Boolean OR means that at least one of the terms you specify must appear 
in the documents, i.e. bronchitis, acute OR 
chronic. You might use this if you didn’t want to rule out too much.
Boolean NOT means that at least one of the terms you specify must not 
appear in the documents. You might use this if you anticipated results 
that would be totally off-base, i.e. nirvana AND Buddhism, NOT Cobain.
Not Quite Boolean: + and -
Some search engines use the characters + and - instead of Boolean 
operators to include and exclude terms.
NEAR means that the terms you enter should be within a certain number of words of each other.
FOLLOWED BY means that one term must directly follow the other.
ADJ, for adjacent, serves the same function. A search engine that will 
allow you to search on phrases uses, essentially, the same method (i.e. 
determining adjacency of keywords).
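In terms of an inverted index, the three Boolean operators map directly onto set operations. The index below is a toy with invented document ids:

```python
# Toy inverted index: word -> set of document ids containing it.
index = {
    "heart":   {1, 2, 3},
    "attack":  {1, 4},
    "cardiac": {5},
}

def docs(word):
    """Documents containing the word (empty set if it was never indexed)."""
    return index.get(word, set())

print(docs("heart") & docs("attack"))   # AND: both terms required
print(docs("heart") | docs("cardiac"))  # OR: either term suffices
print(docs("heart") - docs("attack"))   # NOT: exclude the second term
```

NEAR, FOLLOWED BY and phrase queries cannot be answered from document ids alone; they require the index to also record the position of each word within each document.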
Phrases: The ability to query on phrases is very important in a 
search engine. Those that allow it usually require that you enclose the 
phrase in quotation marks, i.e. “space, the final frontier”.
Capitalization: This is essential for searching on proper names 
of people, companies or products. Unfortunately, many words in English 
are used both as proper and common nouns- Bill, bill, Gates, gates, 
Oracle, oracle, Lotus, lotus, Digital, digital- the list is endless.
All the search engines have different methods of refining queries. The 
best way to learn them is to read the help files on the search engine 
sites and practice!
Relevancy Rankings
Most of the search engines return results with confidence or relevancy 
rankings. In other words, they list the hits according to how closely 
they think the results match the query. However, these lists often leave
users shaking their heads in confusion since, to the user, the results 
often seem completely irrelevant.
Why does this happen?
Basically it’s because search engine technology has not yet reached the 
point where humans and computers understand each other well enough to 
communicate clearly.
Most search engines use search term frequency as a primary way of 
determining whether a document is relevant. If you’re researching 
diabetes and the word “diabetes” appears multiple times in a web 
document, it’s reasonable to assume that the document will contain 
useful information. Therefore, a document that repeats the word 
“diabetes” over and over is likely to turn up near the top of your list.
 If your keyword is a common one, or if it has multiple other meanings, 
you could end up with a lot of irrelevant hits. And if your keyword is a
 subject about which you desire information, you don’t need to see it 
repeated over and over—it’s the information about that word that you’re 
interested in, not the word itself.
Some search engines consider both the frequency and the positioning of 
keywords to determine relevancy, reasoning that if the keywords appear 
early in the document, or in the headers, this increases the likelihood 
that the document is on target. For example, one method is to rank hits 
according to how many times your keywords appear and in which fields 
they appear (i.e. in headers, titles or plain text). Another method is 
to determine which documents are most frequently linked to other 
documents on the web. The reasoning here is that if other folks consider
 certain pages important, you should, too.
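A toy scoring function combining the signals just described: term frequency, extra weight for the title, and a small bonus per inbound link. The weights and the sample documents are arbitrary choices for illustration, not any engine’s actual formula.

```python
def score(doc, keyword):
    """Toy relevancy score: frequency + title bonus + link popularity."""
    tf = doc["text"].lower().split().count(keyword)
    title_bonus = 5 if keyword in doc["title"].lower().split() else 0
    return tf + title_bonus + 0.1 * doc["inlinks"]

documents = [
    {"title": "Diabetes basics", "text": "diabetes diet and diabetes care", "inlinks": 40},
    {"title": "General health", "text": "a note that mentions diabetes once", "inlinks": 10},
]
ranked = sorted(documents, key=lambda d: score(d, "diabetes"), reverse=True)
print(ranked[0]["title"])
```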
If you use the advanced query form on AltaVista, you can assign 
relevance weights to your query terms before conducting a search. 
Although this takes some practice, it essentially allows you to have a 
stronger say in what results you will get back.
As far as the user is concerned, relevancy ranking is critical, and 
becomes more so as the sheer volume of information on the web grows. 
Most of us don’t have the time to sift through scores of hits to 
determine which hyperlinks we should actually explore. The more clearly 
relevant the results are, the more we’re likely to value the search 
engine.
Information on Meta Tags
Some search engines are now indexing web documents by the meta tags in 
the documents’ HTML (at the beginning of the document in the 
so-called “head” tag). What this means is that the web page author can 
have some influence over which keywords are used to index the document, 
and even in the description of the document that appears when it comes 
up as a search engine hit.
This is obviously very important if you are trying to draw people to 
your website based on how your site ranks in search engine hit lists.
There is no perfect way to ensure that you’ll receive a high ranking. 
Even if you do get a great ranking, there’s no assurance that you’ll 
keep it for long. For example, at one period a page from the Spider’s 
Apprentice was the number-one-ranked result on AltaVista for the phrase 
“how search engines work”. A few months later, however, it had dropped 
lower in the listings. There is a lot of conflicting information out 
there on meta-tagging. If you’re confused it may be because different 
search engines look at meta tags in different ways. Some rely heavily on
 meta tags, others don’t use them at all. The general opinion seems to 
be that meta tags are less useful than they were a few years ago, 
largely because of the high rate of spamdexing (web authors using false 
and misleading keywords in the meta tags). It seems to be generally 
agreed that the “title” and the “description” meta tags are important to
 write effectively, since several major search engines use them in their
 indices. Use relevant keywords in your title, and vary the titles on 
the different pages that make up your website, in order to target as 
many keywords as possible. As for the “description” meta tag, some 
search engines will use it as their short summary of your URL, so make 
sure your description is one that will entice surfers to your site.
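For instance, an indexer might pull the description and keywords out of a page’s head like this. The sample page is invented; this is only a sketch of the idea using Python’s standard HTML parser.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collect name/content pairs from meta tags in a page's head."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

page = """<html><head>
<title>Heart Health Guide</title>
<meta name="description" content="Plain-language advice on cardiac health.">
<meta name="keywords" content="heart, cardiac, cholesterol">
</head><body>...</body></html>"""

reader = MetaReader()
reader.feed(page)
print(reader.meta["description"])
```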
In the keyword tag, list a few synonyms for keywords, or foreign 
translations of keywords (if you anticipate traffic from foreign 
surfers). Make sure the keywords refer to, or are directly related to, 
the subject or material on the page. Do NOT use false or misleading 
keywords in an attempt to gain a higher ranking for your pages. The 
“keyword” meta tag has been abused by some webmasters. For example, a 
recent ploy has been to put such words as “sex” or “mp3” into keyword meta 
tags, in hopes of luring searchers to one’s website by using popular 
keywords.
The search engines are aware of such deceptive tactics, and have devised
 various methods to circumvent them, so be careful. Use keywords that 
are appropriate to your subject, and make sure they appear in the top 
paragraphs of actual text on your webpage. Many search engine algorithms
 score the words that appear towards the top of your document more 
highly than the words that appear towards the bottom. Words that appear 
in HTML header tags (H1, H2, H3, H4, H5, H6) are also given more weight 
by some search engines. It sometimes helps to give your page a file name
 that makes use of one of your prime keywords, and to include keywords 
in the “alt” image tags.
One thing you should not do is use some other company’s trademarks in 
your meta tags. Some website owners have been sued for trademark 
violations because they’ve used other company names in the meta tags. I 
have, in fact, testified as an expert witness in such cases. You do not 
want the expense of being sued!
Remember that all the major search engines have slightly different 
policies. If you’re designing a website and meta-tagging your documents,
 we recommend that you take the time to check out what the major search 
engines say in their help files about how they each use meta tags. You 
might want to optimize your meta tags for the search engines you believe
 are sending the most traffic to your site.