Search engine

A search engine is a special kind of software system that helps people find information on the Web. When someone wants to look up something, they type a word or question, called a query, into a web browser or a mobile app. The search engine looks through many web pages and other information to give back a list of search results. These results usually include links and short descriptions to help the person find what they need.

A Google search result for the phrase "magic flute opera"

Behind the scenes, a search engine runs on many computers in different places around the world that work together as a distributed computing system. It can answer quickly because it looks up words in an index instead of reading every page at the moment of the query. Special programs called web crawlers keep the index up to date by visiting many web servers and collecting information from them.
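The following minimal Python sketch shows how an index can be spread across many computers. It assumes a document-partitioned design: each machine (shard) indexes only part of the pages, every query is sent to all shards, and their answers are merged. The shard layout, example URLs and function names are hypothetical. Real search engines also add replication, ranking and networking, which are left out here.

```python
from collections import defaultdict

# Hypothetical pages, split across two shards (in a real system, two machines).
SHARD_DOCS = [
    {"a.example/1": "opera tickets and opera news",
     "a.example/2": "flute lessons"},
    {"b.example/1": "the magic flute opera by mozart",
     "b.example/2": "magic tricks"},
]


def build_shard_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs on this shard that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


# Each shard builds its index once, ahead of any query.
SHARD_INDEXES = [build_shard_index(docs) for docs in SHARD_DOCS]


def search_shard(index: dict[str, set[str]], words: list[str]) -> set[str]:
    """Return the URLs on one shard that contain every query word."""
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


def distributed_search(query: str) -> list[str]:
    """Send the query to every shard, then merge the partial results."""
    words = query.lower().split()
    results: set[str] = set()
    for index in SHARD_INDEXES:  # a real system would query shards in parallel
        results |= search_shard(index, words)
    return sorted(results)


print(distributed_search("magic flute"))  # ['b.example/1']
```

Each shard only has to search its own small index. Adding more machines therefore lets the system handle more pages without making any single lookup slower.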

Many search engines have existed since the Web began, but Google Search became the most popular and is still used by most people today. Other search engines include Bing, Yandex, Yahoo!, DuckDuckGo, and Baidu. Because so many people find websites through search engines, many websites try to appear higher in search results through search engine marketing and search engine optimization.

History

Further information: Timeline of web search engines

Pre-1990s

In 1945, Vannevar Bush described a system to help people find information in his article "As We May Think" in The Atlantic Monthly. He called it the memex. The memex was meant to make information easier to find as the amount of stored information kept growing. Bush imagined libraries of notes linked to one another, much like the hyperlinks we use today.

Link analysis later became important for search engines through methods like Hyper Search and PageRank.

1990s: Birth of search engines

Search tools existed before the Web debuted in December 1990. WHOIS allowed user searches as early as 1982, and the Knowbot Information Service began in 1989. The first well-documented search engine that searched files was Archie, which started on September 10, 1990.

Before September 1993, the World Wide Web was indexed entirely by hand. Tim Berners-Lee kept a list of web servers at CERN, but as more web servers appeared, the list could not keep up. The National Center for Supercomputing Applications (NCSA) also listed new servers on its site under the heading "What's New!".

Archie was the first tool for searching content on the Internet. Its name stands for "archive" without the "v". It was created by Alan Emtage, a student at McGill University in Montreal, Quebec, Canada. Archie downloaded the directory listings of files on public anonymous FTP sites and built a searchable database of file names. It did not index the contents of the files themselves.

The rise of Gopher in 1991 led to new search tools like Veronica and Jughead. Like Archie, they searched file names and titles in Gopher systems. Veronica allowed keyword searches of Gopher menu titles. Jughead got menu information from specific Gopher servers.

In the summer of 1993, no search engine existed for the web, but many catalogs were kept by hand. Oscar Nierstrasz at the University of Geneva created W3Catalog, the web's first simple search engine, released on September 2, 1993.

In June 1993, Matthew Gray at MIT created the World Wide Web Wanderer, an early web robot. It was used to measure the size of the web until late 1995. The web's second search engine, Aliweb, appeared in November 1993. It relied on website administrators to provide information about their sites.

JumpStation, created in December 1993 by Jonathon Fletcher, was the first tool to combine crawling, indexing, and searching. Because of limited resources, it only indexed titles and headings from web pages.
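The idea of indexing only titles and headings can be sketched with Python's standard html.parser module. This is an illustration under assumptions, not JumpStation's actual code: the class and function names are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser


class TitleHeadingExtractor(HTMLParser):
    """Collect only the words inside <title> and <h1>..<h6> tags."""

    INDEXED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0               # greater than 0 while inside an indexed tag
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INDEXED_TAGS:  # html.parser reports tag names in lowercase
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.INDEXED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:           # body text outside these tags is skipped
            self.words.extend(data.lower().split())


def index_titles_and_headings(html: str) -> set[str]:
    parser = TitleHeadingExtractor()
    parser.feed(html)
    parser.close()
    return set(parser.words)


page = """<html><head><title>Opera Guide</title></head>
<body><h1>The Magic Flute</h1><p>Body text is not indexed.</p></body></html>"""
print(sorted(index_titles_and_headings(page)))
# ['flute', 'guide', 'magic', 'opera', 'the']
```

Indexing only these few tags keeps the index small, which mattered when computing resources were limited. The cost is that a page cannot be found by words that appear only in its body text.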

One of the first search engines to search the full text of pages was WebCrawler, released in 1994. It let users search for any word on any web page, which has been the standard for all major search engines since. Also in 1994, Lycos was launched and grew to be popular.

The first popular web search engine was Yahoo! Search. Started by Jerry Yang and David Filo in January 1994, it began as a web directory called Yahoo! Directory. In 1995, a search function was added, and it became one of the most popular ways for people to find web pages.

Many search engines appeared after that, competing for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. People could also browse directories instead of searching by keywords.

In 1996, Robin Li developed the RankDex algorithm for ranking search results. It was the first to use hyperlinks to judge website quality, before Google's similar PageRank in 1998. Li later used this technology for the Baidu search engine, launched in China in 2000.

In 1996, Netscape planned to feature one search engine but ended up making deals with five: Yahoo!, Magellan, Lycos, Infoseek, and Excite.

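In 1998, the small search engine company goto.com began selling search terms to advertisers. Google later adopted the idea, which turned search from a struggling business into one of the most profitable businesses on the Internet.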

Many search engine companies grew quickly in the late 1990s but were affected by the dot-com bubble that ended in March 2000.

2000s–present: Post dot-com bubble

Around 2000, Google's search engine became very popular. It used an algorithm called PageRank, created by Google's founders, Sergey Brin and Larry Page. PageRank ranks a web page higher when many other pages link to it, especially when those linking pages are themselves highly ranked.
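The idea behind PageRank can be sketched as a short power iteration in Python. This is a simplified textbook version, not Google's production ranking. The damping factor of 0.85 is the value suggested in Brin and Page's original description. Spreading the rank of pages that have no outgoing links evenly over all pages is one common convention, assumed here. The four-page link graph is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Estimate PageRank scores by repeatedly passing rank along links."""
    # Every page that appears as a source or a target of a link.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from equal scores

    for _ in range(iterations):
        # Each page keeps a small base share, even with no incoming links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links shares its rank with every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Page C comes out on top because three pages link to it. Page D has no incoming links, so it keeps only the base share of (1 - 0.85) / 4.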

Yahoo! first offered search based on Inktomi's technology. It bought Inktomi in 2002 and Overture, which owned AlltheWeb and AltaVista, in 2003. Yahoo! relied on Google's search results until 2004, when it launched its own search engine built on the technology it had acquired.

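Microsoft launched MSN Search in 1998 using search results from Inktomi. In early 1999, it began showing listings from LookSmart mixed with Inktomi results, and for a short time that year it used results from AltaVista instead. In 2004, Microsoft began using its own search technology, powered by its own web crawler, msnbot.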

Microsoft launched its rebranded search engine, Bing, on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft agreed that Yahoo! Search would use Microsoft's Bing technology.

As of 2019, active search engine crawlers include those of Baidu, Bing, Brave, Google, DuckDuckGo, Gigablast, Mojeek, Sogou and Yandex.

Timeline
Year	Engine	Current status
1993	W3Catalog	Inactive
	ALIWEB	Inactive
	JumpStation	Inactive
	WWW Worm	Inactive
1994	WebCrawler	Active
	Go.com	Inactive, redirects to Disney
	Lycos	Active
	Infoseek	Inactive, redirects to Disney
1995	Yahoo! Search	Active, initially a search function for Yahoo! Directory
	Daum	Active
	Search.ch	Active
	Magellan	Inactive
	Excite	Active
	MetaCrawler	Active
	AltaVista	Inactive, acquired by Yahoo! in 2003, since 2013 redirects to Yahoo!
	SAPO	Active
1996	RankDex	Inactive, incorporated into Baidu in 2000
	Dogpile	Active
	HotBot	Inactive (used Inktomi search technology)
	Ask Jeeves	Inactive
1997	AOL NetFind	Active (rebranded AOL Search since 1999)
	goo.ne.jp	Active
	Northern Light	Inactive
	Yandex	Active
1998	Google	Active
	Ixquick	Active as Startpage.com
	MSN Search	Active as Bing
	empas	Inactive (merged with NATE)
1999	AlltheWeb	Inactive (URL redirected to Yahoo!)
	GenieKnows	Inactive, rebranded Yellowee (was redirecting to justlocalbusiness.com)
	Naver	Active
	Teoma	Inactive (redirect to Ask.com)
2000	Baidu	Active
	Exalead	Inactive
	Gigablast	Inactive
2001	Kartoo	Inactive
2003	Info.com	Active
2004	A9.com	Inactive
	Clusty	Active as Yippy (formerly Clusty), which now owns Togoda.com
	Mojeek	Active
	Sogou	Active
2005	SearchMe	Inactive
	KidzSearch	Active, powered by Google Search
2006	Soso	Inactive, merged with Sogou
	Quaero	Inactive
	Search.com	Active
	ChaCha	Inactive
	Ask.com	Inactive
	Live Search	Active as Bing (rebranded from MSN Search)
2007	wikiseek	Inactive
	Sproose	Inactive
	Wikia Search	Inactive
	Blackle.com	Active, powered by Google Search
2008	Powerset	Inactive (redirects to Bing)
	Picollator	Inactive
	Viewzi	Inactive
	LeapFish	Inactive
	Forestle	Inactive (redirects to Ecosia)
	DuckDuckGo	Active
	TinEye	Active
2009	Bing	Active (rebranded from Live Search)
	Yebol	Inactive
	Scout (Goby)	Active
	NATE	Active
	Ecosia	Active
	Startpage.com	Active, sister engine of Ixquick
2010	Blekko	Inactive, sold to IBM
	Cuil	Inactive
	Yandex (English)	Active
	Parsijoo	Active
2011	YaCy	Active, P2P
2012	Volunia	Inactive
2013	Qwant	Active
2014	Egerin	Active, Kurdish / Sorani
	Swisscows	Active
	Searx	Active
2015	Yooz	Inactive
	Cliqz	Inactive
2016	Kiddle	Active, powered by Google Search
2017	Presearch	Active
2018	Kagi	Active
2020	Petal	Active
2021	Brave Search	Active
	You.com	Active
2022	Perplexity	Active

Approach

A search engine carries out three main processes, almost in real time:

Web crawling
Indexing
Searching

Search engines collect information by moving from website to website and checking each one (web crawling). They store important words and details from these pages in a large database called an index (indexing). When you type a query into a search engine, it uses this index to find the best matches quickly (searching).
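The indexing step can be sketched as an inverted index, a common data structure that maps each word to the pages that contain it. The Python below is a minimal sketch under assumptions. The page URLs and texts are made up, and words are split with a simple regular expression rather than a real language-aware tokenizer (letters outside a to z are dropped).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


# Hypothetical pages that a crawler might have fetched.
pages = {
    "https://example.com/mozart": "Mozart wrote The Magic Flute, an opera.",
    "https://example.com/flutes": "How to choose a flute.",
    "https://example.com/tickets": "Opera tickets for this season.",
}

index = build_index(pages)
print(sorted(index["flute"]))
# ['https://example.com/flutes', 'https://example.com/mozart']
```

Finding all pages that contain a word now takes a single dictionary lookup. This is why answering a query does not require reading every page again.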

When you search, you usually type just a few words. The search engine looks up in its index which pages contain those words and shows them to you. You can also refine your search, for example by adding more words, to get better results. The goal is to show the most helpful pages first, and many search engines also show ads.
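Here is one way a query of a few words could be answered from an index. This sketch also uses made-up pages. It keeps only the pages that contain every query word and ranks them by how many times the query words appear. That ranking rule is an assumption chosen for simplicity; real search engines combine many more signals.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index with counts: word -> {url: times the word appears}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return dict(index)


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """Return URLs containing every query word, most matches first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only pages that appear in the posting list of every word.
    matches = set(postings[0]).intersection(*postings[1:])
    # Rank by total occurrences of the query words; break ties by URL.
    return sorted(matches, key=lambda url: (-sum(p[url] for p in postings), url))


# Hypothetical pages.
pages = {
    "example.com/opera-guide": "Opera, opera guide: The Magic Flute and other operas",
    "example.com/mozart": "Mozart composed The Magic Flute opera",
    "example.com/flute-shop": "Buy a flute. Flute lessons available.",
}
index = build_index(pages)
print(search(index, "magic flute opera"))
# ['example.com/opera-guide', 'example.com/mozart']
print(search(index, "flute lessons"))
# ['example.com/flute-shop']
```

Adding a word to the query, as in `flute lessons`, narrows the results. This is one simple way of refining a search.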

Market share

As of January 2022, Google was the most used search engine in the world. Other popular search engines included Bing, Yandex, and Yahoo!. Many other search engines exist but are used by fewer people.

In Russia, Yandex is the leading search engine. In China, Baidu is the main search engine, and Google does not operate there. In Japan, Google is the most used, and Yahoo! Japan is also popular. In South Korea, Naver leads, but Google's use has grown. In Taiwan, Google is the most used search engine.

Search engine bias

Further information: Algorithmic bias

Search engines try to show the best and most popular websites when you search. But sometimes, they show information that is not fair or balanced. This can happen for different reasons.

For example, companies that pay to advertise might appear more often in the search results. Also, some countries have laws that make certain information illegal. Because of this, search engines might not show those websites in those places.

Sometimes, the way search engines are set up can leave out less popular ideas or focus more on websites from certain countries, like the United States. People have also tried to change search results for their own purposes, such as to influence what others think about important topics. Researchers have studied how search engines affect our understanding of subjects like terrorism in Ireland, climate change denial, and conspiracy theories.

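Google bombing is one way people have tried to change what shows up in search results for political, social, or business reasons. It works by creating many links that use the same text, so that a page ranks highly for that phrase even if the page itself does not contain it.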

Customized results and filter bubbles

Some people worry that search engines like Google and Bing change what you see based on what you do online. This can make it feel like you only see things that match what you already think. In 2011, the activist Eli Pariser popularized this idea under the name "filter bubble".

Some search engines, such as DuckDuckGo, advertise that they do not change what you see based on your past searches. However, some researchers say there is no strong evidence that filter bubbles are a big problem. They found that most people still see many different ideas when they search online.

Religious search engines

Because the Internet has grown a lot in the Arab and Muslim world, some people made special search engines for them. These search engines help users find information that follows Islamic rules, called "halal", and avoid information that does not, called "haram". Examples include ImHalal, which started in 2011, and Halalgoogling, which began in 2013. These search engines use filters to keep out unwanted content.

Other religious search engines exist too, like Jewogle for Jewish users and SeekFind.org for Christian users. These also filter out websites that go against their beliefs.

Search engine submission

When someone creates a website and wants people to find it easily, they can tell a search engine about it. This is called submitting a website. But usually, you don't need to do this because search engines have special programs called web crawlers that find websites on their own.

You can tell a search engine about just one page, such as the home page, or about your whole website by using a file called a sitemap. There are two main reasons to submit a website: it is brand new and has not been found yet, or it has changed a lot and you want the changes to show up in search results faster. Some submission tools send a website to many search engines at once and also add links to it from their own pages. This is not always a good idea. Links from other sites usually help a page rank higher, but a large number of unnatural links can lower a site's ranking instead.
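A sitemap is usually an XML file that follows the sitemaps.org protocol. The Python sketch below builds one with the standard xml.etree.ElementTree module. The page URLs and dates are hypothetical, and only the `loc` and `lastmod` fields are shown.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Return sitemap XML for (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # full address of the page
        ET.SubElement(url, "lastmod").text = lastmod  # date in YYYY-MM-DD form
    # ElementTree escapes characters such as "&" in URLs automatically.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of a small website.
print(build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about", "2023-11-02"),
]))
```

The finished file is typically placed on the website, for example at `/sitemap.xml`. It can then be announced with a `Sitemap:` line in the site's robots.txt file or submitted through a search engine's webmaster tools.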

Comparison to social bookmarking

Technology

Archie

The first search engine on the Internet was Archie. It was created in 1990 by Alan Emtage, a student at McGill University in Montreal. Archie came before the Web, so it searched files rather than web pages.

Archie worked by collecting lists of the files stored on File Transfer Protocol (FTP) sites. FTP is a way for computers to share files over the Internet, and users could visit FTP sites to download files. Archie combined these lists into one searchable list, so users did not need to know which site held a file.
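Archie's basic idea was a single searchable list built from many FTP directory listings. The Python sketch below illustrates it. The host names and file paths are made up. Matching names with shell-style wildcards through the standard fnmatch module is an assumption for illustration and is not necessarily how Archie matched names.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listings collected from two anonymous FTP sites.
LISTINGS = {
    "ftp.example.edu": ["/pub/gnu/emacs-18.55.tar.Z", "/pub/docs/rfc959.txt"],
    "ftp.example.org": ["/pub/games/xtrek.tar.Z", "/mirrors/rfc/rfc959.txt"],
}


def build_database(listings: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Flatten all listings into (file name, host, full path) records."""
    database = []
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # file name without its directory
            database.append((name, host, path))
    return database


def find_files(database: list[tuple[str, str, str]], pattern: str) -> list[tuple[str, str]]:
    """Return (host, path) for every file whose name matches the wildcard pattern."""
    return sorted((host, path) for name, host, path in database
                  if fnmatchcase(name, pattern))


database = build_database(LISTINGS)
print(find_files(database, "rfc959*"))
# [('ftp.example.edu', '/pub/docs/rfc959.txt'), ('ftp.example.org', '/mirrors/rfc/rfc959.txt')]
```

The user only needs to know part of a file's name. The combined list shows every site that holds a copy.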

Veronica

In 1993, the University of Nevada System Computing Services group created Veronica. Veronica searched files stored on Gopher, another system for sharing information online, in the same way that Archie searched FTP files. A similar tool called Jughead appeared around the same time.

The Lone Wanderer

The World Wide Web Wanderer was created in 1993 by Matthew Gray. It was the first robot to move around the web and count how many websites there were. It also wrote down the addresses of websites, creating the first web database called the Wandex.
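A robot like the Wanderer can be sketched as a breadth-first walk over links that records every address it reaches and counts the distinct host names. To keep the Python sketch runnable without network access, a small made-up link graph stands in for the web. A real crawler would download each page and extract its links.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph: page URL -> URLs of the pages it links to.
LINKS = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/index.html"],
    "http://c.example/index.html": [],
}


def wander(start: str, links: dict[str, list[str]], max_pages: int = 1000):
    """Breadth-first walk from `start`; return (visited URLs, distinct hosts)."""
    seen = {start}
    queue = deque([start])
    visited: list[str] = []  # the list of addresses, like a tiny Wandex
    hosts: set[str] = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        hosts.add(urlsplit(url).netloc)
        # A real robot would download the page here and extract its links.
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, hosts


visited, hosts = wander("http://a.example/", LINKS)
print(len(visited), "pages on", len(hosts), "sites")  # 4 pages on 3 sites
```

The `seen` set keeps the robot from visiting a page twice when pages link back to each other, as `a.example/about` does here.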

Excite

Excite started in 1993 as a project by six students at Stanford University. They wanted to make searching the Internet easier by using statistics about how words are used together to find related pages. Excite became a popular search engine in 1995.
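Excite's exact method is not described here. As a rough illustration of learning from how words are used together, the Python sketch below counts how often two words appear in the same document and uses those counts to suggest related words. The documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents: list[str]) -> Counter:
    """Count, for each pair of words, how many documents contain both."""
    pair_counts: Counter = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))  # each word once per document
        for pair in combinations(words, 2):       # pairs come out alphabetically ordered
            pair_counts[pair] += 1
    return pair_counts


def related_words(pair_counts: Counter, word: str, top: int = 3) -> list[str]:
    """Words that most often share a document with `word`."""
    scores: Counter = Counter()
    for (a, b), n in pair_counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [w for w, _ in scores.most_common(top)]


# Hypothetical documents.
docs = [
    "magic flute opera mozart",
    "mozart opera vienna",
    "flute lessons for beginners",
    "opera tickets",
]
counts = cooccurrence(docs)
print(related_words(counts, "opera", top=1))  # ['mozart']
```

A search engine could use such counts to also return pages about "mozart" for the query "opera", even if those pages never use the query word often.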

Yahoo!

In 1994, David Filo and Jerry Yang, two students at Stanford University, created Yahoo!. It began as a list of web pages they liked. As more people used it, they sorted the pages into categories so that the list was easier to browse. Yahoo! was not a typical search engine at first because its directory was built by hand, but it later added search features.

Lycos

Lycos was made in 1994 by Michael Mauldin at Carnegie Mellon University.

Types of web search engines

Web search engines help people find information online. They do this by carrying out three main tasks:

Look for content that matches the words a person searches for.
Keep a list of where that content can be found.
Let users search through that list.

There are three main types of search engines. Some use robots, called crawlers, to go around the web and collect information. Others depend on people to add information. Some use both ways.

Crawler-based search engines send out robots to visit websites, read their content, and follow links to other sites. These robots bring back information to a central place where it is organized and stored. They visit websites often to see what’s new.
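Revisiting can be sketched as a schedule in which each page has its own revisit interval and pages wait in a priority queue ordered by their next due time. In this Python sketch the intervals are assumed to be known in advance, for example hourly for a news page. Real crawlers have to estimate how often each page changes. The URLs are made up.

```python
import heapq


def schedule_revisits(intervals: dict[str, float], horizon: float) -> list[tuple[float, str]]:
    """List (time, url) visits up to `horizon` hours, revisiting each page every interval."""
    if any(interval <= 0 for interval in intervals.values()):
        raise ValueError("revisit intervals must be positive")
    # Priority queue ordered by the time each page is next due for a visit.
    heap = [(interval, url) for url, interval in intervals.items()]
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= horizon:
        due, url = heapq.heappop(heap)
        visits.append((due, url))
        heapq.heappush(heap, (due + intervals[url], url))  # schedule the next visit
    return visits


# Hypothetical revisit intervals in hours.
intervals = {"news.example": 1.0, "blog.example": 3.0, "archive.example": 24.0}
for time, url in schedule_revisits(intervals, horizon=4.0):
    print(f"hour {time:g}: visit {url}")
```

Within the first four hours, the news page is visited four times and the blog once. The archive page is not due yet, so the crawler spends its effort where content changes most.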

Human-powered search engines rely on people to add information.

When you search using a search engine, you are searching through its index, not the live web. This is why a search result sometimes links to a page that no longer works: the index has not been updated since the page changed or was removed.

Different search engines can give different results because they use different ways to decide which results are most helpful. For example, they may consider how often the search words appear on a web page and how many other pages link to it.
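The Python sketch below shows how two such signals might be combined into a single score, and why the weighting matters. The formula (number of query-word matches plus a weighted logarithm of the number of inbound links) and the example pages are assumptions made for illustration. It is not the formula of any real search engine.

```python
import math
import re


def words_in(text: str) -> list[str]:
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text: str, query: str, inbound_links: int, link_weight: float) -> float:
    """Hypothetical score: query-word count plus weighted log of inbound links."""
    page_words = words_in(text)
    matches = sum(page_words.count(word) for word in words_in(query))
    if matches == 0:
        return 0.0  # pages without the query words are not shown at all
    # log1p dampens link counts so that 100 links is not worth 100 times 1 link.
    return matches + link_weight * math.log1p(inbound_links)


def rank(pages: dict[str, tuple[str, int]], query: str, link_weight: float) -> list[str]:
    """Names of matching pages, highest score first."""
    scored = {name: score(text, query, links, link_weight)
              for name, (text, links) in pages.items()}
    return sorted((n for n in scored if scored[n] > 0), key=lambda n: -scored[n])


# Hypothetical pages: name -> (page text, number of pages linking to it).
pages = {
    "popular": ("flute flute", 100),
    "repetitive": ("flute flute flute flute flute", 0),
    "plain": ("flute flute", 0),
}
print(rank(pages, "flute", link_weight=1.0))  # ['popular', 'repetitive', 'plain']
print(rank(pages, "flute", link_weight=0.5))  # ['repetitive', 'popular', 'plain']
```

With `link_weight=1.0` the widely linked page comes first. With `link_weight=0.5` the page that simply repeats the word wins. Two engines that weight links differently would therefore rank the same pages differently.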

Modern search engines are very complex and use many computers to handle the huge amount of information on the web. Some search engines, like Google Scholar, focus on finding scientific research. Researchers are working to make search engines better at understanding the meaning behind words in articles.

Type	Example	Description
Conventional	Library catalog	Search by keyword, title, author, etc.
Text-based	Google, Bing, Yahoo!	Search by keywords. Limited search using queries in natural language.
Voice-based	Google, Bing, Yahoo!	Search by keywords. Limited search using queries in natural language.
Multimedia search	QBIC, WebSeek, SaFe	Search by visual appearance (shapes, colors, etc.)
Q/A	Stack Exchange, NSIR	Search in (restricted) natural language
Clustering Systems	Vivisimo, Clusty, Togoda
Research Systems	Lemur, Nutch