Transcript
of the chat with Matt Cutts and Stephanie Kerebel, Google
Google
needs no introduction. This search engine
(whose name derives from the term "googol",
meaning "10 to the power of 100") was
created in 1998 by two Stanford University
students (like Yahoo!), Larry
Page and Sergey Brin. It grew out of a university
research project. In June 1999, the two co-founders
raised $25 million in capital and properly
launched the tool, which today is based
in Mountain View, California. It is one of the
big winners of 2001, playing a
major role in information search
on the web worldwide.
Matt Cutts, Software Engineer, and Stephanie Kerebel,
Globalization Specialist at Google, answered
your questions on Tuesday, February 26, 2002.
Matt Cutts joined Google as a "Software
Engineer" in January 2000. Before that, he was working
on his "Ph.D. in computer graphics" at the University
of North Carolina at Chapel Hill. He has published
papers on the "office of the future" as well as on
object recognition in compressed images.
Matt wrote "Safe Search", the family
filter used by Google. He also works
for the US Department of Defense
and has also studied projects
for a video game company.
To learn more about his background: http://www.cs.unc.edu/~cutts/.
Stephanie Kerebel, Globalization Specialist
at Google, is responsible for the general supervision
of translation and international support at
Google. She also handles sales
and "business" presentations, as well as the advertising
and promotional side of maintaining the image
of the "Google" brand outside the United States.
The chat with Matt Cutts and Stephanie Kerebel
took place (exceptionally, in English)
on Tuesday, February 26, 2002, at 6:00 p.m., in
partnership with Canalchat.
Here is the summary of the online
conversation:
Good evening everybody,
we are very happy to welcome Stéphanie Kerebel
and Matt Cutts to the chat!
Hi there, good morning... Bonjour ! Greetings from Google!
Jules Vo-Dinh : What are
the next developments we will see from Google?
That's a good question! I think that we'll see more
freshness, depth, and file types. Over the next year,
Google is going to be developing technology for searching
new file types.
espotting : Will Google sell
AdWords and sponsored links directly, or through deals
with Overture- or DoubleClick-type companies?
Another good question. We have done deals of both these
types in the past. I think that we're open to either
kind of deal.
Jean-Delatour : Is it still
necessary to fill in the meta-keywords?
Let's see. Google uses meta-keywords, but not as much as most
other search engines. I would still include them, but
don't worry about putting a ton of effort into it.
Charles : What can I do
to improve my PageRank?
The best way to improve your PageRank is to make sure
that the people who should know about you really do know
about you; make sure that you look for links in the
Open Directory Project, for example.
nanbowles : When I search
on google.fr, am I tapping into the same database as my
colleague in the US typing the same search term into
google.com?
We store all the data in one master index. Then we build
a restrict of French language pages and pages in France.
Note that we don't just consider .fr pages. Instead,
we also look for .com pages which are based in France.
That's one reason why our search on google.fr can cover
more pages.
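The "one master index, then a restrict" idea described above can be sketched roughly as follows. This is only a toy illustration, not Google's actual data model: the `Page` fields, the `build_restrict` and `search` helpers, and the sample URLs are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Page:
    url: str
    language: str  # assumed per-page language classification
    country: str   # assumed per-page country classification
    text: str

# Toy stand-in for the single master index (hypothetical data).
MASTER_INDEX = [
    Page("http://example.fr/ski", "fr", "FR", "stations de ski"),
    # A .com page that is based in France still belongs in the restrict.
    Page("http://example.com/paris-ski", "en", "FR", "ski shop based in Paris"),
    Page("http://example.com/ski", "en", "US", "ski resorts in Colorado"),
]

def build_restrict(index, language, country):
    """Keep pages in the given language OR located in the given country."""
    return [p for p in index if p.language == language or p.country == country]

def search(pages, term):
    """Naive whole-word match over the given subset of pages."""
    term = term.lower()
    return [p.url for p in pages if term in p.text.lower().split()]

french_restrict = build_restrict(MASTER_INDEX, language="fr", country="FR")
print(search(MASTER_INDEX, "ski"))     # searches the whole master index
print(search(french_restrict, "ski"))  # searches only the restrict
```

Note that the restrict is selected by classification, not by top-level domain, which is why the .com page based in France is included.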
Olivier : When will you
have headquarters in Paris? I heard that the French
manager is already known. Is it true?
We're just a software engineer and an internationalization
expert. We don't know everything. I do know that France
is a very high priority for Google. And I think that
France is slated to be the next office that we open
up. I don't think that we can commit to a firm date,
but I would definitely say within the next few months.
France is an important market, and we can't wait to
open an office there!
agedia.com : Do you use
HTML comments to index web sites?
I believe that we have the ability to index them, but
we usually don't index comments.
novice : Do you plan a
weblog category (http://blog.google.com) or tab sometime
in the near future?
What a great question. I think Google is moving more
into "freshness" recently. We've started to crawl millions
of pages each day, and our users really like that. I
don't know whether we'll specifically do something special
for weblogs. We'll have to see how many users ask for
features like that; we love reading weblogs, so it might
make sense to index them someday.
Yarnus : Does Google read
"noembed" and "noframes" tags ?
I'm not sure about noembed tags. I know that we do read
content in the noframes section. We index different
text with different weights though.
Thy : Can Google index urls
with a session ID like site.php?sessionID=qlkhfQFfhjelzhfklj45681qfd
?
Google can index anything :) No, seriously, we have
the ability to index pages like this, but I think that
most of the time we try to avoid pages with session
IDs. The reason is pretty simple, actually. You could
have one single page, but it might look like 10 different
pages by the time we see it with many different session
IDs. If you are a webmaster, here's a simple rule of
thumb. Try to make each page look like a static URL.
If each page looks like a simple URL, then Google is more
likely to index it.
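To illustrate why session IDs make one page look like many, here is a minimal sketch that strips session-like parameters so that every variant collapses to one static-looking URL. The parameter names in `SESSION_PARAMS` and the `strip_session_id` helper are assumptions for the example, not anything Google has described.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameter names treated as session identifiers.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def strip_session_id(url: str) -> str:
    """Return the URL with session-like query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# The same page seen with two different session IDs...
urls = [
    "http://example.com/site.php?sessionID=abc123",
    "http://example.com/site.php?sessionID=xyz789",
]
# ...collapses to a single URL once the session parameter is dropped.
canonical = {strip_session_id(u) for u in urls}
print(canonical)
```

A webmaster would apply the same idea on the server side, for example by keeping the session in a cookie instead of the URL.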
Sté [absoluNet.net]
: I manage a small French web agency, and I'd like you to
tell us more about the link popularity used in Google.
I wonder about newer sites that are good and interesting
but don't have links yet: if Google doesn't show them for
a relevant query, will they never be included in the index,
never become known, and so never get links to them?
In general, Google will find a site as soon as it finds
1-2 links to the main web page. We usually recommend
getting at least one link from the Open Directory Project
at dmoz.org. You might also try to find a directory
of sites like yours and try to get a link there. The
nice thing about Google is that it doesn't take a lot of
links to a site for us to find you. I know that my home
page only has 4-5 links to it. :-)
A.Woumblat : What are the
reasons for a url to disappear from the Google database
?
Most of the time, it's because we only have a certain
amount of space to store pages in our index. I wish
that we could catch every page on the web for every
index, but sometimes we have to choose. Usually, when
a site drops out of the index, it's something simple
such as the site was down when we tried to crawl it,
or we ended up choosing a slightly different set of
sites for the next crawl.
allergic : Do you plan
to increase the number of words Google can handle
in a query? For now it is 10 max, but in some cases that is
too low!
That's a good question. I'll pass that suggestion on.
We were worried that people might type reeeeeeaaaaaallllly
loooooong queries. But maybe we can find a way to increase
that a little bit. :) I'll let you know if we can do
that. Don't look for it anytime really soon though.
Thy : Is Google able to
read and understand XML databases ?
We have the ability to parse XML documents, but we haven't
decided exactly how we're going to expose that to the
outside world. As you know, Google currently doesn't
charge for submission or for inclusion, and we like
that policy.
Gaetan : What part does
popularity play in your ranking?
Let me take this question as a quick excuse to explain
a little bit about PageRank. Suppose that I have 10
links to my site, Gaetan. Now suppose that you have
4-5 links to your site. But suppose one of the links to
your page is from Le Monde and the link says "Gaetan rules!". Then
that link would get more weight than if I just had my
friends link to my home page. In effect, we look at
the number of links that point to your page, which is
like the popularity, but we also look at the quality
of the links. So PageRank finds out that Le Monde and
similar sites are higher quality than just my home page.
:)
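The "number of links plus quality of links" idea can be sketched with the textbook, simplified PageRank iteration. This is not Google's production ranking: the damping factor of 0.85, the toy link graph, and the page names are assumptions for illustration, and link text such as "Gaetan rules!" is not modeled at all.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Toy graph: ten friends link to "matt"; twenty readers link to "lemonde",
# and "lemonde" has a single link to "gaetan".
links = {f"friend{i}": ["matt"] for i in range(10)}
links.update({f"reader{i}": ["lemonde"] for i in range(20)})
links["lemonde"] = ["gaetan"]

rank = pagerank(links)
print(rank["gaetan"] > rank["matt"])
```

In this toy graph "gaetan" has one inbound link and "matt" has ten, yet "gaetan" ends up with the higher score because its single link comes from a page that is itself heavily linked.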
waw : For the past few weeks, refreshed
pages have stayed only 3 days in the database. After 3 days,
Google shows the old title again. Is it possible to know
why?
Basically, we do a full crawl of the web at least once
per month. Those pages are guaranteed to stay in our
listings for several weeks. We also do other crawls
for high-quality pages and news pages. We get millions
of those pages each day. However, we don't always crawl
the same set of pages for those sites. We would probably
pick those pages up in our next crawl, though.
Olivier : Do you plan paid
submission to your index, like Fast, AltaVista and Inktomi?
Hee hee. Right now, we like our "don't charge for anything"
plan. Also, that's been very popular with webmasters.
The fact that we don't charge for inclusion/submission/crawling
means that we can search out the pages that we think
are the best. That is what helps to make Google a really
good search engine. So: I hope not anytime soon! :)
Thom : How many people are
working for Google ? And where ?
Good question. I think that we have just a little over
300 people working for Google right now. Most of the
engineers sit all in one building in Mountain View,
California (that's right in the heart of Silicon Valley).
However, we also have many offices for sales all over
the world. Besides sites in the US, we have offices
in the UK, Germany, and Japan. And we can't wait to
be in France! By the way, if you are a fantastic salesperson
or a head of sales at some other search engine company,
feel free to send us your resume. :) We're looking for
good people in many places in Europe and around the
world.
nanbowles : But, how do
you make money?
I love these questions! We make money in three ways.
1. Straight advertising. Because we have a very pure,
focused, relevant search, plenty of people are willing
to pay to advertise next to our search results. We have
self-service advertising programs that people can set
up in 15 minutes any time of day.
2. We sell search services to other companies, such as
Yahoo, Cisco, Netscape, RedHat, etc.
3. We just introduced an "enterprise" box. This is like
Google in a box--you just plug it in, and it can index
everything inside your company's intranet. It works fine
inside a firewall--it doesn't have to talk to Google.
So we're building up several ways to make money. It's
nice to be profitable. (Sorry to go on for so long :).
Louis : How many indexes
do you use ? Do you plan to use an index in Europe ?
We have one master index that we serve everything out
of. We classify pages as belonging to different countries
and languages, and then we can search over that smaller
portion of the master index.
Laurent : If my site is
listed in the Open Directory, will it be more rapidly
included in Google ? Same question with Yahoo! and Nomade.fr
in France ?
I think once we see 3-4 links to your site, you should
be in pretty good shape. Getting high quality links
is always your best bet to get noticed by Google.
junkidu : What is the click-through
percentage on the ads next to your search results?
It's 4-5 times the industry average. I've seen articles
where people got 10% clickthrough by using very specific
phrases in AdWords. But people should expect maybe 2%
or something if they are careful. Here's a quick AdWords
tip: start with very specific keywords to get a high
clickthrough rate and then slowly make the words less
specific to get more traffic with a slightly lower clickthrough.
nanbowles : How many servers
are in Google's cluster?
We lose count. :) It's over 10,000 at this point. We're
starting to get a little scared because we think that
one computer has learned how to order other computers.
;)
alain : Will you sell your
AdWords to French portals (as you do for Earthlink in the US)
when you are in France?
I certainly think that would make a lot of sense. AdWords
is self-service, and the keywords can target French
words already. We hope to translate our interface into
other languages when we can.
waw : There is a big difference
in results between "snowboard" and "snowboards": the first
one gives results only from general websites, the second
one only from company websites, and I think companies
try to appear in the results for the first one. Are keywords
only ASCII code, or is Google able to understand the
meaning of a keyword?
Right now, Google is a lot like the index in a book.
You look up one specific word, but Google doesn't always
know to suggest an alternative. I think that over
time, Google will "understand" language more and more.
Already, we've introduced spell-checking in several
languages that can suggest other words if we think that
there was a typo. It makes sense that we'd like to know
more about what people want over time, instead of just
looking up keywords. :) .
Sté [absluNet.net]
: I wonder why, when we come to www.google.fr, the default
search covers the whole web rather than France or French-language pages?
That's a good question. I think that we don't want to
restrict the search unless people ask for it. I'll pass
that suggestion on to our UI people though, to get their
reaction.
eurienta.com : How does
Google handle words with accents like Noël?
We've introduced features to handle lots of accents.
I think that we have the ability to do accent-sensitive
search without any problems.
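The difference between accent-sensitive and accent-insensitive matching can be sketched with Python's standard `unicodedata` module. This is an illustration only and says nothing about how Google implements it; the `strip_accents` and `matches` helpers are hypothetical names introduced for the example.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, word: str, accent_sensitive: bool = True) -> bool:
    """Compare two words case-insensitively, with or without accents."""
    # Normalize first so that composed and decomposed forms compare equal.
    q = unicodedata.normalize("NFC", query).lower()
    w = unicodedata.normalize("NFC", word).lower()
    if not accent_sensitive:
        q, w = strip_accents(q), strip_accents(w)
    return q == w

print(matches("Noel", "Noël"))                          # accent-sensitive
print(matches("Noel", "Noël", accent_sensitive=False))  # accent-insensitive
```

An accent-sensitive search treats "Noel" and "Noël" as different words, while the insensitive variant folds them together.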
cmic : When are you going
to have your IPO?
Hee Hee. We'll cross that bridge when we come to it.
I think we want to make sure that all our ducks are
in a row.
Olivier : What is the difference
between www.google.com, www2.google.com, www3.google.com,
etc. ? Different indexes ?
That's a good question. These servers are for internal
test purposes. They are visible to the world so that
partners can check out features, etc. We don't make
any guarantees about them--personally, I wouldn't use
them myself.
agedia.com : Does Google
index HTML-encoded characters (é ..)?
I believe that we do, but I'm not 100% positive. We
do handle escaping like this for most cases. It's hard
to write a search engine that handles all languages
and character encodings and accents perfectly, especially
for Chinese/Japanese/Korean, but we're trying.
zouriteman : When we submit
a site to you, why don't you send automatic feedback
to our e-mail when your search engine inspects the site?
We would probably be accused of sending too much email
spam. :) Seriously, Google can crawl millions of pages
each day, so we'd hate to flood webmasters with too
much mail. The one thing that I'd recommend is to use
a web host where you can get access to server logs.
That way you can just check your server logs to see
when Googlebot visited. We always use a clear user agent
so you can tell when Google dropped by. :)
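Checking server logs for Googlebot visits, as suggested above, might look like the following sketch. It assumes the common "combined" access log format; the two sample log lines, their IP addresses, and the exact user agent string are made up for illustration, and `googlebot_visits` is a hypothetical helper.

```python
import re

# Made-up sample lines in combined log format (illustration only).
SAMPLE_LOG = """\
192.0.2.10 - - [26/Feb/2002:18:00:01 +0100] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.55 - - [26/Feb/2002:18:02:13 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
"""

# Captures the timestamp, the requested path, and the user agent.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_visits(log_text: str):
    """Return (timestamp, path) pairs for requests whose agent names Googlebot."""
    visits = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            visits.append((match.group("time"), match.group("path")))
    return visits

for when, path in googlebot_visits(SAMPLE_LOG):
    print(when, path)
```

On a real server you would read the access log file instead of `SAMPLE_LOG`; the log path and format depend on the web host.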
Paul : Is the "comments"
line on the submission form (http://www.google.com/addurl.html)
useful on Google ?
It doesn't hurt to use it. The bots don't look at it,
but if a human is looking at the submit log, then it
can clarify things.
Thank you very much, Matt
Cutts and Stéphanie Kerebel. A word to conclude
the chat?
Thank you for your interest in Google!
We love our French users. (That
was Stephanie, in French). :) I was in France in December--I love
your country! Thanks very much! We have to head out,
but thanks very much for letting us be here, and keep
making google.fr the #1 search engine! :) Take care.
And many thanks to Matt Cutts and Stephanie
Kerebel for answering the "chatters'" questions
live!