Search Engine Optimize (SEO) an AJAX or Web 2.0 Site

One of the three major pillars of Search Engine Optimization is a website’s content and its onsite optimization. All of the major search engine ranking algorithms have components that relate to the content on the website. Typically these components relate to keyword density, word count, content location, and sometimes the age of the content. The code that contains the content falls under the topic of structure, not content, and will not be discussed in this article.

Asynchronous JavaScript and XML (AJAX) is a web development method that can be used to create more responsive and interactive dynamic websites. AJAX accomplishes this by making requests back to the web server without refreshing the browser; the responses are then processed and typically used to update the content of the page currently being viewed. For the sake of this article I’m going to ignore the XML component of AJAX, as the search engines never view any of the XML data. Websites that use JavaScript to manipulate content without using AJAX will also suffer from the issues described.

When a search engine sends out a robot or spider to index your website, it only looks at what is presented in the markup. Generally a search engine does not behave like a user when indexing your website: it doesn’t click buttons or links, it simply makes note of the URLs associated with each page and then visits those pages individually to index them. This largely goes against the goal of AJAX, which is to have as few pages as possible by interacting with the web server in a smarter way as users interact with the website.

To put the last paragraph simply, any content that is inserted or changed via AJAX or JavaScript, rather than hardcoded in the page, won’t be indexed by the search engines. This means that if you have great content that the search engines would love but you deliver it through AJAX, you may be missing out on traffic. There are two approaches to rectifying this, which may even give you an advantage over sites that don’t use JavaScript / AJAX.
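To make the problem concrete, here is a minimal Python sketch of what a client that does not run JavaScript would read from a page. The sample page, the `StaticTextExtractor` class and the `static_text` helper are all made up for illustration; real crawlers are far more involved, so treat this only as a rough approximation.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects the text present in the raw markup, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the heading is hardcoded, the review is injected by script.
page = """
<html><body>
  <h1>Handmade Oak Furniture</h1>
  <div id="reviews"></div>
  <script>
    document.getElementById("reviews").innerHTML = "Loved the oak table!";
  </script>
</body></html>
"""

visible = static_text(page)
print(visible)
```

Only the hardcoded heading ends up in `visible`; the review text exists solely inside the script, so a client that never executes it has nothing to index.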

The first approach is to make sure that your website degrades to plain, flat markup for browsers and search engines that are not JavaScript capable. Essentially, every time you would have used an AJAX call, make sure you also have a page with the same content. Unfortunately, for a lot of people this could mean a lot of work, but for those using a database with PHP or ASP it is not too hard to build a site that generates these pages itself with some effective web programming.
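As a rough sketch of this first approach (in Python rather than PHP or ASP, purely for illustration), the idea is to render every piece of content through one function and wrap it in a full page when the request is not an AJAX call. The `ARTICLES` data, the function names and the `is_ajax` flag are all hypothetical; how a server detects an AJAX request (for example, from a header the script sets) depends on your setup.

```python
# Hypothetical content store; in practice this would come from a database.
ARTICLES = {
    "oak-tables": ("Oak Tables", "Our oak tables are built by hand."),
    "pine-chairs": ("Pine Chairs", "Each pine chair is sanded and sealed."),
}


def render_fragment(slug):
    """The HTML snippet an AJAX call would swap into the page."""
    title, body = ARTICLES[slug]
    return f"<h1>{title}</h1><p>{body}</p>"


def render_page(slug):
    """A complete, crawlable page built around the very same fragment."""
    nav = "".join(
        f'<a href="/{s}">{title}</a>' for s, (title, _) in ARTICLES.items()
    )
    return (
        f"<html><body><nav>{nav}</nav>"
        f"<main>{render_fragment(slug)}</main></body></html>"
    )


def respond(slug, is_ajax):
    """Serve the bare fragment to scripts and the full page to everyone else."""
    return render_fragment(slug) if is_ajax else render_page(slug)


print(respond("oak-tables", is_ajax=False))
```

Because both responses are built from `render_fragment`, the content only has to be maintained once, and the plain links in the navigation give a spider real URLs to follow.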

The second approach is to use AJAX in a more minimalist fashion. The goal here is to present the search engines with your optimized content while making sure that any AJAX calls a user triggers have no bearing on what you want the search engines to see. In fact, this can be used to keep content that may negatively affect your rankings, such as testimonials, out of the markup the search engines read. I’ve seen very few testimonials that actually do good things for a site’s keyword density; I’ve even been known to optimize testimonials on clients’ websites. With JavaScript / AJAX you could insert a random testimonial into a page without affecting that page’s keyword density. The only downside to this approach is that some offsite keyword density tools use web browser rendering engines, so they may report false results because they take the JavaScript into account.
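To see why a testimonial can hurt, here is a small Python sketch of a keyword density calculation. It assumes the simplest possible definition (occurrences of a single-word keyword divided by total word count); the sample copy and the `keyword_density` helper are invented for illustration, and real tools may count differently.

```python
import re


def keyword_density(text, keyword):
    """Share of the words in `text` that are exactly `keyword` (single word)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical page copy and testimonial.
page_copy = "Oak furniture built to last. Browse our oak tables and oak chairs."
testimonial = "Great service and fast delivery, thank you so much!"

before = keyword_density(page_copy, "oak")
after = keyword_density(page_copy + " " + testimonial, "oak")

print(f"without testimonial: {before:.1%}")
print(f"with testimonial:    {after:.1%}")
```

In this made-up case the density for "oak" falls from 3 in 12 words to 3 in 21 once the testimonial is part of the hardcoded text, which is the dilution that loading the testimonial via JavaScript avoids.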

Now you may think that I’m anti-AJAX from everything that I’ve said, but there are times and places for AJAX, provided it doesn’t affect how the search engines see the beautiful, relevant content you’re trying to rank. AJAX is great for member sections of your website, interactive forms, slideshows, and a lot more; it just needs to be leveraged correctly to avoid missing out on search engine visitors. The final thing to keep in mind is that most search engines like to see more than a single-page website, which is what many AJAX websites appear to be. Always strive for at least 5 indexable pages, as internal links and anchor text can have a lot of value.

Post from: SiteProNews: Webmaster News & Resources

Are social media backlinks really worth it?

Social networking profiles

One of the fastest ways to build backlinks is to register on social media sites with high PageRank: Facebook, Xing, LinkedIn, MySpace, Ecademy, Twitter etc. These sites allow you to set up a user profile with information about you and your company, including a link to your website. The only catch is that not all of these sites’ links are ‘do-follow’ – which means your site will not always receive the ‘link juice’.
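If you want to check which links on a saved profile page are followed, something along these lines could work. It is only a sketch: the `LinkAudit` class, the `audit_links` helper and the sample markup (with placeholder example.com addresses) are invented here, and it looks at nothing but the `rel` attribute of each `<a>` tag.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sorts the links in a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, in any letter case.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


# Hypothetical profile markup.
profile_html = """
<a href="https://example.com/">My site</a>
<a rel="nofollow" href="https://example.org/post">Shared link</a>
<a rel="external NoFollow" href="https://example.net/">Another</a>
"""

followed, nofollow = audit_links(profile_html)
print("followed:", followed)
print("nofollow:", nofollow)
```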

Links in Twitter posts

If you place a link to a web page in your Twitter post, keep in mind that all of Twitter’s outbound links are ‘no-follow’. Google and Yahoo do not pass trust or PageRank through ‘no-follow’ links, so tweeting links has limited value for your site.

However, such links do have some value. In May 2009, we witnessed the launch of the Topsy site, a technology that transforms Twitter links into a searchable database. Topsy makes it possible for users to search for information (relevancy is determined based on the number of re-tweets). So, any Twitter link now has a chance to be found and followed by Topsy visitors. Remember, increased traffic is the main goal!

Social bookmarks forever?

Is it possible to get permanent links from social bookmarks? Well, yes and no. Most social bookmarking sites will retain your bookmark for as long as it’s popular. But as your bookmark loses popularity over time, it will be moved into the archives. Ideally, a permanent link should stay on the same page with approximately the same PR forever, but in reality, most social bookmarking websites remove links after some time.

Nevertheless, social bookmarking sites are valuable for other reasons. If you have a quality article that becomes popular on social bookmarking sites, people will link to that article in their blogs, and post ‘do-follow’ links on forums.

You can search the Web for the keywords ‘do-follow social bookmarking sites’ to find the latest lists. Networkers have also created services like socialposter.com or socialmarker.com for automatic submission to bookmarking sites.

The truth about blog comments

Blog commenting is probably the most popular – and in many cases, most irritating (because of spammers) – technique of getting permanent links.

The Google PageRank algorithm implies that the more outbound links there are on a page, the less authority or power this page can pass to each of those links. That’s because the page’s PR is distributed evenly between the outbound links. If a webmaster wants to add an outbound link, but doesn’t want Google to follow that link or for PR power to be passed on to the linked page, then that webmaster has to add the ‘nofollow’ attribute to the link. Many bloggers do so to prevent their PR from flowing to the pages cited by commenters. However, this practice is no longer encouraged by Google.

A few weeks ago Matt Cutts blogged about a change in the PR algorithm concerning Google’s approach to passing PageRank through links with the ‘nofollow’ attribute. Although no PageRank or anchor text is passed through such links, they are still counted when the outgoing ‘link juice’ is divided up. The only difference is that the share assigned to a ‘nofollow’ link is neither passed to the linked site nor kept on the page. This means Google disapproves of the practice of using the rel=nofollow attribute for the purpose of not sharing PageRank.
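The arithmetic behind this change can be sketched in a few lines of Python. This is a deliberately simplified model: it splits a page’s PageRank evenly, ignores the damping factor of the real algorithm, and uses made-up numbers; the function name and its parameters are hypothetical.

```python
def juice_per_followed_link(page_rank, followed, nofollowed, count_nofollow_in_split):
    """PageRank passed through each followed link under a simple even split.

    count_nofollow_in_split=False models the earlier behaviour, where
    'nofollow' links were left out of the division entirely.
    count_nofollow_in_split=True models the behaviour described above, where
    they still take a share that then goes nowhere.
    """
    if followed == 0:
        return 0.0
    divisor = followed + nofollowed if count_nofollow_in_split else followed
    return page_rank / divisor


# Illustrative page: 10 points of PageRank, 5 followed and 5 'nofollow' links.
old = juice_per_followed_link(10.0, followed=5, nofollowed=5, count_nofollow_in_split=False)
new = juice_per_followed_link(10.0, followed=5, nofollowed=5, count_nofollow_in_split=True)

print(old, new)  # 2.0 1.0
```

Under the earlier model each followed link received 2 points; under the new one each receives 1 point, and the 5 points assigned to the ‘nofollow’ links simply evaporate.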

If you own a blog, adding the ‘nofollow’ attribute to all comments means conserving your blog’s ‘link juice’ and getting fewer comments. The ‘do-follow’ principle can lead to more spam, but it’s a good way to attract webmasters to your site. On the other hand, if you are a webmaster trying to obtain more links by commenting on blogs, don’t rely solely on this method of improving link popularity. Use a combination of methods, including the time-proven ways of press releases, articles and site submissions to relevant lists and directories, and the newer techniques for site promotion in social media.

Center’d Gets A Facelift, Introduces Semantic Analysis For Smarter Local Activity Guide

Center’d, a local activity guide headed by former Yahoo Local GM Jennifer Dulski, is getting a major upgrade today. Alongside a completely revamped homepage, the site is launching a reworked search engine that it says should outperform the keyword searches found on most other local sites.

Center’d has compiled a database of around 1 million entries for various activities, each of which is categorized into a number of intent-based classifications. To do this, the site has spidered through the web analyzing ‘conversations’ taking place around each entry, taking context into account to determine if a review or comment is positive or negative. It then maps out the results in bar graphs. Dulski says that this kind of semantic analysis is better than standard keyword search, and it helps eliminate inaccurate matches – for example it would prevent a review that said “this place is not for kids” from appearing under a query for restaurants “for kids”.

Using this database, the site can also generate city guides for users with a variety of different criteria (for example, you could generate a guide for San Francisco with romance in mind, or you could create one that would take you through the city on the cheap). The site is launching with support for twelve cities initially, with plans to ramp up to more in the near future. Dulski says that these guides are mostly automated (which will help it scale), though there is some editor control involved.

Center’d emerged about a year ago from the ashes of Fatdoor, a social network for neighbors. Until now its primary focus has been to serve as a local search engine and event planning site, and now it’s adding a new goal to that list: helping people figure out what to do with their day. Dulski says that many people have been coming to the site to find something to do, without anything in particular in mind. As with the city guides, users can select from a variety of criteria like ‘cheap’ or ‘for kids’, and ask the site to generate a list of possible activities.

In the next few weeks Center’d will also be deploying its suggestion engine to the iPhone, with a mobile application that will allow users to generate a day-long itinerary based on the amount of money they’re able to spend and the type of activities they’d like to pursue.

Latent Semantic Indexing

Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular Value Decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.[1]

Called Latent Semantic Indexing because of its ability to correlate semantically related terms that are latent in a collection of text, it was first applied to text at Bell Laboratories in the late 1980s. The method, also called Latent Semantic Analysis (LSA), uncovers the underlying latent semantic structure in the usage of words in a body of text and how it can be used to extract the meaning of the text in response to user queries, commonly referred to as concept searches. Queries, or concept searches, against a set of documents that have undergone LSI will return results that are conceptually similar in meaning to the search criteria even if the results don’t share a specific word or words with the search criteria.

LSI overcomes two of the most severe constraints of Boolean keyword queries: multiple words that have similar meanings (synonymy) and words that have more than one meaning (polysemy). Synonymy and polysemy are often the cause of mismatches in the vocabulary used by the authors of documents and the users of information retrieval systems.[2] As a result, Boolean keyword queries often return irrelevant results and miss information that is relevant.
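The synonymy case can be illustrated with a tiny, made-up corpus. The sketch below assumes NumPy is available and uses its `numpy.linalg.svd` routine; the four toy documents, the choice of k = 2 dimensions, and the `fold_in` and `cosine` helpers are illustrative assumptions, not part of any particular LSI product.

```python
import numpy as np

# Toy corpus: "car" and "automobile" never appear together,
# but both co-occur with "engine".
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = ["car engine", "automobile engine", "flower petal", "flower petal"]

# Term-document matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# Singular Value Decomposition, truncated to the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k, :].T  # one k-dimensional vector per document


def fold_in(query):
    """Project a query into the same k-dimensional concept space."""
    q = np.array([query.split().count(t) for t in terms], dtype=float)
    return (q @ U_k) / s_k


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


q_vec = fold_in("automobile")
similarities = [cosine(q_vec, d) for d in doc_vecs]

for doc, sim in zip(docs, similarities):
    print(f"{doc!r}: {sim:.2f}")
```

In this toy setup the query "automobile" scores as highly against "car engine" as against "automobile engine", even though the first document does not contain the query word, while the two "flower petal" documents score close to zero. A plain Boolean keyword match would have missed "car engine" entirely.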

LSI is also used to perform automated document categorization. In fact, several experiments have demonstrated that there are a number of correlations between the way LSI and humans process and categorize text.[3] Document categorization is the assignment of documents to one or more predefined categories based on their similarity to the conceptual content of the categories.[4] LSI uses example documents to establish the conceptual basis for each category. During categorization processing, the concepts contained in the documents being categorized are compared to the concepts contained in the example items, and a category (or categories) is assigned to the documents based on the similarities between the concepts they contain and the concepts that are contained in the example documents.

Dynamic clustering based on the conceptual content of documents can also be accomplished using LSI. Clustering is a way to group documents based on their conceptual similarity to each other without using example documents to establish the conceptual basis for each cluster. This is very useful when dealing with an unknown collection of unstructured text.

Because it uses a strictly mathematical approach, LSI is inherently independent of language. This enables LSI to elicit the semantic content of information written in any language without requiring the use of auxiliary structures, such as dictionaries and thesauri. LSI can also perform cross-linguistic concept searching and example-based categorization. For example, queries can be made in one language, such as English, and conceptually similar results will be returned even if they are composed of an entirely different language or of multiple languages.

LSI is not restricted to working only with words. It can also process arbitrary character strings. Any object that can be expressed as text can be represented in an LSI vector space.[5] For example, tests with MEDLINE® abstracts have shown that LSI is able to effectively classify genes based on conceptual modeling of the biological information contained in the titles and abstracts of the MEDLINE citations.[6]

LSI automatically adapts to new and changing terminology, and it has been shown to be very tolerant of noise (i.e., misspelled words, typographical errors, unreadable characters, etc.).[7] This is especially important for applications using text derived from Optical Character Recognition (OCR) and speech-to-text conversion. LSI also deals effectively with sparse, ambiguous, and contradictory data.

Text does not need to be in sentence form for LSI to be effective. It can work with lists, free-form notes, email, Web-based content, etc. As long as a collection of text contains multiple terms, LSI can be used to identify patterns in the relationships between the important terms and concepts contained in the text.

LSI has proven to be a useful solution to a number of conceptual matching problems.[8][9] The technique has been shown to capture key relationship information, including causal, goal-oriented, and taxonomic information.

The above article is taken from Wikipedia and is provided for informational purposes only.

Latent Semantic Analysis Tutorial II

Gathering Titles

In order to give an example of Latent Semantic Analysis, I went to Amazon.com, searched on “investing”, and took the top 10 book titles that were displayed. One of those book titles only had one index word, so I dropped it. Here are the other 9 titles. To be an index word, the word must occur in 2 or more titles, and not be a “noise” or stop word such as “the”, “to”, “of”, etc.

  1. The Neatest Little Guide to Stock Market Investing
  2. Investing For Dummies, 4th Edition
  3. The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
  4. The Little Book of Value Investing
  5. Value Investing: From Graham to Buffett and Beyond
  6. Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
  7. Investing in Real Estate, 5th Edition
  8. Stock Investing For Dummies
  9. Rich Dad’s Advisors®: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
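The index-word rule above (appears in 2 or more titles, not a stop word) is easy to sketch in Python. The stop list below is my own assumption, since the text only names “the”, “to” and “of” before trailing off, so the resulting list may not match the original selection exactly. The `tokenize` and `index_words` helpers are likewise hypothetical.

```python
import re
from collections import Counter

titles = [
    "The Neatest Little Guide to Stock Market Investing",
    "Investing For Dummies, 4th Edition",
    "The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns",
    "The Little Book of Value Investing",
    "Value Investing: From Graham to Buffett and Beyond",
    "Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!",
    "Investing in Real Estate, 5th Edition",
    "Stock Investing For Dummies",
    "Rich Dad’s Advisors®: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss",
]

# Assumed stop list; extend or shrink it to taste.
STOP_WORDS = {"the", "to", "of", "for", "in", "and", "from", "that", "what", "do", "not"}


def tokenize(title):
    """Lowercase, drop apostrophes (Dad’s -> dads), split on non-alphanumerics."""
    text = title.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"[a-z0-9]+", text)


def index_words(titles, stop_words, min_titles=2):
    """Words that occur in at least `min_titles` titles and are not stop words."""
    title_counts = Counter()
    for title in titles:
        # set() so a word repeated inside one title is counted once.
        title_counts.update(set(tokenize(title)) - stop_words)
    return sorted(w for w, n in title_counts.items() if n >= min_titles)


words = index_words(titles, STOP_WORDS)
print(words)
```

With this particular stop list, words such as “little” and “edition” qualify as index words; adding them to `STOP_WORDS` would drop them.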

Latent Semantic Analysis Tutorial

Latent Semantic Analysis or LSA is a way of finding patterns among a collection of documents such as web pages. It is increasingly used by major search engines, such as Google, in ranking websites and determining what AdSense ads to show on a page.

To see how Latent Semantic Analysis works, imagine that you have a collection of documents such as web pages, or in the simple example we show below, book titles. How would you go about finding similarities and differences between the documents?

The Basic Idea

One way is to form a large matrix with each column representing a document, and each row representing a word that has been extracted from the documents. Then, each cell of the matrix is simply the number of times that word appears in that document. For example, if the word “farm” appears in the first document 7 times, then that cell would have a 7 in it.
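Here is a minimal Python sketch of such a count matrix, using three short made-up documents (splitting on spaces is a simplification; real text needs proper tokenizing):

```python
# Three toy documents, invented for illustration.
docs = [
    "the farm has a farm house",
    "the city has a house",
    "farm animals live on the farm",
]

# One row per distinct word; sorted so the row order is stable.
vocab = sorted({word for doc in docs for word in doc.split()})

# matrix[i][j] = number of times vocab[i] appears in docs[j]
matrix = [[doc.split().count(word) for doc in docs] for word in vocab]

farm_row = matrix[vocab.index("farm")]
print(farm_row)  # [2, 0, 2]
```

The row for “farm” reads 2, 0, 2: twice in the first document, never in the second, twice in the third.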

The cell numbers are usually massaged, so that whatever patterns are present can be seen more clearly. This step corresponds to “cleaning the data” so that, for example, frequent words are not weighted too heavily, some natural language constructs are simplified, and long documents don’t have an unfair advantage. Some ways that cell numbers are massaged are:

  • Log of Counts – the log of the counts in each cell may be used instead of the actual counts.
  • Stemming – related words may have the same root word and should be considered the same (such as golf and golfing).
  • TF-IDF – (term frequency – inverse document frequency) attempts to measure the importance of a term or word.
  • Entropy – another way to measure term importance based on the distribution of the term through documents.
  • Normalization – sets each document vector to length 1 so documents with more words don’t have an unfair advantage.
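Three of these adjustments can be sketched in a few lines of Python. The small count table is invented, and the formulas are common textbook variants that I am assuming here (log(1 + count), and tf-idf as count × log(N / document frequency)); other variants exist, and stemming and entropy weighting are left out.

```python
import math

# Invented counts: one row per term, one column per document.
counts = {
    "stock": [1, 0, 1],
    "investing": [1, 1, 1],
    "value": [0, 2, 0],
}
n_docs = 3


def log_counts(row):
    """Dampen large counts; log(1 + c) keeps zero counts at zero."""
    return [math.log(1 + c) for c in row]


def tf_idf(row, n_docs):
    """Count times log(N / number of documents containing the term)."""
    df = sum(1 for c in row if c > 0)
    idf = math.log(n_docs / df) if df else 0.0
    return [c * idf for c in row]


def normalize_columns(matrix):
    """Scale each document (column) vector to length 1."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [
        [row[j] / norms[j] if norms[j] else 0.0 for j in range(n_cols)]
        for row in matrix
    ]


weighted = {term: tf_idf(row, n_docs) for term, row in counts.items()}
unit_docs = normalize_columns(list(counts.values()))

print(weighted["investing"])  # appears in every document, so its weight is 0
print(weighted["value"])
```

Note how “investing”, which appears in every document, ends up with a weight of zero under this tf-idf variant, while “value”, which is concentrated in one document, is boosted.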