8 comments

  1. This is a great primer on search and its problems and ways to think about improving it. I’m curious if you’ve had any success with mapping specific problems you’ve observed back to one of the “5 variables”?

  2. Great article. My company focuses on a complementary area – providing end users with a user-friendly GUI, sitting “on top of” an OWL ontology – where “underneath” that, there are custom mappings to any/all corporate structured data, be that in Oracle or any other RDBMS, other data files/flat files, Excel spreadsheets etc., and data out on the web connected through web services. User Friendly GUI->OWL->SPARQL->RDF->Distributed Data. http://www.semanticdiscoverysystems.com

  3. Richard,

    Yes, we have been able to map problems in search instances to the specific variables discussed above.

    For example, earlier this year we conducted a search log analysis to evaluate the quality of results returned for the most common searches. A large number of queries needed to be discarded because we couldn’t figure out what the user wanted to find (the phrasings were commonly vague or oblique). That’s a user input problem; it would be unreasonable to expect that a computer would interpret the user’s intentions better than a human being could.

    For queries where we were confident we understood what the user wanted, large numbers of seemingly irrelevant results were very common. The causes vary from query to query and result to result, and involve a complex interplay of content, index, and engine. But in many cases, a significant contributing factor could be found: the page had no metadata, the navigation was indexed, and so on. Again, any shortcomings of the engine cannot be easily resolved without disrupting well-performing queries, so the problem really has to be seen as one of the implementation.

    Problems in the results display are easily observed in user testing. I’ve frequently seen the search engine return the right result, but the user skips over it because the meta description has no relationship to the query. It’s terribly tragic, because everything has performed exactly as it should but the user has no reasonable basis for knowing it.

    I’ve seen these and other issues across a large set of evaluations, and they consistently map to one or more of the five variables.

  4. James,

    I imagine you’re referring to the section on ontology, and you’re quite right that it’s worth stating the task is nontrivial. That said, there are engine products that offer prebuilt upper and domain-specific ontologies, as well as a growing number of independently built resources available for purchase or public use. Modifying existing work would be much easier than starting from scratch.

  5. Great article, John, especially around user needs. Many times, with too much focus on getting technology to work, we forget about the importance of delivering usefulness.

    The use of meta tags for search engines is a little out-dated.

    Most algorithmic search technologies (such as Google) place little value on standard (or non-standard) meta tags.

    This is very positive and valid, as content value is indexed on what the content ‘actually says’ (and what it references or links to) rather than what the author ‘says it says’.

    I still work with content managers that think a well-stuffed meta keywords list will help their ‘searchability’ when in many cases it does nothing or actually works against them.

    Indeed, Google doesn’t use meta keywords at all anymore, and the description tag is used only for result displays.

    Organisations would get far more value out of search from good, valid mark-up (as you’ve suggested) and well-structured, well-organised, user-focused content (i.e. using the language the users use for search terms) than from relying on meta tags.

    Generally, meta tags are more useful to content authors, for reference and overall organisation purposes, than to search engines. Even then, with the lack of a defined taxonomy (or even with one), this can quickly fall apart, given that many organisations have widely distributed publishing with varying levels of concern for meta-editorial or information structure.

    In the end, we need to consider content/site quality and the people processes that manage this content as much as the structures around the content, as increasingly that is where search technologies are looking.

  6. I thought the ontology part was interesting. I had often wondered how search engines define relevance in pages, and this article on how the search engine works and parses the HTML for relevance enlightened me.
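
Comment 3 above mentions two recurring findings: pages with no metadata, and results that users skip because the meta description has no relationship to the query. A rough way to spot both during a search log review might look like the sketch below. It is only an illustration using Python's standard library. The names `MetaDescriptionParser`, `tokens`, and `audit_page` are hypothetical, and plain word overlap is an assumed stand-in for "relationship to the query", not how any particular engine judges relevance.

```python
from html.parser import HTMLParser
import re


class MetaDescriptionParser(HTMLParser):
    """Collects the content of <meta name="description"> if the page has one."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""


def tokens(text):
    """Lowercased set of alphanumeric words (assumption: no stemming or stop words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def audit_page(html, query):
    """Report whether a page has a meta description and which query terms it shares."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    if not parser.description:
        return {"has_description": False, "shared_terms": []}
    shared = sorted(tokens(query) & tokens(parser.description))
    return {"has_description": True, "shared_terms": shared}


page = (
    '<html><head><meta name="description" '
    'content="Annual leave policy for staff"></head><body></body></html>'
)
print(audit_page(page, "leave policy"))
print(audit_page(page, "vacation days"))
```

A page that comes back with `has_description` false, or with an empty `shared_terms` list for a common query, would be a candidate for manual review rather than proof of a problem, since a description can be relevant without repeating the user's exact words.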

Comments are closed.