Tagging is great: it allows us to describe the items we put on the Web. Sites like Flickr, Del.icio.us and Technorati have successfully implemented tagging to help people search user-added material. It's all about user collaboration - the main idea of Web 2.0.

But can tagging be better?

Well, probably.

What do the tags mean?

Tags on Flickr, Del.icio.us and Technorati are just words.

For example, the tag 'paris' has different meanings: it could be the city of Paris, or the celebrity Paris Hilton. So searching for 'paris' will show results for both the French capital and Paris Hilton. One way to narrow the results to the city would be for people to tag their objects with an additional tag such as 'city'. But this has its own problems: not everyone will tag the object with 'city', and others may use different tags such as 'place'.

Figure 1 - Tagging two objects with different meanings with the same word.

Giving tags meaning

If the tags had meaning, then the tag 'celebrity:paris' would be different from the tag 'city:paris'.

Searching for 'celebrity:paris' would find 'Paris Hilton' (not perfect - are there other celebrities called 'Paris'?), and searching for 'city:paris' would find objects tagged with the city of Paris.

The intersection of the two result sets would find 'Paris Hilton' in the city of 'Paris'.
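As a rough illustration of the idea, here is a minimal Python sketch of meaning:value tags held in an in-memory index. The `index`, `tag` and `search` names are my own inventions for this example, and 'object3' is a hypothetical third object (Paris Hilton photographed in Paris) added so that the intersection has something to find.

```python
from collections import defaultdict

# Hypothetical in-memory index: (meaning, value) -> set of object ids.
index = defaultdict(set)

def tag(object_id, meaning, value):
    """Attach a meaning:value tag to an object."""
    index[(meaning, value.lower())].add(object_id)

def search(query):
    """Return the ids of objects carrying the exact tag 'meaning:value'."""
    meaning, value = query.split(":", 1)
    return set(index.get((meaning, value.lower()), set()))

tag("object1", "city", "paris")       # the city
tag("object2", "celebrity", "paris")  # Paris Hilton
tag("object3", "celebrity", "paris")  # hypothetical: Paris Hilton in Paris
tag("object3", "city", "paris")

# Intersect the two result sets.
both = search("celebrity:paris") & search("city:paris")
print(both)  # {'object3'}
```

A plain set intersection is enough here because each search returns a set of object ids.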

Figure 2 - Tagging two objects with different meanings with the same word, but each tag now carries a meaning that matches its object's meaning.

Searching the meaning

Queries such as 'celebrity:*' could also be performed to return all celebrities, or 'city:*' to return all cities regardless of the actual city or celebrity name.

The query 'celebrity:*' ∩ 'city:paris' would find all celebrities in Paris.

The query '*:paris' would find all objects tagged with 'paris' regardless of the meaning of the word, i.e. standard tagging.
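The wildcard queries above could be sketched in Python as follows. This is only one possible reading: I assume '*' stands for "any meaning" or "any value", and the data set (including 'object3' and 'object4') is made up for the example.

```python
# Hypothetical data: object id -> set of (meaning, value) tags.
objects = {
    "object1": {("city", "paris")},
    "object2": {("celebrity", "paris")},
    "object3": {("celebrity", "paris"), ("city", "paris")},
    "object4": {("city", "london")},
}

def search(query):
    """Match 'meaning:value', where either side may be the wildcard '*'."""
    meaning, value = query.split(":", 1)
    return {
        object_id
        for object_id, tags in objects.items()
        if any(meaning in ("*", m) and value in ("*", v) for m, v in tags)
    }

print(sorted(search("celebrity:*")))                      # ['object2', 'object3']
print(sorted(search("celebrity:*") & search("city:paris")))  # ['object3']
print(sorted(search("*:paris")))                          # ['object1', 'object2', 'object3']
```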

Search result suggestions

Suppose the query '*:paris' was run against the database containing Object 1 and Object 2 from Figure 1. As previously suggested, this would be akin to using a standard tagging system, as the meaning is not considered.

Figure 3 - The result would be all objects tagged with 'paris' regardless of the meaning. But the application knows the meaning of the tags and can therefore ask the user 'Did you mean... City or Celebrity' to allow the user to narrow the result set.
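One way the application could build that prompt is to collect the distinct meanings attached to the searched value. The sketch below assumes the two objects from Figure 1; the function names `meanings_for` and `did_you_mean` are hypothetical.

```python
# Hypothetical data mirroring Figure 1: object id -> set of (meaning, value) tags.
objects = {
    "object1": {("city", "paris")},
    "object2": {("celebrity", "paris")},
}

def meanings_for(value):
    """Return the sorted, distinct meanings under which `value` has been used."""
    return sorted({m for tags in objects.values() for m, v in tags if v == value})

def did_you_mean(value):
    """Build a prompt when a value is ambiguous, or return None when it is not."""
    meanings = meanings_for(value)
    if len(meanings) < 2:
        return None
    return "Did you mean... " + " or ".join(m.capitalize() for m in meanings) + "?"

print(did_you_mean("paris"))  # Did you mean... Celebrity or City?
```

The prompt is only offered when a value has more than one meaning, so unambiguous searches are left alone.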

Tag representation

I have chosen to represent tags as a meaning:value pair, but this could have been represented in other ways. The representation could depend on how tagging is implemented.

XML

This system of tagging could easily be represented as XML, for example:

<tag:celebrity>paris</tag:celebrity> and <tag:city>paris</tag:city>
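To show how such XML might be read back into meaning:value pairs, here is a small sketch using Python's standard `xml.etree.ElementTree`. Note the assumptions: the `tag:` prefix has to be bound to a namespace for the XML to parse, so I have invented the URI `http://example.com/tags` and a wrapping `<object>` element purely for illustration.

```python
import xml.etree.ElementTree as ET

TAG_NS = "http://example.com/tags"  # hypothetical namespace URI for the 'tag' prefix

# Hypothetical wrapper element; only the two tag elements come from the text.
document = f"""
<object xmlns:tag="{TAG_NS}">
  <tag:celebrity>paris</tag:celebrity>
  <tag:city>paris</tag:city>
</object>
"""

root = ET.fromstring(document)
prefix = "{" + TAG_NS + "}"  # ElementTree expands prefixes to '{uri}localname'

# The element name gives the meaning, the element text gives the value.
pairs = [(el.tag[len(prefix):], el.text) for el in root if el.tag.startswith(prefix)]
print(pairs)  # [('celebrity', 'paris'), ('city', 'paris')]
```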

Tag Hierarchy

A further improvement to the tagging described thus far would be to introduce a hierarchical system of tag meanings. For example, a celebrity is a person, so 'Paris Hilton' could be tagged with both 'celebrity:paris' and 'person:paris'. Alternatively, the system could know where celebrity fits into a hierarchy of meanings, allowing us to search for person and retrieve 'Paris Hilton'.

Building on the meaning:value pair representation, Object 2 can be tagged with 'person.celebrity:paris' and Object 1 can be tagged with 'location.city:paris'.

Querying hierarchical tags

  • Every person: 'person.*:*' / Every person called 'Paris': 'person.*:paris'
  • Every celebrity: 'person.celebrity:*'
  • Paris Hilton in Paris: 'person.celebrity:paris' ∩ 'location.city:paris'
  • etc
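The hierarchical queries listed above could be evaluated with simple glob-style matching. This sketch uses `fnmatchcase` from Python's standard library, and it assumes that 'person.*' means "any meaning below person" (it would not match a bare 'person' meaning). The objects other than Object 1 and Object 2 are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical data: object id -> set of 'meaning:value' tags.
objects = {
    "object1": {"location.city:paris"},
    "object2": {"person.celebrity:paris"},
    "object3": {"person.celebrity:paris", "location.city:paris"},
    "object4": {"person.author:alice"},
}

def search(query):
    """Match a 'meaning:value' query in which '*' matches any run of characters."""
    q_meaning, q_value = query.split(":", 1)
    results = set()
    for object_id, tags in objects.items():
        for t in tags:
            meaning, value = t.split(":", 1)
            if fnmatchcase(meaning, q_meaning) and fnmatchcase(value, q_value):
                results.add(object_id)
                break  # one matching tag is enough for this object
    return results

print(sorted(search("person.*:*")))          # ['object2', 'object3', 'object4']
print(sorted(search("person.*:paris")))      # ['object2', 'object3']
print(sorted(search("person.celebrity:paris") & search("location.city:paris")))  # ['object3']
```

Because '*' also matches dots, 'person.*' would match deeper levels too, such as a hypothetical 'person.celebrity.actor'.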

Implementation

Problems

Tag input

  • Raw tag input - easier to implement, but harder to use. A user would be expected to enter the whole meaning:value pair, e.g. 'person.celebrity:paris'. The user would need to remember the list of meaning:value pairs, and spelling mistakes would cause problems.
  • Guided input, simple version - possibly using a drop-down list of tag meanings (each string value paired with a readable name, e.g. in HTML <option value="person.celebrity">Celebrity</option>), along with a text input box for the tag.
  • Guided input, complete version - harder to implement and more restrictive, but easier for end users. The interface would consist of a complete form with components for each type of tag, e.g. an input box specifically for cities and input boxes specifically for celebrities.
Type-ahead combo boxes would be useful in suggesting tags already in use to the user as they type.
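To make the trade-off more concrete, here is a sketch of two pieces of the input side: validating a raw tag against a known list of meanings (which catches spelling mistakes in the meaning), and a type-ahead helper that suggests tags already in use. `KNOWN_MEANINGS`, `TAGS_IN_USE` and both function names are assumptions made up for this example.

```python
KNOWN_MEANINGS = {"person.celebrity", "location.city"}  # hypothetical vocabulary
TAGS_IN_USE = ["paris", "parma", "london"]              # hypothetical existing tags

def parse_raw_tag(raw):
    """Split raw 'meaning:value' input and reject unknown meanings."""
    meaning, sep, value = raw.strip().partition(":")
    if not sep or not value:
        raise ValueError(f"expected meaning:value, got {raw!r}")
    if meaning not in KNOWN_MEANINGS:
        raise ValueError(f"unknown meaning {meaning!r}")
    return meaning, value.lower()

def suggest(prefix, limit=5):
    """Type-ahead: return tags already in use that start with the typed prefix."""
    prefix = prefix.lower()
    return sorted(t for t in TAGS_IN_USE if t.startswith(prefix))[:limit]

print(parse_raw_tag("person.celebrity:Paris"))  # ('person.celebrity', 'paris')
print(suggest("pa"))                            # ['paris', 'parma']
```

A misspelt meaning such as 'person.celebritiy:paris' raises a `ValueError` here instead of silently creating a new meaning.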

Tag storage

Tags could be stored using a relational database.
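As a sketch of what that might look like, the example below uses Python's built-in `sqlite3` module with an in-memory database. The schema (an `object` table and a `tag` table with `meaning` and `value` columns) is my own assumption, not a worked-out design; hierarchical wildcards are approximated with SQL `LIKE`, where '%' plays the role of '*'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (
        object_id INTEGER NOT NULL REFERENCES object(id),
        meaning   TEXT NOT NULL,   -- e.g. 'person.celebrity'
        value     TEXT NOT NULL,   -- e.g. 'paris'
        PRIMARY KEY (object_id, meaning, value)
    );
""")
conn.executemany("INSERT INTO object (id, title) VALUES (?, ?)",
                 [(1, "Object 1"), (2, "Object 2")])
conn.executemany("INSERT INTO tag (object_id, meaning, value) VALUES (?, ?, ?)",
                 [(1, "location.city", "paris"), (2, "person.celebrity", "paris")])

def search(meaning_pattern, value):
    """Return object ids whose tag meaning matches a LIKE pattern and whose value is equal."""
    rows = conn.execute(
        "SELECT DISTINCT object_id FROM tag "
        "WHERE meaning LIKE ? AND value = ? ORDER BY object_id",
        (meaning_pattern, value),
    )
    return [row[0] for row in rows]

print(search("person.%", "paris"))  # [2]
print(search("%", "paris"))         # [1, 2]
```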

Tag retrieval

The same kinds of problems as with tag input are found here: should search be performed using raw tags (this would most likely be an advanced search option) or guided input? The latter would be the best option to choose, as it will make the search process easier.

Multiple Resource Types

Web sites may not store just images or just videos or just articles, but may have a combination of several types of resources that will need to be tagged and searched.

Querying Multiple Resources

A proposal for querying such sites would be of the form resource:meaning:value.
  • All videos of Paris Hilton - video:person.celebrity:paris
  • All images of Paris - image:location.city:paris
  • All resources tagged with 'Paris' - *:*:paris
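The three-part form could be handled by splitting the query on its first two colons and matching each part in turn. This sketch again leans on `fnmatchcase` for the '*' wildcard; the resource records and their ids are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical records: (resource type, meaning, value, resource id).
resources = [
    ("video", "person.celebrity", "paris", "video1"),
    ("image", "location.city", "paris", "image1"),
    ("article", "location.city", "london", "article1"),
]

def search(query):
    """Match a 'resource:meaning:value' query; '*' is a wildcard in any part."""
    q_resource, q_meaning, q_value = query.split(":", 2)
    return sorted({
        resource_id
        for rtype, meaning, value, resource_id in resources
        if fnmatchcase(rtype, q_resource)
        and fnmatchcase(meaning, q_meaning)
        and fnmatchcase(value, q_value)
    })

print(search("video:person.celebrity:paris"))  # ['video1']
print(search("image:location.city:paris"))     # ['image1']
print(search("*:*:paris"))                     # ['image1', 'video1']
```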

Plural / Singular

Should a search for a certain word in its plural form also return a match for its singular form? Probably, but it depends on the specification, i.e. a search may be required to return exactly the tag that matches.

If one were to search for 'celebrities', the result set would be expected to contain objects involving celebrities - something tagged with 'celebrity' would fall into this expected result set.

Figure 4 - The query 'person.celebrity:paris' would be equal to 3 other queries.

The same would apply if 'celebrity' or 'celebrities' were used as the tag word as well.

Input -> Synonym Selection -> Run Queries -> Combine Queries -> Display Results

Synonyms

Searches could also make use of synonyms. For example, a search for 'feline' should also return results for 'cat'.
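Plural/singular forms and synonyms could both be handled by expanding the searched word into a set of variants, running a query for each, and combining the results. The sketch below uses small hand-written lookup tables (`SYNONYMS` and `PLURALS`, both hypothetical) rather than a real thesaurus or stemmer, and the tagged objects are invented for the example.

```python
# Hypothetical lookup tables; a real system would need a proper thesaurus.
SYNONYMS = {"feline": {"cat"}, "cat": {"feline"}}
PLURALS = {"cat": "cats", "feline": "felines", "celebrity": "celebrities"}
SINGULARS = {plural: singular for singular, plural in PLURALS.items()}

def expand(word):
    """Return the word plus its synonyms, each in singular and plural form."""
    base = SINGULARS.get(word, word)             # normalise to the singular
    words = {base} | SYNONYMS.get(base, set())   # synonym selection
    return sorted(words | {PLURALS[w] for w in words if w in PLURALS})

# Hypothetical data: object id -> set of tag words.
tagged = {"object1": {"cat"}, "object2": {"felines"}, "object3": {"dog"}}

def search(word):
    """Run one query per variant and combine (union) the results."""
    variants = set(expand(word))
    return sorted(oid for oid, tags in tagged.items() if tags & variants)

print(expand("celebrities"))  # ['celebrities', 'celebrity']
print(search("feline"))       # ['object1', 'object2']
```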

person.me.about:tagging

I started to think about tagging when I began working on functionpix.com in August 2006, as the site needed a new way to organise the anticipated influx of user-contributed content. A month later I started the third year of my degree, which included a course on The Semantic Web. As a result of both, I began formulating semantic tagging ideas, and this was a contender for my final year project, for which I finally chose an Ant Behaviour Simulation project.

Created: December 19th 2006, updated: 13th January 2007 (about me, added multiple resources, synonyms, added link to semantic wikipedia paper)