Archive for November, 2007

Is MashLogic a Buzzword Compliant company?

I attended an excellent panel discussion last week, featuring clever people who said insightful things about the Semantic Web. Nova Spivack of Radar Networks was a panelist and is an ardent proponent of the Semantic Web. This presentation lays out his worldview pretty well.

I’ve been asked whether MashLogic is a “Semantic Web company”, and my response varies based on who’s asking. I’ll clarify that here.

The Semantic Web is an enabling technology that improves the matching of intent between Consumers and Providers. That makes it something like DNS – end-users are generally unaware of its existence. Hence, I would be hesitant to pitch MashLogic as a Semantic Web company to users and publishers – have you heard of any DNS companies? So here’s what I tell them… “MashLogic is building a service that dynamically enriches web content to adapt to your preferences. We do this by using semantic intelligence (and when available, semantic markup) to uncover relationships between objects on a web page (e.g. names of people) and resources that service those objects (e.g. LinkedIn).” Not as snappy as I’d like, but it’s the thought that counts.

Now we do use some traditional techniques (e.g. Classification, Extraction, Machine Learning, etc.) to accomplish our semantic goals. To investors and employees, I pitch MashLogic as a company that will help realize the promise of the Semantic Web. Not quite a Semantic Web company, but close enough.

Are we a Web 2.0 company? I don’t mind saying that we are, since we let users “mash up” data from sources of their choosing, we leverage and offer REST APIs, and our UI has the obligatory AJAX calls, progress meters and rounded corners. Detractors will accurately point out that our company name is not vowel-deprived.

Web 3.0? Wikipedia notes “More often (the term Web 3.0) is used as a marketing ploy to hype incremental improvements”. Count me in!

Social Graph? I’ve been hearing about the Social Graph, which appears to describe a person-centric view of the web: the social networks a person inhabits, their online activities, their blogs and bookmarks, etc. Nova Spivack defines a Semantic Social Graph as a standardized (RDF, OWL) representation of a “regular” Social Graph. I have no idea how MashLogic plays into this, so we’ll just watch from the sidelines for now.

Giant Global Graph? What’s this? Tim Berners-Lee just wrote about GGG, and best I can tell, this is a generalization that does away with the person-centricity of the Social Graph and the document-centricity of the WWW. He says “It’s not the documents, it is the things they are about which are important”. Interestingly, we do care quite deeply about Things (capital T) at MashLogic. We look at web pages as implicit representations of Things and their Attributes. Our mashups are simply a way to put Things on Top of Other Things. The GGG is closer to our heart than Social Graphs.
 

From Information Retrieval to Intent Reconciliation

In recent weeks, my RSS reader has delivered a bunch of great articles and announcements related to the Semantic Web, structure and meaning, and “Web 3.0”. There seems to be convergence on the definition of the Semantic Web (web content infused with meta-data) and promising approaches for inferring meaning from unstructured data (domain-specific, ontologies, rules, etc.). The term “Web 3.0” is not so lucky – reactions range from disdain to dismissal. A few folks were brave enough to proffer opinions and were met with skepticism and rejection on the grounds of the definitions being self-serving. Tough crowd.

At MashLogic we intend to use a variety of techniques to extract concepts and meaning from web content. Such content is semi-structured (pockets of unstructured text blocks within a structure of headings, paragraphs, links, and the like). Our approach is opportunistic in the sense that we will take formal meta-data if we find it, use feeds and APIs when they are available, or apply simple ontologies and rules and see where that takes us.
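
For the programmers in the audience, here is a rough Python sketch of that "take the best source you can find" ordering. It is only an illustration: the function names, the choice of the `author` meta tag, the `feed_lookup` callable and the two-capitalized-words rule are all assumptions made up for this example, not a description of MashLogic's actual code.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            pairs = dict(attrs)
            if pairs.get("name") and pairs.get("content"):
                self.meta[pairs["name"].lower()] = pairs["content"]


def from_metadata(html):
    """Tier 1: use formal meta-data if the page declares any (assumed: 'author')."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    author = parser.meta.get("author")
    return [("person", author)] if author else []


def from_feed(html, feed_lookup):
    """Tier 2: ask a feed or API when one is available (hypothetical callable)."""
    return feed_lookup(html) if feed_lookup else []


# Tier 3 (hypothetical rule): two adjacent capitalized words look like a name.
NAME_RULE = re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b")


def from_rules(html):
    """Tier 3: strip tags, then apply the simple name rule to the text."""
    text = re.sub(r"<[^>]+>", " ", html)
    return [("person", name) for name in NAME_RULE.findall(text)]


def extract_things(html, feed_lookup=None):
    """Try each tier in order and return the first non-empty result."""
    tiers = (from_metadata, lambda h: from_feed(h, feed_lookup), from_rules)
    for tier in tiers:
        things = tier(html)
        if things:
            return things
    return []
```

Calling `extract_things(page_html)` returns a list of `(kind, value)` pairs from the first tier that finds anything. Stopping at the first hit is a simplification; a real system would more likely merge results from several tiers and weigh them against each other.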

Getting to the topic of this post… regardless of whether applications match keywords, or apply sophisticated semantic analysis techniques to tease out meaning from content, the overarching goal is generally one of Matching Intent between Consumers and Providers. Content providers use page structure, meta tags, microformats, and markup to give visibility and access to their inventory of information, services, and products. These days most content providers are catering to two constituencies – search engines (and their sinister proxy, SEO) and consumers. On the other side, consumers explicitly express their Intent with keywords and preferences. Providers often combine this with implicit data from clickstreams, demographics, etc.

Information Retrieval implies a paradigm where the Consumer judiciously charts his course through an ocean of information and is ultimately responsible for ensuring that the retrieved information meets his needs. I’d like to think that we are approaching a time of Intent Reconciliation where Providers and Consumers co-operate in helping each other consummate their Intent. They do this by reducing the Ambiguity Gap between Intent and Information with Semantics. From that perspective, Semantics codifies Intent.