Home (old)

Yin Qu (屈垠) edited this page Feb 21, 2014 · 1 revision

Introduction

BigSemantics is a free and open-source software architecture for developing powerful applications that present interactive metadata semantics from diverse information sources. An example is the Metadata In-Context Expander (MICE), which helps the user maintain context while exploring linked semantics.

BigSemantics provides a meta-metadata language for defining metadata types, a repository of meta-metadata types supporting a wide range of sources, runtime libraries for conveniently accessing and using semantics in different programming languages, and a web service that is available to the public.

Metadata

Metadata means data about data. Suppose you had a library of music as a data set. The songs themselves would be considered data. For each and every song, information such as title, artist, album, genre and duration would be considered metadata, because they are data about data (i.e. the songs in this case). Each individual piece of information, such as the title, is often referred to as a property or a field.

Metadata is also referred to as 'semantic data' or 'semantics' in different contexts.

Meta-Metadata

With BigSemantics, developers and other curators author application-independent, reusable code blocks called wrappers, in the meta-metadata language, to specify data models, extraction rules, presentation semantics, and semantic actions of metadata. The meta-metadata language supports representing nested, cross-linked, and recursive data models. For example, metadata for books can contain metadata for their authors, while metadata for authors can again contain metadata for their books.
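To make the idea of a nested, cross-linked, and recursive data model concrete, here is a minimal sketch in Python. It is not the meta-metadata language and does not use any BigSemantics API; the `Book` and `Author` classes, their fields, and the `link` helper are hypothetical names chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    """Hypothetical metadata type for an author."""
    name: str
    # Cross-link back to Book, which makes the model recursive.
    # repr=False keeps printing from recursing through the cycle.
    books: List[Book] = field(default_factory=list, repr=False)


@dataclass
class Book:
    """Hypothetical metadata type for a book."""
    title: str
    authors: List[Author] = field(default_factory=list)


def link(book: Book, author: Author) -> None:
    """Connect a book and an author in both directions."""
    book.authors.append(author)
    author.books.append(book)


author = Author(name="A. Author")
book = Book(title="An Example Book")
link(book, author)

# Following book -> author -> book leads back to the same object.
round_trip = book.authors[0].books[0]
print(round_trip is book)
```

The book holds its authors and each author holds their books, so a consumer can navigate the links in either direction.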

At runtime, BigSemantics (in the form of a library or a web service) extracts metadata from web pages using wrappers. To make semantics conveniently accessible from Java or C# programs, BigSemantics automatically generates native classes, called metadata classes, from the data models defined in wrappers. Extracted metadata is mapped to native Java or C# instances of metadata classes for use in the program.

How it works

BigSemantics addresses different stages of the life-cycle of metadata: data structure definition, information extraction, presentation, and actions. In contrast, RDF only addresses schema definition and metadata representation. Publishing semantics in RDF and consuming RDF data can be difficult, and RDF is less familiar to developers.

Data Structure Definition
Wrappers define the data models for different kinds of metadata, such as books or electronic products. Data models are strongly typed, consist of field declarations, and may be nested, cross-linked, or recursive. Defined data structures (i.e. types) can be reused by inheritance, as in object-oriented programming languages.
Information Extraction
For each metadata field, extraction rules can be specified on the corresponding field in the wrapper, to define where to find the relevant information in the source web page, and how to transform it. BigSemantics supports XPath and regular expressions for finding and transforming information.
Semantic Actions
Wrapper authors can specify semantic actions which will be performed on extracted metadata. BigSemantics supports semantic actions such as normalizing the input URL, branching on a condition, looping, and "bridge functions" that connect to the applications and execute user-defined tasks.
Presentation Semantics
Presentation semantics are high-level directives or CSS styles for guiding the presentation of extracted metadata. For example, MICE uses presentation semantics to change fonts, re-order or hide fields, and create navigation links when displaying metadata.
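The information extraction stage described above, in which an XPath expression locates a value and a regular expression transforms it, can be sketched in Python as follows. This is an illustration of the general technique only, not the wrapper syntax or the BigSemantics runtime: the page markup, the `RULES` table, and the `extract` function are all hypothetical. It also assumes well-formed markup, because the standard library's `xml.etree.ElementTree` implements only a subset of XPath and cannot parse arbitrary HTML.

```python
import re
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a product page (hypothetical markup).
PAGE = """
<html>
  <body>
    <h1 class="title">  Example Widget </h1>
    <span class="price">Price: $12.50 (incl. tax)</span>
  </body>
</html>
"""

# Hypothetical extraction rules: one XPath per field, plus an optional
# regular expression whose first group is kept as the field value.
RULES = {
    "title": {"xpath": ".//h1[@class='title']", "regex": None},
    "price": {"xpath": ".//span[@class='price']", "regex": r"\$(\d+\.\d{2})"},
}


def extract(page: str, rules: dict) -> dict:
    """Apply each field's XPath, then its regex, to the page."""
    root = ET.fromstring(page.strip())
    metadata = {}
    for field_name, rule in rules.items():
        node = root.find(rule["xpath"])
        if node is None or node.text is None:
            continue  # Field not present on this page.
        value = node.text.strip()
        if rule["regex"]:
            match = re.search(rule["regex"], value)
            if match is None:
                continue  # Text did not have the expected shape.
            value = match.group(1)
        metadata[field_name] = value
    return metadata


metadata = extract(PAGE, RULES)
print(metadata)
```

Keeping the rules in a table separate from the extraction loop mirrors the idea that rules are declared per field, while the engine that applies them stays generic.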

Wrapper Repository

The web is a heterogeneous and interconnected space of hypermedia. User tasks often require multiple metadata types and information sources. BigSemantics addresses this heterogeneity by providing a repository of meta-metadata wrappers supporting a wide range of types and sources, and a polymorphic type system that maximizes reuse.

The repository includes wrappers ranging from everyday services such as Amazon Product, Trip Advisor, and Google Books, to professional digital libraries such as the ACM Digital Library and IEEE Xplore, as well as movies, games, blog posts, and so on. The repository is distributed with BigSemantics and is hosted on GitHub.

In the repository, wrappers written in meta-metadata are organized by inheritance in a hierarchical type system, making them easy to reuse and extend. Polymorphism of the metadata type system enables seamless integration of new wrappers into existing systems. Supported types and sources can be explored using our interactive wrapper ontology visualization. The repository and the type system help us and interested developers curate wrappers, forming an infrastructural basis to support applications working with heterogeneous and interconnected metadata on the web.
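As a rough illustration of why an inheritance-based, polymorphic type system helps integration, the Python sketch below defines a base type and a subtype and then runs code written against the base type on both. The `Document` and `ScholarlyArticle` classes, their fields, and `summarize` are hypothetical and are not taken from the wrapper repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Hypothetical base metadata type."""
    title: str
    location: str


@dataclass
class ScholarlyArticle(Document):
    """Hypothetical subtype; inherits title and location, adds authors."""
    authors: List[str] = field(default_factory=list)


def summarize(doc: Document) -> str:
    """Written against the base type only; unaware of any subtype."""
    return f"{doc.title} <{doc.location}>"


items = [
    Document(title="A Blog Post", location="http://example.com/post"),
    ScholarlyArticle(
        title="A Paper",
        location="http://example.com/paper",
        authors=["A. Author"],
    ),
]

# The subtype is handled by existing code without any changes.
summaries = [summarize(item) for item in items]
print(summaries)
```

Because `summarize` depends only on fields declared by the base type, adding a new subtype does not require touching it.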

Research

For publications related to BigSemantics (previously known as meta-metadata), see our research page.

Scenarios

Here is an incomplete list of scenarios where you will find BigSemantics useful:

  • Use structured, semantic data from templated pages published by web sites and services;
  • Operate on many heterogeneous, interdependent semantic types (and potentially new ones);
  • Work with many (and potentially new) web sites and services;
  • Exchange heterogeneous semantic data across system boundaries, such as from cloud to mobile devices;
  • Present heterogeneous semantic data to users;
  • ...

Metadata In-Context Expander

Metadata In-Context Expander (MICE) is a JavaScript application built upon BigSemantics. With MICE, you can dynamically extract metadata from supported websites and see it in your browser with a clean interface. You can also click on the plus sign (+) to expand connected metadata in place, without having to go to that page.

The MICE demo also lets you see the wrapper and the structure of the extracted semantic data. It's a great way to get a feel for what BigSemantics can do for you.

For the code of and development with MICE, see BigSemanticsJavaScript.

Getting Started

Prerequisites

  • XML: A basic understanding of XML is needed because meta-metadata is an XML-based language. If needed, please see an XML tutorial.
  • XPath: The ability to form XPath expressions is necessary. These expressions are heavily used to retrieve data from HTML pages. If needed, please see an XPath tutorial.
  • Regular Expressions: Regular expressions are often used in meta-metadata for specifying either URLs or parts of text. If needed, please see a regex tutorial.
  • S.IM.PL: For a deeper understanding of how Meta-Metadata works, it might be helpful to understand some of the core concepts behind S.IM.PL serialization.
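For readers new to regular expressions, the Python sketch below shows one plausible way URL patterns can be used: matching a page URL against a list of patterns to decide which kind of page it is. The patterns, the wrapper names, and the `select_wrapper` function are hypothetical, and this is not a description of how BigSemantics itself selects wrappers.

```python
import re
from typing import Optional

# Hypothetical URL patterns, each paired with a hypothetical wrapper name.
URL_PATTERNS = [
    (re.compile(r"^https?://(www\.)?example\.com/books/\d+$"), "example_book"),
    (re.compile(r"^https?://(www\.)?example\.com/authors/[\w-]+$"), "example_author"),
]


def select_wrapper(url: str) -> Optional[str]:
    """Return the name paired with the first pattern that matches the URL."""
    for pattern, name in URL_PATTERNS:
        if pattern.match(url):
            return name
    return None  # No pattern recognizes this URL.


print(select_wrapper("http://www.example.com/books/42"))
print(select_wrapper("https://example.com/authors/jane-doe"))
print(select_wrapper("http://example.com/about"))
```

Anchoring each pattern with `^` and `$` keeps it from matching URLs that merely contain the expected path somewhere inside them.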

Recommended Tools

These tools are not necessary, but we have found them helpful in authoring meta-metadata. They help when locating information in a page's HTML and forming XPath expressions:

  • If you use Google Chrome, we recommend using its built-in Developer Tools (accessible from the menu).
    • For XPath authoring, we recommend using $x("XPATH-EXPRESSION-HERE") in its JavaScript console, or the XPath Helper extension.
  • If you use Firefox, we recommend the Firebug add-on to help identify the parts of the HTML code which contain the information you need.

Tutorials

Use Pull Requests to share your wrappers with the world!

Code Samples

Code samples can be found in the BigSemanticsSDK project.

Specifications

Support

For general support, please contact us at [email protected].