Graphia is a framework that extracts structured data graphs from factual, unstructured text. Rather than extracting simple relations or committing to a specific conceptual model, Graphia aims to extract graphs that can represent the complexity of the contexts present in a text.

The graph representation adopted by the framework, Structured Discourse Graphs (SDGs), can be naturally serialized as an entity-centric RDF graph, which makes it easier to integrate and use the graph with other resources and applications. The representation also supports a pay-as-you-go (semantic best-effort) extraction, in which comprehensive extraction is prioritized over accuracy and the quality of the extracted graph evolves over time.
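
To make the idea of an entity-centric RDF serialization more concrete, here is a minimal sketch in Python. It is not Graphia's API or output format: the `http://example.org/` namespace, the `bornIn` and `birthYear` predicates, and the helper functions are all hypothetical, and the sketch uses flat triples only, without modelling the contextual structure that SDGs are meant to capture.

```python
from collections import defaultdict

# Hypothetical namespace for illustration; not Graphia's actual vocabulary.
EX = "http://example.org/"
DBPEDIA = "http://dbpedia.org/resource/"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines.

    Simplified: an object is treated as an IRI if it starts with "http://"
    and as a plain string literal otherwise. Only backslashes and double
    quotes are escaped in literals.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http://"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


def group_by_entity(triples):
    """Entity-centric view: map each subject to its (predicate, object) pairs."""
    view = defaultdict(list)
    for s, p, o in triples:
        view[s].append((p, o))
    return dict(view)


# Toy graph for the sentence "Marie Curie was born in Warsaw in 1867."
triples = [
    (DBPEDIA + "Marie_Curie", EX + "bornIn", DBPEDIA + "Warsaw"),
    (DBPEDIA + "Marie_Curie", EX + "birthYear", "1867"),
]

ntriples = to_ntriples(triples)
by_entity = group_by_entity(triples)
```

Grouping statements by subject is what "entity-centric" means in this sketch: everything known about one entity can be read from a single entry, and the N-Triples lines can be loaded by any standard RDF tool.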

Examples of extracted graphs can be found here.

Features included in the framework:

  1. Structured Discourse Graph extraction and visualization.
  2. Named entity resolution to DBpedia entities.
  3. Co-reference resolution.
  4. Serialization as RDF.