
17 - Axiomatic geometries for text documents

from Part III - Information geometry

Published online by Cambridge University Press:  27 May 2010

Paolo Gibilisco
Affiliation:
Università degli Studi di Roma 'Tor Vergata'
Eva Riccomagno
Affiliation:
Università degli Studi di Genova
Maria Piera Rogantin
Affiliation:
Università degli Studi di Genova
Henry P. Wynn
Affiliation:
London School of Economics and Political Science

Summary

Abstract

High-dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modelling. Typical approaches to modelling such data involve, either explicitly or implicitly, arbitrary geometric assumptions. In this chapter, we consider statistical modelling of non-Euclidean data whose geometry is obtained by embedding the data in a statistical manifold. The resulting models perform better than their Euclidean counterparts on real-world data and draw an interesting connection between Čencov and Campbell's axiomatic characterisation of the Fisher information and the recently proposed diffusion kernels and square root embedding.

Introduction

Geometry is ubiquitous in many aspects of statistical modelling. During the last half century a geometrical theory of statistical inference has been constructed by Rao, Efron, Amari, and others. This theory, commonly referred to as information geometry, describes many aspects of statistical modelling through the use of Riemannian geometric notions such as distance, curvature and connections (Amari and Nagaoka 2000). Information geometry has been mostly involved with the geometric interpretations of asymptotic inference. Focusing on the geometry of parametric statistical families ρ = {ρ_θ : θ ∈ Θ}, information geometry has had relatively little influence on the geometrical analysis of data. In particular, it has largely ignored the role of the geometry of the data space X in statistical inference and algorithmic data analysis.

On the other hand, the recent growth in computing resources and data availability has led to widespread analysis and modelling of structured data such as text and images. Such data does not naturally lie in ℝⁿ, and the Euclidean distance and its corresponding geometry do not describe it well.
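As a rough illustration of why Euclidean distance can be a poor fit for text, the sketch below treats a document as a vector of term counts normalised onto the multinomial simplex. It compares Euclidean distance with the geodesic distance induced by the Fisher information metric, computed through the square root embedding mentioned in the abstract. The function names and toy counts are hypothetical and are not taken from the chapter; the closed form 2·arccos(Σ √(pᵢqᵢ)) is the standard expression for the multinomial simplex, used here under that assumption.

```python
import math


def to_simplex(counts):
    """Normalise raw term counts to relative frequencies (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]


def euclidean_distance(p, q):
    """Ordinary Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fisher_geodesic_distance(p, q):
    """Geodesic distance on the multinomial simplex under the Fisher metric.

    The map p -> 2*sqrt(p) sends the simplex onto part of a sphere of
    radius 2, so the distance is the great-circle arc length
    2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against rounding pushing the value slightly above 1.
    return 2.0 * math.acos(min(1.0, inner))


# Toy two-term "documents" given as term counts (hypothetical data).
a, b = to_simplex([5, 5]), to_simplex([4, 6])    # interior of the simplex
c, d = to_simplex([1, 9]), to_simplex([0, 10])   # near the boundary

print(euclidean_distance(a, b), euclidean_distance(c, d))
print(fisher_geodesic_distance(a, b), fisher_geodesic_distance(c, d))
```

Both pairs are the same Euclidean distance apart, but the Fisher geodesic distance is larger for the pair near the boundary: it treats a term going from rare to absent as a bigger change than the same absolute shift between two common terms.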

Type
Chapter
Information
Publisher: Cambridge University Press
Print publication year: 2009
