27 results in Fundamentals of Computer Vision
by Wesley E. Snyder (North Carolina State University) and Hairong Qi (University of Tennessee). Published online 25 October 2017; print publication 28 September 2017.

Author Index
- pp. 369-372
9 - Parametric Transforms
- from Part III - Image Understanding
- pp. 200-217
6 - Noise Removal
- from Part II - Preprocessing
- pp. 92-118
Summary
To change and to change for the better are two different things.
– German proverb

Introduction
In a photosensitive device such as a phototransistor or charge-coupled device, an incident photon of light may (probabilistically) generate an electronic charge. The number of charges produced should be proportional to the photons per second striking the device. However, the presence of heat (anything above absolute zero) will also randomly produce charges, and therefore signal. Such a signal is called dark current because it is a signal that is produced by a camera, even in the dark. Dark current is one of several phenomena that result in random fluctuations in the output of a camera, which we call noise. The nature of noise is closely related to the type of sensor. For example, devices that count emissions of radioactive particles are corrupted by a noise that has a Poisson distribution rather than the Gaussian noise of dark current.
In this chapter, techniques are developed that remove noise and degradations so that features can be derived more cleanly for segmentation. We will introduce each topic in one dimension, to allow the student to better understand the process, and then extend that concept to two dimensions. This is covered in the following sections:
• (Section 6.2) The noise in the image can be reduced simply by smoothing. However, the smoothing process also blurs the edges. This section introduces the subject of reducing the noise while at the same time preserving edges, i.e., edge-preserving smoothing.
• (Section 6.3) An intuitive idea in designing edge-preserving smoothing is that the smoothing should be weighted according to the local image data where it is applied: the weight should be large if two pixels are close spatially and have similar photometric values, and small otherwise. The bilateral filter is an algorithm that realizes this ad hoc “good idea” (see the sketch after this list).
• (Section 6.4) Diffusion poses the denoising problem as the solution to a partial differential equation (PDE). The challenge is to find a PDE that causes blurring everywhere except at edges.
• (Section 6.5) The Maximum A Posteriori probability (MAP) algorithm is discussed to show how to formulate noise removal as a minimization problem. Here, the challenge is to find an objective function whose minimum is the desired result.
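To make the bilateral filter of Section 6.3 concrete, here is a minimal C++ sketch, assuming a grayscale image stored row-major in a std::vector<float>; the function name, parameter choices, and truncation of the kernel at two spatial sigmas are our illustrative assumptions, not code from the book.

```cpp
#include <cmath>
#include <vector>

// Bilateral filter: each output pixel is a weighted average of its
// neighbors, where a neighbor's weight is large only when it is BOTH
// close spatially AND similar photometrically to the center pixel.
std::vector<float> bilateralFilter(const std::vector<float>& img,
                                   int rows, int cols,
                                   float sigmaS,   // spatial scale
                                   float sigmaR) { // photometric scale
    std::vector<float> out(img.size());
    const int radius = static_cast<int>(2.0f * sigmaS); // truncate kernel
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            const float center = img[r * cols + c];
            float sum = 0.0f, wsum = 0.0f;
            for (int dr = -radius; dr <= radius; ++dr)
                for (int dc = -radius; dc <= radius; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                        continue;                // skip out-of-bounds pixels
                    const float v = img[rr * cols + cc];
                    const float wS = std::exp(-(dr * dr + dc * dc) /
                                              (2.0f * sigmaS * sigmaS));
                    const float wR = std::exp(-(v - center) * (v - center) /
                                              (2.0f * sigmaR * sigmaR));
                    sum  += wS * wR * v;
                    wsum += wS * wR;
                }
            out[r * cols + c] = sum / wsum; // wsum > 0: center has weight 1
        }
    return out;
}
```

Across an edge, the photometric weight wR collapses to nearly zero, so pixels on the far side contribute almost nothing; this is what smooths noise while preserving edges.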
A - Support Vector Machines
- pp. 353-358
1 - Computer Vision, Some Definitions, and Some History
- from Part I - Preliminaries
- pp. 3-10
Summary
No object is mysterious. The mystery is your eye.
– Elizabeth Bowen

Introduction
There are two fundamentally different philosophies concerning understanding the brain. (1) Understand the brain first. If we can understand how the brain works, we can build smart machines. (2) Using any technique we can think of, make a smart machine. If we can accomplish that, it will give us some hints about how the brain works. This book is all about the second approach, although it draws from current understanding of biological computing. In this chapter, however, we define a few terms, introduce the greater local-global problem, and then give a very brief introduction to the function of the mammalian brain.
• (Section 1.2) From a signals and systems perspective, we describe the differences between Computer Vision and some other closely related fields of study, e.g., image processing and pattern recognition.
• (Section 1.3) Since almost all problems in Computer Vision involve the issue of localness versus globalness, we briefly explain the “local-global” problem and the “consistency” principle used to solve this problem.
• (Section 1.4) Computer Vision is deeply rooted in biological vision. Therefore, in this section, we discuss the biological motivation of Computer Vision and some amazing discoveries from the study of the human visual system.
Some Definitions
Computer Vision is the process whereby a machine, usually a digital computer, automatically processes an image and reports “what is in the image.” That is, it recognizes the content of the image. For example, the content may be a machined part, and the objective may be not only to locate the part but to inspect it as well.
Students tend to get confused by other terms that often appear in the literature, such as Image Processing, Machine Vision, Image Understanding, and Pattern Recognition.
We can divide the entire process of Image Processing into Low-Level Image Processing and High-Level Image Processing. Interpreted from a signals and systems perspective, their differences and similarities are most clearly described by the form of the system's input and output. When a Low-Level Image Processing system processes an input image, the output is still an image, but a somewhat different image. For example, it may be an image with noise removed, an image that does not take as much storage space as the input image, an image that is sharper than the input image, etc.
12 - Relating to Three Dimensions
- from Part IV - The 2D Image in a 3D World
- pp. 303-349
Summary
I've always been passionate about geometry and the study of three-dimensional forms.
– Erno Rubik

Introduction
Most of the images we see are projections of surfaces in the three-dimensional world around us. They result from light reflected off surfaces in that 3D world, passing through the lens of a camera, and intersecting the focal plane of the camera. Images that result from light reflected in this way are called “luminance” or “brightness” images in this book.
What was not described earlier in this book is the relationship between the 3D world and 2D images, including the matching of one to another. We begin by reviewing the geometry of a simple projective camera and relating it to the positions of points in three dimensions. What we would really like is a range image of every scene, but that isn't always feasible, so we look at several aspects of the 2D–3D relationship:
• (Section 12.2) It is necessary to determine the 3D position of a point in space that is seen by two cameras, assuming the two cameras are known. This is the problem commonly known as stereopsis. When we say a camera is “known,” we mean we know where it is, which way it is pointing, and all its internal parameters, such as focal length and resolution. (A triangulation sketch for the simplest, rectified case follows this list.)
• (Section 12.3) Actually it is not really necessary to know all about the cameras. Almost all the relevant information about both cameras can be determined if there are several points in each camera view that can be put into correspondence. An adequate solution to this problem leads to a wonderful little matrix called the fundamental matrix that contains all we need to know for the two-camera problem. Underlying this work is the correspondence problem, the problem of identifying which point in one image corresponds to which point in the other image. Finding a robust solution to the correspondence problem may be difficult.
• (Section 12.4) Once we have an approach to the correspondence problem, we can do partial matching of images and image stitching.
• (Section 12.5) If, instead of two cameras, we have one camera and a controllable light source, we can still find the 3D location of points in space. We address controllable lighting and how to achieve range imaging in this context.
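As a small illustration of Section 12.2, here is a sketch of triangulation in the simplest “known cameras” setting: two identical, rectified cameras separated by a horizontal baseline. The names and the pixel coordinates measured from the principal point are our assumptions, not the book's notation.

```cpp
struct Point3 { double x, y, z; };

// xl, xr: column of the same scene point in the left and right images,
// in pixels, measured from the principal point; yl: its row in the left
// image. f: focal length in pixels; B: baseline between camera centers.
Point3 triangulateRectified(double xl, double yl, double xr,
                            double f, double B) {
    const double d = xl - xr;    // disparity: larger for nearer points
    const double Z = f * B / d;  // depth, by similar triangles
    return { xl * Z / f,         // back-project through the left camera
             yl * Z / f,
             Z };
}
```

The general, unrectified two-camera case replaces these similar-triangle formulas with the fundamental-matrix machinery of Section 12.3.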
3 - Review of Mathematical Principles
- from Part I - Preliminaries
- pp. 16-38
Summary
Practical problems require good math.
– R. Chellappa

Introduction
This chapter is a review of several topics that are prerequisite for use of this book as a text. The student should have had an undergraduate calculus experience equivalent to about three semesters, and some exposure to differential equations and partial differential equations. The student should have had coursework covering concepts from probability and statistics, including prior probabilities, conditional probability, Bayes’ rule, and expectations. Finally, and very importantly, the student should have strong undergraduate-level training in linear algebra.
This chapter reviews and refreshes many of the concepts in those courses, but only as a review, not as a presentation of totally new material.
• (Section 3.2) We briefly review important concepts in linear algebra, including various vector and matrix operations, the derivative operators, eigendecomposition, and its relationship to singular value decomposition.
• (Section 3.3) Since almost all Computer Vision topics can be formulated as minimization problems, in this section we briefly introduce function minimization, and discuss gradient descent and simulated annealing, two minimization techniques that lead to local and global minima, respectively (a gradient-descent sketch follows this list).
• (Section 3.4) In Computer Vision, we are often interested in the probability of a certain measurement occurring. In this section, we briefly review concepts such as probability density functions and probability distribution functions.
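As a pocket illustration of the minimization theme of Section 3.3, here is a one-dimensional gradient-descent sketch; the function name, fixed step size, and stopping rule are our own assumptions.

```cpp
#include <cmath>
#include <functional>

// Follow the negative gradient downhill until it (nearly) vanishes.
double gradientDescent(const std::function<double(double)>& dfdx,
                       double x0, double step, int maxIter) {
    double x = x0;
    for (int k = 0; k < maxIter; ++k) {
        const double g = dfdx(x);        // slope at the current estimate
        if (std::fabs(g) < 1e-8) break;  // near a stationary point
        x -= step * g;                   // move against the gradient
    }
    return x; // a LOCAL minimum; finding global minima needs, e.g.,
              // simulated annealing
}

// Example: f(x) = (x - 3)^2 has derivative 2(x - 3), so
// gradientDescent([](double x){ return 2.0 * (x - 3.0); }, 0.0, 0.1, 1000)
// converges to x = 3.
```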
A Brief Review of Linear Algebra
In this section, we very briefly review vector and matrix operations. Generally, we denote vectors in boldface lowercase, scalars in lowercase italic Roman, and matrices in uppercase Roman.
Vectors
Vectors are always considered to be column vectors. If we need to write one horizontally to save space in a document, we use transpose notation. For example, we denote a vector that consists of three scalar elements as $\mathbf{x} = [x_1, x_2, x_3]^T$.
The Inner Product
The inner product of two vectors is a scalar, $c = \mathbf{x}^T\mathbf{y}$. Its value is the sum of products of the corresponding elements of the two vectors: $c = \mathbf{x}^T\mathbf{y} = \sum_i x_i y_i$.
You will also sometimes see the notation <x,y> used for inner product. We do not like this because it looks like an expected value of a random variable. One sometimes also sees the “dot product” notation x · y for inner product.
Preface
- pp. xiii-xiv
Summary
This book introduces the fundamental principles of Computer Vision to the advanced undergraduate or first-year graduate student in mathematics, computer science, or engineering.
The book is deliberately informal. The authors attempt to keep the student interested and motivated to continue reading. The student is often addressed directly, as if the student and the authors were in a classroom together. The style is somewhat casual, the authors use the first person frequently, the passive voice is seldom used, and the occasional joke may appear.
The foundations described in the title of this book take two forms: mathematical principles and algorithmic concepts. The principles and concepts are taught together, by describing a computer vision problem, e.g., segmentation, describing an algorithm that could solve that problem, and explaining the mathematical principles underlying the algorithm.
These mathematical principles include:
• Linear Operators: taught through a variety of applications, including:
  • Basis Functions: taught through edge detectors
  • Gaussian Convolution: taught through the development of Gaussian edge kernels
  • Constrained Optimization: taught through finding optimal edge detection kernels and through principal components
  • The Pseudoinverse: taught through explaining photometric stereo
  • Scale: taught through the development of Gaussian edge kernels
• Nonlinear Operators: taught through mathematical morphology
• Effects of Sampling: taught by illustrating the use of small kernels for determining orientation
• Use of Optimization: taught through the development of noise removal algorithms, adaptive contours for segmentation, and graph cuts
• Use of Consistency: taught through Hough-like algorithms, shape matching, and projection onto a manifold
• Projective Geometry: taught through shape from motion and shape from X algorithms
These concepts are organized in a progression of levels, moving from pixel-level operations such as noise removal, through edge detection, segmentation, and shape description, and finally to recognition. At each level in the progression, the student learns what specific terminology means, what a good application of the concept to an image does, and one or more approaches to solving the application problem.
Almost all the images used as examples in the book are available for download. When one of these figures is used, the figure caption gives the name of the image.
B - How to Differentiate a Function Containing a Kernel Operator
- pp. 359-361
10 - Representing and Matching Shape
- from Part III - Image Understanding
- pp. 218-266
Summary
Shape is what is left when the effects associated with translation, scaling, and rotation are filtered away.
– David Kendall

Introduction
In this chapter, we assume a successful segmentation, and explore the question of characterizing the resulting regions. We begin by considering two-dimensional regions denoted by each pixel in the region having value 1 and all background pixels having value 0. We assume only one region is processed at a time, since in studying segmentation, we learned how to realize these assumptions.
When thinking about shapes, and measures for shapes, it is important to keep in mind that certain measures may have invariance properties. That is, a measure may remain the same if the object is (for example) rotated. Consider the height of a person in a picture – if the camera rotates, the apparent height of the person will change, unless, of course, the person rotates with the camera.
The common transformations considered in this chapter are those described by some linear operation on the shape matrix, discussed in section 4.2.3.
In the remainder of this chapter, we will be constantly thinking about operations that exhibit various invariances, and can be used to match shapes of regions.
• (Section 10.2) To understand invariances, we must first understand the deformations that may alter the shape of a region, and most of those can be described by matrix operations.
• (Section 10.3) One particularly important matrix is the covariance matrix of the distribution of points in a region, since the eigenvalues and eigenvectors of that matrix describe shape in a very robust way (see the sketch after this list).
• (Section 10.4) In this section, we introduce some important features used to describe a region. We start from some simple properties of a region like perimeter, diameter, and thinness. We then extend the discussion to some invariant features (to various linear transformations) like moments, chain codes, Fourier descriptors, and the medial axis.
• (Section 10.5) Since, in earlier sections, we have represented the regions by sets of numbers called features, in this section, we discuss how to match such sets of numbers.
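To illustrate the point of Section 10.3, here is a sketch that accumulates the covariance matrix of the pixel coordinates of a binary region (assumed non-empty, stored row-major); the eigenvectors of this matrix give the region's principal axes. All names and the layout are our own assumptions.

```cpp
#include <vector>

// Two-pass computation of the 2x2 covariance of a binary region's pixels.
void regionCovariance(const std::vector<int>& img, int rows, int cols,
                      double cov[2][2]) {
    double n = 0, mr = 0, mc = 0;
    for (int r = 0; r < rows; ++r)           // pass 1: centroid
        for (int c = 0; c < cols; ++c)
            if (img[r * cols + c] == 1) { n += 1; mr += r; mc += c; }
    mr /= n; mc /= n;
    double srr = 0, scc = 0, src = 0;
    for (int r = 0; r < rows; ++r)           // pass 2: central moments
        for (int c = 0; c < cols; ++c)
            if (img[r * cols + c] == 1) {
                srr += (r - mr) * (r - mr);
                scc += (c - mc) * (c - mc);
                src += (r - mr) * (c - mc);
            }
    cov[0][0] = srr / n;  cov[1][1] = scc / n;
    cov[0][1] = cov[1][0] = src / n;
    // The major-axis angle from the column axis is
    // 0.5 * atan2(2 * cov[0][1], cov[1][1] - cov[0][0]).
}
```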
13 - Developing Computer Vision Algorithms
- from Part IV - The 2D Image in a 3D World
- pp. 350-352
Summary
Education is what remains after one has forgotten everything one learned in school.
– Albert Einstein

We conclude the book with a look back at how one develops good algorithms to solve Computer Vision problems. The steps are listed in order, as much as possible, with reference to materials and examples in the book.
The approach is explained through an example: Suppose you need to develop an algorithm to construct a panorama from two overlapping images. This requires that you find some way to transform coordinates in one image to appropriate coordinates in the next image, as described in Chapter 12. (See Figure 13.1.)
Know the literature If you are working on a problem that you think is important, chances are somebody else has also worked on it. Since the advent of Web searching, it has become easier to search, but don't limit your searches to Google. It's OK if you spend several days looking in the library. Looking at actual paper copies of old journal papers isn't necessarily a bad thing. After all, you plan to spend months or years on this project. Some papers were written before Web searching [13.2]. Often, you can be fortunate enough to find that someone else has done a good literature survey for you. For example, in the topic of the shape from motion, [13.3] and [13.1] thoroughly teach the relevant material.
Form an Objective Function In Chapter 6, we saw one example of finding an image that is the solution to an optimization problem. We found images that resembled the input image but also had some other desirable properties. In Chapter 5, we derived convolution kernels by finding best fits to data. In Chapter 12, objective functions were explicitly used twice.
In some rare instances, it isn't possible to set up a problem in terms of an optimization problem, but it's a good way to start.
Do the Math Correctly Once you have set up an objective function, you need to minimize it. For that, you often need a gradient. Can you find the gradient analytically, as in section 6.5.1, or must you use numerical methods to estimate it (a sketch follows)? In Chapter 12, we solved for a homography by using two images. But even if the math is correct, is it a valid approach?
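When the gradient cannot be found analytically, a central difference is a standard numerical estimate. A minimal sketch, with our own names and step size:

```cpp
#include <functional>

// Central-difference estimate of df/dx. Its error shrinks as h^2,
// versus only h for the one-sided difference (f(x + h) - f(x)) / h.
double numericalDerivative(const std::function<double(double)>& f,
                           double x, double h = 1e-5) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```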
2 - Writing Programs to Process Images
- from Part I - Preliminaries
- pp. 11-15
Summary
Computer Science is not about computers any more than astronomy is about telescopes.
– E.W. Dijkstra

Introduction
One may take two approaches to writing software for image analysis, depending on what one is required to optimize. One may write in a style that optimizes/minimizes programmer time, or one may write to minimize computer time. In this course, computer time will not be a concern (at least not usually), but your time will be far more valuable. For that reason, we want to follow a programming philosophy that produces correct, operational code in a minimal amount of programmer time. The programming assignments in this book are specified to be written in C or C++, rather than in MATLAB or JAVA. This is a conscious and deliberate decision. MATLAB in particular hides many of the details of data structures and data manipulation from the user. Most of the time, that's a good thing. However, in the course of teaching variations of this course for many years, the authors have found that many of those details are precisely the details that students need to grasp in order to effectively understand what image processing (particularly at the pixel level) is all about.
In this book, at least initially, we want the students to write code that works at the pixel level, so they come to understand what the computer is really doing. Later in the course, the student will slowly move up in levels of abstraction. We point the reader to [2.2, 2.1] for texts that emphasize the use of MATLAB.
Basic Programming Structure for Image Processing
Images may be thought of as two- or three-dimensional arrays. They are usually processed pixel by pixel in a raster scan. To manipulate an image, two or three nested for-loops are the most commonly used programming structure, as shown in Figures 2.1 and 2.2 and sketched below.
In these examples, we use two or three integers (row, col, and frame) as the indices to the row, column, and frame of the image. By increasing row, col, and frame in steps of one, we scan the image pixel by pixel from left to right, top to bottom, frame by frame.
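A sketch of that loop structure (our own rendering, not the book's Figure 2.1/2.2 listing), with illustrative array sizes and a placeholder operation that inverts each pixel:

```cpp
const int FRAMES = 10, ROWS = 128, COLS = 128; // illustrative sizes
unsigned char image[FRAMES][ROWS][COLS];

// Raster scan: left to right, top to bottom, frame by frame.
void processAllPixels() {
    for (int frame = 0; frame < FRAMES; ++frame)
        for (int row = 0; row < ROWS; ++row)
            for (int col = 0; col < COLS; ++col)
                image[frame][row][col] =
                    255 - image[frame][row][col]; // per-pixel operation
}
```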
Contents
- pp. vii-xii
7 - Mathematical Morphology
- from Part II - Preprocessing
- pp. 119-146
Summary
I think the universe is pure geometry – basically, a beautiful shape twisting around and dancing over space-time.
– Antony Garrett Lisi

Introduction
In this chapter, a study of shape begins. Throughout the remainder of this book we will continue to look at shape in many different ways. Here, the topic is introduced at the pixel level, providing some simple, local operations that modify shapes, changing connectivity, and providing nonlinear ways to treat noise.
• (Section 7.2) Binary morphology changes the shape of binary objects. We define the two basic morphological operators, dilation and erosion, on the basis of which two other operators, opening and closing, can be derived. We also discuss some interesting properties of these morphological operators and their applications in noise removal and edge linking.
• (Section 7.3) We extend the discussion to grayscale morphology and define the dilation and erosion operations when the objects are not described in binary values. We progress from grayscale morphology using flat structuring elements to the use of grayscale structuring elements.
• (Section 7.4) As one of the most important applications of morphological operations, we discuss the distance transform (DT) and different ways to compute the DT.
Morphological operations may be considered as either the result of set operations, or the result of composition of functions. Both these approaches are described in this chapter, in order to provide the student with the insight to see similarities and suitability. We will find that set-theoretic ways of thinking about morphology are most appropriate for binary images, and functional notation is more convenient for grayscale images. In section 7.5, an application example of combining dilation, erosion, and distance transform for the purpose of edge linking is provided to illustrate the power of these operations.
Binary Morphology
In this section, we first introduce two basic morphological operators, dilation and erosion (a dilation sketch follows). We then define opening and closing, two operators developed from dilation and erosion. We also discuss interesting properties of these operators and their applications in solving computer vision problems.
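A minimal sketch of binary dilation with a 3 × 3 square structuring element, under our own layout assumptions (a row-major vector of 0/1 values); erosion is the dual, requiring every pixel under the element to be set.

```cpp
#include <vector>

// A pixel of the output is set if ANY pixel under the 3x3 structuring
// element, centered there, is set in the input.
std::vector<int> dilate3x3(const std::vector<int>& img, int rows, int cols) {
    std::vector<int> out(img.size(), 0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            for (int dr = -1; dr <= 1 && !out[r * cols + c]; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr >= 0 && rr < rows && cc >= 0 && cc < cols &&
                        img[rr * cols + cc] == 1) {
                        out[r * cols + c] = 1;  // hit: a neighbor is set
                        break;
                    }
                }
    return out;
}
// Opening = erosion then dilation; closing = dilation then erosion.
```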
8 - Segmentation
- from Part III - Image Understanding
- pp. 149-199
Summary
The partition between the sage and the fool is more slender than the spider web.
– Kahlil Gibran

Introduction
Segmentation is the process of separating objects from background. It is the foundation for all the subsequent processes like shape analysis and object recognition.
A segmentation of a picture is a partitioning into connected regions, where each region is homogeneous in some sense and is identified by a unique label. For example, in Figure 8.2 (a “label image”), region 1 is identified as the background. Although region 4 is also background, it is labeled as a separate region, since it is not connected to region 1.
The term “homogeneous” deserves some discussion. It could mean all the pixels are of the same brightness, but that criterion is too strong for most practical applications. It could mean that all pixels are close to some representative (mean) brightness. Stated more formally [8.56], a region is homogeneous if the brightness values are consistent with having been generated by a particular probability distribution. In the case of range imagery [8.31] where we (might) have an equation that describes the surface, we could say a region is homogeneous if it can be described by the combination of that equation and some probabilistic deformation. For example, if all the points in a region of a range image lie in the same plane except for deviations whose distance from the plane may be described by a particular Gaussian distribution, one could say this region is homogeneous.
In this chapter, several ways to perform segmentation are discussed. We progress through problems and methods of increasing complexity:
• (Section 8.2) Threshold-based techniques are guaranteed to form closed regions because they simply assign all pixels above (or below, depending on the problem) a specified threshold to the same region. However, using a single threshold only allows regions to be classified as “foreground” and “background” (see the sketch after this list).
• (Section 8.3) Brightness segmentation into two classes is the simplest case. It becomes more challenging when color images are considered and similarities of colors must be considered.
• (Section 8.4) Another level of complexity occurs when it is necessary to determine whether two apparently separate regions are really one region. That is, do they touch? For this, we need connected components.
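A minimal sketch of the single-threshold case of Section 8.2, with our own names and layout assumptions:

```cpp
#include <cstddef>
#include <vector>

// Label every pixel at or above the threshold as foreground (1),
// everything else as background (0).
std::vector<int> thresholdSegment(const std::vector<float>& img,
                                  float threshold) {
    std::vector<int> label(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        label[i] = (img[i] >= threshold) ? 1 : 0;
    // Distinguishing separate foreground regions still requires
    // connected-component labeling (Section 8.4).
    return label;
}
```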
C - The Image File System (IFS) Software
- pp. 362-368
Summary
The objective of quickly writing good imaging software can be accomplished by using the image access subroutines in IFS. IFS is a collection of subroutines and applications based on those subroutines that support the development of image processing software in C and C++.
Advantages of IFS
Advantages of IFS include
• IFS supports any data type, including char, unsigned char, short, unsigned short, int, unsigned int, float, double, complex float, complex double, complex short, and structure.
• IFS supports any image size, and any number of dimensions. One may do signal processing by simply considering a signal as a one-dimensional image.
• IFS is available on most current computer systems, including Windows on the PC, Linux on the PC, and OS-X on the Macintosh. Files written on one platform may be read on any of the other platforms. Conversion to the format native to the platform is done by the read routine, without user intervention.
• A large collection of functions is available, including two-dimensional Fourier transforms, filters, segmenters, etc.
The IFS Header Structure
All IFS images include a header that contains various items of information about the image, such as the number of points in the image, the number of dimensions for the image, the data format, the units and scan direction of each dimension, and so on. Also associated with the image is the actual data for the image. The image header includes a pointer to the image data. The user manipulates an image by calling some function in the IFS library; one of the arguments to the function will be the address of the header. From the information in the header, the IFS library functions automatically determine where the data is and how to access it. In addition to accessing data in images, the IFS routines automatically take care of allocating space in memory to store data and headers. Everything is totally dynamic in operation; there are no fixed-dimension arrays. This relieves the user of the difficulties involved with accessing data in arrays, when the arrays are not of some fixed size.
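The actual IFS header layout is not reproduced here; the struct below is a hypothetical illustration of the idea the text describes, not the real IFS declaration: a header records dimensionality and data format, and points at dynamically allocated pixel data, so library routines can locate and interpret the data without fixed-size arrays.

```cpp
// Hypothetical header, for illustration only -- NOT the real IFS struct.
struct ImageHeader {
    int nDimensions;        // e.g., 2 for a still image, 3 for a sequence
    int dimensionSize[8];   // number of points along each dimension
    int dataFormat;         // tag identifying char, float, complex, ...
    void* data;             // dynamically allocated pixel storage
};
// A library routine receives the header's address, reads nDimensions,
// dimensionSize, and dataFormat, and follows data to reach the pixels.
```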
Dedication
- pp. v-vi
11 - Representing and Matching Scenes
- from Part III - Image Understanding
- pp. 267-300
Summary
One of these things is not like the other.
– Sesame Street

Introduction
In this chapter, rather than matching regions as we did in Chapter 10, we consider issues associated with matching scenes.
Matching at this level establishes an interpretation. That is, it puts two representations into correspondence:
• (Section 11.2) In this section, both representations may be of the same form. For example, correlation matches an observed image with a template, an approach called template matching. Eigenimages are another representation for images, one that uses the concepts of principal components to match images.
• (Section 11.3) When matching scenes, we don't really want to match every single pixel, but only do matching at points that are “interesting.” This requires a definition for interest points.
• (Sections 11.4, 11.5, and 11.6) Once the interest points are identified, these sections develop three methods, SIFT, SKS, and HoG, for describing the neighborhood of the interest points using descriptors and then matching those descriptors.
• (Section 11.7) If the scene is represented abstractly, by nodes in graphs, methods are provided for matching graphs.
• (Sections 11.8 and 11.9) In these sections, two other matching methods, including deformable templates, are described.
As we investigate matching scenes, or components of scenes, a new word is introduced, descriptor. This word denotes a representation for a local neighborhood in a scene, a neighborhood of perhaps 200 pixels, larger than the kernels we have thought about, but smaller than templates. The terms kernels, templates, and descriptors, while they do connote size to some extent, are really describing how this local representation is used, as the reader will see.
Matching Iconic Representations
Matching Templates to Scenes
Recall that an iconic representation of an image is an image, e.g., a smaller image, an image that is not blurred, etc. In this section, we need to match two images.
A template is a representation for an image (or sub-image) that is itself an image, but almost always smaller than the original. A template is typically moved around the target image until a location is found that optimizes some match function. The most obvious such function is the sum squared error, sometimes referred to as the sum-squared difference (SSD),

$$SSD(x, y) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \left( f(x+i,\, y+j) - T(i, j) \right)^2,$$

which provides a measure (smaller is better) of how well the template $T$ matches the image $f$ at point $(x, y)$, assuming the template is $N \times N$.
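A minimal sketch of SSD template matching as just defined: slide the N × N template T over the image f and keep the location with the smallest sum-squared difference. The names and row-major layout are our own assumptions.

```cpp
#include <limits>
#include <vector>

// Exhaustive search for the window of f that best matches T under SSD.
void bestMatchSSD(const std::vector<float>& f, int rows, int cols,
                  const std::vector<float>& T, int N,
                  int& bestRow, int& bestCol) {
    double best = std::numeric_limits<double>::max();
    bestRow = bestCol = 0;
    for (int r = 0; r + N <= rows; ++r)       // top-left corner of window
        for (int c = 0; c + N <= cols; ++c) {
            double ssd = 0.0;
            for (int i = 0; i < N; ++i)       // accumulate squared error
                for (int j = 0; j < N; ++j) {
                    const double d =
                        f[(r + i) * cols + (c + j)] - T[i * N + j];
                    ssd += d * d;
                }
            if (ssd < best) { best = ssd; bestRow = r; bestCol = c; }
        }
}
```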
4 - Images: Representation and Creation
- from Part I - Preliminaries
- pp. 39-48
Summary
Computers are useless. They can only give us answers.
– Pablo Picasso

Introduction
Since you have already had a course in image processing, it should not be necessary to describe how images are formed. Representations, however, are another matter. This chapter discusses various image representation schemes as well as a way of treating images as surfaces.
• (Section 4.2) In this section, we discuss mathematical representations both for the information contained in an image and for the ways in which images are stored and manipulated in a digital machine.
• (Section 4.3) In this section, we introduce a way of thinking about images – as surfaces with varying height – which we will find to be a powerful way to describe both the properties of images as well as operations on those images.
Image Representations
In this section, several ways to represent the information in an image are discussed. These representations include: iconic, functional, linear, probabilistic, and graphical representations. Note that in a digital image, the first dimension is columns and the second is rows. In a 3D digital image, the dimensions are columns, rows, and frames.
Iconic Representations (An Image)
An iconic representation of the information in an image is an image. Yeah, right; and a rose is a rose is a rose. When you see what we mean by functional, linear, and relational representations, you will realize we need a word for a representation that is itself a picture. In the following, we briefly describe 2D, 3D, and range images.
• 2D images: The familiar 2D image is the brightness image, also called the luminance image. These include photographs: the things you are used to calling “images” or “pictures.” These might be color or grayscale. (Be careful with the words “black and white,” as that might be interpreted as “binary.”) A shadow is a 2D binary image. We usually denote the brightness at a point (x, y) as f(x, y). Note: x and y could be either real numbers or integers. In the integer case, we are referring to discrete points in a sampled image. These points are called “pixels,” short for “picture elements.” In the case of real numbers, we are usually thinking of the image as a function.
Subject Index
- pp. 373-377