Computer Vision

Joshua Gregory

Notes intro
Introduction
Lecture 1: Eye and Human Vision
The human eye
Optics
Spectral responses
Mach bands
Neural processing
Lecture 2: Image Formation
Decomposition
Resolution
Fourier Transform
What the Fourier Transform actually does
A pulse and its Fourier transform
Reconstructing signal from its Fourier Transform
Magnitude and phase of the Fourier transform of a pulse
Lecture 3: Image Sampling
Aliasing in Sampled Imagery
Aliasing
Sampling Signals
Wheels Motion
Sampling Theory
Transform Pair from Sampled Pulse
2D Fourier Transform
Reconstruction (non examinable)
Shift Invariance
Rotation Invariance
Filtering
Other Transforms
Applications of 2D Fourier Transform
Lecture 4: Point Operators
Image Histograms
Brightening an Image
Intensity Mappings
Exponential/Logarithmic Point Operators (non examinable)
Intensity Normalisation
Intensity normalisation and histogram equalisation
Histogram Equalisation
Applying intensity normalisation and histogram equalisation
Thresholding an eye image
Thresholding: Manual vs Automatic
Lecture 5: Group Operators
Template Convolution
3x3 template and weighting coefficients
3x3 averaging operator
Illustrating the effect of window size
Template convolution via the Fourier transform
2D Gaussian function
2D Gaussian template
Applying Gaussian averaging
Finding the median from a 3x3 template
Newer stuff (non-examinable)
Applying non local means
Even newer stuff: Image Ray Transform
Applying Image Ray Transform
Comparing operators
Lecture 6: Edge Detection
Edge detection
First order edge detection
Edge detection maths
Templates for improved first order difference
Edge Detection in Vector Format
Templates for Prewitt operator
Applying the Prewitt Operator
Templates for Sobel operator
Applying Sobel operator
Generalising Sobel
Generalised Sobel (non examinable)
Lecture 7: Further Edge Detection
Canny edge detection operator
Interpolation in non-maximum suppression
Hysteresis thresholding transfer function
Action of non-maximum suppression and hysteresis thresholding
Hysteresis thresholding vs uniform thresholding
Canny vs Sobel
First and second order edge detection
Edge detection via the Laplacian operator
Mathbelts on…
Shape of Laplacian of Gaussian operator
Zero crossing detection
Marr-Hildreth edge detection
Comparison of edge detection operators
Newer stuff - interest detections (non-examinable)
Newer stuff - saliency
Lecture 8: Finding Shapes
Feature extraction by thresholding
Template Matching
Template matching in:
In Noisy images
In Occluded Images
Encore, Monsieur Fourier! (???) (non examinable)
Applying Template Matching (non examinable)
Applying SIFT in ear biometrics (non examinable)
Hough Transform
Applying the Hough transform for lines
Hough Transform for Lines … problems
Images and accumulator space of polar Hough Transform
Applying Hough Transform
Lecture 9: Finding More Shapes
Hough Transform for Circles
Circle Voting and Accumulator Space
Speeding it up
Applying the HT for circles
Integrodifferential operator? (non examinable)
Arbitrary Shapes
R-table Construction
Active Contours (non examinable)
Geometric active contours (non examinable)
Parts-based shape modelling (non examinable)
Symmetry yrtemmyS (non examinable)
Lecture 10 Applications/Deep Learning
Where is computer vision used?
Deep Learning
Conclusions
Lecture 1: Building machines that see
Key terms in designing Computer Vision systems
Robustness
Repeatability
Invariance
Constraints
Constraints in Industrial Vision
Software Constraints
Colour-Spaces (non examinable?)
RGB Colour-space
HSV Colour-space
LAB Colour-space
Physical Constraints
Vision in the wild
Lecture 2: Machine learning for pattern recognition
Feature Spaces
Key terminology
Density and Similarity
Distance in featurespace
Euclidean distance (L2 distance)
Manhattan/Taxicab distance (L1 distance)
Cosine Similarity
Choosing good featurevector representations for machine learning
Supervised Machine Learning: Classification
Linear Classifiers
Non-linear binary classifiers
Multiclass classifiers: KNN
KNN Problems (non examinable?)
Unsupervised Machine Learning: Clustering
K-Means Clustering
Lecture 3: Covariance and Principal Components
Random Variables and Expected Values
Variance
Covariance
Covariance Matrix
Mean Centring
Covariance matrix again
Principal axes of variation
Basis
The first principal axis
The second principal axis
The third principal axis
Eigenvectors and Eigenvalues
Important Equation
Properties
Finding Values
Eigendecomposition
Summary
Ordering
Principal Component Analysis
Linear Transform
Linear Transforms
PCA
PCA Algorithm
Eigenfaces
Making Invariant
Problems
Potential Solution… Apply PCA
Lecture 4: Types of image feature and segmentation
Image Feature Morphology
Global Features
Grid or Block-based Features
Region-based Features
Local Features
Global Features
Image Histograms
Joint-colour histogram
Image Segmentation
What is segmentation?
Global Binary Thresholding
Otsu’s thresholding method
Adaptive / local thresholding
Mean adaptive thresholding
Segmentation with K-Means
Advanced segmentation techniques
Connected Components
Pixel Connectivity
Connected Component
Connected Component Labelling
The two-pass algorithm
Lecture 5: Shape description and modelling
Extracting features from shapes represented by connected components
Borders
Inner Border
Outer Border
Two ways to describe shape
Region Description: Simple Scalar Shape Features
Area and Perimeter
Compactness
Centre of Mass
Irregularity / Dispersion
Moments
Standard Moments
Central Moments
Normalised Central Moments
Boundary Description
Chain Codes
Chain Code Invariance
Chain Code Advantages and Limitations
Fourier Descriptors
Region Adjacency Graphs
Active Shape Models and Constrained Local Models
Lecture 6: Local interest points
What makes a good interest point?
How to find interest points
The Harris and Stephens corner detector
Basic Idea
Harris & Stephens: Mathematics
Structure Tensor
Eigenvalues of the Structure Tensor
Harris & Stephens Response Function
Harris & Stephens Detector
Scale in Computer Vision
The problem of scale
Scale space theory
The Gaussian Scale Space
Nyquist-Shannon Sampling theorem
Gaussian Pyramid
Multi-scale Harris & Stephens
Blob Detection Finally
Laplacian of Gaussian
Scale space LoG
Scale space DoG
DoG Pyramid
Lecture 7: Local features and matching
Local features and matching basics
Local Features
Why extract local features?
Example: Building a panorama
Problem 1:
Problem 2:
Two distinct types of matching problem
Narrow-baseline stereo
Wide-baseline stereo
Two distinct types of matching problem
Robust local description
Descriptor Requirements
Matching by correlation (template matching)
(Narrow baseline) template matching
Problems with wider baselines
Local Intensity Histograms
Use local histograms instead of pixel patches
Local histograms
Overcoming localisation sensitivity
Overcoming lack of illumination invariance
Local Gradient Histograms
Gradient Magnitudes and Directions
Gradient Histograms
Building gradient histograms
Rotation Invariance
The SIFT feature
Adding spatial awareness
SIFT Construction: sampling
SIFT Construction: weighting
SIFT Construction: binning
Matching SIFT features
Euclidean Matching
Improving matching performance
Lecture 8: Consistent matching
Feature distinctiveness
Constrained matching
Geometric Mappings
What are geometric transforms?
Point Transforms
The Affine Transform
Translation
Translation and Rotation
Scaling
Aspect Ratio
Shear
Degrees of Freedom
Affine Transform
Similarity Transform
More degrees of freedom
Homogeneous coordinates
The Planar Homography (Projective Transformation)
Recovering a geometric mapping
Simultaneous equations
Least-squares
Robust Estimation
Problem: Noisy data
Robust estimation techniques
RANSAC: RANdom SAmple Consensus
Further applications of robust local matching
Object recognition & AR
3D reconstruction
Problems with direct local feature matching
Local feature matching is slow!
Efficient Nearest Neighbour Search
K-D Trees
K-D Tree problems
Hashing
Sketching
Lecture 9: Image search and Bags of Visual Words
Text Information Retrieval
The bag data structure
Bag of Words
Text processing (feature extraction)
The Vector-Space Model
Bag of Words Vectors
The Vector-space Model
Searching the VSM
Recap: Cosine Similarity
Inverted Indexes
Computing the Cosine Similarity
Weighting the vectors
Possible weighting schemes
Vector Quantisation
Learning a Vector Quantiser
Vector Quantisation
Visual Words
SIFT Visual Words
Bags of Visual Words
Histograms of Bags of Visual Words
Visualising Visual Words
The effect of codebook size
Content-based Image Retrieval
BoVW Retrieval
Optimal codebook size
Problems with big codebooks
Overall process for building a BoVW retrieval system
Lecture 10: Image classification and auto-annotation
Multilabel classification
Object Detection / Localisation
Slide summary: Challenges in Computer Vision
Aside: Optimal codebook size
Another slide summary: Stuff
Dense Local Image Patches
Dense SIFT
Pyramid Dense SIFT
Spatial Pyramids
Developing and benchmarking a BoVW scene classifier
Evaluation Dataset
Building the BoVW
Training classifiers
Classifying the test set
Evaluating Performance
Lecture 11: Towards 3D vision
Summary Summary
Programming for computer vision & other musings related to the coursework
Writing code for computer vision
Most vision algorithms are continuous
Always work with floating point pixels
Guidelines for writing vision code
Convolution
Aside: phase and magnitude
Aside: Displaying FFTs
Template Convolution
What if you don’t flip the kernel?
Ideal Low-Pass filter
Ideal Low-Pass filter - problems
Gaussian filters - why
Building Gaussian Filters
High-pass filters
Note - Don’t do this!
High-pass filters have a mixture of negative and positive coefficients
Building hybrid images
…is really simple
TL;DR
Part 1: Mark
Part 2: Jon
Lec 1
Lec 2
Lec 3
Lec 4
Lec 5
Lec 6
Lec 7
Lec 8
Lec 9
Lec 10
TL;DR By Mark Towers
Mark (edited by Joshua Gregory)
Jon

Notes intro

Introduction

Images consist of picture elements known as “pixels”.

2D Images are matrices of numbers.

Point Operations: operate on individual pixels, independently of their neighbours (e.g. brightening, thresholding).

Group Operations: combine a pixel with its neighbours to produce each new pixel (e.g. convolution with a template).

Feature Extraction

Applications of Computer Vision

Part 1: Mark

Lecture 1: Eye and Human Vision

Is human vision a good model for computer vision?

The human eye

Diagram of the human eye. Light comes in through the lens and is focused onto the retina. These light impulses are then sent through the optic nerve.

Cones are photopic (colour vision) and rods are scotopic (low-light, effectively black-and-white vision) (Nixon made a mistake on his notes) - Thanks Tim :)

Optics

The image is flipped, just like in a camera. Our brain works this out and we perceive it the right way up.

Spectral responses

The eye is most sensitive to green light and least to blue; the key point is that different colours produce different response levels.

Mach bands

Mach bands are an optical illusion whereby the contrast between edges of slightly differing shades of grey is exaggerated when they are touching.

[Source: Wikipedia]

This and many other optical illusions (edges, static, Benham’s disk... you can look at them in the slides) demonstrate that our eyes are actually not such a reliable vision system.

Neural processing

Lecture 2: Image Formation

What is inside an image?

Decomposition

You can decompose an image into its individual bit-planes (one binary image per bit of each pixel value).

Resolution

Images are also (obviously) less detailed at lower resolutions.

Fourier Transform

Any periodic function is the result of adding up sine and cosine waves of different frequencies.

[This is a good visual intuitive: https://www.youtube.com/watch?v=spUNpyF58BY]

What the Fourier Transform actually does

(To do: explain how a function that deals with (sound) waves can be used on images with pixels. If anyone reading can explain that’d be great)

A: The Fourier transform is just a technique for mapping signals that vary in one dimension to the frequency domain of that dimension. In the case of audio this means transforming air pressure from the time domain to the (temporal) frequency domain. If we consider the intensity of an image to be our signal, which varies in space (let's say along the x axis), then the FT will transform the signal into spatial frequency in the x dimension.

Now, we can generalise the Fourier transform to take signals that are defined in more than one mutually orthogonal dimension, e.g. an image represented by pixel intensity defined in x and y. Now the FT transforms the image into the (spatial) frequency domain, defined in two dimensions (spatial frequency in x, and spatial frequency in the y direction). Hope that helps? - James

A pulse and its Fourier transform

Reconstructing signal from its Fourier Transform

Magnitude and phase of the Fourier transform of a pulse

Re() is the real part and Im() is the imaginary part, so for a complex number z = a + bi, Re(z) = a and Im(z) = b. In the complex plane, you can imagine the imaginary numbers as another number line perpendicular to the real one, with the real part as the other coordinate. So the magnitude is Pythagoras’ theorem on the two parts, |z| = √(Re(z)² + Im(z)²), while the phase is the angle between them, arg(z) = atan2(Im(z), Re(z)). For more information, look up complex numbers. (Thanks Mark ;) )

The magnitude is the scalar amount of the frequency and the phase is its offset (from a regular sine wave) (-Alex)
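To make this concrete, here’s a minimal sketch of my own (not from the notes) of a naive 1D DFT in Java: it computes the real and imaginary parts of each frequency component, then the magnitude and phase exactly as described above. The signal is just an illustrative sampled pulse.

public class DftDemo {
    public static void main(String[] args) {
        double[] signal = {1, 1, 1, 1, 0, 0, 0, 0}; // a sampled pulse
        int n = signal.length;

        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            // correlate the signal with a complex exponential of frequency k
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im += signal[t] * Math.sin(angle);
            }
            double magnitude = Math.sqrt(re * re + im * im); // Pythagoras
            double phase = Math.atan2(im, re);               // angle in the complex plane
            System.out.printf("k=%d  |F|=%.3f  arg=%.3f%n", k, magnitude, phase);
        }
    }
}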

Lecture 3: Image Sampling

How is an image sampled and what does it imply?

Aliasing in Sampled Imagery

Aliasing

Aliasing is an effect that causes different signals to become indistinguishable when sampled. It also often refers to the distortion or artifact that results when a signal reconstructed from samples is different from the original continuous signal. [-Wiki]

Sampling Signals

Wheels Motion

Sampling Theory

Transform Pair from Sampled Pulse

Some stuff to do with the Fourier Transform…

(If anyone could explain what these graphs are demonstrating that’d be epic)

A: It’s showing the effect of reconstructing the sampled signal from its frequency components. However, the images for (b) and (c) are the wrong way around. Essentially it’s trying to demonstrate that as you add more frequency components from the transform of the sampled signal, the reconstruction becomes a better approximation of the sampled signal, until eventually (f) all the frequency components are present and the reconstruction is perfect [(a) and (f) are identical]. - James

2D Fourier Transform

Reconstruction (non examinable)

Shift Invariance

Rotation Invariance

Filtering

Other Transforms

Applications of 2D Fourier Transform

Lecture 4: Point Operators

How many different operators are there which operate on image points?

Image Histograms

Brightening an Image

Intensity Mappings

Exponential/Logarithmic Point Operators (non examinable)

Intensity Normalisation

Intensity normalisation and histogram equalisation

Histogram Equalisation

Applying intensity normalisation and histogram equalisation

Thresholding an eye image

Thresholding: Manual vs Automatic

Other thresholding methods include entropic thresholding and optimal thresholding.

Lecture 5: Group Operators

How do we combine points to make a new point in a new image?

Template Convolution

I’m sure everyone knows about this by now…

Template convolution takes an original image and applies a template (a small matrix of weights) centred over every pixel; each output pixel is the weighted sum of the neighbourhood under the template.

3x3 template and weighting coefficients

The new point is the weighted sum N(x, y) = Σᵢ Σⱼ w(i, j) · O(x(i), y(j)), where w(i, j) are the weights, O is the original image, and x(i), y(j) denote the position of the point that matches the weighting coefficient position.

3x3 averaging operator

Illustrating the effect of window size

Larger windows lead to more blur.

Template convolution via the Fourier transform

Allows fast computation for template sizes ≥ 7x7, because convolution in the image domain is equivalent to multiplication in the frequency domain: I ∗ T = F⁻¹( F(I) × F(T) ).

Note: the multiplication is point by point (element-wise)! The equation is written wrongly in various places.

2D Gaussian function
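For reference, the standard 2D Gaussian with standard deviation σ is g(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²)); σ controls the amount of smoothing.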

2D Gaussian template

Applying Gaussian averaging

Finding the median from a 3x3 template
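A minimal sketch (my own illustration, assuming a simple float[][] image representation) of median filtering with a 3x3 window: collect the nine neighbourhood values, sort them, and take the middle one.

import java.util.Arrays;

static float[][] medianFilter3x3(float[][] image) {
    int h = image.length, w = image[0].length;
    float[][] out = new float[h][w];

    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            float[] window = new float[9];
            int n = 0;
            // gather the 3x3 neighbourhood
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++)
                    window[n++] = image[y + j][x + i];
            Arrays.sort(window);
            out[y][x] = window[4]; // the median is the middle of the 9 sorted values
        }
    }
    return out;
}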

Newer stuff (non-examinable)

Averaging which preserves regions

Applying non local means

Even newer stuff: Image Ray Transform

Use analogy to light to find shapes, removing the remainder

Applying Image Ray Transform

Comparing operators

Lecture 6: Edge Detection

What are edges and how do we find them?

Edge detection

First order edge detection

Edge detection maths

For first-order edge detection, the horizontal and vertical differences are Ex(x, y) = |P(x, y) − P(x+1, y)| and Ey(x, y) = |P(x, y) − P(x, y+1)|; combining them gives the edge magnitude M(x, y) = √(Ex² + Ey²) and direction θ(x, y) = arctan(Ey / Ex).

Templates for improved first order difference

Edge Detection in Vector Format

Templates for Prewitt operator

Applying the Prewitt Operator

Templates for Sobel operator

Applying Sobel operator
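A sketch of applying the Sobel templates (the two kernels are the standard Sobel ones; the float[][] image representation is my assumption), combining the horizontal and vertical responses into a gradient magnitude:

static float[][] sobelMagnitude(float[][] image) {
    // the standard Sobel templates for horizontal and vertical differences
    int[][] mx = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int[][] my = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    int h = image.length, w = image[0].length;
    float[][] mag = new float[h][w];

    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            float gx = 0, gy = 0;
            for (int j = -1; j <= 1; j++) {
                for (int i = -1; i <= 1; i++) {
                    gx += mx[j + 1][i + 1] * image[y + j][x + i];
                    gy += my[j + 1][i + 1] * image[y + j][x + i];
                }
            }
            mag[y][x] = (float) Math.sqrt(gx * gx + gy * gy); // edge strength
        }
    }
    return mag; // the edge direction would be atan2(gy, gx)
}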

Generalising Sobel

Generalised Sobel (non examinable)

Lecture 7: Further Edge Detection

What better ways are there to detect edges?

Canny edge detection operator

Formulated with three main objectives:

Approximation

  1. Use Gaussian smoothing
  2. Use the Sobel operator (could combine with 1?)
  3. Use non-maximal suppression
  4. Threshold with hysteresis to connect edge points

Interpolation in non-maximum suppression

(Thanks Samuel Collins)

Hysteresis thresholding transfer function

To help better understand the function an example:

Action of non-maximum suppression and hysteresis thresholding

Hysteresis thresholding vs uniform thresholding

As you can see, hysteresis thresholding is arguably better.

Canny vs Sobel

First and second order edge detection

Edge detection via the Laplacian operator

Mathbelts on…

(I think this is essentially just rearranging an equation)

Shape of Laplacian of Gaussian operator

Zero crossing detection

Marr-Hildreth edge detection

Comparison of edge detection operators

Newer stuff - interest detections (non-examinable)

Newer stuff - saliency

Lecture 8: Finding Shapes

How can we group points to find shapes?

Feature extraction by thresholding

Template Matching

Template matching in:

In Noisy images

In Occluded Images

Encore, Monsieur Fourier! (???) (non examinable)

Applying Template Matching (non examinable)

Here, have this pretty low res image of a reaaaaally realistic weapon identification system, which can detect weapons like guns and knives. (aka, this is an application of template matching.)

Windows XP anyone?

Applying SIFT in ear biometrics (non examinable)

Hough Transform

Go and read the following article about the Hough Transform: http://aishack.in/tutorials/hough-transform-basics/

It’s really good.

Applying the Hough transform for lines
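A sketch of the voting loop for the polar form ρ = x·cos θ + y·sin θ (my own illustration; assumes a binary edge image as a float[][] with non-zero edge pixels):

static int[][] houghLines(float[][] edges, int numTheta) {
    int h = edges.length, w = edges[0].length;
    int maxRho = (int) Math.ceil(Math.sqrt(h * h + w * w));
    // accumulator over (rho, theta); rho is offset by maxRho so it is never negative
    int[][] acc = new int[2 * maxRho + 1][numTheta];

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (edges[y][x] == 0) continue; // only edge pixels vote
            for (int t = 0; t < numTheta; t++) {
                double theta = Math.PI * t / numTheta;
                int rho = (int) Math.round(x * Math.cos(theta) + y * Math.sin(theta));
                acc[rho + maxRho][t]++; // one vote for this (rho, theta) line
            }
        }
    }
    return acc; // peaks in the accumulator correspond to lines in the image
}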

Hough Transform for Lines … problems

Images and accumulator space of polar Hough Transform

Applying Hough Transform

Lecture 9: Finding More Shapes

How can we go from conic sections to general shapes?

Hough Transform for Circles

The circle equation is (x − x₀)² + (y − y₀)² = r². The Hough transform exploits the duality between image points and parameters:

Points | Parameters | Radius
x, y | x₀, y₀ | r
x₀, y₀ | x, y | r

Each edge point (x, y) votes for every centre (x₀, y₀) and radius r of a circle that could pass through it; conversely, known parameters map back to the image points lying on that circle.

Circle Voting and Accumulator Space

Like before, but this time for circles.

Speeding it up

Applying the HT for circles

This can be used for detecting the shapes of an eye and iris.

Integrodifferential operator? (non examinable)

https://stackoverflow.com/questions/27058057/comparing-irises-images-with-opencv

Looks cool???

Arbitrary Shapes

R-table Construction

Pick a reference point in the image.

For each boundary point x, compute Φ(x) - the gradient direction.

r is the distance from the reference point to the boundary point (the radial distance), and α is the angle.

The table then stores each (r, α) pair indexed by the corresponding Φ.

(Thanks Bradley Garrod)

Active Contours (non examinable)

Geometric active contours (non examinable)

Couple’a hippos 🦛🦛

Parts-based shape modelling (non examinable)

This guy don’t look so good

Symmetry yrtemmyS (non examinable)

SPooky

Lecture 10 Applications/Deep Learning

Where is feature extraction used these days?

This lecture is mostly extra (non examinable) stuff, so I’m only gonna cover the hand with bow slides. If you want a look then jump to the slides: http://comp3204.ecs.soton.ac.uk/mark/Lecture%2010.pdf

Where is computer vision used?

What you see depends on the viewpoint you take

Deep Learning

Conclusions


Part 2: Jon

Note: Jon has done some really good handout summaries which have basically done my job for me, they can be found here: http://comp3204.ecs.soton.ac.uk/part2.html 

Highly recommend reading these probably.

But I will still go through the slides and make notes of hand with bow slides :)

Lecture 1: Building machines that see

Key terms in designing Computer Vision systems

  1. Robust
  2. Repeatable
  3. Invariant
  4. Constraints

Robustness

Repeatability

Invariance

Constraints

Constraints in Industrial Vision

Software Constraints

Colour-Spaces (non examinable?)

Even though there is no hand with bow for these slides, they seem useful.

RGB Colour-space

HSV Colour-space

LAB Colour-space

Lab color space is more perceptually linear than other color spaces. Perceptually linear means that a change of the same amount in a color value should produce a change of about the same visual importance. It is important especially when you try to measure the perceptual difference of two colors. (Source)

Physical Constraints

Vision in the wild

Lecture 2: Machine learning for pattern recognition

Feature Spaces

Many computer vision applications involving machine learning take the following form:

  1. Image processing transforms the raw image (this is where the cool image processing happens)
  2. Feature extractors make feature vectors from the processed images
  3. A machine learning system uses the featurevectors to make intelligent decisions

Key terminology

Density and Similarity

Distance in featurespace

Cats are a close distance apart, and are further away from the cluster of dogs in this feature space.

Euclidean distance (L2 distance)

d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )

The straight-line distance; hence “Euclidean distance”

Manhattan/Taxicab distance (L1 distance)

d(p, q) = Σᵢ |pᵢ − qᵢ|

Essentially just like you are taking a taxi cab around the grid-like streets of Manhattan

Cosine Similarity

similarity(p, q) = (p · q) / (‖p‖ ‖q‖)

The cosine of the angle between the two vectors, measured from the origin
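The three measures side by side, as a sketch in Java (method names are my own):

static double euclidean(double[] p, double[] q) {
    double sum = 0;
    for (int i = 0; i < p.length; i++)
        sum += (p[i] - q[i]) * (p[i] - q[i]);
    return Math.sqrt(sum); // straight-line (L2) distance
}

static double manhattan(double[] p, double[] q) {
    double sum = 0;
    for (int i = 0; i < p.length; i++)
        sum += Math.abs(p[i] - q[i]); // distance along axis-parallel paths (L1)
    return sum;
}

static double cosineSimilarity(double[] p, double[] q) {
    double dot = 0, np = 0, nq = 0;
    for (int i = 0; i < p.length; i++) {
        dot += p[i] * q[i];
        np += p[i] * p[i];
        nq += q[i] * q[i];
    }
    return dot / (Math.sqrt(np) * Math.sqrt(nq)); // cos of the angle between vectors
}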

Choosing good featurevector representations for machine learning

Supervised Machine Learning: Classification

Linear Classifiers

Linear classifiers try to learn a hyperplane that separates two classes in featurespace with minimum error

There can be lots of hyperplanes to choose from; differing classification algorithms apply differing constraints when learning the classifier.

To classify a new image, you just need to check what side of the hyperplane it is on.

Non-linear binary classifiers

Linear classifiers work best when the data is linearly separable.

Like this:

But what if the data is like this:

There is no hope for a linear classifier! 😭

Non-linear binary classifiers, such as Kernel Support Vector Machines, learn non-linear decision boundaries. (basically a curved boundary separates the data instead of a straight one)

However, you have to be careful, you might lose generality by overfitting:

What class would the blue question mark actually belong to?

Multiclass classifiers: KNN

Assign class of unknown point based on majority class of closest K neighbours in featurespace.
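A minimal KNN sketch (all names are my own illustration; featurevectors as double[], labels as ints), reusing the euclidean() method from earlier:

import java.util.Arrays;
import java.util.Comparator;

static int knnClassify(double[][] train, int[] labels, double[] query, int k) {
    // sort training indices by distance to the query point
    Integer[] idx = new Integer[train.length];
    for (int i = 0; i < idx.length; i++) idx[i] = i;
    Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(train[i], query)));

    // majority vote over the k nearest neighbours
    java.util.Map<Integer, Integer> votes = new java.util.HashMap<>();
    for (int i = 0; i < k; i++)
        votes.merge(labels[idx[i]], 1, Integer::sum);

    int bestClass = -1, bestVotes = 0;
    for (java.util.Map.Entry<Integer, Integer> e : votes.entrySet())
        if (e.getValue() > bestVotes) { bestVotes = e.getValue(); bestClass = e.getKey(); }
    return bestClass;
}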

KNN Problems (non examinable?)

Unsupervised Machine Learning: Clustering

K-Means Clustering

StatQuest: K-means clustering (good video to explain this!!!!!!!!!!!!!!!!)
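A bare-bones K-means sketch (my own illustration, again reusing euclidean()) following the loop the video describes: assign each point to its nearest centroid, recompute the centroids, and stop when nothing moves.

static double[][] kmeans(double[][] points, int k, int maxIter) {
    int dim = points[0].length;
    double[][] centroids = new double[k][dim];
    // naive initialisation: use the first k points as starting centroids
    for (int c = 0; c < k; c++) centroids[c] = points[c].clone();

    int[] assign = new int[points.length];
    for (int iter = 0; iter < maxIter; iter++) {
        boolean changed = false;

        // assignment step: each point joins its nearest centroid
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            for (int c = 1; c < k; c++)
                if (euclidean(points[p], centroids[c]) < euclidean(points[p], centroids[best]))
                    best = c;
            if (assign[p] != best) { assign[p] = best; changed = true; }
        }
        if (!changed) break; // converged: the centroids will not move either

        // update step: move each centroid to the mean of its assigned points
        for (int c = 0; c < k; c++) {
            double[] mean = new double[dim];
            int count = 0;
            for (int p = 0; p < points.length; p++)
                if (assign[p] == c) {
                    for (int d = 0; d < dim; d++) mean[d] += points[p][d];
                    count++;
                }
            if (count > 0) {
                for (int d = 0; d < dim; d++) mean[d] /= count;
                centroids[c] = mean;
            }
        }
    }
    return centroids;
}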

Lecture 3: Covariance and Principal Components

Random Variables and Expected Values

Variance

Equation

Technically it’s Var(X) = E[(X − E[X])²]

Covariance

(Covariance is related to Correlation though…)

Equation

Technically it’s Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Covariance Matrix

The covariance matrix is a square symmetric matrix, as you can see by the symmetry in the example above.

Mean Centring

From top image to bottom image; mean centered around the origin.

Covariance matrix again

So then this means that the covariance matrix is directly proportional to ZᵀZ, where Z is the matrix formed by the mean-centred vectors (each row of matrix Z is one mean-centred vector) - Hope this helps :) - Lorena

Principal axes of variation

Basis

The first principal axis

The second principal axis

The third principal axis

Eigenvectors and Eigenvalues

Important Equation

Properties

Finding Values

Eigendecomposition

Eigendecomposition takes a matrix and represents it in terms of its eigenvalues and eigenvectors.

Summary

The Eigendecomposition of a covariance matrix A:

Gives you the principal axes and their relative magnitudes.

Ordering

Principal Component Analysis

Linear Transform

Linear Transforms

PCA

PCA Algorithm

  1. Mean-centre the data vectors
  2. Form the vectors into a matrix Z, such that each row corresponds to a vector
  3. Perform the Eigendecomposition of the matrix ZᵀZ, to recover the eigenvector matrix Q and diagonal eigenvalue matrix Λ: ZᵀZ = QΛQᵀ
  4. Sort the columns of Q and corresponding diagonal values of Λ so that the eigenvalues are decreasing
  5. Select the L largest eigenvectors of Q (the first L columns) to create the transform matrix Q_L
  6. Project the original vectors into a lower dimensional space: T_L = ZQ_L
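The data-preparation half of the algorithm as a sketch (my own illustration; the eigendecomposition itself in step 3 would be delegated to a linear algebra library, which I’ve left out):

// Steps 1-2 plus forming the matrix to decompose: mean-centre the data and
// compute Z^T Z. A library eigendecomposition routine would then give Q and Λ.
static double[][] formScatterMatrix(double[][] data) {
    int n = data.length, dim = data[0].length;

    // step 1: compute the mean of each dimension, then mean-centre
    double[] mean = new double[dim];
    for (double[] row : data)
        for (int d = 0; d < dim; d++) mean[d] += row[d] / n;

    double[][] z = new double[n][dim];
    for (int i = 0; i < n; i++)
        for (int d = 0; d < dim; d++) z[i][d] = data[i][d] - mean[d];

    // the matrix for step 3: Z^T Z (proportional to the covariance matrix)
    double[][] ztz = new double[dim][dim];
    for (int a = 0; a < dim; a++)
        for (int b = 0; b < dim; b++)
            for (int i = 0; i < n; i++)
                ztz[a][b] += z[i][a] * z[i][b];
    return ztz;
}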

Eigenfaces

Spooky 0_0

Making Invariant

Problems

Potential Solution… Apply PCA

Lecture 4: Types of image feature and segmentation

Image Feature Morphology

Global Features

Grid or Block-based Features

Region-based Features

Local Features

Global Features

Image Histograms

Joint-colour histogram

Image Segmentation

What is segmentation?

Global Binary Thresholding

Otsu’s thresholding method

More detailed look over at wikipedia!

Adaptive / local thresholding

Mean adaptive thresholding

Set the current pixel to 0 if its value is less than the mean of its neighbours plus a constant value; otherwise set to 1.
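As a sketch (names and the float[][] representation are my own; windowSize and C are the two parameters this relies on):

static int[][] meanAdaptiveThreshold(float[][] image, int windowSize, float C) {
    int h = image.length, w = image[0].length, r = windowSize / 2;
    int[][] out = new int[h][w];

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            // mean of the neighbourhood window (clamped at the image edges)
            float sum = 0;
            int count = 0;
            for (int j = Math.max(0, y - r); j <= Math.min(h - 1, y + r); j++)
                for (int i = Math.max(0, x - r); i <= Math.min(w - 1, x + r); i++) {
                    sum += image[j][i];
                    count++;
                }
            float mean = sum / count;
            out[y][x] = image[y][x] < mean + C ? 0 : 1; // threshold = local mean + constant
        }
    }
    return out;
}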

Segmentation with K-Means

Advanced segmentation techniques

Connected Components

Pixel Connectivity

Connected Component

A connected component is a set of pixels in which every pixel is connected either directly or through any connected path of pixels from the set.

Connected Component Labelling

The two-pass algorithm

  1. On the first pass:
  1. Iterate through each element of the data by column, then by row (Raster Scanning)
  1. If the element is not the background
  1. Get the neighbouring elements of the current element
  2. If there are no neighbours, uniquely label the current element and continue
  3. Otherwise, find the neighbour with the smallest label and assign it to the current element
  4. Store the equivalence between neighbouring labels
  1. On the second pass:
  1. Iterate through each element of the data by column, then by row
  1. If the element is not the background
  1. Relabel the element with the lowest equivalent label
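A sketch of the two passes for a binary int[][] image (my own illustration; 4-connectivity, with the label equivalences stored in a simple union-find array):

static int[][] connectedComponents(int[][] img) {
    int h = img.length, w = img[0].length;
    int[][] labels = new int[h][w];
    int[] parent = new int[h * w + 1]; // equivalence table (worst case: one label per pixel)
    int next = 1;

    // first pass: raster scan, labelling from the already-visited neighbours (up and left)
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (img[y][x] == 0) continue; // background
            int up = (y > 0) ? labels[y - 1][x] : 0;
            int left = (x > 0) ? labels[y][x - 1] : 0;

            if (up == 0 && left == 0) {
                labels[y][x] = next;
                parent[next] = next; // new label in its own equivalence class
                next++;
            } else if (up != 0 && left != 0) {
                labels[y][x] = Math.min(up, left);
                union(parent, up, left); // store the equivalence between the labels
            } else {
                labels[y][x] = Math.max(up, left); // whichever neighbour is labelled
            }
        }
    }

    // second pass: relabel each pixel with the lowest equivalent label
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (labels[y][x] != 0)
                labels[y][x] = find(parent, labels[y][x]);
    return labels;
}

static int find(int[] parent, int i) {
    while (parent[i] != i) i = parent[i];
    return i;
}

static void union(int[] parent, int a, int b) {
    int ra = find(parent, a), rb = find(parent, b);
    if (ra != rb) parent[Math.max(ra, rb)] = Math.min(ra, rb); // keep the lowest label as root
}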

Lecture 5: Shape description and modelling

Extracting features from shapes represented by connected components

Borders

Say you have this pixel shape:

Inner Border

A border made up of only pixels from the shape; the outermost pixels within the shape.

Outer Border

A border made up of the pixels just outside the shape that outline it.

Two ways to describe shape

  1. Region Description
  2. Boundary Description

Region Description: Simple Scalar Shape Features

Area and Perimeter

Compactness
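A common definition, with area A and perimeter P: C = 4πA / P². A perfect circle gives C = 1; less compact (more elongated or irregular) shapes give smaller values.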

Centre of Mass

Irregularity / Dispersion

A measure of how “spread-out” the shape is

Moments

Standard Moments

Central Moments

Normalised Central Moments
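For reference, the usual definitions behind the three headings above (with I(x, y) the pixel value): standard moments are m(p, q) = Σx Σy x^p y^q I(x, y); central moments μ(p, q) replace x and y by (x − x̄) and (y − ȳ), the offsets from the centre of mass, giving translation invariance; normalised central moments are η(p, q) = μ(p, q) / μ(0, 0)^γ with γ = (p + q)/2 + 1, adding scale invariance.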

Boundary Description

Chain Codes

Chain Code Invariance

Chain Code Advantages and Limitations

Fourier Descriptors

Region Adjacency Graphs

There’s some more stuff in the slides which isn’t hand with bow-ed, go have a look

Slides 40-47: http://comp3204.ecs.soton.ac.uk/lectures/pdf/L5-shapedescription.pdf

Active Shape Models and Constrained Local Models

Lecture 6: Local interest points

What makes a good interest point?

How to find interest points

The Harris and Stephens corner detector

Basic Idea

Harris & Stephens: Mathematics

Weighted average change in intensity between a window and a shifted version [by (Δx,Δy)] of that window: E(Δx, Δy) = Σ w(x, y) · [I(x + Δx, y + Δy) − I(x, y)]², summing over the window.

Structure Tensor
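In its standard form, the structure tensor gathers the products of the first derivatives over the window (w is the same window weighting as above):

M = Σ w(x, y) · [ Ix²    IxIy ]
                [ IxIy   Iy²  ]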

Eigenvalues of the Structure Tensor

Harris & Stephens Response Function
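The standard response is R = det(M) − κ·trace(M)² = λ₁λ₂ − κ(λ₁ + λ₂)², with κ typically around 0.04-0.06: large positive R indicates a corner, large negative R an edge, and small |R| a flat region.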

Harris & Stephens Detector

Scale in Computer Vision

The problem of scale

Scale space theory

The Gaussian Scale Space

Formally, Gaussian scale space is defined as L(x, y; t) = g(x, y; t) ∗ f(x, y) - the image f convolved with a 2D Gaussian kernel of variance t,

where t ≥ 0 and g(x, y; t) = (1 / (2πt)) · e^(−(x² + y²) / (2t)).

Normally, only a fixed set of values of t are used - it’s common to use integer powers of 2 or √2

Nyquist-Shannon Sampling theorem

If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.

...so, if you filter the signal with a low-pass filter that halves the frequency content, you can also halve the sampling rate without loss of information…

Gaussian Pyramid

Multi-scale Harris & Stephens

Blob Detection Finally

Laplacian of Gaussian

Scale space LoG

Very useful property: if a blob is detected at (x₀, y₀; t₀) in an image, then under a scaling of that image by a factor s, the same blob would be detected at (sx₀, sy₀; s²t₀) in the scaled image.

Scale space DoG

DoG Pyramid

Lecture 7: Local features and matching

Local features and matching basics

Local Features

Multiple features are extracted; one per local interest point

Why extract local features?

Example: Building a panorama

Problem 1:

Problem 2:

Two distinct types of matching problem

Narrow-baseline stereo

(Notice how the red-circled background object only appears in the second image; so these two images have a slight difference to them)

Wide-baseline stereo

(I don’t think I need to circle the whole image lol)

Two distinct types of matching problem

Robust local description

Descriptor Requirements

Matching by correlation (template matching)

(Narrow baseline) template matching

Problems with wider baselines

Local Intensity Histograms

Use local histograms instead of pixel patches

Local histograms

Overcoming localisation sensitivity

Overcoming lack of illumination invariance

Local Gradient Histograms

Gradient Magnitudes and Directions

Gradient Histograms

Building gradient histograms

Rotation Invariance

The SIFT feature

Adding spatial awareness

SIFT Construction: sampling

SIFT Construction: weighting

SIFT Construction: binning

Matching SIFT features

Euclidean Matching

Improving matching performance

Lecture 8: Consistent matching

Feature distinctiveness

Constrained matching

Geometric Mappings

What are geometric transforms?

Go have a look at some on Wikipedia here: https://en.wikipedia.org/wiki/Geometric_transformation

Point Transforms

We’re interested in transforms that take the following form: x’=Tx

The Affine Transform

The affine transform is defined as x′ = Ax + b, i.e.

[x′]   [a  b] [x]   [tx]
[y′] = [c  d] [y] + [ty]

It’s more convenient to write this as a single transform matrix by adding an extra dimension to each vector:

[x′]   [a  b  tx] [x]
[y′] = [c  d  ty] [y]
[1 ]   [0  0  1 ] [1]

Translation

Translation and Rotation

Scaling

Aspect Ratio

Shear

Degrees of Freedom

Affine Transform

6 DoF: translation + rotation + scale + aspect ratio + shear

Similarity Transform

4 DoF: translation + rotation + scale

More degrees of freedom

Normalise by w so that the transformed vector is [•,•,1]

Homogeneous coordinates

The Planar Homography (Projective Transformation)

Recovering a geometric mapping

Simultaneous equations

Least-squares

Robust Estimation

Problem: Noisy data

Robust estimation techniques

RANSAC: RANdom SAmple Consensus

This is an iterative method to estimate parameters of a mathematical model for a set of observed data that contains outliers.

Assume:

M data items required to estimate model T

N data items in total

Algorithm:

  1. Select M data items at random
  2. Estimate model T
  3. Find how many of the N data items fit T within tolerance tol, call this K (i.e. compute how many times the absolute residual is less than tol). The points that have an absolute residual less than tol are the inliers; the other points are the outliers.
  4. If K is large enough, either accept T, or compute the least-squares estimate using all inliers, and exit with success.
  5. Repeat steps 1..4 nIterations times
  6. Fail - no good T fit of data
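As a sketch, here is the algorithm specialised to fitting a 2D line y = mx + c, so M = 2 points define the model (everything here is my own illustration):

import java.util.Random;

static double[] ransacLine(double[][] pts, double tol, int nIterations, int minInliers) {
    Random rng = new Random(42);
    double[] best = null;
    int bestK = 0;

    for (int iter = 0; iter < nIterations; iter++) {
        // 1. select M = 2 data items at random
        double[] a = pts[rng.nextInt(pts.length)];
        double[] b = pts[rng.nextInt(pts.length)];
        if (a[0] == b[0]) continue; // degenerate pair (vertical line or same point)

        // 2. estimate model T: slope m and intercept c
        double m = (b[1] - a[1]) / (b[0] - a[0]);
        double c = a[1] - m * a[0];

        // 3. count inliers: points whose absolute residual is below tol
        int k = 0;
        for (double[] p : pts)
            if (Math.abs(p[1] - (m * p[0] + c)) < tol) k++;

        // 4. if K is large enough, accept T (a fuller version would re-fit by
        //    least squares over all the inliers here)
        if (k >= minInliers && k > bestK) {
            bestK = k;
            best = new double[]{m, c};
        }
    }
    return best; // null means failure: no good T fit of the data
}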

Further applications of robust local matching

Object recognition & AR

3D reconstruction

Problems with direct local feature matching

Local feature matching is slow!

Efficient Nearest Neighbour Search

K-D Trees

K-D Tree problems

Hashing

Sketching

Lecture 9: Image search and Bags of Visual Words

Text Information Retrieval

The bag data structure

Bag of Words

Text processing (feature extraction)

The Vector-Space Model

Bag of Words Vectors

The Vector-space Model

Searching the VSM

Recap: Cosine Similarity

Inverted Indexes

Aardvark → [doc3:4]
Astronomy → [doc1:2]
Diet → [doc2:9; doc3:8]
...
Movie → [doc2:10]
Star → [doc:13; doc2:4]
Telescope → [doc1:15]

Computing the Cosine Similarity

Weighting the vectors

Possible weighting schemes

Vector Quantisation

Learning a Vector Quantiser

Vector Quantisation

Visual Words

SIFT Visual Words

Bags of Visual Words

So why are we doing all this word stuff for a module about vision?....

Histograms of Bags of Visual Words

Visualising Visual Words

The effect of codebook size

Content-based Image Retrieval

BoVW Retrieval

Optimal codebook size

Problems with big codebooks

Overall process for building a BoVW retrieval system

  1. Collect the corpus of images that are to be indexed and made searchable
  2. Extract local features from each image
  3. Learn a large codebook from (a sample of) the features
  4. Vector quantise the features, and build BoVW representations for each image
  5. Construct an inverted index with the BoVW representations

Lecture 10: Image classification and auto-annotation

Multilabel classification

Object Detection / Localisation

Slide summary: Challenges in Computer Vision 

This is a summary of slides 7-16, go look in the slides if you’re interested

http://comp3204.ecs.soton.ac.uk/lectures/pdf/L10-classification.pdf

Aside: Optimal codebook size

Another slide summary: Stuff

Slides 18-26

Dense Local Image Patches

Dense SIFT

Pyramid Dense SIFT

Egyptian Rhinoceros

Spatial Pyramids

Developing and benchmarking a BoVW scene classifier

Evaluation Dataset

Building the BoVW

Training classifiers

Classifying the test set

Evaluating Performance

Phew, I think we’ve covered all the examinable content. I’ll summarise the bonus slides just for completeness.

Lecture 11: Towards 3D vision

Look at the slides for pictures :) http://comp3204.ecs.soton.ac.uk/lectures/pdf/L11-towards3d.pdf

Summary Summary

Programming for computer vision & other musings related to the coursework

Writing code for computer vision

Most vision algorithms are continuous

Always work with floating point pixels

Guidelines for writing vision code

Convolution

We probably do need to know how to do convolution so...

Aside: phase and magnitude

Aside: Displaying FFTs

Template Convolution

int kh = kernel.height;
int kw = kernel.width;
int hh = kh / 2;
int hw = kw / 2;

Image clone = new Image(image.width, image.height);

// slide the template over every pixel at which it fits entirely
// (border pixels are left at zero)
for (int y = hh; y <= image.height - (kh - hh); y++) {
    for (int x = hw; x <= image.width - (kw - hw); x++) {
        float sum = 0;
        // jj and ii run backwards because convolution flips the kernel
        for (int j = 0, jj = kh - 1; j < kh; j++, jj--) {
            for (int i = 0, ii = kw - 1; i < kw; i++, ii--) {
                int rx = x + i - hw;
                int ry = y + j - hh;
                sum += image.pixels[ry][rx] * kernel.pixels[jj][ii];
            }
        }
        clone.pixels[y][x] = sum;
    }
}

What if you don’t flip the kernel?

Ideal Low-Pass filter

Ideal Low-Pass filter - problems

Gaussian filters - why

Building Gaussian Filters
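A sketch of building a normalised Gaussian template from the 2D Gaussian function given earlier (size and σ choices are illustrative; my own code, not the coursework solution):

static float[][] gaussianKernel(int size, float sigma) {
    float[][] kernel = new float[size][size];
    int half = size / 2;
    float sum = 0;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int dx = x - half, dy = y - half;
            // 2D Gaussian: exp(-(x^2 + y^2) / (2 sigma^2)); the 1/(2 pi sigma^2)
            // constant can be dropped because we normalise below
            kernel[y][x] = (float) Math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
            sum += kernel[y][x];
        }
    }
    // normalise so the weights sum to 1 (no overall brightness change)
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            kernel[y][x] /= sum;
    return kernel;
}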

High-pass filters

Note - Don’t do this!

High-pass filters have a mixture of negative and positive coefficients

Building hybrid images

…is really simple

Happy Revising!

(not my hybrid, add a suggestion if you know whose it is)

(- Thanks Matthew Barnes, god of notes (this is not a real hybrid lol))

Go look at the bonus lecture yourself: http://comp3204.ecs.soton.ac.uk/lectures/pdf/VisionRetrospective.pdf


TL;DR

The TL;DRs are long enough to need TL;DRs themselves; there’s just a lot of important content.

Part 1: Mark

See collab TL;DR below Jon’s part.

Mark (formatted by Joshua Gregory)

Part 2: Jon

Summaries from end of slides and hand-out notes.

Lec 1

A computer vision system is not just software: it is any hardware, software and wetware (i.e. humans) that make up the complete system. To engineer a computer vision system, you need to think about how you can achieve the required level of robustness by constraining the problem at hand and incorporating sufficient invariance to potentially changing environmental factors.

Lec 2

Machine learning is a fundamental part of high-level computer vision (interpreting what is seen, versus the low-level, which is all about processing the image). The standard computer vision pipeline goes from the image, through a process of feature extraction, to a pattern recognition component that makes inferences. Machine learning is a standard way of training the pattern recognition system.

Lec 3

Understanding the shape of data in a feature space is important to effectively using it. In addition, by understanding the distribution of really highly dimensional data, it is possible to determine the most important modes of variation of that data, and thus represent the data in a space with many fewer dimensions.

PCA steps summary:

  1. Mean-centre the data vectors
  2. Form vectors into matrix Z, so each row corresponds to a vector
  3. Do the Eigendecomposition of matrix ZᵀZ, to get eigenvector matrix Q and diagonal eigenvalue matrix Λ: ZᵀZ = QΛQᵀ
  4. Sort columns of Q and values of Λ into decreasing order
  5. Select L largest eigenvectors of Q (first L columns) to make transform matrix Q_L
  6. Project original vectors into lower dimensional space: T_L = ZQ_L

Overall approach

Training images:

  1. Images flattened to vectors
  2. Mean vector computed and stored
  3. Vectors mean centred
  4. PCA applied to project to lower dimensional space; transform matrix stored
  5. Low dimensional vectors used as training data for classifier (KNN with a distance threshold)

Testing image:

  1. Image flattened to vector, mean vector subtracted
  2. Vector projected by PCA basis (transform matrix) to lower dimensional space
  3. Vector given to classifier, generates class label

Lec 4

Features that can be extracted from images fall into four main categories. Global features are extracted from the entire image. Region-based features are extracted from regions of the image called connected components. Detecting connected components requires that you first apply a process called segmentation to partition the image into multiple segments.

Image features can be categorised into one of four main categories:

  1. Global
  2. Grid-based (block-based)
  3. Region-based
  4. Local, “interest points”

Two-pass algorithm:

First pass

  1. Raster scan data
  2. If element not bg, then get neighbour elements
  3. No neighbours? Uniquely label current element
  4. Otherwise find neighbour with smallest label, assign current element
  5. Store equivalence between neighbouring labels

Second pass

  1. Raster scan again
  2. If element not bg, then relabel element with lowest equivalent label

Lec 5

Basic shape description involves extracting characteristic features that describe the shape of a connected component. Shape descriptors can be used together with machine learning, or manually defined models to classify shapes into different classes; a common example of this is classifying shapes into letters of the alphabet in order to achieve optical character recognition. Statistical shape models describe how the shape of an object (or set of objects) can change (through internal movement and/or through changes in pose). In addition to describing shapes, statistical shape models can be used to find instances of a shape in an image.

Lec 6

The detection of local interest points that are stable under varying imaging conditions has a huge number of applications in computer vision. Research in this area goes back as far as the 1960s and 70s. The Harris and Stephens corner detection technique developed in 1988 is a classic example of a detection technique with impressive robustness. A related problem to the detection of interest points is the problem of scale (the size at which an object appears in an image). Scale space theory allows interest point techniques to be developed that are invariant to changes in scale (i.e. the object moving further away). The Difference-of-Gaussian blob detector is an example of such a scale-space blob detection technique.

Lec 7

This lecture covers how to extract local features from these interest points. A number of techniques have been proposed in the past in order to extract robust local descriptors, starting from simple pixel histograms, through to advanced features like the SIFT descriptor, which encode weighted local image gradients. Once these descriptors have been extracted, they can be used for image matching.

Lec 8

When comparing two images by matching local features, we need to eliminate mismatches. By applying geometric constraints to the problem of finding corresponding interest points between pairs of images, it is possible to both reduce the number of mismatches and potentially learn a geometric mapping between the two images. Amongst many other applications, these transforms can be used to build panoramas from multiple images. The presence of such a transform is a good indicator for a match between the two images, and is commonly used in object recognition and image retrieval applications.

  1. Select M data items at random
  2. Estimate model T
  3. Find how many of the N data items fit T within tolerance tol, call this K. Points that have an absolute residual less than tol are the inliers; the other points are outliers.
  4. If K is large enough, either accept T, or compute the least-squares estimate using all inliers, and exit with success.
  5. Repeat steps 1..4 nIterations times
  6. Fail - no good T fit of data

Lec 9

Content-based image retrieval (CBIR) systems can search for images with an image as the query. Research on CBIR systems started in the early 90s, but it is only more recently, with ubiquitous mobile computing and applications like Google Goggles, that the technology has matured. We’ve seen how (local) descriptors can be used to find matching objects within images, but we’ve also seen that the matching process is rather computationally expensive.

For CBIR applications we need to be able to search datasets of millions of images almost instantaneously. In the field of textual document search, techniques to efficiently index and efficiently search massive datasets of text documents are well understood. One of the biggest advances in CBIR has been to apply these textual indexing techniques to the image domain by extracting bags of visual words from images and indexing these.

Lec 10

We’ve looked at how features can be extracted from images, and how supervised machine learning techniques like linear classifiers and k-nearest neighbours can be used to train a computer vision system to predict a class for a particular feature input. Current research is looking at how we might make computers able to see in the human sense, fully understanding the content of a visual scene. The choice of feature for image classification is very important. We saw how a Bag of Visual Words (BoVW) representation was a powerful technique for image search. It turns out that BoVWs are very useful for use in high performance image classification.

TL;DR By Mark Towers

Mark (edited by Joshua Gregory)

“These notes may miss something, please say if I have. Thanks” - Mark

Canny edge detection steps:

  1. Apply Gaussian filter to smooth the image in order to remove the noise
  2. Find the intensity gradients of the image
  3. Apply non-maximum suppression to get rid of spurious responses to edge detection
  4. Apply double threshold to determine potential edges
  5. Track edges by hysteresis: finalise the detection of edges by suppressing all the other edges that are weak and not connected to strong edges

Canny was formulated with three main objectives:

  1. Optimal detection with no spurious responses
  2. Good localisation with minimal distance between detected and true edge position
  3. Single response to eliminate multiple responses to a single edge

 

Jon

  1. Robust - Changes to the environment will not affect the accuracy of the system
  2. Repeatable - A measure of robustness, such that the system must work the same over and over regardless of the environmental changes
  3. Invariant - Invariance to an environmental factor helps achieve robustness and repeatability. Hardware and software can be designed to be invariant to certain environmental changes, such as illumination changes.
  4. Constraints - Are applied to the hardware, software and wetware (humans in the system) to make sure the vision system works in a repeatable and robust fashion. An example is putting the system in a box so there can’t be any illumination changes.
  1. Software constraints: Simple and fast algorithms are the most popular, as they have a smaller chance of failure compared to deep learning approaches; the Hough transform is a popular algorithm, although it requires physical constraints to make it robust. The choice of colour-space is also an important software consideration, as some colour-spaces don’t encode the whole of human vision, or don’t encode the luminance of a colour separately.
  2. Physical constraints - In industry environments, it is possible to constrain the physical environment e.g. lighting, enclosure, mounting, camera specs, optics and filters.
  1. Euclidean distance (L2 distance) - The straight line distance between two points, computed via an extension of Pythagoras’ theorem to n dimensions.
  2. Manhattan/Taxicab distance (L1 distance) - The distance along paths parallel to the axes of the space
  3. Cosine similarity - This isn’t a distance but rather the angle between the two vectors. This is useful if the relative lengths of the vectors don’t matter.

  1. The value of K (number of clusters) is chosen
  2. The initial cluster centres (called the centroids) are chosen
  3. Points are assigned to their nearest centroid and the centroids are recomputed, repeating until the centroids don’t move after an iteration
  4. The final clusters are created by assigning all of the points to their nearest centroids.

  1. A basis is a set of n linearly independent vectors in an n-dimensional space. This means that all of the vectors are orthogonal, forming a coordinate system.
  2. The first principal axis is the vector that describes the direction of greatest variance for a given set of n-dimensional data.
  3. All of the following principal axes are orthogonal (perpendicular) to all of the previous axes. E.g. the second principal axis is orthogonal to the first axis, while the third principal axis is orthogonal to the first and second principal axes.
  1. At most n eigenvector-eigenvalue pairs
  2. If A is symmetric, the set of eigenvectors is orthogonal
  3. If A is a covariance matrix, the eigenvectors are the principal axes
  4. The eigenvalues are proportional to the variance of the data along each eigenvector.
  5. If all of the eigenvectors are independent then the sorted list of eigenvectors is equal to the PCA axes.
  1. The transform matrix W is just the eigenvector matrix Q from the eigendecomposition of the covariance matrix of the data.
  2. Dimensionality reduction can be achieved by removing the eigenvectors with low eigenvalues from Q (i.e. keeping the first L columns of Q, assuming the eigenvectors are sorted by decreasing eigenvalue).
  1. Mean-centre the data vectors
  2. Form the vectors into a matrix Z, such that each row corresponds to a vector
  3. Perform the eigendecomposition of the matrix ZᵀZ to recover the eigenvector matrix Q and diagonal eigenvalue matrix Λ
  4. Sort the columns of Q and corresponding diagonal values of Λ so that the eigenvalues are decreasing.
  5. Select the L largest eigenvectors of Q (the first L columns) to create the transform matrix Q_L
  6. Project the original vectors into a lower dimensional space: T_L = ZQ_L.
  1. Global - Features are extracted from the contents of an entire image, e.g. image histograms and joint-colour histograms.
  2. Grid/block-based - The entire image is split into blocks with a feature extracted from each block. This allows multiple feature vectors to be extracted.
  3. Region-based - The entire image is split into a number of regions and a feature is extracted from each, allowing multiple feature vectors to be extracted.
  4. Local - The interest points of the image are detected, with a feature vector extracted from the pixels surrounding each interest point.

  1. Global binary thresholding - Takes a greyscale image and assigns all pixels with a value lower than a predetermined threshold to one segment and all other pixels to the other segment. This is really fast but requires a manually set static threshold, making it not robust to lighting changes. Works well in applications with lots of physical constraints.
  2. Otsu’s thresholding method - Provides a way to automatically find the threshold, assuming that there are only two classes (i.e. foreground and background), so the histogram must have two peaks. An exhaustive search is made for the threshold that maximises the inter-class variance.
  3. Adaptive/local thresholding - Computes a different threshold value for every pixel in an image based on the surrounding pixels, usually using a square or rectangular window around the current pixel to define the neighbours.
  4. Mean adaptive thresholding - Set the current pixel to zero if its value is less than the mean of its neighbours plus a constant value; otherwise set it to 1. This relies on two parameters: the size of the window and the constant offset value. It deals well with uneven lighting/contrast, but it is computationally expensive (compared to global methods) and it can be difficult to choose the window size, as the scale of an object can break it.
  5. Segmentation with K-means - Cluster the colour vectors [r, g, b] of all the pixels, where each pixel is assigned to a segment based on the closest centroid. This works best if the colour-space and distance function are compatible (e.g. the Lab colour-space is designed so that Euclidean distances are proportional to perceptual colour differences). This doesn’t preserve continuity of segments, so single pixels might end up alone; also encoding the position is therefore helpful, but it should be normalised by the width and height of the image to remove the effect of different image sizes, and x and y can be scaled so they have more or less effect than the colour components.
  1. On the first pass, iterate through each element of the data by column then by row (raster scanning)
  2. On the second pass, iterate over the data in the same way
  1. The inner border is the set of pixels that are the outermost pixels within the shape
  2. The outer border is the set of pixels outside the shape that outline it.
  1. Region description: simple scalar shape features

Central moments = Translation invariant, as they use the centre of mass

Normalised central moments = Both scale and translation invariant

  1. Chain codes: Walk around the boundary and encode the direction you take at each step as a number. Then cyclically shift the code so it forms the smallest possible integer value (making it invariant to the starting point).
  2. Chain code invariance: This can be made rotation invariant by encoding the differences in direction rather than absolute values. It can also be made scale invariant by resampling the components to a fixed size (however this doesn’t work well in practice).

  1. Invariance to brightness changes
  2. Sufficient texture variation in the local neighbourhood
  3. Invariance to changes between the angle / position of the scene to the camera.
  1. Corner detection - By using a small window, the idea is that at a corner there will be a significant change in all directions, whereas if the window is on an edge then there is no change along the edge direction, and in a flat region there will be no change in any direction. This is dependent on measuring the change in intensity between a window and a shifted version of it: E(Δx, Δy) ≈ [Δx Δy] M [Δx Δy]ᵀ, where M is the structure tensor built from the image gradients.
  2. Blob detection (also called Difference-of-Gaussian extrema) - Blobs are detected as the extrema (maxima/minima) of the Difference-of-Gaussian response across both position and scale.
  1. Narrow-baseline stereo - When the local features in two images have only moved by a few pixels. Because of this, the interest matcher doesn’t need robustness to rotation and light. The search can be optimised to within a window area, so the detector doesn’t need to identify a large number of points (low descriptiveness).
  2. Wide-baseline stereo - When the local features in two images have moved by large amounts (the feature has moved, rotated, etc. in the image). Because of this, the matcher needs to be robust to intensity change and invariant to rotation. The matcher also needs to be highly descriptive, identifying a large number of interest points in the image, to avoid mismatches (but not so distinctive that you can’t find any matches). It must also be robust to small localisation errors of the interest points: the descriptor should not change too much if it moves by a few pixels, but should change more rapidly once it moves further away.
  1. Not necessarily very distinctive, as many interest points are likely to have a similar distribution of greyscale values.
  2. Not rotation invariant if the sampling window is square or rectangular, but this can be overcome by using a circular window.
  3. Not invariant to illumination changes.
  4. Sensitive to interest point localisation - the exact point at which the interest point is identified.
  1. Allow the window to move a few pixels in any direction without changing the descriptor.
  2. Apply a weighting so that the pixels near the edge of the window have a smaller effect than the points in the centre. Commonly a Gaussian weighting centred on the interest point is used for this.
  1. Gradient histograms are not naturally rotation invariant, however they can be made invariant by finding the dominant orientation (the bin with the greatest value) and cyclically shifting the histogram so the dominant orientation is in the first bin.
  1. Find the SIFT feature vector of the interest point of interest and then find all of the feature vectors of the image. Using the Euclidean distance between the interest points is an easy way of finding the feature vectors that are closest to the interest point. Thresholds can be used to reject poor matches, however that doesn’t work well and results in lots of mismatches.
  2. An improved matching method is to take each feature from the first image and find the two closest features in the second image. Only form a match if the ratio of distances between the closest and second closest matches is less than a threshold (usually set at 0.8, meaning that the distance to the closest feature must be at most 80% of the second closest). This leads to a more robust matching strategy.
  1. Point transforms - Take the form x′ = Tx, where x′ is the transformed coordinate, T is the transform matrix and x is the original coordinate.
  2. Affine transform - Takes the form x′ = Ax + b (a 2×2 matrix A plus a translation b), which allows for translation, rotation, scaling, changing of the aspect ratio and shearing. This means it has 6 degrees of freedom due to the number of operations.
  3. Projective transformation (planar homography) - Uses a 3x3 matrix as the transformation, with the coordinate vector being [x, y, 1].
  1. K-D Trees - A binary tree structure that partitions the space along axis-aligned hyperplanes. This typically takes each dimension in turn and splits on the median of the points in the enclosing partition, but stops after a certain depth or when the number of points in a leaf is less than a threshold. To search, just walk down the tree until a leaf is hit and brute-force search to find the best match in the leaf. However, this doesn’t scale well to high dimensions, as you tend to end up needing to search most of the tree; there are approximate versions that do scale, at the cost of not necessarily returning the exact answer.
  2. Locality Sensitive Hashing (LSH) - Creates hash codes for vectors such that similar vectors have similar hash codes

 

  1. Binary weights - Only the presence (1) or absence (0) of a term is recorded in the vector
  2. Raw frequency - The frequency of occurrence of the term in the document is included in the vector
  3. TF/IDF (Term Frequency / Inverse Document Frequency) - Provides high values for rare words and low values for common words: the term frequency is the frequency count of a term in the document, and the inverse document frequency measures the amount of information the word provides (how rare it is across all documents).
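As a concrete (standard) formulation: for term t in document d, with N documents in total and df(t) the number of documents containing t, the weight is w(t, d) = tf(t, d) × log(N / df(t)).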