代写COMP24011 Lab3

2023-11-23 代写COMP24011 Lab3

COMP24011 Lab3:

Features for Estimating Autonomous Vehicle Poses

Riza Batista-Navarro and Francisco Lobo

Academic session: 2023-24

Introduction

In this exercise, you will investigate matching visual features in a series of images captured during

the navigation of an autonomous vehicle (AV). These features are used in estimating poses (i.e.,

camera trajectories) based on a visual odometry algorithm. Let’s start by introducing some terms

and their definitions to help clarify concepts in autonomous robot navigation.

Odometry is the use of sensors to estimate a robot’s change in position relative to a known posi￾tion. Visual odometry (VO) is a specific type of odometry where only cameras are used as sensors,

as opposed to using, e.g., global positioning system (GPS) sensors or light detection and ranging

(LIDAR) sensors. It is based on the analysis of a sequence of camera images. Simultaneous loc￾alisation and mapping (SLAM) is a task whereby a robot needs to build a map of its current

environment while at the same time trying to determine its position relative to that map.

In this exercise, you will explore a monocular (single-camera) VO solution to the SLAM Evaluation

2012 challenge, which made use of the KITTI data set. However, it is worth noting that VO is

limited in that it can only perform trajectory estimation after each pose, and hence trajectory

optimisation is achieved only locally. In contrast, global optimisation is achieved through loop

closure: the correction of the trajectory upon revisiting an already encountered location.

The original KITTI data set consists of 4541 images and occupies several gigabytes. To reduce disk

quota and running time requirements, this lab exercise will use a simplified version of the data set

that we refer to as MyKITTI. This contains only the first 501 images of the original, but these still

occupy close to 150 MB and so cannot be included in your GitLab repo. For this reason, you

will need to download MyKITTI.zip from Dropbox and extract it.


The examples in this manual assume that you extract MyKITTI.zip in your home directory,

and hence that the path to the data set is ~/MyKITTI. The command-line tool for this lab

accepts an optional parameter allowing you to set a different path for the MyKITTI images.

In the lab3 branch of your COMP24011_2023 GitLab repo you will find a fully functional monocular

VO system. The implementation provided to you builds upon another library available on GitHub,

which was ported to Python 3 and recent OpenCV 4 versions, and has additional functionality

specific to this lab exercise. To use our VO system, you will need to install the Python bindings

for OpenCV by issuing the following command

$ pip install opencv-contrib-python

OpenCV is a comprehensive computer vision framework. It is very popular in both academic and

industrial contexts, and thus benefits from active development.


The lab code depends on OpenCV 4 functionality which has changed in recent years. Indeed,

the continuous development of OpenCV has involved changes in data structures as well as

changes in the API due to patent issues. The above command installs version 4.8.0.x on the

Kilburn lab machines which will be the only supported version for this assignment.

In this exercise you will use OpenCV to complete computer vision tasks, which will involve getting

familiar with its 2D Features Framework. The VO system for the lab uses the Scale-Invariant

Feature Transform (SIFT) algorithm for feature detection and matching, so you will find the

OpenCV-Python tutorials on these topics very useful.

1

The code we provide already implements feature tacking, which identifies feature correspondences

between adjacent images. Your job is to implement feature matching and allow for inspecting any

feature matches. This will require coding the following feature matching strategies:

1) distance thresholding,

2) nearest neighbour, and

3) nearest neighbour distance ratio.

You will also need to extract relevant information about the resulting feature matches; namely the

coordinates of all the features involved and the distance for each matching pair.

The VO System

Once you refresh the lab3 branch of your GitLab repo you will find the following Python files.

run_odometry.py This is the command-line tool that runs the visual odometry accord￾ing to the subcommand (and the parameters) provided by the user. It

contains the RunOdometry class.

visual_odometry.py This is the main module of the visual odometry system implementing

the PinholeCamera and VisualOdometry classes.

vision_tasks_base.py This module contains the VisionTasksBase “abstract” class that spe￾cifies the signatures of four methods you need to implement, and imple￾ments the interface used in RunOdometry and VisualOdometry.

vision_tasks.py This is the module that you need to complete for this exercise. It contains

the VisionTasks class that is derived from VisionTasksBase, and must

implement its abstract methods in order to complete feature matching

and retrieve their details.

In order to complete this exercise you will need to understand both vision_tasks_base.py

and vision_tasks.py but you do not need to know the details of how run_odometry.py or

visual_odometry.py are coded.

The VO tool provides comprehensive help messages. To get started run the command

$ ./run_odometry.py -h

usage: run_odometry.py [-h] -d DATASET

{view_trajectory,view_feature,view_info,get_info} ...

options:

-h, --help show this help message and exit

-d DATASET, --dataset DATASET

path to KITTI dataset directory (option required

unless dataset is located at ~/MyKITTI)

subcommands:

select which odometry command to run

{view_trajectory,view_feature,view_info,get_info}

view_trajectory show the car camera view and calculated trajectory

view_feature show the matches for a frame feature using OpenCV (and

save this image as opencv_visual.png)

view_info show the matches for a frame feature using both OpenCV

and the info algorithm (and save this composite image

as custom_visual.png)

get_info use the info algorithm to get details of matches for a

frame feature

2

As noted in the introduction, you do not need to specify where the MyKITTI data set is located

if you extract it to ~/MyKITTI, otherwise every time you run run_odometry.py you will need

to add the appropriate -d option. The VO tool supports four subcommands: view_trajectory,

view_feature, view_info and get_info. Each of these subcommands has its own help message

which you can access with commands like

$ ./run_odometry.py view_trajectory -h

usage: run_odometry.py view_trajectory [-h] [frame_id]

positional arguments:

frame_id index of frame to stop visualisation (1 to 500)

options:

-h, --help show this help message and exit

Thus, if you extracted MyKITTI at /tmp/MyKITTI and want to visualise the trajectory of the AV

up to frame 150, you should run the command

$ ./run_odometry.py -d /tmp/MyKITTI view_trajectory 150

vo params: (None, None, ’~/MyKITTI’)

debug run: view_trajectory(150,)

Processing frame 150

ret value: None

You should see the trajectory and camera view in real time (depending on processing power). The

VO tool will open a window that sequentially displays the first 151 images in the data set (with

IDs 0 to 150), and will show in another window the true trajectory (drawn in green) and the

estimated trajectory (drawn in red) of the vehicle. While it is running you will get on the shell

progress indication showing the frame index being processed.


You need to use a graphical environment that the OpenCV library supports, in order to

run the view_trajectory, view_feature and view_info subcommands. These will work

as expected if you use the Ubuntu graphical login on Kilburn lab machines.

Determining the vehicle’s trajectory only involves feature tracking, which has been implemented

for you in VisionTasksBase. Once you start implementing the feature matching algorithms, you

can begin using the subcommands to show and report on feature matches. You will probably use

view_feature first, as this requires you to code only a particular matching algorithm. To get the

command-line help run

$ ./run_odometry.py view_feature -h

usage: run_odometry.py view_feature [-h] -a {dt,nn,nndr} -t THRESHOLD

frame_id feature_id

positional arguments:

frame_id index of chosen frame (1 to 500)

feature_id index of chosen feature (0 to 1499)

options:

-h, --help show this help message and exit

-a {dt,nn,nndr}, --algorithm {dt,nn,nndr}

name of the matching algorithm used to match features

-t THRESHOLD, --threshold THRESHOLD

decimal value of threshold for feature matching

(option required except for nn algorithm)

3

The subcommands view_info and get_info have similar command-line options. The possible

choices for algorithm correspond to the feature matching strategies that you’ll implement in Tasks

1 to 3 below. These are:

dt to use your implementation of distance thresholding,

nn to use your implementation of nearest neighbour, and

nndr to use your implementation of nearest neighbours distance ratio.

The valid ranges of threshold value depend on the choice of matching algorithm, as follows:

• between 0 and 500 (inclusive1

) for the dt and nn,

• between 0 and 1 (inclusive) for the nndr algorithm.

The threshold value is optional when using nn as described in Task 2.

As an example, once you completed Task 1 below, you can view the matches of the feature with

index 127 of frame 100 according to the distance thresholding algorithm for a threshold value of

232.5 by running

$ ./run_odometry.py view_feature -a dt -t 232.5 101 127

vo params: (’dt’, 232.5, ’~/MyKITTI’)

debug run: view_feature(101, 127)

Processing frame 101

ret value: None

This will display frames 100 and 101 side-by-side and OpenCV, using its own cv2.drawMatchesKnn

function, will draw lines from feature 127 in the first frame to its matches in the second, as shown

in Figure 1.

Figure 1: Viewing a feature

Because of the small size of Figure 1, it is difficult to see that feature 127 is the bottom right

corner of the lower window in frame 100, and that it is matched with the bottom right corners of

the upper and lower windows in frame 101, but this will be easier on the screen.


While frame indices correspond to images in the MyKITTI data set, feature indices are

generated by OpenCV at runtime. This means that there is no guarantee that feature 127 is

as shown in Figure 1 if you run the above example. There will be a feature corresponding to

the bottom right corner of the lower window in frame 100, but depending on the C libraries

installed on the lab machine this feature may be assigned a different index.


This issue affects all of the examples below (though hopefully feature indices won’t be too

unreliable when using Kilburn lab machines). However, please note that this will not affect

the marking as both your solution and the reference implementation will be running on

the same machine!

You will be able to use the view_info and get_info subcommands, once you’ve completed Task 4

below. Using view_info instead of view_feature will produce the image in Figure 2. It combines

the OpenCV visualisation of feature matches with a rendering of the coordinates that your imple￾mentation of VisionTasks.matching_info() returns (in green). This will allow you to visually

inspect if your code calculates the correct coordinate values.

1. As this value is applied on Euclidean distances, in principle the threshold can be any positive number. However,

for this exercise, only threshold values between 0 and 500 (inclusive) will be tested.

4

Figure 2: Viewing feature information

To get the details of the matches that your function returns, you need to run

$ ./run_odometry.py get_info -a dt -t 232.5 101 127

vo params: (’dt’, 232.5, ’../MyKITTI’)

debug run: get_info(101, 127)

Processing frame 101

ret value: ((1066, 152),

[(1032, 152), (1033, 9)],

[70.21395874023438, 172.14529418945312])

ret count: 3

As the example shows, this will be a tuple with the following three elements:

• a tuple of two integers,

giving the 𝑥 and 𝑦 coordinates of the chosen feature in the previous frame,

• a list of tuples,

each with 𝑥 and 𝑦 coordinates in the current frame matching the chosen feature; and

• a list of floats,

each giving the distance between the chosen feature and the corresponding match.

For the example, in frame 100 the bottom right corner of the lower window is located (1066, 152)

and its matches in frame 101 have coordinates (1032, 152) and (1033, 9) on the lower and upper

window, respectively.

Assignment

For this lab exercise, the only Python file that you need to modify is vision_tasks.py. You will

develop your own version of this script, henceforth referred to as “your solution” in this document,

following the tasks outlined below.

Although you do not need to make any changes to the vision_tasks_base.py script, it is advisable

that you familiarise yourself with its contents before you proceed to developing your solution. In

particular, it is useful to take note of the following details in relation to the VisionTasksBase

class:

• Its constructor creates a detector object by calling the OpenCV function cv2.SIFT_create,

which firstly creates a feature detector based on the Scale-Invariant Feature Transform (SIFT)

algorithm that is configured to extract 1500 features from each image (as specified by the

NUM_FEATURES constant).

• Its featureMatching function takes two images as arguments: cur_image, the image cor￾responding to the <frame_id> given as input to drive_car; and prev_image, the image

immediately preceding cur_image in the frame sequence.

5

• By calling the detectAndCompute function of the detector object, a list of keypoints and a

list of descriptors are computed for each of prev_image and cur_image.

Note that the size of the list of keypoints and list of descriptors returned, is the same as

NUM_FEATURES.

• A feature matching strategy — based on your own implementation of the dt, nn and nndr

functions in vision_tasks.py — is then applied through a call to matching_algo, taking

as parameters the list of descriptors extracted for each of prev_image and cur_image, as

well as a threshold value (if provided).

This lab exercise requires you to implement three feature matching algorithms (which were ex￾plained in the Week 7 lectures) and one function for determining the coordinates of the matches.

Stub code has been provided in vision_tasks.py, which you must modify to turn it into your

own solution.

In your implementation, you can make use of a class called BFMatcher (short for brute-force

matcher). This is an OpenCV class that implements some functions for computing the similarity

(e.g., in terms of Euclidean distance) between image descriptors (i.e., features).

bf = cv2.BFMatcher()

Without supplying any input arguments to cv2.BFMatcher(), Euclidean distance is used as the

distance metric by default (which is what we need).

For each of the tasks below, you should use the knnMatch function of the BFMatcher object to

compute matches between two sets of descriptors. For example:

matches = bf.knnMatch(des1, des2, k=100)

will return the closest 100 matches for every descriptor in an image, in order of increasing Euclidean

distance.

The arguments des1 and des2 correspond to the descriptors computed for the query (preceding

or previous) image and reference (current) image, respectively. Meanwhile, the value of k specifies

the maximum number of closest matches that should be returned for each query descriptor.

The above call to knnMatch will return a list matches which is of the same size as the number of

descriptors in the query image. Each element in matches is a list itself that contains k elements

which are of type DMatch.


The return type of the feature matching functions that you are required to implement should

similarly be a list of lists of DMatch objects; however, you need to write code to remove any

matches that should be eliminated depending on the specified algorithm and/or threshold

value.

Any DMatch object has the following attributes, which you will likely find useful:

distance the Euclidean distance between the query descriptor and the matched descriptor,

queryIdx the index of the query keypoint in the list of keypoints detected from the query (i.e.

preceding) image,

trainIdx the index of the reference keypoint in the list of keypoints detected from the reference

(i.e. current) image.

For example, if m is a DMatch object, one can obtain the value of the Euclidean distance between

a query descriptor and the matched descriptor with:

dist_value = m.distance

6

A DMatch object contains information on which keypoints from the query and reference images have

been considered as a match. The first step towards obtaining these keypoints is by determining

their indices, as follows:

query_kp_index = m.queryIdx

ref_kp_index = m.trainIdx

In the above, each of query_kp_index and ref_kp_index is an integer which is an index. Thus,

one can obtain the actual keypoints with:

query_kp = prev_keypoints[query_kp_index]

ref_kp = cur_keypoints[ref_kp_index]

assuming that prev_kp and cur_kp are the list of keypoints detected from the query (preceding

or previous) and reference (current) images, respectively. Here, each of query_kp and ref_kp is of

type KeyPoint. It has an attribute called pt which is a tuple of two elements, corresponding to

the 𝑥 and 𝑦 coordinates of the keypoint. However, the coordinates are provided as floating point

values, thus one needs to cast them into integers, as in the following example:

x_coord_query_kp = int(query_kp.pt[0])

y_coord_query_kp = int(query_kp.pt[1])

Bearing in mind the above notes on OpenCV functions and classes, you can proceed to completing

the tasks below.

Task 1: In your solution, write a function called dt that implements feature matching based on

the distance thresholding algorithm. It returns a list of lists of DMatch objects.

You can verify that your function behaves correctly on the command line. For example, you should

obtain output that is similar to the following.

$ ./run_odometry.py get_info -a dt 150 209 -t 100

vo params: (’dt’, 100.0, ’~/MyKITTI’)

debug run: get_info(150, 209)

Processing frame 150

ret value: ((226, 119), [(208, 115)], [72.02082824707031])

ret count: 3

$ ./run_odometry.py get_info -a dt 150 209 -t 300

vo params: (’dt’, 300.0, ’~/MyKITTI’)

debug run: get_info(150, 209)

Processing frame 150

ret value: ((226, 119), [(208, 115), (394, 52), (519, 213), \

(872, 160), (402, 48)], [72.02082824707031, 242.6643829345703, \

282.074462890625, 290.75762939453125, 294.5997314453125])

ret count: 3

Task 2: In your solution, write a function called nn that implements feature matching based on

the nearest neighbours algorithm. It returns a list of lists of DMatch objects.

Recall that the nearest neighbours algorithm can be used with or without a threshold

value; hence the --threshold argument is optional if --algorithm is set to nn. In this

case your function will get the value None as the threshold argument.

Again, you can verify that your function behaves correctly on the command line. For example, you

should obtain output that is similar to the following.

7

$ ./run_odometry.py get_info -a nn 150 209

vo params: (’nn’, None, ’~/MyKITTI’)

debug run: get_info(150, 209)

Processing frame 150

ret value: ((226, 119), [(208, 115)], [72.02082824707031])

ret count: 3

$ ./run_odometry.py get_info -a nn 150 209 -t 100

vo params: (’nn’, 100.0, ’~/MyKITTI’)

debug run: get_info(150, 209)

Processing frame 150

ret value: ((226, 119), [(208, 115)], [72.02082824707031])

ret count: 3

$ python run_odometry.py get_info -a nn 150 209 -t 50

vo params: (’nn’, 50.0, ’~/MyKITTI’)

debug run: get_info(150, 209)

Processing frame 150

ret value: ((0, 0), [], [])

ret count: 3

Task 3: In your solution, write a function called nndr that implements feature matching based

on the nearest neighbours distance ratio algorithm. It returns a list of lists of DMatch

objects.

As above, you can verify that your function behaves correctly on the command line. For example,

you should obtain output that is similar to the following.

$ ./run_odometry.py get_info -a nndr 120 45 -t 0.97

vo params: (’nndr’, 0.97, ’~/MyKITTI’)

debug run: get_info(120, 45)

Processing frame 120

ret value: ((22 , 14), [(61 , 163)], [156.46725463867188])

ret count: 3

$ ./run_odometry.py get_info -a nndr 120 45 -t 0.93

vo params: (’nndr’, 0.93, ’~/MyKITTI’)

debug run: get_info(120, 45)

Processing frame 120

ret value: ((0, 0), [], [])

ret count: 3

Task 4: In your solution, write a function called matching_info that takes the matches obtained

by any of the above-described feature matching functions, and determines the image

coordinates of the query keypoint and of every matching reference keypoint. It returns

a tuple with the following three elements:

• a tuple of two integers, which corresponds to the 𝑥 and 𝑦 coordinates of the query

keypoint;

• a list of tuples, where each tuple has two integers corresponding to the 𝑥 and 𝑦

coordinates of a matching reference keypoint; and

• a list of floats, which corresponds to the distances between the query keypoint and

the matching keypoints.

If the matches passed on the function is an empty list, the return value should be a

tuple whose first element is (0,0) and whose second and third elements are empty lists:

((0, 0), [], []).

8

You can verify that your function behaves correctly on the command line by inspecting the return

values of of the get_info subcommand, as exemplified in Tasks 1, 2 and 3 above. Additionally,

you can use the view_info subcommand to visualise the coordinates that you have obtained and

check if they are the same as what OpenCV’s own cv2.drawMatchesKnn function returns.

Submission

Please follow the README.md instructions in your COMP24011_2023 GitLab repo. Refresh the files of

your lab3 branch and develop your solution to the lab exercise. The solution consists of a single file

called vision_tasks.py which must be submitted to your GitLab repo and tagged as lab3_sol.

The README.md instructions that accompany the lab files include the git commands necessary to

commit, tag, and then push both the commit and the tag to your COMP24011_2023 GitLab repo.

Further instructions on coursework submission using GitLab can be found in the CS Handbook,

including how to change a git tag after pushing it.

The deadline for submission is 18:00 on Friday 24th November. In addition, no work will be

considered for assessment and/or feedback if submitted more than 2 weeks after the deadline. (Of

course, these rules will be subject to any mitigating procedures that you have in place.)

The lab exercise will be auto-marked offline. The automarker program will download your sub￾mission from GitLab and test it against our reference implementation. For each task the return

value of your function will be checked on a random set of valid arguments. A time limit of 10 seconds

will be imposed on every function call, and exceeding this time limit will count as a runtime error.

If your function does not return values of the correct type, this will also count as a runtime error.

A total of 20 marks is available in this exercise, distributed as shown in the following table.

Task Function Marks

1 VisionTasks.dt() 5

2 VisionTasks.nn() 5

3 VisionTasks.nndr() 5

4 VisionTasks.matching_info() 5

The marking scheme for all tasks is as follows:

• You obtain the first 0.5 marks if all tests complete without runtime errors.

• The proportion of tests with fully correct return values determines the remaining 4.5 marks.

Important Clarifications

• It will be very difficult for you to circumvent time limits during testing. If you try to do this,

the most likely outcome is that the automarker will fail to receive return values from your

implementation, which will have the same effect as not completing the call. In any case, an

additional time limit of 300 seconds for all tests of each task will be enforced.

• This lab exercise is fully auto-marked. If you submit code which the Python interpreter does

not accept, you will score 0 marks. The Python setup of the automarker is the same as the one

on the department’s Ubuntu image, but only a minimal set of Python modules are available.

If you choose to add import statements to the sample code, it is your responsibility to

ensure these are part of the default Python package available on the lab machines.

• It doesn’t matter how you organise your lab3 branch, but you should avoid having multiple

files with the same name. The automarker will sort your directories alphabetically (more

specifically, in ASCII ascending order) and find submission files using breadth-first search. It

will mark the first vision_tasks.py file it finds and ignore all others.

• Every file in your submission should only contain printable ASCII characters. If you include

other Unicode characters, for example by copying and then pasting code from the PDF of

the lab manuals, then the automarker is likely to reject your files.

9