# INFS7410 Project - Part 2

**Version 1.1**

## Preamble

The due date for this assignment is **27 October 2023, 16:00 Eastern Australia Standard Time**.

This part of the project is worth 20% of the overall mark for INFS7410 (part 1 + part 2 = 40%). A detailed marking sheet for this assignment is provided alongside this notebook. The project is to be completed individually.

We recommend that you make an early start on this assignment and proceed in steps. There are several activities you may have already tackled, including setting up the pipeline, manipulating the queries, implementing some retrieval functions, and performing evaluation and analysis. Most of the assignment relies on knowledge and code you should have already encountered in the computer practicals; however, there are some hidden challenges here and there that may require some time to solve.

## Aim

Project aim: The aim of this project is for you to implement several neural information retrieval methods, evaluate them, and compare them in the context of a multi-stage ranking pipeline.

The specific objectives of Part 2 are to:

- Set up your infrastructure to index the collection and evaluate queries (a first-stage BM25 sketch is given below, after *What you need to produce*).
- Implement neural information retrieval models (inference only).
- Examine your ability to perform evaluation and analysis when different neural models are used.

## The Information Retrieval Task: Web Passage Ranking

As in part 1 of the project, in part 2 we consider the problem of open-domain passage ranking in answer to web queries. In this context, users pose queries to the search engine and expect answers in the form of a ranked list of passages (a maximum of 1000 passages to be retrieved).

The provided queries are actual queries submitted to the Microsoft Bing search engine. There are approximately 8.8 million passages in the collection, and the goal is to rank them based on their relevance to the queries.

## What we provide you with

### Files from practical

- A collection of 8.8 million text passages extracted from web pages (`collection.tsv`, provided in Week 1).
- A PyTorch file for the ANCE model (refer to `week10-prac`).
- The standard DPR model; use `BertModel.from_pretrained("ielabgroup/StandardBERT-DR").eval()` to load this model.

### Extra files for this project

- A query dev file that contains 30 queries for you to perform retrieval experiments (`data/dev_queries.tsv`).
- A query dev file that contains the same 30 queries (same query ids as the previous file), but with typos in the query text (`data/dev_typo_queries.tsv`).
- A qrel file that contains relevance judgements, which you can use to tune your methods for the dev queries (`data/dev.qrels`).
- A leaderboard system for you to evaluate how well your system performs.
- A test query file that contains 60 queries for you to generate run files to submit to the leaderboard (`data/test_queries.tsv`).
- This Jupyter notebook, in which you will include your implementation, evaluation, and report.
- An hdf5 file that contains TILDEv2 pre-computed term weights for the collection. Download from this link.
- The typo-aware DPR model; use `BertModel.from_pretrained("ielabgroup/StandardBERT-DR-aug").eval()` to load this model.

Put this notebook and the provided files under the same directory.

## What you need to produce

You need to produce:

- Correct implementations of the methods required by this project's specifications.
- An explanation of the retrieval methods used, including the formulas that represent the models you implemented and the code that implements those formulas, an explanation of the evaluation settings followed, and a discussion of the findings.
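As a concrete starting point for the infrastructure set-up listed under the Aim, below is a minimal sketch of first-stage BM25 retrieval using Pyserini's built-in `SimpleSearcher` (which the specification permits for BM25). The index path, the BM25 parameters, and the one-`qid<TAB>text`-per-line parsing of the query files are our assumptions, not part of the spec:

```python
# A minimal sketch of the first-stage BM25 retrieval. Assumes the
# collection has already been indexed with Pyserini as in the practicals;
# the index path and BM25 parameters below are placeholders to tune.
from pyserini.search import SimpleSearcher

def load_queries(path):
    # Assumes one "qid<TAB>query text" pair per line, as in the dev files.
    queries = {}
    with open(path) as f:
        for line in f:
            qid, text = line.rstrip('\n').split('\t', 1)
            queries[qid] = text
    return queries

searcher = SimpleSearcher('indexes/collection_index')  # placeholder path
searcher.set_bm25(k1=0.9, b=0.4)

queries = load_queries('data/dev_queries.tsv')
bm25_runs = {qid: searcher.search(text, k=1000)  # hits expose .docid and .score
             for qid, text in queries.items()}
```

The top-k hits per query form the candidate lists that the re-rankers described below operate on; k is the re-ranking cut-off you are asked to report.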
Please refer to the marking sheet to understand how each of these requirements is graded.

You are required to produce both of these within this Jupyter notebook.

## Required methods to implement

In Part 2 of the project, you are required to implement the following retrieval methods as two-stage ranking pipelines (BM25 + one dense retriever). All implementations should be based on your own code (except for BM25, where you can use the Pyserini built-in SimpleSearcher).

1. **ANCE Dense Retriever**: Use ANCE to re-rank the BM25 top-k documents. See the practical in Week 10 for background information.
2. **Standard DPR Dense Retriever**: Use standard DPR to re-rank the BM25 top-k documents. See the practical in Week 10 for background information.
3. **Typo-aware DPR Dense Retriever**: The typo-aware DPR is a DPR model that is fine-tuned with typos augmented into the training samples. Use this model (provided in the project) to re-rank the BM25 top-k documents; the inference is the same as for the standard DPR Dense Retriever.
4. **TILDEv2**: Use TILDEv2 to re-rank the BM25 top-k documents. See the practical in Week 10 for background information.

For TILDEv2, unlike what you did in the practical, we offer you the pre-computed term weights for the whole collection (for more details, see the Initial packages and functions cell). This means you can achieve a fast re-ranking speed with TILDEv2. Use this advantage to trade off effectiveness and efficiency in your ranking pipeline implementation.

You should have already attempted many of these implementations as part of the computer prac exercises; a sketch of the two-stage pattern shared by methods 1-3 is given after the evaluation list below.

## Required evaluation to perform

In Part 2 of the project, you are required to perform the following evaluation. We consider two types of queries: one that contains typos (i.e. typographical mistakes, like writing *iformation* for *information*), and another with the typos resolved. An important aspect of the evaluation in this project is to compare the retrieval behaviour of search methods on queries with and without typos (note this is the same as project part 1).

1. For all methods, evaluate their performance on `data/dev_typo_queries.tsv` (queries with typos) and `data/dev_queries.tsv` (the same queries, but with typos corrected), using `data/dev.qrels` with four evaluation metrics (see below).
2. Report every method's effectiveness and efficiency (average query latency) on the `data/dev_queries.tsv` queries (no need for typo queries), along with the corresponding cut-off k used for re-ranking, in a table. Perform statistical significance analysis across the results of the methods and report it in the table.
3. Produce a gain-loss plot that compares the most and least effective of the four required methods above in terms of nDCG@10 on `data/dev_typo_queries.tsv`.
4. Comment on trends and differences observed when comparing your findings.
   - Does the typo-aware DPR model outperform the others on the `data/dev_typo_queries.tsv` queries?
   - When evaluating the `data/dev_queries.tsv` queries, is there any indication that this model loses effectiveness?
   - Is this gain/loss statistically significant? (Remember to perform a t-test for this task as well.)
5. (Optional) Submit your runs on the `data/test_queries.tsv` queries, based on your implemented methods from the dev sets, to the leaderboard system (not counted in your mark for this assignment, but the top-ranked student on the leaderboard can request a recommendation letter from Professor Guido Zuccon).
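As noted above, methods 1-3 share the same two-stage pattern: take the BM25 top-k candidates, encode the query and each candidate passage, and re-score by embedding similarity. The sketch below is a minimal illustration, assuming [CLS]-token pooling and dot-product scoring as used for the DPR models in the Week 10 practical; swap in the ANCE checkpoint and its tokenizer for method 1. `candidates` is a hypothetical list of `(docid, passage_text)` pairs produced by the BM25 stage:

```python
# A minimal sketch of dense re-ranking for methods 1-3 (shown here with
# the standard DPR checkpoint named in the spec). Assumes [CLS] pooling
# and dot-product scoring, as in the Week 10 practical; `candidates` is
# a hypothetical list of (docid, passage_text) pairs from the BM25 stage.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('ielabgroup/StandardBERT-DR').eval()

@torch.no_grad()
def encode(texts, max_length=256):
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=max_length, return_tensors='pt')
    return model(**inputs).last_hidden_state[:, 0]  # (n, hidden) [CLS] vectors

def dense_rerank(query, candidates, batch_size=32):
    q_emb = encode([query])  # (1, hidden)
    scores = []
    for i in range(0, len(candidates), batch_size):
        texts = [text for _, text in candidates[i:i + batch_size]]
        scores.extend((encode(texts) @ q_emb.T).squeeze(-1).tolist())
    docids = [docid for docid, _ in candidates]
    return sorted(zip(docids, scores), key=lambda x: x[1], reverse=True)
```

Timing the `dense_rerank` call per query (e.g., with `time.perf_counter()`) gives the average query latency that evaluation point 2 asks you to report.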
The leaderboard submission link is: https://infs7410.uqcloud.net/leaderboard/. For other instructions, refer to Project 1.

Regarding evaluation measures, evaluate the retrieval methods with respect to nDCG at 10 (`ndcg_cut_10`), reciprocal rank at 1000 (`recip_rank`), MAP (`map`), and Recall at 1000 (`recall_1000`).

For all statistical significance analysis, use a paired t-test and distinguish between p < 0.05 and p < 0.01.

## How to submit

You will have to submit one file:

1. A zip file containing this notebook (.ipynb) and this notebook as a PDF report. The code should be able to be executed by us. Remember to include all your discussion and analysis in this notebook and report, not in a separate file.

   Tip: for printing to a PDF, you can first save and export the notebook as HTML in Jupyter, then use the browser's print function to save it as a PDF.

2. It needs to be submitted via the link on the INFS7410 Blackboard site by 27 October 2023, 16:00 Eastern Australia Standard Time, unless you have been given an extension (according to UQ policy) before the due date of the assignment.

## Initial packages and functions

Unlike prac week 10, where we computed contextualized term weights with TILDEv2 in an "on-the-fly" manner, in this project we provide an hdf5 file that contains pre-computed term weights for all the passages in the collection.

First, pip install the h5py library:

```python
!pip install h5py
```

The following cell gives you an example of how to use the file to access token weights and their corresponding token ids, given a document id.

Note: make sure you have already downloaded the hdf5 file introduced above and placed it in a valid location.

```python
import h5py
from transformers import BertTokenizer

f = h5py.File("tildev2_weights.hdf5", 'r')
weights_file = f['documents'][:]  # load the hdf5 file into memory

docid = 0
token_weights, token_ids = weights_file[docid]

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
for token_id, weight in zip(token_ids.tolist(), token_weights):
    print(f"{tokenizer.decode([token_id])}: {weight}")
```

Output (truncated):

```
presence: 3.62109375
communication: 7.53515625
amid: 5.79296875
scientific: 6.140625
minds: 6.53515625
equally: 3.400390625
important: 6.296875
success: 7.19140625
manhattan: 9.015625
project: 5.45703125
...
```

Note: these `token_ids` include stopwords' ids; remember to remove stopwords' ids from the query tokens.

```python
# Import all your python libraries and put setup code here.
```
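Building on the example above, the following is a minimal sketch of how the pre-computed weights can drive TILDEv2 re-ranking: score a passage by summing, over the unique stopword-filtered query token ids, the passage's weight for that token (taking the maximum when a token occurs several times), as in the Week 10 practical. The use of NLTK's stopword list is our assumption; `weights_file` and `tokenizer` are as defined in the cell above, and `bm25_hits` is a hypothetical list of candidate docids for one query:

```python
# A minimal sketch of TILDEv2 re-ranking with the pre-computed weights.
# NLTK's stopword list is an assumption (run nltk.download('stopwords') once).
from nltk.corpus import stopwords

STOPWORDS = set(stopwords.words('english'))

def query_token_ids(query):
    # Drop stopwords first, as the note above advises, then map to BERT ids.
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    return set(tokenizer(' '.join(terms), add_special_tokens=False)['input_ids'])

def tildev2_score(query_ids, docid):
    # hdf5 rows are indexed by integer docid.
    token_weights, token_ids = weights_file[int(docid)]
    best = {}  # highest weight per matching query token id
    for tid, w in zip(token_ids.tolist(), token_weights):
        if tid in query_ids:
            best[tid] = max(best.get(tid, w), w)
    return sum(best.values())

def tildev2_rerank(query, bm25_hits):
    q_ids = query_token_ids(query)
    scored = [(docid, tildev2_score(q_ids, docid)) for docid in bm25_hits]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

Because all passage weights are pre-computed, this stage needs only a tokenizer pass over the query, which is why the spec suggests using TILDEv2's speed to trade off effectiveness and efficiency (e.g., a larger BM25 cut-off k).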
Double-click to edit this markdown cell and describe the first method you are going to implement, e.g., ANCE.

```python
# Put your implementation of methods here.
```

When you have described and provided implementations for each method, include a table with your statistical analysis here.

For convenience, you can use a tool like https://www.tablesgenerator.com/markdown_tables to make this easier, or, if you are using pandas, you can convert dataframes to markdown: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_markdown.html
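For the four required metrics and the paired t-tests, here is a minimal sketch using `pytrec_eval` and `scipy.stats` (both library choices are ours; any trec_eval wrapper works). Runs and qrels are assumed to be nested dicts in the usual `{qid: {docid: value}}` form, with integer relevance judgements:

```python
# A minimal sketch of metric computation and significance testing.
# Assumes: pip install pytrec_eval scipy; qrels = {qid: {docid: int_rel}};
# per-method runs = {qid: {docid: float_score}}.
import pytrec_eval
from scipy import stats

# Parameterized measures use dot syntax; result keys come back with
# underscores, e.g. 'ndcg_cut_10' and 'recall_1000'.
MEASURES = {'map', 'recip_rank', 'ndcg_cut.10', 'recall.1000'}

def evaluate(qrels, run):
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, MEASURES)
    return evaluator.evaluate(run)  # {qid: {metric: value}}

def paired_ttest(per_query_a, per_query_b, metric='ndcg_cut_10'):
    qids = sorted(per_query_a)
    a = [per_query_a[q][metric] for q in qids]
    b = [per_query_b[q][metric] for q in qids]
    t_stat, p_value = stats.ttest_rel(a, b)
    # One possible convention for the two required significance levels.
    marker = '**' if p_value < 0.01 else '*' if p_value < 0.05 else ''
    return t_stat, p_value, marker
```

The per-query dictionaries returned by `evaluate` can be averaged into one pandas DataFrame row per method and rendered with `DataFrame.to_markdown()` for the table requested above, with `*`/`**` marking p < 0.05 and p < 0.01. The same per-query nDCG@10 differences also feed the gain-loss plot required in evaluation point 3.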