Assessment Proforma 2024-25
Key Information
Module Code | CMT224
Module Title | Social Computing
Assessment Title | Social Computing Problem Sheet
Assessment Number | 1
Assessment Weighting | 100%
Assessment Limits | Submissions must be made using the notebook templates provided. Follow the instructions under “Assessment Description” below.
The Assessment Calendar can be found under ‘Assessment & Feedback’ in the COMSC-ORG-SCHOOL organisation on Learning Central. This is the single point of truth for (a) the hand out date and time, (b) the hand in date and time, and (c) the feedback return date for all assessments.
Learning Outcomes
The learning outcomes for this assessment are as follows:
1. Analyse fundamental traits of complex networks by synthesising theoretical concepts and methodologies from graph theory.
2. Evaluate and implement computational approaches to model and visualise complex social phenomena.
3. Design and create software to investigate or support human interaction behaviour.
Submission Instructions
The coversheet can be found under ‘Assessment & Feedback’ in the COMSC-ORG-SCHOOL organisation on Learning Central.
All files should be submitted via Learning Central. The submission page can be found under ‘Assessment & Feedback’ in the CMT224 module on Learning Central. Your submission should consist of multiple files:
Description | | Type | Name
Coversheet | Compulsory | One PDF (.pdf) file | [student_number]-coversheet.pdf
Part 1 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-1.ipynb
Part 2 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-2.ipynb
Part 3 Notebook (using the template provided on Learning Central) | Compulsory | One IPython Notebook file (.ipynb) | [student_number]-part-3.ipynb
If you are unable to submit your work due to technical difficulties, please submit your work via e-mail to comsc-submissions@cardiff.ac.uk and notify the module leader.
Assessment Description
You are tasked with analysing datasets representing different types of social and communication behaviour. These datasets are provided as files and can be found alongside this coursework pro-forma on Learning Central. Alongside the dataset files, there are 3 (THREE) IPython notebooks, named part-1.ipynb, part-2.ipynb, and part-3.ipynb. Use only these notebooks to complete the assignment, and submit them in line with the Submission Instructions section above. The cells in each completed notebook will be run in the order that they appear. You do not need to resubmit the dataset files.
You are required to address a total of 16 questions across the 3 parts. These questions are also listed below for convenience. For EACH question in EACH notebook:
1. Complete the cell below each question marked with “#CODE:” with the Python code needed to generate any new information you need for your answer. This information should be outputted when the cell is run, and any floating-point values should be presented to 2 decimal places unless they are less than 0.01.
2. Complete the cell below this marked with “ANSWER:” with your answer to the question, referring to the information outputted above (as well as any previous cell if needed). In doing so, briefly explain your approach and methods/measures used to answer the question and justify any choices made. Each answer cell should (ideally) be no more than 125 words.
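As an illustration of the floating-point presentation rule in step 1, the following is a minimal sketch of one way to format output values. The helper name `format_value` is hypothetical, and the choice of 2 significant figures for values below 0.01 is an assumption, since the brief does not state how such values should be shown.

```python
def format_value(x: float) -> str:
    """Format a float to 2 decimal places.

    Assumption: non-zero values smaller than 0.01 in magnitude are shown
    to 2 significant figures instead, so they do not collapse to 0.00.
    """
    if x != 0 and abs(x) < 0.01:
        return f"{x:.2g}"
    return f"{x:.2f}"


print(format_value(0.34567))  # 0.35
print(format_value(0.00123))  # 0.0012
print(format_value(12.0))     # 12.00
```

Formatting at the point of printing, rather than rounding the stored values, avoids carrying rounding error into later calculations.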
You may use any Python packages installable via pip. “%pip install <some_package>” commands should be placed in the cell below “Install Python packages (pip only)” provided at the top of each notebook.
“import <some_package>” lines for all packages required for the notebook to run successfully should be placed in the cell under “Import Python packages” provided at the top of each notebook.
You may add additional cells throughout the notebooks, but this should be minimised.
Any code submitted will be run on a system equivalent to a Cardiff University imaged lab machine and must be submitted as stipulated in the instructions above. Any deviation from the submission instructions above (including the number and types of files submitted) may result in a mark of zero for the assessment or question part.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
Questions (duplicated from the three notebook files)
Part 1:
Examine the file "emails_cmt224.edgelist" which represents email behaviour at an organisation. Each line contains two numbers, u and v, separated by a blank space.
Consider each number as an identifier for an individual in the organisation. Each line indicates that the individual u sent at least one email to another individual v at some point.
Additionally, examine the JSON file "emails_cmt224_departments.json" (departments file). Keys in the departments file represent individuals using the same ids as in the "emails_cmt224.edgelist" file and the values represent a department id that the individual can be attributed to.
Model the data using an appropriate, directed network representation and answer the following questions:
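The file formats described above can be read into a directed representation along the following lines. This is only a sketch using the standard library: the sample data, the function name `read_directed_edgelist`, and the use of integer ids are illustrative assumptions, not the contents of the real files. A graph library installable via pip (for example networkx, whose `read_edgelist` accepts `create_using=nx.DiGraph`) could be used instead.

```python
import io
import json
from collections import defaultdict

# Hypothetical sample data standing in for the real files.
EDGELIST_TEXT = "1 2\n2 1\n2 3\n4 3\n"
DEPARTMENTS_TEXT = '{"1": 0, "2": 0, "3": 1, "4": 1}'


def read_directed_edgelist(handle):
    """Read 'u v' lines into successor and predecessor sets (u emailed v)."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        successors[u].add(v)    # edge direction: sender -> recipient
        predecessors[v].add(u)
    return successors, predecessors


successors, predecessors = read_directed_edgelist(io.StringIO(EDGELIST_TEXT))

# JSON object keys are always strings, so convert them to match the int node ids.
departments = {int(k): v for k, v in json.loads(DEPARTMENTS_TEXT).items()}

nodes = set(successors) | set(predecessors)
print(sorted(nodes))
print(sorted(successors[2]))    # people that individual 2 has emailed
print(sorted(predecessors[3]))  # people that have emailed individual 3
print(departments[3])
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` calls on the dataset paths.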
Q1. How many individuals only send emails, only receive emails, or both send and receive emails?
Q2. For individuals that both receive and send emails, what proportion only do so with the same people?
Q3. How many individuals only send emails within their department?
Q4. Could the connectivity within the largest department be suggested to be reflective of a small world phenomenon in comparison to the typical connectivity of 10 comparative random networks?
Q5. Using the connections that individuals have in the network, are they more likely to mix with others in their department or those with a similar number of inward connections? You may define an appropriate assumption for similarity in your answer.
Q6. Assume the role of an outsider with complete visibility of the network who wishes to spread a hypothetical email such that everyone in the organisation would know the information it contained as quickly as possible. Also assume that the email will be forwarded in sequential timesteps using the following mechanism: if an individual is told the information in an email at timestep t, then at timestep t+1 the individual will forward the email to all others that they had emailed before this forwarding process began. Therefore, individuals should not forward the information to people who have previously emailed them but whom they have not themselves emailed. Individuals can be told the information more than once.
If you had to select any 5 individuals to email at timestep 0, what is the minimum number of timesteps needed for the email to be received by everyone in the network? In determining your answer, use one or more appropriate network connectivity measures, rather than an exhaustive search through every combination of nodes in the network.
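The forwarding mechanism described in Q6 can be sketched as a breadth-first spread from several starting individuals at once. The function name `timesteps_to_reach_all`, the `successors` mapping (sender to the set of people they have emailed), and the toy network below are all hypothetical. The sketch only simulates the mechanism for a given set of starting individuals and says nothing about how to choose them.

```python
def timesteps_to_reach_all(successors, all_nodes, seeds):
    """Simulate forwarding from `seeds` (told at timestep 0).

    Returns the timestep at which the last individual is first told,
    or None if some individual is never reached.
    """
    all_nodes = set(all_nodes)
    informed = set(seeds)
    frontier = set(seeds)  # individuals first told at the current timestep
    t = 0
    while frontier and informed != all_nodes:
        t += 1
        told_now = set()
        for u in frontier:
            # u forwards only along its own outgoing edges.
            told_now |= successors.get(u, set())
        # Being told again changes nothing, so only track first-time recipients.
        frontier = told_now - informed
        informed |= frontier
    return t if informed == all_nodes else None


# Toy directed network: 1 -> 2 -> 3 and 4 -> 3.
toy_successors = {1: {2}, 2: {3}, 3: set(), 4: {3}}
toy_nodes = {1, 2, 3, 4}

print(timesteps_to_reach_all(toy_successors, toy_nodes, {1, 4}))  # 1
print(timesteps_to_reach_all(toy_successors, toy_nodes, {1}))     # None
```

In the second call, individual 4 has no incoming edges, so it can only ever be informed by being chosen at timestep 0.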
Part 2:
Examine the Graph Modelling Language (gml) files "socialmedia_cmt224_reply_network.gml" (reply network) and "socialmedia_cmt224_social_network.gml" (social network) which represent data for a sample of users on an online social platform. Both networks are directed and share the same ids for nodes (anonymised users). However, the shared user ids are contained within the "label" attribute in the .gml files, not the node "id" attribute of each individual .gml file.
In the social network, an edge from a node, u, to some other node, v, indicates that u follows v's posts on the social media platform.
In the reply network, an edge from a node, u, to some other node, v, indicates that u replied to one or more posts made by v. Edges are weighted with the weight representing the number of times this happened over the time period the dataset represents.
Using these networks, answer the following questions:
Q1. Are the 10 users with the most followers the same as those that have the most repliers to their posts?
Q2. Does the number of users that a user follows correlate with the number of replies that they receive?
Q3. On average, is a user's list of repliers more likely to contain more followers or more non-followers?
Q4. How many users have only mutual following connections (i.e., every user they follow also follows them) and only mutual reply connections with these same users?
Q5. Are occurrences of induced, connected subgraphs of 3 individuals (triads) with only mutual connections (where connections exist) more abundant in the reply network than those with only asymmetric edges?
Part 3:
Examine the file "p2p_msg_cmt224.csv" which represents messaging behaviour between users on a messaging platform. Each row has four columns, representing a single event where a person (person_a) messaged another person (person_b) on some date (date) at some time of day (time).
From this, answer the following questions:
Q1. Build two suitable networks, with one to represent social connections based on the messaging behaviour that took place in the first 14 days only and another to represent all message behaviour in the dataset. In doing so, assume that one or more messages from one person to another represents a mutual underlying social connection (i.e., regardless of whether person_a messaged person_b, person_b messaged person_a, or both at some point). Explain any assumptions and choices you make in constructing the networks.
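As a rough illustration of the data handling involved, the sketch below reads message events and collects undirected connections for an early window and for the whole dataset. Several things here are assumptions rather than facts about the real file: that it has a header row naming the four columns, that dates are in ISO format (YYYY-MM-DD), that "the first 14 days" is counted from the earliest date present, and that self-messages are ignored. The sample rows and the function name `build_edge_sets` are hypothetical.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical sample rows; the real file's header and date format may differ.
SAMPLE_CSV = """person_a,person_b,date,time
1,2,2024-01-01,09:00
2,1,2024-01-03,10:30
2,3,2024-01-14,23:59
3,4,2024-01-15,00:01
"""


def build_edge_sets(handle, window_days=14):
    """Return (early_edges, all_edges) as sets of unordered person pairs."""
    rows = [
        (r["person_a"], r["person_b"], date.fromisoformat(r["date"]))
        for r in csv.DictReader(handle)
    ]
    first_day = min(d for _, _, d in rows)
    cutoff = first_day + timedelta(days=window_days)  # exclusive upper bound
    early_edges, all_edges = set(), set()
    for a, b, d in rows:
        if a == b:
            continue  # assumption: a self-message is not a social connection
        edge = frozenset((a, b))  # unordered pair, so direction is discarded
        all_edges.add(edge)
        if d < cutoff:
            early_edges.add(edge)
    return early_edges, all_edges


early, full = build_edge_sets(io.StringIO(SAMPLE_CSV))
print(len(early), len(full))  # 2 3
```

Using `frozenset` pairs means that a message from person_a to person_b and a reply in the other direction collapse into a single connection.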
Q2. How does the topological structure of the networks differ in terms of the number of people and connections, how the connections are clustered, and the median shortest distance between people?
Q3. What fraction of people that only exist after the first 14 days are connected to one or more people present in the first 14 days?
Q4. Using only the people that exist in both the network created from data from the first 14 days and the network built from all message behaviour, does the number of social connections grow between these people and can the social phenomenon, ‘Triadic Closure’, be supported?
Q5. How does the maximum 'Degree of separation' change between the people present in the first 14 days and their connectivity after all messages? How does this compare with the maximum 'Degree of separation' after all messages for all people?