From 8568e96a0952f8602313e6a318b986514f0fb24f Mon Sep 17 00:00:00 2001 From: vangef Date: Fri, 10 Mar 2023 16:52:34 +0000 Subject: [PATCH] separate README for 'inspect by hash' & changes for 'inspect gradebook' --- README-inspect.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++ README.md | 63 ++++++++------------------------------ 2 files changed, 90 insertions(+), 50 deletions(-) create mode 100644 README-inspect.md diff --git a/README-inspect.md b/README-inspect.md new file mode 100644 index 0000000..cf81a04 --- /dev/null +++ b/README-inspect.md @@ -0,0 +1,77 @@ +# **Inspect by hash** :mag: + +Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash + +## **Description** + +With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations: + +[*Inspect gradebook*](#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard* + +[*Inspect submissions*](#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files + +## **Features** + +- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file. + + - Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*) + +- Compares the generated hashes and finds any duplicate hashes - ignores duplicates if they are by the same student/submission. + +- Finds all files with the same hash and outputs the list to a CSV file with the following information: + + - *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash` + + - *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash` + +Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files. 
+ +## **Instructions** + +Before running the *inspect* scripts for the first time, you also need to install the *pandas* package: + +```python +python -m pip install pandas +``` + +### **Inspect gradebook** + +To inspect a *gradebook* run **`inspect_gradebook.py`** and provide the name of the gradebook directory as an argument. + +- e.g. for the gradebook `AssignmentX` (in [*BB_gradebooks*](BB_gradebooks)/`AssignmentX`) run: + +```python +python inspect_gradebook.py AssignmentX +``` + +**Note:** run ***before*** organising a gradebook with *organise_gradebook.py* + +Generated CSV files can be found in directory `csv-inspect`, with the inspected gradebook's name as file name prefix - e.g. inspecting gradebook `AssignmentX` will create 2 CSV files: + +- `AssignmentX_gradebook_file_hashes_[datetime].csv` - all files and their hashes + +- `AssignmentX_gradebook_duplicate_[datetime].csv` - files with duplicate hashes + +### **Inspect submissions** + +To inspect *submissions* run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook submissions as an argument. + +- e.g. for the organised gradebook `AssignmentX` (in [*BB_submissions*](BB_submissions)/`AssignmentX`) run: + +```python +python inspect_submissions.py AssignmentX +``` + +**Note:** run ***after*** organising a gradebook with *organise_gradebook.py* + +Generated CSV files can be found in directory `csv-inspect`, with the inspected submission's name as file name prefix - e.g. inspecting submissions for `AssignmentX` will create 2 CSV files: + +- `AssignmentX_submissions_file_hashes_[datetime].csv` - all files and their hashes + +- `AssignmentX_submissions_duplicate_[datetime].csv` - files with duplicate hashes + +*(Optional)* In order to exclude submission files from hashing, create a CSV file in directory `csv-inspect` to provide the file names to be excluded - e.g. 
for `AssignmentX` create: + +- `AssignmentX_excluded.csv` with a column named `exclude_filename` and list the file names + +**Note:** the directory *csv-inspect* is automatically created when you run *inspect_gradebook.py* or *inspect_submissions.py* - if you want to exclude files before the first run, you need to create it manually. diff --git a/README.md b/README.md index 1fa3b92..eef8c31 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,14 @@ -# BBGradebookOrganiser +# **BBGradebookOrganiser** Blackboard Gradebook Organiser -## Description +## **Description** **Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn ⧉](https://en.wikipedia.org/wiki/Blackboard_Learn). The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Compressed files are extracted into the student's directory, and any remaining individually submitted files are also moved into the student's directory. Student comments from the submissions are also extracted into a single text file for convenient access and review. -Optionally, after organising a gradebook, you can inspect the submissions to detect duplicated files from different submissions/students by generating and comparing SHA256 hashes. See section [Inspect submissions](#inspect-submissions-mag) for details. +Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](README-inspect.md) for more information. 
-### **Features** +## **Features** - Extracts, and organises per student, the content of submitted compressed files with extensions: `.zip`, `.rar`, `.7z` @@ -26,9 +26,13 @@ Optionally, after organising a gradebook, you can inspect the submissions to det - The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually -- [Inspect*s* submissions](#inspect-submissions-mag) by generating and comparing SHA256 hashes of submitted files +- [Inspect by hash](README-inspect.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect: -## Instructions + - Inspect gradebook: Before organising a gradebook - for identical files in the files submitted to *Blackboard* + + - Inspect submissions: After organising a gradebook - for identical files in the files extracted from any submitted *compressed* files + +## **Instructions** ### **Download gradebook** @@ -95,52 +99,11 @@ While running, the script displays on the terminal information and stats about t - Any invalid/corrupt compressed files are moved into folder `__BAD__` inside the gradebook directory -## **Inspect submissions** :mag: +## **Inspect by hash** :mag: -### **Information** +See [***Inspect by hash***](README-inspect.md) for more information & details. 
-- Generates SHA256 hashes for each submitted file, and outputs list to CSV file - - - Can exclude files from hashing, if provided with a CSV file listing the file names - -- Compares the generated hashes and finds any duplicate hashes - ignores duplicates if they are by the same student/submission - -- Finds all files with a duplicated hash and outputs them to CSV file with the following information: *Student ID*, *file path*, *file name* (without path), *SHA256 hash* - - - Further inspection and filtering needs to be done manually - -### **Usage** - -For this feature you also need to install the *pandas* package: - -```python -python -m pip install pandas -``` - -  -To inspect the submissions run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook as an argument. - -- e.g. for the organised gradebook `AssignmentX` (in [*BB_submissions*](BB_submissions)/`AssignmentX`) run: - -```python -python inspect_submissions.py AssignmentX -``` - -**Note:** run ***after*** organising a gradebook with [*organise_gradebook.py*](organise_gradebook.py). - -Generated CSV files can be found in directory `csv`, with the inspected submission's name as file name prefix - e.g. inspecting submissions for `AssignmentX` will create 2 CSV files: - -- `AssignmentX_file_hashes_[datetime].csv` - all files and their hashes - -- `AssignmentX_suspicious_[datetime].csv` - files with duplicate hashes - -*(Optional)* In order to exclude files from hashing, create a CSV file in directory `csv` to provide the file names to be excluded - e.g. for `AssignmentX` create: - -- `AssignmentX_excluded.csv` with a column named `exclude_filename` and list the file names - -**Note:** the directory `csv` is automatically created when you run `inspect_submissions.py` - you need to create it manually if you want to exclude files before the first run. 
- -## General notes +## **General notes** The Blackboard generated name for submission files must follow the pattern: > ANYTHING_STUDENTNUMBER_attempt_DATETIME_FILENAME
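
The naming pattern above can be illustrated with a minimal parsing sketch. This is not the code used by *organise_gradebook.py*; the function name `parse_submission_name`, the `SubmissionName` tuple, and the regular expression are hypothetical, and the sketch assumes that neither the student number nor the datetime token contains an underscore.

```python
import re
from typing import NamedTuple, Optional


class SubmissionName(NamedTuple):
    """Parts of a Blackboard-generated submission file name (hypothetical helper)."""
    prefix: str            # the ANYTHING part
    student_number: str    # the STUDENTNUMBER part
    attempt_datetime: str  # the DATETIME part, kept as a raw string
    filename: str          # the FILENAME part, as submitted by the student


# ANYTHING_STUDENTNUMBER_attempt_DATETIME_FILENAME
# Assumption: STUDENTNUMBER and DATETIME contain no underscores, while
# ANYTHING and FILENAME may contain them.
_PATTERN = re.compile(
    r"(?P<prefix>.+?)_(?P<student>[^_]+)_attempt_(?P<dt>[^_]+)_(?P<filename>.+)"
)


def parse_submission_name(name: str) -> Optional[SubmissionName]:
    """Return the parsed parts, or None if the name does not follow the pattern."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        return None
    return SubmissionName(
        prefix=match.group("prefix"),
        student_number=match.group("student"),
        attempt_datetime=match.group("dt"),
        filename=match.group("filename"),
    )


if __name__ == "__main__":
    # Made-up file name, for illustration only
    example = "AssignmentX_12345678_attempt_2023-03-10-16-52-34_my_report.zip"
    print(parse_submission_name(example))
```

Returning `None` for names that do not match lets a caller skip or report such files rather than guess a student number.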