Compare commits: `b6c52ac26f...main` (31 commits)

| SHA1 |
|---|
| cc4d700028 |
| 5cad017a83 |
| 5a2d03db7d |
| 5f91e08b00 |
| f25688dc9f |
| beefb025d6 |
| b7f9db0efc |
| 3d86409f75 |
| 6a2144517b |
| 9ca32f1e48 |
| d3767b54a5 |
| de7dc817aa |
| ebc7a2599d |
| 71092daee0 |
| c5ad6ed5f0 |
| 51024deac4 |
| d04dac9b97 |
| c92a77ae5e |
| dd350e5190 |
| 0385e13da7 |
| 7577148f83 |
| 8a4dee8e73 |
| 08ffefa798 |
| bf7aaa12f2 |
| 08281194c2 |
| 81fe02e9df |
| 2381b26cca |
| 2217988f96 |
| 0841a1a478 |
| 196e215133 |
| f011cdcda0 |
**.gitignore** (11 lines changed)

````diff
@@ -126,7 +126,6 @@ dmypy.json
 .pyre/
 
 # BBGradebookOrganiser
-TODO
 BB_gradebooks/
 BB_submissions/
 csv-inspect/
@@ -138,3 +137,13 @@ csv-inspect/
 
 mkdocs.yml
 /site
+
+# vangef
+
+requirements.*.txt
+!requirements.txt
+
+___*.py
+venv*
+.TODO
+.NOTES
````
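The new `.gitignore` entries are plain glob patterns. As a quick sketch (not part of the repository), Python's `fnmatch.fnmatchcase` can approximate which bare file names they catch; it is only an approximation of gitignore matching, but adequate for names that contain no `/`. It shows, for example, that `requirements.txt` is not itself matched by `requirements.*.txt`, so for the patterns visible in this hunk the `!requirements.txt` line does not change the outcome.

```python
from fnmatch import fnmatchcase

# Patterns added to .gitignore in this comparison (the "!requirements.txt"
# negation is left out). fnmatchcase approximates gitignore globbing well
# enough for names that contain no "/".
NEW_PATTERNS = ["requirements.*.txt", "___*.py", "venv*", ".TODO", ".NOTES"]


def matching_patterns(name: str) -> list[str]:
    """Return the new patterns that match a bare file or directory name."""
    return [pattern for pattern in NEW_PATTERNS if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for name in ["requirements.txt", "requirements.dev.txt", "___scratch.py", "venv-3.12", "main.py"]:
        print(f"{name:22} -> {matching_patterns(name)}")
```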
````diff
@@ -4,11 +4,15 @@ Blackboard Gradebook Organiser - main (functional) changes and new features log
 
 ## **Notable updates**
 
+2024-04-30 Restructure documentation - separate *Inspect by hash*
+
+2024-03-01 Allow customisation of default settings - most useful default to edit is `IGNORE_DIRS`: the list of names for directories, or files, to ignore when extracting from compressed files
+
 2023-07-17 Documentation updated and web docs added at [docs.vangef.net/BBGradebookOrganiser](https://docs.vangef.net/BBGradebookOrganiser)
 
 2023-03-16 Hyperlinks for file paths and names listed in generated CSV files by *inspect by hash*
 
-2023-03-10 Added *inspect gradebook* and merged with *inspect submission* to make [***inspect by hash***](inspect.md)
+2023-03-10 Added *inspect gradebook* and merged with *inspect submission* to make [***inspect by hash***](inspect/about.md)
 
 2023-03-02 Added *exclude files from hashing*
 
````
````diff
@@ -10,7 +10,7 @@ Blackboard Gradebook Organiser
 
 **Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn ⧉](https://en.wikipedia.org/wiki/Blackboard_Learn).
 The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Compressed files are extracted into the student's directory, and any remaining individually submitted files are also moved into the student's directory. Student comments from the submissions are also extracted into a single text file for convenient access and review.
-Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](inspect.md) for more information.
+Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](inspect/about.md) for more information.
 
 ## **Features**
 
@@ -18,7 +18,15 @@ Optionally, you can inspect the submissions for identical files (by generating a
 
 - Detects invalid/corrupt files
 
-- Doesn't extract macOS system generated files (ignores directory *__MACOSX* inside the compressed file)
+- Skips extracting files and directories if their path contains any of the *ignored dirs*, as set in *settings.py* - ignored directories by default:
+
+    - `__MACOSX` (macOS system generated files)
+
+    - `.git` (git repo files)
+
+    - `node_modules` (npm)
+
+    - `vendor` (composer / laravel)
 
 - Deletes each compressed file after successful extraction into student directory
 
@@ -30,7 +38,7 @@ Optionally, you can inspect the submissions for identical files (by generating a
 
 - The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually
 
-- [Inspect by hash](inspect.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect:
+- [Inspect by hash](inspect/about.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect:
 
     - Inspect gradebook: Before organising a gradebook - for identical files in the files submitted to *Blackboard*
 
@@ -38,11 +46,7 @@ Optionally, you can inspect the submissions for identical files (by generating a
 
 ## **Instructions**
 
-See [***Instructions***](instructions.md) for more information & details.
+See the documentation for [Requirements & Settings](instructions/requirements-settings.md) and [Usage](instructions/usage.md) instructions, and more information & details about [***Inspect by hash***](inspect/about.md).
 
-## **Inspect by hash** :mag:
-
-See [***Inspect by hash***](inspect.md) for more information & details.
-
 ## **General notes**
 
````
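The README context lines state that submissions are organised per student by extracting the student number from the submission file names; how `organise_gradebook` actually does this is not part of this comparison. The sketch below assumes Blackboard's download naming `<assignment>_<student id>_attempt_<YYYY-MM-DD-HH-MM-SS>[_<original file name>]` and takes the token just before `_attempt_` as the student id. `student_id_from_filename` and `group_by_student` are hypothetical helpers, not functions from the repository.

```python
import re
from collections import defaultdict
from typing import Optional

# ASSUMPTION: downloaded gradebook files are named
#   <assignment>_<student id>_attempt_<YYYY-MM-DD-HH-MM-SS>[_<original file name>]
# and the student id is the underscore-free token right before "_attempt_".
ATTEMPT_RE = re.compile(r"_([^_]+)_attempt_\d{4}(?:-\d{2}){5}")


def student_id_from_filename(filename: str) -> Optional[str]:
    """Return the student id embedded in a gradebook file name, or None."""
    match = ATTEMPT_RE.search(filename)
    return match.group(1) if match else None


def group_by_student(filenames: list[str]) -> dict[str, list[str]]:
    """Group file names by student id; names without one go under '_unmatched'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[student_id_from_filename(name) or "_unmatched"].append(name)
    return dict(groups)
```

Anchoring on `_attempt_` rather than splitting on the first underscore keeps the lookup working when the assignment name itself contains underscores.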
**docs/inspect/about.md** (new file, 29 lines)

````diff
@@ -0,0 +1,29 @@
+# **Inspect by hash** :mag:
+
+Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash
+
+## **Description**
+
+With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations:
+
+[*Inspect gradebook*](usage.md#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard*
+
+[*Inspect submissions*](usage.md#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files
+
+## **Features**
+
+- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file
+
+- Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*)
+
+- Compares the generated hashes and finds any duplicates - ignores duplicates if they are by the same student/submission
+
+- Finds all files with the same hash and outputs the list to a CSV file with the following information:
+
+    - *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash`
+
+    - *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash`
+
+- File names and paths listed in the generated CSV files have hyperlinks to the actual files for a quick inspection of the file contents (or running the files, if executable)
+
+*Note:* Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files.
````
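The feature list above (hash every submitted file with SHA256, then report identical files only when they come from more than one student) can be sketched with the standard library alone. This is not the tool's implementation, which uses `pandas` and writes CSV files; `sha256_of_file` and `cross_student_duplicates` are hypothetical names, and the input is assumed to be a mapping from student id to that student's file paths.

```python
import hashlib
from collections import defaultdict


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cross_student_duplicates(
    files_by_student: dict[str, list[str]],
) -> dict[str, list[tuple[str, str]]]:
    """Map each hash shared by two or more *different* students to its
    (student id, path) entries; hashes seen for only one student are dropped."""
    by_hash: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for student, paths in files_by_student.items():
        for path in paths:
            by_hash[sha256_of_file(path)].append((student, path))
    return {
        file_hash: entries
        for file_hash, entries in by_hash.items()
        if len({student for student, _ in entries}) > 1
    }
```

Hashing in fixed-size chunks keeps memory use flat even for large submitted archives or media files.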
**docs/inspect/requirements.md** (new file, 9 lines)

````diff
@@ -0,0 +1,9 @@
+# **Inspect by hash** :mag:
+
+## **Requirements**
+
+The ***inspect*** scripts require the `pandas` package - if it's not already installed, run:
+
+```console
+python -m pip install pandas
+```
````
````diff
@@ -1,42 +1,6 @@
-# **Inspect by hash** :mag:
+# **Using Inspect by hash** :mag:
 
-Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash
+## **Inspect gradebook**
 
-## **Description**
-
-With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations:
-
-[*Inspect gradebook*](#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard*
-
-[*Inspect submissions*](#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files
-
-## **Features**
-
-- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file
-
-- Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*)
-
-- Compares the generated hashes and finds any duplicates - ignores duplicates if they are by the same student/submission
-
-- Finds all files with the same hash and outputs the list to a CSV file with the following information:
-
-    - *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash`
-
-    - *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash`
-
-- File names and paths listed in the generated CSV files have hyperlinks to the actual files for a quick inspection of the file contents (or running the files, if executable)
-
-*Note:* Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files.
-
-## **Instructions**
-
-Before running the *inspect* scripts for the first time, you also need to install the *pandas* package:
-
-```python
-python -m pip install pandas
-```
-
-### **Inspect gradebook**
-
 If you haven't already, extract the downloaded from *Blackboard* gradebook in a new directory inside *BB_gradebooks*
 
@@ -44,7 +8,7 @@ If you haven't already, extract the downloaded from *Blackboard* gradebook in a
 
 To inspect a *gradeboook* run **`inspect_gradebook.py`** and provide the name of the gradebook directory as an argument, e.g. for the gradebook `AssignmentX` run:
 
-```python
+```console
 python inspect_gradebook.py AssignmentX
 ```
 
@@ -56,13 +20,13 @@ Generated CSV files can be found in directory `csv-inspect`, with the inspected
 
 - `AssignmentX_gradebook_duplicate_[datetime].csv` - files with duplicate hashes
 
-### **Inspect submissions**
+## **Inspect submissions**
 
 To inspect *submissions* run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook submissions as an argument.
 
 - e.g. for the organised gradebook `AssignmentX` (in *BB_submissions*/`AssignmentX`) run:
 
-```python
+```console
 python inspect_submissions.py AssignmentX
 ```
 
````
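*Inspect submissions* can skip files listed in a CSV file. Judging from the `load_excluded_filenames` helper changed in this comparison, that file is named `<submissions dir name>_excluded.csv`, lives in `CSV_DIR`, has an `exclude_filename` column, and its names are lowercased before comparison. The helper itself uses `pandas`; below is a stdlib-only sketch of the same loading step, where `load_excluded_names` and its `csv_dir` parameter are hypothetical stand-ins (the value of `CSV_DIR` is not shown here).

```python
import csv
import os


def load_excluded_names(csv_dir: str, submissions_dir_name: str) -> list[str]:
    """Return lowercased names from the 'exclude_filename' column of
    <csv_dir>/<submissions_dir_name>_excluded.csv, or [] if the file is missing."""
    csv_path = os.path.join(csv_dir, f"{submissions_dir_name}_excluded.csv")
    if not os.path.exists(csv_path):
        return []  # nothing excluded: every file gets hashed
    names = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = row.get("exclude_filename") or ""  # missing column or empty cell
            if value:
                names.append(value.lower())
    return names
```

For example, an `AssignmentX_excluded.csv` containing the header `exclude_filename` followed by `README.md` and `main.py` on separate lines yields `["readme.md", "main.py"]`.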
**docs/instructions/requirements-settings.md** (new file, 37 lines)

````diff
@@ -0,0 +1,37 @@
+# **Requirements & Settings**
+
+## **Install requirements**
+
+Before running the script for the first time, install the required python packages:
+
+Option 1 - Install `py7z`, `rarfile`
+
+```console
+python -m pip install py7zr rarfile
+```
+
+Option 2 - Install all packages, including `pandas` which is used in [Inspect by hash](../inspect/about.md), using the requirements file
+
+```console
+python -m pip install -r requirements.txt
+```
+
+**Note**: If running on Linux/Mac, you also need to have `unrar` installed in order to be able to extract `.rar` files (applies for both options 1 and 2)
+
+- `sudo apt install unrar` for Linux
+
+- `brew install rar` for Mac
+
+## (Optional) **Edit settings**
+
+You can change the default settings by editing *utils/settings.py*. The main setting you might want to edit is `IGNORE_DIRS` - the list of names for directories, or files, to ignore when extracting from compressed files.
+
+Ignored directories by default:
+
+- `__MACOSX` (macOS system generated files)
+
+- `.git` (git repo files)
+
+- `node_modules` (npm)
+
+- `vendor` (composer / laravel)
````
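Because the new filters in the organiser code test `dir_name in <member path>`, `IGNORE_DIRS` entries are matched as substrings of each archive member's path, not as whole directory names. The sketch below, using the default list from the documentation, makes that visible: `.git` also skips a submitted `.gitignore`, and `vendor` also skips `vendors.py`. `is_ignored_by_component` is a hypothetical stricter alternative, not something the tool provides.

```python
# Defaults as listed in the documentation; the real list lives in utils/settings.py.
IGNORE_DIRS = ["__MACOSX", ".git", "node_modules", "vendor"]


def is_ignored(member_path: str, ignore_dirs: list[str] = IGNORE_DIRS) -> bool:
    """Same test as the new extract_* filters: substring match anywhere in the path."""
    return any(dir_name in member_path for dir_name in ignore_dirs)


def is_ignored_by_component(member_path: str, ignore_dirs: list[str] = IGNORE_DIRS) -> bool:
    """Hypothetical stricter variant: only whole path components count."""
    parts = member_path.replace("\\", "/").split("/")
    return any(part in ignore_dirs for part in parts)


if __name__ == "__main__":
    for path in ["__MACOSX/._main.py", "project/.gitignore", "src/vendors.py", "src/main.py"]:
        print(f"{path:22} substring={is_ignored(path)} component={is_ignored_by_component(path)}")
```

When editing `IGNORE_DIRS`, short or common names are the ones most likely to skip more than intended under substring matching.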
````diff
@@ -1,4 +1,4 @@
-# **Instructions**
+# **Using BBGradebookOrganiser**
 
 ## **Download gradebook**
 
@@ -20,24 +20,11 @@ Extract the downloaded gradebook in a new directory inside *BB_gradebooks*.
 
 ## **Organise gradebook**
 
-Before running the script for the first time, install the required packages (*py7z*, *rarfile*):
-
-```python
-python -m pip install py7zr rarfile
-```
-
-Note: If running on Linux/Mac, you also need to have `unrar` installed in order to be able to extract *.rar* files.
-
-- `sudo apt install unrar` for Linux
-
-- `brew install rar` for Mac
-
-
 To organise the gradebook run **`organise_gradebook.py`** and provide the name of the directory with the *extracted* gradebook (from section *Extract gradebook* above) as an argument.
 
 - e.g. for gradebook `AssignmentX` (in *BB_gradebooks*/`AssignmentX`) run:
 
-```python
+```console
 python organise_gradebook.py AssignmentX
 ```
 
@@ -67,4 +54,4 @@ Compressed files are deleted after successfully extracting and organising the co
 
 ## **Inspect by hash** :mag:
 
-See [***Inspect by hash***](inspect.md) for more information & details.
+See [***Inspect by hash***](../inspect/about.md) for more information & details.
````
````diff
@@ -1,20 +1,20 @@
 import os, sys
 
 from utils.inspector import generate_hashes_gradebook, generate_duplicate_hashes_gradebook
+from utils.settings import BB_GRADEBOOKS_DIR
 
+
 def main():
     gradebook_dir_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo gradebook directory name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [gradebook dir name]\nExample: python {sys.argv[0]} AssignmentX\n')
 
-    gradebook_dir_path = os.path.join('BB_gradebooks', gradebook_dir_name)
+    gradebook_dir_path = os.path.join(BB_GRADEBOOKS_DIR, gradebook_dir_name)
     if not os.path.exists(gradebook_dir_path):
         exit('[Info] Gradebook directory does not exist - nothing to inspect')
     if not os.listdir(gradebook_dir_path): # if no files in gradebook dir
         exit(f'[Info] No files found in this gradebook - nothing to inspect')
-    # generate CSV file with hashes for all files in gradebook & return path to CSV file for finding duplicate hashes
-    hashes_csv_file_path = generate_hashes_gradebook(gradebook_dir_path)
-    # generate CSV file with files having duplicate hashes
-    generate_duplicate_hashes_gradebook(hashes_csv_file_path)
+    hashes_csv_file_path = generate_hashes_gradebook(gradebook_dir_path) # generate CSV file with hashes for all files in gradebook & return path to CSV file for finding duplicate hashes
+    generate_duplicate_hashes_gradebook(hashes_csv_file_path) # generate CSV file with files having duplicate hashes
 
 
 if __name__ == '__main__':
     main()
````
````diff
@@ -1,19 +1,19 @@
 import os, sys
 
 from utils.inspector import generate_hashes_submissions, generate_duplicate_hashes_submissions
+from utils.settings import BB_SUBMISSIONS_DIR
 
+
 def main():
     submissions_dir_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo submissions directory name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [submissions dir name]\nExample: python {sys.argv[0]} AssignmentX\n')
 
-    submissions_dir_path = os.path.join('BB_submissions', submissions_dir_name)
+    submissions_dir_path = os.path.join(BB_SUBMISSIONS_DIR, submissions_dir_name)
     if not os.path.exists(submissions_dir_path):
         exit('[Info] Directory does not exist - nothing to inspect')
     if not os.listdir(submissions_dir_path): # if no files in dir
         exit(f'[Info] No files found in this submissions directory - nothing to inspect')
-    # generate CSV file with hashes for all files in submissions (except for any 'excluded') & return path to CSV file for finding duplicate hashes
-    hashes_csv_file_path = generate_hashes_submissions(submissions_dir_path)
-    # generate CSV file with files having duplicate hashes
-    generate_duplicate_hashes_submissions(hashes_csv_file_path)
+    hashes_csv_file_path = generate_hashes_submissions(submissions_dir_path) # generate CSV file with hashes for all files in submissions (except for any 'excluded') & return path to CSV file for finding duplicate hashes
+    generate_duplicate_hashes_submissions(hashes_csv_file_path) # generate CSV file with files having duplicate hashes
 
 
 if __name__ == '__main__':
````
````diff
@@ -1,18 +1,20 @@
 import os, sys
 
 from utils.organiser import organise_gradebook, check_submissions_dir_for_compressed
+from utils.settings import BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR
 
+
 def main():
     gradebook_name = ' '.join(sys.argv[1:]) if len(sys.argv) > 1 else exit(f'\nNo gradebook name given. Provide the name as an argument.\n\nUsage: python {sys.argv[0]} [gradebook dir name]\n')
-    gradebook_dir = os.path.join('BB_gradebooks', gradebook_name) # gradebook from Blackboard with all submissions
-    submissions_dir = os.path.join('BB_submissions', gradebook_name) # target dir for extracted submissions
+    gradebook_dir = os.path.join(BB_GRADEBOOKS_DIR, gradebook_name) # gradebook from Blackboard with all submissions
+    submissions_dir = os.path.join(BB_SUBMISSIONS_DIR, gradebook_name) # target dir for extracted submissions
 
     abs_path = os.getcwd() # absolute path of main/this script
-    print(f'\nGradebook directory to organise: {os.path.join(abs_path, gradebook_dir)}')
+    print(f'\nGradebook directory to organise:\n{os.path.join(abs_path, gradebook_dir)}', flush=True)
 
     organise_gradebook(gradebook_dir, submissions_dir)
     check_submissions_dir_for_compressed(submissions_dir)
 
 
 if __name__ == '__main__':
     main()
````
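`organise_gradebook.py` finishes by calling `check_submissions_dir_for_compressed(submissions_dir)`, whose body is not part of this comparison; the README describes its effect (the paths of compressed files still present after organising are displayed so they can be extracted manually). A minimal sketch of such a scan, assuming archives are recognised by their `.zip`, `.rar` or `.7z` extension only; `find_compressed_files` is a hypothetical helper.

```python
import os
import sys

# Assumption: archives are recognised by extension only (the formats the organiser extracts).
COMPRESSED_EXTENSIONS = (".zip", ".rar", ".7z")


def find_compressed_files(root: str) -> list[str]:
    """Return sorted paths under `root` whose names end in a compressed-file extension."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(COMPRESSED_EXTENSIONS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_compressed_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```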
**requirements.txt** (new file, 5 lines)

````diff
@@ -0,0 +1,5 @@
+# for organise gradebook script
+py7zr
+rarfile
+# for inspect gradebook/submissions scripts
+pandas
````
````diff
@@ -2,7 +2,7 @@ import os, shutil, platform
 import zipfile, rarfile
 from py7zr import SevenZipFile, exceptions
 
-from utils.settings import BAD_DIR_NAME
+from utils.settings import BAD_DIR_NAME, IGNORE_DIRS
 
 
 def mark_file_as_BAD(file: str, bad_exception: Exception) -> None:
@@ -12,24 +12,22 @@ def mark_file_as_BAD(file: str, bad_exception: Exception) -> None:
         os.makedirs(bad_dir, exist_ok=True)
         bad_file_path = os.path.join(bad_dir, filename)
         shutil.move(file, bad_file_path)
-        print(f'[Warning] Found BAD compressed file: {filename}\nMoved to: {bad_file_path}\nError message: {bad_exception}')
+        print(f'\n[Warning] Found BAD compressed file: {filename}\nMoved to: {bad_file_path}\nError message: {bad_exception}\n', flush=True)
     except Exception as e:
-        print(f'[Error] {e}')
+        print(f'\n[ERROR] {e}\n', flush=True)
 
-
 def extract_zip(zip_file: str, target_dir: str) -> None | Exception:
     try:
         with zipfile.ZipFile(zip_file, 'r') as zip_ref:
-            members = [ m for m in zip_ref.infolist() if "__MACOSX" not in m.filename ]
-            zip_ref.extractall(target_dir, members=members) # extract all files, ignoring those with the "__MACOSX" string in the name
+            members = [ m for m in zip_ref.infolist() if not any(dir_name in m.filename for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
+            zip_ref.extractall(target_dir, members=members) # extract remaining files
             zip_ref.close()
     except zipfile.BadZipfile as e:
         mark_file_as_BAD(zip_file, e)
     except Exception as e:
-        print(f'[ERROR] Something went wrong while extracting the contents of a submitted zip file. Check the error message, get student id and download / organise manually\nError message: {e}')
+        print(f'\n[ERROR] Something went wrong while extracting the contents of a submitted zip file. Check the error message, get student id and download / organise manually\n\nError message: {e}\n', flush=True)
         return e
 
-
 def extract_rar(rar_file: str, target_dir: str) -> None:
     try:
         with rarfile.RarFile(rar_file, 'r') as rar_ref:
@@ -38,18 +36,19 @@ def extract_rar(rar_file: str, target_dir: str) -> None:
             else: # if Linux or Mac
                 rarfile.UNRAR_TOOL = 'unrar'
             files = rar_ref.namelist()
-            files = [ f for f in files if "__MACOSX" not in f ] # filter out files with "__MACOSX" in the name
+            files = [ f for f in files if not any(dir_name in f for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
             rar_ref.extractall(target_dir, files) # extract the remaining files
             rar_ref.close()
+    except OSError as e:
+        mark_file_as_BAD(rar_file, e)
     except rarfile.BadRarFile as e:
         mark_file_as_BAD(rar_file, e)
     except rarfile.NotRarFile as e:
         mark_file_as_BAD(rar_file, e)
     except rarfile.RarCannotExec as e:
-        print('[Error] Missing unrar tool\nfor Windows: make sure file UnRAR.exe exists in directory \'utils\'\nfor Linux/Mac: need to install unrar (check README)')
+        print('\n[ERROR] Missing unrar tool\nfor Windows: make sure file UnRAR.exe exists in directory \'utils\'\nfor Linux/Mac: need to install unrar (check README)\n', flush=True)
         exit()
 
-
 def extract_7z(seven_zip_file: str, target_dir: str) -> None:
     try: # extract the 7z file using py7zr
         with open(seven_zip_file, 'rb') as f:
@@ -57,7 +56,7 @@ def extract_7z(seven_zip_file: str, target_dir: str) -> None:
                 if not seven_zip.getnames():
                     raise exceptions.Bad7zFile
                 files = seven_zip.getnames()
-                files = [ f for f in files if "__MACOSX" not in f ] # filter out files with "__MACOSX" in the name
+                files = [ f for f in files if not any(dir_name in f for dir_name in IGNORE_DIRS) ] # filter out files/dirs using IGNORE_DIRS
                 seven_zip.extract(target_dir, targets=files) # extract the remaining files
                 seven_zip.close()
     except exceptions.Bad7zFile as e:
@@ -65,7 +64,6 @@ def extract_7z(seven_zip_file: str, target_dir: str) -> None:
     except Exception as e:
         mark_file_as_BAD(seven_zip_file, e)
 
-
 def extract_file_to_dir(file_path: str, student_dir: str) -> None | Exception:
     os.makedirs(student_dir, exist_ok=True) # create the subdirectory for student
 
@@ -76,4 +74,4 @@ def extract_file_to_dir(file_path: str, student_dir: str) -> None | Exception:
     elif file_path.lower().endswith('.7z'):
         extract_7z(file_path, student_dir)
     else:
-        print(f"[Error] unknown file type: {file_path}")
+        print(f'\n[ERROR] unknown file type: {file_path}\n', flush=True)
````
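A self-contained sketch of the updated `extract_zip` filtering: list the archive members, drop any whose path contains an `IGNORE_DIRS` entry, and pass the rest to `ZipFile.extractall(members=...)`. It assumes the default ignore list from the documentation and leaves out the tool's error handling (moving bad archives aside via `mark_file_as_BAD`); `extract_zip_filtered` is a hypothetical name. Since the `with` statement closes the archive on exit, no explicit `close()` call is needed.

```python
import zipfile

# Defaults as listed in the documentation; the real list lives in utils/settings.py.
IGNORE_DIRS = ["__MACOSX", ".git", "node_modules", "vendor"]


def extract_zip_filtered(zip_path: str, target_dir: str, ignore_dirs: list[str] = IGNORE_DIRS) -> list[str]:
    """Extract members whose path contains none of `ignore_dirs`; return their names."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:  # closed automatically on exit
        members = [m for m in zip_ref.infolist() if not any(d in m.filename for d in ignore_dirs)]
        zip_ref.extractall(target_dir, members=members)  # creates target_dir if needed
    return [m.filename for m in members]
```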
````diff
@@ -5,50 +5,51 @@ import hashlib
 import pandas as pd
 from functools import partial
 
-from utils.settings import CSV_DIR
+from utils.settings import CSV_DIR, BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR, MIN_FILESIZE_IN_BYTES
 
 
 def load_excluded_filenames(submissions_dir_name: str) -> list[str]: # helper function for hashing all files
     csv_file_path = os.path.join(CSV_DIR, f'{submissions_dir_name}_excluded.csv')
     if not os.path.exists(csv_file_path): # if csv file with excluded file names for submission does not exist
-        print(f'[WARNING] Cannot find CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected')
+        print(f'[WARNING] Cannot find CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected', flush=True)
         return [] # return empty list to continue without any excluded file names
     else: # if csv file with excluded file names for submission exists
         try:
             df = pd.read_csv(csv_file_path)
             filename_list = df['exclude_filename'].tolist() # get the values of the 'filename' column as a list
             filename_list = [ f.lower() for f in filename_list ] # convert to lowercase for comparison with submission files
-            print(f'[INFO] Using CSV file with list of excluded file names: {csv_file_path}')
+            print(f'[INFO] Using CSV file with list of excluded file names: {csv_file_path}', flush=True)
             return filename_list
         except Exception as e: # any exception, print error and return empty list to continue without any excluded file names
-            print(f'[WARNING] Unable to load / read CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected')
-            print(f'[INFO] Error message: {e}')
+            print(f'[WARNING] Unable to load / read CSV file with list of excluded file names: {csv_file_path}\n[INFO] All files will be hashed & inspected', flush=True)
+            print(f'[INFO] Error message: {e}', flush=True)
             return []
 
 
````
def get_hashes_in_dir(dir_path: str, excluded_filenames: list = []) -> list: # helper function for hashing all files
|
def get_hashes_in_dir(dir_path: str, excluded_filenames: list = []) -> list: # helper function for hashing all files
|
||||||
hash_list = []
|
hash_list = []
|
||||||
for subdir, dirs, files in os.walk(dir_path): # loop through all files in the directory and generate hashes
|
for subdir, dirs, files in os.walk(dir_path): # loop through all files in the directory and generate hashes
|
||||||
for filename in files:
|
for filename in files:
|
||||||
if filename.lower() not in excluded_filenames: # convert to lowercase for comparison with excluded files & do not hash if in the excluded list
|
if filename.lower() not in excluded_filenames: # convert to lowercase for comparison with excluded files & do not hash if in the excluded list
|
||||||
filepath = os.path.join(subdir, filename)
|
filepath = os.path.join(subdir, filename)
|
||||||
with open(filepath, 'rb') as f:
|
if os.path.getsize(filepath) > MIN_FILESIZE_IN_BYTES: # file size more than MIN_FILESIZE_IN_BYTES (as set in settings.py)
|
||||||
filehash = hashlib.sha256(f.read()).hexdigest()
|
with open(filepath, 'rb') as f:
|
||||||
if filehash != 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855': # do not include hashes of empty files
|
filehash = hashlib.sha256(f.read()).hexdigest()
|
||||||
|
#if filehash != 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855': # do not include hashes of empty files
|
||||||
hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash})
|
hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash})
|
||||||
|
# else:
|
||||||
|
# print(f'size: {os.path.getsize(filepath)}B, {filepath}')
|
||||||
return hash_list
|
return hash_list
|
||||||
|
|
||||||
|
|
||||||
def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function for hashing all files in gradebook
|
def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function for hashing all files in gradebook
|
||||||
gradebook_dir_name = os.path.abspath(gradebook_dir_path).split(os.path.sep)[-1] # get name of gradebook by separating path and use rightmost part
|
gradebook_dir_name = os.path.abspath(gradebook_dir_path).split(os.path.sep)[-1] # get name of gradebook by separating path and use rightmost part
|
||||||
if not os.path.isdir(gradebook_dir_path):
|
if not os.path.isdir(gradebook_dir_path):
|
||||||
exit(f'Directory {gradebook_dir_path} does not exist.\nMake sure "{gradebook_dir_name}" exists in "BB_gradebooks".\n')
|
exit(f'Directory {gradebook_dir_path} does not exist.\nMake sure "{gradebook_dir_name}" exists in "{BB_GRADEBOOKS_DIR}".\n')
|
||||||
|
|
||||||
dicts_with_hashes_list = get_hashes_in_dir(gradebook_dir_path)
|
dicts_with_hashes_list = get_hashes_in_dir(gradebook_dir_path)
|
||||||
for hash_dict in dicts_with_hashes_list:
|
for hash_dict in dicts_with_hashes_list:
|
||||||
student_id = hash_dict['filename'].split('_attempt_')[0].split('_')[-1]
|
student_id = hash_dict['filename'].split('_attempt_')[0].split('_')[-1]
|
||||||
full_path = os.path.join(os.getcwd(), hash_dict["filepath"])
|
relative_path = os.path.join('..', hash_dict["filepath"])
|
||||||
hash_dict['filename'] = f'=HYPERLINK("{full_path}", "{hash_dict["filename"]}")'
|
hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
|
||||||
del hash_dict['filepath']
|
del hash_dict['filepath']
|
||||||
hash_dict.update({'Student ID': student_id})
|
hash_dict.update({'Student ID': student_id})
|
||||||
|
|
||||||
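The new size check in `get_hashes_in_dir` replaces the old comparison against the hard-coded SHA-256 digest of empty input. An empty file has size 0, so the `> MIN_FILESIZE_IN_BYTES` test still skips it, along with any other very small file. The following is a minimal standalone sketch of that filter, assuming the same 10-byte threshold. `hash_files` is a hypothetical name and is not part of the repository:

```python
from __future__ import annotations

import hashlib
import os

MIN_FILESIZE_IN_BYTES = 10  # same value this diff adds to settings.py


def hash_files(dir_path: str, excluded_filenames: list[str] | None = None,
               min_size: int = MIN_FILESIZE_IN_BYTES) -> list[dict[str, str]]:
    """Hash every file under dir_path that is larger than min_size bytes."""
    excluded = {name.lower() for name in (excluded_filenames or [])}
    records = []
    for subdir, _dirs, files in os.walk(dir_path):
        for filename in files:
            if filename.lower() in excluded:
                continue  # case-insensitive match against the exclusion list
            filepath = os.path.join(subdir, filename)
            if os.path.getsize(filepath) <= min_size:
                continue  # empty and near-empty files are never hashed
            with open(filepath, 'rb') as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            records.append({'filepath': filepath, 'filename': filename, 'sha256 hash': digest})
    return records
```

The comparison is strict, so a file of exactly `MIN_FILESIZE_IN_BYTES` bytes is skipped as well.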
@@ -56,19 +57,18 @@ def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function
     csv_file_name = f'{gradebook_dir_name}_gradebook_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
     csv_file_path = os.path.join(CSV_DIR, csv_file_name)

-    with open(csv_file_path, 'w', newline='') as csvfile: # open the output CSV file for writing
+    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
         fieldnames = ['Student ID', 'filename', 'sha256 hash']
         writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
         writer.writeheader()
         writer.writerows(dicts_with_hashes_list)
-    print(f'[INFO] Created CSV file with all files & hashes in gradebook: {gradebook_dir_name}\nCSV file: {csv_file_path}')
+    print(f'[INFO] Created CSV file with all files & hashes in gradebook: {gradebook_dir_name}\nCSV file: {csv_file_path}', flush=True)
     return csv_file_path

-
 def generate_hashes_submissions(submissions_dir_path: str) -> str: # main function for hashing all files in submissions
     submissions_dir_name = os.path.abspath(submissions_dir_path).split(os.path.sep)[-1] # get name of submission/assignment by separating path and use rightmost part
     if not os.path.isdir(submissions_dir_path):
-        exit(f'Directory {submissions_dir_path} does not exist.\nMake sure "{submissions_dir_name}" exists in "BB_submissions".\n')
+        exit(f'Directory {submissions_dir_path} does not exist.\nMake sure "{submissions_dir_name}" exists in "{BB_SUBMISSIONS_DIR}".\n')

     excluded_filenames = load_excluded_filenames(submissions_dir_name)
     dicts_with_hashes_list = []
@@ -78,9 +78,9 @@ def generate_hashes_submissions(submissions_dir_path: str) -> str: # main funct
         student_dicts_list = []
         for hash_dict in student_dicts_with_hashes_list:
             hash_dict.update({'Student ID': student_dir_name}) # update hash records with student id
-            full_path = os.path.join(os.getcwd(), hash_dict["filepath"])
-            hash_dict['filepath'] = f'=HYPERLINK("{full_path}", "{hash_dict["filepath"]}")'
-            hash_dict['filename'] = f'=HYPERLINK("{full_path}", "{hash_dict["filename"]}")'
+            relative_path = os.path.join('..', hash_dict["filepath"])
+            hash_dict['filepath'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filepath"]}")'
+            hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
             student_dicts_list.append(hash_dict) # append file dict to student list of dict for csv export

         dicts_with_hashes_list.append(student_dicts_list) # append student hashes to main list with all submissions
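Both hash functions now build the `=HYPERLINK(...)` formula from `os.path.join('..', filepath)` instead of an absolute path under `os.getcwd()`. This appears to depend on two things, both treated as assumptions here. First, the CSV must be written to `csv-inspect/`, one level below the working directory, as `CSV_DIR` specifies. Second, the spreadsheet application must resolve relative link targets against the CSV's own folder. The sketch below wraps this in a hypothetical helper, `hyperlink_formula`. It also doubles any `"` character, which is the usual escaping inside spreadsheet formula strings; the diff itself does not do this:

```python
import os


def hyperlink_formula(filepath: str, label: str) -> str:
    """Spreadsheet HYPERLINK formula whose target is relative to the CSV's folder.

    Assumes filepath is relative to the working directory and the CSV sits one
    level below it (e.g. in csv-inspect/), so '..' climbs back to the project root.
    """
    relative_path = os.path.join('..', filepath)

    def quote(text: str) -> str:
        # a double quote inside a formula string literal is written as two quotes
        # (this escaping is an addition; the diff inserts the text as-is)
        return text.replace('"', '""')

    return f'=HYPERLINK("{quote(relative_path)}", "{quote(label)}")'


print(hyperlink_formula(os.path.join('BB_submissions', 'A1', '12345678', 'main.py'), 'main.py'))
```

On Linux or macOS this prints `=HYPERLINK("../BB_submissions/A1/12345678/main.py", "main.py")`. On Windows, `os.path.join` produces backslashes instead.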
@@ -89,15 +89,14 @@ def generate_hashes_submissions(submissions_dir_path: str) -> str: # main funct
     csv_file_name = f'{submissions_dir_name}_submissions_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
     csv_file_path = os.path.join(CSV_DIR, csv_file_name)

-    with open(csv_file_path, 'w', newline='') as csvfile: # open the output CSV file for writing
+    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
         fieldnames = ['Student ID', 'filepath', 'filename', 'sha256 hash']
         writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
         writer.writeheader()
         for student_dict in dicts_with_hashes_list:
             writer.writerows(student_dict)
-    print(f'[INFO] Created CSV file with all files & hashes for submissions in: {submissions_dir_name}\nCSV file: {csv_file_path}')
+    print(f'[INFO] Created CSV file with all files & hashes for submissions in: {submissions_dir_name}\nCSV file: {csv_file_path}', flush=True)
     return csv_file_path
-

 def generate_duplicate_hashes_generic(hashes_csv_file_path: str, drop_columns: list[str]):
     csv = pd.read_csv(hashes_csv_file_path)
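Both CSV writers now pass `encoding='utf-8'` to `open`. Without it, Python uses the platform's preferred encoding, so writing a submitted file name that contains characters outside that code page can raise `UnicodeEncodeError`. The `newline=''` argument is what the `csv` module documentation asks for when opening files for a writer. The following is a minimal sketch of the same write pattern, using a hypothetical `write_hash_csv` helper:

```python
from __future__ import annotations

import csv

FIELDNAMES = ['Student ID', 'filepath', 'filename', 'sha256 hash']


def write_hash_csv(csv_file_path: str, per_student_rows: list[list[dict[str, str]]]) -> None:
    # newline='' lets the csv module control line endings itself;
    # encoding='utf-8' makes the output independent of the platform's default encoding
    with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        for student_rows in per_student_rows:  # one list of row dicts per student, as in the hunk
            writer.writerows(student_rows)
```

To read the file back correctly, open it with the same encoding, for example with `csv.DictReader` over `open(path, newline='', encoding='utf-8')`.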
@@ -119,7 +118,7 @@ def generate_duplicate_hashes_generic(hashes_csv_file_path: str, drop_columns: l
     csv_out = hashes_csv_file_path.rsplit('_', 1)[0].replace('file_hashes', 'duplicate_') + datetime.now().strftime("%Y%m%d-%H%M%S") + '.csv'
     try:
         df_duplicate.to_csv(csv_out, index=False)
-        print(f'[INFO] Created CSV file with duplicate hashes in {gradebook_or_submissions_str}: {assignment_name}\nCSV file: {csv_out}')
+        print(f'[INFO] Created CSV file with duplicate hashes in {gradebook_or_submissions_str}: {assignment_name}\nCSV file: {csv_out}', flush=True)
     except Exception as e:
         exit(f'[ERROR] Something went wrong while trying to save csv file with duplicate hashes\nError message: {e}')

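The output name for the duplicates report is derived purely by string operations. `rsplit('_', 1)[0]` drops the trailing timestamp; this works because the `%Y%m%d-%H%M%S` format contains no underscore. `replace('file_hashes', 'duplicate_')` then swaps the infix, and a fresh timestamp is appended. The hunk does not show how `df_duplicate` is built, so the second function below is only a plain-Python stand-in. A pandas filter such as `df[df.duplicated(subset=['sha256 hash'], keep=False)]` would be another option. Both function names are hypothetical:

```python
from __future__ import annotations

from collections import defaultdict


def duplicate_csv_name(hashes_csv_file_path: str, timestamp: str) -> str:
    # same string operations as the csv_out line in the hunk
    return hashes_csv_file_path.rsplit('_', 1)[0].replace('file_hashes', 'duplicate_') + timestamp + '.csv'


def duplicate_rows(rows: list[dict[str, str]], hash_column: str = 'sha256 hash') -> list[dict[str, str]]:
    """Every row whose hash occurs more than once, grouped by hash (hypothetical stand-in)."""
    by_hash: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in rows:
        by_hash[row[hash_column]].append(row)
    return [row for group in by_hash.values() if len(group) > 1 for row in group]
```

For example, `csv-inspect/A1_gradebook_file_hashes_20240501-101500.csv` with the new timestamp `20240501-101600` becomes `csv-inspect/A1_gradebook_duplicate_20240501-101600.csv`. Note that `replace` acts on the whole path, so a directory name containing `file_hashes` would also be rewritten.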
@@ -1,11 +1,47 @@
 import os, shutil, re
+from collections import defaultdict
 from utils.extractor import extract_file_to_dir
-from utils.settings import BAD_DIR_NAME
+from utils.settings import BAD_DIR_NAME, MULTIPLE_DIR_NAME, BB_GRADEBOOKS_DIR, IGNORE_DIRS, TRACKED_FILE_EXT


-def validate_gradebook_dir_name(src_dir: str) -> None:
+def _parse_filename(file_path: str) -> tuple[str, str] | None:
+    """Extract STUDENTNUMBER and DATETIME from the filename."""
+    pattern = r'^(.*?)_(\d+)_attempt_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})(?:_.*)?(?:\..+)?$'
+    match = re.match(pattern, file_path)
+    if match:
+        return match.group(2), match.group(3) # STUDENTNUMBER, DATETIME
+    return None, None
+
+def _filter_multiple_attempts(directory: str) -> None:
+    """Keep only the latest attempt for each student and move older attempts to MULTIPLE_DIR_NAME."""
+    submissions = defaultdict(list)
+
+    multiple_folder = os.path.join(directory, MULTIPLE_DIR_NAME)
+    os.makedirs(multiple_folder, exist_ok=True)
+
+    # collect all valid files
+    for filename in os.listdir(directory):
+        filepath = os.path.join(directory, filename)
+        if os.path.isfile(filepath):
+            student_number, timestamp = _parse_filename(filename)
+            if student_number and timestamp:
+                submissions[student_number].append((timestamp, filepath))
+
+    # process submissions
+    for student, files in submissions.items():
+        files.sort(reverse=True, key=lambda x: x[0]) # sort by timestamp (most recent first)
+        latest_timestamp = files[0][0] # get the most recent timestamp
+
+        # keep all files from the latest attempt, move older ones
+        for timestamp, filepath in files:
+            if timestamp != latest_timestamp:
+                shutil.move(filepath, os.path.join(multiple_folder, os.path.basename(filepath)))
+
+    print(f"\n[Info] Multiple submission attempts filtering completed.\nOlder submissions moved to folder: {MULTIPLE_DIR_NAME}")
+
+def _validate_gradebook_dir_name(src_dir: str) -> None:
     if not os.path.isdir(src_dir): # check if it exists and is a directory
-        print(f"\n[Error] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in 'BB_gradebooks'")
+        print(f'\n[ERROR] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in "{BB_GRADEBOOKS_DIR}"')
         exit()
     if not os.listdir(src_dir): # check if there are any files in the directory
         print(f'\n[Info] No files found in this gradebook - nothing to organise')
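`_parse_filename` and `_filter_multiple_attempts` decide which attempt to keep by comparing the `YYYY-MM-DD-HH-MM-SS` strings that Blackboard places after `_attempt_`. Every field is zero-padded and fixed-width, so comparing these strings as text gives chronological order. The sketch below reuses the same regular expression but works on a list of names instead of moving files, so the decision can be checked in isolation. `parse_filename` and `split_latest_attempts` are hypothetical names, and the sample file names are invented:

```python
from __future__ import annotations

import re
from collections import defaultdict

# Same regular expression as _parse_filename in the hunk above.
PATTERN = re.compile(r'^(.*?)_(\d+)_attempt_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})(?:_.*)?(?:\..+)?$')


def parse_filename(name: str) -> tuple[str | None, str | None]:
    match = PATTERN.match(name)
    return (match.group(2), match.group(3)) if match else (None, None)  # (student number, timestamp)


def split_latest_attempts(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Partition names into (kept, older) without touching the filesystem."""
    attempts: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name in filenames:
        student, stamp = parse_filename(name)
        if student and stamp:
            attempts[student].append((stamp, name))
    kept: list[str] = []
    older: list[str] = []
    for files in attempts.values():
        latest = max(stamp for stamp, _ in files)  # zero-padded timestamps compare correctly as text
        for stamp, name in files:
            (kept if stamp == latest else older).append(name)
    return kept, older


names = [
    'A1_12345678_attempt_2024-03-01-10-15-30.txt',
    'A1_12345678_attempt_2024-03-01-10-15-30_main.py',
    'A1_12345678_attempt_2024-03-02-09-00-00.txt',
    'A1_12345678_attempt_2024-03-02-09-00-00_main.py',
    'A1_87654321_attempt_2024-03-01-11-00-00_report.pdf',
    'notes.txt',
]
kept, older = split_latest_attempts(names)
print(older)  # the two 2024-03-01 files of student 12345678
```

Names that do not match the pattern, such as `notes.txt`, end up in neither list. This mirrors the hunk, where such files are neither collected nor moved.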
@@ -14,34 +50,62 @@ def validate_gradebook_dir_name(src_dir: str) -> None:
         print(f'\n[Info] Gradebook has only invalid compressed files in: {os.path.join(src_dir, BAD_DIR_NAME)}\n[Info] Nothing to organise')
         exit()

-
-def get_comment_from_submission_txt(file_path: str) -> str | None:
-    no_comment_text = f'Comments:\nThere are no student comments for this assignment.'
-    no_comment_text_regex = no_comment_text
-    no_comment_regex_compile = re.compile(no_comment_text_regex)
+def _get_comment_from_submission_txt(file_path: str) -> tuple[str, str] | None:
+    no_comment_regex = f'Comments:\nThere are no student comments for this assignment.'
+    no_comment_pattern = re.compile(no_comment_regex)

-    with open(file_path) as f:
+    with open(file_path, encoding='utf-8') as f:
         file_contents = f.read()
-        if not no_comment_regex_compile.findall(file_contents):
-            regular_expression = f'Comments:\n.*'
-            regex_compile = re.compile(regular_expression)
-            match = regex_compile.findall(file_contents)[0]
-            comment = match.split('\n')[1]
-            return comment
-    return None
+        if not no_comment_pattern.findall(file_contents):
+            comment_regex = f'Comments:\n.*'
+            name_regex = f'^Name:\s*.*'
+            comment_pattern = re.compile(comment_regex)
+            name_pattern = re.compile(name_regex)
+            if comment_pattern.findall(file_contents):
+                comment_match = comment_pattern.findall(file_contents)[0]
+                comment = comment_match.split('\n')[1]
+                name_match = name_pattern.findall(file_contents)[0]
+                name = name_match.split('Name:')[1].split('(')[0].strip() or ''
+                return comment, name
+    return None, None
+
+def _get_comment_from_submission_txt_BB_ultra(file_path: str) -> tuple[str, str] | None:
+    with open(file_path, encoding='utf-8') as f:
+        file_contents = f.read()
+
+    match = re.search(r'Submission Field:\s*<br>(.*)', file_contents, re.DOTALL) # find the section starting with "Submission Field: <br>"
+    if not match:
+        return None, None
+
+    section = match.group(1)
+    section = re.sub(r'\s*<p><a href.*?</a>', '', section, flags=re.DOTALL) # remove the part starting with "<p><a href" and ending with "</a></p>"
+    paragraphs = re.findall(r'<p>(.*?)</p>', section, re.DOTALL) or None # extract text inside <p> tags
+
+    if not paragraphs:
+        return None, None
+
+    cleaned_text = '\n'.join(p.replace('<br>', '\n') for p in paragraphs) # replace <br> with new lines within paragraphs
+
+    if not cleaned_text:
+        return None, None
+
+    name_regex = f'^Name:\s*.*'
+    name_pattern = re.compile(name_regex)
+    name_match = name_pattern.findall(file_contents)[0]
+    name = name_match.split('Name:')[1].split('(')[0].strip() or ''
+
+    return cleaned_text.strip(), name # comment, name

-
-def get_gradebook_stats(src_dir: str) -> dict[str, int]:
-    all_files = [ os.path.join(src_dir, f) for f in os.listdir(src_dir) if BAD_DIR_NAME not in f ]
-    dirs = [ f for f in all_files if os.path.isdir(f) and BAD_DIR_NAME not in f ]
+def _get_gradebook_stats(src_dir: str) -> dict[str, int]:
+    all_files = [ os.path.join(src_dir, f) for f in os.listdir(src_dir) if BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
+    dirs = [ f for f in all_files if os.path.isdir(f) and BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
     normal_files = [ f for f in all_files if os.path.isfile(f) ]

-    tracked_file_extensions = [ '.zip', '.rar', '.7z', '.txt' ] # add extension in list to track stats for more
     files_counter = {}
     files_counter['all'], files_counter['dirs'], files_counter['normal'] = len(all_files), len(dirs), len(normal_files)

     tracked_files_counter = 0
-    for ext in tracked_file_extensions:
+    for ext in TRACKED_FILE_EXT:
         files_counter[ext] = len([ f for f in normal_files if f.lower().endswith(ext) ])
         tracked_files_counter += files_counter[ext]

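The new `_get_comment_from_submission_txt_BB_ultra` works in three steps. It takes everything after `Submission Field: <br>`, strips attachment links, and joins the `<p>` paragraphs, turning each `<br>` into a newline. The sketch below runs the same steps on an in-memory string. It deliberately differs from the hunk in two ways, both flagged in the comments:

- It searches for `Name:` with `re.MULTILINE`. Without that flag, `^` only matches at the very start of the text.
- It returns an empty name instead of raising `IndexError` when no `Name:` line exists.

The function name and the sample text are invented, and the real Blackboard Ultra export layout is an assumption:

```python
from __future__ import annotations

import re


def extract_ultra_comment(file_contents: str) -> tuple[str | None, str | None]:
    """Return (comment, name) from the text of a Blackboard Ultra submission .txt file."""
    match = re.search(r'Submission Field:\s*<br>(.*)', file_contents, re.DOTALL)
    if not match:
        return None, None
    section = re.sub(r'\s*<p><a href.*?</a>', '', match.group(1), flags=re.DOTALL)  # drop attachment links
    paragraphs = re.findall(r'<p>(.*?)</p>', section, re.DOTALL)
    comment = '\n'.join(p.replace('<br>', '\n') for p in paragraphs).strip()
    if not comment:
        return None, None
    # Differs from the hunk: re.MULTILINE lets ^ match at any line start, and a
    # missing "Name:" line yields '' instead of an IndexError.
    name_match = re.search(r'^Name:\s*(.*)', file_contents, re.MULTILINE)
    name = name_match.group(1).split('(')[0].strip() if name_match else ''
    return comment, name


sample = ('Name: Jane Doe (jdoe)\n'
          'Assignment: Lab 1\n'
          'Submission Field:\n<br><p>First line<br>second line</p><p><a href="x.zip">x.zip</a></p>\n')
print(extract_ultra_comment(sample))
```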
@@ -49,14 +113,13 @@ def get_gradebook_stats(src_dir: str) -> dict[str, int]:
     files_counter['untracked'] = files_counter['normal'] - tracked_files_counter

     dirs_msg = f'. Also found {len(dirs)} dir(s), wasn\'t expecting any!' if len(dirs) else ''
-    tracked_files_list = [ f'{files_counter[ext]} {ext}' for ext in tracked_file_extensions ]
+    tracked_files_list = [ f'{files_counter[ext]} {ext}' for ext in TRACKED_FILE_EXT ]
     tracked_msg = f"{', '.join(str(f) for f in tracked_files_list)}"
-    msg = f'\n[Stats] Gradebook contains {files_counter["all"]} file(s){dirs_msg}\n[Stats] Tracking {len(tracked_file_extensions)} file extension(s), files found: {tracked_msg}\n[Stats] Files with untracked extension: {files_counter["untracked"]}'
-    print(msg)
+    msg = f'\n[Stats] Gradebook contains {files_counter["all"]} file(s){dirs_msg}\n[Stats] Tracking {len(TRACKED_FILE_EXT)} file extension(s), files found: {tracked_msg}\n[Stats] Files with untracked extension: {files_counter["untracked"]}'
+    print(msg, flush=True)
     return files_counter

-
-def organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, student_no: str) -> None:
+def _organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, student_no: str) -> None:
     student_dir = os.path.join(dest_dir, student_no)
     os.makedirs(student_dir, exist_ok=True) # create student directory if it doesn't exist
     file_path = os.path.join(src_dir, file_name)
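`_get_gradebook_stats` now reads the tracked extensions from `TRACKED_FILE_EXT` in `settings.py`, which also adds `.pde` to the list. The following pure-function sketch shows the counting and the summary line. It uses hypothetical names and the same case-insensitive `endswith` test:

```python
from __future__ import annotations

TRACKED_FILE_EXT = ['.zip', '.rar', '.7z', '.txt', '.pde']  # same list as settings.py in this diff


def count_by_extension(filenames: list[str], tracked: list[str] = TRACKED_FILE_EXT) -> dict[str, int]:
    counts = {ext: sum(1 for f in filenames if f.lower().endswith(ext)) for ext in tracked}
    counts['untracked'] = len(filenames) - sum(counts[ext] for ext in tracked)
    return counts


def stats_line(counts: dict[str, int], tracked: list[str] = TRACKED_FILE_EXT) -> str:
    found = ', '.join(f'{counts[ext]} {ext}' for ext in tracked)
    return f'[Stats] Tracking {len(tracked)} file extension(s), files found: {found}'
```

Each file is counted at most once only as long as no listed extension is a suffix of another. That holds for the five extensions shipped in `settings.py`.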
@@ -69,44 +132,49 @@ def organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, stude
             os.remove(file_path) # delete compressed file after successful extraction
     else:
         if file_path_lowercase.endswith('.txt'):
-            comment = get_comment_from_submission_txt(file_path) # get student comment (if any) from submission txt file
-            if comment:
+            comment, name = _get_comment_from_submission_txt_BB_ultra(file_path) # get student comment (if any), and name, from submission txt file
+            if comment and name:
                 comments_filename = f'{dest_dir}_comments.txt'
                 with open(comments_filename, 'a') as f:
-                    f.write(f'\nStudent number: {student_no} - File: {file_path}\nComment: {comment}\n')
+                    f.write(f'\nStudent number: {student_no} - Student name: {name}\nFile: {file_path}\nComment: {comment}\n')
         else:
-            file_name = file_name.split('_attempt_')[1].split('_', 1)[1] # rename any remaining files before moving - remove the BB generated info added to the original file name
+            try:
+                file_name = file_name.split('_attempt_', 1)[1].split('_', 1)[1] # rename any remaining files before moving - remove the BB generated info added to the original file name
+            except IndexError as e:
+                print(f'Cannot process file - possible incorrect format of filename')
             new_file_path = os.path.join(student_dir, os.path.basename(file_name))
             shutil.move(file_path, new_file_path) # move the file to student directory

-
 def organise_gradebook(src_dir: str, dest_dir: str) -> None:
     """1) extracts .zip, .rar, .7z files, organises contents into directories per student number, and deletes compressed files after successful extraction
     2) organises all other files in gradebook into directories per student number
     3) checks if there are any comments in submission text files and extracts them into a file
     """
-    validate_gradebook_dir_name(src_dir) # check if dir exists, and has files in it - exits if not
+    _validate_gradebook_dir_name(src_dir) # check if dir exists, and has files in it - exits if not
     os.makedirs(dest_dir, exist_ok=True) # create the destination directory if it doesn't exist
-    print('\nGetting gradebook stats...')
-    files_counter = get_gradebook_stats(src_dir) # print stats about the files in gradebook and get files_counter dict to use later
+    _filter_multiple_attempts(src_dir)
+    print('\nGetting gradebook stats...', flush=True)
+    files_counter = _get_gradebook_stats(src_dir) # print stats about the files in gradebook and get files_counter dict to use later
     students_numbers: list[str] = [] # list to add and count unique student numbers from all files in gradebook
-    print('\nStart organising...\n')
-    for file_name in os.listdir(src_dir): # iterate through all files in the directory
-        if BAD_DIR_NAME not in file_name: # ignore dir BAD_DIR_NAME (created after first run if corrupt compressed files found)
-            student_no = file_name.split('_attempt_')[0].split('_')[-1] # get student number from file name !! pattern might need adjusting if file name format from blackboard changes !!
-            students_numbers.append(student_no)
-            organise_file_per_student(src_dir, dest_dir, file_name, student_no)
-
-    abs_path = os.getcwd() # absolute path of main script
-    print(f'[Info] Submissions organised into directory: {os.path.join(abs_path, dest_dir)}')
-    print(f'[Info] Unique student numbers in gradebook files: {len(set(students_numbers))}')
-    if files_counter['.txt'] == 0:
-        print(f'[Info] No submission text files found, file with comments not created')
-    else:
-        print(f'[Info] Comments in file: {dest_dir}_comments.txt')
-
-    print(f'[Note] Compressed files (.zip, .rar, .7z) are automatically deleted from the gradebook directory after successful extraction')
+    print('\nStart organising... (this may take a while depending on the number -and size- of submissions)\n', flush=True)

+    for file_name in os.listdir(src_dir): # iterate through all files in the directory
+        if BAD_DIR_NAME not in file_name and MULTIPLE_DIR_NAME not in file_name: # ignore dirs BAD_DIR_NAME (created after first run if corrupt compressed files found) and MULTIPLE_DIR_NAME (dir with older attempts)
+            student_no = file_name.split('_attempt_', 1)[0].split('_')[-1] # get student number from file name !! pattern might need adjusting if file name format from blackboard changes !!
+            students_numbers.append(student_no)
+            _organise_file_per_student(src_dir, dest_dir, file_name, student_no)
+
+    ignored_str = ', '.join(IGNORE_DIRS)
+    print(f'[Info] Skipped extracting files in dirs with name that includes any of the following strings: {ignored_str}\n', flush=True)
+    abs_path = os.getcwd() # absolute path of main script
+    print(f'[Info] Submissions organised into directory: {os.path.join(abs_path, dest_dir)}\n', flush=True)
+    print(f'[Info] Unique student numbers in gradebook files: {len(set(students_numbers))}\n', flush=True)
+    if files_counter['.txt'] == 0:
+        print(f'[Info] No submission text files found, file with comments not created\n', flush=True)
+    else:
+        print(f'[Info] Comments in file: {dest_dir}_comments.txt\n', flush=True)
+
+    print(f'[Info] Compressed files (.zip, .rar, .7z) are automatically deleted from the gradebook directory after successful extraction\n', flush=True)

 def check_submissions_dir_for_compressed(submissions_dir: str) -> None:
     """checks if any submitted compressed files contain more compressed files inside (they are not recursively extracted)
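The rename in `_organise_file_per_student` gained `maxsplit=1`. This matters when a student's own file name contains `_attempt_`: without the limit, `split('_attempt_')[1]` returns only the text up to the second occurrence, which truncates the name. The student-number line received the same change, but it takes element `[0]`, so the limit does not alter its result. The sketch below uses hypothetical helper names. Unlike the hunk, which prints a message on `IndexError`, `original_name` returns `None`:

```python
from __future__ import annotations


def student_number(file_name: str) -> str:
    # text before the first '_attempt_', then its last '_'-separated token
    return file_name.split('_attempt_', 1)[0].split('_')[-1]


def original_name(file_name: str) -> str | None:
    """Drop Blackboard's '<name>_<student>_attempt_<timestamp>_' prefix; None if the name has no such prefix."""
    try:
        return file_name.split('_attempt_', 1)[1].split('_', 1)[1]
    except IndexError:
        return None


bb_name = 'A1_12345678_attempt_2024-03-01-10-15-30_my_attempt_2.py'
print(student_number(bb_name), original_name(bb_name))  # prints: 12345678 my_attempt_2.py
# without maxsplit=1 the same expression yields only 'my'
print(bb_name.split('_attempt_')[1].split('_', 1)[1])
```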
@@ -122,6 +190,6 @@ def check_submissions_dir_for_compressed(submissions_dir: str) -> None:

     if compressed_files:
         compressed_files_str = '\n'.join(compressed_files)
-        print(f'\n[Warning] One or more compressed files from the gradebook contain compressed file(s) inside ({len(compressed_files)} found in total)')
-        print('\nSee below the organised per student compressed files, and extract them manually:\n')
+        print(f'\n[Warning] One or more compressed files found in the extracted and organised submission files ({len(compressed_files)} found in total)')
+        print('\n[Info] See below the list of compressed files, organised per student, and extract them manually if necessary:\n')
         print(compressed_files_str)
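This hunk only changes the wording of the warning; how `compressed_files` is collected lies outside the shown lines. Assuming it is a walk over the organised per-student folders, such a check might look like the following sketch. `find_compressed` and `COMPRESSED_EXT` are hypothetical names:

```python
import os

COMPRESSED_EXT = ('.zip', '.rar', '.7z')  # the three formats the organiser extracts


def find_compressed(submissions_dir: str) -> list[str]:
    """Paths of compressed files left anywhere under the organised submissions directory."""
    found = []
    for subdir, _dirs, files in os.walk(submissions_dir):
        for name in files:
            if name.lower().endswith(COMPRESSED_EXT):  # str.endswith accepts a tuple
                found.append(os.path.join(subdir, name))
    return sorted(found)
```

Calling it on a missing directory returns an empty list, because `os.walk` yields nothing in that case.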
@@ -1,4 +1,16 @@
 import os

+
+BB_GRADEBOOKS_DIR = 'BB_gradebooks' # directory with extracted gradebooks downloaded from Blackboard
+BB_SUBMISSIONS_DIR = 'BB_submissions' # directory with organised gradebook submissions
 BAD_DIR_NAME = '__BAD__' # for organise_gradebook.py - directory with corrupt/invalid compressed files
+MULTIPLE_DIR_NAME = '__multiple__' # for organise_gradebook.py - directory with older attempts / submissions when there is more than one. script organises only the most recent.
+
 CSV_DIR = os.path.join(os.getcwd(), 'csv-inspect') # for inspect_gradebook.py and inspect_submissions.py - output dir for generated CSV files
+IGNORE_DIRS = [ '__MACOSX', '.git', 'node_modules', 'vendor' ] # list of dir names to ignore from extracting
+
+TRACKED_FILE_EXT = [ '.zip', '.rar', '.7z', '.txt', '.pde' ] # add extension in list to track stats for more
+
+
+# inspect
+MIN_FILESIZE_IN_BYTES = 10
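`organise_gradebook` reports `IGNORE_DIRS` as strings that cause extraction to be skipped in any directory whose name *includes* one of them. The extractor code that applies the list is not part of this diff. The following is therefore only a sketch of substring matching on the directory components of archive member paths, using `zipfile` on an in-memory archive. `should_skip` and `kept_members` are hypothetical names:

```python
from __future__ import annotations

import io
import zipfile

IGNORE_DIRS = ['__MACOSX', '.git', 'node_modules', 'vendor']  # same list as settings.py in this diff


def should_skip(member_name: str, ignore_dirs: list[str] = IGNORE_DIRS) -> bool:
    """True if any directory component of an archive path contains one of the ignored strings."""
    dir_parts = member_name.replace('\\', '/').split('/')[:-1]  # drop the final file-name component
    return any(ignored in part for part in dir_parts for ignored in ignore_dirs)


def kept_members(zip_bytes: bytes) -> list[str]:
    """File entries of a zip archive that would survive the filter (nothing is written to disk)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [name for name in zf.namelist() if not name.endswith('/') and not should_skip(name)]
```

Substring matching is broad: a folder named `.github` contains `.git` and would be skipped as well. A file such as `project/vendor.txt` is kept, because only directory components are checked.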