Compare commits

...

14 Commits

Author SHA1 Message Date
cc4d700028 code clean-up + minor improvements 2025-02-21 22:06:35 +00:00
5cad017a83 1) filter multiple attempts and keep only latest 2) new way to get any submission comment - compatible with BB ultra 2025-02-21 17:49:03 +00:00
5a2d03db7d skip "empty" (based on MIN FILE SIZE) files from inspection 2025-02-21 17:47:55 +00:00
5f91e08b00 added dir name to move multiple submissions / attempt (except for the latest). and minimum size for files to inspect (and skip empty files) 2025-02-21 17:47:07 +00:00
f25688dc9f use relative path (instead of full path) for csv HYPERLINKs - allows moving/sharing generated files w/ submission files 2024-11-05 23:20:19 +00:00
beefb025d6 tracked file extensions moved to settings.py + encoding added when reading comments 2024-11-05 23:13:13 +00:00
b7f9db0efc try/except when splitting the BB generated filename 2024-10-24 22:58:33 +01:00
3d86409f75 fix encoding error when assignment name has an emoji (or unicode, in general) 2024-10-24 22:57:20 +01:00
6a2144517b Restructure documentation - separate Inspect by hash 2024-10-24 22:55:38 +01:00
9ca32f1e48 added maxsplit limit = 1 when splitting the submitted files (fix for breaking when the student's file name included 'attempt') 2024-10-04 15:03:44 +01:00
d3767b54a5 docs: layout / structure changes 2024-04-26 21:00:50 +01:00
de7dc817aa update gitignore re: requirements 2024-04-26 12:39:20 +01:00
ebc7a2599d update docs for added default ignored dir '.git' 2024-04-26 12:38:45 +01:00
71092daee0 added '.git' to IGNORE_DIRS 2024-04-26 12:37:49 +01:00
11 changed files with 203 additions and 119 deletions

3
.gitignore vendored
View File

@@ -140,6 +140,9 @@ mkdocs.yml
# vangef # vangef
requirements.*.txt
!requirements.txt
___*.py ___*.py
venv* venv*
.TODO .TODO

View File

@@ -4,13 +4,15 @@ Blackboard Gradebook Organiser - main (functional) changes and new features log
## **Notable updates** ## **Notable updates**
2024-04-30 Restructure documentation - separate *Inspect by hash*
2024-03-01 Allow customisation of default settings - most useful default to edit is `IGNORE_DIRS`: the list of names for directories, or files, to ignore when extracting from compressed files 2024-03-01 Allow customisation of default settings - most useful default to edit is `IGNORE_DIRS`: the list of names for directories, or files, to ignore when extracting from compressed files
2023-07-17 Documentation updated and web docs added at [docs.vangef.net/BBGradebookOrganiser](https://docs.vangef.net/BBGradebookOrganiser) 2023-07-17 Documentation updated and web docs added at [docs.vangef.net/BBGradebookOrganiser](https://docs.vangef.net/BBGradebookOrganiser)
2023-03-16 Hyperlinks for file paths and names listed in generated CSV files by *inspect by hash* 2023-03-16 Hyperlinks for file paths and names listed in generated CSV files by *inspect by hash*
2023-03-10 Added *inspect gradebook* and merged with *inspect submission* to make [***inspect by hash***](inspect.md) 2023-03-10 Added *inspect gradebook* and merged with *inspect submission* to make [***inspect by hash***](inspect/about.md)
2023-03-02 Added *exclude files from hashing* 2023-03-02 Added *exclude files from hashing*

View File

@@ -10,7 +10,7 @@ Blackboard Gradebook Organiser
**Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn ⧉](https://en.wikipedia.org/wiki/Blackboard_Learn). **Blackboard Gradebook Organiser** is a tool for organising a downloaded gradebook with assignment submissions from [Blackboard Learn ⧉](https://en.wikipedia.org/wiki/Blackboard_Learn).
The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Compressed files are extracted into the student's directory, and any remaining individually submitted files are also moved into the student's directory. Student comments from the submissions are also extracted into a single text file for convenient access and review. The submission files are organised per student, by extracting the student number from the submission file names and creating a directory per student. Compressed files are extracted into the student's directory, and any remaining individually submitted files are also moved into the student's directory. Student comments from the submissions are also extracted into a single text file for convenient access and review.
Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](inspect.md) for more information. Optionally, you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. See [Inspect by hash](inspect/about.md) for more information.
## **Features** ## **Features**
@@ -22,10 +22,12 @@ Optionally, you can inspect the submissions for identical files (by generating a
- `__MACOSX` (macOS system generated files) - `__MACOSX` (macOS system generated files)
- `vendor` (composer / laravel) - `.git` (git repo files)
- `node_modules` (npm) - `node_modules` (npm)
- `vendor` (composer / laravel)
- Deletes each compressed file after successful extraction into student directory - Deletes each compressed file after successful extraction into student directory
- Organises per student any remaining individually submitted files - Organises per student any remaining individually submitted files
@@ -36,7 +38,7 @@ Optionally, you can inspect the submissions for identical files (by generating a
- The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually - The path of any extracted and organised compressed files will be displayed on the terminal - they need to be extracted manually
- [Inspect by hash](inspect.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect: - [Inspect by hash](inspect/about.md) generates and compares SHA256 hashes of all the submitted files, and detects files that are identical and have been submitted by multiple students. Two ways to inspect:
- Inspect gradebook: Before organising a gradebook - for identical files in the files submitted to *Blackboard* - Inspect gradebook: Before organising a gradebook - for identical files in the files submitted to *Blackboard*
@@ -44,11 +46,7 @@ Optionally, you can inspect the submissions for identical files (by generating a
## **Instructions** ## **Instructions**
See [***Instructions***](instructions.md) for more information & details. See the documentation for [Requirements & Settings](instructions/requirements-settings.md) and [Usage](instructions/usage.md) instructions, and more information & details about [***Inspect by hash***](inspect/about.md).
## **Inspect by hash** :mag:
See [***Inspect by hash***](inspect.md) for more information & details.
## **General notes** ## **General notes**

29
docs/inspect/about.md Normal file
View File

@@ -0,0 +1,29 @@
# **Inspect by hash** :mag:
Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash
## **Description**
With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations:
[*Inspect gradebook*](usage.md#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard*
[*Inspect submissions*](usage.md#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files
## **Features**
- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file
- Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*)
- Compares the generated hashes and finds any duplicates - ignores duplicates if they are by the same student/submission
- Finds all files with the same hash and outputs the list to a CSV file with the following information:
- *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash`
- *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash`
- File names and paths listed in the generated CSV files have hyperlinks to the actual files for a quick inspection of the file contents (or running the files, if executable)
*Note:* Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files.
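The duplicate detection described in the features above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation (the scripts in this repository use `pandas`); the helper names `sha256_of_bytes` and `find_shared_files` and the record layout are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_shared_files(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by hash; keep only hashes seen for more than one student.

    Each record is a dict with the (assumed) keys
    'Student ID', 'filename' and 'sha256 hash'.
    """
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record['sha256 hash']].append(record)
    return {
        file_hash: group
        for file_hash, group in by_hash.items()
        # duplicates within a single student's submission are ignored
        if len({r['Student ID'] for r in group}) > 1
    }


records = [
    {'Student ID': '1001', 'filename': 'a.py', 'sha256 hash': sha256_of_bytes(b'print(1)')},
    {'Student ID': '1002', 'filename': 'b.py', 'sha256 hash': sha256_of_bytes(b'print(1)')},
    {'Student ID': '1001', 'filename': 'c.py', 'sha256 hash': sha256_of_bytes(b'x')},
    {'Student ID': '1001', 'filename': 'd.py', 'sha256 hash': sha256_of_bytes(b'x')},
]
shared = find_shared_files(records)
```

Here only the hash shared by students `1001` and `1002` is reported; the two identical files from student `1001` alone are skipped, matching the "same student/submission" rule.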

View File

@@ -0,0 +1,9 @@
# **Inspect by hash** :mag:
## **Requirements**
The ***inspect*** scripts require the `pandas` package - if it's not already installed, run:
```console
python -m pip install pandas
```

View File

@@ -1,42 +1,6 @@
# **Inspect by hash** :mag: # **Using Inspect by hash** :mag:
Blackboard Gradebook Organiser - Inspect gradebook & submissions by hash ## **Inspect gradebook**
## **Description**
With **Inspect by hash** you can inspect the submissions for identical files (by generating and comparing SHA256 hashes) and detect if any files have been submitted by multiple students. The tool has two variations:
[*Inspect gradebook*](#inspect-gradebook): Before organising a gradebook - for identical files in the files submitted to *Blackboard*
[*Inspect submissions*](#inspect-submissions): After organising a gradebook - for identical files in the files extracted from any submitted compressed files
## **Features**
- Generates SHA256 hashes for each submitted file, and outputs the list to a CSV file
- Can exclude files from hashing, if provided with a CSV file listing the file names (only applicable for *Inspect submissions*)
- Compares the generated hashes and finds any duplicates - ignores duplicates if they are by the same student/submission
- Finds all files with the same hash and outputs the list to a CSV file with the following information:
- *Inspect gradebook*: `Student ID`, `file name`, `SHA256 hash`
- *Inspect submissions*: `Student ID`, `file path`, `file name`, `SHA256 hash`
- File names and paths listed in the generated CSV files have hyperlinks to the actual files for a quick inspection of the file contents (or running the files, if executable)
*Note:* Further analysis needs to be done manually by inspecting and filtering the generated output, depending on the submission and its files.
## **Instructions**
Before running the *inspect* scripts for the first time, you also need to install the *pandas* package:
```console
python -m pip install pandas
```
### **Inspect gradebook**
If you haven't already, extract the gradebook downloaded from *Blackboard* into a new directory inside *BB_gradebooks* If you haven't already, extract the gradebook downloaded from *Blackboard* into a new directory inside *BB_gradebooks*
@@ -56,7 +20,7 @@ Generated CSV files can be found in directory `csv-inspect`, with the inspected
- `AssignmentX_gradebook_duplicate_[datetime].csv` - files with duplicate hashes - `AssignmentX_gradebook_duplicate_[datetime].csv` - files with duplicate hashes
### **Inspect submissions** ## **Inspect submissions**
To inspect *submissions* run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook submissions as an argument. To inspect *submissions* run **`inspect_submissions.py`** and provide the name of the directory with the *organised* gradebook submissions as an argument.

View File

@@ -0,0 +1,37 @@
# **Requirements & Settings**
## **Install requirements**
Before running the script for the first time, install the required python packages:
Option 1 - Install `py7zr`, `rarfile`
```console
python -m pip install py7zr rarfile
```
Option 2 - Install all packages, including `pandas` which is used in [Inspect by hash](../inspect/about.md), using the requirements file
```console
python -m pip install -r requirements.txt
```
**Note**: If running on Linux/Mac, you also need to have `unrar` installed in order to be able to extract `.rar` files (applies to both options 1 and 2)
- `sudo apt install unrar` for Linux
- `brew install rar` for Mac
## (Optional) **Edit settings**
You can change the default settings by editing *utils/settings.py*. The main setting you might want to edit is `IGNORE_DIRS` - the list of names for directories, or files, to ignore when extracting from compressed files.
Ignored directories by default:
- `__MACOSX` (macOS system generated files)
- `.git` (git repo files)
- `node_modules` (npm)
- `vendor` (composer / laravel)
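As a rough illustration of how such a list might be applied during extraction, here is a minimal sketch. The actual contents of *utils/settings.py* and the extractor are not shown here, so the function `should_extract` is hypothetical, and substring matching is an assumption based on the organiser's message about skipping "dirs with name that includes any of the following strings".

```python
# Assumed default value, mirroring the list of ignored directories above
IGNORE_DIRS = ['__MACOSX', '.git', 'node_modules', 'vendor']


def should_extract(member_path: str, ignore_dirs: list[str] = IGNORE_DIRS) -> bool:
    """Return False if the archive member path includes any ignored name."""
    return not any(ignored in member_path for ignored in ignore_dirs)


paths = ['src/main.py', 'project/node_modules/lib.js', '__MACOSX/._main.py']
kept = [p for p in paths if should_extract(p)]
```

Under this substring assumption, a name such as `.gitignore` would also be skipped because it includes `.git`; keep that in mind when adding short names to the list.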

View File

@@ -1,38 +1,4 @@
# **Instructions** # **Using BBGradebookOrganiser**
## **Script requirements**
Before running the script for the first time, install the required python packages:
Option 1 - install `py7z`, `rarfile`
```console
python -m pip install py7zr rarfile
```
Option 2 - install all packages, including `pandas` which is used in [Inspect by hash](inspect.md), using the requirements file
```console
python -m pip install -r requirements.txt
```
Note: If running on Linux/Mac, you also need to have `unrar` installed in order to be able to extract `.rar` files.
- `sudo apt install unrar` for Linux
- `brew install rar` for Mac
## (Optional) **Edit script defaults**
You can change the default settings by editing *utils/settings.py*. The main setting you might want to edit is `IGNORE_DIRS` - the list of names for directories, or files, to ignore when extracting from compressed files.
Ignored directories by default:
- `__MACOSX` (macOS system generated files)
- `vendor` (composer / laravel)
- `node_modules` (npm)
## **Download gradebook** ## **Download gradebook**
@@ -88,4 +54,4 @@ Compressed files are deleted after successfully extracting and organising the co
## **Inspect by hash** :mag: ## **Inspect by hash** :mag:
See [***Inspect by hash***](inspect.md) for more information & details. See [***Inspect by hash***](../inspect/about.md) for more information & details.

View File

@@ -5,7 +5,7 @@ import hashlib
import pandas as pd import pandas as pd
from functools import partial from functools import partial
from utils.settings import CSV_DIR, BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR from utils.settings import CSV_DIR, BB_GRADEBOOKS_DIR, BB_SUBMISSIONS_DIR, MIN_FILESIZE_IN_BYTES
def load_excluded_filenames(submissions_dir_name: str) -> list[str]: # helper function for hashing all files def load_excluded_filenames(submissions_dir_name: str) -> list[str]: # helper function for hashing all files
@@ -31,10 +31,13 @@ def get_hashes_in_dir(dir_path: str, excluded_filenames: list = []) -> list: #
for filename in files: for filename in files:
if filename.lower() not in excluded_filenames: # convert to lowercase for comparison with excluded files & do not hash if in the excluded list if filename.lower() not in excluded_filenames: # convert to lowercase for comparison with excluded files & do not hash if in the excluded list
filepath = os.path.join(subdir, filename) filepath = os.path.join(subdir, filename)
if os.path.getsize(filepath) > MIN_FILESIZE_IN_BYTES: # file size more than MIN_FILESIZE_IN_BYTES (as set in settings.py)
with open(filepath, 'rb') as f: with open(filepath, 'rb') as f:
filehash = hashlib.sha256(f.read()).hexdigest() filehash = hashlib.sha256(f.read()).hexdigest()
if filehash != 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855': # do not include hashes of empty files #if filehash != 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855': # do not include hashes of empty files
hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash}) hash_list.append({ 'filepath': filepath, 'filename': filename, 'sha256 hash': filehash})
# else:
# print(f'size: {os.path.getsize(filepath)}B, {filepath}')
return hash_list return hash_list
def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function for hashing all files in gradebook def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function for hashing all files in gradebook
@@ -45,8 +48,8 @@ def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function
dicts_with_hashes_list = get_hashes_in_dir(gradebook_dir_path) dicts_with_hashes_list = get_hashes_in_dir(gradebook_dir_path)
for hash_dict in dicts_with_hashes_list: for hash_dict in dicts_with_hashes_list:
student_id = hash_dict['filename'].split('_attempt_')[0].split('_')[-1] student_id = hash_dict['filename'].split('_attempt_')[0].split('_')[-1]
full_path = os.path.join(os.getcwd(), hash_dict["filepath"]) relative_path = os.path.join('..', hash_dict["filepath"])
hash_dict['filename'] = f'=HYPERLINK("{full_path}", "{hash_dict["filename"]}")' hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
del hash_dict['filepath'] del hash_dict['filepath']
hash_dict.update({'Student ID': student_id}) hash_dict.update({'Student ID': student_id})
@@ -54,7 +57,7 @@ def generate_hashes_gradebook(gradebook_dir_path: str) -> str: # main function
csv_file_name = f'{gradebook_dir_name}_gradebook_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv' csv_file_name = f'{gradebook_dir_name}_gradebook_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
csv_file_path = os.path.join(CSV_DIR, csv_file_name) csv_file_path = os.path.join(CSV_DIR, csv_file_name)
with open(csv_file_path, 'w', newline='') as csvfile: # open the output CSV file for writing with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
fieldnames = ['Student ID', 'filename', 'sha256 hash'] fieldnames = ['Student ID', 'filename', 'sha256 hash']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader() writer.writeheader()
@@ -75,9 +78,9 @@ def generate_hashes_submissions(submissions_dir_path: str) -> str: # main funct
student_dicts_list = [] student_dicts_list = []
for hash_dict in student_dicts_with_hashes_list: for hash_dict in student_dicts_with_hashes_list:
hash_dict.update({'Student ID': student_dir_name}) # update hash records with student id hash_dict.update({'Student ID': student_dir_name}) # update hash records with student id
full_path = os.path.join(os.getcwd(), hash_dict["filepath"]) relative_path = os.path.join('..', hash_dict["filepath"])
hash_dict['filepath'] = f'=HYPERLINK("{full_path}", "{hash_dict["filepath"]}")' hash_dict['filepath'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filepath"]}")'
hash_dict['filename'] = f'=HYPERLINK("{full_path}", "{hash_dict["filename"]}")' hash_dict['filename'] = f'=HYPERLINK("{relative_path}", "{hash_dict["filename"]}")'
student_dicts_list.append(hash_dict) # append file dict to student list of dict for csv export student_dicts_list.append(hash_dict) # append file dict to student list of dict for csv export
dicts_with_hashes_list.append(student_dicts_list) # append student hashes to main list with all submissions dicts_with_hashes_list.append(student_dicts_list) # append student hashes to main list with all submissions
@@ -86,7 +89,7 @@ def generate_hashes_submissions(submissions_dir_path: str) -> str: # main funct
csv_file_name = f'{submissions_dir_name}_submissions_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv' csv_file_name = f'{submissions_dir_name}_submissions_file_hashes_{datetime.now().strftime("%Y%m%d-%H%M%S")}.csv'
csv_file_path = os.path.join(CSV_DIR, csv_file_name) csv_file_path = os.path.join(CSV_DIR, csv_file_name)
with open(csv_file_path, 'w', newline='') as csvfile: # open the output CSV file for writing with open(csv_file_path, 'w', newline='', encoding='utf-8') as csvfile: # open the output CSV file for writing
fieldnames = ['Student ID', 'filepath', 'filename', 'sha256 hash'] fieldnames = ['Student ID', 'filepath', 'filename', 'sha256 hash']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader() writer.writeheader()
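
The switch from an absolute to a relative path in the `HYPERLINK` formulas can be illustrated in isolation. This is a sketch under an assumption: the `'..'` prefix appears to be there because the CSV files are written one level below the working directory (the `csv-inspect` directory), while `filepath` is relative to the working directory. The helper name `hyperlink_formula` is hypothetical and not part of the change.

```python
import os


def hyperlink_formula(filepath: str, label: str) -> str:
    """Build a spreadsheet HYPERLINK formula pointing one directory up from the CSV."""
    relative_path = os.path.join('..', filepath)  # relative to the CSV file's directory
    return f'=HYPERLINK("{relative_path}", "{label}")'


formula = hyperlink_formula('report.txt', 'report.txt')
```

Because the link no longer embeds `os.getcwd()`, the generated CSV and the submission files can be moved or shared together, provided their relative layout stays the same.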

View File

@@ -1,12 +1,47 @@
import os, shutil, re import os, shutil, re
from collections import defaultdict
from utils.extractor import extract_file_to_dir from utils.extractor import extract_file_to_dir
from utils.settings import BAD_DIR_NAME, BB_GRADEBOOKS_DIR, IGNORE_DIRS from utils.settings import BAD_DIR_NAME, MULTIPLE_DIR_NAME, BB_GRADEBOOKS_DIR, IGNORE_DIRS, TRACKED_FILE_EXT
def validate_gradebook_dir_name(src_dir: str) -> None: def _parse_filename(file_path: str) -> tuple[str, str] | None:
"""Extract STUDENTNUMBER and DATETIME from the filename."""
pattern = r'^(.*?)_(\d+)_attempt_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})(?:_.*)?(?:\..+)?$'
match = re.match(pattern, file_path)
if match:
return match.group(2), match.group(3) # STUDENTNUMBER, DATETIME
return None, None
def _filter_multiple_attempts(directory: str) -> None:
"""Keep only the latest attempt for each student and move older attempts to MULTIPLE_DIR_NAME."""
submissions = defaultdict(list)
multiple_folder = os.path.join(directory, MULTIPLE_DIR_NAME)
os.makedirs(multiple_folder, exist_ok=True)
# collect all valid files
for filename in os.listdir(directory):
filepath = os.path.join(directory, filename)
if os.path.isfile(filepath):
student_number, timestamp = _parse_filename(filename)
if student_number and timestamp:
submissions[student_number].append((timestamp, filepath))
# process submissions
for student, files in submissions.items():
files.sort(reverse=True, key=lambda x: x[0]) # sort by timestamp (most recent first)
latest_timestamp = files[0][0] # get the most recent timestamp
# keep all files from the latest attempt, move older ones
for timestamp, filepath in files:
if timestamp != latest_timestamp:
shutil.move(filepath, os.path.join(multiple_folder, os.path.basename(filepath)))
print(f"\n[Info] Multiple submission attempts filtering completed.\nOlder submissions moved to folder: {MULTIPLE_DIR_NAME}")
def _validate_gradebook_dir_name(src_dir: str) -> None:
if not os.path.isdir(src_dir): # check if it exists and is a directory if not os.path.isdir(src_dir): # check if it exists and is a directory
print(f'\n[Error] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in "{BB_GRADEBOOKS_DIR}"') print(f'\n[ERROR] Incorrect directory: {src_dir}\n[Info] Make sure the directory exists in "{BB_GRADEBOOKS_DIR}"')
exit() exit()
if not os.listdir(src_dir): # check if there are any files in the directory if not os.listdir(src_dir): # check if there are any files in the directory
print(f'\n[Info] No files found in this gradebook - nothing to organise') print(f'\n[Info] No files found in this gradebook - nothing to organise')
@@ -15,11 +50,11 @@ def validate_gradebook_dir_name(src_dir: str) -> None:
print(f'\n[Info] Gradebook has only invalid compressed files in: {os.path.join(src_dir, BAD_DIR_NAME)}\n[Info] Nothing to organise') print(f'\n[Info] Gradebook has only invalid compressed files in: {os.path.join(src_dir, BAD_DIR_NAME)}\n[Info] Nothing to organise')
exit() exit()
def get_comment_from_submission_txt(file_path: str) -> tuple[str, str] | None: def _get_comment_from_submission_txt(file_path: str) -> tuple[str, str] | None:
no_comment_regex = f'Comments:\nThere are no student comments for this assignment.' no_comment_regex = f'Comments:\nThere are no student comments for this assignment.'
no_comment_pattern = re.compile(no_comment_regex) no_comment_pattern = re.compile(no_comment_regex)
with open(file_path) as f: with open(file_path, encoding='utf-8') as f:
file_contents = f.read() file_contents = f.read()
if not no_comment_pattern.findall(file_contents): if not no_comment_pattern.findall(file_contents):
comment_regex = f'Comments:\n.*' comment_regex = f'Comments:\n.*'
@@ -34,17 +69,43 @@ def get_comment_from_submission_txt(file_path: str) -> tuple[str, str] | None:
return comment, name return comment, name
return None, None return None, None
def get_gradebook_stats(src_dir: str) -> dict[str, int]: def _get_comment_from_submission_txt_BB_ultra(file_path: str) -> tuple[str, str] | None:
all_files = [ os.path.join(src_dir, f) for f in os.listdir(src_dir) if BAD_DIR_NAME not in f ] with open(file_path, encoding='utf-8') as f:
dirs = [ f for f in all_files if os.path.isdir(f) and BAD_DIR_NAME not in f ] file_contents = f.read()
match = re.search(r'Submission Field:\s*<br>(.*)', file_contents, re.DOTALL) # find the section starting with "Submission Field: <br>"
if not match:
return None, None
section = match.group(1)
section = re.sub(r'\s*<p><a href.*?</a>', '', section, flags=re.DOTALL) # remove the part starting with "<p><a href" and ending with "</a></p>"
paragraphs = re.findall(r'<p>(.*?)</p>', section, re.DOTALL) or None # extract text inside <p> tags
if not paragraphs:
return None, None
cleaned_text = '\n'.join(p.replace('<br>', '\n') for p in paragraphs) # replace <br> with new lines within paragraphs
if not cleaned_text:
return None, None
name_regex = r'^Name:\s*.*'
name_pattern = re.compile(name_regex)
name_match = name_pattern.findall(file_contents)[0]
name = name_match.split('Name:')[1].split('(')[0].strip() or ''
return cleaned_text.strip(), name # comment, name
def _get_gradebook_stats(src_dir: str) -> dict[str, int]:
all_files = [ os.path.join(src_dir, f) for f in os.listdir(src_dir) if BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
dirs = [ f for f in all_files if os.path.isdir(f) and BAD_DIR_NAME not in f and MULTIPLE_DIR_NAME not in f ]
normal_files = [ f for f in all_files if os.path.isfile(f) ] normal_files = [ f for f in all_files if os.path.isfile(f) ]
tracked_file_extensions = [ '.zip', '.rar', '.7z', '.txt' ] # add extension in list to track stats for more
files_counter = {} files_counter = {}
files_counter['all'], files_counter['dirs'], files_counter['normal'] = len(all_files), len(dirs), len(normal_files) files_counter['all'], files_counter['dirs'], files_counter['normal'] = len(all_files), len(dirs), len(normal_files)
tracked_files_counter = 0 tracked_files_counter = 0
for ext in tracked_file_extensions: for ext in TRACKED_FILE_EXT:
files_counter[ext] = len([ f for f in normal_files if f.lower().endswith(ext) ]) files_counter[ext] = len([ f for f in normal_files if f.lower().endswith(ext) ])
tracked_files_counter += files_counter[ext] tracked_files_counter += files_counter[ext]
@@ -52,13 +113,13 @@ def get_gradebook_stats(src_dir: str) -> dict[str, int]:
files_counter['untracked'] = files_counter['normal'] - tracked_files_counter files_counter['untracked'] = files_counter['normal'] - tracked_files_counter
dirs_msg = f'. Also found {len(dirs)} dir(s), wasn\'t expecting any!' if len(dirs) else '' dirs_msg = f'. Also found {len(dirs)} dir(s), wasn\'t expecting any!' if len(dirs) else ''
tracked_files_list = [ f'{files_counter[ext]} {ext}' for ext in tracked_file_extensions ] tracked_files_list = [ f'{files_counter[ext]} {ext}' for ext in TRACKED_FILE_EXT ]
tracked_msg = f"{', '.join(str(f) for f in tracked_files_list)}" tracked_msg = f"{', '.join(str(f) for f in tracked_files_list)}"
msg = f'\n[Stats] Gradebook contains {files_counter["all"]} file(s){dirs_msg}\n[Stats] Tracking {len(tracked_file_extensions)} file extension(s), files found: {tracked_msg}\n[Stats] Files with untracked extension: {files_counter["untracked"]}' msg = f'\n[Stats] Gradebook contains {files_counter["all"]} file(s){dirs_msg}\n[Stats] Tracking {len(TRACKED_FILE_EXT)} file extension(s), files found: {tracked_msg}\n[Stats] Files with untracked extension: {files_counter["untracked"]}'
print(msg, flush=True) print(msg, flush=True)
return files_counter return files_counter
def organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, student_no: str) -> None: def _organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, student_no: str) -> None:
student_dir = os.path.join(dest_dir, student_no) student_dir = os.path.join(dest_dir, student_no)
os.makedirs(student_dir, exist_ok=True) # create student directory if it doesn't exist os.makedirs(student_dir, exist_ok=True) # create student directory if it doesn't exist
file_path = os.path.join(src_dir, file_name) file_path = os.path.join(src_dir, file_name)
@@ -71,13 +132,16 @@ def organise_file_per_student(src_dir: str, dest_dir: str, file_name: str, stude
os.remove(file_path) # delete compressed file after successful extraction os.remove(file_path) # delete compressed file after successful extraction
else: else:
if file_path_lowercase.endswith('.txt'): if file_path_lowercase.endswith('.txt'):
comment, name = get_comment_from_submission_txt(file_path) # get student comment (if any), and name, from submission txt file comment, name = _get_comment_from_submission_txt_BB_ultra(file_path) # get student comment (if any), and name, from submission txt file
if comment and name: if comment and name:
comments_filename = f'{dest_dir}_comments.txt' comments_filename = f'{dest_dir}_comments.txt'
with open(comments_filename, 'a') as f: with open(comments_filename, 'a') as f:
f.write(f'\nStudent number: {student_no} - Student name: {name}\nFile: {file_path}\nComment: {comment}\n') f.write(f'\nStudent number: {student_no} - Student name: {name}\nFile: {file_path}\nComment: {comment}\n')
else: else:
file_name = file_name.split('_attempt_')[1].split('_', 1)[1] # rename any remaining files before moving - remove the BB generated info added to the original file name try:
file_name = file_name.split('_attempt_', 1)[1].split('_', 1)[1] # rename any remaining files before moving - remove the BB generated info added to the original file name
except IndexError as e:
print(f'Cannot process file - possible incorrect format of filename')
new_file_path = os.path.join(student_dir, os.path.basename(file_name)) new_file_path = os.path.join(student_dir, os.path.basename(file_name))
shutil.move(file_path, new_file_path) # move the file to student directory shutil.move(file_path, new_file_path) # move the file to student directory
@@ -86,18 +150,19 @@ def organise_gradebook(src_dir: str, dest_dir: str) -> None:
     2) organises all other files in gradebook into directories per student number
     3) checks if there are any comments in submission text files and extracts them into a file
     """
-    validate_gradebook_dir_name(src_dir) # check if dir exists, and has files in it - exits if not
+    _validate_gradebook_dir_name(src_dir) # check if dir exists, and has files in it - exits if not
     os.makedirs(dest_dir, exist_ok=True) # create the destination directory if it doesn't exist
+    _filter_multiple_attempts(src_dir)
     print('\nGetting gradebook stats...', flush=True)
-    files_counter = get_gradebook_stats(src_dir) # print stats about the files in gradebook and get files_counter dict to use later
+    files_counter = _get_gradebook_stats(src_dir) # print stats about the files in gradebook and get files_counter dict to use later
     students_numbers: list[str] = [] # list to add and count unique student numbers from all files in gradebook
     print('\nStart organising... (this may take a while depending on the number -and size- of submissions)\n', flush=True)
     for file_name in os.listdir(src_dir): # iterate through all files in the directory
-        if BAD_DIR_NAME not in file_name: # ignore dir BAD_DIR_NAME (created after first run if corrupt compressed files found)
-            student_no = file_name.split('_attempt_')[0].split('_')[-1] # get student number from file name !! pattern might need adjusting if file name format from blackboard changes !!
+        if BAD_DIR_NAME not in file_name and MULTIPLE_DIR_NAME not in file_name: # ignore dirs BAD_DIR_NAME (created after first run if corrupt compressed files found) and MULTIPLE_DIR_NAME (dir with older attempts)
+            student_no = file_name.split('_attempt_', 1)[0].split('_')[-1] # get student number from file name !! pattern might need adjusting if file name format from blackboard changes !!
             students_numbers.append(student_no)
-            organise_file_per_student(src_dir, dest_dir, file_name, student_no)
+            _organise_file_per_student(src_dir, dest_dir, file_name, student_no)
     ignored_str = ', '.join(IGNORE_DIRS)
     print(f'[Info] Skipped extracting files in dirs with name that includes any of the following strings: {ignored_str}\n', flush=True)
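
The student-number extraction used in the loop can be tried in isolation. This is a minimal sketch, assuming Blackboard file names of the form `<assignment>_<student number>_attempt_<timestamp>_<original file name>`; the helper name `extract_student_no` and the sample file name are hypothetical and not part of the repository.

```python
def extract_student_no(file_name: str) -> str:
    """Return the token that immediately precedes the first '_attempt_' marker."""
    # maxsplit=1 splits only at the first '_attempt_' marker
    prefix = file_name.split('_attempt_', 1)[0]
    # assumption: the student number is the last '_'-separated token of the prefix
    return prefix.split('_')[-1]


# hypothetical file name; the student's own file name also contains 'attempt'
sample = 'Coursework 1_u1234567_attempt_2025-02-21-17-49-03_my_attempt_2.zip'
print(extract_student_no(sample))
```

If the file name format produced by Blackboard changes, the marker string and the token position would need adjusting, as the inline comment in the diff already warns.
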
@@ -109,7 +174,7 @@ def organise_gradebook(src_dir: str, dest_dir: str) -> None:
     else:
         print(f'[Info] Comments in file: {dest_dir}_comments.txt\n', flush=True)
-    print(f'[Note] Compressed files (.zip, .rar, .7z) are automatically deleted from the gradebook directory after successful extraction\n', flush=True)
+    print(f'[Info] Compressed files (.zip, .rar, .7z) are automatically deleted from the gradebook directory after successful extraction\n', flush=True)


 def check_submissions_dir_for_compressed(submissions_dir: str) -> None:
     """checks if any submitted compressed files contain more compressed files inside (they are not recursively extracted)
@@ -126,5 +191,5 @@ def check_submissions_dir_for_compressed(submissions_dir: str) -> None:
     if compressed_files:
         compressed_files_str = '\n'.join(compressed_files)
         print(f'\n[Warning] One or more compressed files found in the extracted and organised submission files ({len(compressed_files)} found in total)')
-        print('\nSee below the organised per student compressed files, and extract them manually if necessary:\n')
+        print('\n[Info] See below the list of compressed files, organised per student, and extract them manually if necessary:\n')
         print(compressed_files_str)
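
How `check_submissions_dir_for_compressed` builds `compressed_files` is not visible in this hunk. The following is a minimal sketch of one way such a list could be collected, assuming a simple match on file extension; `find_compressed_files` and `COMPRESSED_EXT` are hypothetical names, not the repository's own.

```python
import os

# assumption: the same extensions that the messages in the diff mention
COMPRESSED_EXT = ('.zip', '.rar', '.7z')


def find_compressed_files(submissions_dir: str) -> list[str]:
    """Return sorted paths (relative to submissions_dir) of compressed files found under it."""
    found = []
    for root, _dirs, files in os.walk(submissions_dir):
        for name in files:
            # case-insensitive extension check; str.endswith accepts a tuple of suffixes
            if name.lower().endswith(COMPRESSED_EXT):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, submissions_dir))
    return sorted(found)
```

Because each student has their own directory, the relative paths already group the results per student, which matches the wording of the `[Info]` message.
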

@@ -4,5 +4,13 @@ import os
 BB_GRADEBOOKS_DIR = 'BB_gradebooks' # directory with extracted gradebooks downloaded from Blackboard
 BB_SUBMISSIONS_DIR = 'BB_submissions' # directory with organised gradebook submissions
 BAD_DIR_NAME = '__BAD__' # for organise_gradebook.py - directory with corrupt/invalid compressed files
+MULTIPLE_DIR_NAME = '__multiple__' # for organise_gradebook.py - directory with older attempts / submissions when there is more than one. script organises only the most recent.
 CSV_DIR = os.path.join(os.getcwd(), 'csv-inspect') # for inspect_gradebook.py and inspect_submissions.py - output dir for generated CSV files
-IGNORE_DIRS = [ '__MACOSX', 'vendor', 'node_modules' ] # list of dir names to ignore from extracting
+IGNORE_DIRS = [ '__MACOSX', '.git', 'node_modules', 'vendor' ] # list of dir names to ignore from extracting
+TRACKED_FILE_EXT = [ '.zip', '.rar', '.7z', '.txt', '.pde' ] # add extension in list to track stats for more
+# inspect
+MIN_FILESIZE_IN_BYTES = 10
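
How the inspect scripts apply `MIN_FILESIZE_IN_BYTES` is not shown in this diff. A minimal sketch of a size filter, assuming that files smaller than the threshold are skipped and that the threshold itself is inclusive; `is_large_enough` is a hypothetical helper, not a function from the repository.

```python
import os

MIN_FILESIZE_IN_BYTES = 10  # value taken from the settings shown in the diff


def is_large_enough(file_path: str, min_size: int = MIN_FILESIZE_IN_BYTES) -> bool:
    """Return True if the file holds at least min_size bytes (inclusive threshold is an assumption)."""
    return os.path.getsize(file_path) >= min_size
```

A caller could use it to drop near-empty files before inspection, for example `paths = [p for p in paths if is_large_enough(p)]`.
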