Evaluation Rubric for Portfolio Project

Candidates must submit a public GitHub repository containing an ML project built primarily in Python. Submissions will be evaluated against the following rubric. After the project has been graded and accepted, candidates must explain their design choices in a one-on-one 30-minute interview. We encourage candidates to prepare a brief PowerPoint presentation (15 slides) for this interview.

The minimum grade to pass the portfolio project is 70 points.

  1. Data Preparation and Preprocessing (10 points):

    • How effectively is the data cleaned, normalized, and preprocessed?
    • Are techniques like handling missing data, normalization, and feature engineering appropriately applied?
    • Is there a thoughtful approach to dealing with imbalanced data or outliers?
  2. Model Selection and Rationale (10 points):

    • Is the choice of model suitable for the problem at hand?
    • How well is the reasoning for selecting a particular model articulated?
    • Are comparisons made with alternative models?
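One lightweight way to ground the "comparisons with alternative models" point is to score several candidates under identical cross-validation, as sketched below on a synthetic dataset. The two candidates shown are arbitrary examples; the real project should compare models relevant to its own problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the project's dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Identical 5-fold CV and metric for every candidate keeps the comparison fair.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A short table of such scores in the README, plus a sentence on why the winner was chosen, satisfies this criterion far better than an unexplained model choice.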
  3. Model Training and Validation (10 points):

    • How effectively is the model trained and validated?
    • Are appropriate metrics chosen for evaluating model performance?
    • Is there a robust approach to training, such as cross-validation or use of a validation set?
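For the validation-set approach mentioned above, a minimal sketch looks like the following: a stratified hold-out split plus a metric suited to the task (ROC AUC here, chosen only for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Stratify so the validation set preserves the class balance of the full data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC AUC: {auc:.3f}")
```

Reporting the metric on held-out data, with a fixed random seed, is what makes the numbers in the write-up reproducible.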
  4. Code Quality and Efficiency (10 points):

    • Is the code well-organized, readable, and efficient?
    • Are best practices in coding and software engineering followed?
    • How are error handling and exception management implemented?
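On the error-handling point, graders typically want predictable failures raised as specific exceptions and logged, not swallowed. The names below (`PredictionError`, `predict_one`) are hypothetical stand-ins for project code.

```python
import logging

logger = logging.getLogger(__name__)


class PredictionError(Exception):
    """Raised when a prediction request cannot be served."""


def predict_one(features: list[float], n_expected: int = 4) -> float:
    """Validate input, then score it; the scoring here is a placeholder."""
    if len(features) != n_expected:
        raise PredictionError(
            f"expected {n_expected} features, got {len(features)}")
    return sum(features) / len(features)  # stand-in for a real model call


try:
    predict_one([1.0, 2.0])
except PredictionError as exc:
    # Log the failure with context instead of silently ignoring it.
    logger.error("bad request: %s", exc)
```

A narrow exception type lets API code translate the failure into a precise HTTP error rather than a generic 500.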
  5. API Design and Implementation (10 points):

    • How well is the REST API designed (endpoints, request-response structure)?
    • Are best practices in API development (e.g., security, scalability) considered?
    • Is there proper documentation for the API (e.g., Swagger documentation)?
  6. Model Deployment and Environment (10 points):

    • How effectively is the model deployed for use via the REST endpoint?
    • Are considerations like load balancing, scalability, and environment stability addressed?
    • Is there an effective use of cloud services or containerization (e.g., Docker)?
  7. Integration of Machine Learning and API (10 points):

    • How well are the machine learning model and REST API integrated?
    • Is there efficient handling of requests and responses between the server and the model?
    • Are there measures for performance optimization in the integration?
  8. Security and Data Privacy (10 points):

    • Are security best practices for APIs and machine learning models implemented?
    • How are data privacy and protection handled, especially with sensitive data?
    • Are there mechanisms to prevent common vulnerabilities (e.g., SQL injection, data leaks)?
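For the SQL-injection bullet specifically, the expected defence is parameterized queries: user input is bound as data, never spliced into SQL text. A self-contained sqlite3 sketch (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Hostile input is treated as a literal string, not executable SQL.
user_input = "alice' OR '1'='1"
rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # → [] : the injection attempt matches nothing
```

The same principle applies with any driver or ORM: never build queries with string formatting on user input.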
  9. Testing and Reliability (10 points):

    • How thoroughly is the system (both the model and API) tested?
    • Are there unit tests, integration tests, and system tests?
    • Is there evidence of reliable and consistent performance under different scenarios?
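As a sketch of the unit-test expectation, the pytest-style example below covers both the happy path and a failure mode; `average` is a stand-in for the project's own functions.

```python
import pytest


def average(values: list[float]) -> float:
    """Function under test; a stand-in for real project code."""
    if not values:
        raise ValueError("empty input")
    return sum(values) / len(values)


def test_average_happy_path():
    assert average([1.0, 2.0, 3.0]) == pytest.approx(2.0)


def test_average_empty_raises():
    with pytest.raises(ValueError):
        average([])
```

Integration tests would exercise the API end to end (for instance via FastAPI's `TestClient`), and a CI workflow running the suite on every push is strong evidence for the reliability bullet.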
  10. Documentation, Reporting, and Usability (10 points):

    • Is the project well-documented, including model training, API usage, and deployment details?
    • Are the results, challenges, and decision-making processes clearly communicated?
    • Is the API user-friendly and straightforward for end users to work with?