Evaluation Rubric for Portfolio Project

Candidates must submit a public GitHub repository containing an ML project built primarily in Python. Submissions will be evaluated against the following rubric. After the project has been graded and accepted, candidates must explain their design choices in a one-on-one 30-minute interview. We encourage candidates to prepare a brief PowerPoint presentation (15 slides) for this interview.

The minimum grade to pass the portfolio project is 70 points.

  1. Data Preparation and Preprocessing (10 points):

    • How effectively is the data cleaned, normalized, and preprocessed?
    • Are techniques like handling missing data, normalization, and feature engineering appropriately applied?
    • Is there a thoughtful approach to dealing with imbalanced data or outliers?
  2. Model Selection and Rationale (10 points):

    • Is the choice of model suitable for the problem at hand?
    • How well is the reasoning for selecting a particular model articulated?
    • Are comparisons made with alternative models?
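One lightweight way to ground the "comparisons with alternative models" point is to score several candidates under identical cross-validation, as sketched below on a synthetic dataset. The two candidates shown are arbitrary examples; the real project should compare models relevant to its own problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the project's dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Identical 5-fold CV and metric for every candidate keeps the comparison fair.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A short table of such scores in the README, plus a sentence on why the winner was chosen, satisfies this criterion far better than an unexplained model choice.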
  3. Model Training and Validation (10 points):

    • How effectively is the model trained and validated?
    • Are appropriate metrics chosen for evaluating model performance?
    • Is there a robust approach to training, such as cross-validation or use of a validation set?
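For the validation-set approach mentioned above, a minimal sketch looks like the following: a stratified hold-out split plus a metric suited to the task (ROC AUC here, chosen only for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Stratify so the validation set preserves the class balance of the full data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC AUC: {auc:.3f}")
```

Reporting the metric on held-out data, with a fixed random seed, is what makes the numbers in the write-up reproducible.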
  4. Code Quality and Efficiency (10 points):

    • Is the code well-organized, readable, and efficient?
    • Are best practices in coding and software engineering followed?
    • How are error handling and exception management implemented?
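On the error-handling point, graders typically want predictable failures raised as specific exceptions and logged, not swallowed. The names below (`PredictionError`, `predict_one`) are hypothetical stand-ins for project code.

```python
import logging

logger = logging.getLogger(__name__)


class PredictionError(Exception):
    """Raised when a prediction request cannot be served."""


def predict_one(features: list[float], n_expected: int = 4) -> float:
    """Validate input, then score it; the scoring here is a placeholder."""
    if len(features) != n_expected:
        raise PredictionError(
            f"expected {n_expected} features, got {len(features)}")
    return sum(features) / len(features)  # stand-in for a real model call


try:
    predict_one([1.0, 2.0])
except PredictionError as exc:
    # Log the failure with context instead of silently ignoring it.
    logger.error("bad request: %s", exc)
```

A narrow exception type lets API code translate the failure into a precise HTTP error rather than a generic 500.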
  5. API Design and Implementation (10 points):

    • How well is the REST API designed (endpoints, request-response structure)?
    • Are best practices in API development (e.g., security, scalability) considered?
    • Is there proper documentation for the API (e.g., Swagger documentation)?
  6. Model Deployment and Environment (10 points):

    • How effectively is the model deployed for use via the REST endpoint?
    • Are considerations like load balancing, scalability, and environment stability addressed?
    • Is there an effective use of cloud services or containerization (e.g., Docker)?
  7. Integration of Machine Learning and API (10 points):

    • How well are the machine learning model and REST API integrated?
    • Is there efficient handling of requests and responses between the server and the model?
    • Are there measures for performance optimization in the integration?
  8. Security and Data Privacy (10 points):

    • Are security best practices for APIs and machine learning models implemented?
    • How are data privacy and protection handled, especially with sensitive data?
    • Are there mechanisms to prevent common vulnerabilities (e.g., SQL injection, data leaks)?
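For the SQL-injection bullet specifically, the expected defence is parameterized queries: user input is bound as data, never spliced into SQL text. A self-contained sqlite3 sketch (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Hostile input is treated as a literal string, not executable SQL.
user_input = "alice' OR '1'='1"
rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # → [] : the injection attempt matches nothing
```

The same principle applies with any driver or ORM: never build queries with string formatting on user input.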
  9. Testing and Reliability (10 points):

    • How thoroughly is the system (both the model and API) tested?
    • Are there unit tests, integration tests, and system tests?
    • Is there evidence of reliable and consistent performance under different scenarios?
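As a sketch of the unit-test expectation, the pytest-style example below covers both the happy path and a failure mode; `average` is a stand-in for the project's own functions.

```python
import pytest


def average(values: list[float]) -> float:
    """Function under test; a stand-in for real project code."""
    if not values:
        raise ValueError("empty input")
    return sum(values) / len(values)


def test_average_happy_path():
    assert average([1.0, 2.0, 3.0]) == pytest.approx(2.0)


def test_average_empty_raises():
    with pytest.raises(ValueError):
        average([])
```

Integration tests would exercise the API end to end (for instance via FastAPI's `TestClient`), and a CI workflow running the suite on every push is strong evidence for the reliability bullet.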
  10. Documentation, Reporting, and Usability (10 points):

    • Is the project well-documented, including model training, API usage, and deployment details?
    • Are the results, challenges, and decision-making processes clearly communicated?
    • Is the API user-friendly and straightforward for end users to work with?