This intermediate Generative AI (GenAI) course is for DevOps and ITOps professionals who want to advance their GenAI skills with deployment strategies and best practices for building large language model (LLM) applications. Participants work hands-on with popular tools and platforms, including Docker, Kubernetes, and cloud services, in an LLM environment.
Skills Gained
- Deploy and manage LLM-powered applications using containerization and orchestration technologies
- Implement strategies for scaling LLM applications to handle increasing workloads
- Monitor and troubleshoot LLM application performance in production environments
- Ensure the security, compliance, and reliability of LLM deployments
- Optimize resource utilization and cost-efficiency for LLM applications
Prerequisites
- Practical Python programming and scripting for automation tasks (6+ months)
- Making API calls and handling event streams
- Exception handling, debugging, testing, and logging
- Experience with containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes)
- Familiarity with CI/CD pipelines and tools, such as Jenkins, GitLab, or GitHub Actions
- Knowledge of cloud platforms (e.g., AWS, GCP, Azure) and their services
- Experience with monitoring and logging tools, such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana), is recommended but not required
- Familiarity with machine learning concepts, such as classification, regression, and clustering, is recommended
Outline
Introduction
Containerization and Orchestration
- Containerizing LLM applications using Docker
- Orchestrating LLM containers using Kubernetes
- Deploying an LLM application using Docker and Kubernetes
Scaling LLM Applications
- Strategies for horizontal and vertical scaling
- Load balancing and auto-scaling techniques
- Implementing auto-scaling for an LLM application
Monitoring and Troubleshooting
- Key performance metrics for LLM applications
- Automated testing for LLMOps
- Differences between LLMOps testing and traditional software testing
- Evaluation using CI/CD tools
- Evaluating LLM problems such as hallucinations, data drift, and unethical or harmful outputs
- Monitoring tools and techniques (e.g., Weights & Biases, CircleCI)
- Setting up monitoring for an LLM application
- Creating dashboards and alerts for key metrics
Security, Compliance, and Cost Optimization
- Securing LLM application infrastructure and data
- Ensuring compliance with relevant regulations and standards
- Strategies for optimizing resource usage and costs in cloud-based LLM deployments
Conclusion