Cloud optimization: Saving money and soaring high. Part II/II.
Infrastructure & Software Engineering Trends, Tools & Innovations to help with costs & efficiencies
Hey, Mark here. Happy Thursday, and welcome to the Tech Accelerator.
Let's continue from last week, where we focused on cloud cost optimization and efficiency. For those who missed Part I, here it is:
Cloud optimization: Saving money and soaring high. Part I/II.
In this Tech Accelerator, we continue exploring areas of technology optimization opportunities with the aim to maximize efficiency, drive innovation, and propel technological advancements to new heights. So let's dive right into what we will cover:
AI-Driven IT Operations (AIOps)
Software Development with AI Assistance
AI-Driven Software Testing
Managing Code Quality and Technical Debt
Tracking and Monitoring Software Health
Revolutionizing Team Collaboration
Collaborative Tools for Open Source Data
CI/CD Automation and Infrastructure as a Service
Feature Management and Experimentation
Data Versioning and Collaboration
Low-Code Development Platforms
MLOps: Machine Learning Operations
Large Language Model Operations (LLMOps)
Other opportunities
AI-Driven IT Operations (AIOps)
AIOps, or Artificial Intelligence for IT Operations, is a term used to describe the use of AI and machine learning techniques to automate and optimize various IT operations tasks. AIOps can be especially valuable in cloud computing (public/private/both) environments where organizations need to manage large and complex infrastructures that generate massive amounts of data. In today's rapidly evolving technological landscape, the integration of AI-driven IT operations has become a game-changer for businesses seeking enhanced efficiency and cost optimization.
AIOps is important because it can help organizations improve their cloud infrastructure's performance, reliability, security, and costs.
AIOps can help organizations optimize their cloud spending by identifying inefficiencies and cost-saving opportunities. By analyzing usage patterns and identifying areas where resources are being underutilized, AIOps tools can help organizations optimize their cloud infrastructure and reduce unnecessary spending.
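To make the idea of spotting underutilized resources concrete, here is a minimal sketch. It uses a simple fixed CPU threshold rather than the machine learning a real AIOps platform would apply, and the `InstanceUsage` record, the instance names, the prices, and the 20% threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InstanceUsage:
    """Hypothetical usage record; names and fields are illustrative only."""
    name: str
    hourly_cost_usd: float
    cpu_utilization: list[float]  # sampled CPU utilization, 0-100 percent


def find_underutilized(instances, cpu_threshold=20.0):
    """Return (name, average CPU, monthly cost) for instances below the threshold."""
    flagged = []
    for inst in instances:
        avg_cpu = mean(inst.cpu_utilization)
        if avg_cpu < cpu_threshold:
            monthly_cost = inst.hourly_cost_usd * 24 * 30
            flagged.append((inst.name, round(avg_cpu, 1), round(monthly_cost, 2)))
    # Most expensive candidates first, since they offer the largest savings.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Made-up fleet data for the example.
fleet = [
    InstanceUsage("api-server", 0.40, [65, 72, 58, 80]),
    InstanceUsage("batch-worker", 0.80, [5, 8, 3, 12]),
    InstanceUsage("staging-db", 0.20, [10, 15, 9, 14]),
]

report = find_underutilized(fleet)
for name, avg_cpu, monthly_cost in report:
    print(f"{name}: avg CPU {avg_cpu}%, about ${monthly_cost}/month")
```

In this toy fleet, the batch worker and the staging database are flagged as candidates for rightsizing, while the busy API server is left alone.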
The AIOps Platform Market is expected to reach US$ 80.2 Billion by 2032, growing at a CAGR of 25.4% from 2022 to 2032.
If you are an enterprise, start your LLM journey in IT: consolidate tools and automate IT operations via LLM agents.
My recommendations for anyone, startup or enterprise, who wants to embark on an AIOps journey and build AI/ML knowledge and experience:
Assess and Define: Clearly define objectives, assess IT infrastructure, and identify pain points.
Data Preparation and Integration: Clean and normalize data, and establish robust integration pipelines for seamless data flow.
Platform Selection: Evaluate and choose the right AIOps platform with scalability and advanced analytics capabilities.
Pilot Projects and Collaboration: Start with small-scale pilots and encourage cross-team collaboration for quick wins.
Monitoring and Learning: Continuously monitor performance, collect feedback, and invest in talent for continuous learning and improvement.
The categorization of the goals of typical AIOps projects. The high-level categories are highlighted in dark.
Revolutionizing IT Operations: Explore the Hottest AIOps Projects on GitHub for Enhanced Efficiency and Automation
AI-Driven Software Testing
One of the biggest benefits of AI-assisted software development tools is that they can help streamline the software development process, allowing developers to create and test code more quickly and efficiently. They can also help identify bugs and other issues more quickly, reducing the time and effort required for manual testing and debugging.
It's exciting to see the rapid development of AI-assisted software development tools over the past year, and it's likely that we'll see even more innovation in this area in the near future. Some potential areas could include:
Improved code optimization: Current AI-assisted software development tools primarily focus on generating code based on user input, but there's a lot of potential for AI to help optimize and improve existing code. This could involve tools that analyze code for performance issues, or even for ESG and sustainability impact, and suggest alternative implementations that are more efficient.
Integration with DevOps workflows: AI tools could be used to automate certain aspects of the software development process, such as generating pull requests, running automated tests, or deploying code to production.
Domain-specific language support: AI tools could be tailored to specific programming domains, such as web development or machine learning. This could allow for more targeted and specialized assistance for developers working in those domains.
Better understanding of natural language: Many current AI-assisted software development tools require users to provide input in a specific format or language. Future tools could be better equipped to understand natural language and provide more intuitive and flexible assistance to developers.
Integration with version control systems: AI tools could be integrated more closely with version control systems like Git, allowing for more seamless collaboration and versioning of code generated by AI models.
AI-assisted development tools are particularly useful for organizations with a large number of software developers or complex software systems. They are also beneficial for startups or small businesses that need to develop software quickly and with limited resources.
AI tools for code:
AI-driven test-first Software Development
AI-driven test-first software development utilizes AI tools like ChatGPT to support the software development process. Developers provide prompts to generate tests and implementation plans, ensuring code alignment and aiding junior developers in learning coding standards.
AI-driven test-first development offers benefits such as time and cost savings and risk mitigation. Automated test case generation increases productivity, detects bugs early, and enforces consistency and code quality standards. It also aids knowledge transfer and onboarding for new team members, reducing risks associated with inexperience.
However, AI-driven test-first development should not replace manual testing and quality assurance practices. Human expertise is vital for overall software quality. The technique can be applied to various areas such as security testing, co-refactoring, documentation, and collaboration. The time saved varies depending on complexity but can be significant, particularly for repetitive or routine test cases.
Here you can see the technique in use:
ChatGPT - Useful self-tested code.
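As a rough illustration of the test-first flow described above, the sketch below writes the tests before the implementation. The `slugify` function and its expected behavior are hypothetical. In an AI-driven workflow, the tests would typically be generated from a prompt and reviewed by a developer before any implementation is accepted.

```python
import re
import unittest


# Step 1: the tests are written (or generated from a prompt) first and act as the spec.
class SlugifyTests(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation_and_extra_spaces(self):
        self.assertEqual(slugify("  Cloud, Cost & Efficiency!  "), "cloud-cost-efficiency")

    def test_empty_input_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


# Step 2: the implementation is written afterwards, until the tests pass.
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests exist first, any generated implementation can be checked automatically, and a human reviewer only has to judge whether the tests capture the intended behavior.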
Guide to prompting, an open-source collaborative space that describes prompt engineering:
Generated knowledge prompting:
Paper: Generated Knowledge Prompting for Commonsense Reasoning
Chain-of-thought prompting:
Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
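To show what chain-of-thought prompting looks like in practice, here is a small sketch that assembles a few-shot prompt in which the worked example spells out its reasoning before the answer. The `build_cot_prompt` helper and the example questions are made up for illustration and are not taken from the paper. No model is called; the resulting string would be sent to whichever LLM you use.

```python
def build_cot_prompt(question: str, examples: list[tuple[str, str, str]]) -> str:
    """Build a few-shot prompt whose examples show their reasoning before the answer."""
    parts = []
    for example_question, reasoning, answer in examples:
        parts.append(f"Q: {example_question}\nA: {reasoning} The answer is {answer}.")
    # The final question is left open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical worked example that demonstrates step-by-step reasoning.
examples = [
    (
        "A team runs 3 servers and adds 2 racks of 4 servers each. "
        "How many servers does it run now?",
        "It starts with 3 servers. 2 racks of 4 servers is 8 servers. 3 + 8 = 11.",
        "11",
    )
]

prompt = build_cot_prompt(
    "A pipeline has 5 jobs and each job runs 3 tests. How many tests run in total?",
    examples,
)
print(prompt)
```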
Managing Code Quality and Technical Debt
Managing code debt is crucial for the long-term success of software initiatives. Code debt refers to the costs incurred when shortcuts or compromises are made during development, leading to suboptimal code, inadequate testing, and delayed updates. Unmanaged code debt can result in increased complexity, higher maintenance costs, and heightened risks of bugs and security vulnerabilities.
To manage code debt effectively, organizations should adopt collaborative decision-making processes, establish clear guidelines, and conduct regular reviews. Key techniques include prioritizing high-impact areas, establishing and enforcing coding standards, and adopting iterative practices such as Agile and DevSecOps methodologies.
Here are some additional techniques and best practices that can help organizations manage code debt efficiently:
Refactoring: Improve code quality and maintainability.
Test Automation: Ensure code integrity and prevent regressions.
CI/CD: Catch issues early and automate testing and deployment.
Code Reviews: Identify and address potential issues, adhere to standards.
Documentation: Document code purpose, behavior, and pitfalls.
Technical Debt Backlog: Track and prioritize code debt items.
Education and Training: Continuous learning and skill development.
Monitoring and Feedback Loops: Measure code quality and gather user feedback.
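One way to make the technical debt backlog actionable is to rank items by how much they hurt relative to how much they cost to fix. The sketch below assumes a simple impact-over-effort score on 1 to 5 scales. The scoring rule, the `DebtItem` type, and the backlog entries are all hypothetical, and teams will likely want their own weighting.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """Hypothetical backlog entry; the 1-5 scales are an assumption."""
    title: str
    impact: int  # 1 (minor) to 5 (blocks delivery)
    effort: int  # 1 (hours) to 5 (weeks)

    @property
    def priority(self) -> float:
        # Higher impact and lower effort float to the top.
        return self.impact / self.effort


backlog = [
    DebtItem("Flaky integration tests", impact=4, effort=2),
    DebtItem("Rewrite legacy billing module", impact=5, effort=5),
    DebtItem("Missing API documentation", impact=2, effort=1),
    DebtItem("Upgrade unsupported framework version", impact=5, effort=3),
]

# Highest priority first; ties keep their original backlog order.
ranked = sorted(backlog, key=lambda item: item.priority, reverse=True)
for item in ranked:
    print(f"{item.priority:.2f}  {item.title}")
```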
Tracking and Monitoring Software Health
Tracking health in software development refers to shifting the focus from solely monitoring and prioritizing technical debt to monitoring and improving the overall health of the system. Technical debt refers to sub-optimal code or design choices made during development that may hinder future productivity or cause maintenance issues.
By tracking health, teams can assess the state of their systems in various categories such as development, operations, and architecture. This approach provides a more constructive framework for addressing technical debt because it emphasizes the value of reducing debt in terms of improving the overall health and performance of the system.
Treating the health rating as a service-level objective (SLO) helps prioritize improvements. When the health rating falls below a predefined threshold or drops out of the "green zone," teams can focus on addressing the specific areas that need improvement. This approach aligns technical debt reduction with the team's agreed-upon expectations and connects it to the ultimate goal of delivering a healthier and more reliable system.
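A minimal sketch of treating the health rating like an SLO might look as follows. The category scores, the 0 to 100 scale, and the threshold of 80 are assumptions made for the example. In practice the scores would be derived from real signals such as build times, incident counts, or dependency freshness.

```python
HEALTH_SLO = 80  # hypothetical "green zone" threshold on a 0-100 scale

# Hypothetical category scores; in practice these would come from real measurements.
health_scores = {
    "development": 85,
    "operations": 72,
    "architecture": 90,
}


def overall_health(scores: dict[str, int]) -> float:
    """Average the category scores into one overall rating."""
    return sum(scores.values()) / len(scores)


def areas_below_slo(scores: dict[str, int], slo: int) -> list[str]:
    """List the categories that have dropped out of the green zone."""
    return sorted(name for name, score in scores.items() if score < slo)


rating = overall_health(health_scores)
needs_attention = areas_below_slo(health_scores, HEALTH_SLO)

print(f"Overall health: {rating:.1f} (SLO {HEALTH_SLO})")
print("Needs attention:", ", ".join(needs_attention) or "nothing")
```

Here the overall rating stays above the threshold, but the operations category has slipped below it, which tells the team where to direct its next round of debt reduction.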
By tracking health instead of solely focusing on debt, teams can foster a proactive mindset, continuously improving the system's overall health and performance, and addressing technical debt as part of their regular development and maintenance processes.
Revolutionizing Team Collaboration
Obsidian, a personal knowledge management tool, is revolutionizing team collaboration by offering bi-directional linking capabilities that create a web of interconnected knowledge. Unlike commercial tools like Notion and Confluence, Obsidian is designed to work effectively for teams, enabling quick access to relevant information, seamless collaboration, and up-to-date knowledge. Advanced knowledge management tools like Obsidian and Logseq can help teams transcend traditional siloed approaches, fostering a culture of continuous learning and improvement. These tools facilitate the contribution and collaboration of team members on a shared knowledge base, allowing the organization to harness collective expertise and benefit from improved collaboration, efficiency, and innovation.
Obsidian
Logseq
Collaborative Tools for Open Source Data
Collaboration across different teams is essential for effective model validation and data quality assurance. Tools such as Evidently, Giskard, Pandera, Deepchecks, Great Expectations, and Soda Core facilitate collaboration by establishing common standards and providing automation for routine checks and tests. By bringing together data engineers, data scientists, domain experts, and business stakeholders, these tools enable a holistic approach to quality assurance.
While the space is still evolving, these tools provide a promising foundation for ensuring that AI models are aligned with business objectives and deliver reliable, trustworthy results.
Evidently
Giskard
Pandera
Deepchecks
Great Expectations
Soda Core
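The idea of shared, automated data checks can be sketched in plain Python. This is deliberately not the API of any of the tools listed above. The check functions, column names, and sample rows are hypothetical and only illustrate how a common set of named checks can be run by every team against the same data.

```python
def check_not_null(rows, column):
    """Pass when no row is missing a value in the given column."""
    return all(row.get(column) is not None for row in rows)


def check_in_range(rows, column, low, high):
    """Pass when every present value lies within [low, high]."""
    return all(low <= row[column] <= high for row in rows if row.get(column) is not None)


# Hypothetical shared standard that every team runs against the same dataset.
CHECKS = {
    "customer_id is never null": lambda rows: check_not_null(rows, "customer_id"),
    "age is between 0 and 120": lambda rows: check_in_range(rows, "age", 0, 120),
}

# Made-up sample data with one out-of-range age and one missing id.
rows = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 151},
    {"customer_id": None, "age": 28},
]

results = {name: check(rows) for name, check in CHECKS.items()}
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

The dedicated tools add much more on top of this pattern, such as reporting and pipeline integration, but the core of a named, repeatable check is the same.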
CI/CD Automation and Infrastructure as a Service
CI/CD infrastructure as a service offers significant advantages in terms of reducing maintenance and operational overhead, providing scalability and self-service capabilities, and supporting self-hosted runners for specific use cases. However, organizations must still implement proper security measures to protect their CI/CD infrastructure and ensure the integrity of their development and deployment processes.
Evaluate Managed CI/CD Services: Assess managed CI/CD services like GitHub Actions, Azure DevOps, GitLab CI/CD, CircleCI, or Travis CI. These services offer a wide range of features, scalability, and ease of use. They can save costs by eliminating the need to maintain and operate your own infrastructure, while also providing efficient workflows and integrations with other development tools.
Explore Self-Hosted Solutions: If you have specific requirements or prefer to maintain control over your CI/CD infrastructure, consider open-source self-hosted CI/CD tools. Some popular options include Jenkins, Drone, Concourse, and GoCD. These tools provide flexibility, customization options, and cost savings by utilizing your existing infrastructure. However, they require dedicated resources for setup, maintenance, and operations.
Hybrid Approach: You can also adopt a hybrid approach, combining managed services and open-source tools. For example, you can use a managed service for certain projects or teams that benefit from the convenience and scalability, while utilizing open-source tools for more specific or complex use cases that require customization.
Cost-Effective Tools: Several open-source CI/CD tools are cost-effective alternatives to commercial solutions. For instance, Jenkins, one of the most popular CI/CD tools, is open-source and widely adopted. It offers extensive plugin support, allowing you to customize your CI/CD workflows. Other cost-effective options include Jenkins X, which focuses on Kubernetes-based deployments, and Tekton, a cloud-native CI/CD framework.
Jenkins
Drone
Concourse
GoCD
Feature Management and Experimentation
Feature toggle tools, such as Unleash and Flagsmith, have become popular in modern development practices for managing and controlling the release of new features. These tools offer comprehensive solutions for effective feature flag management. Furthermore, the OpenFeature initiative aims to establish a vendor-neutral interface for feature flagging, facilitating seamless management across different tools and platforms.
By leveraging feature toggles, organizations can achieve cost savings by minimizing the risk of costly rollbacks or major failures. They can also improve efficiencies through faster deployments, data-driven resource allocation, and streamlined feature management processes. Feature toggles contribute to increased agility, better risk management, and enhanced collaboration, ultimately leading to improved software quality and user satisfaction. Embracing these tools and industry standards enables organizations to make informed decisions, optimize costs, and deliver exceptional customer service.
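For readers new to feature toggles, here is a minimal sketch of a percentage rollout. It does not use the Unleash, Flagsmith, or OpenFeature APIs. The flag names, rollout percentages, and the `is_enabled` helper are hypothetical, and hashing the user id is just one common way to keep each user's experience stable.

```python
import hashlib

# Hypothetical flag configuration: flag name -> percentage of users who see the feature.
FLAGS = {"new-checkout": 25, "dark-mode": 100, "beta-search": 0}


def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically place each user in a bucket from 0 to 99 for the flag."""
    rollout = FLAGS.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout


users = [f"user-{n}" for n in range(1000)]
share = sum(is_enabled("new-checkout", u) for u in users) / len(users)
print(f"new-checkout enabled for about {share:.0%} of users")
```

Raising the percentage in `FLAGS` widens the rollout without a redeploy, and setting it to 0 acts as a kill switch, which is where the reduced rollback risk comes from.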
Data Versioning and Collaboration
DVC (Data Version Control) is a preferred tool for managing experiments in data science projects. It leverages Git, making it familiar to developers and facilitating the application of engineering practices in data science. DVC's approach to model checkpointing ensures reproducibility by capturing training data, test data, model hyperparameters, and code. It enables time travel across different model versions, supporting continuous delivery for machine learning (CD4ML) in production. DVC can integrate with various storage options like AWS S3, Google Cloud Storage, and MinIO. It excels in tracking model drifts over time when working with rapidly changing data. Some teams use DVC with optimized versioning storage formats like Delta Lake. Setting up DVC early in a project is common practice among data science teams.
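The reproducibility idea behind this kind of checkpointing can be illustrated with content hashes. The sketch below is not DVC's implementation or command set. The `snapshot` function and the tiny byte strings standing in for datasets and code are hypothetical, and only show why capturing data, hyperparameters, and code together gives each experiment a stable identity.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Identify a blob by its content, so any change produces a new identifier."""
    return hashlib.sha256(data).hexdigest()


def snapshot(train_data: bytes, test_data: bytes, params: dict, code: bytes) -> dict:
    """Record everything needed to reproduce a model version."""
    record = {
        "train_data": content_hash(train_data),
        "test_data": content_hash(test_data),
        "params": params,
        "code": content_hash(code),
    }
    # Hash the whole record to get a single version id for the experiment.
    canonical = json.dumps(record, sort_keys=True).encode()
    record["version"] = content_hash(canonical)[:12]
    return record


# Same data and code, different learning rate: two distinct experiment versions.
v1 = snapshot(b"a,b\n1,2\n", b"a,b\n3,4\n", {"lr": 0.01}, b"print('train')")
v2 = snapshot(b"a,b\n1,2\n", b"a,b\n3,4\n", {"lr": 0.02}, b"print('train')")
v1_again = snapshot(b"a,b\n1,2\n", b"a,b\n3,4\n", {"lr": 0.01}, b"print('train')")

print(v1["version"], v2["version"])
```

Identical inputs always yield the same version id, while changing a single hyperparameter yields a new one, which is the property that makes moving between model versions possible.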
Low-Code Development Platforms
Low-code platforms offer several advantages that can lead to cost savings, increased efficiencies, and optimization in software development. They enable faster development through visual interfaces and drag-and-drop features, reducing time and costs. These platforms empower non-technical users, reducing the reliance on highly skilled developers and freeing them to focus on more complex tasks. Reusability and component libraries promote standardization and code consistency, accelerating development. Low-code platforms support agile iterations, prototyping, and early feedback, minimizing the time and cost of changes. Built-in integration capabilities simplify connecting applications with external systems. Streamlined maintenance and updates result in cost savings and improved efficiency. Collaboration among citizen developers and professional developers enhances communication and decision-making. However, low-code platforms may not be suitable for all scenarios, and careful consideration is necessary to determine their applicability.
MLOps: Machine Learning Operations
Machine Learning (ML) is increasingly used to solve real-world problems, and applying DevOps principles to ML, known as MLOps, is gaining attention.
The ML life-cycle involves manual steps in deploying ML pipeline models, which can lead to unexpected results due to complex dependencies.
An automated pipeline using Continuous Integration (CI) and Continuous Deployment (CD) is designed to streamline the ML deployment process.
Monitoring, metrics, and Key Performance Indicators (KPIs) are crucial in MLOps for ensuring high model performance and evaluating real-world robustness.
Open-source MLOps tools are shaping the ML landscape by providing accessible and customizable solutions.
Open-source MLOps tools democratize access to MLOps practices and foster collaboration and innovation within the ML community.
Popular open-source MLOps tools such as MLflow, Kubeflow, and TensorFlow Extended (TFX) have gained traction, offering automation, versioning, monitoring, and performance evaluation capabilities.
Open-source MLOps tools empower ML practitioners, accelerate model deployment, and contribute to the development of best practices.
Open-source MLOps tools continue to evolve, providing diverse solutions to cater to different needs and preferences in the ML field.
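The points above about automated pipelines and metrics can be combined into a simple promotion gate: a candidate model is deployed only if it beats the model in production and clears a minimum bar. The sketch below is framework-agnostic and does not use MLflow, Kubeflow, or TFX. The labels, predictions, and the 0.85 threshold are made-up values.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def should_promote(candidate_acc, production_acc, min_acc=0.85):
    """Gate for an automated pipeline: promote only a model that clears both bars."""
    return candidate_acc >= min_acc and candidate_acc > production_acc


# Hypothetical held-out labels and the predictions of two model versions.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
production_preds = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # 7 of 10 correct
candidate_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # 9 of 10 correct

production_acc = accuracy(production_preds, labels)
candidate_acc = accuracy(candidate_preds, labels)
decision = should_promote(candidate_acc, production_acc)

print(f"production={production_acc:.2f} candidate={candidate_acc:.2f} promote={decision}")
```

A CI/CD job could run a check like this after training and stop the deployment step whenever the gate returns False.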
Large Language Model Operations (LLMOps)
MLOps addresses the operational aspects of managing machine learning models across their entire life cycle, regardless of their size or type. MLOps covers various activities such as data preprocessing, model training, validation, deployment, monitoring, and performance evaluation. It aims to streamline and automate these processes to ensure reliable and efficient deployment of machine learning models in production environments.
The focus of LLMOps is on the specific challenges associated with large language models, which often require significant computational resources and specialized infrastructure. LLMOps involves tasks such as model training, fine-tuning, deployment, monitoring, and scaling of large language models.
While commercial LLMOps tools are being built, Cowboy Ventures recently outlined the current LLMOps landscape (they call it Infrastructure for Generative AI) as of May 2023:
Foundation Models:
Foundation models are trained on massive datasets and serve as the basis for generative AI applications.
Fine-Tuning:
Fine-tuning is the process of adjusting pre-trained models on curated datasets for specific use cases.
Open source frameworks like TensorFlow and PyTorch, as well as end-to-end solutions like MosaicML, are used for fine-tuning.
Domain-specific generative AI models are gaining traction, allowing incumbents to leverage proprietary data for fine-tuning.
Data Storage & Retrieval:
Data storage and retrieval for long-term model memory are complex and costly challenges.
Vector databases have emerged as a solution, powering semantic search, similarity search, and recommendation systems.
Model Supervision: Monitoring, Observability & Explainability:
Monitoring, observability, and explainability are essential for evaluating models during and after production.
Traditional MLOps tools are being adapted for generative AI models, but black-box, closed-source models present challenges.
Model Safety, Security, and Compliance:
Ensuring model safety, security, and compliance is crucial for enterprise adoption.
Tools are needed to evaluate model fairness, bias, toxicity, and implement guardrails.
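The similarity search that vector databases provide, mentioned under Data Storage & Retrieval, can be sketched with a brute-force cosine similarity ranking. Real vector databases typically use approximate nearest-neighbor indexes to do this at scale. The three-dimensional vectors and document titles below are hypothetical stand-ins for embeddings produced by a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real ones come from an embedding model.
documents = {
    "reduce cloud spend": [0.9, 0.1, 0.0],
    "rightsize idle instances": [0.8, 0.3, 0.1],
    "team offsite agenda": [0.0, 0.2, 0.9],
}


def search(query_vector, top_k=2):
    """Rank stored documents by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in documents.items()]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]


results = search([1.0, 0.2, 0.0])
print(results)
```

The query vector points in nearly the same direction as the two cost-related documents, so they rank ahead of the unrelated one even though no keywords are compared.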
Open source will play a major role in generative AI infrastructure, building trust and benefiting from community-driven innovation. To test open source alternatives for generative AI infrastructure, companies should start by researching and identifying relevant options. They should evaluate the features and functionality of each alternative and test their performance and compatibility through benchmarks, pilots, or prototypes. Considering the total cost of ownership is crucial, including deployment, maintenance, and potential hidden costs. Security and compliance aspects should be assessed, ensuring alignment with the company's requirements. Engaging with the open source community provides valuable insights and support. Pilot projects and proof of concepts can validate the tools' suitability for the company's specific use cases. Ongoing monitoring and evaluation help track performance and user feedback. By following these steps, companies can make informed decisions about incorporating open source alternatives into their generative AI infrastructure. Below are alternative OSS tools for LLMOps:
Other opportunities for Founders & Co-Founders
Drive profitability by negotiating with hosting providers for cost reduction, optimizing cloud usage and code efficiency, automating implementation and exploring monetization opportunities, considering outsourcing for scalability, enhancing customer support through resource expansion and automation, and reviewing customer profitability to make necessary pricing adjustments.
Drive sales effectiveness by optimizing the sales team, focusing on ideal customer profiles, prioritizing high-conversion prospects, leveraging enablement and standardization for increased productivity, and investing in product-led growth while effectively tracking marketing spend and leveraging next-gen sales and marketing tools.
Drive R&D efficiency by prioritizing high-ROI projects and reallocating resources, providing uninterrupted time for product development, improving engineering effectiveness and measuring contributions, utilizing tools for visualizing engineering productivity, and cultivating a culture that values ingenuity and profitability.
Drive cost efficiency by reducing real estate expenses and renegotiating rent costs, optimizing vendor spend and consolidating licenses, and leveraging next-gen software and AI automation to streamline finance, IT (see above), marketing, software development, HR, and other functions.
Optimize pricing strategies by experimenting with price increases, introducing tiered pricing for value alignment, and encouraging annual upfront prepayments for improved cash flow.
Prompts of the week:
https://twitter.com/aaditsh/status/1659548160342241282?s=20
AI Tools for Work
LLM.report - A better way to monitor your OpenAI API usage. Just enter your OpenAI API key and get a beautiful dashboard. No need to install anything.
Sweephy - A code-free SaaS platform that enables users to extract value from data by utilizing state-of-the-art open-source machine learning and natural language processing models, while also providing data cleaning and organization capabilities across different formats and sources.
Velents.ai - An AI-powered SaaS platform that automates the hiring process to mitigate bias and cut hiring costs by 80%.
Floatbot - A SaaS-based conversational AI platform that helps financial institutions increase digital sales and automate customer support.
Pixlr - Cloud-based photo editing services that deliver editing capabilities in the browser for both consumers and companies.
Venture Funding
AI startup funding has rebounded significantly, with money flowing at an accelerated pace. Factors such as market confidence, technological advancements, industry applications, and strategic partnerships have contributed to this resurgence.
Anthropic: Anthropic secured $450 million in Series C funding from investors including Spark Capital, Google, and Salesforce Ventures. Anthropic focuses on developing AI technologies and solutions.
Builder: London-based composable software company Builder raised $250 million in Series D funding. Builder utilizes AI to power its software platform, enabling flexible and customizable solutions.
Course5 Intelligence: Course5 Intelligence received $55 million in funding to support its investments in deep learning, computer vision, and genAI. The company specializes in providing AI-powered solutions for data analytics and insights.
Axelera AI: AI chip manufacturer Axelera AI raised $50 million to advance its hardware and software solutions for edge computing. Axelera AI focuses on developing AI chips optimized for edge devices and edge computing applications.
Union: Union secured $19.1 million in Series A funding for its enterprise service that enables the deployment of AI and data products at scale. Union aims to simplify the process of implementing AI and data solutions for businesses.
Top ML Papers of the Week
Finetuning LLMs to call APIs - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls. This capability can help identify the right API, boosting the ability of LLMs to interact with external tools to complete specific tasks. (paper | project)
Med-PaLM 2 - a top-performing LLM for medical question answering; scored up to 86.5% on the MedQA dataset (a new state of the art); approaches or exceeds SoTA across MedMCQA, PubMedQA, and MMLU clinical topics datasets. (paper | tweet)
QLoRA - 4-bit finetuning of LLMs. With it comes Guanaco, a chatbot trained on a single GPU that reaches 99% of ChatGPT's performance on the Vicuna benchmark. (paper | code & demo)
iNLP - a 110-page paper outlining a new paradigm in NLP: Interactive NLP (iNLP). It delves deep into critical challenges like alignment, hallucination, reasoning, tool use, embodiment, simulated society, etc. (paper | tweet)
BUFFET - Can LLMs perform well across languages? BUFFET enables a fair evaluation of few-shot NLP across languages at scale. (paper | tweet)










