Guardians of AI: Mohamed Elgendy Of Kolena On How AI Leaders Are Keeping AI Safe, Ethical, Responsible, and True

As AI technology rapidly advances, ensuring its responsible development and deployment has become more critical than ever. How are today’s AI leaders addressing safety, fairness, and accountability in AI systems? What practices are they implementing to maintain transparency and align AI with human values? To address these questions, we had the pleasure of interviewing Mohamed Elgendy, CEO and Co-founder of Kolena, an innovative platform aimed at enhancing AI & ML testing and validation.
Mohamed is a distinguished figure in the field of AI and machine learning (ML), with a career of demonstrated leadership in developing AI/ML platforms, managing engineering teams, and pioneering computer vision technologies at notable companies such as Rakuten, Amazon, and Twilio Inc. He is also the acclaimed author of “Deep Learning for Vision Systems,” a testament to his expertise in AI, with over 20,000 copies sold. He holds an Executive Certificate in Machine Learning from Stanford University and an MBA from Nova Southeastern University; his educational background is as impressive as his professional journey. Mohamed’s vision extends to ensuring the quality, reliability, and security of ML models and applications, driving innovation and setting standards in the AI industry. You can learn more about Kolena at www.kolena.com.
Thank you so much for your time! I know that you are a very busy person. Before we dive in, our readers would love to “get to know you” a bit better. Can you tell us a bit about your ‘backstory’ and how you got started?
Right after college, I spent about three years working in Egypt. As a biomedical engineer who loves building things, I quickly realized I wanted to be on the cutting edge of technology and innovation, so I made the decision to move to the U.S.
Since then, I’ve dedicated my career to data science and AI product development at companies like Amazon, Twilio, and Rakuten. At Kolena, we’re passionate about building AI Quality and Testing tools that make life easier for developers. Partnering closely with top AI companies and enterprises has given us a front-row seat to the industry’s rapid evolution — and the chance to take on some of its toughest challenges directly.
None of us can achieve success without some help along the way. Is there a particular person who you are grateful for, who helped get you to where you are? Can you share a story?
That would definitely be my wife, Amanda El-Dakhakhni. She’s been the absolute best support system for me throughout this journey and continues to be. Amanda is incredibly smart and accomplished, and as a great leader, she knows how to guide her team through even the toughest challenges. What’s more, as a member of her ‘team’ — our family — she’s helped me immensely in developing my own leadership and communication skills. She also keeps me accountable to my strict routine, making sure I stay on track with my health and my work. I wouldn’t be where I am today without her unwavering support.
You are a successful business leader. Which three character traits do you think were most instrumental to your success? Can you please share a story or example for each?
- Trustworthiness: Leaders earn trust in everything they do. They consistently deliver on their promises and commitments, own their mistakes, seek feedback, and work on improving. I strive continually to earn the trust of my peers, team, customers, and investors, and I have many stories showing that the work of earning people’s trust always pans out. Many members of my team today, as well as my customers and investors, are people I have worked with before and whose trust I worked hard to earn. As a result, they have trusted working with me once, twice, and more.
- Consistency: This is the hardest one. Things get hard and morale goes up and down. Consistently putting in the hard work despite the setbacks is what I believe drives success. I consistently start my workday at 6 am and end at 5 pm. Every day, every week, throughout the year.
- Motive Force: In science, motive force refers to the force that causes an object to move or accelerate. Leaders ignite enthusiasm and passion within their teams. They find their inner motivator to get excited, and they create an environment where dedication thrives.
Thank you for all that. Let’s now turn to the main focus of our discussion about how AI leaders are keeping AI safe and responsible. To begin, can you list three things that most excite you about the current state of the AI industry?
1. AI quality is being prioritized — Our industry is putting more focus on fairness, transparency, and accountability. People are realizing that AI can’t just be powerful — it has to be trusted. I’m especially excited about all the new tools and frameworks coming out to help test and validate AI systems, which ties directly to what we do at Kolena.
2. More capable AI models — The progress in large language models and multimodal AI — where systems can handle text, images, and audio together — is mind-blowing. These advances are opening up so many possibilities, from analyzing complex financial data to improving critical decisions. At Kolena, we’re really leaning into this space to help businesses make sure these systems work reliably and safely.
3. AI is more accessible than ever — The fact that AI tools are becoming easier to use and more available to everyone is a huge deal. It’s fueling innovation and making it possible for businesses of all sizes to tap into AI. That’s a big part of what we’re focused on — giving companies the tools to scale AI safely and responsibly.
Conversely, can you tell us three things that most concern you about the industry? What must be done to alleviate those concerns?
At Kolena, we’ve defined the 3 Pillars of AI Quality as accuracy, reliability, and privacy. Without a strong combination of all three, the risk of mistakes and mistrust increases. We also have to understand that AI quality is not a one-size-fits-all standard; it depends heavily on the application and domain in which AI is being used.
Accuracy — AI systems aren’t always as precise as they need to be, especially when dealing with complex or high-stakes tasks. Errors in predictions or outputs can lead to serious consequences, whether it’s a misdiagnosis in healthcare or a flawed financial forecast. To address this, we need rigorous testing frameworks to validate AI performance and ensure that models are trained on high-quality, representative data. We need greater focus on catching errors early and improving accuracy before AI models are deployed.
Reliability — Even accurate AI models can fail when faced with unexpected inputs or edge cases. Reliability means ensuring that AI performs consistently across different scenarios, environments, and datasets. The key here is robust validation and stress testing to identify weaknesses before they become problems in production. That’s exactly what we’re enabling with our AI testing platforms to give teams confidence in their systems.
Privacy — AI systems rely on vast amounts of data, which raises critical questions about privacy and data security. Mishandling sensitive information can erode user trust and lead to compliance issues. To tackle this, we need stronger encryption methods, privacy-preserving techniques like federated learning, and clearer data governance policies. Businesses must make privacy a top priority from day one — not just an afterthought.
As a CEO leading an AI-driven organization, how do you embed ethical principles into your company’s overall vision and long-term strategy? What specific executive-level decisions have you made to ensure your company stays ahead in developing safe, transparent, and responsible AI technologies?
At Kolena, embedding ethical principles into our vision starts with a commitment to trust, transparency, and accountability in AI. We’ve built this into our culture through ongoing education, clear governance frameworks, and proactive compliance with evolving regulations. Our products are designed to help companies develop AI that is accurate, reliable, and fair, ensuring safety and performance are priorities from day one. We’ve also invested in cutting-edge techniques for bias detection, adversarial testing, and privacy-preserving AI to keep our systems and processes aligned with the highest ethical standards, enabling businesses to innovate responsibly.
Have you ever faced a challenging ethical dilemma related to AI development or deployment? How did you navigate the situation while balancing business goals and ethical responsibility?
I think there is an understanding in our industry, and among engineers, that with any technology you build, you also create the opportunity for bad actors with malicious intent to find new ways to influence our society or take advantage of people. Unfortunately, that is an unavoidable byproduct of technological innovation. It is also why building responsible AI is such a priority for us: by implementing rigorous quality and safety standards, you can prevent and mitigate the misuse and major harms that come with new technology. At Kolena, we’ve integrated AI responsibility into our business and aligned our incentives with that goal, and we work continuously with AI standards bodies and others to ensure that the industry can continue to innovate without compromising our collective safety. By making AI quality and safety our goals, we’re able to forge ahead knowing that we’re contributing to the development of responsible AI and helping AI teams scale their models safely.
Many people are worried about the potential for AI to harm humans. What must be done to ensure that AI stays safe?
To ensure AI stays safe and that we prevent harm to humans, we need a multi-layered approach focused on testing, transparency, and accountability. First, rigorous validation and stress-testing frameworks must be standard practice among all AI teams to catch errors, biases, and vulnerabilities before deployment. Second, AI systems need to be explainable and auditable, so developers and users can understand how decisions are made and trace outcomes. Third, privacy and security safeguards must be built into AI systems to protect sensitive data. Finally, we need clear regulations and industry standards to guide responsible development, paired with ongoing monitoring and updates to keep systems aligned with ethical principles as technology evolves.
Despite huge advances, AIs still confidently hallucinate, giving incorrect answers. In addition, AIs will produce incorrect results if they are trained on untrue or biased information. What can be done to ensure that AI produces accurate and transparent results?
To prevent hallucinations and provide more accurate results, the industry needs to make scenario-level stress-testing the standard for all model testing to ensure that AI models can perform reliably in the most important use cases. By evaluating a model’s performance on a per-case basis, we can verify that the model will repeatedly provide accurate results for its various applications.
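To make the idea concrete, here is a minimal sketch of scenario-level evaluation in Python. It is an illustration with hypothetical names and thresholds, not Kolena’s actual API: every scenario must clear its own accuracy bar instead of hiding behind one aggregate score.

```python
from collections import defaultdict

def evaluate_by_scenario(model, test_cases, thresholds, default_threshold=0.95):
    """test_cases: list of (scenario, input, expected) tuples.

    Returns the scenarios that fall below their acceptance threshold,
    so a deployment can be blocked even if aggregate accuracy looks good.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for scenario, x, expected in test_cases:
        totals[scenario] += 1
        if model(x) == expected:
            hits[scenario] += 1

    failures = {}
    for scenario, total in totals.items():
        accuracy = hits[scenario] / total
        if accuracy < thresholds.get(scenario, default_threshold):
            failures[scenario] = accuracy
    return failures  # an empty dict means every scenario passed

# Hypothetical usage: gate deployment on per-scenario results.
# failures = evaluate_by_scenario(my_model, cases, {"long_orders": 0.90})
# assert not failures, f"Scenario-level failures: {failures}"
```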
AI models must also meet the specific standards and quality requirements for the industries they serve. It’s important to reiterate that AI quality standards are not universal or “one size fits all.” By working with the proper stakeholders and adhering to all the regulatory requirements in each industry, we can ensure AI models will reliably provide accurate information with minimal hallucinations.
When it comes to model training, data quality and data hygiene are of paramount importance. Builders need the right tools to verify that their training data is sound and that each new iteration of a model is not regressing in certain areas and “forgetting” what it learned previously.
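As one illustration of what such data-hygiene tooling might check, here is a small Python sketch, assuming a pandas DataFrame with hypothetical "text" and "label" columns; the specific thresholds are arbitrary examples, not an established standard.

```python
import pandas as pd

def check_training_data(df: pd.DataFrame) -> list[str]:
    """Return a list of data-hygiene issues found in a training set."""
    issues = []
    # Missing or blank inputs silently degrade whatever the model learns.
    if df["text"].isna().any() or (df["text"].str.strip() == "").any():
        issues.append("empty or missing inputs")
    # Heavy duplication over-weights some examples and leaks into eval sets.
    dup_rate = df.duplicated(subset=["text"]).mean()
    if dup_rate > 0.01:
        issues.append(f"duplicate rate {dup_rate:.1%} exceeds 1%")
    # A badly skewed label distribution is an early warning for bias.
    label_share = df["label"].value_counts(normalize=True)
    if label_share.min() < 0.05:
        issues.append(f"under-represented label: {label_share.idxmin()!r}")
    return issues
```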

Based on your experience and success, what are your “Five Things Needed to Keep AI Safe, Ethical, Responsible, and True”? Please share a story or an example for each.
[Image: Example screenshot of an AI model comparison on the Kolena platform against a predefined quality standard for an organization.]
1. Scenario-Level Granular Testing
First of all, we need to ensure that all AI systems are tested at a fine-grained level to evaluate specific use cases and edge cases, ensuring comprehensive scenario coverage.
Scenario-level testing is critically important to the widespread viability and adoption of AI models, because aggregate performance metrics for model accuracy can be very misleading even when the scores look impressive (say, 93% or 95% accurate). For example, the screenshot above shows a comparison between two AI systems for autonomous drive-throughs (food-ordering systems). In the aggregate, the results show that System Pipeline B has improved over Pipeline A. But at a granular level, you can see that the system has regressed for a specific customer (Panda Express, for example). System B also regresses when the conversation (order) gets longer: it starts to hallucinate. This level of granularity allows both engineering and product teams to pinpoint exactly where the regressions are happening and to deploy the best-performing system for their customers.
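A slice-level comparison like the one described above could look something like the following sketch (hypothetical data layout and field names, in Python): run both systems on the same cases, group the results by a scenario key, and flag every slice where the candidate scores worse.

```python
def compare_by_slice(results_a, results_b, key):
    """results_*: lists of dicts such as
    {"customer": "Panda Express", "turns": 12, "correct": True}.
    `key` maps a result to its slice (e.g. customer, or order length).
    Returns {slice: (accuracy_a, accuracy_b)} for every regressed slice."""
    def slice_accuracy(results):
        acc = {}
        for r in results:
            bucket = acc.setdefault(key(r), [0, 0])
            bucket[0] += r["correct"]  # bool adds as 0/1
            bucket[1] += 1
        return {k: hits / n for k, (hits, n) in acc.items()}

    a, b = slice_accuracy(results_a), slice_accuracy(results_b)
    return {s: (a[s], b[s]) for s in a if s in b and b[s] < a[s]}

# Hypothetical usage: regressions per customer, or per order length.
# compare_by_slice(res_a, res_b, key=lambda r: r["customer"])
# compare_by_slice(res_a, res_b, key=lambda r: "long" if r["turns"] > 8 else "short")
```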
2. Statistically Balanced Test Cases
We need to make it an industry standard to design test cases that reflect real-world data distributions to eliminate bias and enhance model reliability and fairness. Biases in our training data lead to biases in the AI system’s decision making. To test against these biases, we need to consistently create sets of testing scenarios for each case and ensure that they are represented equally in our test data, so that inherent bias can be detected.
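One simple way to get that equal representation is stratified sampling: draw the same number of cases from every scenario so no group dominates the metric. A minimal Python sketch, with hypothetical scenario labels:

```python
import random

def balanced_test_set(cases, scenario_of, n_per_scenario, seed=0):
    """cases: any list; scenario_of: function mapping a case to its scenario.
    Returns a test set with exactly n_per_scenario cases per scenario."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    by_scenario = {}
    for case in cases:
        by_scenario.setdefault(scenario_of(case), []).append(case)

    balanced = []
    for scenario, group in by_scenario.items():
        if len(group) < n_per_scenario:
            # Surfacing the gap is the point: under-covered scenarios are
            # exactly where hidden bias goes undetected.
            raise ValueError(f"not enough cases for scenario {scenario!r}")
        balanced.extend(rng.sample(group, n_per_scenario))
    return balanced
```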
3. Thorough Regression Testing
The industry must conduct extensive regression tests before every deployment of an AI system to maintain functionality and prevent performance degradation as models continue to be trained on new datasets.
Every time you train a model, you are feeding it new data to teach it something new. However, every time a model undergoes training and learns something new, it de-prioritizes older information and “unlearns” some of the things you taught it in the past. That is called regression, and addressing it is crucial if we want to continuously improve AI model performance and the scope of each model’s capabilities.
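A pre-deployment regression gate can be as simple as the following sketch: re-run the previous release's scenario-level benchmark against the new checkpoint and fail if any scenario that used to pass now scores worse. The names and tolerance are illustrative assumptions, not a prescribed standard.

```python
def regression_gate(old_scores, new_scores, tolerance=0.01):
    """old_scores / new_scores: dicts mapping scenario -> accuracy.
    Raises if the new model "forgot" anything the old one could do."""
    regressions = {
        scenario: (old, new_scores[scenario])
        for scenario, old in old_scores.items()
        if scenario in new_scores and new_scores[scenario] < old - tolerance
    }
    if regressions:
        raise RuntimeError(f"Model regressed on: {regressions}")

# Run before every deployment, using scores from the same scenario-level
# evaluation that certified the previous release.
```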
4. End-to-End Pipeline Assurance
We also need to make it an industry standard to validate entire AI pipelines, from data ingestion to output delivery, to ensure seamless integration and system-wide robustness. Advanced AI systems do not consist of just one model; they are complex multi-step pipelines that need to be tested both individually and as a whole to make sure the end system performs as expected. Examples of such advanced AI pipelines include retrieval-augmented generation (RAG) systems and sophisticated AI agents. These systems are composed of components that provide retrieval, generation, guardrails, and usually more. Stitching together the highest-performing system at the component level does not guarantee the highest-performing end-to-end system; we need to apply rigorous testing practices to each system as a whole.
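For a RAG-style pipeline, that end-to-end check might look like this sketch, where retrieve, generate, guardrail, and judge are hypothetical stand-ins for real components rather than any particular library's API:

```python
def rag_pipeline(query, retrieve, generate, guardrail):
    """A three-stage pipeline: each stage can be tested on its own."""
    docs = retrieve(query)          # stage 1: retrieval
    answer = generate(query, docs)  # stage 2: generation
    return guardrail(answer)        # stage 3: safety filter

def test_end_to_end(cases, retrieve, generate, guardrail, judge):
    """cases: list of (query, reference); judge scores answer vs reference.
    Returns the mean end-to-end score across all cases."""
    scores = [
        judge(rag_pipeline(q, retrieve, generate, guardrail), ref)
        for q, ref in cases
    ]
    return sum(scores) / len(scores)

# Component-level tests (retrieval recall, generation faithfulness,
# guardrail coverage) still run separately; this end-to-end score catches
# failures that only appear once the stages are stitched together.
```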
5. Transparent Quality Standards
Our industry needs to ensure that stakeholders are aligned by defining and reporting clear test coverage, performance metrics, and acceptance thresholds. Aligning product, engineering, and leadership on what “quality” means is key to building reliable systems. At Kolena, we define a quality standard as test coverage plus evaluation metrics. Test coverage asks: what scenarios are we testing against? These could include how the system performs on each type of customer data, multi-step versus single requests, different request types, and many more. Evaluation metrics capture the quality measures we care about for our product, such as accuracy, completeness, adherence to company policy, PII leakage, and more.
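Making such a standard explicit and machine-checkable is straightforward; here is a Python sketch in which the scenarios, metrics, and thresholds are illustrative placeholders that each organization would replace with its own agreed-upon values.

```python
from dataclasses import dataclass, field

@dataclass
class QualityStandard:
    # Coverage: the scenarios every release must be tested against.
    scenarios: tuple = ("per_customer_data", "multi_step", "single_request")
    # Metrics with floors the team has signed off on...
    min_scores: dict = field(default_factory=lambda: {
        "accuracy": 0.95, "completeness": 0.90, "policy_adherence": 0.99})
    # ...and metrics with ceilings, where higher is worse.
    max_scores: dict = field(default_factory=lambda: {"pii_leakage": 0.0})

    def evaluate(self, report: dict) -> list[str]:
        """report: {scenario: {metric: value}}; returns all violations."""
        violations = []
        for s in self.scenarios:
            metrics = report.get(s)
            if metrics is None:
                violations.append(f"missing coverage: {s}")
                continue
            for m, floor in self.min_scores.items():
                if metrics.get(m, 0.0) < floor:
                    violations.append(f"{s}/{m} below {floor}")
            for m, ceiling in self.max_scores.items():
                if metrics.get(m, 1.0) > ceiling:
                    violations.append(f"{s}/{m} above {ceiling}")
        return violations  # empty list means the release meets the standard
```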
Looking ahead, what changes do you hope to see in industry-wide AI governance over the next decade?
I would like to see more industry-wide governance bodies and groups composed of industry leaders helping to make collective decisions on the quality standards we want to institute and adhere to in the development of AI models. Although there’s been a substantial amount of progress in AI, our industry is still in its early stages, and we have more work to do to ensure we’re all aligned on how we will develop AI responsibly.
I am also hoping that the government will take initiative — whether it be the California state legislature or the federal government — and pass some comprehensive AI legislation that offers protections for users and developers against the misuses of AI models from malicious actors. To develop safe and responsible AI systems, it is critical for us to have a clear legal framework that promotes these principles while still allowing Silicon Valley to innovate and lead on AI.
What do you think will be the biggest challenge for AI over the next decade, and how should the industry prepare?
Current ML testing approaches are time-consuming and unreliable. They involve a lot of guesswork and trial and error, which is tedious and produces results that are mediocre at best.
You are a person of great influence. If you could inspire a movement that would bring the most good to the most people, what would that be? You never know what your idea can trigger. 🙂
This is a great question, because we do see Kolena as the leader of a global movement toward AI quality that’s already well under way. In the same way that engineers and users all over the world came together decades ago to elevate software from something that was novel and interesting but not yet trustworthy, AI builders, researchers, users and regulators are coming together today to establish best practices, benchmarks and standards that will make AI and ML models trustworthy, as well.
Kolena is helping to drive this movement. We’re building AI/ML model testing and validation solutions that help developers build safe, reliable, and fair AI systems by allowing companies to instantly stitch together razor-sharp test cases from their data sets, enabling them to scrutinize models in the precise scenarios those models will be entrusted to handle in the real world.
We’re also bringing successful software concepts like unit testing and regression analysis into the AI development framework, transforming AI/ML testing from an experimentation-based practice into an engineering discipline that can be trusted and automated.
Just as importantly, we’re helping to bring the global community together around the drive for trustworthy AI. Last June, we held our first annual AI Quality Conference (or AIQCON), where over 10,000 AI builders, users, investors, regulators and journalists came together to discuss what needs to be done to help all of us enjoy the benefits of AI by ensuring that AI and ML models do what they’re designed to do. We’re currently planning our second AIQCON for later this year.
How can our readers follow your work online?
Authority Magazine readers can follow me and Kolena on LinkedIn and on our website. We frequently post informative and educational content about AI on our various channels, so that’s a good way to stay up to date on our newest products and the latest hot-topic concepts in the industry. You can also learn more about AIQCON at www.aiqualityconference.com.
Thank you so much for joining us. This was very inspirational.