Increasing the efficiency of Meta’s global data centers
Meta relies on its data centers to deliver a smooth user experience for billions of users across Facebook, Instagram, and WhatsApp. As part of our mission to improve their operations, my team was tasked with making the repair process within these data centers more efficient.
6 Software Engineers, 1 Product Manager, 1 Lead Researcher, and 3 Product Designers
Prototypes, Wireframes, User flows, User research, Usability testing, Workshops, and Design Strategy
Nov 2022 - April 2023
Researching the problem
I supported the lead researcher by conducting user interviews with data center engineers and repair technicians as well as observing their repair process to identify pain points and better understand their current workflow.
Repair teams often solved the same problems without knowledge of each other's solutions.
Communication between teams was often informal and poorly documented, wasting time and resources.
There was no central repository of knowledge for how to repair assets, making it difficult for new members to learn.
Teams used multiple tools to document work, making it difficult for new members to get the full story of how something was repaired.
Lack of Standardization
There was no standardized process for repairing assets, leading to inconsistency across teams.
Teams used different methods, tools, and terminology for the same tasks.
Processes often changed based on new hardware, technology, and industry best practices.
Documentation was quickly outdated and often ignored, leading to longer repair times and reduced efficiency.
Ideation & Designs
We held brainstorming workshops with our team and a few subject-matter experts to generate ideas for solving the problems identified in user research. I also performed a competitive analysis of similar tools and heuristic evaluations of Meta’s existing repair tools. I then converted some of the team’s ideas into wireframes and low-fidelity prototypes, which we tested to get quick user feedback and iterate on the designs.
Grouping Similar Repairs
We decided to fundamentally change how repairs were done: instead of working on one repair at a time, technicians would work on groups of similar repairs, which we labeled “Issues”.
This would allow teams to diagnose multiple repairs at once, avoiding unnecessary duplication of effort and making their workflow much more scalable and efficient.
Grouped repairs also enable teams to recognize trends and patterns, giving them a higher-level perspective and helping technicians prioritize the most crucial issues to fix.
Grouping similar repairs addressed the problem of duplicate effort by enabling technicians to diagnose multiple repairs at once.
Standardized repair plans provided technicians with a single source of truth that they could rely on, and acted as a guide for new team members to learn from.
Standardized Repair Plans
We came up with the concept of a “Repair plan” that would provide step-by-step instructions to guide technicians through a repair. To help teams trust the effectiveness of plans, subject-matter experts would be responsible for creating them, and each plan would have metrics to track its success rate.
Repair plans would standardize repair workflows, as well as empower new team members to easily learn from past repairs. These plans would also be flexible enough to evolve and support new workflows, a major gap in the current process.
22% reduction in repair time
31% increase in repair accuracy
The engineering team built an MVP of this product, which we tested against the existing repair process. Overall, our designs made the repair workflow more scalable, efficient, and consistent across teams, helping Meta’s data centers run at peak performance and serve billions of users across the world.