Network Production Engineer - Network.AI

Menlo Park, CA
You will be joining the PE Network AI team, which is responsible for the end-to-end health (performance and reliability) of Meta's backend datacenter networks that support our GPU-based AI training clusters. You will build tools and use automation to efficiently scale how we mitigate real-time impact to the network, identify and investigate long-term trends in performance and risk in our data center networks, and drive innovative solutions to monitor and improve Meta's current and future DC network products. AI is essential in driving more relevant content recommendations and ads, enhancing engagement, and improving user experiences. A network production engineer on the PE Network AI team is pivotal in ensuring that the backend DC network is robust, efficient, and capable of supporting Meta's AI training clusters effectively. This role is crucial for driving AI innovations and enhancements that impact various aspects of Meta's operations and services. Engineers who typically thrive in this role are hybrid software and network engineers with experience working with systems, understanding how they fail, and increasing their reliability. You will have the opportunity to dig into interesting challenges in the networking and software domains, at a scale that offers new challenges on a daily basis.
Network Production Engineer - Network.AI Responsibilities
  • Write and review code, develop documentation and capacity plans, and debug the hardest problems, live, on some of the largest and most complex networks and systems in the world
  • Participate in a weekly on-call rotation and be an escalation contact for service incidents
  • Perform deep dives on complex technical issues across networks, ranging from automated tooling to hardware failures and network issues
  • Analyze data to diagnose and identify root causes of network issues
  • Define, develop, and optimize automated network monitoring systems to mitigate and remediate network events
  • Proactively find gaps that impact multiple teams, come up with an execution plan, and drive the project both directly and by influencing other teams
  • Contribute to team growth and development through peer mentorship
Minimum Qualifications
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  • 4+ years of experience coding in higher-level languages (e.g., Python, C++, Go)
  • 5+ years of experience understanding and mitigating network hardware and topology failures
  • Experience in configuration and maintenance of network devices and network management systems (NMS), or applications such as web servers, load balancers, relational databases, storage systems and messaging systems
  • Experience learning software, frameworks and APIs
  • Experience developing and understanding network device configuration for at least one vendor (Juniper, Cisco, Arista, Brocade, etc.)
  • Knowledge of routing and switching, including hardware design and the forwarding and data planes
  • Expert knowledge of data center networking concepts (routing, switching, etc.).
Preferred Qualifications
  • BS or MS in Computer Science, Computer Engineering, or Network Engineering
  • Expert knowledge of TCP/IP and IPv6
  • Experience working in a multi-vendor network environment.
  • Experience with developing distributed systems and operating them at scale
  • Experience with automation frameworks and tools such as Ansible, Puppet, or Chef
  • Experience with operating, designing, implementing and troubleshooting servers and networking components.
For those who live in or expect to work from California if hired for this position, please click here for additional information.
About Meta
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.

$147,000/year to $208,000/year + bonus + equity + benefits

Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.


Equal Employment Opportunity and Affirmative Action
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here.

Meta is committed to providing reasonable support (called accommodations) in our recruiting processes for candidates with disabilities, long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support. If you need support, please reach out to accommodations-ext@fb.com.