Artificial Intelligence (AI) has become increasingly pervasive in driving internal and external innovation in enterprises, especially with the proliferation of NVIDIA DGX-1 servers. IT teams in enterprise data centers often rely on Red Hat Enterprise Linux (RHEL) for its rich feature set, stability, and manageability. To build on that familiarity and help enable AI innovation, Red Hat and NVIDIA announced support for RHEL on DGX-1 servers. This allows seamless integration of DGX-1 servers into environments in which infrastructure and expertise are tooled towards supporting RHEL. RHEL provides rich management, security, support, and certification, and enables AI in ways that are critical to both AI practitioners and those who support them.
Enterprise data centers that have a large RHEL deployment, or are built around RHEL, often also have common management infrastructure, tools, and software ecosystems supporting the deployment, management, and monitoring of systems. This is a significant advantage for IT teams deploying DGX-1 servers, whether to support data scientists and researchers or as part of mission-critical infrastructure that uses AI. That large ecosystem is available to support RHEL on DGX-1 servers.
Red Hat offers Red Hat Satellite for infrastructure deployment and management, Red Hat Ansible for task automation, Red Hat CloudForms for more complex cloud-like environments, and Red Hat Insights for monitoring. Numerous open source and third-party tools are also available for deployment, management, task automation, and monitoring. RHEL benefits from the RPM Package Manager (RPM), also used by many other Linux distributions, which makes it easy to install, manage, and update software from a wide variety of sources. Most importantly, DGX-1 server deployment with RHEL benefits from the expertise of the IT teams managing the broader data center.
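As a small illustration of how RPM fits into existing tooling, the sketch below lists installed packages by calling `rpm -qa` with a query format and parsing the result. It is a minimal example under stated assumptions, not part of any Red Hat or NVIDIA tool: the helper names `parse_rpm_listing` and `installed_packages` are hypothetical, and it assumes the `rpm` command is on the PATH of the system being inspected.

```python
import shutil
import subprocess

# rpm's --queryformat option with the NAME, VERSION and RELEASE tags.
QUERY_FORMAT = "%{NAME} %{VERSION}-%{RELEASE}\n"


def parse_rpm_listing(text):
    """Turn 'name version-release' lines into a {name: version-release} dict.

    Hypothetical helper. If a package is installed in several versions
    (kernels, for example), only the last line seen is kept.
    """
    packages = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition(" ")
        if sep:  # skip blank or malformed lines
            packages[name] = version
    return packages


def installed_packages():
    """Query the local RPM database; returns {} when rpm is not on PATH."""
    if shutil.which("rpm") is None:
        return {}
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERY_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_rpm_listing(result.stdout)


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items()):
        print(name, version)
```

A listing like this could feed whatever inventory or compliance reporting the data center already uses for its other RHEL systems.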
Security is an increasingly critical consideration in data centers, with a focus on system and data integrity. It is especially critical for DGX-1 servers, which will use tremendous amounts of data for AI. Support for RHEL on DGX-1 servers allows seamless integration into existing data center security best practices, policies, and infrastructure, including regulatory compliance. Enterprises and the IT teams that support them need a solution compatible with their security architecture. RHEL on DGX-1 servers enables the use of SELinux (Security-Enhanced Linux), a Linux kernel security module that offers policy-based (rather than user-based) management and secure compartmentalization of applications and processes, helping to isolate a compromised system and limit its exposure.
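For teams that want to confirm SELinux is active on a server, the following is a minimal sketch of one way to check the current mode. It assumes the standard selinuxfs mount point `/sys/fs/selinux`, where the `enforce` file holds `1` for enforcing and `0` for permissive. The function name `selinux_mode` is hypothetical, and treating a missing file as "disabled" is a simplifying assumption.

```python
from pathlib import Path

# Standard selinuxfs location; the file is absent when SELinux is disabled.
SELINUX_ENFORCE = Path("/sys/fs/selinux/enforce")


def selinux_mode(enforce_path=SELINUX_ENFORCE):
    """Return 'enforcing', 'permissive', or 'disabled' (hypothetical helper)."""
    try:
        value = Path(enforce_path).read_text().strip()
    except (FileNotFoundError, NotADirectoryError):
        # Assumption: no enforce file means SELinux is not enabled.
        return "disabled"
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print("SELinux mode:", selinux_mode())
```

On a RHEL system the `getenforce` command reports the same information. The Python form is only meant to show how such a check could be folded into existing monitoring scripts.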
RHEL provides continuous, integrated security by automating the security lifecycle and providing a large knowledge base, training, and tools to help meet security demands (refer to the Guide to Continuous IT Security for more information). Beyond standard security practices, Red Hat also supports several government standards.
RHEL has been tested and qualified with DGX-1 servers and NVIDIA-optimized deep learning framework containers to help ensure end-to-end compatibility. Red Hat provides L1 and L2 support for typical RHEL operating system issues, and NVIDIA provides support for NVIDIA-provided software. Organizations running RHEL on their DGX-1 servers get the same access to the Red Hat Customer Portal and Red Hat Customer Service page that they use for other RHEL deployments in the data center. Together, Red Hat and NVIDIA provide enterprise-level support through online, chat, email, and phone channels.
The Red Hat Hardware Certification Program ensures compatibility between RHEL and the DGX-1 server being certified. The certification process consists of a suite of tests that the hardware vendor must complete to validate each hardware component of the system. In addition to being a requirement for Red Hat to support RHEL on the DGX-1, certification gives end users and IT teams greater confidence that the system "just works" in terms of functionality, interoperability, and so on. Certification helps ensure a solid foundation for any solution built on the DGX-1 server running RHEL.
RHEL provides flexibility in how applications execute, whether on bare metal or in containers. RHEL supports bare-metal execution of workloads, including popular AI frameworks such as PyTorch and TensorFlow, the CUDA Toolkit, RAPIDS libraries, and much more. For those who prefer containers, NVIDIA-optimized frameworks, HPC applications, and analytics containers are available via ngc.nvidia.com. RHEL supports Docker containers and Kubernetes, which helps take the guesswork out of creating optimized stacks, enables orchestration and scheduling of workloads, and offers the flexibility to deploy in the cloud, on the DGX-1 server, or at the edge. The availability of this infrastructure allows organizations to embrace DevOps, maximize resources, and greatly reduce time to deployment.
RHEL provides rich management, security, support, and certification, and enables AI in ways that are critical to both AI practitioners and those who support them. RHEL support and certification for DGX-1 servers enable enterprise customers to seamlessly integrate DGX-1 servers into their data centers without additional overhead, making DGX-1 servers available in data centers to help maximize AI innovation in the enterprise.
For More Information
- Red Hat, NVIDIA Align on Open Source Solutions to Fuel Emerging Workloads
- AI Developer Productivity Meets Enterprise IT Manageability Blog
- NVIDIA DGX-1 Essential Instrument of AI Research
- DGX-1 Server Data Sheet
- Delivering Manageability With Red Hat on NVIDIA DGX-1 Solutions Brief
- DGX-1 Red Hat Certification
By Darrin Johnson
Global Director of Technical Marketing for Enterprise and Deep Learning Institute (DLI) Certified Instructor.