Infrastructure Engineer
100000-150000 yuan, Hong Kong, 8+ years of experience, Master's degree
  • Supplementary medical insurance
  • Startup company
  • Mandatory Provident Fund (MPF)
Video Rebirth Limited 2025-02-06 21:08:33 102 followers
Job Description
This position has not yet been verified by the site; please review it carefully before applying.
Position Overview
We are seeking an experienced Infrastructure Engineer to architect and manage our AI computing infrastructure. The ideal candidate will have extensive experience in building and scaling ML infrastructure, with particular emphasis on distributed training systems and GPU cluster management.

Key Responsibilities
  • Design and implement high-performance computing infrastructure for large-scale AI model training
  • Manage and optimize GPU clusters for distributed training workloads
  • Build and maintain container orchestration systems for ML workflows
  • Implement efficient resource allocation and scheduling systems
  • Design and maintain monitoring and alerting systems for compute infrastructure
  • Optimize infrastructure costs while maintaining performance
  • Collaborate with ML teams to support their computing needs
  • Ensure system reliability, security, and scalability

Required Qualifications
  • Master's degree in Computer Science, Systems Engineering, or related field
  • 8+ years of experience in infrastructure engineering, with focus on ML/AI infrastructure
  • Strong experience with GPU cluster management and optimization, Kubernetes and container orchestration, Linux system administration, and Infrastructure as Code (IaC)
  • Proven track record in building large-scale computing systems
  • Experience with major cloud providers (AWS/GCP/Azure or Alibaba Cloud/Tencent Cloud, etc.)

Preferred Qualifications
  • Experience with ML infrastructure at major tech companies
  • Knowledge of distributed training systems (PyTorch DDP, Horovod)
  • Familiarity with ML frameworks and their infrastructure requirements
  • Experience with high-performance networking (InfiniBand, RDMA)
  • Background in performance optimization and troubleshooting
  • Understanding of ML workload characteristics
  • Bilingual proficiency (English/Chinese)

Technical Skills
Computing Infrastructure
  • GPU Clusters: NVIDIA DGX, GPU management tools
  • Distributed Systems: Slurm, Kubernetes
  • ML Platforms: Kubeflow, Ray
  • Job Scheduling: YARN, Slurm
Cloud & Networking
  • Cloud Platforms (International): AWS, GCP, Azure
  • Cloud Platforms (China): Alibaba Cloud, Tencent Cloud
  • Networking: InfiniBand, RDMA, TCP/IP optimization
  • Load Balancing: HAProxy, NGINX
Infrastructure Management
  • Container Technologies: Docker, Kubernetes, Singularity
  • IaC: Terraform, Ansible, CloudFormation
  • CI/CD: Jenkins, GitLab CI
  • Monitoring: Prometheus, Grafana, ELK Stack
Development
  • Languages: Python, Go, Shell scripting
  • Version Control: Git
  • Documentation: Markdown, Confluence

What We Offer
  • Opportunity to build cutting-edge AI infrastructure
  • Competitive salary and equity package
  • Access to latest hardware and technologies
  • Professional development opportunities
  • Comprehensive health benefits
  • Learning and conference budget

Location
Hong Kong (on-site, Hong Kong Science and Technology Park)

Expected Impact
  • Design and implement next-generation AI computing infrastructure
  • Optimize resource utilization and cost efficiency
  • Improve training speed and efficiency for AI models
  • Build scalable and reliable systems

Projects You'll Work On
  • Building automated GPU cluster management systems
  • Implementing efficient resource scheduling for ML workloads
  • Optimizing distributed training infrastructure
  • Setting up monitoring and observability systems
  • Designing disaster recovery and backup solutions
Work Location
Address: Units 317-318, Building 10W, Hong Kong Science Park, Sha Tin District, Hong Kong