Ensure the reliability and normal operation of multiple core systems related to the Viking Team's big data and online services, with a focus on capacity planning and stability assurance.
Enhance system visibility by monitoring the availability and performance metrics of system components, helping development teams locate faults quickly, with particular attention to the stability of critical paths such as AI search and vector databases.
Improve the reliability, scalability, and performance of services so that core systems meet their SLAs.
Participate in the design and implementation of the automation platform to support rapid iteration and efficient operation and maintenance of large-scale online Viking clusters and AI search-related clusters.