Big Data Engineer

Date:  Feb 27, 2025
Location:  Milwaukee, WI, US, 53204-2941

Req ID:  32315
Onsite or Remote:  Hybrid Position

Komatsu is an indispensable partner to the construction, mining, forestry, forklift, and industrial machinery markets, maximizing value for customers through innovative solutions. With a diverse line of products supported by our advanced IoT technologies, regional distribution channels, and a global service network, we tap into the power of data and technology to enhance safety and productivity while optimizing performance. Komatsu supports a myriad of markets, including housing, infrastructure, water, pipeline, minerals, automobile, aerospace, electronics and medical, through its many brands and subsidiaries, including TimberPro, Joy, P&H, Montabert, Modular Mining Systems, Hensley Industries, NTC, and Gigaphoton.

Job Overview

  • Develop innovative big-data applications based on data from the global Komatsu mining equipment fleet to meet evolving business needs. Solutions leverage a software stack built largely on open-source technologies, including Confluent Kafka, Snowflake, Databricks, KSQL, Kafka Streams, and Scala and Python ecosystem tools, while balancing the constraints of time, budget, and the computing infrastructure available to the analytics platform. Utilize Azure DevOps Pipelines to build and deploy code changes quickly and efficiently.
  • Work within and enhance the analytics software development lifecycle and the services for analyzing and exploring data.
  • Champion development and operations support processes that leverage industry best practices and continually improve the robustness and supportability of the analytics platform used by global JoySmart teams, customers, and factory engineering and product support teams. Ensure that the analytics platform continues to meet performance needs as it scales in data volume and user base.
  • Lead the day-to-day operation, maintenance, and monitoring of the analytics platform and the services for collecting and ingesting data from various sources into the big data platform. This includes batch processing, API integration, database replication, real-time streaming, and data connectors.
  • Ensure data quality, compliance, and security. This involves setting access controls, monitoring data usage, managing metadata, and maintaining backup and disaster recovery strategies. Monitor system health and resource utilization, and identify bottlenecks; use alerts and logs to troubleshoot and ensure the reliability and availability of data systems.

Key Job Responsibilities

  • Gather and document requirements for software and system development to meet the business needs of the global analytics platform user base.
  • Drive core software and infrastructure development lifecycle activities: evaluate, select, design, implement, document, and version control analytics applications, infrastructure, and services for analyzing and exploring data. This includes querying data, creating visualizations, and running machine learning models.
  • Actively participate in the standardization of test-driven development processes: develop, execute, and document tests for analytics software applications and infrastructure, automating as much as practicable.
  • Adapt to changes in technology, learning new skills, design patterns, and programming languages as required (Confluent Kafka, Snowflake, Databricks, KSQL, Kafka Streams, Scala, Python, etc.).
  • Provide critical review of the requirements, design, and implementation of your own and others’ work as required to ensure that software quality and functionality meet business needs.
  • Participate in the development of, and advocate for, global application of common coding standards and guidelines for infrastructure applications.
  • Integrate the big data platform with cloud services for scalability, elasticity, and cost-effectiveness.

Qualifications/Requirements

  • Requires a minimum of a Master's degree in Computer Science, IT, Information Systems, Engineering, or an equivalent field and 7+ years of progressive development experience with data systems, analytics, and big data. Strong preference for experience with big data applications involving complex equipment and mining. Experience developing applications that process large volumes of noisy, real-world time-series sensor data from unreliable sources is required.
  • Big data and Hadoop programming; cloud environments (Azure/AWS); Agile development methodology; excellent communication skills; test-driven development; requirements gathering; and a drive for results. Excellent track record developing production back-end applications in Hadoop ecosystems using Scala and Python. DevOps and continuous-deployment experience highly valued.

Additional Information

Komatsu is an Equal Opportunity Workplace and an Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.
