Sr Data Scientist (USA)

We’re a mission-driven team working within government, but not like government—agile, innovative, and focused on real outcomes. Serving the California Department of Health Care Services, we’re helping transform behavioral health care for one-third of Californians—tackling issues like suicide, addiction, and homelessness. We value transparency, accountability, and self-directed teams, and we’re passionate about building modern, responsive digital solutions that drive lasting impact. Join us to help shape how behavioral health outcomes are measured, explore the role of AI in policy, and modernize our data science tech stack—all while doing meaningful work with people who care deeply about what they do and enjoy doing it together.

Location: Remote (United States). Must be able to work Pacific Time hours.

We are seeking a highly skilled Senior Data Scientist with specialized expertise in Identity Resolution or Entity Resolution/Matching. The ideal candidate will leverage advanced data science techniques, including machine learning, probabilistic matching, and graph-based algorithms, to resolve and link entities across disparate data sources, ensuring high accuracy in identity and entity disambiguation. This role will involve working closely with cross-functional teams to drive insights, improve data quality, and support business objectives through robust entity resolution solutions.

Key Responsibilities

Entity Resolution & Matching: Design, develop, and implement scalable identity and entity resolution algorithms to deduplicate, link, and disambiguate records across structured and unstructured datasets.
Data Analysis & Modeling: Apply statistical and machine learning techniques (e.g., clustering, classification, natural language processing) to analyze complex datasets and improve matching accuracy.
Feature Engineering: Create and optimize features for entity matching, such as name standardization, address parsing, and fuzzy matching techniques.
Graph-Based Solutions: Utilize graph databases and network analysis to model relationships between entities and enhance resolution processes.
Data Quality & Validation: Assess and improve data quality by identifying inconsistencies, missing values, or duplicates, and validate matching results against ground truth or external benchmarks.
Collaboration: Partner with data engineers, product managers, and domain experts to integrate entity resolution pipelines into production systems and align solutions with business needs.
Performance Optimization: Optimize algorithms and workflows for scalability, speed, and accuracy, ensuring they perform efficiently on large-scale datasets.
Documentation & Reporting: Document methodologies, present findings, and provide actionable insights to stakeholders through clear visualizations and reports.

Required Qualifications

Education: Master’s or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, or a related quantitative field.

Experience:

10+ years of experience in data science or a related role.
3+ years of hands-on experience with identity resolution, entity resolution, or record linkage projects.

Technical Skills:

Proficiency in Python or R for data analysis and modeling (libraries: pandas, scikit-learn, NumPy, etc.).
Experience with machine learning frameworks (e.g., TensorFlow, PyTorch) and probabilistic matching techniques.
Familiarity with fuzzy matching tools (e.g., dedupe, fuzzywuzzy) and string similarity metrics (e.g., Levenshtein, Jaro-Winkler).
Knowledge of graph databases (e.g., Neo4j) and network analysis for relationship modeling.
Experience with SQL and handling large-scale datasets in relational or No