Passionate about solving problems

Hi, I'm Arun Singh

Associate Data Scientist

Building AI-driven solutions using NLP, LLMs, and Generative AI, with expertise in scalable data pipelines.

About Me

Data Science professional skilled in Python, SQL, MongoDB, and machine learning, with experience in automation, data pipelines, and cloud deployments. Passionate about building AI-driven solutions using NLP, LLMs, and Generative AI. Known for quickly learning new technologies and delivering impactful, production-ready solutions across domains like healthcare and marketing. Focused on solving real-world problems through data-driven insights and intelligent systems.

Career Highlights

  • Designed scalable data pipelines using Python, Docker, MongoDB, and AWS for high-volume datasets.
  • Developed NLP modules using spaCy and Transformer models for sentiment and keyword analysis.
  • Automated end-to-end API-based data ingestion workflows, reducing manual effort by ~80%.
  • Prototyped LLM-based tools using OpenAI APIs and LangChain, speeding up insights generation by ~50%.

3+ Years Experience
20+ Projects
100% Commitment

Experience

March 2025 – Present (1 Year+)

Associate Data Scientist

Full Time
Atisfy Private Limited, India

Designed and deployed scalable data pipelines, NLP modules, and LLM-based tools, accelerating insights generation and improving model deployment efficiency.

June 2023 – Feb 2025 (1 Year 9 Months)

Associate Software Engineer

Full Time
Sahrudaya Health Care Pvt. Ltd. (Medicover)

Led development of dashboards, HIMS reports, and system integrations, driving operational efficiency across multiple units and automating manual workflows.

Feb 2023 – May 2023 (3 Months)

Graduate Engineer Trainee

Internship
L&T Mindtree

Gained foundational experience in Python, Shell scripting, and AWS cloud services while automating enterprise tasks.

Personal Projects

A selection of my recent data science, ML engineering, and web dev projects.

Data Insights

Interactive visualizations demonstrating model performance metrics and data distributions.

Model Performance (ROC Curve)

Feature Importance (SHAP)

Technical Arsenal

A comprehensive overview of the tools and technologies I use to build scalable AI/ML pipelines and data-driven solutions.

Programming

  • Python: 95%
  • SQL: 90%
  • JavaScript: 70%

Core Data Science

  • NumPy: 95%
  • Pandas: 95%
  • SciPy: 95%

Data Engineering & Pipelines

  • PostgreSQL / MySQL: 85%
  • MongoDB: 85%
  • Apache Airflow: 90%

Machine Learning & Deep Learning

  • Scikit-Learn: 90%
  • XGBoost: 85%
  • TensorFlow / PyTorch: 85%
  • MLflow: 80%

NLP & Generative AI

  • Transformers: 90%
  • spaCy: 50%
  • LangChain: 40%

Backend & APIs

  • FastAPI: 80%
  • Flask: 85%
  • Django: 70%
  • REST APIs: 90%

Cloud & DevOps

  • AWS (EC2, S3, RDS): 85%
  • Docker: 80%
  • CI/CD (Bitbucket): 85%
  • Git: 90%

Visualization & BI

  • Matplotlib: 90%
  • Seaborn: 90%
  • Plotly: 80%
  • Tableau: 80%
  • Power BI: 80%

Web Fundamentals

  • HTML: 75%
  • CSS: 70%
  • JavaScript (DOM & AJAX): 65%
  • PHP: 65%
  • React (Basics): 50%

Writings & Insights

Thoughts on machine learning, architecture, and data engineering.

Let's Build Something Intelligent

I help solve complex problems, build robust ML pipelines, and design generative AI architectures, turning data into actionable insights and end-to-end solutions.