10 GitHub Repos for Data Scientists / Data Analysts - Medium

Kenji Sato

100+ Free Resources On Generative AI for Data Scientists

Awesome Generative AI Data Scientist

The Future is using AI and ML Together 🚀🚀

A curated list of 100+ resources to help you become a Generative AI Data Scientist. This repository includes resources on building GenAI Data Science applications with Large Language Models (LLMs) and deploying LLMs and Generative AI/ML with Cloud-based solutions. Please ⭐ us on GitHub (it takes 2 seconds and means a lot). Contributions are welcome!

Please submit a pull request or open an issue if you have suggestions for new resources or improvements to existing ones. Thanks for your support!

Awesome Real-World AI Use Cases

🚀🚀 AI-Powered Data Science Team In Python: An AI-powered data science team of copilots that uses agents to help you perform common data science tasks 10X faster. Links: Apps | Examples | GitHub
🚀 Awesome LLM Apps: LLM RAG AI Apps with Step-By-Step Tutorials. Links: GitHub
AI Hedge Fund: Proof of concept for an AI-powered hedge fund. Links: GitHub
AI Financial Agent: A financial agent for investment research. Links: GitHub
Structured Report Generation (LangGraph): How to build an agent that can orchestrate the end-to-end process of report planning, web research, and writing. Produces reports of varying and easily configurable formats. Links: Video | Blog | Code
Uber QueryGPT: Uber's QueryGPT uses large language models (LLM), vector databases, and similarity search to generate complex queries from English (Natural Language) questions, enhancing productivity for engineers, operations managers, and data scientists. Links: Blog
Nir Diamant GenAI Agents Hub: Tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. A comprehensive guide for building intelligent, interactive AI systems. Links: GitHub
AI Engineering Hub: Real-world AI agent applications, LLM and RAG tutorials, with examples to implement. Links: GitHub
StockChat: An open-source alternative to Perplexity Finance. Links: GitHub

Curated Python AI, Data Science, and ML Compilations

Data Science And AI Agents

Qwen-Agent: A framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. It also comes with example applications such as Browser Assistant, Code Interpreter, and Custom Assistant. Links: Documentation | Examples | GitHub

AI Frameworks (Build Your Own)

AI Frameworks (Drag and Drop)

LangGraph Studio: IDE that enables visualization, interaction, and debugging of complex agentic applications. Links: GitHub
Langflow: A low-code tool that makes building powerful AI agents and workflows that can use any API, model, or database easier. Links: Documentation | GitHub
Pyspur: Graph-Based Editor for LLM Workflows. Links: Documentation | GitHub
LangWatch: Monitor, Evaluate & Optimize your LLM performance with 1-click. Drag and drop interface for LLMOps platform. Links: Documentation | GitHub
AutoGen Studio: A low-code interface to rapidly prototype AI agents, enhance them with tools, compose them into teams, and interact with them to accomplish tasks. Built on AutoGen AgentChat. Links: Documentation
n8n: Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. Links: Documentation | GitHub

LangChain: A framework for developing applications powered by large language models (LLMs). Links: Documentation | GitHub | Cookbook
LangGraph: A library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Links: Documentation | Tutorials
LangSmith: A platform for building production-grade LLM applications. It allows you to closely monitor and evaluate your application, so you can quickly and confidently ship. Links: Documentation | GitHub

LangGraph Prebuilt Agents: Prebuilt agents for LangGraph (includes 3rd Party LangGraph extensions). Links: Documentation
AI Data Science Team: An AI-powered data science team of agents to help you perform common data science tasks 10X faster. Links: GitHub
LangMem: LangMem provides tooling to extract important information from conversations, optimize agent behavior through prompt refinement, and maintain long-term memory. Links: GitHub
LangGraph Supervisor: A Python library for creating hierarchical multi-agent systems using LangGraph. Links: GitHub
Open Deep Research: An open-source assistant that automates research and produces customizable reports on any topic. Links: GitHub
LangGraph Reflection: This prebuilt graph is an agent that uses a reflection-style architecture to check and improve an initial agent's output. Links: GitHub
LangGraph Big Tool: Create LangGraph agents that can access large numbers of tools. Links: GitHub
LangGraph CodeAct: This library implements the CodeAct architecture in LangGraph. This architecture is used by Manus.im. Links: GitHub
LangGraph Swarm: Create swarm-style multi-agent systems using LangGraph. Agents dynamically hand off control to one another based on their specializations. Links: GitHub
LangChain MCP Adapters: Provides a lightweight wrapper that makes Anthropic Model Context Protocol (MCP) tools compatible with LangChain and LangGraph. Links: GitHub

Huggingface: An open-source platform for machine learning (ML) and artificial intelligence (AI) tools and models. Links: Documentation
Transformers: Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Links: Documentation
Tokenizers: Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. Links: Documentation | GitHub
Sentence Transformers: Sentence Transformers (a.k.a. SBERT) is the go-to Python module for accessing, using, and training state-of-the-art text and image embedding models. Links: Documentation
smolagents: The simplest framework out there to build powerful agents. Links: Documentation | GitHub

ChromaDB: The fastest way to build Python or JavaScript LLM apps with memory! Links: GitHub
FAISS: A library for efficient similarity search and clustering of dense vectors. Links: GitHub
Qdrant: High-Performance Vector Search at Scale. Links: Website
Pinecone: The official Pinecone Python SDK. Links: GitHub
Milvus: Milvus is an open-source vector database built to power embedding similarity search and AI applications. Links: GitHub
SQLite Vec: A vector search SQLite extension that runs anywhere! Links: GitHub

PyTorch: PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. Links: Website
TensorFlow: TensorFlow is an open-source machine learning library developed by Google. Links: Website
JAX: Google’s library for high-performance computing and automatic differentiation. Links: GitHub
tinygrad: A minimalistic deep learning library with a focus on simplicity and educational use, created by George Hotz. Links: GitHub
micrograd: A simple, lightweight autograd engine for educational purposes, created by Andrej Karpathy. Links: GitHub

Transformers: Hugging Face Transformers is a popular library for Natural Language Processing (NLP) tasks, including fine-tuning large language models. Links: Documentation
Unsloth: Finetune Llama 3.2, Mistral, Phi-3.5 & Gemma 2-5x faster with 80% less memory! Links: GitHub
LitGPT: 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale. Links: GitHub
AutoTrain: No code fine-tuning of LLMs and other machine learning tasks. Links: GitHub

Testing and Monitoring (Observability)

Web Parsing (HTML) and Web Crawling

Gitingest: Turn any Git repository into a simple text ingest of its codebase. This is useful for feeding a codebase into any LLM. Links: GitHub
Crawl4AI: Open-source, blazing-fast, AI-ready web crawling tailored for LLMs, AI agents, and data pipelines. Links: Documentation | GitHub
GPT Crawler: Crawl a site to generate knowledge files to create your own custom GPT from a URL. Links: Documentation | GitHub
ScrapeGraphAI: A web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Links: Documentation | GitHub
Scrapling: 🕷️ Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python. Links: GitHub
Firecrawl: 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl, and extract with a single API. Links: Documentation | GitHub

Agents and Tools (Build Your Own)

Agents and Tools (Prebuilt)

Mem0: Mem0 is a self-improving memory layer for LLM applications, enabling personalized AI experiences that save costs and delight users. Links: Documentation | GitHub
Memary: Open Source Memory Layer For Autonomous Agents. Links: GitHub
Memobase: 1st User Profile-Based Memory for GenAI Apps. Links: Documentation | GitHub

LangWatch: Monitor, Evaluate & Optimize your LLM performance with 1-click. Drag and drop interface for LLMOps platform. Links: Documentation | GitHub
MLflow: MLflow Tracing for LLM Observability. Links: Documentation
Agenta: Open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM Observability all in one place. Links: Documentation
LLMOps: Best practices designed to support your LLMOps initiatives. Links: GitHub
Helicone: Open-source LLM observability platform for developers to monitor, debug, and improve production-ready applications. Links: Documentation | GitHub

Browser-Use: Make websites accessible for AI agents. Links: Documentation | GitHub
WebUI: Built on Gradio and supports most of browser-use functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent. Links: GitHub
WebRover: WebRover is an AI-powered web agent that combines autonomous browsing with advanced research capabilities. Links: GitHub

Microsoft PromptWizard: Task-Aware Prompt Optimization Framework. Links: GitHub
Promptify: A library for prompt engineering that simplifies NLP tasks (e.g., NER, classification) using LLMs like GPT. Links: GitHub
AutoPrompt: A framework for prompt tuning using Intent-based Prompt Calibration. Links: GitHub

AI Suite: Simple, unified interface to multiple Generative AI providers. Links: GitHub
AdalFlow: The library to build & auto-optimize LLM applications, from Chatbot, RAG, to Agent by SylphAI. Links: GitHub
dspy: The framework for programming, not prompting, foundation models. Links: GitHub
LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format. Links: GitHub
AI Agent Service Toolkit: Full toolkit for running an AI agent service built with LangGraph, FastAPI, and Streamlit. Links: App | GitHub
Microsoft Tiny Troupe: LLM-powered multiagent persona simulation for imagination enhancement and business insights. Links: GitHub
Distributed Llama: Connect home devices into a powerful cluster to accelerate LLM inference. Links: GitHub

Curated AI, ML, Data Science Lists

LLM tools for R: An ongoing roundup of useful developments in the LLM/genAI space, with a specific focus on R. Links: Website

ellmer: Makes it easy to use large language models (LLM) from R. It supports a wide variety of LLM providers and implements a rich set of features including streaming outputs, tool/function calling, structured data extraction, and more. Links: Website
hellmer: Enables sequential and parallel batch processing for chat models supported by ellmer. Links: Documentation
chores: Provides a library of ergonomic LLM assistants designed to help you complete repetitive, hard-to-automate tasks quickly. Links: Documentation
ggpal: LLM assistant specifically for ggplot2. Links: GitHub
gander: A high-performance and low-friction chat experience for data scientists in RStudio and Positron, sort of like completions with Copilot, but it knows how to talk to the objects in your R environment. Links: Documentation

mall: Run multiple LLM predictions against a data frame. The predictions are processed row-wise over a specified column. Links: Website
lang: Use an LLM to translate a function’s help documentation on-the-fly. Links: Website
chattr: An interface to LLMs (Large Language Models). Links: Website

Other Popular Interfaces to LLM Models in R

chatgpt: Interface with models from OpenAI to get assistance while coding. Links: GitHub
groqR: Brings GroqCloud’s lightning-fast LPU (Language Processing Unit) technology directly to your R workflow. Links: Website
gptstudio: Easily incorporate use of large language models (LLMs) into their project workflows. Links: Website
llmR: R interface to various Large Language Models (LLMs) such as OpenAI’s GPT models, Azure’s language models, Google’s Gemini models, or custom local servers. Links: GitHub
tidychatmodels: A simple interface to chat with your favorite AI chatbot from R, inspired by tidymodels where you can easily swap out any ML model for another one but keep the other parts of the workflow the same. Links: Website
tidyllm: Access various large language model APIs, including Anthropic Claude, OpenAI, Google Gemini, Perplexity, Groq, Mistral, and local models via Ollama or OpenAI-compatible APIs. Links: Website
gemini.R: R package to use Google’s Gemini via API on R. Links: Website
PerplexR: Intuitive interface for leveraging the capabilities of the Perplexity API Pro subscription. Links: GitHub
ollama-r: The easiest way to integrate R with Ollama, which lets you run language models locally on your own machine. Links: Website
rollama: Wraps the Ollama API, which allows you to run different LLMs locally and create an experience similar to ChatGPT/OpenAI’s API. Links: Website

Ragnar: Helps implement Retrieval-Augmented Generation (RAG) workflows. Links: Website

LLM Deployment (Cloud Services)

AWS Bedrock: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. Links: AWS Bedrock
Microsoft Azure AI Services: Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with out-of-the-box and prebuilt and customizable APIs and models. Links: Microsoft Azure AI Services
Google Vertex AI: Vertex AI is a fully-managed, unified AI development platform for building and using generative AI. Links: Google Vertex AI
NVIDIA NIM: NVIDIA NIM™, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations. Links: NVIDIA NIM

LangChain Cookbook: Example code for building applications with LangChain, with an emphasis on more applied and end-to-end examples. Links: GitHub
LangGraph Examples: Example code for building applications with LangGraph. Links: GitHub
Llama Index Examples: Example code for building applications with Llama Index. Links: GitHub
Streamlit LLM Examples: Streamlit LLM app examples for getting started. Links: GitHub

Amazon Web Services (AWS)

Amazon Bedrock Workshop: Introduces how to leverage foundation models (FMs) through Amazon Bedrock. Links: GitHub

Google Cloud Platform (GCP)

Google Vertex AI Examples: Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop, and manage machine learning and generative AI workflows using Google Cloud Vertex AI. Links: GitHub
Google Generative AI Examples: Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI. Links: GitHub

NVIDIA NIM Anywhere: An entry point for developing with NIMs that natively scales out to full-sized labs and up to production environments. Links: GitHub
NVIDIA NIM Deploy: Reference implementations, example documents, and architecture guides that can be used as a starting point to deploy multiple NIMs and other NVIDIA microservices into Kubernetes and other production deployment environments. Links: GitHub

8-Week AI Bootcamp To Become A Generative AI-Data Scientist: Focused on helping you become a Generative AI Data Scientist. Learn how to build and deploy AI-powered data science solutions using LangChain, LangGraph, Pandas, Scikit Learn, Streamlit, AWS, Bedrock, and EC2. Links: Enroll Here
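
Several entries above (FAISS, Qdrant, Milvus, and the Uber QueryGPT write-up) are described in terms of similarity search over dense vectors. As a rough illustration of that idea only, and not of any listed library's API, here is a minimal pure-Python sketch of brute-force top-k cosine search. The helper names `cosine_similarity` and `top_k`, the document names, and the three-dimensional toy vectors are all hypothetical; real embeddings would come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical toy "embeddings" (assumption: made up for illustration).
documents = {
    "sales_report": [0.9, 0.1, 0.0],
    "churn_model": [0.1, 0.9, 0.2],
    "revenue_forecast": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, documents, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

This sketch compares the query against every stored vector, which is fine for a handful of items. The libraries listed above exist largely because that exhaustive scan becomes too slow at scale, so they rely on specialized index structures instead.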
