Thierno Ibrahima Diop, Developer in Dakar, Dakar Region, Senegal

Thierno Ibrahima Diop

Verified Expert in Engineering

Data Scientist and Developer

Location
Dakar, Dakar Region, Senegal
Toptal Member Since
April 25, 2022

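Thierno is a lead data scientist passionate about natural language processing (NLP) and machine learning (ML). He has mentored apprentice data scientists for three years. He previously spent three years as a freelancer in web and mobile application development. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google developer expert in ML.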

Portfolio

NuurAI
GPT, Natural Language Processing (NLP)...
Karat
Code Review, Source Code Review, Hiring, Interviewing, Programming
FLock.io
Natural Language Processing (NLP), GPT...

Experience

Availability

Part-time

Preferred Environment

Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, Flask, SpaCy, Gensim, OpenAI

The most amazing...

...model I've developed is a system that detects different security issues in code. It was built using large language models such as GPT and LLaMA.

Work Experience

CEO | Lead Data Scientist

2022 - PRESENT
NuurAI
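  • Led a team of machine learning engineers applying deep learning to detect popular reciters from audio input.
  • Mentored machine learning engineers in applying deep learning to compute the similarity between users and reciters.
  • Helped the team implement deep learning techniques and experiment with them on our use cases.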
Technologies: Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Audio, TensorFlow, PyTorch, Python 3, Artificial Intelligence (AI), Jupyter Notebook, Scikit-learn, Keras, DVC, Git, Matplotlib, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, Team Management, Interviewing, Hiring, Code Review, Programming, PostgreSQL

Senior Interview Engineer

2021 - PRESENT
Karat
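  • Completed over 400 interviews and was promoted to a senior role in less than a year.
  • Handled quality control for other interviewers before results were shared with clients.
  • Gave live reviews for the onboarding of new interviewers.
Technologies: Code Review, Source Code Review, Hiring, Interviewing, Programming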

NLP Research Engineer

2023 - 2023
FLock.io
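  • Tested different prompt techniques (zero-shot learning, few-shot learning, chain-of-thought) with different LLMs on more than 20 security issues.
  • Fine-tuned LLMs to address complex security issues and prepared the data for the models.
  • Created pipelines to process code with intermediate representations and to evaluate LLMs.
  • Performed topic modeling with GMM and LDA using embeddings from LLMs.
  • Used LLMs to generate code and created agents to fuzz test different security issues.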
  • Built the API and created the releases used in production.
  • Used multithreading to reduce prediction and inference time.
Technologies: Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Python, Artificial Intelligence (AI), Machine Learning, Deep Learning, Topic Modeling, Clustering, Fuzz Testing, Language Models, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API
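As an illustration of the multithreading point in this role, here is a minimal sketch of running I/O-bound prediction calls concurrently. It is not the code used at FLock.io: `predict`, `predict_all`, the simulated latency, and the toy `eval(` rule are all hypothetical placeholders standing in for a real model or API call.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def predict(snippet: str) -> dict:
    """Hypothetical stand-in for one I/O-bound model call (e.g. a request to an LLM API)."""
    time.sleep(0.05)  # simulate network latency
    # Toy placeholder rule, not a real security check.
    return {"snippet": snippet, "flagged": "eval(" in snippet}


def predict_all(snippets, max_workers=8):
    """Run predict() over all snippets in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, snippets))


snippets = ["eval(user_input)", "print('hello')", "x = 1 + 1", "eval(data)"]
results = predict_all(snippets)
for result in results:
    print(result["flagged"], result["snippet"])
```

Threads help here because each call spends most of its time waiting on I/O; CPU-bound inference in pure Python would usually need processes or batching instead.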

Lead Data Scientist

2019 - 2021
Baamtu
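  • Created a text-to-speech program for the Wolof language. Implemented an algorithm to convert Wolof text to phonemes, coordinated data collection with two participants, and evaluated phoneme coverage.
  • Contributed to automatic speech recognition for the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
  • Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed the models on-premises and as AWS Lambda functions for scalability. Built a rotation model to handle image rotation.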
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Python, DVC, Bash Script, Amazon S3 (AWS S3), Amazon Web Services (AWS), Amazon EC2, Neural Networks, DeepSpeech, Deep Learning, NumPy, OCR, Seaborn, GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Git, Jupyter Notebook, SpaCy, Machine Learning, Artificial Intelligence (AI), Artificial Neural Networks (ANN), APIs, SQL, Team Management, Source Code Review, Interviewing, Hiring, Code Review, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker

Data Scientist

2018 - 2019
Baamtu
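  • Applied natural language processing and natural language analysis to extract useful information from legal texts. Developed a regex tester library.
  • Developed an extractive chatbot for a telecom company to automate FAQs, collecting data by scraping websites and Twitter.
  • Performed data collection and annotation. Deployed using AWS Lambda.
  • Developed a rule system using Spark and implemented a flexible scoring system with Apache Airflow, including job management and scheduling of the scoring system.
  • Performed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models against theoretical and business metrics.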
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Matplotlib, Python 3, Flask, Spark, Apache Airflow, Git, DVC, Gensim, SpaCy, Kaldi, Docker, Bash Script, Audio, Artificial Intelligence (AI), Jupyter Notebook, Keras, Streamlit, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), OCR, NumPy, SciPy, Seaborn, TensorBoard, APIs, SQL, Java, Source Code Review, Programming, Chatbots, Semantic Web, Databases, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker, Amazon DynamoDB

Developer

2015 - 2018
Freelance
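  • Worked as a full-stack web and mobile developer for multiple clients simultaneously.
  • Contributed to the design and implementation of the prodispo mobile and web applications.
  • Developed a web application for purchasing phone credit.
  • Built a WebChat application using WebSocket.
  • Developed REST APIs for the dematerialization of meetings at Gainde 2000, a strategic Senegalese customs platform centered on customs clearance management.
  • Created a web app for various football competitions.
  • Built a web service and a social cross-platform mobile app.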
  • Developed and orchestrated a news website using WordPress.
Technologies: PHP, Symfony, Angular, Ionic, React, Bash Script, Python 3, Jupyter Notebook, Git, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), APIs, Programming, PostgreSQL, AWS Lambda, Amazon DynamoDB

Automatic Speech Recognition for the Wolof Language.

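Developed a speech recognition model for the Wolof language. The project involved audio data collection and the evaluation of multiple models and approaches. The data had to be validated and cleaned for diversity and correctness. I combined transfer learning and training from scratch with traditional and hybrid approaches.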

This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.

Wolof Speech Recognition

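Contributed to building automatic speech recognition for the Wolof language. I designed a platform to collect raw Wolof audio for self-supervised learning, and built and deployed the resulting models.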

Chatbot for Customer Support in Telecommunication

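A chatbot application for semi-automated customer support and FAQs. The data was scraped from multiple websites and cleaned to build an extractive chatbot.
Multiple text feature extraction methods and models were tested and compared using several similarity metrics.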
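To make the extractive approach concrete, here is a minimal sketch under stated assumptions: it uses bag-of-words counts and cosine similarity, which is only one of many possible feature and metric choices and is not necessarily what the project used. The FAQ entries, the `answer` function, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical FAQ pairs; a real system would load scraped and cleaned data.
FAQ = [
    ("How do I check my balance?", "Dial the balance code shown on your SIM card packaging."),
    ("How can I buy a data bundle?", "Open the mobile app and choose a bundle under Offers."),
    ("How do I contact customer support?", "Call the support line or message us on social media."),
]


def answer(question, faq=FAQ, threshold=0.2):
    """Return the stored answer whose question is most similar, or None if nothing is close."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(q))), a) for q, a in faq]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    return best_answer if best_score >= threshold else None


print(answer("how to buy a data bundle"))
print(answer("weather tomorrow"))
```

Swapping `tokenize` or `cosine` for another feature extractor or metric is how alternatives could be compared on the same FAQ data.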

Languages

Python 3, Python, Bash Script, SQL, PHP, Java, R

Frameworks

Flask, Spark, Streamlit, Symfony, Angular, Ionic, Scrapy

Libraries/APIs

TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, PyTorch, SpaCy, React, NumPy, SciPy, DeepSpeech

Tools

Gensim, Apache Airflow, Amazon SageMaker, Kaldi, Git, Seaborn, TensorBoard, Whisper

Platforms

Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker

Storage

Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases

Other

Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, GPT, Generative Pre-trained Transformers (GPT), Team Management, Amazon Textract, ChatGPT, DVC, OCR, Deep Learning, Artificial Neural Networks (ANN), APIs, Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API

Paradigms

Fuzz Testing

2015 - 2018

Master's Degree in Computer Science

Ecole Superieure Polytechnique de Dakar - Dakar, Senegal

2013 - 2015

Bachelor's Degree in Computer Science

Ecole Superieure Polytechnique de Dakar - Dakar, Senegal

JANUARY 2018 - PRESENT

Cloudera CCA 175 Spark and Hadoop Developer

Cloudera
