Here you can find my publications and blog posts about AI, machine learning, and related topics.
Recent Publications
Build An End-to-End SQL + RAG AI Agent
A practical guide to building a modular SQL + RAG AI agent using open source tools. It enables natural language access to structured data for enterprise use cases.
Key Insights
- Simple Workflow: Four steps—embed, retrieve, ground, orchestrate.
- SQL + LLM Bridge: Natural language over structured data, no SQL needed.
- Lightweight Stack: Open source, cloud-friendly, production-ready.
- Modular Design: Easy to extend with different models and databases.
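To make the four steps concrete, here is a minimal Python sketch. The `embed` and `generate` functions are hypothetical stand-ins for a real embedding model and LLM endpoint, and the schema, data, and scoring are illustrative only, not the article's implementation.

```python
# Minimal sketch of the embed -> retrieve -> ground -> orchestrate loop.
# `embed` and `generate` are hypothetical stand-ins for a real embedding
# model and LLM endpoint; all names and data here are illustrative.
import sqlite3

def embed(text: str) -> list[float]:
    # Stand-in: replace with a real sentence embedding model.
    return [float(text.lower().count(w)) for w in ("customer", "revenue", "order")]

def generate(prompt: str) -> str:
    # Stand-in: replace with a chat-completion call to your LLM.
    return "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 5"

def similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1. Embed: index schema descriptions once, ahead of query time.
schema_docs = {
    "customers": "customers(name TEXT, revenue REAL) -- one row per customer",
    "orders": "orders(customer TEXT, total REAL) -- one row per order",
}
index = {table: embed(doc) for table, doc in schema_docs.items()}

def answer(question: str, conn: sqlite3.Connection):
    # 2. Retrieve: pick the schema chunk most similar to the question.
    q_vec = embed(question)
    best = max(index, key=lambda t: similarity(index[t], q_vec))
    # 3. Ground: put the retrieved schema in the prompt so the model
    #    writes SQL against real columns instead of guessing.
    prompt = f"Schema:\n{schema_docs[best]}\n\nWrite one SQL query for: {question}"
    # 4. Orchestrate: generate the SQL, execute it, return the rows.
    return conn.execute(generate(prompt)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Acme", 12.5), ("Globex", 8.0)])
print(answer("Which customers bring the most revenue?", conn))
```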
Does DeepSeek Solve the Small-Scale Model Performance Puzzle?
This article explores DeepSeek-R1, a distilled reasoning model designed to deliver high performance in a compact form factor. It examines how DeepSeek-R1 performs on Intel hardware, highlighting its efficiency and suitability for resource-constrained environments.
Key Insights
- Efficient Reasoning: DeepSeek-R1 maintains strong reasoning capabilities despite its reduced size.
- Optimized Performance: The model is tailored for Intel hardware, ensuring efficient execution.
- Practical Deployment: Its compact design makes it ideal for deployment in environments with limited computational resources.
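To give a sense of how lightweight the distilled models are to try, here is a hedged sketch that loads one of the published R1 distills on CPU with Hugging Face Transformers. The model id is one real option among several sizes; the Intel-specific optimizations the article discusses are omitted here.

```python
# Sketch: running a distilled DeepSeek-R1 variant on CPU with Hugging Face
# Transformers. Swap the model id for whichever published distill fits your
# memory budget; Intel-specific acceleration is left out for brevity.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU by default

inputs = tokenizer("Briefly explain why the sky is blue.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```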
LF AI & Data Foundation: Intel’s Ezequiel Lanza Named Technical Advisory Committee (TAC) Chairperson
This article highlights the recent appointment of Ezequiel “Eze” Lanza as chairperson of the LF AI & Data Foundation’s Technical Advisory Committee (TAC). It outlines his vision, priorities, and the impact of his leadership on the open source AI ecosystem.
Key Insights
- Community Leadership: Eze brings a collaborative, inclusive approach to fostering innovation across open source AI and data projects.
- Strategic Vision: His focus includes responsible AI, scalable infrastructure, and emerging trends like agentic AI and efficient models.
- Ecosystem Growth: As TAC chairperson, Eze will help shape governance, align project priorities, and connect contributors across academia and industry as well as individual developers.
Understanding Retrieval Augmented Generation (RAG)
This article explains how RAG enhances language models by retrieving relevant external documents at runtime. It outlines how separating the knowledge base from the model improves accuracy, adaptability, and enterprise applicability.
Key Insights
- Grounded Outputs: Combines retrieval with generation to produce fact-based responses.
- Updatable Knowledge: No need to retrain—just update the external data source.
- Scalable Design: Works with vector databases and is optimized for Intel hardware.
- Enterprise Focus: Suitable for real-world applications, especially where accuracy and efficiency matter.
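A stripped-down sketch of the pattern: the knowledge is plain external data, retrieval happens at query time, and the prompt is grounded before generation. Word-overlap scoring and the `generate` stub are illustrative stand-ins for real vector search and a real LLM call.

```python
# RAG in miniature: external knowledge, query-time retrieval, grounded
# prompt. The scoring and `generate` stub are stand-ins for illustration.
knowledge_base = [
    "The return window is 30 days from delivery.",
    "Support is available on weekdays from 9am to 5pm.",
]

def generate(prompt: str) -> str:
    return f"(LLM answer grounded in)\n{prompt}"  # stand-in for a real LLM

def retrieve(question: str, k: int = 1) -> list[str]:
    # Naive relevance score: count shared words between question and chunk.
    q_words = set(question.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("How long is the return window?"))

# Updating knowledge requires no retraining -- just change the external store:
knowledge_base.append("During December the return window extends to 60 days.")
```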
How to Containerize Your Local LLM
This guide outlines the process of containerizing a local Large Language Model (LLM), such as LLaMA2, to create a scalable and portable API service. By decoupling model storage from the container, it facilitates efficient deployment and integration into applications.
Key Insights
- Microservice Architecture: Encapsulates LLM logic within a container, promoting scalability and maintainability.
- External Model Storage: Stores large model files outside the container to reduce image size and simplify updates.
- Practical Implementation: Provides step-by-step instructions using Intel’s open-source resources.
- Modular Design: Supports integration with various front-end interfaces and deployment environments.
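The serving layer of that architecture can be sketched as a small Python microservice. Everything here is illustrative: the endpoint, the environment variable, and the stubbed loader (which a real setup would replace with a runtime such as llama-cpp-python) are assumptions, and the weights are read from a mounted volume rather than baked into the image.

```python
# Hypothetical serving sketch: a FastAPI microservice that loads model
# weights from a path supplied by the environment, keeping the container
# image small and the model files on a mounted volume.
import os

from fastapi import FastAPI
from pydantic import BaseModel

MODEL_PATH = os.environ.get("MODEL_PATH", "/models/llama-2-7b.gguf")

def load_model(path: str):
    # Stand-in loader: a real service would use e.g. Llama(model_path=path).
    return lambda prompt: f"(completion for: {prompt!r})"

app = FastAPI()
llm = load_model(MODEL_PATH)  # loaded once at startup, reused per request

class GenerateRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    return {"completion": llm(req.prompt)}

# Serve with:  uvicorn main:app --host 0.0.0.0 --port 8000
# Mount weights into the container:  docker run -v /srv/models:/models ...
```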
Improve your Tabular Data Ingestion for RAG with Reranking
This article demonstrates how to enhance RAG system accuracy by adding a reranker to select the most relevant context chunks from tabular data, addressing the challenge of context mismatches in retrieval-augmented generation.
Key Insights
- Reranking Solution: Adds a second scoring pass after initial retrieval to prioritize the most relevant context chunks before LLM processing.
- Data Pipeline: Covers the complete workflow from PDF extraction through indexing, with both unified and distributed context collection strategies.
- Practical Implementation: Provides code examples using ChromaDB and demonstrates improved relevance scoring with real billionaire data from Wikipedia.
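A condensed version of that flow, assuming ChromaDB (which the article uses) and a public cross-encoder checkpoint as the reranker; the toy rows and the specific model choice are illustrative, not the article's exact setup.

```python
# Retrieve-then-rerank sketch: ChromaDB supplies candidate chunks, then a
# cross-encoder rescores them against the query before the best ones reach
# the LLM. Rows and model choice are illustrative.
import chromadb
from sentence_transformers import CrossEncoder

client = chromadb.Client()
collection = client.create_collection("table_chunks")
collection.add(
    ids=["r1", "r2", "r3"],
    documents=[
        "Row 1: Alice Doe, net worth 21.4B, source: software",
        "Row 2: Bob Roe, net worth 9.1B, source: retail",
        "Row 3: Carol Poe, net worth 50.2B, source: energy",
    ],
)

query = "Who is the wealthiest person in the table?"

# Pass 1: coarse retrieval by embedding similarity.
candidates = collection.query(query_texts=[query], n_results=3)["documents"][0]

# Pass 2: the cross-encoder reads query and chunk together, catching context
# mismatches that embedding similarity alone can miss.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(ranked[0])  # the chunk that goes into the LLM prompt
```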
LLMs Dance Party: Foundation vs. Fine-Tuned Models
This article uses a creative DJ analogy to explain the differences between foundation models and fine-tuned models, helping developers decide which type of large language model best suits their specific project needs.
Key Insights
- DJ Analogy: Foundation models are like wedding DJs (broad appeal across genres) while fine-tuned models are like specialized DJs (deep expertise in specific genres).
- Decision Factors: Choice depends on three key considerations: scope and versatility, resource efficiency, and customization needs, including privacy requirements.
- Practical Examples: Demonstrates the difference using ChatGPT’s general responses versus a Kubernetes-specialized model’s domain-specific knowledge.
The Case for Human-Centered XAI
This article examines the gap between current explainable AI (XAI) approaches and end-user needs, advocating for human-centered design that prioritizes user comprehension over technical sophistication in AI explanations.
Key Insights
- User Knowledge Gap: Study reveals that current XAI techniques work well for AI-experienced users but fail to provide meaningful explanations to users with low AI knowledge.
- Four Explanation Types: Research tested heatmap-based, example-based, concept-based, and prototype-based explanations using the Merlin bird identification app.
- Human-Centered Approach: Emphasizes the need for XAI techniques that serve end users rather than AI creators, requiring real user studies to validate explanation effectiveness.
What is Explainable AI (XAI) and Why Does It Matter?
This article explores explainable AI (XAI) fundamentals and its role in building trustworthy models, covering responsible AI principles, development practices, and different types of explanations to help users understand AI decision-making.
Key Insights
- Responsible AI Foundation: Four key principles (fairness, transparency, accountability, privacy) explained through a pizza analogy, forming the foundation for trustworthy AI development.
- XAI Types: Three categories of explainability: data explainability (bias detection), model explainability (understanding architecture), and post-hoc explainability (decision reasoning). A post-hoc example is sketched after this list.
- Audience-Tailored Explanations: Emphasizes that explanations should be customized for different audiences - regulatory, development, and end-user - with varying levels of technical detail.
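As one concrete instance of the post-hoc category, the sketch below applies SHAP to a toy classifier to attribute predictions to input features. The dataset and model are placeholders; the article surveys the categories rather than prescribing a particular library.

```python
# Post-hoc explainability sketch: explain an already-trained model's
# predictions with SHAP feature attributions. Data and model are toys.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Feature 0 drives the label, so its attributions should dominate.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:3])
print(shap_values)  # per-feature contributions for the first three samples
```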
How to Apply Transformers to Time Series Models
This article explores how to adapt transformer architectures for time series forecasting, addressing the unique challenges of applying these models to sequential temporal data and introducing solutions like Informer and Spacetimeformer.
Key Insights
- Quadratic Complexity Challenge: Traditional transformers hit computational bottlenecks on time series because attention cost grows quadratically with sequence length.
- Network Modifications: Two critical improvements: learnable positional encoding for temporal patterns (sketched after this list) and ProbSparse attention to reduce computational complexity.
- Practical Solutions: Open-source models like Informer and Spacetimeformer show improved performance over LSTM, especially for long-term predictions with real-world applications.
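Of the two modifications, the learnable positional encoding is easy to show in isolation; below is a minimal PyTorch sketch with illustrative shapes and names (ProbSparse attention is omitted for brevity).

```python
# Learnable positional encoding: position embeddings are trained parameters
# rather than fixed sinusoids, so they can absorb temporal structure such
# as hour-of-day or day-of-week patterns. Shapes are illustrative.
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        # One trainable vector per time step, updated by backpropagation.
        self.pos = nn.Parameter(torch.empty(max_len, d_model))
        nn.init.normal_(self.pos, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the encoding for each position.
        return x + self.pos[: x.size(1)]

x = torch.randn(8, 96, 64)  # a batch of 96-step input windows
enc = LearnablePositionalEncoding(max_len=512, d_model=64)
print(enc(x).shape)  # torch.Size([8, 96, 64])
```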