Building AI tools for real businesses
We build AI tools that solve actual problems. From streamlining operations to boosting innovation, we make AI practical and profitable for you.
We are building Sirji as an open-source framework to create and run custom AI agents for your everyday dev tasks.
A custom agent is a modular AI component designed to execute specific tasks based on predefined pseudocode. Developers articulate their coding style and domain expertise in plain English through pseudocode, enabling Sirji to perform tasks as instructed.
Sirji is implemented as a VS Code extension that provides an interactive chat interface right within your IDE. It leverages the capabilities of VS Code, including the Editor, Terminal, and Project Explorer.
We worked with Blockscout to display complex blockchain transactions in an easily understandable format for non-technical users.
We ran fine-tuning experiments on various models and ended up fine-tuning Llama-2-7b-chat-hf. This work involved collecting and cleaning up training data, maintaining multiple fine-tuning configurations, and evaluating the fine-tuning results such as losses at different epochs, resource utilization, and other key metrics.
Serving the fine-tuned model in production involved working with the Hugging Face Transformers Library and vLLM.
We built an AI-powered vector search (or semantic search) for Jam, a decentralized social app built on the Farcaster protocol.
Each social post is converted into a numeric representation called a vector (an N-dimensional array). This lets us translate semantic similarity, the way humans understand and search, into ‘nearness’ in a multi-dimensional vector space, which algorithms like ANN (approximate nearest neighbor) can search efficiently.
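To make the idea of ‘nearness’ concrete, here is a minimal sketch in plain Python. It is not the Jam implementation: the post names and 3-dimensional vectors are made up, real embeddings have hundreds of dimensions, and this version does an exact brute-force comparison rather than ANN, which a vector database would use to avoid scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_posts(query_vec, post_vecs, k=2):
    """Exact (brute-force) nearest-neighbor search over all stored vectors."""
    scored = [
        (cosine_similarity(query_vec, vec), post_id)
        for post_id, vec in post_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [post_id for _, post_id in scored[:k]]

# Hypothetical toy embeddings, invented for illustration only.
posts = {
    "post_about_coffee": [0.9, 0.1, 0.0],
    "post_about_espresso": [0.8, 0.2, 0.1],
    "post_about_football": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of a search phrase

top = nearest_posts(query, posts, k=2)
print(top)
```

The two coffee-related posts rank above the football post because their vectors point in nearly the same direction as the query vector.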
The work also involved tinkering with embedding models and vector databases such as Weaviate.
While building a natural language AI entry point on top of a GraphQL explorer for Airstack, we had to work with a few hundred test cases: pairs of inputs and expected outputs. We extended OpenAI’s Evals framework to make it work for GraphQL.
But since working with JSONL, executing CLI commands, and keeping track of prompt versions can be daunting, we built a prompt evaluator specifically for product managers. It’s a visual tool with a great UX that acts as a wrapper for Evals. It has a prompt template, experiments, and test cases as the building blocks.
We have open-sourced the prompt evaluator. Here are the frontend and backend repos.
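For readers who have not seen this kind of evaluation, the sketch below shows the general shape of running input and expected-output pairs stored as JSONL. It is an illustration only, not the prompt evaluator's code or the Evals API: the field names `input` and `expected`, the GraphQL queries, and the `fake_model` stand-in are all assumptions.

```python
import json

# Hypothetical test cases, one JSON object per line (JSONL).
# Field names and queries are invented for illustration.
JSONL_CASES = """\
{"input": "latest 5 token transfers", "expected": "query { transfers(limit: 5) { id } }"}
{"input": "owner of a token", "expected": "query { token { owner } }"}
"""

def normalize(query):
    """Collapse whitespace so formatting differences do not fail a match."""
    return " ".join(query.split())

def run_eval(jsonl_text, generate):
    """Return (passed, total) for a generate(prompt) -> str callable."""
    cases = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    passed = sum(
        1
        for case in cases
        if normalize(generate(case["input"])) == normalize(case["expected"])
    )
    return passed, len(cases)

def fake_model(prompt):
    """Stand-in for an LLM call; returns canned answers."""
    canned = {
        "latest 5 token transfers": "query {\n  transfers(limit: 5) { id }\n}",
        "owner of a token": "query { token { holder } }",  # deliberately wrong
    }
    return canned[prompt]

passed, total = run_eval(JSONL_CASES, fake_model)
print(f"{passed}/{total} test cases passed")
```

Comparing normalized strings is the simplest possible check; a stricter evaluator could parse the GraphQL and compare structure instead.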
While working on Thursday, we built a fun game called Doodle Race. How does Doodle Race work? You draw and the AI guesses! Whoever gets the AI to guess correctly the fastest wins!
We used the Quick, Draw! dataset, a collection of 50 million drawings across 345 doodle categories, to train a Convolutional Neural Network (CNN) model. We experimented with both training a CNN from scratch and applying transfer learning to a pretrained model (efficientnet-b0).
We also ran experiments to determine the most effective transfer learning approach. You can find the details and the results here and here.
We worked with PromoteHour, a new-age PR agency, to automate the manual process of discovering relevant journalists and pitching their clients' products.
We built the tooling that uses LLMs to parse tens of thousands of unstructured web pages daily and extract structured data in a cost-effective way. It then creates embeddings, performs vector search, and generates reports that can be acted upon by the next set of agents.
A similar approach of using multiple agents to parse, extract, and act upon the original unstructured data sources was also applied when we built Raised, a web app that brings together the latest funding data from across the web.
Kedar Chandrayan from our team joined Gregory Kamradt as a co-maintainer of an open-source repository called Needle In A Haystack (NIAH).
NIAH evaluates LLMs to see how good they are at recalling information in long context scenarios. It is a benchmarking technique where the model is tasked with identifying specific information (the "needle") within a larger context (the "haystack").
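The core mechanic can be sketched in a few lines of Python. This is a simplified illustration, not code from the NIAH repository: the filler sentences, the needle, the `fake_model` stand-in, and the substring-based scoring are all assumptions made for the example.

```python
def build_haystack(filler_sentences, needle, depth_percent):
    """Insert the needle roughly depth_percent of the way through the filler."""
    position = round(len(filler_sentences) * depth_percent / 100)
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)

def recalled(model_answer, expected_fact):
    """Simplified scoring: did the answer mention the expected fact?"""
    return expected_fact.lower() in model_answer.lower()

def fake_model(context, keyword):
    """Stand-in for an LLM: returns the first sentence containing the keyword."""
    for sentence in context.split(". "):
        if keyword in sentence:
            return sentence
    return ""

# Hypothetical haystack and needle, invented for illustration.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret ingredient is cardamom."

haystack = build_haystack(filler, needle, depth_percent=50)
answer = fake_model(haystack, "secret")
print(recalled(answer, "cardamom"))
```

A full benchmark run repeats this across many context lengths and needle depths, which shows where in the context a model's recall starts to degrade.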
NIAH has gained popularity recently, with Google and Anthropic adopting it for Gemini 1.5 and Claude 3, respectively.
It’s never really about AI though. The best products succeed by hiding the complexities behind simple and intuitive UX.
For example, in Listener we used speech-to-text and time offset values to bookmark important moments of a Zoom call in real time and turn long recordings into bite-sized video clips.
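As a rough sketch of how time offsets can become clips, the Python below picks transcript segments that contain a trigger phrase and turns their offsets into padded, merged clip windows. It is not Listener's implementation: the transcript, the "action item" trigger, and the padding values are all hypothetical.

```python
def clip_ranges(bookmarks, padding_before=10.0, padding_after=20.0, duration=None):
    """Turn bookmark offsets (in seconds) into merged (start, end) clip windows."""
    windows = []
    for offset in sorted(bookmarks):
        start = max(0.0, offset - padding_before)
        end = offset + padding_after
        if duration is not None:
            end = min(end, duration)  # never run past the end of the recording
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window, so extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows

# Hypothetical speech-to-text output: (start offset in seconds, text).
segments = [
    (5.0, "welcome everyone"),
    (62.5, "the action item is to ship on Friday"),
    (70.0, "another action item for the design team"),
    (300.0, "thanks, bye"),
]

# Bookmark every segment that mentions the (assumed) trigger phrase.
bookmarks = [start for start, text in segments if "action item" in text]
clips = clip_ranges(bookmarks, duration=310.0)
print(clips)
```

Merging overlapping windows keeps two nearby bookmarks from producing two near-duplicate clips.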