Day 28 of 100 Days of AI

I shared a post today about a team of agents I built to do research work. More about this here.

This evening I also decided to return to the absolute basics of AI, and have started another online course.

“AI for Everyone” is a non-technical introduction to AI by the prominent AI researcher Andrew Ng. It’s a 4-week self-paced course that I plan to complete in half the time or less.

I plan to pick up some of the very basics of AI, even though in the last few weeks I’ve already got to a place where I can build and train my own models. I especially want to fill in the gaps on key developments. For example, on the course today I came across this chart and learnt something new: traditional AI (mostly supervised machine learning techniques) goes only so far — even though it still offers tons of value. More recent developments, however, have come from neural networks. I’ll have to explore them further in the coming weeks.

Day 27 of 100 Days of AI

I spent a big chunk of today learning all about AI agents. Here’s how researchers describe them:

“As opposed to zero-shot prompting of a large language model where a user types into an open-ended text field and gets a result without additional input, agents allow for more complex interaction and orchestration. In particular, agentic systems have a notion of planning, loops, reflection and other control structures that heavily leverage the model’s inherent reasoning capabilities to accomplish a task end-to-end. Paired with the ability to use tools, plugins, and function calling, agents are empowered to do more general-purpose work.”

From: The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
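
To make that description concrete, here’s a minimal sketch of what such an agent loop could look like in Python. It’s purely illustrative: `call_llm` is a placeholder for a real model API, and the single tool in `TOOLS` is made up.

```python
import json

# Hypothetical tool registry the agent is allowed to use.
TOOLS = {
    "search_web": lambda query: f"(pretend search results for: {query})",
}

def call_llm(messages):
    """Placeholder for a real LLM API call. Returns the model's next reply as a JSON string."""
    raise NotImplementedError("Wire this up to your model provider of choice.")

def run_agent(task, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                      # the loop / planning control structure
        reply = call_llm(messages)
        action = json.loads(reply)                  # assume the model replies with structured JSON
        if action["type"] == "final_answer":
            return action["content"]
        if action["type"] == "tool_call":           # tool use / function calling
            result = TOOLS[action["tool"]](action["input"])
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped: step limit reached."
```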

Agents have a lot of potential to automate all kinds of knowledge work tasks and I’m excited to build a few things with them. I’ll be sharing my first experiment with them tomorrow.

Day 26 of 100 Days of AI

I was finally able to train a classification model using Google’s Vertex AI. When I do this locally on my machine with the same training data, it trains in a few seconds. On Vertex AI it took 1 hour 46 minutes!

You can see the training budget I set below in node hours and the training performance.

I don’t fully understand why the training took so long in the cloud but I think that since I chose AutoML — a configuration for non-machine learning experts — Google’s platform executed a bunch of automated machine learning methods which take up a big chunk of processing time. These include feature engineering, testing various model architectures, and hyperparameter tuning.

It’s possible to train a model on Vertex AI without using AutoML but that needs significantly more configuration. For small data sets I’m likely better off just writing and running the training code locally. Still, what Vertex AI offers is massively helpful for people building software for public or private business consumption.
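
For reference, kicking off an AutoML tabular training job with the Vertex AI Python SDK (google-cloud-aiplatform) looks roughly like this. The project, bucket, and column names are placeholders rather than my actual setup.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Create a tabular dataset from a CSV uploaded to Cloud Storage.
dataset = aiplatform.TabularDataset.create(
    display_name="heart-attack-data",
    gcs_source="gs://my-bucket/heart.csv",
)

# AutoML handles feature engineering, architecture search and hyperparameter tuning.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="heart-attack-classifier",
    optimization_prediction_type="classification",
)

model = job.run(
    dataset=dataset,
    target_column="output",          # label column in the Kaggle dataset
    budget_milli_node_hours=1000,    # 1 node hour of training budget
)
```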

Here is one of the charts Google automatically creates from the classification model. I used heart attack data from Kaggle, but since Vertex AI requires at least 1,000 data points, I created fictional data by duplicating several rows. This means that my model isn’t of much use, but it was a good learning exercise. (Note: in this fictional dataset, caa — the number of major vessels that are narrow or blocked on a fluoroscopy — is the most important feature in predicting a heart attack.)

Once the model was trained, I deployed it to an endpoint (accessible via an API) and below I tested it with a few inputs for a prediction.
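
Continuing the sketch above, deployment and a test prediction look roughly like this. The feature names come from the Kaggle heart attack dataset, the values are made up, and (as far as I can tell) AutoML tabular endpoints expect feature values as strings.

```python
# `model` is the trained AutoML model from the previous sketch.
endpoint = model.deploy(machine_type="n1-standard-2")

prediction = endpoint.predict(instances=[{
    "age": "63", "sex": "1", "cp": "3", "trtbps": "145", "chol": "233",
    "fbs": "1", "restecg": "0", "thalachh": "150", "exng": "0",
    "oldpeak": "2.3", "slp": "0", "caa": "0", "thall": "1",
}])
print(prediction.predictions)
```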

Key Takeaways:

  • Building and deploying models in the cloud is now easier than ever, thanks to tools like Vertex AI. However, for small datasets and experiments, staying local on a machine is probably best.

Day 25 of 100 Days of AI

I spent today catching up on AI news, and reading about Meta’s new open-weights model Llama 3. Note the use of the word “open-weights” rather than “open-source”, since Llama 3 has some restrictions on use.

This is an impressive release by Meta. They trained the model on 24,000 GPUs using 15 trillion tokens of data (GPT-3 was trained on around 300 billion) and some of the evaluation metrics look strong.

Tomorrow I’ll do some research on how to run open-weight models either locally or in the cloud.

Day 24 of 100 Days of AI

Today I decided to try out Google’s Vertex AI — a machine learning platform that makes it easier to train and deploy ML models.

Google Vertex AI

In the last few weeks I’ve been writing code in Python to train models, but Vertex AI provides a low-code platform and a set of tools that simplify the deployment of ML models at scale.

Despite the promise of ease of use, I struggled to get Vertex AI to work on a small dataset (1k rows of data). Both training runs I tried ran for between 5 and 12 minutes before throwing errors I couldn’t figure out.

Meanwhile, I was able to get a simple logistic regression model written in Python and trained locally in seconds. Here’s the performance of the model on fictional heart attack data.
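
In case it’s useful, the local version is only a few lines with scikit-learn. This is a sketch: the file name and the ‘output’ label column are taken from the Kaggle heart attack dataset, and it assumes the data is numeric and free of missing values.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")              # Kaggle heart attack dataset
X = df.drop(columns=["output"])            # features
y = df["output"]                           # label: 1 = heart attack, 0 = no heart attack

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                # trains in well under a second on ~1k rows

print(accuracy_score(y_test, model.predict(X_test)))
```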

Doing the same thing on Vertex AI should make it quicker and easier to deploy the model into an app. But it looks like I still have lots to figure out about the platform.

Day 23 of 100 Days of AI

SVM, Precision & Recall.

I completed the final honors exercise today. Here’s a confusion matrix for my support vector machine model. The chart is read along its two diagonals: first from top left to bottom right, then from top right to bottom left. We start with the top-left-to-bottom-right diagonal:

  • Top Left (True Positives): the model correctly predicted rain in 98 cases.
  • Bottom Right (True Negatives): the model correctly predicted that it would not rain in 445 cases.

Then we read the chart going from top right to bottom left:

  • Top Right (False Negatives): the model incorrectly predicted ‘No Rain’ in 84 cases where it actually rained.
  • Bottom Left (False Positives): the model incorrectly predicted ‘Rain’ in 28 cases where it did not actually rain.

Key takeaways:

  • Useful metrics that come from this chart include the following (see the short sketch after this list for how they’re calculated):
    • Precision: of all the cases the model predicted as ‘Rain’, how many were actually rain? e.g. in the example above:
      • Precision for ‘Rain’ predictions: the model predicted rain in 126 cases (98 + 28), but only 98 of those were actual rain cases in the test data. This gives a precision of 77.8%.
    • Sensitivity (or Recall): of all the cases where it actually rained, what proportion did the model correctly predict? e.g. in the example above:
      • Sensitivity for ‘Rain’: the model correctly predicted 98 of the 182 (98 + 84) actual rain occurrences. This gives a sensitivity of 53.8%.
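
Here’s a short sketch showing how these numbers can be reproduced with scikit-learn, using the counts from the chart above to build stand-in prediction arrays (1 = ‘Rain’, 0 = ‘No Rain’).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Rebuild stand-in arrays from the counts above: TP=98, FN=84, FP=28, TN=445.
y_true = np.array([1] * (98 + 84) + [0] * (28 + 445))            # actual outcomes
y_pred = np.array([1] * 98 + [0] * 84 + [1] * 28 + [0] * 445)    # model predictions

print(confusion_matrix(y_true, y_pred, labels=[1, 0]))   # [[98, 84], [28, 445]]
print(precision_score(y_true, y_pred))                   # 98 / (98 + 28) ≈ 0.778
print(recall_score(y_true, y_pred))                      # 98 / (98 + 84) ≈ 0.538
```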

Update on where I am with the honors labs:

  1. Linear Regression – completed.
  2. KNN – completed
  3. Decision Trees – completed
  4. Logistic Regression – completed
  5. SVM – completed

Day 22 of 100 Days of AI

I’m almost done with the honors assignments, and today I went through a logistic regression model exercise. The code is on GitHub.

I used ChatGPT to help me write the code for the visualizations. Here’s one I found particularly helpful. It shows the 10 features whose coefficients have the biggest impact on the model. Overall, this model also had over 80% accuracy in predicting rain for the next day.
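
A chart like that can be produced with a few lines of matplotlib. This is a sketch only: `model` is assumed to be a fitted scikit-learn LogisticRegression and `feature_names` the list of columns it was trained on, both of which are placeholders here.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes `model` is a fitted sklearn LogisticRegression and `feature_names`
# lists the training columns (both hypothetical placeholders).
coefs = model.coef_[0]
top10 = np.argsort(np.abs(coefs))[-10:]          # indices of the 10 largest |coefficients|

plt.barh(np.array(feature_names)[top10], coefs[top10])
plt.xlabel("Coefficient value")
plt.title("Top 10 most influential features")
plt.tight_layout()
plt.show()
```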

Update on where I am with the honors labs:

  1. Linear Regression – completed.
  2. KNN – completed
  3. Decision Trees – completed
  4. Logistic Regression – completed
  5. SVM – to do

Day 21 of 100 Days of AI

Today I trained a decision tree model on weather data from the honors lab I’m completing.

The model achieves 83% accuracy in predicting whether or not it will rain, using historical data. The chart below shows the decision tree steps.

More performance metrics can be viewed below.

I’ve put the code on GitHub.
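
For reference, here’s a rough sketch of how a tree like that can be trained and drawn with scikit-learn. The file name, the ‘RainTomorrow’ label column, and the assumption that the features are already numeric are placeholders for the lab’s actual preprocessing.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, plot_tree

df = pd.read_csv("weather.csv")                      # placeholder for the lab's weather data
X = df.drop(columns=["RainTomorrow"])
y = df["RainTomorrow"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(max_depth=4, random_state=42)  # a shallow tree keeps the chart readable
tree.fit(X_train, y_train)
print(accuracy_score(y_test, tree.predict(X_test)))

plot_tree(tree, feature_names=list(X.columns), class_names=["No Rain", "Rain"], filled=True)
plt.show()
```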

A quick update on the honors assignments I’m completing:

  1. Linear Regression – completed.
  2. KNN – completed
  3. Decision Trees – to do
  4. Logistic Regression – to do
  5. SVM – to do

Day 20 of 100 Days of AI: Too Much Hype, Taming the Excitement and Getting Real

Today, I caught up on some AI news.

There are lots of cool AI demos floating around on the net. Some look super impressive, but they fall apart when you dig into actual use cases.

In December last year, we learned that Google’s Gemini demo had been edited to make the model look more impressive than it actually is. There’s no doubt that these models are incredible, but the demos are often designed to stir up more hype than is warranted.

Recently, there’s been talk of tools like Devin replacing software engineers. But experts in the field are more sceptical. Here’s a great breakdown video that tames the hype. (The Reddit discussion is good, too.)

And what of large language models in general? People are starting to see limits. Here’s a good perspective from one person who’s used LLMs on half a billion tokens in an enterprise use case.

Overall it’s great to see all the cool things generative AI can do. But we need to tame expectations! These tools have limits.

Day 19 of 100 Days of AI

Today I completed the KNN lab. The work is ungraded and I haven’t yet checked whether my results are in line with other students’, but I’ll dig into this once I have all the models. Here are the results for the KNN model I built today. The code is on GitHub.
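
As a rough sketch of what the lab involves: KNN is distance-based, so scaling the features matters, and looping over a few values of k is a simple way to pick one. `X_train`, `X_test`, `y_train`, and `y_test` are assumed to be prepared as in the earlier labs.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumes X_train, X_test, y_train, y_test already exist (as in the earlier labs).
for k in range(1, 11):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    print(f"k={k}: accuracy = {acc:.3f}")
```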

  1. Linear Regression – completed.
  2. KNN – completed
  3. Decision Trees – to do
  4. Logistic Regression – to do
  5. SVM – to do