<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Luke Salamone&#39;s Blog</title>
    <link>https://blog.lukesalamone.com/</link>
    <description>Recent content on Luke Salamone&#39;s Blog</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 11 Mar 2026 20:14:56 -0700</lastBuildDate>
    <atom:link href="https://blog.lukesalamone.com/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Autoresearch</title>
      <link>https://blog.lukesalamone.com/posts/autoresearch/</link>
      <pubDate>Wed, 11 Mar 2026 20:14:56 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/autoresearch/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/autoresearch_progress.png&#34;&#xA;    alt=&#34;Leaving the autoresearch loop going, the LLM was able to make 7.8% progress on the distillation task.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;Leaving the autoresearch loop going, the LLM was able to make 7.8% progress on the distillation task.&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;I saw Andrej Karpathy&amp;rsquo;s &lt;a href=&#34;https://github.com/karpathy/autoresearch&#34;&gt;Autoresearch&lt;/a&gt; results the other day and decided to give it a shot on a relatively difficult task: distilling a retrieval/reranking model into a much smaller 10MB model. The model was trained on MS MARCO v1.1 and SQuAD (~169K pairs total) and evaluated on MRR@10.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Distilling Stockfish with One Billion Positions</title>
      <link>https://blog.lukesalamone.com/posts/distilling-stockfish/</link>
      <pubDate>Fri, 06 Mar 2026 20:25:56 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/distilling-stockfish/</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR: I extracted FENs and Stockfish evaluations for 3.9 billion chess positions. I then trained a neural network on 1 billion of them. To my knowledge, this is the largest open state-value chess dataset released. It is available as the &lt;a href=&#34;https://huggingface.co/datasets/lukesalamone/gigafish-3.8b-d10&#34;&gt;Gigafish dataset on Huggingface&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;figure&gt;&lt;img src=&#34;./img/distill_stockfish_loss_curve.png&#34;&#xA;    alt=&#34;Loss continued to decrease during the entire training run, strongly suggesting the importance of data volume.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;Loss continued to decrease during the entire training run, strongly suggesting the importance of data volume.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Graph Topology and Battle Royale Mechanics</title>
      <link>https://blog.lukesalamone.com/posts/beam-search-graph-pruning/</link>
      <pubDate>Thu, 19 Feb 2026 20:27:24 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/beam-search-graph-pruning/</guid>
      <description>&lt;link rel=&#34;stylesheet&#34; href=&#34;./css/graph-pruning-demo.css&#34; /&gt;&#xA;&lt;style&gt;&#xA;  #mapContainer1, #mapContainer2 {&#xA;    color: #000;&#xA;  }&#xA;  #mapContainer1 button, #mapContainer2 button {&#xA;    padding: 10px 14px;&#xA;    border: none;&#xA;    border-radius: 8px;&#xA;    background: #3a2c1a;&#xA;    color: #fef7e5;&#xA;    cursor: pointer;&#xA;  }&#xA;  .prune-textarea {&#xA;    min-width: 280px;&#xA;    width: min(520px, 100%);&#xA;    min-height: 160px;&#xA;    font-family: inherit;&#xA;    resize: vertical;&#xA;  }&#xA;  .helper {&#xA;    font-size: 12px;&#xA;    color: #6b5a3a;&#xA;  }&#xA;  .message {&#xA;    font-size: 12px;&#xA;    color: #3a2c1a;&#xA;  }&#xA;&lt;/style&gt;&#xA;&lt;p&gt;The other day I found &lt;a href=&#34;https://allenpike.com/2022/how-to-close-a-city/&#34;&gt;Allen Pike&amp;rsquo;s blog post&lt;/a&gt; from a few years ago which describes his iterative process for determining the order in which cities should be closed in the game Two Spies. Ultimately, finding a formal solution to city closing wasn&amp;rsquo;t necessary for the game, but it&amp;rsquo;s worth giving it a shot anyway since pruning by hand isn&amp;rsquo;t always convenient.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Optimal Ask</title>
      <link>https://blog.lukesalamone.com/posts/optimal-ask/</link>
      <pubDate>Tue, 16 Sep 2025 18:02:23 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/optimal-ask/</guid>
      <description>&lt;p&gt;Let&amp;rsquo;s say that you are selling N widgets and you need to determine a price for your widgets. There are N customers, each of whom will buy at most one widget if your price is lower than the maximum price they are willing to pay.&lt;/p&gt;&#xA;&lt;p&gt;The maximum price that people will pay is normally distributed around $100, with a standard deviation of $5. In other words, about 34% have a max price between $95 and $100, another 34% have a max price between $100 and $105, and the rest have max prices above or below those ranges.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Keep Summer Safe</title>
      <link>https://blog.lukesalamone.com/posts/keep-summer-safe/</link>
      <pubDate>Thu, 20 Mar 2025 17:26:32 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/keep-summer-safe/</guid>
      <description>&lt;p&gt;I recently built a small multi-agent simulation inspired by &lt;a href=&#34;https://www.youtube.com/watch?v=4tpYFen3fJM&#34;&gt;&lt;em&gt;Rick and Morty&lt;/em&gt;&lt;/a&gt;. The setup is simple:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The &lt;strong&gt;car&lt;/strong&gt; must neutralize threats.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Summer&lt;/strong&gt; imposes constraints on the car’s behavior.&lt;/li&gt;&#xA;&lt;li&gt;The &lt;strong&gt;world&lt;/strong&gt; generates escalating threats.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;The car has one standing directive:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Keep Summer safe.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;However, Summer adds an additional constraint:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Do not move from the parking lot.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;The core loop looks like this:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;constraints = [&#xA;  &amp;quot;keep summer safe&amp;quot;,&#xA;  &amp;quot;Do not move from the parking lot&amp;quot;&#xA;]&#xA;prior_actions = []&#xA;&#xA;while True:&#xA;  threat = world.generate_threat()&#xA;  action = car.take_action(threat, constraints)&#xA;  prior_actions.append(action)&#xA;  constraint = summer.generate_constraint(prior_actions)&#xA;  constraints.append(constraint)&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;However, I quickly found out that simply stuffing more constraints into the prompt was insufficient. The model oftentimes simply forgot or ignored constraints.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Notes on Deepseek R1</title>
      <link>https://blog.lukesalamone.com/posts/notes-on-deepseek-r1/</link>
      <pubDate>Tue, 28 Jan 2025 08:35:55 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/notes-on-deepseek-r1/</guid>
      <description>&lt;p&gt;DeepSeek R1 is a large language model which employs test-time compute to generate a response. Unlike many decoder-based models in the past which simply continue the given text (and may be fine-tuned for conversation), R1 generates reasoning tokens before the final answer is given. According to the researchers, its performance is on par with OpenAI&amp;rsquo;s O1 model.&lt;/p&gt;&#xA;&lt;h2 id=&#34;terminology&#34;&gt;Terminology&lt;/h2&gt;&#xA;&lt;p&gt;First, I will briefly describe some terminology related to training techniques:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&#xA;&lt;p&gt;&lt;strong&gt;Supervised fine-tuning (SFT)&lt;/strong&gt; is a process which uses input/output pairs to directly fine-tune a model. In a reinforcement learning setting, SFT can help to mitigate cold start issues by providing initial policy behavior prior to RL training. The downside of SFT is that the input/output pairs can be expensive to acquire.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Space Is Really Big</title>
      <link>https://blog.lukesalamone.com/posts/space-is-really-big/</link>
      <pubDate>Mon, 29 Jul 2024 12:16:49 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/space-is-really-big/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/earth_moon.png&#34;&#xA;    alt=&#34;More than 30 earths could fit between the earth and the moon.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;More than 30 earths could fit between the earth and the moon.&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;Our &lt;a href=&#34;https://science.nasa.gov/wp-content/uploads/2023/07/pia06890-our-solar-system-banner-1920x640-1.jpg&#34;&gt;elementary school models&lt;/a&gt; of the solar system really undersell how big space is. The problem is, space is too big and human brains are bad at exponentials. Logarithmic charts like &lt;a href=&#34;https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Interstellar_medium_annotated.jpg/1920px-Interstellar_medium_annotated.jpg&#34;&gt;this one&lt;/a&gt; are technically accurate, but my brain has a hard time contextualizing it.&lt;/p&gt;&#xA;&lt;p&gt;To get an idea of how big space really is, let&amp;rsquo;s imagine that the earth is one millimeter wide. At that scale:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Custom PyTorch Collate Function</title>
      <link>https://blog.lukesalamone.com/posts/custom-pytorch-collate/</link>
      <pubDate>Fri, 12 Jul 2024 14:48:27 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/custom-pytorch-collate/</guid>
      <description>&lt;p&gt;If your &lt;code&gt;Dataset&lt;/code&gt; class looks something like&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;class MyDataset(Dataset):&#xA;  &#xA;  # ... boilerplate ...&#xA;&#xA;  def __getitem__(self, idx):&#xA;    item = self.data[idx]&#xA;    return item[&#39;anchor&#39;], item[&#39;positive&#39;], item[&#39;negative&#39;]&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;your collate function should be&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def collate_fn(data):&#xA;    anchors, pos, neg = zip(*data)&#xA;    anchors = tokenizer(anchors, return_tensors=&amp;quot;pt&amp;quot;, padding=True)&#xA;    pos = tokenizer(pos, return_tensors=&amp;quot;pt&amp;quot;, padding=True)&#xA;    neg = tokenizer(neg, return_tensors=&amp;quot;pt&amp;quot;, padding=True)&#xA;    return anchors, pos, neg&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;and you can use it like&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;dataset = MyDataset()&#xA;dataloader = DataLoader(dataset, &#xA;                        batch_size=4, &#xA;                        shuffle=True,&#xA;                        pin_memory=True,&#xA;                        collate_fn=collate_fn)&#xA;&#xA;for anchors, positives, negatives in dataloader:&#xA;  anchors = anchors.to(device)&#xA;  positives = positives.to(device)&#xA;  negatives = negatives.to(device)&#xA;  &#xA;  # do more stuff&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;h2 id=&#34;how-does-the-collate_fn-work&#34;&gt;How does the collate_fn work?&lt;/h2&gt;&#xA;&lt;figure&gt;&lt;img src=&#34;./img/torch_collate_fn.png&#34;&#xA;    alt=&#34;The PyTorch collate function accepts a list of results from calls to the dataset getitem function and combines their components into tensors for convenient training.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;The PyTorch collate function accepts a list of results from calls to the dataset &lt;strong&gt;getitem&lt;/strong&gt; 
function and combines their components into tensors for convenient training.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Very Large Datasets in PyTorch</title>
      <link>https://blog.lukesalamone.com/posts/very-large-datasets/</link>
      <pubDate>Thu, 27 Jun 2024 18:40:06 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/very-large-datasets/</guid>
      <description>&lt;blockquote&gt;&#xA;&lt;p&gt;In God we trust. All others must bring data. ~ W. Edwards Deming&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;h2 id=&#34;datasets-that-fit-in-memory&#34;&gt;Datasets that fit in memory&lt;/h2&gt;&#xA;&lt;p&gt;For simple machine learning problems, your PyTorch dataset class probably looks something like this:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;class SimpleDataset(Dataset):&#xA;    def __init__(self, features, targets):&#xA;        self.features = []&#xA;        for feature in features:&#xA;            self.features.append(self._feature_transform(feature))&#xA;        self.targets = targets&#xA;&#xA;    def _feature_transform(self, feature):&#xA;        # Optional feature transformation function which &#xA;        # converts each feature into its input representation &#xA;        # for the model. This might be an expensive operation, &#xA;        # so it&#39;s best to do it now rather than during training.&#xA;        return some_transformation_fn(feature)&#xA;&#xA;    def __len__(self):&#xA;        return len(self.features)&#xA;&#xA;    def __getitem__(self, idx):&#xA;        return self.features[idx], self.targets[idx]&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;With this method, we basically load all of the data into RAM at once, which is perfectly fine for small datasets. But sooner or later you&amp;rsquo;re going to run into a machine learning problem with a large dataset. What do I mean by this? I mean a dataset which can&amp;rsquo;t easily fit into RAM/VRAM.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Create Rust Python Bindings</title>
      <link>https://blog.lukesalamone.com/posts/how-to-create-rust-python-bindings/</link>
      <pubDate>Tue, 18 Jun 2024 16:21:35 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/how-to-create-rust-python-bindings/</guid>
      <description>&lt;p&gt;Rust is super fast. Python is super flexible. Porting slow Python code to Rust can make your life a lot easier, and it&amp;rsquo;s not too difficult to set up.&lt;/p&gt;&#xA;&lt;p&gt;I will demonstrate Rust bindings for summing the integers in a large text file containing a billion digits that looks like&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;6,9,8,3,0,1,8,4,9,7,6,3,4,2,6,0,0,5,1,1, . . . ,4,5,9,3,3,2,8,3&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;h2 id=&#34;general-steps&#34;&gt;General steps&lt;/h2&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;a href=&#34;./posts/how-to-create-rust-python-bindings/#install-rust-and-maturin&#34;&gt;install rust and maturin&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;./posts/how-to-create-rust-python-bindings/#set-up-boilerplate&#34;&gt;set up boilerplate&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;./posts/how-to-create-rust-python-bindings/#add-your-function&#34;&gt;add your function&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;./posts/how-to-create-rust-python-bindings/#compile-and-import&#34;&gt;compile and import&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;h2 id=&#34;install-rust-and-maturin&#34;&gt;Install Rust and Maturin&lt;/h2&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;curl --proto &#39;=https&#39; --tlsv1.2 -sSf https://sh.rustup.rs | sh&#xA;pip install maturin&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;See also &lt;a href=&#34;https://www.rust-lang.org/tools/install&#34;&gt;install rust&lt;/a&gt; and &lt;a href=&#34;https://www.maturin.rs/installation&#34;&gt;install maturin&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Most Ramified Chess Position of 2023</title>
      <link>https://blog.lukesalamone.com/posts/most-ramified-chess-position-2023/</link>
      <pubDate>Thu, 13 Jun 2024 19:38:39 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/most-ramified-chess-position-2023/</guid>
      <description>&lt;script src=&#34;https://cdn.jsdelivr.net/npm/chart.js@4.4.3/dist/chart.umd.min.js&#34;&gt;&lt;/script&gt;&#xA;&lt;p&gt;&lt;strong&gt;I spent some time considering words to describe a chess position with many legal moves. &amp;ldquo;Complex&amp;rdquo; doesn&amp;rsquo;t quite capture the situation since we would usually describe a complex position as one with many tactical interactions. &lt;a href=&#34;https://en.wikipedia.org/wiki/Ramification_%28botany%29&#34;&gt;Ramified&lt;/a&gt; seems to make the most sense, as it describes &amp;ldquo;branching out&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;The opening position in chess has 20 legal moves. From there, the number of legal moves in a position tends to increase as pieces move towards the center of the board, before decreasing as the number of pieces on the board drops in the endgame.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What are Sparse Autoencoders?</title>
      <link>https://blog.lukesalamone.com/posts/sparse-autoencoder/</link>
      <pubDate>Thu, 06 Jun 2024 16:30:27 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/sparse-autoencoder/</guid>
      <description>&lt;script type=&#34;text/javascript&#34; async src=&#34;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML&#34;&gt;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;MathJax.Hub.Config({&#xA;  tex2jax: {&#xA;    inlineMath: [[&#39;$&#39;,&#39;$&#39;], [&#39;\\(&#39;,&#39;\\)&#39;]],&#xA;    displayMath: [[&#39;$$&#39;,&#39;$$&#39;], [&#39;\[&#39;,&#39;\]&#39;]],&#xA;    processEscapes: true,&#xA;    processEnvironments: true,&#xA;    skipTags: [&#39;script&#39;, &#39;noscript&#39;, &#39;style&#39;, &#39;textarea&#39;, &#39;pre&#39;],&#xA;    TeX: {&#xA;      equationNumbers: {&#xA;        autoNumber: &#34;AMS&#34;&#xA;      },&#xA;      extensions: [&#34;AMSmath.js&#34;, &#34;AMSsymbols.js&#34;]&#xA;    }&#xA;  }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;  MathJax.Hub.Queue(function() {&#xA;    // Fix &lt;code&gt; tags after MathJax finishes running. This is a&#xA;    // hack to overcome a shortcoming of Markdown. Discussion at&#xA;    // https://github.com/mojombo/jekyll/issues/199&#xA;    var all = MathJax.Hub.getAllJax(), i;&#xA;    for(i = 0; i &lt; all.length; i += 1) {&#xA;        all[i].SourceElement().parentNode.className += &#39; has-jax&#39;;&#xA;    }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;p&gt;&lt;strong&gt;TLDR: A sparse autoencoder is just a regular autoencoder that encourages sparsity with an L1 penalty or KL divergence loss rather than using a low-dimensional bottleneck.&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>How does HNSW work?</title>
      <link>https://blog.lukesalamone.com/posts/how-does-hnsw-work/</link>
      <pubDate>Mon, 20 May 2024 13:38:01 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/how-does-hnsw-work/</guid>
      <description>&lt;p&gt;Suppose we have a vector database with a billion items in it (the &lt;em&gt;haystack&lt;/em&gt;). And suppose we are looking for K vectors, the &lt;em&gt;needles&lt;/em&gt; which maximize some similarity function. (In the case of cosine similarity or euclidean distance, we may be maximizing &lt;code&gt;1-distance(x,y)&lt;/code&gt;.) And also suppose that we&amp;rsquo;d like to do this quickly.&lt;/p&gt;&#xA;&lt;h2 id=&#34;naive-and-semi-naive-approaches&#34;&gt;Naive and semi-naive approaches&lt;/h2&gt;&#xA;&lt;p&gt;One approach might be to compare every vector and take the argmax. In that case, for vectors of length D our runtime will be &lt;code&gt;1 billion x D&lt;/code&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning the Haystack</title>
      <link>https://blog.lukesalamone.com/posts/learning-the-haystack/</link>
      <pubDate>Wed, 27 Mar 2024 18:19:54 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/learning-the-haystack/</guid>
      <description>&lt;script type=&#34;text/javascript&#34; async src=&#34;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML&#34;&gt;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;MathJax.Hub.Config({&#xA;  tex2jax: {&#xA;    inlineMath: [[&#39;$&#39;,&#39;$&#39;], [&#39;\\(&#39;,&#39;\\)&#39;]],&#xA;    displayMath: [[&#39;$$&#39;,&#39;$$&#39;], [&#39;\[&#39;,&#39;\]&#39;]],&#xA;    processEscapes: true,&#xA;    processEnvironments: true,&#xA;    skipTags: [&#39;script&#39;, &#39;noscript&#39;, &#39;style&#39;, &#39;textarea&#39;, &#39;pre&#39;],&#xA;    TeX: {&#xA;      equationNumbers: {&#xA;        autoNumber: &#34;AMS&#34;&#xA;      },&#xA;      extensions: [&#34;AMSmath.js&#34;, &#34;AMSsymbols.js&#34;]&#xA;    }&#xA;  }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;  MathJax.Hub.Queue(function() {&#xA;    // Fix &lt;code&gt; tags after MathJax finishes running. This is a&#xA;    // hack to overcome a shortcoming of Markdown. Discussion at&#xA;    // https://github.com/mojombo/jekyll/issues/199&#xA;    var all = MathJax.Hub.getAllJax(), i;&#xA;    for(i = 0; i &lt; all.length; i += 1) {&#xA;        all[i].SourceElement().parentNode.className += &#39; has-jax&#39;;&#xA;    }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;p&gt;Embeddings, or vector representations of a document (which could be a piece of text, image, sound, etc.), can be extremely useful for making sense of large datasets. They transform information into a vector space such that their distance corresponds to their similarity.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Vectorized K-Means Clustering</title>
      <link>https://blog.lukesalamone.com/posts/vectorized-kmeans/</link>
      <pubDate>Sun, 04 Feb 2024 23:39:29 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/vectorized-kmeans/</guid>
      <description>&lt;p&gt;K-means clustering (&lt;a href=&#34;./posts/kmeans-clustering/&#34;&gt;previous discussion&lt;/a&gt;) is an unsupervised learning algorithm which assigns points to one of K different clusters based on the distance of that point to a centroid. The points may represent physical locations, or embeddings in high-dimensional vector space.&lt;/p&gt;&#xA;&lt;p&gt;🌟Check out the demo (in two dimensions) below. Centroids are colored white.🌟&lt;/p&gt;&#xA;&lt;script src=&#34;https://cdn.jsdelivr.net/npm/chart.js@4.4.3/dist/chart.umd.min.js&#34;&gt;&lt;/script&gt;&#xA;&lt;script src=&#34;./js/kmeans_demo.js&#34;&gt;&lt;/script&gt;&#xA;&lt;div id=&#39;demo&#39;&gt;&#xA;&lt;button style=&#34;border:1px solid #09f&#34;&gt;start&lt;/button&gt;&#xA;&lt;canvas id=&#34;myChart&#34; style=&#34;background-color: #0000&#34; width=&#34;500px&#34; height=&#34;500px&#34;&gt;&lt;/canvas&gt;&#xA;&lt;/div&gt;&#xA;&lt;p&gt;Note that the points are changing color only, not moving.&lt;/p&gt;&#xA;&lt;h2 id=&#34;general-algorithm&#34;&gt;General algorithm&lt;/h2&gt;&#xA;&lt;p&gt;The basic K-means algorithm is fairly simple and has two steps, repeated until convergence (i.e. when no points change cluster):&lt;/p&gt;</description>
    </item>
    <item>
      <title>Summary: Deep &amp; Cross Net v2</title>
      <link>https://blog.lukesalamone.com/posts/deep-cross-net-v2/</link>
      <pubDate>Mon, 02 Oct 2023 12:39:18 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/deep-cross-net-v2/</guid>
      <description>&lt;p&gt;Paper link: &lt;a href=&#34;https://arxiv.org/pdf/2008.13535&#34;&gt;https://arxiv.org/pdf/2008.13535&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;Learning to rank is an important problem in many machine-learning products such as search, recommendation, and advertising. Originally, many machine learning systems used simple logistic regression models, but it quickly became apparent that combining two or more features was &lt;a href=&#34;https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf&#34;&gt;even better&lt;/a&gt;. This is called feature crossing.&lt;/p&gt;&#xA;&lt;p&gt;A lot of research and engineering work has gone into learning useful feature crosses. The fundamental problem is that although higher-order feature crosses can be more informative, they are also sparser, and the number of high-order features grows exponentially. Some attempts to address this have been:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is a blunder in chess?</title>
      <link>https://blog.lukesalamone.com/posts/chess-blunders/</link>
      <pubDate>Mon, 25 Sep 2023 20:47:30 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/chess-blunders/</guid>
      <description>&lt;p&gt;What is a blunder in chess? The tension between the qualitative and quantitative answers to this question is at the heart of different approaches towards chess, and more broadly, how quantitative metrics may lack context, but qualitative metrics lack precision.&lt;/p&gt;&#xA;&lt;h2 id=&#34;qualitative-answer&#34;&gt;Qualitative answer&lt;/h2&gt;&#xA;&lt;p&gt;There are many qualitative answers to this question, especially when comparing &amp;ldquo;blunders&amp;rdquo; and &amp;ldquo;mistakes&amp;rdquo;:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&amp;ldquo;a move that negatively affects their position in a significant way&amp;rdquo; ~ &lt;a href=&#34;https://www.chess.com/terms/chess-blunder&#34;&gt;chess.com&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&amp;ldquo;severely worsens the player&amp;rsquo;s situation by allowing a loss of material, checkmate, or anything similar&amp;rdquo; ~ &lt;a href=&#34;https://en.wikipedia.org/wiki/Blunder_%28chess%29&#34;&gt;Wikipedia&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&amp;ldquo;Blunders tend to be immediately refutable, while mistakes require planning to capitalize on.&amp;rdquo; ~ &lt;a href=&#34;https://www.reddit.com/r/chess/comments/1iiqyb/what_distinguishes_the_difference_between_a/&#34;&gt;r/chess&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;An issue with these qualitative answers is that while their words may be correct, smart people may still disagree with their applicability at the margins. For a suboptimal move to have a &amp;ldquo;significant&amp;rdquo; negative effect, it requires that the opponent notices and takes advantage of it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A 3D Game of Life</title>
      <link>https://blog.lukesalamone.com/posts/game-of-life-3d/</link>
      <pubDate>Wed, 23 Aug 2023 23:34:38 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/game-of-life-3d/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life&#34;&gt;Conway&amp;rsquo;s Game of Life&lt;/a&gt; is a simulation developed in 1970 describing a grid of binary cells and transition rules for each cell which depend on the state of the cell&amp;rsquo;s neighbors. It&amp;rsquo;s capable of creating some pretty cool patterns.&lt;/p&gt;&#xA;&lt;p&gt;This variant of the Game of Life uses three overlapping channels, so instead of just one simulation, there are three simultaneous simulations. I visualize these in the three color channels, red, green and blue. Two or more channels active on the same cell are represented with &lt;a href=&#34;https://en.wikipedia.org/wiki/Additive_color&#34;&gt;additive color mixing&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Can ChatGPT Recognize Handwritten Digits?</title>
      <link>https://blog.lukesalamone.com/posts/chatgpt-mnist/</link>
      <pubDate>Sun, 30 Jul 2023 22:45:57 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/chatgpt-mnist/</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR: No. No it cannot.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;This was admittedly a fairly stupid experiment on the face of it. ChatGPT is a decoder-only model. It shouldn&amp;rsquo;t be able to perform an image recognition task. But then again, a decoder-only model wouldn&amp;rsquo;t have been my first choice for translation or summarization either. In my experience, ChatGPT has created translations which are at least as coherent and idiomatic as Google Translate, if not more so.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Paper Summary: Antenna Design with Evolutionary Algorithms</title>
      <link>https://blog.lukesalamone.com/posts/evolutionary-antenna-design/</link>
      <pubDate>Mon, 17 Apr 2023 19:46:25 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/evolutionary-antenna-design/</guid>
      <description>&lt;p&gt;&lt;strong&gt;This is a summary of &lt;a href=&#34;https://www.researchgate.net/profile/Al-Globus/publication/228909002_Automated_Antenna_Design_with_Evolutionary_Algorithms/links/547375300cf216f8cfaff65a/Automated-Antenna-Design-with-Evolutionary-Algorithms.pdf&#34;&gt;Automated Antenna Design with Evolutionary Algorithms&lt;/a&gt;, a 2006 paper by Hornby et al. As large language models become more and more synonymous with &amp;ldquo;AI&amp;rdquo;, it is interesting to see how researchers solved problems in the past.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;Typically, antennas are designed and built by hand by domain experts. This is a very time-consuming process, however, so researchers have been investigating evolutionary algorithms since the 1990s. Inspired by natural evolution, an evolutionary algorithm is based on small, random changes and an evaluation metric. In this paper, the authors describe the use of an evolutionary algorithm to design an antenna for ST5, a small satellite weighing only 25 kilograms.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Paper Summary: Dual-Encoders in Ranking</title>
      <link>https://blog.lukesalamone.com/posts/dual-encoders-ranking/</link>
      <pubDate>Sat, 17 Dec 2022 16:53:47 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/dual-encoders-ranking/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://proceedings.mlr.press/v162/menon22a/menon22a.pdf&#34;&gt;In Defense of Dual-Encoders for Neural Ranking by Menon et al.&lt;/a&gt; discusses why dual-encoder (DE) models, also called Bi-Encoders elsewhere, don&amp;rsquo;t match the performance of cross-attention (CA) models. The authors investigate what is actually going on, and demonstrate some improved performance over baseline DE models with a new model distillation method.&lt;/p&gt;&#xA;&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;&#xA;&lt;p&gt;Search requires an automatic way to find the most relevant documents to a query. There are bag-of-word approaches to this task (for example BM25) and neural approaches. An example of a bag-of-words approach might simply be to count the number of similar words between the query and each document, and return the document with the highest number of similar words. There are word-stuffing issues with this idea, but the larger issue is that a bag-of-words strategy can&amp;rsquo;t account for synonyms. If I search for &lt;em&gt;bad guy&lt;/em&gt; I will never find &lt;em&gt;villain&lt;/em&gt; without some additional logic to account for this. A neural network implicitly understands the relationship between words, and avoids the fragile logic of simple word counts.&lt;/p&gt;</description>
    </item>
    <item>
      <title>My Favorite Antimaia Games</title>
      <link>https://blog.lukesalamone.com/posts/best-antimaia-games/</link>
      <pubDate>Sat, 26 Nov 2022 20:25:13 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/best-antimaia-games/</guid>
      <description>&lt;p&gt;This is a follow up to &lt;a href=&#34;./posts/winning-faster-than-stockfish/&#34;&gt;Opponent Modeling Wins 2× Faster Than Stockfish&lt;/a&gt;. After running 400 simulations, I can conclusively say that opponent modeling is pretty cool.&lt;/p&gt;&#xA;&lt;p&gt;The TLDR on opponent modeling is that if we have a pretty good idea of what the opponent might do, we can beat them faster by playing moves which aren&amp;rsquo;t objectively &amp;ldquo;optimal&amp;rdquo; as far as minimax is concerned. Here, Maia 1900 is a model of a relatively high-level chess player. Antimaia 1900 is specifically designed to counter Maia 1900.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Other End of the Earth</title>
      <link>https://blog.lukesalamone.com/posts/earth-antipodes/</link>
      <pubDate>Wed, 23 Nov 2022 10:07:36 -0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/earth-antipodes/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/antipode_land.png&#34;&#xA;    alt=&#34;White areas show points of earth on land whose antipode is also on land. This is only about 8.6% of all of earth&amp;rsquo;s surface.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;White areas show points of earth on land whose antipode is also on land. This is only about 8.6% of all of earth&amp;rsquo;s surface.&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;If you want to fly across the Pacific Ocean, you&amp;rsquo;ll have to board an airplane and fly around 12 hours. It&amp;rsquo;s pretty slow. A much faster route would be to go directly through the center of the earth. &amp;ldquo;Digging to China&amp;rdquo; was a common expression I heard growing up, with the implication that the opposite side of the globe is somewhere in Asia.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Few Notes on the Transformer</title>
      <link>https://blog.lukesalamone.com/posts/self-attention/</link>
      <pubDate>Wed, 16 Nov 2022 15:24:15 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/self-attention/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/self-attention.png&#34;&#xA;    alt=&#34;A self-attention block depicted as a neural network.&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;A self-attention block depicted as a neural network.&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;In this post I will describe the attention mechanism, commonly used in transformers, a popular neural language architecture. Most of today&amp;rsquo;s well-known large language models are based on the transformer architecture, which was introduced in &lt;a href=&#34;https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf&#34;&gt;Attention is All You Need&lt;/a&gt; by Vaswani et al.&lt;/p&gt;&#xA;&lt;h2 id=&#34;what-is-attention&#34;&gt;What is attention?&lt;/h2&gt;&#xA;&lt;p&gt;At a high level, attention is a mechanism for neural networks to boost portions of an input which are relevant and ignore those which aren&amp;rsquo;t. In language models, attention is used as a way for the model to learn which portions of a sentence are relevant to each word.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Rolling My Own Blog Search</title>
      <link>https://blog.lukesalamone.com/posts/rolling-my-own-blog-search/</link>
      <pubDate>Wed, 09 Nov 2022 02:42:51 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/rolling-my-own-blog-search/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve found myself hitting ctrl+f on this blog enough that I figured it&amp;rsquo;s about time to add some search functionality to it. While there are certainly prefab solutions out there, this task is simple enough and fairly instructive. I had a few requirements, though:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;The search needs to be fast, useful, and aesthetically pleasing.&lt;/li&gt;&#xA;&lt;li&gt;The search must run in the browser. Standing up a server is a lot of extra work. It&amp;rsquo;s also overkill since I only have about 30 articles so far.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;h2 id=&#34;semantic-search&#34;&gt;Semantic search&lt;/h2&gt;&#xA;&lt;p&gt;I did some experiments with small neural networks deployed using ONNX but ultimately they didn&amp;rsquo;t seem to be a good fit for this blog. The search experience was not quite as snappy as I&amp;rsquo;d have liked it to be, and while I was able to get the model under 10MB, it still added a good amount of bloat to the page size. Further, it wasn&amp;rsquo;t clear to me that the search results were significantly better, and in some cases they were worse. In any case, the advantages were not enough to justify the added page size.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A new type of chess tournament</title>
      <link>https://blog.lukesalamone.com/posts/qualitative-analysis-chess/</link>
      <pubDate>Sat, 08 Oct 2022 15:17:36 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/qualitative-analysis-chess/</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;This is part 2 of a paper I wrote for &lt;a href=&#34;https://www.mccormick.northwestern.edu/research-faculty/directory/profiles/forbus-ken.html&#34;&gt;Ken Forbus&lt;/a&gt;&amp;rsquo; Qualitative Reasoning course, adapted for this blog. You can find a printable version of the paper &lt;a href=&#34;./files/anthropomorphic-chess-evaluation-via-qualitative-analysis.pdf&#34;&gt;here&lt;/a&gt; and part 1 &lt;a href=&#34;./posts/chess-engine-history/&#34;&gt;here&lt;/a&gt;.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;In the previous post I discussed the history of chess engines and why they don&amp;rsquo;t &amp;ldquo;think&amp;rdquo; like we think. Trading interpretability for computation cycles ultimately led to the engines we have today, fairly alien in nature and perhaps less pedagogically useful because of it. At the time, though, the goal was to beat human grandmasters by any means necessary, a great engineering feat that the field had been working on for decades.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Chess Engine&#39;s Final Horizon</title>
      <link>https://blog.lukesalamone.com/posts/chess-engine-history/</link>
      <pubDate>Fri, 07 Oct 2022 20:17:21 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/chess-engine-history/</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;This is part 1 of a paper I wrote for &lt;a href=&#34;https://www.mccormick.northwestern.edu/research-faculty/directory/profiles/forbus-ken.html&#34;&gt;Ken Forbus&lt;/a&gt;&amp;rsquo; Qualitative Reasoning course, adapted for this blog. You can find a printable version of the paper &lt;a href=&#34;./files/anthropomorphic-chess-evaluation-via-qualitative-analysis.pdf&#34;&gt;here&lt;/a&gt; and part 2 &lt;a href=&#34;./posts/qualitative-analysis-chess/&#34;&gt;here&lt;/a&gt;.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;Computers that play chess, otherwise known as chess engines, have existed &lt;a href=&#34;https://www.youtube.com/watch?v=wrxdWkjmhKg&#34;&gt;since at least the late 1940s&lt;/a&gt;. Because the game was said to require the perfect combination of planning, strategy, psychology, and calculation, chess was once thought to be an activity directly correlated with intelligence, and it was believed that only a truly intelligent computer could defeat humans. However, as a recent chess.com &lt;a href=&#34;https://drive.google.com/file/d/11IokKgTVSXdpYEzAuyViIleSZ_2wl0ag/view&#34;&gt;report&lt;/a&gt; explains, computers are now far stronger than humans:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Opponent Modeling Wins 2× Faster Than Stockfish</title>
      <link>https://blog.lukesalamone.com/posts/winning-faster-than-stockfish/</link>
      <pubDate>Sat, 02 Jul 2022 16:24:10 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/winning-faster-than-stockfish/</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR: Using opponent modeling we can win 2x faster than Stockfish by playing high-risk, high-reward moves that Stockfish will avoid.&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Alphabet Chess</title>
      <link>https://blog.lukesalamone.com/posts/alphabet-chess/</link>
      <pubDate>Fri, 10 Jun 2022 23:56:14 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/alphabet-chess/</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;TLDR: Alphabet chess is a chess variant that allows handicapping by mixing a bit of poker into the beginning of the game. Moves must be played according to a secret word chosen at the start.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;Chess has been played in different forms since the seventh century, and in its modern form since the nineteenth century. Opening theory, i.e. the study of the best moves to begin the game with, has been developing since then.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Paper Summary: COMET (Knowledge Graph Construction)</title>
      <link>https://blog.lukesalamone.com/posts/knowledge-graph-construction/</link>
      <pubDate>Tue, 17 May 2022 17:47:25 +0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/knowledge-graph-construction/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/comet_knowledge_generation.png&#34;&#xA;    alt=&#34;Selected {subject, relation, object} tuples generated by COMET&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;Selected {subject, relation, object} tuples generated by COMET&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;Paper link: &lt;a href=&#34;https://arxiv.org/abs/1906.05317&#34;&gt;https://arxiv.org/abs/1906.05317&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;This paper describes COMET, a method of generating knowledge bases automatically. Previous work largely focused on encyclopedic knowledge, which has well-defined relationships. This paper, however, focuses on commonsense knowledge. The authors introduce a “commonsense transformer” which fine-tunes a pre-trained language model on a knowledge base of {subject, relation, object} tuples. Their trained model generates new nodes in the knowledge graph and completes phrases based on edges in the existing graph.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Create a Custom Pytorch Dataloader</title>
      <link>https://blog.lukesalamone.com/posts/pytorch-dataloader/</link>
      <pubDate>Thu, 28 Apr 2022 18:22:07 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/pytorch-dataloader/</guid>
      <description>&lt;p&gt;First, create a custom dataset class.&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;from torch.utils.data import Dataset, DataLoader&#xA;&#xA;class CustomDataset(Dataset):&#xA;  def __init__(self, features, labels):&#xA;&#xA;    assert len(features) == len(labels)&#xA;    self.features = features&#xA;    self.labels = labels&#xA;&#xA;  def __len__(self):&#xA;    return len(self.features)&#xA;&#xA;  def __getitem__(self, idx):&#xA;    return self.features[idx], self.labels[idx]&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Next, create a custom dataloader where we specify the batch size.&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;features, labels = load_data()&#xA;&#xA;# features &amp;amp; labels must have equal lengths&#xA;# e.g. features = [[1,2,3],[4,5,6]]&#xA;#      labels = [7,8]&#xA;&#xA;dataset = CustomDataset(features, labels)&#xA;dataloader = DataLoader(dataset,&#xA;                        batch_size=batch_size,&#xA;                        shuffle=True)&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Finally, iterate over the dataloader during training.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Zip and Unzip a tar.gz File</title>
      <link>https://blog.lukesalamone.com/posts/how-to-tar-untar-file/</link>
      <pubDate>Wed, 30 Mar 2022 20:05:26 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/how-to-tar-untar-file/</guid>
      <description>&lt;p&gt;If you want to extract a tar archive&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-console&#34;&gt;tar -xf archive.tar.gz&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;If you want to compress a directory&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-console&#34;&gt;tar -czvf archive.tar.gz /path/to/directory&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;That&amp;rsquo;s all.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Paper Summary: Defending Against Neural Fake News</title>
      <link>https://blog.lukesalamone.com/posts/grover-paper-summary/</link>
      <pubDate>Sun, 19 Sep 2021 20:13:09 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/grover-paper-summary/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://arxiv.org/abs/1905.12616&#34;&gt;&lt;em&gt;Defending Against Neural Fake News&lt;/em&gt;&lt;/a&gt; by Zellers et al. presents a model for controllable text generation called Grover. This model can be used to create highly believable computer-generated news articles. The authors present this paper as a method of detecting and preventing the spread of fake news. They claim their model is 92% accurate at detecting fake news stories, partially due to artifacts that generators include in the generated text.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Connect Jupyter to Remote</title>
      <link>https://blog.lukesalamone.com/posts/connect-jupyter-to-remote/</link>
      <pubDate>Tue, 07 Sep 2021 09:10:56 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/connect-jupyter-to-remote/</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s how to connect to a remote Jupyter notebook.&lt;/p&gt;&#xA;&lt;p&gt;Create an ssh tunnel to your remote machine:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;ssh -L 8080:localhost:8080 user@12.34.56.78&#xA;&#xA;# or use a .pem file to connect to ec2&#xA;ssh -L 8080:localhost:8080 -i &amp;quot;aws.pem&amp;quot; ec2-user@ec2-12-34-56-78.compute-1.amazonaws.com&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Start Jupyter on that machine in headless mode:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;jupyter notebook --no-browser --port=8080&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Use a browser to open one of the urls that Jupyter presents:&lt;br&gt;&#xA;http://localhost:8080/?token=xyz&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is Marginalization?</title>
      <link>https://blog.lukesalamone.com/posts/what-is-marginalization/</link>
      <pubDate>Wed, 07 Jul 2021 14:23:12 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/what-is-marginalization/</guid>
      <description>&lt;p&gt;In machine learning and statistics, marginalization simply means summing over a set of independent variables. For example, suppose an avid tennis player kept track of the number of days he played tennis over a period of time as well as the weather on that day:&lt;/p&gt;&#xA;&lt;style&gt;&#xA;  .blue {&#xA;    background-color:#09f1;&#xA;  }&#xA;  .gray {&#xA;    background-color:#80808012;&#xA;  }&#xA;&lt;/style&gt;&#xA;&lt;table style=&#34;width:100%&#34;&gt;&#xA;  &lt;tr style=&#34;font-weight:bold&#34;&gt;&#xA;    &lt;th&gt;&lt;/th&gt;&#xA;    &lt;th&gt;&lt;/th&gt;&#xA;    &lt;th colspan=&#34;3&#34; style=&#34;text-align:center&#34;&gt;weather&lt;/th&gt;&#xA;    &lt;th&gt;&lt;/th&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr style=&#34;font-weight:bold; text-align:center; background-color: inherit&#34;&gt;&#xA;    &lt;td&gt;&lt;/td&gt;&#xA;    &lt;th&gt;&lt;/th&gt;&#xA;    &lt;td&gt;sunny&lt;/td&gt;&#xA;    &lt;td&gt;cloudy&lt;/td&gt;&#xA;    &lt;td&gt;rainy&lt;/td&gt;&#xA;    &lt;th&gt;totals&lt;/th&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr&gt;&#xA;  &#x9;&lt;td rowspan=&#34;2&#34; style=&#34;font-weight:bold; text-align:right&#34;&gt;play?&lt;/td&gt;&#xA;    &lt;td style=&#34;text-align:right&#34;&gt;yes&lt;/td&gt;&#xA;    &lt;td class=&#34;blue&#34;&gt;70&lt;/td&gt;&#xA;    &lt;td class=&#34;blue&#34;&gt;25&lt;/td&gt;&#xA;    &lt;td class=&#34;blue&#34;&gt;1&lt;/td&gt;&#xA;    &lt;td class=&#34;gray&#34; style=&#34;font-weight:bold&#34;&gt;96&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr style=&#34;background-color: inherit;&#34;&gt;&#xA;  &#x9;&lt;td style=&#34;text-align:right&#34;&gt;no&lt;/td&gt;&#xA;  &#x9;&lt;td class=&#34;blue&#34;&gt;70&lt;/td&gt;&#xA;    &lt;td class=&#34;blue&#34;&gt;5&lt;/td&gt;&#xA;    &lt;td class=&#34;blue&#34;&gt;9&lt;/td&gt;&#xA;    &lt;td class=&#34;gray&#34; style=&#34;font-weight:bold&#34;&gt;84&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr style=&#34;font-weight:bold&#34;&gt;&#xA;  &#x9;&lt;td colspan=&#34;2&#34; 
style=&#34;text-align:right&#34;&gt;totals&lt;/td&gt;&#xA;  &#x9;&lt;td class=&#34;gray&#34;&gt;140&lt;/td&gt;&#xA;    &lt;td class=&#34;gray&#34;&gt;30&lt;/td&gt;&#xA;    &lt;td class=&#34;gray&#34;&gt;10&lt;/td&gt;&#xA;    &lt;td class=&#34;gray&#34;&gt;180&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;(&lt;em&gt;In this table we&amp;rsquo;re keeping track of the number of days. If you want probabilities, divide each value in the table by 180. But I think whole numbers are easier to think about so I&amp;rsquo;m keeping them.&lt;/em&gt;)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Colab: Connect to Google Drive</title>
      <link>https://blog.lukesalamone.com/posts/connect-to-colab/</link>
      <pubDate>Wed, 30 Jun 2021 22:58:18 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/connect-to-colab/</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s how to connect your Google Colab notebook to your Drive directory:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;from google.colab import drive&#xA;drive.mount(&#39;/content/gdrive&#39;)&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Follow the prompts from there. That is all.&lt;/p&gt;</description>
    </item>
    <item>
      <title>BERT vs GPT-2 Performance</title>
      <link>https://blog.lukesalamone.com/posts/bert-vs-gpt2/</link>
      <pubDate>Mon, 21 Jun 2021 01:04:42 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/bert-vs-gpt2/</guid>
      <description>&lt;p&gt;There are quite a few BERT vs GPT-2 breakdowns online, mostly focusing on the architectural differences between the two models. However, I am more interested in the performance differences between the two models, specifically their predictive capabilities. This blog post outlines the results of my experiments.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;https://github.com/lukesalamone/gpt2-vs-bert&#34;&gt;The code used in this experiment can be found on my Github&lt;/a&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;bert&#34;&gt;BERT&lt;/h2&gt;&#xA;&lt;p&gt;The &lt;a href=&#34;https://arxiv.org/pdf/1810.04805.pdf&#34;&gt;Devlin et al. model&lt;/a&gt; was released in November 2018. It is a transformer-based language model pretrained on masked input (also known as the &lt;em&gt;cloze&lt;/em&gt; task). During pretraining, 15% of tokens are hidden from the model, and it is trained to predict the masked tokens. As a result, I was able to evaluate its ability to correctly predict a masked token at a random position in a fixed-size input.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How does GPT-2 Tokenize Text?</title>
      <link>https://blog.lukesalamone.com/posts/gpt2-tokenization/</link>
      <pubDate>Thu, 17 Jun 2021 19:30:48 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/gpt2-tokenization/</guid>
      <description>&lt;p&gt;Let&amp;rsquo;s explore how GPT-2 tokenizes text.&lt;/p&gt;&#xA;&lt;h2 id=&#34;what-is-tokenization&#34;&gt;What is tokenization?&lt;/h2&gt;&#xA;&lt;p&gt;It&amp;rsquo;s important to understand that GPT-2 doesn&amp;rsquo;t work with strings directly. Instead, it needs to tokenize the input string, which is essentially a process for converting the string into a list of numbers, or &amp;ldquo;tokens&amp;rdquo;. It is these tokens which are passed into the model during training or for inference. As a concrete example, let&amp;rsquo;s look at a few sample sentences:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What Are Attention Masks?</title>
      <link>https://blog.lukesalamone.com/posts/what-are-attention-masks/</link>
      <pubDate>Tue, 15 Jun 2021 19:09:36 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/what-are-attention-masks/</guid>
      <description>&lt;p&gt;TLDR: Attention masks allow us to send a batch into the transformer even when the examples in the batch have varying lengths. We do this by padding all sequences to the same length, then using the &amp;ldquo;attention_mask&amp;rdquo; tensor to identify which tokens are padding.&lt;/p&gt;&#xA;&lt;figure&gt;&lt;img src=&#34;./img/attention_mask.png&#34;&#xA;    alt=&#34;Here we use a batch with three samples padded from the left since we want to predict the next token on the right. (Padding on the right would probably predict another pad.)&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;Here we use a batch with three samples padded from the left since we want to predict the next token on the right. (Padding on the right would probably predict another pad.)&lt;/p&gt;</description>
    </item>
    <item>
      <title>How Does Convolution Work?</title>
      <link>https://blog.lukesalamone.com/posts/how-does-convolution-work/</link>
      <pubDate>Mon, 14 Jun 2021 21:05:06 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/how-does-convolution-work/</guid>
      <description>&lt;p&gt;Convolutional neural networks have had breakthrough success in image recognition, natural language processing, and even board games like Chess and Go. But what&amp;rsquo;s really going on during convolution? Well, I think the easiest way to explain is with an interactive demo. Feel free to play around with the parameters below to see for yourself!&lt;/p&gt;&#xA;&lt;script src=&#34;./js/util.js&#34;&gt;&lt;/script&gt;&#xA;&lt;script src=&#34;./js/convolution-demo.js&#34;&gt;&lt;/script&gt;&#xA;&lt;link rel=&#34;stylesheet&#34; href=&#34;./css/convolution-demo.css&#34; /&gt;&#xA;&lt;div id=&#34;input-output&#34;&gt;&#xA;  &lt;div id=&#34;input-grid&#34;&gt;&lt;/div&gt;&#xA;  &lt;div id=&#34;output-grid&#34;&gt;&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&lt;div id=&#34;controls&#34;&gt;&#xA;  &lt;table&gt;&#xA;    &lt;tr id=&#34;number&#34;&gt;&#xA;      &lt;td&gt;number:&lt;/td&gt;&#xA;      &lt;td&gt;&#xA;        &lt;select&gt;&#xA;          &lt;option value=&#34;four&#34;&gt;four&lt;/option&gt;&#xA;          &lt;option value=&#34;three&#34;&gt;three&lt;/option&gt;&#xA;          &lt;option value=&#34;eight&#34;&gt;eight&lt;/option&gt;&#xA;        &lt;/select&gt;&#xA;      &lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;    &lt;tr id=&#34;padding&#34;&gt;&#xA;      &lt;td&gt;padding: &lt;span class=&#34;val&#34;&gt;&lt;/span&gt;&lt;/td&gt;&#xA;      &lt;td&gt;&lt;input type=&#34;range&#34; min=&#34;0&#34; max=&#34;2&#34; value=&#34;0&#34;&gt;&lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;    &lt;tr id=&#34;kernelsize&#34;&gt;&#xA;      &lt;td&gt;kernel size: &lt;span class=&#34;val&#34;&gt;&lt;/span&gt;&lt;/td&gt;&#xA;      &lt;td&gt;&lt;input type=&#34;range&#34; min=&#34;1&#34; max=&#34;4&#34; value=&#34;2&#34;&gt;&lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;    &lt;tr id=&#34;stride&#34;&gt;&#xA;      &lt;td&gt;stride: &lt;span class=&#34;val&#34;&gt;&lt;/span&gt;&lt;/td&gt;&#xA;      &lt;td&gt;&lt;input type=&#34;range&#34; min=&#34;1&#34; max=&#34;3&#34; 
value=&#34;1&#34;&gt;&lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;    &lt;tr id=&#34;speed&#34;&gt;&#xA;      &lt;td&gt;speed: &lt;span class=&#34;val&#34;&gt;&lt;/span&gt;&lt;/td&gt;&#xA;      &lt;td&gt;&lt;input type=&#34;range&#34; min=&#34;1&#34; max=&#34;5&#34; value=&#34;3&#34;&gt;&lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;    &lt;tr id=&#34;errors&#34; style=&#34;display:none&#34;&gt;&#xA;      &lt;td colspan=&#34;2&#34;&gt;&lt;/td&gt;&#xA;    &lt;/tr&gt;&#xA;  &lt;/table&gt;&#xA;&lt;/div&gt;&#xA;&lt;p&gt;You can use the settings above to control the hyperparameters of the convolutional layer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python: Serve an HTML File</title>
      <link>https://blog.lukesalamone.com/posts/python-serve-html/</link>
      <pubDate>Sun, 09 May 2021 15:06:11 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/python-serve-html/</guid>
      <description>&lt;p&gt;If you want to serve some HTML with Python, run&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-console&#34;&gt;python -m http.server 8000&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;Then navigate to &lt;a href=&#34;http://localhost:8000/&#34;&gt;http://localhost:8000&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;This is not meant for production environments but will get you around CORS restrictions that would come from simply opening a local file in your browser.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Train and Run a Simple Language Model</title>
      <link>https://blog.lukesalamone.com/posts/running-simple-language-model/</link>
      <pubDate>Fri, 16 Apr 2021 21:08:53 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/running-simple-language-model/</guid>
      <description>&lt;p&gt;This article will show how to run a simple language model, KenLM. It&amp;rsquo;s not as powerful as transformer-based models like BERT or GPT-3, but depending on what you&amp;rsquo;re trying to accomplish it may be more than enough. This tutorial should take you about 15 minutes, including the time to run the scripts.&lt;/p&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s work backwards from where we&amp;rsquo;re trying to get to. When you&amp;rsquo;ve finished, you should be able to run the following script:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is Temperature in NLP?🐭</title>
      <link>https://blog.lukesalamone.com/posts/what-is-temperature/</link>
      <pubDate>Fri, 02 Apr 2021 00:50:38 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/what-is-temperature/</guid>
      <description>&lt;p&gt;Temperature is a parameter used in natural language processing models to increase or decrease the &amp;ldquo;confidence&amp;rdquo; a model has in its most likely response.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is Perplexity?</title>
      <link>https://blog.lukesalamone.com/posts/perplexity/</link>
      <pubDate>Thu, 01 Apr 2021 12:14:49 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/perplexity/</guid>
      <description>&lt;p&gt;&lt;strong&gt;TLDR: NLP metric ranging from 1 to infinity. Lower is better.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;In natural language processing, perplexity is the most common metric used to measure the performance of a language model. To calculate perplexity, we use the following formula:&lt;/p&gt;</description>
    </item>
    <item>
      <title>S3 Bucket Url</title>
      <link>https://blog.lukesalamone.com/posts/s3-bucket-url/</link>
      <pubDate>Wed, 10 Mar 2021 03:03:53 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/s3-bucket-url/</guid>
      <description>&lt;p&gt;Assuming your bucket is publicly accessible and has static website hosting enabled, the url of your S3 bucket will be&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;http://[bucket-name].s3-website-[region].amazonaws.com&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;For example, for &amp;ldquo;mybucket&amp;rdquo; in &amp;ldquo;us-east-1&amp;rdquo; your url will be&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code&gt;http://mybucket.s3-website-us-east-1.amazonaws.com&#xA;&lt;/code&gt;&lt;/pre&gt;</description>
    </item>
    <item>
      <title>About My Quick Reference Articles</title>
      <link>https://blog.lukesalamone.com/posts/why-how-to/</link>
      <pubDate>Sun, 07 Mar 2021 14:44:37 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/why-how-to/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve created a few quick-reference articles and it might not be clear why. There are a few reasons:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;These articles are mainly a reference for me. I find myself searching the same things over and over, looking for the purple link, scrolling through the article, then copying &amp;amp; pasting code. I&amp;rsquo;d rather not go through the hassle. These articles aim to solve that problem.&lt;/li&gt;&#xA;&lt;li&gt;I aim to keep the answers &lt;a href=&#34;https://en.wikipedia.org/wiki/Above_the_fold#In_web_design&#34;&gt;above the fold&lt;/a&gt;. I don&amp;rsquo;t want to have to scroll down to find the answer. I almost never read the surrounding prose when I am in &amp;ldquo;coding mode&amp;rdquo;.&lt;/li&gt;&#xA;&lt;li&gt;I don&amp;rsquo;t have ads or popups on my blog. I will never ask people to sign up for a newsletter or log in to read more. I also don&amp;rsquo;t use pictures unless there&amp;rsquo;s a good reason. &lt;a href=&#34;https://www.google.com/search?q=ai&amp;#43;thinking&amp;#43;robot&amp;#43;stock&amp;#43;photo&amp;amp;tbm=isch&#34;&gt;The &amp;ldquo;thinking AI robot stock photo&amp;rdquo; industry is definitely a bubble.&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Writing these things out explicitly helps me to remember them. Paradoxically, this may make these how-to pages less useful to me, but maybe someone else will find them useful.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;These quick reference articles don&amp;rsquo;t explain much because I don&amp;rsquo;t need an explanation of what is going on. There are other websites with far more comprehensive guides covering the fundamentals of how things are done. But I don&amp;rsquo;t need that; I just want a 30-second reference with a working chunk of code.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python: Read &amp; Write Json</title>
      <link>https://blog.lukesalamone.com/posts/read-write-json/</link>
      <pubDate>Sun, 07 Mar 2021 14:05:27 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/read-write-json/</guid>
      <description>&lt;p&gt;Often it is useful to save python data to json files. The following code will demonstrate how that can be done.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&amp;ldquo;God bless JSON!&amp;rdquo; ~ a soon to be famous programmer&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import json&#xA;&#xA;data = {&#39;a&#39;: 1, &#39;b&#39;: &#39;hello&#39;, &#39;c&#39;: False}&#xA;filename = &#39;awesome_data.json&#39;&#xA;&#xA;# write data to file&#xA;with open(filename, &#39;w&#39;) as f:&#xA;  json.dump(data, f)&#xA;&#xA;&#xA;# read json from file&#xA;with open(filename, &#39;r&#39;) as f:&#xA;  data = json.load(f)&#xA;&#xA;&#xA;print(data)&#xA;# prints {&#39;a&#39;: 1, &#39;b&#39;: &#39;hello&#39;, &#39;c&#39;: False}&#xA;&lt;/code&gt;&lt;/pre&gt;</description>
    </item>
    <item>
      <title>Autoencoding Stock Prices</title>
      <link>https://blog.lukesalamone.com/posts/build-an-autoencoder/</link>
      <pubDate>Sun, 07 Mar 2021 01:31:51 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/build-an-autoencoder/</guid>
      <description>&lt;figure&gt;&lt;img src=&#34;./img/autoencoder.png&#34;&#xA;    alt=&#34;Autoencoding stock prices as found in Heaton et al., 2016&#34;&gt;&lt;figcaption&gt;&#xA;      &lt;p&gt;Autoencoding stock prices as found in Heaton et al., 2016&lt;/p&gt;&#xA;    &lt;/figcaption&gt;&#xA;&lt;/figure&gt;&#xA;&#xA;&lt;p&gt;So you want to build an autoencoder? Great! This article will demonstrate how to build an autoencoder and use it to measure stock prices against an index. This technique is described in more technical terms &lt;a href=&#34;https://arxiv.org/pdf/1602.06561.pdf&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Once we&amp;rsquo;ve trained the autoencoder, we can use it to measure how well each component follows the other members of the index. This can be useful for finding deeper insights into an index, and doesn&amp;rsquo;t require a priori knowledge of the index price or the weighting of its components. Note that this is only one metric for determining how well a member of the group follows the group overall; another is &lt;a href=&#34;https://en.wikipedia.org/wiki/Pearson_correlation_coefficient&#34;&gt;Pearson correlation&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Python: Formatting a string</title>
      <link>https://blog.lukesalamone.com/posts/python-format-string/</link>
      <pubDate>Wed, 24 Feb 2021 21:22:42 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/python-format-string/</guid>
      <description>&lt;p&gt;There are three main ways to format strings in python:&lt;/p&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;name = &#39;Luke&#39;&#xA;food = &#39;pizza&#39;&#xA;&#xA;# old style&#xA;&amp;quot;My name is %s and I like %s.&amp;quot; % (name, food)&#xA;&#xA;# str.format()&#xA;&amp;quot;My name is {0} and I like {1}.&amp;quot;.format(name, food)&#xA;&#xA;# f-strings&#xA;f&amp;quot;My name is {name} and I like {food}.&amp;quot;&#xA;&lt;/code&gt;&lt;/pre&gt;</description>
    </item>
    <item>
      <title>Siamese Neural Networks (Video)</title>
      <link>https://blog.lukesalamone.com/posts/siamese-nn-video/</link>
      <pubDate>Thu, 17 Dec 2020 11:22:43 -0600</pubDate>
      <guid>https://blog.lukesalamone.com/posts/siamese-nn-video/</guid>
      <description>&lt;div style=&#34;text-align:center&#34;&gt;&#xA;  &lt;iframe src=&#34;https://player.vimeo.com/video/491725663&#34; width=&#34;640&#34; height=&#34;360&#34; frameborder=&#34;0&#34; allow=&#34;autoplay; fullscreen&#34; allowfullscreen&gt;&lt;/iframe&gt;&#xA;&lt;/div&gt;&#xA;&lt;p&gt;&lt;em&gt;The following is a transcript of the above video&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;In this paper, the authors present a novel neural network architecture to enable audio search via sounds humans are able to make, for example humming and whistling. This is an important capability when searching through audio for a specific sound.&lt;/p&gt;&#xA;&lt;h2 id=&#34;motivation&#34;&gt;Motivation&lt;/h2&gt;&#xA;&lt;p&gt;Imagine you have hundreds of unlabeled sound effects on your computer, and you are looking for a specific one. It could be very tedious to listen to every single one until you can find the right sound. Even if the sounds do have some kind of word labels, it could be hard to pinpoint exactly which words to search for. A lot of sounds don’t exactly lend themselves to text descriptors, so finding the right sound can be difficult with a text search.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A Practical Guide to Gaussian Mixture Models</title>
      <link>https://blog.lukesalamone.com/posts/gmm-practical-guide/</link>
      <pubDate>Sat, 24 Oct 2020 18:10:29 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/gmm-practical-guide/</guid>
      <description>&lt;script type=&#34;text/javascript&#34; async src=&#34;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML&#34;&gt;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;MathJax.Hub.Config({&#xA;  tex2jax: {&#xA;    inlineMath: [[&#39;$&#39;,&#39;$&#39;], [&#39;\\(&#39;,&#39;\\)&#39;]],&#xA;    displayMath: [[&#39;$$&#39;,&#39;$$&#39;], [&#39;\[&#39;,&#39;\]&#39;]],&#xA;    processEscapes: true,&#xA;    processEnvironments: true,&#xA;    skipTags: [&#39;script&#39;, &#39;noscript&#39;, &#39;style&#39;, &#39;textarea&#39;, &#39;pre&#39;],&#xA;    TeX: {&#xA;      equationNumbers: {&#xA;        autoNumber: &#34;AMS&#34;&#xA;      },&#xA;      extensions: [&#34;AMSmath.js&#34;, &#34;AMSsymbols.js&#34;]&#xA;    }&#xA;  }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;  MathJax.Hub.Queue(function() {&#xA;    // Fix &lt;code&gt; tags after MathJax finishes running. This is a&#xA;    // hack to overcome a shortcoming of Markdown. Discussion at&#xA;    // https://github.com/mojombo/jekyll/issues/199&#xA;    var all = MathJax.Hub.getAllJax(), i;&#xA;    for(i = 0; i &lt; all.length; i += 1) {&#xA;        all[i].SourceElement().parentNode.className += &#39; has-jax&#39;;&#xA;    }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;link rel=&#34;stylesheet&#34; href=&#34;./css/gmm-practical-guide-demo.css&#34; /&gt;&#xA;&lt;p&gt;Are you studying machine learning and want to know more about Gaussian Mixture Models? You&amp;rsquo;ve come to the right place. I have found other online resources to be difficult to approach and/or lacking crucial details. Here I will try to explain GMMs in plain language.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Managing Python Environments</title>
      <link>https://blog.lukesalamone.com/posts/managing-python-environments/</link>
      <pubDate>Sat, 24 Oct 2020 17:45:41 -0500</pubDate>
      <guid>https://blog.lukesalamone.com/posts/managing-python-environments/</guid>
      <description>&lt;p&gt;Need to switch between python versions often? Use &lt;a href=&#34;https://github.com/pyenv/pyenv&#34;&gt;&lt;code&gt;pyenv&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;h3 id=&#34;installing-pyenv&#34;&gt;Installing pyenv&lt;/h3&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# install pyenv&#xA;curl https://pyenv.run | bash&#xA;&#xA;# check pyenv install location&#xA;which pyenv&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;h3 id=&#34;install-another-python-version&#34;&gt;Install another python version&lt;/h3&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# see a list of available python versions&#xA;pyenv install --list&#xA;&#xA;# check installed python versions&#xA;pyenv versions&#xA;&#xA;# installs python 3.7.5&#xA;pyenv install 3.7.5&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;h3 id=&#34;switch-python-versions&#34;&gt;Switch python versions&lt;/h3&gt;&#xA;&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# use python 3.7.5 everywhere on your machine&#xA;pyenv global 3.7.5&#xA;&#xA;# use python 3.7.5 in current directory&#xA;pyenv local 3.7.5&#xA;&#xA;# use python 3.7.5 in current shell session&#xA;pyenv shell 3.7.5&#xA;&lt;/code&gt;&lt;/pre&gt;</description>
    </item>
    <item>
      <title>How does K-means clustering work?</title>
      <link>https://blog.lukesalamone.com/posts/kmeans-clustering/</link>
      <pubDate>Wed, 07 Oct 2020 17:39:22 -0700</pubDate>
      <guid>https://blog.lukesalamone.com/posts/kmeans-clustering/</guid>
      <description>&lt;p&gt;K-means clustering (not to be confused with K-nearest neighbors) is an unsupervised learning algorithm used for grouping similar points together into clusters.&lt;/p&gt;&#xA;&lt;script src=&#34;https://cdn.jsdelivr.net/npm/chart.js@4.4.3/dist/chart.umd.min.js&#34;&gt;&lt;/script&gt;&#xA;&lt;script src=&#34;./js/kmeans_demo.js&#34;&gt;&lt;/script&gt;&#xA;&lt;div id=&#39;demo&#39;&gt;&#xA;&lt;button&gt;start&lt;/button&gt;&#xA;&lt;canvas id=&#34;myChart&#34; style=&#34;background-color: #0000&#34;&gt;&lt;/canvas&gt;&#xA;&lt;/div&gt;&#xA;&lt;h2 id=&#34;algorithm&#34;&gt;Algorithm&lt;/h2&gt;&#xA;&lt;p&gt;The basic K-means algorithm is fairly simple and has two steps, repeated until convergence:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;assign each point to the cluster of its closest centroid&lt;/li&gt;&#xA;&lt;li&gt;update each centroid to the mean of the points assigned to its cluster&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The algorithm converges when the centroids stop moving, i.e. when no point can be reassigned to a closer centroid.&lt;/p&gt;</description>
    </item>
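<!-- The two k-means steps described in the excerpt above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not code from the post itself; the function name and point representation (2D tuples) are mine.

```python
import random

def kmeans(points, k, iters=100):
    # initialize centroids by sampling k distinct points
    centroids = random.sample(points, k)
    for _ in range(iters):
        # step 1: assign each point to the cluster of its closest centroid
        clusters = [[] for _ in range(k)]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # step 2: move each centroid to the mean of its assigned points
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        # convergence: centroids stopped moving
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

Keeping an empty cluster's centroid in place is one common convention; another is to re-seed it at a random point. -->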
    <item>
      <title>What is the Hardest Hangman Word?</title>
      <link>https://blog.lukesalamone.com/posts/hardest-hangman-word/</link>
      <pubDate>Tue, 21 Jul 2020 17:34:05 +0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/hardest-hangman-word/</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://i.imgur.com/p33HisS.png&#34; alt=&#34;Example hangman game&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;It seems like a simple enough question. Which word should you choose so that it takes your opponent the most guesses to discover it? Should you choose a long word to use up your opponent&amp;rsquo;s guesses? Or perhaps a short word with obscure letters? In this document I look into this question. But first, a bit of background.&lt;/p&gt;&#xA;&lt;p&gt;If you&amp;rsquo;re not familiar with the rules of hangman, it is a guessing game played between two people. Player A chooses a secret word, and tells player B the length of the secret word. Player B guesses letters which she thinks might be in the word. If she chooses a correct letter, player A reveals the locations of &lt;em&gt;each instance&lt;/em&gt; of the guessed letter. However, if player B guesses an incorrect letter, this counts as a &amp;ldquo;strike&amp;rdquo; against her. After an agreed-upon number of strikes, player B loses.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Estimating Pi with a Monte Carlo Simulation</title>
      <link>https://blog.lukesalamone.com/posts/monte-carlo/</link>
      <pubDate>Thu, 09 Jul 2020 15:40:14 +0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/monte-carlo/</guid>
      <description>&lt;script type=&#34;text/javascript&#34; async src=&#34;https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML&#34;&gt;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;MathJax.Hub.Config({&#xA;  tex2jax: {&#xA;    inlineMath: [[&#39;$&#39;,&#39;$&#39;], [&#39;\\(&#39;,&#39;\\)&#39;]],&#xA;    displayMath: [[&#39;$$&#39;,&#39;$$&#39;], [&#39;\[&#39;,&#39;\]&#39;]],&#xA;    processEscapes: true,&#xA;    processEnvironments: true,&#xA;    skipTags: [&#39;script&#39;, &#39;noscript&#39;, &#39;style&#39;, &#39;textarea&#39;, &#39;pre&#39;],&#xA;    TeX: {&#xA;      equationNumbers: {&#xA;        autoNumber: &#34;AMS&#34;&#xA;      },&#xA;      extensions: [&#34;AMSmath.js&#34;, &#34;AMSsymbols.js&#34;]&#xA;    }&#xA;  }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;script type=&#34;text/x-mathjax-config&#34;&gt;&#xA;  MathJax.Hub.Queue(function() {&#xA;    var all = MathJax.Hub.getAllJax(), i;&#xA;    for(i = 0; i &lt; all.length; i += 1) {&#xA;        all[i].SourceElement().parentNode.className += &#39; has-jax&#39;;&#xA;    }&#xA;});&#xA;&lt;/script&gt;&#xA;&lt;p&gt;A Monte Carlo simulation is a method of estimating quantities for which a closed-form solution is difficult or computationally infeasible to derive. The value of the mathematical constant Pi is a good example: although it is possible to calculate the exact value of Pi, a good estimate is easily demonstrated with just a few lines of code.&lt;/p&gt;</description>
    </item>
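<!-- Those "few lines of code" might look like the following sketch (hypothetical, not the post's actual implementation): sample uniform points in the unit square and count the fraction landing inside the quarter circle of radius 1, whose area is pi/4.

```python
import random

def estimate_pi(n):
    # count samples falling inside the quarter circle x^2 + y^2 <= 1
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # inside/n approximates (pi/4) / 1, so pi is roughly 4 * inside / n
    return 4 * inside / n
```

With 100,000 samples the estimate typically lands within a few hundredths of 3.14159; the error shrinks proportionally to 1/sqrt(n). -->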
    <item>
      <title>Creating an AI for Gomoku</title>
      <link>https://blog.lukesalamone.com/posts/gomoku2049/</link>
      <pubDate>Tue, 19 May 2020 14:28:57 +0800</pubDate>
      <guid>https://blog.lukesalamone.com/posts/gomoku2049/</guid>
      <description>&lt;p&gt;Gomoku is a strategy game similar to tic tac toe, but played on a larger board and with the goal of getting 5 in a row rather than 3. Since the game has perfect information and simple rules, I thought it would be a fun exercise in creating a game AI.&#xA;In February 2020 I decided to code up Gomoku2049. The game is a demonstration of MiniMax, an algorithm for choosing the move that minimizes the value of the opponent’s best reply. This article is an overview of the game’s technical highlights.&lt;/p&gt;</description>
    </item>
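<!-- The MiniMax idea mentioned above can be sketched game-agnostically. This is a hypothetical illustration; the function and parameter names are mine, not taken from Gomoku2049's actual code, and real implementations usually add alpha-beta pruning.

```python
def minimax(state, depth, maximizing, evaluate, moves, apply_move):
    # generic minimax: pick the move that minimizes the opponent's best reply
    if depth == 0 or not moves(state):
        return evaluate(state), None
    best_move = None
    if maximizing:
        best = float("-inf")
        for m in moves(state):
            score, _ = minimax(apply_move(state, m), depth - 1, False,
                               evaluate, moves, apply_move)
            if score > best:
                best, best_move = score, m
    else:
        best = float("inf")
        for m in moves(state):
            score, _ = minimax(apply_move(state, m), depth - 1, True,
                               evaluate, moves, apply_move)
            if score < best:
                best, best_move = score, m
    return best, best_move
```

For a toy two-ply game where the maximizer picks a number, the minimizer picks one in reply, and the payoff is the difference, the search correctly picks the largest opening number. -->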
  </channel>
</rss>
