One of the great problems in the realm of Domain Generalization is generalizing the idea of numbers such that I could train a machine learning model on a data set like MNIST then use that model to accurately classify another number dataset like USPS or SHVN. Current models struggle to generalize since they are unable to capture the abstract idea of number. When a model is trained on a number set, its description of a number is rooted in size, color, shape, and location. So when the model is tested on another dataset where the representation of a number may vary in the aforementioned qualities, the model fails. So the question is, can we develop a model to generalize.
I realized early on that to find such a model it is first worth simplifying the data set so that the object being abstracted is simple. So I came up with the idea of the Dot or No-Dot data set. The code I have above generates images of specified dimension and quantity, half of which have a dot and the other half does not. The images that have no dot are a randomly selected gradient. In images with a dot, I randomly vary the background color, the background gradient, the dot color, the dot location, and the dot size. As a result, in each image the idea of a dot or no dot is represented differently, hence a model that is able to classify the images correctly has abstracted the idea of dot invariant of color, size and location. The motivation is that such a model may also be able to abstract the idea of a number.
import numpy as np
import matplotlib.pyplot as plt
from random import randint, choice
def generate_mixed_dataset(num_images):
images = [] # Store generated images
labels = [] # Store labels for the images
for _ in range(num_images // 2):
# Create a homogeneous background image
base_color = np.random.randint(0, 256, 3) # Pick a random color
homogeneous_image = np.zeros((32, 32, 3), dtype=np.uint8)
for x in range(32):
for y in range(32):
# Add some random variation to the color
variation = np.random.randint(-10, 11, 3)
pixel_color = np.clip(base_color + variation, 0, 255)
homogeneous_image[x, y] = pixel_color
images.append(homogeneous_image)
labels.append(0) # 0 for no dot
# Create an image with a dot on a gradient background
base_color = np.random.randint(0, 256, 3) # Again, pick a random color
gradient_image = np.zeros((32, 32, 3), dtype=np.uint8)
for x in range(32):
for y in range(32):
# Make a gradient variation
variation = np.random.randint(-10, 11, 3)
pixel_color = np.clip(base_color + variation, 0, 255)
gradient_image[x, y] = pixel_color
dot_size = randint(1, 4) # Choose a random size for the dot
x = randint(0, 32 - dot_size) # Random position for the dot
y = randint(0, 32 - dot_size)
dot_color = np.random.randint(0, 256, 3) # Random color for the dot
# Make sure the dot color stands out
while np.any(np.abs(dot_color - gradient_image[x, y]) < 20):
dot_color = np.random.randint(0, 256, 3)
for i in range(dot_size):
for j in range(dot_size):
gradient_image[x + i, y + j] = dot_color # Place the dot
images.append(gradient_image)
labels.append(1) # 1 for dot present
return images, labels
Brief overview of the coding aspects of the project or links to code repositories...
The following code is the code I used to generate randomly, color, located, oriented tetris shapes. The following dataset represents the next step for a model built to abstract structures. Unlike the dot example, the following dataset requires that model to understand also the structure of the object, invariant to the structure's orientation. Additionally, the task also increases in difficulty now that the classifier must be non-binary. A model that is able to classify the following dataset appropriately may be robust enough for other complex structures like shapes,or maybe even numbers.
import numpy as np
import matplotlib.pyplot as plt
from random import randint, choice
def generate_tetris_shape(shape_type):
shapes = {
"I": np.array([[1, 1, 1, 1]]),
"O": np.array([[1, 1], [1, 1]]),
"T": np.array([[1, 1, 1], [0, 1, 0]]),
"S": np.array([[0, 1, 1], [1, 1, 0]]),
"L": np.array([[0, 0, 1], [1, 1, 1]])
}
return shapes[shape_type]
def rotate_shape(shape):
return np.rot90(shape, k=randint(0, 3))
def generate_mixed_dataset(num_images, image_dim):
images = []
labels = []
for _ in range(num_images // 2):
# Homogeneous background image
base_color = np.random.randint(0, 256, 3)
homogeneous_image = np.zeros((image_dim, image_dim, 3), dtype=np.uint8)
for x in range(image_dim):
for y in range(image_dim):
variation = np.random.randint(-10, 11, 3)
pixel_color = np.clip(base_color + variation, 0, 255)
homogeneous_image[x, y] = pixel_color
images.append(homogeneous_image)
labels.append("None")
# Image with a Tetris shape
base_color = np.random.randint(0, 256, 3)
gradient_image = np.zeros((image_dim, image_dim, 3), dtype=np.uint8)
for x in range(image_dim):
for y in range(image_dim):
variation = np.random.randint(-10, 11, 3)
pixel_color = np.clip(base_color + variation, 0, 255)
gradient_image[x, y] = pixel_color
shape_type = choice(["I", "O", "T", "S", "L"])
shape = generate_tetris_shape(shape_type)
shape = rotate_shape(shape)
x = randint(0, image_dim - shape.shape[0])
y = randint(0, image_dim - shape.shape[1])
shape_color = np.random.randint(0, 256, 3)
for i in range(shape.shape[0]):
for j in range(shape.shape[1]):
if shape[i, j] == 1:
gradient_image[x + i, y + j] = shape_color
images.append(gradient_image)
labels.append(shape_type)
return images, labels
Brief overview of the coding aspects of the project or links to code repositories...
The challenge of accurately classifying images of dots and Tetris shapes primarily revolves around how these images are represented, rather than the intricacies of the classification model itself. The objective is to develop an image representation that is invariant to specific factors such as the location, size, and color of the shapes. To achieve this, we adopt an innovative approach by using a weighted complete graph to describe each image. In this graph, each pixel is a node, and the edges between these nodes are weighted based on the similarity between two pixels, determined by their color difference and spatial distance. This methodology is particularly effective in identifying dots, as they tend to form nodes that are distant from other nodes in the graph.
The process begins with the image_to_graph function, which iterates through every pixel in the image. For each pixel, a vector is created, with each entry representing the weight of the pixel's edge to other pixels. The weight is set to zero if the pixels differ in color, and if they are the same color, it is inversely proportional to their distance. The vectors are then sorted in decreasing order based on these weights, keeping only the top K weights for each pixel, thereby focusing on each pixel's K closest neighbors.
Following this, the create_graph_from_image function takes over. It organizes the vectors of nodes based on their magnitude in increasing order, highlighting the node that is least related to any others. The matrix of vectors is then flattened into a single large vector, corresponding to each image. This large vector encapsulates the relational dynamics of the nodes that stand out in the graph.
Finally, this graph-based representation of images is utilized to train a K-Nearest Neighbors (KNN) regression model. The model predicts whether an image contains a dot or a Tetris shape by analyzing the isolation of certain nodes (pixels) from others within the graph. This approach shifts the focus from the absolute characteristics of pixels to their relational structure within the image, enabling an effective classification of dots and Tetris shapes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
def calculate_edge_weight(pixel1, pixel2, coord1, coord2, color_similarity_threshold=15):
# Calculate spatial distance and normalize it
spatial_dist = np.linalg.norm(np.array(coord1) - np.array(coord2))
# Calculate color distance
color_dist = np.linalg.norm(np.array(pixel1) - np.array(pixel2))
# Check if the color distance is within the similarity threshold
if color_dist < color_similarity_threshold:
# If colors are similar, return the normalized spatial distance
return 1/spatial_dist
else:
# If colors are not similar, return 0
return 0.0
def image_to_graph(image):
num_pixels = image.shape[0] * image.shape[1]
node_vectors = np.zeros((num_pixels, num_pixels))
for x1 in range(image.shape[0]):
for y1 in range(image.shape[1]):
idx1 = x1 * image.shape[1] + y1
for x2 in range(image.shape[0]):
for y2 in range(image.shape[1]):
idx2 = x2 * image.shape[1] + y2
if idx1 != idx2: # Avoid self-edges
weight = calculate_edge_weight(image[x1, y1], image[x2, y2], (x1, y1), (x2, y2))
node_vectors[idx1, idx2] = weight
sorted_node_vectors = np.sort(node_vectors, axis=1)[:, ::-1][:, :25] # Tune the 25, it represents K, or the amount of entries of each vector we want to keep
return sorted_node_vectors
def create_graphs_from_images(images, labels):
graph_vectors = []
for i, image in enumerate(images):
sorted_vectors = image_to_graph(image)
# Sort nodes by the magnitude of their vectors in increasing order
magnitudes = np.linalg.norm(sorted_vectors, axis=1)
sorted_indices = np.argsort(magnitudes) # Sort in ascending order
sorted_vectors = sorted_vectors[sorted_indices]
# Flatten to a single vector and append the label
flattened_vector = sorted_vectors.flatten()
labeled_vector = np.append(flattened_vector, labels[i])
# Get the first entries of the vector
first_entries = labeled_vector[:1000] #1000 represents how many entries of the large vector we want to have
graph_vectors.append(first_entries)
return graph_vectors
# Use the generated dataset
labeled_graphs = create_graphs_from_images(dataset_images, dataset_labels)
# 'labeled_graphs' contains the flattened, sorted, labeled graph representation of each image
# Separate the features and labels
features = np.array([graph[:-1] for graph in labeled_graphs])
print(features)
labels = np.array([graph[-1] for graph in labeled_graphs])
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
# Create a KNN classifier instance
# 'n_neighbors' is the K value. You might need to experiment with this number to find the optimal value.
knn = KNeighborsClassifier(n_neighbors=3)
# Train the classifier
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Test Accuracy: {accuracy * 100}%")
How does the model perform? Well it actually performs exceptionally, classifying both the dot dataset and the tetris shape dataset with 100% accuracy. Interesting enough, the model only needs to be trained with about 100 cases to achieve this accuracy. Surprisingly, when I used the model to classify 0, 1 ,2 and 3 in the MNIST dataset, it did so with 86% accuracy with only 300 training cases. This all points to the concept being viable, yet it still requires tuning.
Most of the model/method is built using my intuition, so it's yet to be optimized. I have already thought of several ways of improving it, such as using a different ML model instead of KNN, as well as refining the graph vectorization process such that more information about the image’s structures are retained. Regardless, there is still untapped potential yet to be explored.