
We know there are several approaches to solving a supervised learning problem, and the least squares classifier is one of them. Here we focus on the classifier obtained by minimizing the squared loss, given as

\begin{equation}

\ell_{ls}(y, f(x)) = (y - f(x))^2

\end{equation}

The training set in the supervised setting is assumed to be composed of \(N\) labeled points \( \{ (x_i, y_i) \}_{i = 1}^N\), where \(x_i \in \mathbb{R}^D\) and \(y_i \in \{0,1\}\) for any \(i = 1,\dots,N\). We consider the class of linear classifiers with intercept, that is

\begin{equation}

f(x) = w_s^Tx + w_0

\end{equation}

where \(w_s \in \mathbb{R}^D\) and \(w_0 \in \mathbb{R}\). In order to lighten the notation, the intercept can be embedded in the slope coefficient as \(w = (w_s^T, w_0)^T\). The feature vectors are modified accordingly as \(x_i = (x_i^T, 1)^T\), with a slight abuse of notation. Therefore, the dimensionality of \(w\) and of each \(x_i\) becomes \(D + 1\). It is quite useful to express the problem in matrix notation, so we call \(X \in \mathbb{R}^{N \times (D+1)}\) the design matrix that has \(x_i^T\) as its i-th row. Similarly, \(y \in \{0,1\}^N\) is the vector of labels.

The least squares classifier is obtained by minimizing the following empirical risk:

\begin{equation}

\begin{aligned}
R(w|X, y) &= \sum_{i = 1}^N \ell_{ls}(y_i, w^Tx_i)\\
&= \sum_{i = 1}^N(y_i - x_i^Tw)^2\\
&= \lVert y - Xw \rVert^2,
\end{aligned}

\end{equation}

that is the sum of the squared errors. Then, the optimal vector \(w\) is

\begin{equation}

\hat{w}_{sup} = \underset{w \in \mathbb{R}^{D + 1}}{\operatorname{argmin}} R(w|X, y)

\end{equation}

By convexity of the objective function, we can compute \(\hat{w}_{sup}\) by setting the gradient of \(R(w|X , y)\) to 0. The resulting solution is the following

\begin{equation}

\begin{aligned}
\hat{w}_{sup} &= \left(\sum_{i = 1}^N x_ix_i^T\right)^{-1} \sum_{i = 1}^N y_ix_i\\
&= \left(X^TX\right)^{-1}X^Ty
\end{aligned}

\end{equation}

The matrix \( X^TX = \sum_{i = 1}^N x_ix_i^T\) is usually referred to as the sample correlation matrix, while \(X^\dagger = \left(X^TX\right)^{-1}X^T\) is the pseudo-inverse of \(X\). Clearly, this notation for the pseudo-inverse is meaningful only if \( X^TX\) is invertible, which is not the case, for instance, if \(D + 1 > N\). When invertibility does not hold, the optimal vector of parameters can be computed either by using the singular value decomposition to obtain the pseudo-inverse \(X^\dagger\), or by adding a regularization term to the loss function. In particular, the most common choices of regularization are the Lasso and Ridge, which consist of adding the \(L_1\) or \(L_2\) norm of \(w\), respectively.
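As a rough sketch of how \(\hat{w}_{sup}\) could be computed in practice (a NumPy sketch on made-up synthetic data; the variable names here are my own and not from the text):

```python
import numpy as np

# Hypothetical synthetic data: N points in D dimensions with 0/1 labels.
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Append a column of ones so the intercept w_0 is absorbed into w.
X1 = np.hstack([X, np.ones((N, 1))])

# w_hat = (X^T X)^{-1} X^T y; pinv uses the SVD, so it also covers the
# case where X^T X is not invertible.
w_hat = np.linalg.pinv(X1) @ y

# Classify a point by thresholding the fitted value at 1/2.
y_pred = (X1 @ w_hat >= 0.5).astype(float)
```

Using `numpy.linalg.pinv` rather than explicitly inverting \(X^TX\) is a deliberate choice: it falls back gracefully to the SVD-based pseudo-inverse mentioned above.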

And that was a brief mathematical introduction to the least squares classifier. If you're just starting to learn data science, linear algebra would be a good place to start. Check out the article linked below for an introduction to the system of linear equations and if you have any suggestions you can post them in comments below.

Linear algebra plays a very important role in machine learning and mathematics in general, and linear equations have always been a central part of linear algebra. A lot of problems can be formulated as some system of linear equations, and it is important to know the tools which linear algebra provides in order to solve those problems.

In this article, we will take a brief look at systems of linear equations and some of their properties. Before we do that, let's take a look at what a general form of such a system looks like:

\begin{equation}

\begin{gathered}

a_{11}x_1 + \dots + a_{1n}x_n = b_1 \\

\vdots\\

a_{m1}x_1 + \dots + a_{mn}x_n = b_m

\end{gathered}\label{1}

\end{equation}

where \(a_{ij} \in \mathbb{R}\) and \(b_i \in \mathbb{R}\). In the above *general form of system of linear equations*, \(x_1, \dots, x_n\) are the *unknowns* of this system and every tuple \( \left(x_1, \dots, x_n\right) \in \mathbb{R}^n \) that satisfies the equation \(\left(\ref{1}\right)\) is a solution to the linear equation system.

We need to know how many solutions exist for a given set of linear equations. In order to find out, let us consider a set of linear equations:

\begin{align}

x_1 + x_2 + \hspace{0.2cm}x_3 = 3\label{2} \\

x_1 - x_2 + 2x_3 = 2\label{3} \\

2x_1 \qquad+ 3x_3 = 1\label{4}

\end{align}

We can say that the above system of linear equations has *no solution*, because when we add the first two equations \(\left(\ref{2}\right)\) and \(\left(\ref{3}\right)\), we get \(2x_1 + 3x_3 = 5\), which contradicts equation \(\left(\ref{4}\right)\).

We can try looking at a different set of linear equations:

\begin{align}

x_1 + x_2 + \hspace{0.3cm}x_3 = 3\label{5} \\

x_1 - x_2 + 2x_3 = 2\label{6} \\

\qquad x_2 + \hspace{0.3cm}x_3 = 2\label{7}

\end{align}

Here, we can subtract equation \(\left(\ref{7}\right)\) from \(\left(\ref{5}\right)\), which yields \(x_1 = 1\). After adding equations \(\left(\ref{5}\right)\) and \(\left(\ref{6}\right)\), we get \(2x_1 + 3x_3 = 5\), which means \(x_3 = 1\) (as we plugged \(x_1 = 1\) into the equation here). And finally, we can plug the value of \(x_3\) into equation \(\left(\ref{7}\right)\) to get \(x_2 = 1\).

Therefore, \(\left(1,1,1\right)\) is the *unique solution* here, and we can verify it by plugging \(\left(x_1, x_2, x_3\right) = \left(1,1,1\right)\) into all three equations.

Let us now consider a third example. Look at the following set of linear equations:

\begin{align}

x_1 + x_2 + \hspace{0.2cm}x_3 = 3\label{8} \\

x_1 - x_2 + 2x_3 = 2\label{9} \\

2x_1 \qquad+ 3x_3 = 5\label{10}

\end{align}

Here, we can immediately notice that equations \(\left(\ref{8}\right)\) and \(\left(\ref{9}\right)\) add to give us equation \(\left(\ref{10}\right)\). Therefore, we can omit the third equation (redundancy). Now, from equations \(\left(\ref{8}\right)\) and \(\left(\ref{9}\right)\), we get \(2x_1 = 5 - 3x_3\) and \(2x_2 = 1 + x_3\) (We added both the equations to get the first one and subtracted equation \(\left(\ref{9}\right)\) from \(\left(\ref{8}\right)\) to get the second equation).

Now, we define \(x_3 = a \in \mathbb{R}\) as a free variable, such that any triplet

\[ \left( \frac{5}{2} - \frac{3}{2}a, \frac{1}{2} + \frac{1}{2}a, a \right), \hspace{1cm}a \in \mathbb{R} \]

is a solution of the system of linear equations, i.e., we obtain a solution set that contains *infinitely many* solutions.
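The claim that every such triplet works can be checked with a few lines of code (a small sketch; the sample values of \(a\) are arbitrary):

```python
# Each triplet (5/2 - 3a/2, 1/2 + a/2, a) should satisfy all three
# equations of the system, for any real value of the free variable a.
def triplet(a):
    return (5 / 2 - 3 / 2 * a, 1 / 2 + 1 / 2 * a, a)

for a in [-2.0, 0.0, 1.0, 7.5]:
    x1, x2, x3 = triplet(a)
    assert abs(x1 + x2 + x3 - 3) < 1e-9      # first equation
    assert abs(x1 - x2 + 2 * x3 - 2) < 1e-9  # second equation
    assert abs(2 * x1 + 3 * x3 - 5) < 1e-9   # third (redundant) equation
```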

So, we have found that, in general, a real-valued system of linear equations has either zero, exactly one, or infinitely many solutions.

In a system of linear equations with two variables \(x_1 , x_2\) , each linear equation defines a line on the \(x_1x_2\)-plane. Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines. This intersection set can be a line (if the linear equations describe the same line), a point, or empty (when the lines are parallel).

Now, look at the following system of linear equations:

\begin{align}

4x_1 + 4x_2 = 5\label{11} \\

2x_1 - 4x_2 = 1\label{12}

\end{align}

An illustration for the above system is provided below.

where the solution space is the point \( \left( x_1, x_2 \right) = \left( 1, \frac{1}{4} \right) \), as I have indicated in the illustration above. Similarly, for three variables, each linear equation determines a plane in three-dimensional space. When we intersect these planes, i.e., satisfy all linear equations at the same time, we can obtain a solution set that is a plane, a line, a point or empty (when the planes have no common intersection).

For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation. We collect the coefficients \(a_{ij}\) into vectors and collect the vectors into matrices. In other words, we write the system from \(\left(\ref{1}\right)\) in the following form:

\[ \begin{bmatrix}

a_{11} \\

\vdots \\

a_{m1}

\end{bmatrix} x_1 +

\begin{bmatrix}

a_{12} \\

\vdots \\

a_{m2}

\end{bmatrix}x_2 + \dots +

\begin{bmatrix}

a_{1n} \\

\vdots \\

a_{mn}

\end{bmatrix}x_n =

\begin{bmatrix}

b_{1} \\

\vdots \\

b_{m}

\end{bmatrix} \]

\[

\Longleftrightarrow \begin{bmatrix} a_{11}& \dots & a_{1n} \\

\vdots & & \vdots \\

a_{m1} & \dots & a_{mn} \\

\end{bmatrix}

\begin{bmatrix}

x_1 \\

\vdots \\

x_n

\end{bmatrix}

=

\begin{bmatrix}b_{1} \\\vdots \\b_{m}\end{bmatrix}

\]
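In this matrix form \(Ax = b\), a system with a unique solution can be handed directly to a linear algebra routine. A sketch (assuming NumPy), using the system from earlier whose unique solution is \(\left(1,1,1\right)\):

```python
import numpy as np

# Coefficient matrix A and right-hand side b for the system
#   x1 + x2 +  x3 = 3
#   x1 - x2 + 2x3 = 2
#        x2 +  x3 = 2
A = np.array([[1.0,  1.0, 1.0],
              [1.0, -1.0, 2.0],
              [0.0,  1.0, 1.0]])
b = np.array([3.0, 2.0, 2.0])

x = np.linalg.solve(A, b)  # x should come out approximately (1, 1, 1)
```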

In order to understand how to solve these linear equations, we need to have a close look at these *matrices* and define computation rules – which we will do in a later article. I hope you liked this introduction to linear algebra through systems of linear equations. Please feel free to leave any feedback in the comment section below and consider subscribing for newsletter updates.

In my last article, I tried to give a beginner-friendly explanation of the *Big O Notation*, but we skipped out on a lot of stuff. So I am going to talk about worst-case running times in more detail this time.

One thing to keep in mind is that Big O is just a notation and can be used to denote a variety of things but we will stick to the worst case running time only in this article.

As you already know, we are focusing on the worst-case running times but even talking about worst-case running time can be tricky if we are trying to be precise. We have to identify the *worst case input* in detail, which can be counter-intuitive and then work through a lot of different cases. Also, if you followed the last article, we were counting the single operations or steps of different algorithms, which might get annoying as the algorithm becomes more and more complex.

As we discussed earlier, we drop the constants for the Big O notation (again, read the last article) — so, if an algorithm has a running time of \(2n^2 + 23\) operations and another algorithm has a running time of \(2n^2 + 27\) operations (both for an input of size n), we wouldn't really care about the 23 or 27 — we would say that they essentially have the same running time.

If you analyze the math carefully, you'll notice a big flaw in what I just said about dropping the constants. According to the statement above, an algorithm with a running time of \(10000n\) and another algorithm with a running time of \(n\) would have the same running time according to the Big O notation, i.e. \(O(n)\). But any first grader can argue that a function which takes \(10000n\) time units to run is way slower than a function which takes \(n\) time units to run.

The fact that the functions are both \(O(n)\) doesn't change the fact that they don't run in the same amount of time, since that's not what Big O notation is designed for. Big O notation only describes the growth rate of algorithms in terms of mathematical function, rather than the actual running time of algorithms on some machine.

Mathematically speaking, let us assume two functions \(f(x)\) and \(g(x)\) be positive for \(x\) sufficiently large. We say that \(f(x)\) and \(g(x)\) grow at the same rate as \(x\) tends to infinity, if

$$ \displaystyle \lim_{x \to \infty} \frac{f(x)}{g(x)} = M \neq 0 \text{ (M is a finite non-zero number)}. $$

Now, let \(f(x) = x^2\) and \(g(x) = \frac{x^2}{2}\); then \(\lim_{x\to\infty} \frac{f(x)}{g(x)} = 2\), a finite non-zero number. Therefore, \(x^2\) and \(\frac{x^2}{2}\) have the same growth rate, and hence we can say \(O(x^2)\) is equal to \(O(x^2/2)\).
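A quick numerical sanity check of this example (a sketch; for these two functions the ratio is exactly 2 for every positive x, not just in the limit):

```python
# f(x) = x^2 and g(x) = x^2 / 2 grow at the same rate: their ratio is
# the constant M = 2 no matter how large x gets.
def f(x):
    return x ** 2

def g(x):
    return x ** 2 / 2

for x in [10, 1_000, 1_000_000]:
    assert f(x) / g(x) == 2
```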

Yes, I get you mate — you're tired of hearing that the running time of this algorithm is this much and the running time of that algorithm is that much, but you never got to know how the heck they are even calculated! So, in order to make you understand, and at the same time to keep this very simple, we will go back to counting. We will count time units for every operation performed, and we will stick to 3 small rules:

- '**simple**' operations (*such as adding two numbers, multiplying them or executing an if or for statement*) take *1 time unit* each
- loops count as often as they run (*for example, if you have a loop that executes a simple operation 100 times, that would count as 100 time units*)
- memory access is free, i.e. *0 time units* (*example: reading a part of the input or writing something to a memory cell*)

In order to understand the above rules, let's look at some examples. Look at the lines of code below and try to guess how many time units this code chunk takes to execute:

```
a = 1
b = 2 * a
```

So, the answer here is that the machine is going to take just a single time unit to execute these two lines of code, because the line `a = 1` is just a memory access, and we get memory access for free. The second line performs a simple operation, which takes 1 time unit.

Now, let's try a slightly more challenging example. It's not really challenging, but a little more difficult than the first one. Look at the code below, try to find the number of time units, and note that the comparison `while s > 0` counts as a simple operation:

```
s = 5
while s > 0:
    s = s - 1
```

If your answer is 11 time units, then you're right! The first line `s = 5` is free, as it is a memory access. Then the second line is executed 6 times, as **s** starts at 5 and then becomes 4, 3, 2, 1 and 0, which takes 6 time units. Finally, the last line is executed 5 times, which takes 5 time units – bringing the total to 11 time units.
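The tally above can be verified by instrumenting the loop (a sketch of the counting rules, with `time_units` as a hypothetical bookkeeping variable):

```python
# Apply the three rules to the while-loop example: assignments are free
# memory accesses, while each comparison and each subtraction costs 1 unit.
time_units = 0

s = 5                    # memory access: free
while True:
    time_units += 1      # the comparison "s > 0" costs 1 unit
    if s <= 0:
        break
    s = s - 1
    time_units += 1      # the subtraction costs 1 unit

print(time_units)  # -> 11
```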

So, as you can see, exactly counting the number of time units – even though we have a very simple model of just 3 rules and code that doesn't even have a variable input – is already quite tedious. That's why we introduce some additional simplifications (*Spoiler: Approaching the Big O!*) that give us a little bit of leeway here, so that we don't have to go through this exact counting process but still learn about the algorithm.

Now, you already know that the *running time of an algorithm varies according to the size of the input (n)*, but the structure of the input matters as well. The next algorithm takes as input a string

```
INPUT: String s[0]...s[n-1] // string of length n
```

i.e., a string of length n, and it counts the number of times the character 'a' appears in the string by looping over the string: if it finds the character 'a', it increases the counter. As discussed earlier, we are going to take the `for character in s:` line as a simple operation (1 time unit per iteration) and the comparison `if character == 'a':` as a simple operation. Now, for the last time, I want you to count the number of time units taken by the code to execute.

```
count = 0
for character in s:
    if character == 'a':
        count = count + 1
```

There are actually two correct answers here, depending upon how you think about it. Both answers have \(2n + 1a\) in common, and depending upon how you count the `for character in s:` line, the answer would be either \(2n + 1a + 0\) or \(2n + 1a + 1\). So, let's discuss why. The first line `count = 0` obviously takes 0 time units. The second line either takes **n time units**, if you assume that the for loop goes exactly through each character of the string and then stops immediately, or it takes **n + 1 time units**, if you assume that it is executed like a regular for or while loop. This again shows that it can be very annoying to do exact time counting. The next line, though, is always executed **n times**. And finally, the counter is increased only when the algorithm encounters an 'a', so that line is executed exactly **a times**. If you sum up, you get either \(2n + 1a + 0\) or \(2n + 1a + 1\) as your answer.

Let's assume for now that the running time of the above algorithm is \(2n + 1a + 1\). As you've noticed, even for this simple algorithm, the running time depends on both the size of the input and the structure of the input – which, in this case, is the number of times the letter 'a' occurs. This is of course very problematic, because on the one hand, when we get more complicated algorithms, the formula (*here:* \(2n + 1a + 1\)) is going to get very complicated, and on the other hand, we don't even know what kinds of strings the algorithm will encounter, so we cannot get rid of the variable **a** without making any assumptions.
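The formula \(2n + 1a + 1\) can be checked by instrumenting the counter (a sketch; `count_a_time_units` is a hypothetical helper name, and I use the n + 1 convention for the loop line):

```python
# Tally time units for the 'a'-counting algorithm: n + 1 units for the
# for line (counted like a regular loop), n units for the comparison,
# and one extra unit per increment -- 2n + a + 1 in total.
def count_a_time_units(s):
    time_units = len(s) + 1   # the for line, n + 1 executions
    count = 0                 # memory access: free
    for character in s:
        time_units += 1       # the comparison costs 1 unit
        if character == 'a':
            count = count + 1
            time_units += 1   # the addition costs 1 unit
    return count, time_units

count, units = count_a_time_units("banana")
# n = 6 and a = 3, so the total should be 2*6 + 3 + 1 = 16
```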

Based upon all this information, we can take 3 kinds of views with regard to our running time:

- Optimistic View - **Best Case Scenario** - *when we do not encounter the letter 'a' in our input at all, i.e. the running time becomes \(2n + 1\)*
- Average View - **Average Case Scenario** - *Gosh, how do we even define an average input?*
- Pessimistic View - **Worst Case Scenario** - *when all the characters of the string s are 'a' – that means we have n a's (or a = n), i.e. the running time becomes \(2n + 1n + 1\), or \(3n + 1\)*

So, which one of the three are we going to choose? The best case running time is often rather trivial or meaningless. For example, if we use the best case running time for our algorithm that counts the number of a's, it would only be valid for strings that contain no 'a' at all – we can't take this. The average case view could be very interesting in practice, because we usually run an algorithm many times and may care less about a single run than about the behaviour over many inputs. But, as I told you earlier, it's hard to define what an average input looks like, so as interesting as the average case is, it is not suitable here. We will **always assume that the algorithm receives the input that makes it run as long as possible**, because that offers *guarantees* — by taking a worst-case view, we know that our algorithm will not run longer than what the worst-case analysis suggests, no matter what happens!

Let's consider two different algorithms: Algorithm A with a running time of \(3n^2 - n + 10\) and Algorithm B with a running time of \(2^n - 50n + 256\). When you compare these two algorithms, you can easily overlook the numbers 10 and 256, because those are just constants. We also don't really care about the \(-n\) or \(-50n\), because we just look at the growth of \(2^n\) vs \(3n^2\). And even the 3 in \(3n^2\) is not relevant here, because even if we had \(5n^2\) or \(6n^2\) or even \(100n^2\), \(2^n\) would still grow much faster (*Why? Get a basic mathematics book and find out!*).

So, what did we just do when we determined that the running time of Algorithm B grows much faster than the running time of Algorithm A? Well, first of all, we said that there was some value of n (*some value for the size of the input*) where the Algorithm B's running time function is always larger than Algorithm A's running time function. So, considering the running time of Algorithm A is some function \(f(n)\) and the running time of Algorithm B is some function \(g(n)\) and if \(g(n)\) grows faster than \(f(n)\) then there must be some value of n for which \(g(n)\) is larger than \(f(n)\) and for any value larger than that, the same should also be true \((g(n) > f(n))\) — Let's call that value \(n'\). Based on all of this, we can conclude that:

- There is some number \(n'\), so that \(g(n') \geq f(n')\)
- For any \(n'' > n'\), we have \(g(n'') \geq f(n'')\)

Now, I also said that we do not want to care about constants — so we do not care whether \(f(n)\) starts with \(3n^2\) or \(5n^2\) — we would just say that the function \(f(n)\) basically grows like \(n^2\). In order to do that, we need another number, a constant \(c\), to multiply \(g(n)\) with, which allows us to scale the function \(g(n)\) – basically speaking, if we can multiply the function \(g(n)\) with some number so that it outgrows \(f(n)\), then we would still be satisfied. Then we would say that \(g(n)\) grows at least as fast as \(f(n)\). So the first statement can be restated as:

*There are some numbers \(n'\) and \(c\), so that \(c\cdot g(n') \geq f(n')\)*

It can also be said that, \(f(n)\) is contained in Big O of \(g(n)\) and is represented as \(f(n) \in O(g(n))\). So, **Big O means that \(g(n)\) is a function that grows at least as fast as \(f(n)\)**.
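This definition can be played with directly in code. A sketch showing that \(f(n) = 3n^2 - n + 10\) is in \(O(n^2)\); the particular choices \(c = 4\) and \(n' = 3\) are mine and are not unique:

```python
# f(n) = 3n^2 - n + 10 is in O(n^2): with c = 4 and threshold n' = 3,
# c * g(n) >= f(n) holds for every n >= n'.
def f(n):
    return 3 * n ** 2 - n + 10

def g(n):
    return n ** 2

c, n_prime = 4, 3
assert all(c * g(n) >= f(n) for n in range(n_prime, 10_000))
```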

Big O notation is very useful because we can use it to concentrate just on the fastest growing part of a function, without all the involved constants. Now let's work through some examples, so that you learn to recognize these bounds correctly.

- \(3n + 1 \in O(n)\) — *obvious, meh* 🤷🏻‍♂️
- \(18n^2 - 50 \in O(n^2)\) — *meh again* 🤷🏻‍♂️
- \(2^n + 30n^6 + 123 \in O(2^n)\) — *as \(2^n\) grows faster than \(n^6\)*
- \(2^n \cdot n^2 + 30n^6 + 123 \in O(2^n \cdot n^2)\) — *\(2^n \cdot n^2\) being the fastest growing function*

FUN TRIVIA: Big O notation is also sometimes called the Landau notation, named after the German mathematician Edmund Landau — he didn't actually invent the notation, that was done by another German mathematician called Paul Bachmann, but Landau popularized it in the 20th century. In a way, this is quite ironic, because Edmund Landau was known as one of the most exact and pedantic mathematicians. His goal was uncompromising rigour. Even in his lectures, he used to have an assistant who was instructed to interrupt him if he omitted even the slightest detail. It is interesting that he introduced a notation that omits all the details.

So, now that you've learnt the basics of Big O and worst-case analysis, let's take a look at another example and try to calculate the running time – but this time, we will use the Big O! We are using a similar example as earlier, but instead of counting the number of times the letter 'a' appears in the string, we are counting the number of times the sequence 'ab' appears.

```
INPUT: String s[0]...s[n-1] // string of length n
```

```
count = 0
for i in range(n - 1):
    if s[i] == 'a':
        if s[i+1] == 'b':
            count = count + 1
```

As you remember, this was painful to do when we had to manually count everything. But now that we have the Big O notation available, our task is much easier, because we can just make two observations and will be able to state the running time. The first observation is that the algorithm goes through the string one character at a time, and since it always looks at a single character and the next character, **the algorithm will look at each character in the string at most twice**. The second thing to notice is that each time the algorithm considers a character, it performs a constant number of operations — if it finds an 'a' it does one or two operations, and if it doesn't find an 'a', it does zero operations — and this is an advantage, because we can ignore the constants while using the Big O notation. So overall, this means that if you have an input of length n, the algorithm will perform some constant times n steps plus some constant for all the rest of the operations, i.e., \(c_1 \cdot n + c_2\), which indeed is \(\in O(n)\).
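For completeness, here is the pseudocode above as a runnable function (a sketch; `count_ab` is a hypothetical name):

```python
# Count the number of times the sequence 'ab' appears in a string,
# exactly as in the pseudocode: check each position and its successor.
def count_ab(s):
    n = len(s)
    count = 0
    for i in range(n - 1):
        if s[i] == 'a':
            if s[i + 1] == 'b':
                count = count + 1
    return count

print(count_ab("abcabba"))  # -> 2
```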

Let's consider one more example and try to find out the running time of this code:

```
result = 0
for i in range(0, n):
    for j in range(i, n):
        result = result + j
```

And the answer for this is \(O(n^2)\). So let's discuss why! The first line is a memory access, so it's free. As discussed earlier, the second line is executed either n or n + 1 times – that doesn't make any difference now that we are using the Big O, so say the second line is executed *n times*. Now, we need to find out how often the inner loop is executed. The first time, it's executed n times, then the next time it is executed n - 1 times, and so on, because as the value of **i** increases, the inner loop is executed fewer and fewer times. So, the total number of times the inner loop body is executed is `n + (n-1) + (n-2) + ... + 2 + 1`, which is equal to \( \frac{n^2 + n}{2} \). And again, we can ignore the constant factor \( \frac{1}{2} \), leaving \( n^2 + n \). Adding the n executions of the outer loop line, the complete total is \(n^2 + 2n \in O(n^2)\).
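The closed form for the inner-loop count can be verified empirically (a sketch; `inner_iterations` is a hypothetical helper):

```python
# Count how many times the inner loop body runs and check it against
# the closed form (n^2 + n) / 2, i.e. n + (n-1) + ... + 2 + 1.
def inner_iterations(n):
    total = 0
    for i in range(0, n):
        for j in range(i, n):
            total += 1
    return total

for n in [1, 5, 100]:
    assert inner_iterations(n) == (n ** 2 + n) // 2
```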

That was a long article — I hope you didn't get bored reading it. Well, once again, this is the point where I beg you to follow the blog. If you found this information helpful, share it with your friends, and if you have any feedback, please leave it in the comment section below and consider subscribing for newsletter updates.

Data Structures and Algorithms is about solving problems efficiently. A bad programmer solves their problems inefficiently, and a really bad programmer doesn't even know why their solution is inefficient. So, the question is, *how do you rank an algorithm's efficiency?*

If you want to learn the math involved with the Big O, read Analysing Algorithms: Worst Case Running Time.

The simple answer to that question is the **Big O Notation**. How does that work? Let me explain!

Say you wrote a function which goes through every number in a list and adds it to a *total_sum* variable.

```
# Function 1
def find_sum(number_list):
    total_sum = 0
    for num in number_list:
        total_sum += num
    return total_sum
```

If you consider "addition" to be 1 operation then running this function on a list of 10 numbers will cost 10 operations, running it on a list of 20 numbers costs 20 operations and similarly running it on a list of n numbers costs the *length of list* (n) operations.

Now let's assume you wrote another function that would return the first number in a list.

```
# Function 2
def first_sum(number_list):
    return number_list[0]
```

Now, no matter how large this list is, this function will never cost more than one operation. Clearly, these two algorithms have different **time complexity**, or *relationship between growth of input size and growth of operations executed*. We communicate these time complexities using *Big O Notation*.

Big O Notation is a mathematical notation used to classify algorithms according to how their run time or space requirements grow as the input size grows.

Referring to the input size as *n*, common complexities (ranked from good to bad) are:

- Constant - **O(1)**
- Logarithmic - **O(log n)**
- Linear - **O(n)**
- n log n - **O(n log n)**
- Quadratic - **O(n²)**
- Exponential - **O(2ⁿ)**
- Factorial - **O(n!)**

Our first algorithm runs in *O(n)*, meaning its operations grew in a linear relationship with the input size - in this case, the amount of numbers in the list. Our second algorithm is not dependent on the input size at all - so it runs in constant time.

Let's take a look at how many operations a program has to execute for a function with an input size of *n = 5* vs *n = 50*.

| Big O of | n = 5 | n = 50 |
|---|---|---|
| O(1) | 1 | 1 |
| O(log n) | 4 | 6 |
| O(n) | 5 | 50 |
| O(n log n) | 20 | 300 |
| O(n²) | 25 | 2500 |
| O(2ⁿ) | 32 | 1125899906842624 |
| O(n!) | 120 | 3.0414093e+64 |

It might not matter when the input is small, but this gap gets very dramatic as the input size increases.

If n were 10000, a function that runs in *log(n)* would only take 14 operations and a function that runs in *n!* would set your computer on fire!
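The most dramatic rows of the table can be checked exactly (a sketch using the standard library; the other rows in the table are rough, rounded figures):

```python
import math

# The exponential and factorial entries for n = 50, computed exactly.
n = 50
print(2 ** n)             # -> 1125899906842624
print(math.factorial(n))  # roughly 3.0414093e+64
```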

For Big O Notation, we *drop constants*, so *O(10n)* and *O(n/10)* are both equivalent to *O(n)*, because the graph is still linear.

Big O Notation is also used for **space complexity**, which works the same way - *how much space an algorithm uses as n grows* or *relationship between growth of input size and growth of space needed*.

So, yeah! This has been the simplest possible explanation of the Big O Notation from my side and I hope you enjoyed reading this. If you found this information helpful, please share it with your friends. Also, feel free to leave any feedback in the comment section below and consider subscribing for newsletter updates.
