k – Tutorials for learning q/kdb+

Hamming distance

March 6, 2018 | September 5, 2018 | aa1024

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

Source: Wikipedia, Hamming distance

In q/kdb+, finding the Hamming distance between two strings is trivial.

q)hammDist:{count where x<>y}
q)hammDist["karolin" ; "kathrin"]
3
q)hammDist[1011101b;1001001b]
2
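To make the definition concrete, here is a small sketch of the intermediate values for the Wikipedia example. The variable names a and b are only for illustration.

```q
a:"karolin"
b:"kathrin"
a<>b              / boolean vector, true where the characters differ
where a<>b        / indices (0-based) of the differing positions
count where a<>b  / number of differing positions, i.e. the Hamming distance
```

The comparison is element-wise, so the same expression works for any two lists of equal length, including the boolean vectors used above.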

The k equivalent would be:

k)hammDist:{#&~x=y}

In practice it would also need some input checks, such as a type check and a length comparison. The version below adds the length check:

q)hammDist:{$[count[x]=count[y];count where (),x<>y;'`mismatch]}
q)
q)hammDist["k" ; "k"]
0
q)hammDist["k" ; "ks"]
'mismatch
q)hammDist[1011101b;1001001b]
2
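The type check mentioned above could be added along the following lines. This is only a sketch: the name hammDistSafe and the error names `type and `length are choices made for this example, not anything built into q.

```q
/ hypothetical variant with both a type check and a length check
hammDistSafe:{[x;y]
  x:(),x; y:(),y;                       / promote atoms to one-item lists
  if[not type[x]=type y; '`type];       / both inputs must have the same list type
  if[not count[x]=count y; '`length];   / and the same number of items
  sum x<>y}                             / number of positions that differ

/ protected evaluation returns the error text instead of aborting
r:.[hammDistSafe;("k";"ks");{x}]
```

Here sum x<>y is used in place of count where x<>y; both count the true entries of the comparison, though sum returns an int rather than a long.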

Welcome 2014 !!!

January 1, 2014 | February 2, 2014 | aa1024

I was code golfing on Stack Exchange and came across a simple but interesting problem. The task was to write a program that prints 2014 without using any numeric characters [0123456789] and without depending on 2014 being the current year.

Without the second constraint, it would have been a trivial program in q: just take .z.d and cast it to year.

q)`year$.z.d
2014i

Since it was a code golf problem, I tried multiple solutions to save characters.

Here are some of the solutions I came up with; you are most welcome to add one if something shorter comes to mind.

q)/27 chars
q)k)+/"i"$"}}}}}}}}}}}}}}}JA"
2014

q)/21 chars
q)k)-[*/"i"$"..";"i"$"f"]
2014

q)/16 chars
q)k)(*/"i"$".,")-@""
2014

q)/17 chars
q)k)-(@.z.t)*"i"$"j"
2014
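All of these rely on the same trick: characters cast to their ASCII codes, and type numbers, supply the digits indirectly. As an illustration, the 16-character solution can be unpacked in plain q as follows (prd and type are the q spellings of the k */ and @ used there):

```q
"i"$".,"                 / ASCII codes of "." and ",": 46 and 44
prd "i"$".,"             / their product: 2024
type ""                  / type number of an empty string (a char list): 10
(prd "i"$".,")-type ""   / 2024 minus 10 gives 2014
```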

Mode

February 9, 2013 | March 28, 2013 | aa1024

The mode is the value that appears most often in a set of data.

Like the statistical mean and median, the mode is a way of expressing, in a single number, important information about a random variable or a population.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values.

Source: Wikipedia

The mode of a sample is the element that occurs most often in the collection.

For example: Mode of the sample {3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29} is 23.

The mode of a given sample is not necessarily unique. A dataset can have multiple modes: if it has two modes we call it bimodal, and if it has more than two we call it multimodal.

Bimodal example: {1, 3, 3, 3, 4, 4, 6, 6, 6, 9} has two modes, 3 and 6.

There is currently no built-in function in q for finding the mode. Let's try writing one; a simple function could be:

q)mode1:{key[d]where max[c]=c:count each value d:group x}
q)mode1  1 2 3 2 4 5 3 5 6 4 3 2
2 3

The above function is simple, but it does not use the features that come automatically with the dictionary datatype in q.

Here is a new definition, which returns all the modes of the input sample:

q)mode:{where max[c]=c:count each d:group x}
q)mode  1 2 3 2 4 5 3 5 6 4 3 2
2 3

Note that count, max and where behave differently when applied to a dictionary. Here "count each" counts the values stored under each key and returns a dictionary, max returns the largest of the dictionary's values, and where returns the keys whose value is true.

The k equivalent of the above mode function is:
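The dictionary behaviour is easier to see with the intermediate values spelled out. The following sketch walks through the same sample one step at a time; the names s, d, c and m are only for illustration.

```q
s:1 2 3 2 4 5 3 5 6 4 3 2
d:group s        / dictionary: each distinct value -> indices where it occurs
c:count each d   / dictionary: each distinct value -> its frequency
m:max c          / largest frequency (max works on the dictionary's values)
where c=m        / keys whose frequency equals the maximum, i.e. the modes

/ the bimodal sample from earlier, using the same steps in one expression
where c2=max c2:count each group 1 3 3 3 4 4 6 6 6 9
```

For the first sample the last step yields 2 3, and for the bimodal sample it yields 3 6.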

k)mode:{&:(|/c)=c:#:'=:x}
