I am glad that I was not the only person to go through the need of SQL production through Portuguese-speaking lands.
As @Motta spoke in his confusing and unattractive response, you can use the fact that the product log is equal to the sum of the logs and then turn your productive for summation (with some adaptations that need to be cancelled later, of course).
So let’s dive into the land of logarithms?
This answer will be divided into a few sections, to make it easier to manage and to be able to segment the reading. They are:
- from products to sums and back again, with SQL
- Product log is the sum of the log, but why?
- say no to unnecessary power: multiplying
log(x)
by a constant
NOTE: as logarithms are only well defined with real picture for numbers > 0
, that response fails miserably if the dataset has at least one number in that situation, it will need to make an adaptation.
You can check on Sqlfiddle my answer to the data reported in the question. Noted that the answer to the case B
is different from the expected? Well, my answer is the true geometric mean, maybe it was a floating point error/accuracy when writing the expected result.
From products to sums and back again, with SQL
You have a problem that is the calculation of a product. An example of productive well
basic is the following, for those who are not used to:
What does that mean? Does that mean y = x0 * x1 * x2 ... * x10
. The product of the elements of the set, a counterpoint to the sum of the elements of the set.
Note how extremely similar it is to a sum:
In this case, the product is represented by the Greek letter pi, whose sound is reminiscent of P of Portuguese and denotes product. The sum is represented by the Greek letter sigma, whose sound is reminiscent of S of Portuguese and denotes summing up.
The sum, then, is the same idea of production but using the sum operator: y = x0 + x1 + x2 ... + x10
Assuming as truth that log(A*B) = log(A) + log(B)
, we have to log(x0 * x1 * x2 ... * xn) = log(x0) + log(x1 * x2 ... * xn) = log(x0) + log(x1) + log(x2) ... + log(xn)
. So, we can make a sum of the logarithms, however, to completely replace, we need to undo the logarithm of the play.
But how? Simple, by its definition.
In the definition of logarithm, we have to e^x = n
, with e
and n
known. The logarithm is the operation used to find the x
of the question, therefore e^x = n <==> x = log(n)
.
From that definition, we have to e^(log(x0 * x1 * x2 ... * xn)) = x0 * x1 * x2 ... * xn
. Precisely the definition of logarithm. But thanks to the property log(A*B) = log(A) + log(B)
, we can replace the product by sum without losing semantics: e^(log(x0) + log(x1) + log(x2) ... + log(xn)) = x0 * x1 * x2 ... * xn
.
How to represent this in SQL? Using the exponential sum of the logarithms of a given column:
SELECT
campo,
exp(sum(log(valor))) as produtorio
FROM
dados
GROUP BY
campo
Note that, as it is not necessary to explain the basis, I am not using nor LOG10
nor POWER(10, expoente)
, yet simple LOG
and EXP
Product log is the sum of the log, but why?
Be A
and B
any two positive real numbers. No other imposition on them is made. That means there is some a = log2(A)
and b = log2(B)
which are the logarithms of A
and B
respectively in base 2.
If I ever multiply A*B
I’ll get a number C
, also real and positive. How a = log2(A)
, that means that A = 2^a
. Similarly we have B = 2^b
. Soon, A*B = C = 2^a * 2^b
. We can simplify the power of 2 like this: C = 2^(a+b)
.
And what if I took the log2
of everything?
Well, knowing that C = 2^(a+b)
, we have to log2(C) = a + b
. But we must not forget that C = A*B
. Soon, log2(A*B) = a + b
. Replacing the initial formulas of a
and b
:
log2(A*B) = log2(A) + log2(B)
The calculations were made on base 2 by illustration, but could be any basis, even φ.
Say no to unnecessary power: multiplying log(x) by a constant
But you don’t just want the product, you want the geometric average. The product was just a means to reach the geometric average. The formula for the geometric mean of a set is given by the following formula:
That is, it is the exponentiation of the production by the inverse of the quantity of elements in the production.
If you’re going to transport exactly that to SQL it looks something like this:
SELECT
campo,
power(exp(sum(log(valor))), 1/count(*)) as media_geo
FROM
dados
GROUP BY
campo
But it gets weirder to exponent twice in a row, don’t you think? Besides appearing a lot of computational demand for nothing? And, yes, you’re right: there’s work being done unnecessarily here.
It has a very important property of exponentiation that we’re going to use it now: it’s distributive about multiplication:
And there’s another cool property of logarithms: log(a^b) = b * log(a)
. From here, we can quietly arrive at this transformation:
In SQL, it would look like this:
SELECT
campo,
exp(sum(log(valor))/count(*)) as media_geo
FROM
dados
GROUP BY
campo
Using these formulas I get the following error, "Invalid floating point operation occurred."
– henriquew
I already got this error when I passed negative numbers or zeroes to the log. I will test more calmly the query against the database provided in your question.
– Jefferson Quesado
The reply of Quesado was 10 , I will keep mine as a reminder , never respond from the tablet laziness makes you edit badly and dirty .. this solution via logarithm and potentiation has the problem of negative or very large values.
– Motta
In the values provided by the question worked. The problem may be the presence of null or negative numbers, for which the log is not defined in real. By the way, the geometric mean of the case
B
is 12.5489..., as I found in my consultation– Jefferson Quesado
@Motta, I got used to editing on the smartphone... I think the tablet is even easier. Obviously this answer here was not written from the smartphone, but from the notebook. In compensation, this one here was just smatphone, this other I used the smartphone a lot, but I can’t remember if it was just to create the images or if it was for everything
– Jefferson Quesado
@henriquew what data set used?
– Jefferson Quesado
Good afternoon, my error was the following, within the various records that had, havian numbers of type (0.1 , 0.8 , 0.59), in "fórumula Exp(sum(log(value))/Count(*)) as media_geo" then occurred the error "An invalid floating point operation occurred." Thank you all!
– henriquew
@henriquew had no problems with this data in the formula. I put these 3 values. I tested with 15 values of this suit and got no problems
– Jefferson Quesado
Try the following values (167.00 68.00 1876.00 225.00 2732.00 3754.00 9999.00 231.00 340.00 946.00 163.00 3348.00 819.00 1426.00 148.00 618.00 5721.00 1753.00&Xa;1978.00&Xa;3828.00 1104.00) or (0.57 0.88 1.63 1.38 0.32 0.57 1.32 1.38 2.22 0.79 3.85 0.46 0.63 0.86 0.88 0.86 2.19 1.47 0.32 0.82 2.40) in the formula "Exp(sum(sum(ANSDET.VALUE)/Count(*)) as VALUE"
– henriquew
Average of 882.011621537022, and average of 0.9984505031117724. I used the grouping
D
for the first set of numbers, andE
for the second set– Jefferson Quesado
My results were not perfect, but I did the best I could in the operations that SQL Server provides. Each floating point calculation implies losing precision 1 2 3.
– Jefferson Quesado
See the result for
D
, and toE
. Note that I only lost accuracy inD
, inE
was a satisfactory response– Jefferson Quesado