I'm working on a job using PySpark and SQL, and I have this function:
selection = F.udf(lambda x: cumulative_sum._1 if 100*random.random()<= x, FloatType())
cumulative_sum.withColumn('Selection2', selection(cumulative_sum.cumSum)
My table is called cumulative_sum, and my goal is to check whether each value in column cumSum is less than a randomly generated number. If it is, I want to put the corresponding value of column _1 in the new column Selection2. However, when I run this code, I get this error:
SyntaxError: invalid syntax
How can I fix this?
I'm sure you can do that without a UDF, which would be more efficient. Can you add an example input dataset and one showing the result you want?
– Dee
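On the SyntaxError itself: a likely cause is that a Python conditional expression needs an `else` branch, so `a if cond` on its own does not parse (the `withColumn` call also appears to be missing a closing parenthesis). Below is a minimal plain-Python sketch of the per-row logic, not a tested Spark solution. The name `select_value`, the `draw` parameter, and the sample rows are hypothetical and only there for illustration.

```python
import random


def select_value(value, cum_sum, draw=None):
    """Return `value` when a random draw in [0, 100) is <= cum_sum, else None."""
    if draw is None:
        draw = 100 * random.random()
    # A conditional expression must include an else branch;
    # writing only "value if draw <= cum_sum" is a SyntaxError.
    return value if draw <= cum_sum else None


# Hypothetical rows of (_1, cumSum), invented for illustration.
rows = [(1.5, 10.0), (2.5, 60.0), (4.0, 100.0)]

# A fixed draw is passed here so the result is reproducible.
selected = [select_value(v, c, draw=50.0) for v, c in rows]
```

In PySpark, the same idea would probably mean passing both columns into the UDF (a lambda of two arguments with the same `else None` branch) rather than referring to `cumulative_sum._1` inside it. Without a UDF, something along the lines of `F.when(100 * F.rand() <= F.col('cumSum'), F.col('_1'))` should leave null where the condition fails, though I have not run it against your data.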