Adding points in a known data range

0

I am working on data analysis in Python, and for this I am training SVC and K-Means algorithms. The training data have a fixed spacing between samples because they are sampled by an oscilloscope at fixed time intervals. I also have data obtained by simulation which, for performance reasons, have variable spacing between samples and fewer points. This makes it difficult to use the two data sources in the same analysis. Is there a method in numpy or pandas to perform this preprocessing of my simulation data?

An example of what must be done:

Simulation array = [ 1.0, 2.0, 3.0]

Processed array = [ 1.0, 1.5, 2.0, 2.5, 3.0]

  • Wouldn’t doing this to the simulated data change the outcome? Depending on how these data are used, inserting intermediate values could generate false results in your analysis.

  • The points to be added must make sense, something like values obtained through interpolation; they would only add more steps along the shape that already exists. I do not know pandas or numpy in depth, so I do not know whether this is possible.

2 answers

1


I fully agree with the comment about changing the data you will use as input. However, I know that we do not always have the data in the form we need. It might be more appropriate to downsample one of the data sets, which avoids errors introduced by the extra interpolated points.

With that warning given, what you want is interpolation!

To do this, use NumPy's interp function:

import numpy as np
x = np.linspace(0, 2*np.pi, 10)
y = np.sin(x)
xvals = np.linspace(0, 2*np.pi, 50)
yinterp = np.interp(xvals, x, y)

This example is the same as the one in the manual, which also explains the boundary conditions; these may be relevant to your application.
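To make the boundary behaviour concrete, here is a minimal sketch. The sample values are made up for illustration; `left` and `right` are the optional arguments of `np.interp` that control what is returned for query points outside the known range.

```python
import numpy as np

# Hypothetical sample data: three known points.
xp = np.array([1.0, 2.0, 3.0])
fp = np.array([10.0, 20.0, 30.0])

# Query points that fall below, inside, and above the known range.
x_query = np.array([0.0, 1.5, 4.0])

# Default behaviour: out-of-range queries return the first/last known value.
clamped = np.interp(x_query, xp, fp)

# Explicit fill values: flag out-of-range queries as NaN instead.
flagged = np.interp(x_query, xp, fp, left=np.nan, right=np.nan)

print(clamped)
print(flagged)
```

If the simulation covers a shorter time span than the oscilloscope capture, flagging with NaN is probably safer than silently repeating the end values, because the padded points are easy to drop before training.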

np.interp also works for downsampling.
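For the case in the question, a rough sketch of resampling unevenly spaced simulation data onto a fixed-period grid could look like this. The arrays `t_sim` and `y_sim` and the period `dt` are invented for illustration; replace them with your simulation output and the oscilloscope's sampling period.

```python
import numpy as np

# Hypothetical simulation output with uneven spacing between samples.
t_sim = np.array([0.0, 0.5, 2.0, 3.0, 4.0])
y_sim = np.array([0.0, 1.0, 4.0, 6.0, 8.0])

# Assumed fixed sampling period of the oscilloscope (same unit as t_sim).
dt = 1.0

# Build a uniform time grid that stays inside the simulated time range.
n_steps = int(np.floor((t_sim[-1] - t_sim[0]) / dt))
t_uniform = t_sim[0] + dt * np.arange(n_steps + 1)

# Linearly interpolate the simulation onto the uniform grid.
y_uniform = np.interp(t_uniform, t_sim, y_sim)

print(t_uniform)
print(y_uniform)
```

The same call covers both directions: a `dt` smaller than the simulation spacing adds points, a larger one drops them. Note that np.interp expects the known x values (`t_sim` here) to be increasing.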

  • Thank you so much for your help!

1

Following your tip, I read the NumPy and SciPy documentation. NumPy itself has a function called "interp" (see the documentation linked below), but I prefer the "scipy" package, which offers several kinds of interpolation, as in the following example:

First I import numpy and scipy:

import numpy as np
from scipy import interpolate

I create the data now:

data_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data_y = [10.0, 20.0, 30.0, 40.0, 50.0, 70.0, 90.0, 100.0, 200.0, 300.0]

I now interpolate:

interp = interpolate.interp1d(data_x, data_y)

Bear in mind that the interp variable now holds an object that can perform the interpolation.

Finally, I interpolate the new data:

new_x = np.arange(1, 10, 0.1)
new_y = interp(new_x)

Now the new_y variable contains a numpy array, like this one from the example:

array([ 10. , 11. , 12. , 13. , 14. , 15. , 16. , 17. , 18. , 19. , 20. , 21. , 22. , 23. , 24. , 25. , 26. , 27. , 28. , 29. , 30. , 31. , 32. , 33. , 34. , 35. , 36. , 37. , 38. , 39. , 40. , 41. , 42. , 43. , 44. , 45. , 46. , 47. , 48. , 49. , 50. , 52. , 54. , 56. , 58. , 60. , 62. , 64. , 66. , 68. , 70. , 72. , 74. , 76. , 78. , 80. , 82. , 84. , 86. , 88. , 90. , 91. , 92. , 93. , 94. , 95. , 96. , 97. , 98. , 99. , 100. , 110. , 120. , 130. , 140. , 150. , 160. , 170. , 180. , 190. , 200. , 210. , 220. , 230. , 240. , 250. , 260. , 270. , 280. , 290.])
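As an illustration of the "several kinds of interpolation" mentioned above, the sketch below compares the default linear interpolation with a cubic one through the `kind` argument of interp1d. It reuses the same sample values; the 0.5 step for the new grid is an arbitrary choice for the example.

```python
import numpy as np
from scipy import interpolate

# Same sample data as in the example above.
data_x = np.arange(1, 11)
data_y = np.array([10.0, 20.0, 30.0, 40.0, 50.0,
                   70.0, 90.0, 100.0, 200.0, 300.0])

# Two interpolators over the same data: linear (the default) and cubic.
linear = interpolate.interp1d(data_x, data_y)
cubic = interpolate.interp1d(data_x, data_y, kind="cubic")

# New grid with a step of 0.5, including both end points.
new_x = np.linspace(1, 10, 19)
y_linear = linear(new_x)
y_cubic = cubic(new_x)

# Both curves pass through the original samples; they differ in between.
print(y_linear[:5])
print(y_cubic[:5])
```

By default, interp1d raises a ValueError when asked for a point outside the range of data_x, so the new grid has to stay inside the original range unless you pass something like fill_value="extrapolate".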

Also, if you need to interpolate values in 1D, 2D, or 3D, take a look at the docs:

Function in numpy: https://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html

Function in the scipy: https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.interpolate.interp1d.html

All interpolations: https://docs.scipy.org/doc/scipy-0.19.1/reference/interpolate.html

Thank you so much for the help
