Convert string values from a dataset to float values

Question

Asked 6 years, 7 months ago

Good afternoon, everyone! I have a small problem with an assignment for a college course. I'm using a ready-made dataset taken from a previously published article.

The dataset looks something like this:

3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal
15,24.15,201196,23,24,tcp,1540,-------,16,6274,16092,24781700,Router,server1,20.176725,20.176725,20.186848,0,328.205808,505437,1540,0.236337,0,20.156478,20.186848,1,50.030211,Normal
24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood
24.9,9,443135,23,21,ack,55,-------,10,12670,16085,884675,Router,Switch1,39.62797,39.62797,39.637973,0,328.064183,18043.5,55,0.008437,0,39.617967,39.647976,1.030058,50.060098,Normal
24.8,8,157335,23,21,ack,55,-------,9,4901,16088,884840,Router,Switch1,16.039806,16.039806,16.04981,0,328.113525,18046.2,55,0.008438,0,16.029803,16.059813,1.030054,50.061864,Normal
24.1,1,219350,21,1,ack,55,-------,2,6837,16091,885005,Switch1,clien-1,21.885768,21.885768,21.895771,0,328.297902,18056.4,55,0.00844,0,21.865762,21.895771,1.030016,50.043427,Normal
24.13,13,480053,24,23,ack,55,-------,14,13609,16103,885665,server1,Router,42.45032,42.45032,42.460323,0,328.460278,18065.3,55,0.008446,0,42.45032,42.48033,1.030032,50.055747,Normal

It's a dataset the authors made available about DDoS attacks. I will use it to apply supervised classifiers such as Naive Bayes, Random Forest and Multi-Layer Perceptron (artificial intelligence).

The language I'm using is Python (required), and I'm using NumPy to load the dataset. The code looks like this:

np.set_printoptions(formatter={'float': lambda x: "{0:0.10f}".format(x)}) 
X = np.loadtxt("datasetTrabalho.data", delimiter=",")

But every time I try to run it, I get an error like this:

File "trabalho.py", line 190, in <module>
    main()
  File "trabalho.py", line 98, in main
    X = np.loadtxt("testeTrabalho.data", delimiter=",") # pega o dataset
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1101, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 746, in floatconv
    return float(x)
ValueError: could not convert string to float: 'tcp'

I need help converting the string values in this dataset to integer values so that I can use the appropriate classifiers for the job. I would also be interested if someone knows another library that solves this problem. I will be grateful for the help.

2 answers

To convert the data type of a DataFrame column (if using pandas), you can run:

DF['NomeDaColuna'] = DF['NomeDaColuna'].astype(float)   # converts to float, in this case

Since you will be converting strings to floats, make sure the strings have the format 'x.y', where x and y are numbers (it also works without the decimal part '.y').
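As a minimal sketch of the conversion described above: the frame below and its column names (`delay`, `protocol`) are made up for illustration and are not the real dataset's headers. It casts a column of numeric strings with `astype(float)` and uses `pd.to_numeric` with `errors="coerce"` to flag values that do not have the 'x.y' form.

```python
import pandas as pd

# Hypothetical frame; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "delay": ["35.529786", "20.176725", "7"],
    "protocol": ["tcp", "tcp", "ack"],
})

# Strings of the form 'x.y' (or just 'x') convert cleanly.
df["delay"] = df["delay"].astype(float)

# Labels such as 'tcp' cannot be cast with astype(float).
# pd.to_numeric with errors="coerce" turns them into NaN so they can be spotted.
not_numeric = pd.to_numeric(df["protocol"], errors="coerce").isna()
```

Columns flagged this way hold labels rather than numbers, so they need an encoding step instead of a cast.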

Solved, thank you very much!!!

– Arthur Abitante

2019/06/17 at 14:07
@Arthurabitante great!! :)

– Leonardo Bohac

2019/06/17 at 14:11

by Marlysson • **905** points · Answer 1 · 2018-12-06T22:05:51+00:00

The error occurs because NumPy tries to convert every value to float (the default dtype of loadtxt), but the first line contains the string "tcp", which raises the exception.
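The behaviour can be reproduced with a small sketch. The two rows below are shortened, made-up rows in the spirit of the dataset, not real records. Passing `dtype=str` is one possible way to get the file loaded with NumPy alone, after which the numeric columns can be cast individually.

```python
import io
import numpy as np

# Two shortened, made-up rows (assumption: same comma-separated layout as the dataset).
sample = "3,24.3,tcp,1540,Normal\n24.15,15,ack,55,UDP-Flood\n"

# With the default dtype (float), the text field raises ValueError.
try:
    np.loadtxt(io.StringIO(sample), delimiter=",")
    failed = False
except ValueError:
    failed = True

# Reading everything as strings succeeds; numeric columns can be cast afterwards.
raw = np.loadtxt(io.StringIO(sample), delimiter=",", dtype=str)
sizes = raw[:, 3].astype(float)
```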

ALTERNATIVE

You could use the pandas library, which is a data processing library.

With it you could read this dataset without trouble. I suggest using it together with Jupyter.
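A rough sketch of what that could look like, assuming the file has no header row. The sample rows are shortened and made up, and the `mappings` dictionary is a hypothetical helper for remembering which integer stands for which label. `pd.factorize` is one way to turn the string columns into integer codes, as the question asks.

```python
import io
import pandas as pd

# Shortened, made-up rows; with the real file, pass its path instead of StringIO.
sample = (
    "3,24.3,tcp,1540,Switch1,Normal\n"
    "24.15,15,ack,55,Router,UDP-Flood\n"
    "24.9,9,ack,55,Router,Normal\n"
)

# header=None because there is no header row; columns are numbered 0..n-1.
df = pd.read_csv(io.StringIO(sample), header=None)

# Replace every non-numeric column with integer codes,
# numbered in order of first appearance (0, 1, 2, ...).
mappings = {}
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        codes, uniques = pd.factorize(df[col])
        df[col] = codes
        mappings[col] = list(uniques)

# Last column is the class label; the rest are features.
X = df.iloc[:, :-1].to_numpy(dtype=float)
y = df.iloc[:, -1].to_numpy()
```

Integer codes impose an arbitrary order on the labels. For some classifiers a one-hot encoding such as `pd.get_dummies` may be a better fit for feature columns.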