How to add columns to a DataFrame in Julia?

5

I have the following generic DataFrame with 100 rows:

using DataFrames

df = DataFrame(X=LinRange(0.0,2π,100));

head(df)
6×1 DataFrame
│ Row │ X         │
│     │ Float64   │
├─────┼───────────┤
│ 1   │ 0.0       │
│ 2   │ 0.0634665 │
│ 3   │ 0.126933  │
│ 4   │ 0.1904    │
│ 5   │ 0.253866  │
│ 6   │ 0.317333  │

How can I, for example, calculate the sine of the values in column X and insert the result into a new column SinX?

3 answers

5


To include the new column, simply assign the value(s) to that new column.

Example:

df[:SinX] = sin(df[:X])
head(df)

    X                   SinX
1   0.0                 0.0
2   0.06346651825433926 0.0634239196565645
3   0.12693303650867852 0.12659245357374926
4   0.19039955476301776 0.1892512443604102
5   0.25386607301735703 0.2511479871810792
6   0.3173325912716963  0.31203344569848707

Update: The indexing form and the sin call used in the answer above are obsolete as of version 1.0.

The correct form of the command to insert the new column is:

df[!, :SinX] = sin.(df[:, :X])

The dot between the function sin and the parentheses indicates that the function is applied element-wise (a vectorized operation), and the index [:, :X] selects all rows (:) of column X. On the left-hand side, df[!, :SinX] assigns the resulting vector as the column SinX.

Result:

head(df)
6×2 DataFrame
│ Row │ X         │ SinX      │
│     │ Float64   │ Float64   │
├─────┼───────────┼───────────┤
│ 1   │ 0.0       │ 0.0       │
│ 2   │ 0.0634665 │ 0.0634239 │
│ 3   │ 0.126933  │ 0.126592  │
│ 4   │ 0.1904    │ 0.189251  │
│ 5   │ 0.253866  │ 0.251148  │
│ 6   │ 0.317333  │ 0.312033  │
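
As a minimal sketch of what the dot in sin.( ) does, the following uses only Base Julia on a plain vector instead of a DataFrame column. The variable names x, broadcasted, mapped, and looped are illustrative and do not come from the answer. It suggests that the dot form applies sin to each element and gives the same values as map or a comprehension:

```julia
# Plain vector standing in for the column X (illustrative data, Base Julia only).
x = collect(LinRange(0.0, 2π, 100))

broadcasted = sin.(x)               # dot syntax: apply sin to every element
mapped      = map(sin, x)           # same values using map
looped      = [sin(v) for v in x]   # same values using a comprehension

# Check that all three approaches agree element by element.
println(broadcasted == mapped == looped)
```

Calling sin(x) directly on the vector, without the dot, raises a MethodError in Julia 1.x, which is why the dot is needed.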

For more details, see the manual (in English): DataFrames.jl

2

I think using the map function is more efficient because of composition: you do not need to go through the DataFrame twice, once to initialize the column and once to assign the computed values.

df[:error] = map((x,y) -> x-y , df[:A], df[:B])

Although in your specific case it only leads to more code, I believe it makes the intent easier to understand:

df[:SinX] = map((x) -> sin(x), df[:X])

That is, for every entry of df[:X], a new entry is stored containing the result of applying sin(x).

  • Direct assignment traverses the DataArray df[:X] only once, just as map does. Although the code generated by the two forms is very similar (@code_lowered()), direct assignment is more efficient than map (for the example in the question, about 790 times faster, with about 450 times lower memory consumption), because it avoids checks and type conversions (to see these problems: @code_warntype(map((x) -> sin(x), df[:X])))

  • In this case, yes, because you are using only one variable. My idea was to offer an option for those who need to map something derived from two fields.
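
For the two-field case discussed in this answer, a minimal sketch could look like the following. It assumes DataFrames.jl 1.x, and the columns A and B with their values are made up for illustration. It uses the newer df[!, :col] and df.col forms rather than the older df[:col] indexing:

```julia
using DataFrames

# Hypothetical data: columns A and B are illustrative, not from the question.
df = DataFrame(A = [1.0, 2.5, 4.0], B = [0.5, 2.0, 1.0])

# map over two columns at once; the anonymous function receives one element from each.
df[!, :error] = map((a, b) -> a - b, df.A, df.B)

# Equivalent broadcast form, stored under a second column name for comparison.
df[!, :error2] = df.A .- df.B

# Check that both approaches produce the same column.
println(df.error == df.error2)
```

Both forms produce the same column here; the broadcast version is shorter, while map may read more naturally when the per-row computation is more involved.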

-1

Version 1.3.1

Example:

df[!, :SinX] = sin.(df[:, :X])
