Load multiple Amazon S3 files

I’m trying to load multiple files from Amazon S3 using pandas on Anaconda, but I’m getting an error.

bg = s3.Bucket("bucket")
objects = bg.objects.filter(Prefix="bucket/")

for obj in s3.Bucket('bucket').objects.all():
    print (obj)

file_list = []
for obj in objects:
    df = pd.read_csv(f's3://bucket/{obj.key}')
    file_list.append(df)
    final_df = pd.concat(file_list)

Output:

s3.ObjectSummary(bucket_name='bucket', key='codigo_python.txt')
s3.ObjectSummary(bucket_name='bucket', key='codigo_python_V2.txt')

Error message

ValueError: No objects to concatenate
  • Shouldn’t the value of objects be s3.Bucket('bucket').objects.all()? That is, objects = s3.Bucket('bucket').objects.all() and then for obj in objects?

  • I changed it, and now it returns EmptyDataError: No columns to parse from file
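The "No objects to concatenate" error happens whenever pd.concat is given an empty list, which is what the Prefix="bucket/" filter produces here: object keys do not start with the bucket name, so the filter matches nothing. A minimal sketch reproducing the error and guarding against it:

```python
import pandas as pd

# file_list stays empty when the Prefix filter matches no keys --
# passing an empty list to pd.concat raises ValueError
file_list = []
try:
    pd.concat(file_list)
except ValueError as exc:
    print(exc)  # No objects to concatenate

# guard before concatenating
final_df = pd.concat(file_list) if file_list else pd.DataFrame()
```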

1 answer


The following change should work.

bg = s3.Bucket("bucket")
obj_summary = bg.objects.all()

file_list = []
for obj in obj_summary:
    print(obj)
    # obj.get() fetches the object through the resource API
    # (s3.get_object only exists on a boto3 client, not on a resource);
    # the returned 'Body' is a file-like stream read_csv can consume
    file_object = obj.get()
    df = pd.read_csv(file_object['Body'])
    file_list.append(df)

# concatenate once, after the loop has collected every DataFrame
final_df = pd.concat(file_list)
  • This gives the following error: EmptyDataError: No columns to parse from file. It already includes header=None, delim_whitespace=True, index_col=0.

  • @user223488, on which line of code do you receive this error?

  • On the read_csv line: df = pd.read_csv(f'S3://Bucket/{obj.key}', header=None, delim_whitespace=True, index_col=0)

  • @user223488, I changed the code in the post to read from file_object. Let me know what the result is.
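The EmptyDataError usually means read_csv was handed a zero-byte stream, e.g. a "folder" placeholder key that objects.all() also returns. A sketch of the guard, using io.BytesIO as a hypothetical stand-in for the streaming bodies S3 returns (sep=r"\s+" is equivalent to delim_whitespace=True):

```python
import io
import pandas as pd

# hypothetical stand-ins for the 'Body' streams returned by S3;
# the zero-byte entry mimics a folder placeholder key, which is
# what typically triggers "EmptyDataError: No columns to parse from file"
bodies = {
    "codigo_python.txt": io.BytesIO(b"a b c\n1 2 3\n"),
    "bucket/": io.BytesIO(b""),  # empty placeholder object
    "codigo_python_V2.txt": io.BytesIO(b"a b c\n4 5 6\n"),
}

file_list = []
for key, body in bodies.items():
    if key.endswith("/"):  # skip folder placeholders
        continue
    df = pd.read_csv(body, header=None, sep=r"\s+")
    file_list.append(df)

final_df = pd.concat(file_list, ignore_index=True)
print(final_df.shape)
```

With header=None each two-line file contributes two rows, so the concatenated frame has four rows and three columns.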
