Good afternoon. I am developing an artificial intelligence project. My neural networks are implemented and I am now training them. I started by training on my own computer, but I have since gained access to a server where I can train through Jupyter Lab, which speeds up the training process. The problem is that when reading some NumPy files on the server, I get encoding errors such as this:
InvalidArgumentError: UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 64: ordinal not in range(128)
Traceback (most recent call last):
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
ret = func(*args)
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 789, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 975, in generator_fn
yield x[i]
File "/home/jfm-castilho/Chargrid/dataset_generator.py", line 26, in __getitem__
batch_x.append(np.load(self.representation_path + file + ".npy", allow_pickle=True,encoding = 'latin1'))
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/numpy/lib/npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 64: ordinal not in range(128)
[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_13003]
Function call stack:
distributed_function
On my computer there is no problem reading the NumPy files; the error only occurs when I read them through Jupyter Lab. How can I fix this error? The line on which the error appears is the first line of the code snippet above.
Some considerations:
- The NumPy version is the same on both my computer and Jupyter Lab: 1.18.1
- The files read on my computer and in Jupyter Lab are the same (I uploaded the files to the server where Jupyter Lab runs, and the relative path to the files is the same on both).
- I have tested several approaches to solve the problem, such as:
np.load(self.representation_path + file + ".npy", allow_pickle=True,encoding = 'bytes')
np.load(self.representation_path + file + ".npy", allow_pickle=True,encoding = 'ascii')
np.load(self.representation_path + file + ".npy", allow_pickle=True,encoding = 'utf-8')
np.load(self.representation_path + file + ".npy", allow_pickle=True)
with open(self.representation_path + file + ".npy", 'rb') as file: arr = pickle.load(file)
None of these attempts changed the result; every one of them raised a UnicodeEncodeError.
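Since the traceback shows the exception being raised at open(os_fspath(file), "rb") inside np.load, before any file content is read, one thing worth checking is how the Python process on the server converts path strings to bytes. The following is a minimal diagnostic sketch, assuming the cause is an ASCII filesystem encoding on the server (this is an assumption, not something the log confirms). The helper names and the example path are hypothetical.

```python
import locale
import sys


def find_non_ascii(path):
    """Return (position, character) pairs for every non-ASCII character in path."""
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) > 127]


def describe_environment():
    """Collect the encodings Python uses for paths and text in this process."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    # Hypothetical path; replace it with the real path that is passed to np.load.
    example_path = "data/representa\u00e7\u00f5es/file_001.npy"
    print(describe_environment())
    # ascii() keeps the output printable even when stdout itself is ASCII-only.
    print(ascii(find_non_ascii(example_path)))
```

If the assumption holds, running this in the Jupyter Lab kernel would report an ASCII filesystem encoding, and find_non_ascii on the failing path would report a character at the position named in the error message.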
I don’t know if it helps the analysis, but the line where I store the NumPy array in a NumPy file is as follows:
np.save(repr_path_pad + simple_img_name[idx], data_padded)
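For reference, the encoding argument of np.load is documented as only affecting how Python 2 pickled strings are read, so for a plain numeric array written with np.save it should make no difference. The sketch below illustrates this with a small round trip. The array contents, the file name "example", and the roundtrip helper are made up for illustration and are not taken from the project.

```python
import os
import tempfile

import numpy as np


def roundtrip(array, encoding):
    """Save array to a temporary .npy file and load it back with the given encoding."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        base = os.path.join(tmp_dir, "example")  # hypothetical ASCII-only file name
        np.save(base, array)  # np.save appends ".npy" when the extension is missing
        return np.load(base + ".npy", allow_pickle=True, encoding=encoding)


if __name__ == "__main__":
    data = np.arange(6, dtype=np.float32).reshape(2, 3)
    for enc in ("ASCII", "latin1", "bytes"):
        loaded = roundtrip(data, enc)
        print(enc, np.array_equal(loaded, data))
```

With an ASCII-only path, all three accepted encoding values return the same array, which suggests that changing encoding alone is unlikely to affect an error that is raised while the file is being opened.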
This is the class where I read the files. It is a generator that is used when training the neural network. The batch size is 7, so it reads 7 files at a time.
import numpy as np
from tensorflow.keras.utils import Sequence  # assumed import; not shown in the original post


class RepresentationGenerator(Sequence):
    def __init__(self, representation_path, target_path, filenames, batch_size):
        self.filenames = np.array(filenames)
        self.batch_size = batch_size
        self.representation_path = representation_path
        self.target_path = target_path

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        length = len(self.filenames) // self.batch_size
        if len(self.filenames) % self.batch_size > 0:
            length += 1
        return length

    def __getitem__(self, idx):
        # Filenames that belong to batch number idx.
        files_to_batch = self.filenames[idx * self.batch_size: (idx + 1) * self.batch_size]
        batch_x = []
        batch_SS = []
        for file in files_to_batch:
            batch_x.append(np.load(self.representation_path + file + ".npy", allow_pickle=True))
            batch_SS.append(np.load(self.target_path + 'semantic segmentation/' + file + ".npy", allow_pickle=True))
        batch_x = np.array(batch_x).astype(np.float16)
        batch_SS = np.array(batch_SS).astype(np.float16)
        return batch_x, batch_SS
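The batch arithmetic used by __len__ and __getitem__ can be checked in isolation, without TensorFlow or any files on disk. This is a small sketch of the same indexing, assuming 16 made-up filenames and the batch size of 7 mentioned above. The function names num_batches and batch_slice are hypothetical.

```python
import math


def num_batches(num_files, batch_size):
    """Number of batches, counting a final partial batch (ceiling division)."""
    return math.ceil(num_files / batch_size)


def batch_slice(filenames, idx, batch_size):
    """Filenames that belong to batch number idx."""
    return filenames[idx * batch_size:(idx + 1) * batch_size]


if __name__ == "__main__":
    names = ["file_{:02d}".format(i) for i in range(16)]  # hypothetical filenames
    print(num_batches(len(names), 7))  # 16 files in batches of 7 give 3 batches
    print(batch_slice(names, 2, 7))    # the last batch holds the 2 remaining files
```

Listing the filenames of the failing batch this way may help narrow down which individual path triggers the error.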
Below is the code snippet where the above class is called:
train_generator = RepresentationGenerator(representation_path=repr_path_pad, target_path=target_path_pad,
                                          filenames=training_filenames, batch_size=self.batch_size)
val_generator = RepresentationGenerator(representation_path=representations_path, target_path=target_path,
                                        filenames=validation_filenames, batch_size=self.batch_size)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=model_name + '.h5',
                                                 save_weights_only=True,
                                                 verbose=1)
plot_history = PlotHistory(history_fit, model_name, self.model, model_path=model_path,
                           load_previous=load_previous)
self.model.fit(train_generator,
               steps_per_epoch=len(train_generator),
               verbose=1,
               epochs=num_epochs_train,
               validation_data=val_generator,
               validation_steps=len(val_generator),
               callbacks=[cp_callback, plot_history]
               )
Below is the full error log:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-1-9a8acfabebd2> in <module>
212 split_dataset_file=split_dataset, ocr_filename=ocr_file, annotated_filename=annotated_files,
213 num_epochs_trainning=num_epochs_train, history_fit=history_fit_image, width_padding=w_padding,
--> 214 upsample_path=original_repr_path, upsample_target_path=original_target_path)
<ipython-input-1-9a8acfabebd2> in main(images_path, representation_path, targets_path, repr_pad_path, target_padded_path, prefix, make_new_representation, train, use_previous_weights, split_dataset_file, model_filename, model_path, downsample, ocr_filename, annotated_filename, num_epochs_trainning, history_fit, width_padding, predict, upsample_path, upsample_target_path, update_dicts, num_chars)
94 split_dataset=split_dataset_file,
95 validation_filenames=data['val_imgs'], history_fit=history_fit,
---> 96 model_name=model_filename, num_epochs_train=num_epochs_trainning)
97 if predict: # if want to predict
98 if not train: # if neural network wasn't trained, load model
~/Chargrid/neural_network.py in train(self, representations_path, target_path, repr_path_pad, target_path_pad, training_filenames, validation_filenames, model_path, model_name, num_epochs_train, history_fit, split_dataset, batch_size)
85 epochs=num_epochs_train,
86 validation_data=val_generator,
---> 87 validation_steps=len(val_generator)
88 )
89 except KeyboardInterrupt:
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
817 max_queue_size=max_queue_size,
818 workers=workers,
--> 819 use_multiprocessing=use_multiprocessing)
820
821 def evaluate(self,
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
340 mode=ModeKeys.TRAIN,
341 training_context=training_context,
--> 342 total_epochs=epochs)
343 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)
344
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
126 step=step, mode=mode, size=current_batch_size) as batch_logs:
127 try:
--> 128 batch_outs = execution_function(iterator)
129 except (StopIteration, errors.OutOfRangeError):
130 # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in execution_function(input_fn)
96 # `numpy` translates Tensors to values in Eager mode.
97 return nest.map_structure(_non_none_constant_value,
---> 98 distributed_function(input_fn))
99
100 return execution_function
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
566 xla_context.Exit()
567 else:
--> 568 result = self._call(*args, **kwds)
569
570 if tracing_count == self._get_tracing_count():
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
597 # In this case we have created variables on the first call, so we run the
598 # defunned version which is guaranteed to never create variables.
--> 599 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
600 elif self._stateful_fn is not None:
601 # Release the lock early so that multiple threads can perform the call
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
2361 with self._lock:
2362 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2363 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2364
2365 @property
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _filtered_call(self, args, kwargs)
1609 if isinstance(t, (ops.Tensor,
1610 resource_variable_ops.BaseResourceVariable))),
-> 1611 self.captured_inputs)
1612
1613 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1690 # No tape is watching; skip to running the function.
1691 return self._build_call_outputs(self._inference_function.call(
-> 1692 ctx, args, cancellation_manager=cancellation_manager))
1693 forward_backward = self._select_forward_and_backward_functions(
1694 args,
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in call(self, ctx, args, cancellation_manager)
543 inputs=args,
544 attrs=("executor_type", executor_type, "config_proto", config),
--> 545 ctx=ctx)
546 else:
547 outputs = execute.execute_with_cancellation(
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
/opt/conda/envs/csw-aii/lib/python3.6/site-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 64: ordinal not in range(128)
Traceback (most recent call last):
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
ret = func(*args)
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 789, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 975, in generator_fn
yield x[i]
File "/home/jfm-castilho/Chargrid/dataset_generator.py", line 26, in __getitem__
batch_x.append(np.load(self.representation_path + file + ".npy", allow_pickle=True,encoding = 'latin1'))
File "/opt/conda/envs/csw-aii/lib/python3.6/site-packages/numpy/lib/npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 64: ordinal not in range(128)
[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_13003]
Function call stack:
distributed_function
Is the encoding of your input file even LATIN1? Have you tried using encoding="utf-8"?
– Lacobus
How can I find out which encoding is used? I do not pass this information when I save the NumPy array to the file. I never considered utf-8 because the NumPy documentation ( https://numpy.org/devdocs/reference/generated/numpy.load.htmlhighlight=load#numpy ) says that encodings other than 'ASCII', 'latin1', or 'bytes' should not be used.
– João Castilho
I have now tested with 'utf-8' and it gave the same error again.
– João Castilho