I’m working on a personal project that involves stripping an article written in LaTeX down to the cleanest possible form before sending it for translation.
To increase my productivity, avoid leaking information, and make things easier for the translator, who is not very familiar with this kind of markup, I decided to send the original text in a "cleaner" form, for example without the equations. Later I can extend the same idea to figures, etc.
Problem: after extracting what I want (equations, for example), two new files are saved: one containing only the equations and another, clean one without them. I have already written the code below, which works. The challenge is: (1) each extraction must leave a reference behind, so that when the translated text comes back, the same equation can be put back exactly where it was; (2) putting the original equations back requires a second script.
Any suggestions for a strategy to handle this challenge?
Here is the current working code, which does not yet meet items (1) and (2) above:
```python
print('Start of script')

# (begin, end) markers of the equation environments to extract.
lista = [['begin{equation}', 'end{equation}'],
         ['begin{equation*}', 'end{equation*}'],
         ['begin{eqnarray}', 'end{eqnarray}'],
         ['begin{eqnarray*}', 'end{eqnarray*}'],
         ['begin{align}', 'end{align}'],
         ['begin{align*}', 'end{align*}']]

with open('document.tex', 'r', encoding='utf-8') as infileName:
    inOrig = infileName.readlines()

with open('document_equacoes.tex', 'w', encoding='utf-8') as outfileName_eq, \
        open('document_limpo.tex', 'w', encoding='utf-8') as outfileName_tex:

    # Pass 1: copy every equation block to document_equacoes.tex,
    # one environment type at a time (so the output is grouped by type).
    extract_block = False
    for pair in lista:
        print('Examining ' + pair[0] + ' and ' + pair[1])
        for line in inOrig:
            if pair[0] in line:
                extract_block = True
            if extract_block:
                outfileName_eq.write(line)
            if pair[1] in line:
                extract_block = False
                outfileName_eq.write("%------------------------------------------\n\n")

    # Pass 2, kept separate to make the logic easier to follow:
    # copy every line outside an equation block to document_limpo.tex.
    extract_block = False
    for line in inOrig:
        oneWrite = False  # True if this line holds a begin/end marker
        for pair in lista:
            if pair[0] in line:
                extract_block = True
                oneWrite = True
            if pair[1] in line:
                extract_block = False
                oneWrite = True
        # Checked once per line, after all markers have been tested.
        if not (extract_block or oneWrite):
            outfileName_tex.write(line)

print('End of script')
```
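One thing worth noting before tackling item (1): pass 1 above writes the equations grouped by environment type (all `equation` blocks, then all `equation*` blocks, and so on), not in the order they appear in the document, which makes it harder to map them back later. A minimal sketch of a single-pass version, assuming the same begin/end markers as `lista`; `PAIRS` and `split_lines` are hypothetical names, and each equation block comes out as one string, in document order:

```python
# The same begin/end markers as `lista`, with the leading backslash included.
PAIRS = [(r'\begin{equation}', r'\end{equation}'),
         (r'\begin{equation*}', r'\end{equation*}'),
         (r'\begin{eqnarray}', r'\end{eqnarray}'),
         (r'\begin{eqnarray*}', r'\end{eqnarray*}'),
         (r'\begin{align}', r'\end{align}'),
         (r'\begin{align*}', r'\end{align*}')]


def split_lines(lines):
    """Split LaTeX lines in a single pass.

    Returns (equation_blocks, clean_lines): equation_blocks holds one string
    per environment, in document order; clean_lines holds everything else.
    """
    equation_blocks = []
    clean_lines = []
    current = None      # lines of the block being collected; None outside a block
    end_marker = None   # the end marker that closes the current block
    for line in lines:
        if current is None:
            # Does this line open one of the listed environments?
            pair = next((p for p in PAIRS if p[0] in line), None)
            if pair is None:
                clean_lines.append(line)
                continue
            current, end_marker = [], pair[1]
        current.append(line)
        if end_marker in line:
            equation_blocks.append(''.join(current))
            current = None
    if current is not None:
        # Unclosed environment at end of file: keep it rather than lose it.
        equation_blocks.append(''.join(current))
    return equation_blocks, clean_lines
```

With the script above, `split_lines(inOrig)` would return both outputs at once; `''.join(clean_lines)` is the text for `document_limpo.tex`.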
The LaTeX document I used for testing is the following; to match the code above, it must be saved as `document.tex`:
```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % Enables accented characters.
\usepackage[english,brazil]{babel}
\usepackage{lipsum}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\title{Article Title}
\author{Author Name}
\begin{document}
\maketitle
\begin{abstract}
\lipsum[1]
\end{abstract}
\section{First Section}
\lipsum[2-4]
\section[Formula Example]{Formula}
This passage shows an example of how an equation usually
appears. The first equation is Bhaskara's formula, as in (\ref{eq:bask01})
\begin{equation}
x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}
\label{eq:bask01}
\end{equation}
Another way of writing formulas, which also need to be checked, is shown below
\begin{eqnarray*}
x =& a^b\\
y =& h^{\pi.r}
\end{eqnarray*}
The next one is very similar to Bhaskara's formula and can be seen in (\ref{eq:bask02}); however, it does not exist in the literature.
\begin{equation}
x = \frac{-b/2 \pm \sqrt{b^2-4ac}}{2acb}
\label{eq:bask02}
\end{equation}
Another way of writing formulas, which also need to be checked
\begin{eqnarray}
x &=& a^b\\
y &=& h^{\pi.r}
\end{eqnarray}
Finally, another method with $k_2$, as in (\ref{eq:seqEq})
\begin{align}
k_1&= s^2\\
k_2&= k^2 \label{eq:seqEq}
\end{align}
End of the general descriptions
\end{document}
```
You already have the equation as a string, right? Create an MD5 hash of it (library `hashlib`) and build a dictionary whose key is the hash and whose value is the text of the string. You can save this dictionary to disk as a JSON file (library `json`), and where the equation used to be, you leave a LaTeX comment containing only the hash. – Giovanni Nunes
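A minimal sketch of this suggestion, assuming the same environments as `lista` and a placeholder of the form `%<md5 hex digest>` written where each equation was. It works on the whole file as one string with a regular expression instead of the line loop from the question; `ENVIRONMENTS`, `BLOCK_RE`, `extract` and `save` are hypothetical names. MD5 serves only as a content fingerprint here, not for any security purpose.

```python
import hashlib
import json
import re

# Environments to extract (assumption: the same set as `lista` in the question).
ENVIRONMENTS = ['equation', 'eqnarray', 'align']

# Matches \begin{env} ... \end{env}, starred or not, across line breaks.
# The back-reference \1 makes the \end name the same environment as the \begin.
BLOCK_RE = re.compile(
    r'\\begin\{((?:' + '|'.join(ENVIRONMENTS) + r')\*?)\}.*?\\end\{\1\}',
    re.DOTALL,
)


def extract(tex):
    """Replace each equation block with a '%<md5>' LaTeX comment.

    Returns (clean_tex, table), where table maps each MD5 hex digest to the
    original block. Identical equations get the same key, so they are stored
    once. Caveat: any text after the end of an environment on the same line
    becomes part of the comment.
    """
    table = {}

    def to_placeholder(match):
        block = match.group(0)
        key = hashlib.md5(block.encode('utf-8')).hexdigest()
        table[key] = block
        return '%' + key

    clean_tex = BLOCK_RE.sub(to_placeholder, tex)
    return clean_tex, table


def save(clean_tex, table, tex_path, json_path):
    """Write the clean .tex file and the hash -> equation table as JSON."""
    with open(tex_path, 'w', encoding='utf-8') as f:
        f.write(clean_tex)
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(table, f, ensure_ascii=False, indent=2)
```

Usage would be along the lines of `clean, table = extract(text)` followed by `save(clean, table, 'document_limpo.tex', 'document_equacoes.json')`; the JSON file name is hypothetical. Because the placeholder is a LaTeX comment, the clean file still compiles.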
That seems like a good solution. I had also thought of using a tag. MD5 seems even better, because I don’t have to worry about "inventing" a different tag for each element I want to extract. I’ll have to test it. I’m not very familiar with JSON, but I believe it will be better because it is structured. Thanks for the tip.
– cbe-user-99263
The MD5 key will even help you identify identical equations, and using a LaTeX comment as the tag is not only easier to spot but also does not break the compilation of the document. Work with dictionaries and use `json.load()` to load the dictionary from disk and `json.dump()` to save it. – Giovanni Nunes
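A sketch of the second script from item (2), assuming the `%<md5>` placeholder format and a JSON file that maps each hash to its equation, loaded with `json.load()` as suggested. `PLACEHOLDER_RE`, `restore` and `restore_files` are hypothetical names, and the approach assumes the translator leaves the comment lines untouched. A function is passed to `re.sub` on purpose: its return value is inserted literally, whereas a plain replacement string would have its backslash escapes processed (so `\frac` would turn into a form feed followed by `rac`, and `\sqrt` would raise an error).

```python
import json
import re

# '%' followed by 32 lowercase hex digits (an MD5 hex digest); the lookbehind
# skips an escaped percent sign such as '\%'.
PLACEHOLDER_RE = re.compile(r'(?<!\\)%([0-9a-f]{32})')


def restore(translated_tex, table):
    """Put each equation back where its '%<md5>' comment is."""
    def to_equation(match):
        # Unknown hashes are left untouched so the problem stays visible.
        return table.get(match.group(1), match.group(0))

    return PLACEHOLDER_RE.sub(to_equation, translated_tex)


def restore_files(tex_path, json_path, out_path):
    """Read the translated .tex and the JSON table, write the restored .tex."""
    with open(json_path, encoding='utf-8') as f:
        table = json.load(f)
    with open(tex_path, encoding='utf-8') as f:
        translated = f.read()
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write(restore(translated, table))
```

For example, `restore_files('document_traduzido.tex', 'document_equacoes.json', 'document_final.tex')`, where all three file names are hypothetical. Since identical equations share one key, every occurrence of that hash is restored from the same entry.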