From what I can see here, your problem is one of knowledge representation, more specifically the representation of the composition of molecules and ions.
Reverse semantics
The first thing that caught my attention was the name of the function, cracking. From Wiktionary:
crack
- crack
- crack
- snap
- bankrupt
- split
Normally, in computing and programming, I see sense 1 of crack used far more often:
to disable a locking system to allow unrestricted or restricted use of a computer program
That is, something security related: breaking encryption, access keys to an offline system, things like that.
Then I came across the argument of this function: chemform. At first it told me nothing at all, so I read a little further down, until I reached the comments.
What caught my attention in the comments was that they were in English. Now, if they are in English, then it is possible that other parts are too... chemform? chem-form? Chemistry formula!
Okay, so the argument is a chemical formula. So what would cracking it mean? It does not look like anyone is trying to mount a cyber attack and force the alkalis to reveal their earthy secrets... so is it sense 6 of crack?
So you want to split the chemical formula of a compound? Interesting... then what do you want to learn from the parts? An indexed set of multipliers and an indexed set of atoms? Both need to be returned. But what about the third indexed variable, pos? That remains a mystery.
How to break a chemical formula?
Well, that depends on many factors.
In general, chemical formulas are not linear representations, but rather trees that have been simplified to facilitate printing and communication through character sequences.
A good example where the tree representation is more suitable is a mercury(I) salt: mercury(I) nitrate.
This salt consists of two nitrate anions, NO₃⁻, and one mercury(I) cation, Hg₂²⁺. Its formula is Hg₂(NO₃)₂.
Why does this happen? Well, it turns out that mercury has two possible ionic states:
- Hg²⁺
- Hg₂²⁺
Therefore the second form, with two mercury atoms that have lost 2 electrons between them, has an average charge of 1+ per atom. But mercury does not occur on its own with just one lost electron. Other interesting cases where the formula looks as if it could be simplified according to the proportion of atoms in the molecule/ion:
- Homonuclear diatomic molecules
- Other allotropic forms
Generally speaking, if you tried to simplify the molecular formula, O₂ and O₃ could both be reduced to O, but that does not reflect the chemical properties of these 2 substances
- Hydrofluoric acid
The hydrogen bonds formed between fluorine and hydrogen are unusually strong, so it is sometimes represented as (HF)₂ rather than HF
There is also the case, more specific to ions, where the ion has a charge as a whole that does not belong to any of its separate parts:
- Peroxide O₂²⁻, with an average oxidation number of -1
- Superoxide O₂⁻, with an average oxidation number of -1/2
Form of representation
The form of representation of a substance, intuitively, is given by the following rules:
- A chemical formula consists of one or more parts
- A part consists of a quantizable followed by a quantity index (subscript) and a charge index (superscript)
- The following are quantizable: an atom, or a chemical formula
- When a quantizable consists of a chemical formula of more than one part and it has an index, parentheses are usually placed around it, before the indices
For example, deriving mercury(I) nitrate from these rules:
- Chemical formula
- Part Part
- Quantizable index Quantizable index
- Hg₂ (Chemical formula)₂
- Hg₂ (Part Part)₂
- Hg₂ (NO₃)₂
More formally, this is a context-free grammar. A proof that it is context-free would deserve an answer of its own, but the language is self-nesting, which already rules out its being regular, so it belongs to a more powerful class. In addition, one "part" does not interfere with another, so each "part" can be interpreted without any external context, hence "context-free".
The grammar for this would be, starting from F:
F ==> P
F ==> MP
MP ==> MP P
MP ==> P
P ==> Qs
P ==> Qs I
P ==> Qc
P ==> (Qc) I
Qc ==> F
Qs ==> <<atom>>
I ==> <<number>>
Where:
- F: formula
- MP and P: multi-part and part
- Qs and Qc: simple quantizable (one atom) and complete quantizable
- I: quantity index
- <<number>>: a number
- <<atom>>: one of the approximately 120 elements known to chemistry, represented here by its chemical symbol
For the formula of mercury(I) nitrate, the leftmost derivation is (terminals shown in quotation marks):
F
MP
MP P
P P
Qs I P
"Hg" I P
"Hg" "2" P
"Hg" "2" "(" Qc ")" I
"Hg" "2" "(" F ")" I
"Hg" "2" "(" MP ")" I
"Hg" "2" "(" MP P ")" I
"Hg" "2" "(" P P ")" I
"Hg" "2" "(" Qs P ")" I
"Hg" "2" "(" "N" P ")" I
"Hg" "2" "(" "N" Qs I ")" I
"Hg" "2" "(" "N" "O" I ")" I
"Hg" "2" "(" "N" "O" "3" ")" I
"Hg" "2" "(" "N" "O" "3" ")" "2"
This is the derivation tree that represents the formula that was passed in. It is exactly the knowledge expressed by Hg₂(NO₃)₂, no more and no less.
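The grammar can be read almost directly as a recursive-descent parser. What follows is only a minimal sketch in C++, assuming that atoms are written as one uppercase letter followed by optional lowercase letters and that charges are not part of the input. The names Part and FormulaParser are hypothetical and do not come from the question. Unlike the grammar as written, the sketch treats the index after a parenthesized group as optional.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One "part" (P in the grammar): either a single atom (Qs) or a
// parenthesized sub-formula (Qc), followed by a quantity index (I).
struct Part {
    std::string atom;            // non-empty for Qs, e.g. "Hg"
    std::vector<Part> children;  // non-empty for Qc: the nested formula
    int index = 1;               // quantity index, 1 when omitted
};

// Hypothetical recursive-descent parser for the grammar in the answer.
class FormulaParser {
public:
    explicit FormulaParser(const std::string& text) : text_(text) {}

    // F: one or more parts, consuming the whole input.
    std::vector<Part> parse() {
        std::vector<Part> parts = parseFormula();
        if (pos_ != text_.size()) throw std::invalid_argument("unexpected character");
        return parts;
    }

private:
    static bool isUpper(char c) { return std::isupper(static_cast<unsigned char>(c)) != 0; }
    static bool isLower(char c) { return std::islower(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // MP: parts are read until the input ends or a ')' closes the group.
    std::vector<Part> parseFormula() {
        std::vector<Part> parts;
        while (pos_ < text_.size() && text_[pos_] != ')') parts.push_back(parsePart());
        if (parts.empty()) throw std::invalid_argument("empty formula");
        return parts;
    }

    // P: an atom or a parenthesized formula, then an optional index.
    Part parsePart() {
        Part part;
        if (text_[pos_] == '(') {
            ++pos_;
            part.children = parseFormula();
            if (pos_ >= text_.size() || text_[pos_] != ')') throw std::invalid_argument("missing ')'");
            ++pos_;
        } else if (isUpper(text_[pos_])) {
            part.atom += text_[pos_++];
            while (pos_ < text_.size() && isLower(text_[pos_])) part.atom += text_[pos_++];
        } else {
            throw std::invalid_argument("expected atom or '('");
        }
        part.index = parseIndex();
        return part;
    }

    // I: a run of digits; returns 1 when no digits are present.
    int parseIndex() {
        if (pos_ >= text_.size() || !isDigit(text_[pos_])) return 1;
        int value = 0;
        while (pos_ < text_.size() && isDigit(text_[pos_])) value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

Parsing Hg2(NO3)2 with this sketch gives two parts: Hg with index 2, and a group with index 2 whose children are N (index 1) and O (index 3). That is the same tree the derivation describes.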
Now, what is cracking meant to do? Is it to take that knowledge, just as it was passed in, and put it into a manipulable form the computer can work with? Judging by what it returns (an indexable set of atoms and an indexed set of multipliers), I believe not. I believe it is closer to representing the proportion of each atom in the molecule/ion passed as argument.
Apparently the goal was to represent the atoms and their respective multipliers. Your solution, the one marked as accepted, connects these two pieces of information purely by position. So vec.multi[i] and vec.parte[i] are the right way to get an atom and its multiplier. This means the same atom can show up with different amounts, as long as it is at different indices. One example is ethyl alcohol, CH₃CH₂OH, where hydrogen appears in 3 different places.
I am not in favor of this representation, because you always need two vector accesses to get the information. I prefer the representation suggested by Maniero: atom identifier and multiplier together. Then, to get an atom and its multiplier, a single vector access is enough: atoms[i]. Even though you still write atoms[i].multiplie and atoms[i].part to get the multiplier and the part, this has a better chance of benefiting from the processor cache due to locality. Smart compilers may be able to identify that atoms[i] is an idempotent operation (in the sense that Atom a = atoms[i]; Atom b = atoms[i]; leaves a and b with the same values) and perform it only once.
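As an illustration of keeping the atom and its multiplier together, here is a small C++ sketch. The struct layout and the helper names ethanolAtoms and totalOf are assumptions made for this example, not code from the question or from Maniero's answer.

```cpp
#include <string>
#include <vector>

// Hypothetical record: symbol and multiplier live side by side.
struct Atom {
    std::string part;  // chemical symbol, e.g. "C"
    int multiplier;    // how many times it occurs at this position
};

// Ethyl alcohol, CH3CH2OH, kept in written order; hydrogen appears
// at three different positions.
std::vector<Atom> ethanolAtoms() {
    return {{"C", 1}, {"H", 3}, {"C", 1}, {"H", 2}, {"O", 1}, {"H", 1}};
}

// Sums the multipliers of every entry whose symbol matches.
int totalOf(const std::vector<Atom>& atoms, const std::string& symbol) {
    int total = 0;
    for (const Atom& atom : atoms) {  // one access yields symbol and multiplier
        if (atom.part == symbol) total += atom.multiplier;
    }
    return total;
}
```

With this layout the written order is preserved, so the three separate hydrogen entries of ethyl alcohol remain visible, and totalOf(ethanolAtoms(), "H") adds them up to 6.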
Another approach would be a hashmap or something similar, where the atom is the key and the map stores the quantities. The advantage of this method is that each atom is guaranteed to be stored only once. But once it is a hashmap, you give up having an indexable set and have to live with a non-indexable one. If the index mattered to you before, well, you have lost it. You would need extra auxiliary structures to represent the atom, the multiplier and the index in a hashmap.
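A minimal sketch of the map-based idea, assuming std::unordered_map as the hashmap and a hypothetical helper named countAtoms that receives the (symbol, multiplier) pairs already split out of the formula:

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Collapses (symbol, multiplier) pairs into one total per symbol.
// The positions of the original pairs are lost in the result.
std::unordered_map<std::string, int> countAtoms(
        const std::vector<std::pair<std::string, int>>& parts) {
    std::unordered_map<std::string, int> totals;
    for (const auto& entry : parts) {
        totals[entry.first] += entry.second;  // operator[] starts missing keys at 0
    }
    return totals;
}
```

For ethyl alcohol the six written positions collapse into three keys (C, H and O), which shows both the benefit (each atom stored once) and the cost (no index any more).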
Still on the subject of the hashmap, you can create a fixed-size hashmap that will never suffer from growth of the hash table. How? Well, there are fewer than 120 known distinct elements. That number is getting harder and harder to raise; perhaps 200 is beyond what scientists are able to manufacture. So we can map each element to its atomic number. For example, hydrogen would occupy position 1, helium position 2, carbon position 6, nitrogen position 7 and so on. There would be a static vector in the code mapping atomic numbers to atoms, and perhaps a search tree for the reverse mapping.
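The fixed-size idea can be sketched as an array indexed by atomic number. This is only an illustration: the table below lists just the first eight elements, the names kSymbols, atomicNumberOf, AtomCounts and addAtoms are hypothetical, and the reverse lookup uses a linear scan in place of the search tree mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Index = atomic number, so slot 0 is unused. Only a few elements are
// listed here; a real table would hold all known elements.
constexpr std::size_t kTableSize = 9;
const std::array<const char*, kTableSize> kSymbols = {
    "", "H", "He", "Li", "Be", "B", "C", "N", "O"};

// Reverse mapping from symbol to atomic number; returns 0 if unknown.
std::size_t atomicNumberOf(const std::string& symbol) {
    for (std::size_t z = 1; z < kSymbols.size(); ++z) {
        if (symbol == kSymbols[z]) return z;
    }
    return 0;
}

// Fixed-size table of quantities, one slot per atomic number.
using AtomCounts = std::array<int, kTableSize>;

// Adds a multiplier to the slot of the given element.
// Returns false when the symbol is not in the table.
bool addAtoms(AtomCounts& counts, const std::string& symbol, int multiplier) {
    const std::size_t z = atomicNumberOf(symbol);
    if (z == 0) return false;
    counts[z] += multiplier;
    return true;
}
```

Because the table never grows, no rehashing can happen, and looking up a quantity by atomic number is a plain array access.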
What is the relationship between multi and parte? Wouldn't this rather be a case for a vector<par_multi_atom> multipart, containing a structure composed of multipliers and atoms? – Jefferson Quesado
It could be; the problem is that I need this function to be responsible for this task, I cannot put any further processing in main. – Patrick Machado