Tuesday, July 5, 2011

An easy one with Symfony, MySql, Doctrine, PHPExcel and ExtJS

Codice Fiscale
The Italian fiscal code card, is an alphanumeric code of 16 characters with the purpose of unambiguously identify the residents on Italy. The page Codice_Fiscale clearly explains how the code is composed by the combination of the person's family name, name, birth date, sex and birth place. For example:

 The person:
  • Family name: Lago
  • Name: Eduardo
  • Birth date: November, 9 1980
  • Sex: Male
  • Birth place: Cuba
have a code:

  LGA DRD 80S09 ZCUB X



How to link each each part of the code with the corresponding person data?
  • Characters 1-3 come from the family name
  • Characters 4-6 come from the name
  • Characters 7-11 come from the birth date and the sex
  • Characters 12-15 come from the birth place
  • Character 16 is an especial character, it's a sort of checksum/parity control

The site http://www.codicefiscale.com/ is a very useful online tax code calculator.

The batch tax code validator
The presented application (ittaxcodevalidator) is PHP-based batch fiscal code validator that massively process citizens data in an Excel-spreadsheets, computes the tax code and compares it to the one supplied in the last column. The spreadsheet is supplied by an upload form.
Once the process is finished, then the application publish back a download link with a modified version of the spreadsheet with two additional columns added at the end, the first one with the calculated code, and the other indicating whether the tax code is compliant (OK/FAILED). Each row is also highlighted with red/green colour for not-compliant/compliant.

Input spreadsheet columns have the format:

NAME|FAMILY NAME|SEX|BIRTH DATE|BIRTH PLACE* (municipality)|BIRTH PLACE** (province)|TAX CODE
Eduardo | Lago | male | Nov 9, 80  | cuba   |        | LGADRD80S09ZCUBX
Mario     | Rossi | male | Jan 16, 60 | Roma | RM  |  RSSMRA60A15H501E   

The output spreadsheet columns have the format:


NAME|FAMILY NAME|SEX|BIRTH DATE|BIRTH PLACE* (municipality)|BIRTH PLACE** (province)|TAX CODE|VALIDATED TAX CODE|RESULT
Eduardo | Lago | male | Nov 9, 80  | cuba   |        | LGADRD80S09ZCUBX | LGADRD80S09ZCUBX | OK
Mario     | Rossi | male | Jan 16, 60 | Roma | RM  |  RSSMRA60A15H501E | RSSMRA60A16H501G | FAILED

* if the birth place is in a foreign Country, this field is filled with the name of the Country
** if the birth place is in a foreign Country, this field is not filled, this is the way to know it's a foreign country. If the birth place is on Italy, then use then inquiry the Belfiore code table (Spreadsheet located at extras/italian_municipalities_encoding/italian_municipalities_encoding.xls)

Sample spreadsheets can be found in directory data/xls/samples/ at: https://bitbucket.org/eduardo_lago_aguilar/ittaxcodevalidator 

The web application is fully developed with Symfony Web PHP-Framework v1.4.8 combined with Doctrine ORM and MySql Server. The PHPExcel v1.7.4 API is used for the spreadsheet loading/processing/saving. The user interface is built with the Sencha ExtJS AJAX framework.  
 
The algorithm
The rules for the tax code calculation are straightforward:
Characters 1-3
Use the first 3 consonants of the family name (or the family names if there are more than one, but in the exact order), if there are not enough consonants, always in the exact order, just after all the possible consonants already placed,  replace the missing consonant with the first vowel available and if the family name is so short that there are neither enough vowels, then replace every missing character with a 'X' character.

Characters 4-6
If the name has at least 4 consonants then take, as the chosen 3 consonants, 1st, 3rd and 4th consonant while, if there are not enough consonants, then take the first 3 in the exact order. If there are yet not enough consonants, always in the exact order, just after all the possible consonants already placed, then replace the missing consonant with the first vowel available and if the name is so short that there are neither enough vowels, then replace every missing character with a 'X' character.

Characters 7-11
The first 2 characters are the digits of the last two digits of the birth year (ex: 1960 is 60) and the 3rd character is the encoding of the birth month according to the following map:

A : january
B : february
C : march
D : april
E : may
H : june
L : july
M : august
P : september
R : october
S : november
T : december

the last 2 characters are the digits of the birth day, eventually filled with 0 if the birth day is just one digit. It's important to represent that the sex has to be taken in count cause if the person belongs to the female gender, it has to be added to the birth day the number 40 (ex: a woman born the 6 you will decode with 46, a woman born the 15 you will decode wih 55 and so on).

Characters 12-15
These characters are a simple translation of the Italian encoding for the municipalities by means of the inquiry of the Belfiore code table (Spreadsheet located at extras/italian_municipalities_encoding/italian_municipalities_encoding.xls), once found the proper birth place. The Province encoding (ex: the province code for Rome is 'RM') avoids the problem in having municipalities with the same name (there are no equal Municipality names in the same Province). If the birth place is in a foreign country, then encode with letter 'Z' followed by the 3 digits that represent the Country code according to the ISO 3166-1.

Character 16
The checksum-parity control is calculated this way:
  1. extract an ordered set from the rpevious 15 charatcers taking just the odd positions, in the above example the ordered set is 'LAR8S9CB'.
  2. extract an ordered set from the previous 15 charatcers taking just the even positions, in the above example the ordered set is 'GDD00ZU'.
  3. the "odd positions set" is decoded summing all the numbers that come from the encoding of each character presented in the table ODD ALPHANUMERIC CHARACTER.
  4. the "even positions set" is decoded summing all the numbers that come from the the encoding of each character presented in the table EVEN ALPHANUMERIC CHARACTER.
  5. then sum the both numbers that come from the previous two encodings and divide this number by 26. The rest of this operation is the target digit for as the last character of the tax code once decoded through the table REST.
  6. the previous steps don't guarantee that every single person gets a different code; there can be persons with the same name, born in the same day and in the same municipality and so, in such cases, it is applied a sort of "patch" that works this way: when a tax code is already existent it is changed the last numeric character starting from the right according to the table PATCH. However, this last case it's quite rare and it's not possible to prior know if this tax code is already existent or not, so when this case is presented it is just important to realize it's not a wrong format but it's a format that can be accepted because of the rule.
     
ODD ALPHANUMERIC CHARACTER
CharacterValueCharacterValueCharacterValueCharacterValue
1 0 A 1 J 21 S 12
2 5 B 0 K 2 T 14
3 7 C 5 L 4 U 16
4 9 D 7 M 18 V 10
5 13 E 9 N 20 W 22
6 15 F 13 O 11 X 25
7 17 G 15 P 3 Y 24
8 19 H 17 Q 6 Z 23
EVEN ALPHANUMERIC CHARACTER
CharacterValueCharacterValueCharacterValueCharacterValue
0 0 9 9 I 8 R 17
1 1 A 0 J 9 S 18
2 2 B 1 K 10 T 19
3 3 C 2 L 11 U 20
4 4 D 3 M 12 V 21
5 5 E 4 N 13 W 22
6 6 F 5 O 14 X 23
7 7 G 6 P 15 Y 24
8 8 H 7 Q 16 Z 25

REST
Rest Letter Rest Letter Rest Letter Rest Letter
0 A 7 H 14 O 21 V
1 B 8 I 15 P 22 W
2 C 9 J 16 Q 23 X
3 D 10 K 17 R 24 Y
4 E 11 L 18 S 25 Z
5 F 12 M 19 T

6 G 13 N 20 U



PATCH
Digit Letter Digit Letter Digit Letter
0 L 4 Q 8 U
1 M 5 R 9 V
2 N 6 S

3 P 7 T