Pages

Tuesday, October 23, 2018

Oracle OUTBOUND - Writing Special Characters with UTL_FILE in UTF-8 Format

If we are using a (AL32)UTF8 database then you can simply add the below line at the beginning the file to write out a BOM first.
    UTL_FILE.PUT_LINE(l_file, CHR(15711167));

Run the below SQL to identify the DB character set, 'NLS_CHARACTERSET'
    select * from nls_database_parameters where parameter like '%CHAR%';

What is BOM?
            The UTF-8 BOM (Byte order Mark) is a sequence of bytes (EF BB BF) that allows the reader to identify the file as an UTF-8 file.

--> CREATE DIRECTORY XXAA_FILE AS '/usr/tmp/';

DECLARE
   l_file        UTL_FILE.FILE_TYPE;
   l_file_name   VARCHAR2 (200) := 'XXAA_FILE_' || TO_CHAR (SYSDATE, 'YYYY_MM_DD_MISS') || '.txt';

BEGIN
   --> Writing into File
   l_file := UTL_FILE.FOPEN ('XXAA_FILE', l_file_name, 'w');
   UTL_FILE.PUT_LINE (l_file, CHR (15711167));
   UTL_FILE.PUT_LINE (l_file, '連邦エクスプレスコーポレーション');
   UTL_FILE.FCLOSE (l_file);

EXCEPTION
   WHEN UTL_FILE.WRITE_ERROR
   THEN
      UTL_FILE.FCLOSE_ALL;
      RAISE_APPLICATION_ERROR (-20056, 'Invalid Operation');

END;
/


UTL_FILE
UTL_FILE is a package to read and/or write files. Because UTL_FILE is a database package all these operations are done on the server (i.e. a new file is created
on the OS of the database server or a file that you want to read must reside on the server, irrespective of the client you run the code from). 

For more information about UTL_FILE, please see these notes:
Note 44307.1 Using the UTL_FILE Package to Perform File I/O (UNIX)
Note 45172.1 Running UTL_FILE on Windows NT

Normal Character Set Conversion
In a normal client/server setup the NLS_LANG environment variable on the client informs Oracle how to convert the data from the client character set to the database character set.
See the following note for a complete overview of how that works:

Note 158577.1 NLS_LANG Explained (How does Client-Server Character Conversion Work?)

Because UTL_FILE runs completely inside the database server it does NOT use the NLS_LANG (also not the NLS_LANG value of the server) to do character set conversion. 

UTL_FILE's FOPEN, GET_LINE, PUT, PUT_LINE and PUT_F do not perform ANY character set conversion by itself.

UTL_FILE's FOPEN, GET_LINE, PUT, PUT_LINE and PUT_F assumes the Encoding/characterset of the file to be in the database NLS_CHARACTERSET.

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

Character sets and loading flat txt or csv files
The characters in a file are always stored according to a certain code page (character set).
if you want to load a flat txt or csv file then you need to first of all know the encoding of the flat txt or csv file.
The most COMMON encodings for flat text files are the "Windows ANSI" charactersets seen most data files etc are created on windows systems.
( = EE8MSWIN1250 , CL8MSWIN1251, WE8MSWIN1252, EL8MSWIN1253 ,TR8MSWIN1254, IW8MSWIN1255, AR8MSWIN1256, BLT8MSWIN1257 ,VN8MSWIN1258, TH8TISASCII, JA16SJIS , KO16MSWIN949, ZHS16GBK, ZHT16MSWIN950 , ZHT16HKSCS31 )
For more information on character sets under in Windows please see: Note:179133.1 The correct NLS_LANG in a Windows Environment

Files made on Unix systems (using vi for example) are most of the time depending on the LOCALE setting or used telnet/ssh config. 
Note:264157.1 The correct NLS_LANG setting in Unix Environments

While Oracle support will help you with syntax and / or mapping from other vendor encoding names to Oracle encoding names the FIRST thing that need to be known is the encoding / character set of the txt/csv file

Oracle support cannot say what the encoding of a flat text file is or should be.
The only real solution is to have confirmation from the source that provides the flat text file what the encoding of this file is.
Note that "French" or "Spanish" are not encodings those are languages . Any language can be encoded in several different encodings / character sets.
Also a file cannot have "no encoding", any flat text file is always in an encoding.
If your source has no idea or is not reachable then the only thing left is to "guess" what the encoding is of that text file.

An excellent windows tool to "guess" the CHARACTERSET of a txt or csv file is Unipad (a 3the party tool not related to Oracle)
Unipad is available on http://www.unipad.org/main/  .
This editor is an Unicode editor that allows you to load a txt file
* as unicode (UTF-8, UTF16 etc) using "file" - "open"
* as a certain non-unicode characterset using "File" - "Import" choose file and encoding in the dialog.
You can simply "play around" with the charactersets and the moment the data "looks good" you know you used the correct characterset for loading this file in Unipad.

The mapping of most Unipad non-unicode characterset names to Oracle names is pretty straight forward. "ISO/IEC 8859-1 [Latin 1]" is WE8ISO8859P1 in Oracle, "ISO/IEC 8859-15 [Latin 9]" is WE8ISO8859P15, "Windows CP-1250 [Central European]" is EE8MSWIN1250 , "DOS CP 857 [Turkish]" is TR8PC857, "JIS X 0208,EUC-JP" is JA16EUC, etc etc.
Some are less straight forward: "GB2312 [simplified Chinese]" is similar to ZHS16GBK and "Big Five [Traditional Chinese]" is similar to ZHT16MSWIN950, "TIS-620 [Thai]" is similar to TH8TISASCII.
With similar is meant that for some characters there might be a difference.

Note also that it is NOT possible to load one file that contains 2 encodings. If you have for example a file that contains both 8 bit codes and UTF8 then this file needs to be split up before this can be loaded.

Reading a file with UTL_FILE
In this first example we will try to read a file that has been created in a Western (Western Europe/US) version of Windows.
That means that the codepage is WE8MSWIN1252. If we try to read that file and store the characters in a database without any manual conversion we will simply store the WE8MSWIN1252 code points.
So if the database character set is not WE8MSWIN1252 we have to do some manual conversion. In these examples the database character set is AL32UTF8.
First we have to create a file with a bunch of characters. For this test we use notepad to create a file example.txt in the directory c:\utl_convert with the following content:

-------start file-------
âçïù
--------end file--------

Choose "save as" and doublecheck the file is saved using "ANSI" as encoding.

One can create these characters on Windows by holding the ALT key down and typing the code point (in decimal format) on the numeric keyboard.
The above 4 characters are created by typing ALT+131, ALT+135, ALT+139 and ALT+151)

Now connect to a database and create the following table
create table utl_conv_example(col1 varchar2(1000));

Now we can run this simple bit of pl/sql to read the file and store the contents in the table. We will insert 2 rows, 1 without manually converting the data and the second row after converting it into AL32UTF8.

CREATE OR REPLACE DIRECTORY UTL_DIR AS 'c:\utl_convert';

DECLARE
   v_Data1         VARCHAR2 (4000 BYTE);
   v_Raw1          RAW (20000);
   v_Filecharset   VARCHAR2 (40 BYTE);
   v_Dbcharset     VARCHAR2 (40 BYTE);
   v_inf           UTL_FILE.FILE_TYPE;
BEGIN

   -- define File CHARACTERSET
   v_Filecharset := 'WE8MSWIN1252';

   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'AL32UTF8';

   -- using RAW to avoid the problem documented in
   -- Note:257707.1 Last character truncated when using the CONVERT function
   -- due to bug 1840038 UTL_RAW.CONVERT DOES NOT RECOGNIZE CHARACTER SET NAME IN ITS PARAMETERS

   -- UTL_RAW.CONVERT needs a "NLS_LANG style" syntax and not like the docset say a characterset
   -- the best workaround is to prefix the characterset with AMERICAN_AMERICA.

   v_Dbcharset := 'AMERICAN_AMERICA.' || v_Dbcharset;
   v_Filecharset := 'AMERICAN_AMERICA.' || v_Filecharset;

   V_Inf := UTL_FILE.FOPEN ('UTL_DIR', 'example.txt', 'R');
   UTL_FILE.GET_RAW (V_inf, v_Raw1, 1000);

   -- UTL_FILE.GET_RAW is new from 9.2 onwards
   -- inserting unconverted
   v_Data1 := UTL_RAW.cast_to_varchar2 (v_Raw1);

   INSERT INTO utl_conv_example VALUES (v_Data1);

   -- now correcty converting it to the NLS_CHARACTERSET
   -- syntax: result := UTL_RAW.CONVERT(input, new_charset, old_charset);
   v_Raw1 := UTL_RAW.CONVERT (v_Raw1, v_Dbcharset, v_Filecharset);
   v_Data1 := UTL_RAW.cast_to_varchar2 (v_Raw1);

   INSERT INTO utl_conv_example VALUES (v_Data1);

   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/

Now we can have a look at the data that is stored:

SQL> select col1, dump(col1) from utl_conv_example;
COL1 DUMP(COL1)
---- --------------------------------------------
�� Typ=1 Len=4: 226,231,239,249
âçïù Typ=1 Len=8: 195,162,195,167,195,175,195,185

The first record is the record that was inserted without converting. We can see from the dump that the character codes are indeed exactly the same as the code points for these 4 characters in WE8MSWIN1252. These code points don't mean anything in AL32UTF8 so these characters show up as rubbish. 
The second record is the record that was inserted after using the CONVERT sql function to convert the string from 1 character set to another. The code points that show up here are indeed the correct code points for these characters in AL32UTF8 (they each take 2 bytes to store in AL32UTF8).
use SQLdeveloper (and not sqlplus, toad, etc) as client when testing with UTL_FILE, this is a "know good client" that needs no NLS configuration.

note 1628060.1 How to diagnose losing characters , getting "funny" output when inserting or selecting other than A-Z,a-z data ( = non English data like Chinese, Russian, Hebrew , insert any language here to the list that is not English) CHAR, VARCHAR2, LONG or CLOB
If the data is displayed correctly in SQLdeveloper then you are sure it's correct in the database.

So what does this mean?
In simple terms this means that if you read files with UTL_FILE you have to be aware of the codepage they are created in. Although that doesn't sound too difficult there are some things to be aware of:

As stated before, on Unix systems the codepage will generally be the from the ISO 8859 series so you will have to convert from WE8ISO8859P1 or AR8ISO8859P6 etc. etc.

On Windows systems it's slightly more complicated. First of all most Windows systems have 2 character sets, 1 for the GUI system (the 'ANSI' character set) and 1 for the DOS subsystem (when you run command line tools, the 'OEM' character set). 
There is much more information on this in the already mentioned Note.179133.1 "The correct NLS_LANG in a Windows Environment".
You can see the difference if you open the utl_convert.txt example file in Wordpad and select 'save as'. 

If you select "Text format - MS-DOS format" as the type the file will be stored in the DOS codepage (for a Western system it would be MS DOS code page 850 or 437).
Should you run the above example again you would now have to convert from WE8PC850 (or US8PC437) since that is now the encoding of the file.

On top of that there are some GUI applications like MS Word that bypass the ANSI codepage and use straight Unicode data. For example if you open the utl_convert.txt file created for the above example in notepad and select 'save as' you have a choice of encoding schemes. The default is ANSI but you can choose to save the file in UTF-8 encoding. If you would do that and run the above example no conversion would be needed at all (be aware that these files have 3 extra bytes at the beginning of the file as the 'Byte-Order Mark').
Of course this could also be used if the database is not in AL32UTF8. For example you might have a database running in the Arabic Windows characters set AR8MSWIN1256 and you want to read a file that has been delivered to you in UTF-8 encoding. In that case you need to convert from AL32UTF8 into AR8MSWIN1256.

Writing a file
When you're writing a file with UTL_FILE again no conversion takes place. In this example we shall again use an AL32UTF8 database. If you write to a file the standard encoding will be UTF-8. Although most file editors on Windows will understand that format not every application will know what to do with a UTF-8 encoded file. To keep the examples in the same codepage we will use the same data as we had in the first example and write that as WE8MSWIN1252:

create table utl_conv_example2(col1 varchar2(1000));
insert into utl_conv_example2 values('âçïù');

So we have a simple table with only 1 row of data:
SQL> select col1, dump(col1) from utl_conv_example2;
COL1 DUMP(COL1)
---- --------------------------------------------
âçïù Typ=1 Len=8: 195,162,195,167,195,175,195,185

Now again we have a simple bit of pl/sql to write the file:

CREATE OR REPLACE DIRECTORY UTL_DIR AS 'c:\utl_convert';

DECLARE
   v_Data1           VARCHAR2 (4000 BYTE);
   v_Raw1            RAW (20000);
   v_Targetcharset   VARCHAR2 (40 BYTE);
   v_Dbcharset       VARCHAR2 (40 BYTE);
   v_inf             UTL_FILE.FILE_TYPE;
BEGIN
   -- define target CHARACTERSET
   v_Targetcharset := 'WE8MSWIN1252';
   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'AL32UTF8';

   SELECT col1 INTO v_Data1 FROM utl_conv_example2;

   -- using RAW to avoid the problem documented in
   -- Note:257707.1 Last character truncated when using the CONVERT function
   -- due to bug 1840038 UTL_RAW.CONVERT DOES NOT RECOGNIZE CHARACTER SET NAME IN ITS PARAMETERS
   -- UTL_RAW.CONVERT needs a "NLS_LANG style" syntax and not like the docset say a characterset
   -- the best workaround is to prefix the characterset with AMERICAN_AMERICA.
   --
   v_Dbcharset := 'AMERICAN_AMERICA.' || v_Dbcharset;
   v_Targetcharset := 'AMERICAN_AMERICA.' || v_Targetcharset;
   --
   -- convert the data first to RAW
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   -- now write it out "as is"
   -- use the "byte mode" open_mode when wring out raw
   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   UTL_FILE.PUT_RAW (V_Inf, v_Raw1, TRUE);
   -- convert that RAW to Targetcharset
   -- syntax: result := UTL_RAW.CONVERT(input, new_charset, old_charset);
   v_Raw1 := UTL_RAW.CONVERT (v_Raw1, v_Targetcharset, v_Dbcharset);
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/

If you now open the file you will notice that the first line is rubbish because the 8 bytes that were written just represent certain random characters in WE8MSWIN1252. The second line is exactly like expected.

Should you wish to create a file that is UTF-8 encoded please make sure that you start the file with the correct 'Byte-Order Mark' to indicate its encoding.
Information on the byte-order marks to be used on Windows platforms is documented by Microsoft in the following article: 
http://msdn.microsoft.com/en-us/library/dd374101(VS.85).aspx
Or on the Uniocde website http://www.unicode.org/faq/utf_bom.html#BOM
Most sr's about writing out UTF-8 files "not working / giving garbage files" are in fact caused by not using an editor that understands UTF-8. Use Unipad http://www.unipad.org/main/ as editor for testing

Please note that the BOM is *one* Unicode character U+FEFF...
The (AL32)UTF8 codepoint of this BOM 'character' is CHR(15711167) (EFBBBF in hex)

If you are using a (AL32)UTF8 database then you can simply add.
UTL_FILE.PUT_LINE(v_inf, CHR(15711167));

in the procedure to write out a BOM first.
Of course there is then no need to use the CONVERT function as the data is already in UTF-8 format in the (AL32)UTF8 database.(also for an UTF8 db there is NO need to convert it to AL32UTF8).

If you are using CONVERT together with UTL_FILE to output data into a *UTF-8 file* from a *8 bit* characterset (ISO8859Px or MSWIN12xx type) database then you can us this 'trick':
UTL_FILE.PUT_LINE(v_inf, CHR(239) || CHR(187) || CHR(191));
and then output the data in AL32UTF8 using the convert function in the inverse way (from 8bit to AL32UTF8) as shown above.

Outputting UTF16 files:
If you are using CONVERT together with UTL_FILE to output data into a UTF16 file from a UTF8 database then you need a BOM (a BOM is mandatory for UTF16 files), simply make sure that your first UTF8 character is CHR(15711167), after the convert to UTF16 this will be U+FEFF , which is the correct BOM for a UTF16 file.

Most sr's about writing out UTF-16 files "not working / giving garbage files" are in fact caused by not using an editor that understands UTF-16. Use Unipad http://www.unipad.org/main/ as editor for testing

To output a Big endian UTF16 file use AL16UTF16, CharacterSetID = 2000 (always big endian)
the first 2 bytes of a Big Endian UTF16 file should be FEFF ( = the BE UTF16 BOM)
-- AL16UTF16 = Big Endian,
-- assuming AL32UTF8 NLS_CHARACTERSET db

drop table utl_conv_example2 purge;
create table utl_conv_example2(col1 varchar2(1000));
insert into utl_conv_example2 values(UNISTR('\20AC\00c2'));
commit;
CREATE OR REPLACE DIRECTORY UTL_DIR AS 'd:\temp';

DECLARE
   v_Data1           VARCHAR2 (4000 BYTE);
   v_Raw1            RAW (20000);
   v_Targetcharset   VARCHAR2 (40 BYTE);
   v_Dbcharset       VARCHAR2 (40 BYTE);
   v_inf             UTL_FILE.FILE_TYPE;
BEGIN
   -- define target CHARACTERSET
   v_Targetcharset := 'AL16UTF16';
   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'AL32UTF8';


   -- UTL_RAW.CONVERT needs a "NLS_LANG style" syntax and not like the docset say a characterset
   -- the best workaround is to prefix the characterset with AMERICAN_AMERICA.
   v_Dbcharset := 'AMERICAN_AMERICA.' || v_Dbcharset;
   v_Targetcharset := 'AMERICAN_AMERICA.' || v_Targetcharset;

   -- first write a BOM for AL16UTF16
   -- CHR(15711167) is the BOM value in *AL32UTF8*
   v_Data1 := CHR (15711167);


   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   v_Raw1 := UTL_RAW.CONVERT (v_Raw1, v_Targetcharset, v_Dbcharset);
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);


   -- now write data

   SELECT col1 INTO v_Data1 FROM utl_conv_example2;

   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   v_Raw1 := UTL_RAW.CONVERT (v_Raw1, v_Targetcharset, v_Dbcharset);
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/


-- AL16UTF16 = Big Endian,
-- assuming WE8MSWIN1252 NLS_CHARACTERSET db

drop table utl_conv_example2 purge;
create table utl_conv_example2(col1 varchar2(1000));
insert into utl_conv_example2 values(UNISTR('\20AC\00c2'));
commit;
CREATE OR REPLACE DIRECTORY UTL_DIR AS 'd:\temp';

DECLARE
   v_Data1           VARCHAR2 (4000 BYTE);
   v_Raw1            RAW (20000);
   v_Targetcharset   VARCHAR2 (40 BYTE);
   v_Dbcharset       VARCHAR2 (40 BYTE);
   v_inf             UTL_FILE.FILE_TYPE;
BEGIN
   -- define target CHARACTERSET
   v_Targetcharset := 'AL16UTF16';
   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'WE8MSWIN1252';


   -- UTL_RAW.CONVERT needs a "NLS_LANG style" syntax and not like the docset say a characterset
   -- the best workaround is to prefix the characterset with AMERICAN_AMERICA.
   v_Dbcharset := 'AMERICAN_AMERICA.' || v_Dbcharset;
   v_Targetcharset := 'AMERICAN_AMERICA.' || v_Targetcharset;

   -- first write a BOM for AL16UTF16
   -- seen  U+FEFF  is not known in WE8MSWIN1252
   -- compose a 2 byte FE FF sequence and simply write this to the file as workaround

   v_Data1 := CHR (254) || CHR (255);


   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   -- note the lack of conversion
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);


   -- now write data

   SELECT col1 INTO v_Data1 FROM utl_conv_example2;

   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   v_Raw1 := UTL_RAW.CONVERT (v_Raw1, v_Targetcharset, v_Dbcharset);
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/

If you want to output Little Endian UTF16 (= the format used on X86 windows platforms by Microsoft) you can use AL16UTF16LE, CharacterSetID = 2002 (always little endian)
the first 2 bytes of a Little Endian UTF16 file should be FFFE ( = the LE UTF16 BOM)
note: UTL_RAW.CONVERT does not know AL16UTF16LE, hence this way of doing the convert
-- AL16UTF16LE =   Little Endian,
-- assuming AL32UTF8 NLS_CHARACTERSET db


drop table utl_conv_example2 purge;
create table utl_conv_example2(col1 varchar2(1000));
insert into utl_conv_example2 values(UNISTR('\20AC\00c2'));
commit;
CREATE OR REPLACE DIRECTORY UTL_DIR AS 'd:\temp';

DECLARE
   v_Data1           VARCHAR2 (4000 BYTE);
   v_Raw1            RAW (20000);
   v_Targetcharset   VARCHAR2 (40 BYTE);
   v_Dbcharset       VARCHAR2 (40 BYTE);
   v_inf             UTL_FILE.FILE_TYPE;
BEGIN
   -- define target CHARACTERSET
   v_Targetcharset := 'AL16UTF16LE';
   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'AL32UTF8';


   -- first write a BOM for AL16UTF16LE
   -- CHR(15711167) is the BOM value in *AL32UTF8*
   v_Data1 := CHR (15711167);


   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 :=
      UTL_RAW.cast_to_raw (CONVERT (v_Data1, v_Targetcharset, v_Dbcharset));
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);


   -- now write data

   SELECT col1 INTO v_Data1 FROM utl_conv_example2;

   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 :=
      UTL_RAW.cast_to_raw (CONVERT (v_Data1, v_Targetcharset, v_Dbcharset));
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/


-- AL16UTF16LE =   Little Endian,
-- assuming WE8MSWIN1252 NLS_CHARACTERSET db


drop table utl_conv_example2 purge;
create table utl_conv_example2(col1 varchar2(1000));
insert into utl_conv_example2 values(UNISTR('\20AC\00c2'));
commit;
CREATE OR REPLACE DIRECTORY UTL_DIR AS 'd:\temp';

DECLARE
   v_Data1           VARCHAR2 (4000 BYTE);
   v_Raw1            RAW (20000);
   v_Targetcharset   VARCHAR2 (40 BYTE);
   v_Dbcharset       VARCHAR2 (40 BYTE);
   v_inf             UTL_FILE.FILE_TYPE;
BEGIN
   -- define target CHARACTERSET
   v_Targetcharset := 'AL16UTF16LE';
   -- define current NLS_CHARACTERSET
   v_Dbcharset := 'WE8MSWIN1252';


   -- first write a BOM for AL16UTF16LE
   -- seen  U+FEFF  is not known in WE8MSWIN1252
   -- compose a 2 byte FF FE sequence and simply write this to the file as workaround

   v_Data1 := CHR (255) || CHR (254);


   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 := UTL_RAW.CAST_TO_RAW (v_Data1);
   -- note the lack of conversion
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);



   -- now write data

   SELECT col1 INTO v_Data1 FROM utl_conv_example2;

   v_inf := UTL_FILE.FOPEN ('UTL_DIR', 'writing.txt', 'AB');
   v_Raw1 :=
      UTL_RAW.cast_to_raw (CONVERT (v_Data1, v_Targetcharset, v_Dbcharset));
   UTL_FILE.PUT_RAW (v_inf, v_Raw1);
   UTL_FILE.FClose (v_inf);
EXCEPTION
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'UTL_FILE Error');
END;
/

Debugging
If you have problems getting any of the pl/sql code in this note to work it is probably useful to replace the EXCEPTION clause with a 'full' EXCEPTION clause like this:

EXCEPTION
   WHEN UTL_FILE.INVALID_PATH
   THEN
      RAISE_APPLICATION_ERROR (-20100, 'Invalid Path');
   WHEN UTL_FILE.INVALID_MODE
   THEN
      RAISE_APPLICATION_ERROR (-20101, 'Invalid Mode');
   WHEN UTL_FILE.INVALID_FILEHANDLE
   THEN
      RAISE_APPLICATION_ERROR (-20102, 'Invalid Filehandle');
   WHEN UTL_FILE.INVALID_OPERATION
   THEN
      RAISE_APPLICATION_ERROR (-20103, 'Invalid Operation -- May signal a file locked by the OS');
   WHEN UTL_FILE.READ_ERROR
   THEN
      RAISE_APPLICATION_ERROR (-20104, 'Read Error');
   WHEN UTL_FILE.WRITE_ERROR
   THEN
      RAISE_APPLICATION_ERROR (-20105, 'Write Error');
   WHEN UTL_FILE.INTERNAL_ERROR
   THEN
      RAISE_APPLICATION_ERROR (-20106, 'Internal Error');
   WHEN NO_DATA_FOUND
   THEN
      RAISE_APPLICATION_ERROR (-20107, 'No Data Found');
   WHEN VALUE_ERROR
   THEN
      RAISE_APPLICATION_ERROR (-20108, 'Value Error');
   WHEN OTHERS
   THEN
      RAISE_APPLICATION_ERROR (-20109, 'Unknown UTL_FILE Error');
END;

Known issues
<bug 7012221> UTL_FILE DEPENDS ON SERVER-SIDE NLS_LANG, (not fixed yet),
while utl_file does not use NLS_LANG for conversion it does use it for the line termination this causes issues when using UTL_FILE when the database and listener where started with NLS_LANG set to UTF8 or AL32UTF8 errors seen inlcude ORA-29285: file write error., workaround start database and listener using NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252 (or other 8 bit characterset)
when using GridInfrastructure check the  setting in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt , when using RAC check the setting in svrctl for the database and listener(s)
similar issues as <bug 7012221> : BUG 16937428 - CHARACTER ACCENTS AREN'T DISPLAYED CORRECTLY WITH UTL_FILE FUNCTIONS. 

With UTL_FILE package, PL/SQL programs can Read and Write operating system text files. It provides a restricted version of OS stream file I/O. Procedures in UTL_FILE can also raise predefined PL/SQL exceptions such as NO_DATA_FOUND / VALUE_ERROR.


UTL_FILE
Package Exceptions
Description
INVALID_PATH
File location is invalid.
INVALID_MODE
open_mode parameter in FOPEN is invalid.
INVALID_FILEHANDLE
File handle is invalid.
INVALID_OPERATION
File could not be opened or operated on as requested.
READ_ERROR
Destination buffer too small or Operating System error occurred during the read operation.
WRITE_ERROR
Operating system error occurred during the write operation.
INTERNAL_ERROR
Unspecified PL/SQL error
CHARSETMISMATCH
File is opened using FOPEN_NCHAR, but later I/O operations use nonchar functions such as PUTF or GET_LINE.
FILE_OPEN
Requested operation failed because the file is open.
INVALID_MAXLINESIZE
MAX_LINESIZE value for FOPEN( ) is invalid; it should be within the range 1 to 32767.
INVALID_FILENAME
The filename parameter is invalid.
ACCESS_DENIED
Permission to access to the file location is denied.
INVALID_OFFSET
Causes of the INVALID_OFFSET exception:
·                     ABSOLUTE_OFFSET = NULL and RELATIVE_OFFSET = NULL, or
·                     ABSOLUTE_OFFSET < 0, or
·                     Either offset caused a seek past the end of the file
DELETE_FAILED
Requested file delete operation failed
RENAME_FAILED
Requested file rename operation failed


Description
FCLOSE
Closes a file
FCLOSE_ALL
Closes all open file handles
FCOPY
Copies a contiguous portion of a file to a newly created file
FFLUSH
Physically writes all pending output to a file
FGETATTR
Reads and returns the attributes of a disk file
FGETPOS (fn)
Returns the current relative offset position within a file, in bytes
FOPEN (fn)
Opens a file for input or output
FOPEN_NCHAR (fn)
Opens a file in Unicode for input or output
FREMOVE
Deletes a disk file, assuming that you have sufficient privileges
FRENAME
Renames an existing file to a new name, similar to the UNIX mv function
FSEEK
Adjusts the file pointer forward or backward within the file by the number of bytes specified
GET_LINE
Reads text from an open file
GET_LINE_NCHAR
Reads text in Unicode from an open file
GET_RAW
Reads a RAW string value from a file and adjusts the file pointer ahead by the number of bytes read
IS_OPEN (fn)
Determines if a file handle refers to an open file
NEW_LINE
Writes one or more operating system-specific line terminators to a file
PUT
Writes a string to a file
PUT_LINE
Writes a line to a file, and so appends an operating system-specific line terminator
PUT_LINE_NCHAR
Writes a Unicode line to a file
PUT_NCHAR
Writes a Unicode string to a file
PUTF
PUT procedure with formatting
PUTF_NCHAR
PUT_NCHAR procedure with formatting, & writes a Unicode string to a file, with formatting
PUT_RAW
Accepts as input a RAW data value and writes the value to the output buffer


UTIL_FILE Subprograms
Description
Syntax
Notes
FCLOSE
Closes an open file identified by a file handle
Exceptions:
WRITE_ERROR
INVALID_FILEHANDLE
UTL_FILE.FCLOSE (
file IN OUT FILE_TYPE);
·         If there is buffered data yet to be written when FCLOSE runs, then you may receive a WRITE_ERROR exception when closing a file.
FCLOSE_ALL
Closes all open file handles for the session. This should be used as an emergency cleanup procedure, for example, when a PL/SQL program exits on an exception.
Exceptions:
WRITE_ERROR
UTL_FILE.FCLOSE_ALL;
·         FCLOSE_ALL does not alter the state of the open file handles held by the user.
·         This means that an IS_OPEN test on a file handle after an FCLOSE_ALL call still returns TRUE, even though the file has been closed. No further read or write operations can be performed on a file that was open before an FCLOSE_ALL.
FCOPY
Copies a contiguous portion of a file to a newly created file
Exceptions:
INVALID_FILENAME
INVALID_PATH
INVALID_OPERATION
INVALID_OFFSET
READ_ERROR
WRITE_ERROR
UTL_FILE.FCOPY (
src_location IN VARCHAR2,
src_filename IN VARCHAR2,
dest_location IN VARCHAR2,
dest_filename IN VARCHAR2,
start_line IN BINARY_INTEGER DEFAULT 1,
end_line IN BINARY_INTEGER DEFAULT NULL);
·         By default, the whole file is copied if the start_line and end_line parameters are omitted. The source file is opened in read mode.
·         The destination file is opened in write mode.
·         Starting and ending line number can optionally be specified to select a portion from the center of the source file for copying.
FFLUSH
Physically writes all pending output to a file
Exceptions:
INVALID_FILENAME
INVALID_MAXLINESIZE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.FFLUSH (
file IN FILE_TYPE);
·         Normally, data being written to a file is buffered. The FFLUSH procedure forces the buffered data to be written to the file. The data must be terminated with a newline character.
·         Flushing is useful when the file must be read while still open. For example, debugging messages can be flushed to the file so that they can be read immediately.
FGETATTR
Reads & returns the attributes of a disk file
Exceptions:
INVALID_PATH
INVALID_FILENAME
INVALID_OPERATION
READ_ERROR
ACCESS_DENIED
UTL_FILE.FGETATTR(
location IN VARCHAR2,
filename IN VARCHAR2,
fexists OUT BOOLEAN,
file_length OUT NUMBER,
block_size OUT BINARY_INTEGER);
FGETPOS (fn)
Returns the current relative offset position within a file, in bytes
Exceptions
INVALID_FILEHANDLE
INVALID_OPERATION
READ_ERROR
UTL_FILE.FGETPOS (
file IN FILE_TYPE)
RETURN PLS_INTEGER;
■ If file is opened for byte mode operations, and then the INVALID OPERATION exception is raised.
■ Return Values: FGETPOS returns the relative offset position for an open file, in bytes. It raises an exception if the file is not open. It returns 0 for the beginning of the file.
FOPEN (fn)
Opens a file for input or output. 
You can specify the maximum line size and have a maximum of 50 files open simultaneously
Exceptions:
INVALID_MAXILINESIZE
INVALID_MODE
INVALID_OPERATION
INVALID_PATH
UTL_FILE.FOPEN (
location IN VARCHAR2,
filename IN VARCHAR2,
open_mode IN VARCHAR2,
max_linesize IN BINARY_INTEGER DEFAULT 1024)
RETURN FILE_TYPE;

·         The file location and file name parameters must be supplied to the FOPEN function as quoted strings so that the file location can be checked against the list of accessible directories as specified by the ALL_DIRECTORIES view of accessible directory objects.
·         Return Values: FOPEN returns a file handle, which must be passed to all subsequent procedures that operate on that file. The specific contents of the file handle are private to the UTL_FILE package, and individual components should not be referenced or changed by the UTL_FILE user.
o    It returns FILE_TYPE -- Handle to open file
FOPEN_NCHAR (fn)
Opens a file in national character set mode (Unicode) for input or output, with the maximum line size specified.
You can have a maximum of 50 files open simultaneously. With this function, you can read or write a text file in Unicode instead of in the DB character set.
Exceptions:
INVALID_MAXILINESIZE
INVALID_MODE
INVALID_OPERATION
INVALID_PATH
UTL_FILE.FOPEN_NCHAR (
location IN VARCHAR2,
filename IN VARCHAR2,
open_mode IN VARCHAR2,
max_linesize IN BINARY_INTEGER DEFAULT 1024)
RETURN FILE_TYPE;
·         Even though the contents of an NVARCHAR2 buffer may be AL16UTF16 or UTF8 (depending on the national character set of the DB), the contents of the file are always read and written in UTF8. UTL_FILE converts between UTF8 and AL16UTF16 as necessary.
·         Return Values: FOPEN_NCHAR returns a file handle, which must be passed to all subsequent procedures that operate on that file. The specific contents of the file handle are private to the UTL_FILE package, and individual components should not be referenced or changed by the UTL_FILE user.
o    It returns FILE_TYPE -- Handle to open file
FREMOVE
Deletes a disk file, assuming that you have sufficient privileges
Exceptions:
ACCESS_DENIED
DELETE_FAILED
INVALID_FILENAME
INVALID_OPERATION
INVALID_PATH
UTL_FILE.FREMOVE (
location IN VARCHAR2,
filename IN VARCHAR2);
·         FREMOVE does not verify privileges before deleting a file. The O/S verifies file and directory permissions. An exception is returned on failure.
FRENAME
Renames an existing file to a new name, similar to the UNIX mv function
Exceptions:
ACCESS_DENIED
INVALID_FILENAME
INVALID_PATH
RENAME_FAILED
UTL_FILE.FRENAME (
src_location IN VARCHAR2,
src_filename IN VARCHAR2,
dest_location IN VARCHAR2,
dest_filename IN VARCHAR2,
overwrite IN BOOLEAN DEFAULT FALSE);
FSEEK
Adjusts the file pointer forward / backward within the file by the number of bytes specified
Exceptions:
INVALID_FILEHANDLE
INVALID_OFFSET
INVALID_OPERATION
READ_ERROR
UTL_FILE.FSEEK (
file IN OUT UTL_FILE.FILE_TYPE,
absolute_offset IN PL_INTEGER DEFAULT NULL,
relative_offset IN PLS_INTEGER DEFAULT NULL);
·         Using FSEEK, you can read previous lines in the file without first closing & reopening the file. You must know the number of bytes by which you want to navigate.
·         If relative_offset, the procedure seeks forward. If relative_offset > 0, or backward, if relative_offset < 0, the procedure seeks through the file by the number of relative_offset bytes specified.
·          If the beginning of the file is reached before the number of bytes specified, and then the file pointer is placed at the beginning of the file. If the end of the file is reached before the number of bytes specified, then an INVALID_OFFSET error is raised.

·         If absolute_offset, the procedure seeks to an absolute location specified in bytes.
·         If file is opened for byte mode operations, then the INVALID OPERATION exception is raised.
GET_LINE
■ Reads text from an open file identified by the file handle and places the text in the output buffer parameter.
■ Text is read up to, but not including, the line terminator, or up to the end of the file, or up to the end of the len parameter. It cannot exceed the max_linesize specified in FOPEN.
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
NO_DATA_FOUND
READ_ERROR
UTL_FILE.GET_LINE (
file IN FILE_TYPE,
buffer OUT VARCHAR2,
len IN PLS_INTEGER DEFAULT NULL);
■ If the line does not fit in the buffer, a READ_ERROR exception is raised. If no text was read due to end of file, the NO_DATA_FOUND exception is raised. If the file is opened for byte mode operations, the INVALID_OPERATION exception is raised.
■ Because the line terminator character is not read into the buffer, reading blank lines returns empty strings.
■ Maximum size of the buffer parameter is 32767 bytes unless you specify a smaller size in FOPEN. If unspecified, Oracle supplies a default value of 1024.
GET_LINE_NCHAR
Reads text in Unicode from an open file
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
NO_DATA_FOUND
READ_ERROR
UTL_FILE.GET_LINE_NCHAR (
file IN FILE_TYPE,
buffer OUT NVARCHAR2,
len IN PLS_INTEGER DEFAULT NULL);
■ This procedure reads text from the open file identified by the file handle and places the text in the output buffer parameter. With this function, you can read a text file in Unicode instead of in the DB character set. The file must be opened in national character set mode, and must be encoded in the UTF8 character set.
■ The expected buffer datatype is NVARCHAR2. If a variable of another datatype, such as NCHAR, NCLOB, or VARCHAR2 is specified, PL/SQL will perform standard implicit conversion from NVARCHAR2 after the text is read.
GET_RAW
Reads a RAW string value from a file and adjusts the file pointer ahead by the number of bytes read.
UTL_FILE.GET_RAW ignores line terminators.
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
LENGTH_MISMATCH
NO_DATA_FOUND
READ_ERROR
UTL_FILE.GET_RAW (
file IN UTL_FILE.FILE_TYPE,
buffer OUT NOCOPY RAW,
len IN PLS_INTEGER DEFAULT NULL);
■ The subprogram will raise No_Data_Found when it attempts to read past the end of the file. Your application should allow for this by catching the exception in its processing loop.

PROCEDURE Sys.p (n IN VARCHAR2) IS
h UTL_FILE.FILE_TYPE := UTL_FILE.FOPEN('D', n, 'r', 32767);
Buf RAW(32767);
Amnt CONSTANT PLS_INTEGER := 32767;
BEGIN
LOOP
BEGIN
Utl_File.Get_Raw(h, Buf, Amnt);
-- Do something with this chunk
EXCEPTION WHEN No_Data_Found THEN EXIT; END;
END LOOP;
UTL_FILE.FCLOSE (h);
END;
IS_OPEN (fn)
Determines if a file handle refers to an open file
Exceptions:
INVALID_FILEHANDLE
UTL_FILE.IS_OPEN (
file IN FILE_TYPE)
RETURN BOOLEAN;
■ This function tests a file handle to see if it identifies an open file. IS_OPEN reports only whether a file handle represents a file that has been opened, but not yet closed. It does not guarantee that there will be no operating system errors when you attempt to use the file handle.
■ Return Values: TRUE or FALSE
NEW_LINE
Writes one or more operating system-specific line terminators to a file identified by the input file handle.
UTL_FILE.NEW_LINE (
file IN FILE_TYPE,
lines IN BINARY_INTEGER := 1);
This procedure is separate from PUT because the line terminator is a platform-specific character or sequence of characters.
PUT
Writes a String to a file
(i.e., writes the text string stored in the buffer parameter to the open file identified by the file handle.)
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUT (
file IN FILE_TYPE,
buffer IN VARCHAR2);
■ The file must be open for write operations. No line terminator is appended by PUT; use NEW_LINE to terminate the line or use PUT_LINE to write a complete line with a line terminator.
■ Maximum size of the buffer parameter is 32767 bytes unless you specify a smaller size in FOPEN. If unspecified, Oracle supplies a default value of 1024. The sum of all sequential PUT calls cannot exceed 32767 without intermediate buffer flushes.
■ If file is opened for byte mode operations, and then the INVALID OPERATION exception is raised.
PUT_LINE
Writes a line to a file, and so appends an operating system-specific line terminator
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUT_LINE (
file IN FILE_TYPE,
buffer IN VARCHAR2,
autoflush IN BOOLEAN DEFAULT FALSE);
■ This procedure writes the text string stored in the buffer parameter to the open file identified by the file handle. The file must be open for write operations.
■ PUT_LINE terminates the line with the platform-specific line terminator character or characters.
PUT_LINE_NCHAR
Writes a Unicode line to a file
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUT_LINE_NCHAR (
file IN FILE_TYPE,
buffer IN NVARCHAR2);
■ This procedure writes the text string stored in the buffer parameter to the open file identified by the file handle. With this function, you can write a text file in Unicode instead of in the DB character set. This procedure is equivalent to the PUT_NCHAR, except that the line separator is appended to the written text.
PUT_NCHAR
Writes a Unicode string to a file
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUT_NCHAR (
file IN FILE_TYPE,
buffer IN NVARCHAR2);
■ This procedure writes the text string stored in the buffer parameter to the open file identified by the file handle.
■ With this function, you can write a text file in Unicode instead of in the DB character set. The file must be opened in the national character set mode. The text string will be written in the UTF8 character set.
■ The expected buffer datatype is NVARCHAR2. If a variable of another datatype is specified, PL/SQL will perform implicit conversion to NVARCHAR2 before writing the text.
PUTF
PUT procedure with formatting
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUTF (
file IN FILE_TYPE,
format IN VARCHAR2,
[arg1 IN VARCHAR2 DEFAULT NULL,
. . .
arg5 IN VARCHAR2 DEFAULT NULL]);
■ This procedure is a formatted PUT procedure. It works like a limited printf( ).
■ If file is opened for byte mode operations, then the INVALID OPERATION exception is raised.
■ The format string can contain any text, but the character sequences %s and \n have special meaning.
%s -- Substitute this sequence with the string value of the next argument in the argument list.
\n -- Substitute with the appropriate platform-specific line terminator.
PUTF_NCHAR
PUT_NCHAR procedure with formatting, & writes a Unicode string to a file, with formatting
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUTF_NCHAR (
file IN FILE_TYPE,
format IN NVARCHAR2,
[arg1 IN NVARCHAR2 DEFAULT NULL,
. . .
arg5 IN NVARCHAR2 DEFAULT NULL]);
■ This procedure is a formatted version of a PUT_NCHAR Procedure. Using PUTF_NCHAR, you can write a text file in Unicode instead of in the DB character set. It accepts a format string with formatting elements \n and %s, and up to five arguments to be substituted for consecutive instances of %s in the format string. The expected datatype of the format string and the arguments is NVARCHAR2.
■ If variables of another DB are specified, PL/SQL will perform implicit conversion to NVARCHAR2 before formatting the text. Formatted text is written in the UTF8 character set to the file identified by the file handle. The file must be opened in the national character set mode.
PUT_RAW
Accepts as input a RAW data value and writes the value to the output buffer
Exceptions:
INVALID_FILEHANDLE
INVALID_OPERATION
WRITE_ERROR
UTL_FILE.PUT_RAW (
file IN UTL_FILE.FILE_TYPE,
buffer IN RAW,
autoflush IN BOOLEAN DEFAULT FALSE);
■ You can request an automatic flush of the buffer by setting the third argument to TRUE.


No comments:

Post a Comment