Databases Internationalization
DB2
Character Set/Data Type
Supported Character Sets
- DB2 UDB supports UTF-8 for char and varchar and UTF-16 for graphic and vargraphic. See Supported territory codes and code pages for the list of supported character sets.
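As a minimal sketch of how the two encodings surface in a schema, the table below declares one column of each kind. It assumes a Unicode database; the table and column names are hypothetical.

```sql
-- Hypothetical table in a Unicode (UTF-8) database.
CREATE TABLE customer (
    id         INTEGER NOT NULL PRIMARY KEY,
    name_utf8  VARCHAR(90),     -- stored as UTF-8; length is counted in bytes
    name_utf16 VARGRAPHIC(30)   -- stored as UTF-16; length is counted in double-byte units
);
```

Because VARCHAR lengths are in bytes, a column meant to hold 30 characters of arbitrary text may need up to three or four bytes per character, which is why the sketch sizes it more generously than the VARGRAPHIC column.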
Setup
- The character set (code set) is specified when the database is created. The resulting database configuration can be checked with the GET DATABASE CONFIGURATION command. See GET DATABASE CONFIGURATION Command for details.
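The two steps can be sketched as the following CLP script. The database name intldb is hypothetical, and the semicolon terminators assume the script is run with db2 -tvf.

```sql
-- Create a Unicode database; the code set and territory are fixed at creation time.
CREATE DATABASE intldb USING CODESET UTF-8 TERRITORY US;

-- Inspect the result; the output should include entries for the database
-- territory, code page, code set, and collating sequence.
GET DATABASE CONFIGURATION FOR intldb;
```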
Collation/Comparison
Supported Collations
- DB2 supports the following collation sequences.
- COMPATIBILITY
- The DB2 Version 2 collating sequence. Some collation tables have been enhanced. This option specifies that the previous version of these tables is to be used.
- IDENTITY
- Identity collating sequence, in which strings are compared byte for byte.
- IDENTITY_16BIT
- CESU-8 (Compatibility Encoding Scheme for UTF-16: 8-Bit) collation sequence as specified by the Unicode Technical Report #26, which is available at the Unicode Consortium Web site (www.unicode.org). This option can only be specified when creating a Unicode database.
- UCA400_NO
- The UCA (Unicode Collation Algorithm) collation sequence that is based on the Unicode Standard version 4.00 with normalization implicitly set to on. Details of the UCA can be found in the Unicode Technical Standard #10, which is available at the Unicode Consortium Web site (www.unicode.org). This option can only be used when creating a Unicode database.
- UCA400_LTH
- The UCA (Unicode Collation Algorithm) collation sequence that is based on the Unicode Standard version 4.00 but sorts all Thai characters according to the Royal Thai Dictionary order. Details of the UCA can be found in the Unicode Technical Standard #10, which is available at the Unicode Consortium Web site (www.unicode.org). This option can only be used when creating a Unicode database. Note that this collator might order Thai data differently from the NLSCHAR collator option.
- NLSCHAR
- System-defined collating sequence using the unique collation rules for the specific code set/territory.
- Note: This option can only be used with the Thai code page (CP874). If this option is specified in non-Thai environments, the command will fail and return the error SQL1083N with Reason Code.
- SYSTEM
- Collating sequence that is based on the database territory. This option cannot be specified when creating a Unicode database. This is the default.
Setup
- The collation is specified when the database is created and cannot be changed afterwards. See CREATE DATABASE Command.
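A minimal sketch of choosing a collation at creation time, again as a CLP script run with db2 -tvf. The database name thaidb is hypothetical; UCA400_LTH is one of the options listed above and requires a Unicode database.

```sql
-- Unicode database that sorts Thai text in Royal Thai Dictionary order.
CREATE DATABASE thaidb
    USING CODESET UTF-8 TERRITORY TH
    COLLATE USING UCA400_LTH;

-- The chosen collating sequence is reported in the database configuration.
GET DATABASE CONFIGURATION FOR thaidb;
```

Since the collation cannot be changed later, moving to a different one means creating a new database and reloading the data.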
Indexing
- Indexes are built according to the database collation setting.
Calendar
Timezone
Supported Timezones
- Time zones should be handled in a different layer, outside the database.
Setup
- n.a.
Session Locale/Formatting
Session Locale
- The territory, like the code page, has to be set when the database is created. The session language is determined by the client locale. For example, if the CLP (command line processor) is launched under locale=en_US, the session language will be en_US.
- TODO: The documentation does not say how the session locale can be set when a JDBC connection is established.
Date/Time
- Date/time formatting is determined by the territory code. The default format for each territory is listed in Date and time formats by territory code. A format can also be specified explicitly with VARCHAR_FORMAT and TIMESTAMP_FORMAT. DB2 also provides TIMESTAMP_ISO to handle the ISO 8601 format.
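The three functions can be sketched as follows. The format string 'YYYY-MM-DD HH24:MI:SS' is used because, as far as I know, it is accepted by these functions across DB2 versions; whether other format strings work depends on the version and is not assumed here.

```sql
-- Format the current timestamp as a string, independent of the territory default.
SELECT VARCHAR_FORMAT(CURRENT TIMESTAMP, 'YYYY-MM-DD HH24:MI:SS')
FROM SYSIBM.SYSDUMMY1;

-- Parse a string in the same format into a TIMESTAMP value.
SELECT TIMESTAMP_FORMAT('2007-01-15 10:30:00', 'YYYY-MM-DD HH24:MI:SS')
FROM SYSIBM.SYSDUMMY1;

-- TIMESTAMP_ISO builds a timestamp from a date, time, or timestamp argument.
SELECT TIMESTAMP_ISO(CURRENT DATE)
FROM SYSIBM.SYSDUMMY1;
```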
Number
- Little documentation on number formatting was found. TODO: check the actual behavior in DB2.
SQL Server
Character Set/Data Type
Supported Character Sets
- SQL Server supports Unicode only through the nchar, nvarchar, and ntext data types, which are encoded as UTF-16. The char, varchar, and text data types support only the code page of the OS where SQL Server runs.
Setup
- Columns of type nchar, nvarchar, and ntext are always UTF-16, so no additional setup is required for the database character set. If you need to check the character set of a database, use the sp_helpdb <dbname> stored procedure.
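A minimal sketch of the difference between the two families of types, using a hypothetical temporary table. Unicode literals need the N prefix; without it the literal is first converted to the code page, and characters outside that code page are typically replaced with '?'.

```sql
-- Hypothetical temporary table with one Unicode and one code-page column.
CREATE TABLE #UnicodeDemo (
    name_n nvarchar(20),   -- UTF-16, independent of the OS code page
    name_v varchar(20)     -- limited to a single code page
)

INSERT INTO #UnicodeDemo (name_n, name_v) VALUES (N'日本語', '日本語')

-- DATALENGTH returns the storage size in bytes for each column.
SELECT name_n, DATALENGTH(name_n) AS bytes_n,
       name_v, DATALENGTH(name_v) AS bytes_v
FROM #UnicodeDemo

-- Report database settings, including the collation, for tempdb.
EXEC sp_helpdb 'tempdb'

DROP TABLE #UnicodeDemo
```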
Collation/Comparison
Supported Collations
- SQL Server 2005 provides two groups of collations: Windows collations and SQL collations. We will use Windows collations when linguistic sorting is required. For other cases, we should use binary order as it is today.
- Windows Collations
- Windows collations are collations defined for SQL Server to support Windows locales. For a list of these collations, see Collation Settings in Setup. By specifying a Windows collation for SQL Server, the instance of SQL Server uses the same code pages and sorting and comparison rules as an application that is running on a computer for which you have specified the associated Windows locale. For example, the French Windows collation for SQL Server matches the collation attributes of the French locale for Windows.
- See also Windows Collation Sorting Styles
- SQL Collations
- SQL collations are a compatibility option to match the attributes of common combinations of code-page number and sort orders that have been specified in earlier versions of SQL Server. Many of these collations support suffixes for case, accent, kana, and width sensitivity, but not always. For more information, see Using SQL Collations.
Setup
- Collation settings, which include character set, sort order, and other locale-specific settings, are fundamental to the structure and function of Microsoft SQL Server databases. Within your organization, you should develop a standard for collation settings, and apply these settings at the time that you install SQL Server. Many server-to-server activities can fail or yield inconsistent results if collation settings are not consistent across servers. Select a Microsoft Windows locale to match the collation settings in other instances of SQL Server 2005; or select SQL Collations to match settings with the sort orders in earlier versions of SQL Server.
- SQL Server 2005 supports setting collations at the following levels of a SQL Server 2005 instance:
- Server level
- Database level
- Column level
- Expression level
- For more information on rebuilding system databases to specify a new system collation, see How to: Install SQL Server 2005 from the Command Prompt.
- If no collation is set at any level, the default sort order is determined by the OS locale. See Collation Settings in Setup for details.
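The database, column, and expression levels listed above can be sketched in T-SQL as follows. The database and table names are hypothetical, and the collations are only examples; the server-level collation is chosen at installation and is merely read here.

```sql
-- Server level: chosen at installation time; here we only inspect it.
SELECT SERVERPROPERTY('Collation') AS ServerCollation
GO

-- Database level: default collation for new columns in this database.
CREATE DATABASE IntlDemo COLLATE French_CI_AS
GO
USE IntlDemo
GO
SELECT DATABASEPROPERTYEX('IntlDemo', 'Collation') AS DatabaseCollation
GO

-- Column level: overrides the database default for one column.
CREATE TABLE Names (
    name nvarchar(50) COLLATE Latin1_General_CS_AS
)
GO

-- Expression level: overrides the column collation for one query.
SELECT name FROM Names ORDER BY name COLLATE Latin1_General_BIN
GO
```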
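Indexing
- When creating the database schema, a collation can be specified for each column as shown below. An index on such a column is ordered by that column's collation.

USE tempdb
GO
-- Each column carries its own collation.
CREATE TABLE TestTab (
    id int,
    GreekCol nvarchar(10) collate greek_ci_as,          -- Greek, case-insensitive, accent-sensitive
    LatinCol nvarchar(10) collate latin1_general_cs_as  -- Latin1, case-sensitive, accent-sensitive
)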
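As a hedged illustration of how the column collation relates to indexing, the sketch below adds an index to the TestTab table from the example above. The index name is hypothetical, and the remark about index use in the second query is an expectation rather than something stated in the source.

```sql
USE tempdb
GO
-- The index is ordered by the collation of GreekCol (greek_ci_as).
CREATE INDEX IX_TestTab_GreekCol ON TestTab (GreekCol)
GO

-- Compared using the column collation, so the index order matches the comparison.
SELECT id FROM TestTab WHERE GreekCol = N'αβγ'

-- An explicit COLLATE changes the comparison rules for this expression only;
-- the index above may then no longer be usable for a seek.
SELECT id FROM TestTab WHERE GreekCol = N'αβγ' COLLATE greek_cs_as
GO
```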
Calendar
Supported Calendars
- SQL Server supports the Gregorian and Hijri calendars through the CAST and CONVERT functions. Other calendars, such as Japanese Imperial, ROC Official, and Thai Buddhist, are not supported, so handling for those calendars needs to be done in a different layer. See also CAST and CONVERT.
Setup
- The calendar can be specified only at the expression level.
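A minimal sketch of selecting the calendar per expression: CONVERT style 113 produces a Gregorian date string, while styles 130 and 131 produce Hijri dates. The column aliases are illustrative only.

```sql
SELECT CONVERT(nvarchar(30), GETDATE(), 113) AS GregorianDate,  -- dd mon yyyy hh:mi:ss:mmm
       CONVERT(nvarchar(30), GETDATE(), 130) AS HijriDate,      -- Hijri, dd mon yyyy hh:mi:ss:mmmAM
       CONVERT(nvarchar(30), GETDATE(), 131) AS HijriNumeric    -- Hijri, dd/mm/yy hh:mi:ss:mmmAM
```

The target type is nvarchar because style 130 can return month names in Arabic script.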
Timezone
Supported Timezones
- Time zones appear to be handled in a different layer.
Setup
- n.a.
Session Language/Formatting
Session Locale
- SQL Server's locale is the same as the regional setting of the OS. The session language can also be set with SET LANGUAGE. See Session Language and sys.syslanguages (the list of supported languages) for details.
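A short sketch of inspecting and changing the session language. The choice of French is arbitrary; any name or alias listed in sys.syslanguages should work.

```sql
-- Languages known to this server, with their default date formats.
SELECT langid, name, alias, dateformat FROM sys.syslanguages

-- Current session language.
SELECT @@LANGUAGE AS CurrentLanguage

-- Switch the session language; month and day names follow it.
SET LANGUAGE French
SELECT DATENAME(month, GETDATE()) AS MonthName

-- Restore the default.
SET LANGUAGE us_english
```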
Date/Time
- There are two date/time types: datetime and smalldatetime. Values of the datetime data type are stored internally by the Microsoft SQL Server 2005 Database Engine as two 4-byte integers. The first 4 bytes store the number of days before or after the base date, January 1, 1900, which is the system reference date. The other 4 bytes store the time of day as the number of 1/300-second clock ticks after midnight (the documentation describes this loosely as milliseconds), which is why datetime values are rounded to increments of .000, .003, or .007 seconds.
- The smalldatetime data type stores dates and times of day with less precision than datetime. The Database Engine stores smalldatetime values as two 2-byte integers. The first 2 bytes store the number of days after January 1, 1900. The other 2 bytes store the number of minutes since midnight.
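The storage layout and rounding described above can be observed directly. This sketch assumes the time part of datetime is counted in 1/300-second ticks, which is what the .000/.003/.007 rounding implies. The literals use language-independent formats so the results do not depend on the session settings.

```sql
-- datetime rounds to the nearest 1/300 second.
SELECT CAST('2007-01-01T23:59:59.998' AS datetime)  -- 2007-01-01 23:59:59.997
SELECT CAST('2007-01-01T23:59:59.999' AS datetime)  -- 2007-01-02 00:00:00.000

-- smalldatetime rounds to the nearest minute.
SELECT CAST('2007-01-01T12:34:29' AS smalldatetime) -- 2007-01-01 12:34:00
SELECT CAST('2007-01-01T12:34:31' AS smalldatetime) -- 2007-01-01 12:35:00

-- Internal layout of datetime: 4 bytes of days since 1900-01-01, then 4 bytes of ticks.
SELECT CAST(CAST('19000102' AS datetime) AS binary(8))            -- 0x0000000100000000 (1 day, 0 ticks)
SELECT CAST(CAST('1900-01-01T00:00:01' AS datetime) AS binary(8)) -- 0x000000000000012C (0 days, 300 ticks)
```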
- The order of the date parts used to interpret date strings can be set with the SET DATEFORMAT statement at the session level. If it is not set, it is determined by the session language, which can be set with SET LANGUAGE.
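A minimal sketch of the effect of SET DATEFORMAT: the same string is read as two different dates depending on the session setting.

```sql
SET DATEFORMAT mdy
SELECT CAST('01/02/2007' AS datetime)  -- January 2, 2007

SET DATEFORMAT dmy
SELECT CAST('01/02/2007' AS datetime)  -- February 1, 2007

-- SET LANGUAGE also resets the date format to that language's default,
-- so an explicit SET DATEFORMAT should come after it.
SET LANGUAGE us_english
```

For this reason, applications that pass dates as strings are usually safer with an unambiguous form such as 20070102 or 2007-01-02T00:00:00.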
Number
- Little documentation on number formatting was found. TODO: check the actual behavior in SQL Server.
