Cardinality

In the context of databases, cardinality refers to the uniqueness of data values contained in a column. High cardinality means that the column contains a large percentage of totally unique values. Low cardinality means that the column contains a lot of “repeats” in its data range.
It is not common, but cardinality also sometimes refers to the relationships between tables. Cardinality between tables can be one-to-one, many-to-one or many-to-many.

High cardinality columns are those with very unique or uncommon data values. For example, in a database table that stores bank account numbers, the “Account Number” column should have very high cardinality – by definition, every item of data in this column should be totally unique.
Normal cardinality columns are those with a somewhat unique percentage of data values. For instance, if a table holds customer information, the “Last Name” column would have normal cardinality. Not every last name will be unique (for example, there will likely be several occurrences of “Smith”) but on the whole, the data is fairly non-repetitive.
Low cardinality columns are those with very few unique values. In a customer table, a low cardinality column would be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column.
Cardinality relationships between tables can take the form of one-to-one, one-to-many (whose reversal is many-to-one) or many-to-many. These terms simply refer to the relationships of data between the tables. For example, the relationship between the “Customers” table and the “Bank Accounts” table is one-to-many, that is, one customer can have several accounts, but one account cannot belong to more than one customer. That is, of course, assuming this bank has never heard of joint accounts!
This definition was written in the context of Databases

Post a Comment

0 Comments