Distribution of Structures in Space Groups

When the original Naval Research Laboratory Crystal Lattice Structures web page went online, one of the first questions we received was “Are there crystal structures in every space group?”

This was not an unreasonable question. In the fall of 2001 the site contained only 124 structures occupying 51 of the 230 space groups, an unsurprising result for a fairly new web site that was updated only on an ad hoc basis.

Of course the answer to the question is “yes, there are structures in every space group,” as demonstrated by Frank Hoffmann's wonderful space group list project.[I,II]

A better question is how crystal structures are distributed among the space groups. Hoffmann notes that some space groups are very sparsely populated. In particular, (CH)17FeO4Pt is the only compound ever found in space group $P422$ #89.[III,IV] The Inorganic Crystal Structure Database (ICSD) has no entries for space group $P4_{2}22$ #93, although Hoffmann did find one in the literature, and the mostly organic Cambridge Structural Database (CSD) has nine entries, a mere 0.0007% of the total. At the other extreme, the 2024 CSD rankings list 440,837 entries (34%) for space group $P2_{1}/c$ #14, and the most populated group in the ICSD is $Pnma$ #62, taking up 7.6% of the database.

What we'd like to do, then, is to tabulate the number of structures in each space group and then ask the questions

Which are the most populated space groups?
and
Which are the least populated space groups?

Once we have this information, perhaps (that should be in a very large font) we can begin to understand why structures are distributed the way they are, or at least be able to determine which types of space groups are heavily (or lightly) populated.

As it turns out, the answer to both of these questions is the same:

It depends.

Specifically, it depends on what structures are being counted, and how we're counting them. We'll discuss this below.

Classification of Space Groups

Before doing any enumeration of crystal structures in space groups, we should first ask exactly what it is that we wish to count. Do we consider every entry in the CSD or the ICSD as an individual structure? If we do that, then our calculations will be heavily weighted toward organic structures, since the CSD has about six times as many entries as the ICSD. Even if we separate the CSD and ICSD analyses we still have to do some thinking: the ICSD has nearly four thousand entries for compounds in the rock salt (halite) structure. Do we count all of those as one data point, or four thousand?

In this article we're going to have it both ways, or actually several different ways. We'll look at the distribution of structures using the raw CSD and ICSD data, but we'll also try to lump the data into prototypes, so that the four thousand or so rock salt structures only count as one entry. When we look at the raw data, we'll find that some groups are highly favored, e.g., $P2_{1}/c$ takes up 34% of the entries in the CSD, $Pnma$ accounts for 7.6% of the ICSD, $P422$ and $P4_{2}22$ can be mostly ignored, etc. Does this favoritism persist when we go to prototypes? We'll see.

Another interesting topic is the question of what kinds of space groups are favored. In our study of chiral structures we found that only 65 groups supported chiral structures, and that those were divided into two classes: Sohncke Class II groups, which are themselves chiral, and Sohncke Class III groups, which are achiral but have no mirror operations and so only support chiral structures. Our initial study found that only 1% of all structures, organic and inorganic, are in Sohncke II groups; 15-20% of all organic structures (depending on the source of the data) are in a Sohncke III group, while only 4% of all inorganic structures are in a Sohncke III group.

Given that, let's discuss some categories of interest. We'll start with the types of chiral and achiral space groups discussed in the Chiral Space Groups article. There we divided the space groups into three classes, with Class I being all space groups that didn't support chiral structures and Classes II and III comprising the Sohncke space groups. In this discussion we'll further divide the Class I space groups into two categories, giving us four types. Each of the 230 3-dimensional space groups falls into one of these types:

  • Centrosymmetric Space Groups: (92 space groups) All of these space groups have an inversion site. That is, for an appropriate choice of origin, if there is an atom at the point (x y z) there is an identical atom at (-x -y -z). This inversion symmetry means that these space groups are achiral and any structure falling into one of these groups is achiral.
  • Achiral Space Groups: (73 groups) This is not the best name, and we are open to suggestions for a better one. Each of these groups has one or more mirror operations of some type: a mirror plane, a glide plane, or a rotation followed by a reflection in a perpendicular plane, so chiral structures won't fit here. None have an inversion site, so they aren't centrosymmetric. That leaves them here, in search of a better descriptive name.
  • Sohncke Class II Space Groups: (22 groups) These are the groups which have a chiral screw operation or its mirror image: $3_{1}$/$3_{2}$, $4_{1}$/$4_{3}$, $6_{1}$/$6_{5}$, or $6_{2}$/$6_{4}$. The 22 groups with these operators form 11 enantiomorphic (mirror image) pairs.
  • Sohncke Class III Space Groups: (43 groups) These groups aren't themselves chiral but, like the Sohncke II groups, they have no reflection operations at all. That means any structure which forms in one of these groups must be chiral. Class III space groups may include screw axes, but they do not have one of the screw operators that form a Class II space group.
The two sets of Sohncke space groups only contain what are called “operations of the first kind,” that is, there are no reflections, only pure rotations or pure rotations followed by a translation. The other 165 space groups include “operations of the second kind,” rotations and/or translations followed by an inversion.
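Because this four-way split depends only on the space group number, it can be checked mechanically. The Python below is a minimal sketch, assuming the standard International Tables numbering. The number ranges, grouped by crystallographic point group, and the list of the 11 enantiomorphic pairs were typed in by hand for this example rather than taken from any library. The helper names `expand` and `classify` are our own.

```python
from collections import Counter


def expand(ranges):
    """Turn (first, last) pairs into a set of space-group numbers."""
    return {n for first, last in ranges for n in range(first, last + 1)}


# Point groups containing the inversion:
# -1, 2/m, mmm, 4/m, 4/mmm, -3, -3m, 6/m, 6/mmm, m-3, m-3m
CENTROSYMMETRIC = expand([(2, 2), (10, 15), (47, 74), (83, 88), (123, 142),
                          (147, 148), (162, 167), (175, 176), (191, 194),
                          (200, 206), (221, 230)])

# Sohncke groups (operations of the first kind only): point groups
# 1, 2, 222, 4, 422, 3, 32, 6, 622, 23, 432
SOHNCKE = expand([(1, 1), (3, 5), (16, 24), (75, 80), (89, 98), (143, 146),
                  (149, 155), (168, 173), (177, 182), (195, 199), (207, 214)])

# The 11 enantiomorphic pairs, e.g. (76, 78) is P4_1 / P4_3
ENANTIOMORPHIC_PAIRS = [(76, 78), (91, 95), (92, 96), (144, 145), (151, 153),
                        (152, 154), (169, 170), (171, 172), (178, 179),
                        (180, 181), (212, 213)]
SOHNCKE_II = {n for pair in ENANTIOMORPHIC_PAIRS for n in pair}


def classify(number: int) -> str:
    """Return the four-way classification of space group #number."""
    if not 1 <= number <= 230:
        raise ValueError(f"there is no space group #{number}")
    if number in CENTROSYMMETRIC:
        return "Centrosymmetric"
    if number in SOHNCKE_II:
        return "Sohncke Class II"
    if number in SOHNCKE:
        return "Sohncke Class III"
    # Has an improper operation (mirror, glide, rotoinversion) but no inversion
    return "Achiral"


if __name__ == "__main__":
    counts = Counter(classify(n) for n in range(1, 231))
    for name in ("Centrosymmetric", "Achiral", "Sohncke Class II",
                 "Sohncke Class III"):
        print(name, counts[name])
```

Run as a script, it prints 92, 73, 22 and 43 for the four types, matching the counts quoted in the list above.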

There are three more properties of space groups that are of interest:

  • Each of the 73 symmorphic space groups has a fixed point. No matter which group operation you apply to a structure in a symmorphic space group one point (often set to the origin) will not move. This eliminates screw operations and glide planes. In practice this means that if you look up the Wyckoff positions in the space group tables you won't find any fractions in the operations. For example, space group $P222$ #16 has the operations
    (x y z)   (-x -y z)   (-x y -z)   (x -y -z)   ,
    while space group $P222_{1}$ #17 has
    (x y z)   (-x -y z + ½)   (-x y -z + ½)   (x -y -z)   ,
    where the (-x -y z + ½) is a $2_{1}$ screw operation. $P222$ is symmorphic, while $P222_{1}$ is not.
    Since a symmorphic space group does not contain any explicit screw operations, none of the enantiomorphic space groups making up the Sohncke II class are symmorphic. There are, however, 24 space groups in the Sohncke III class that are symmorphic, though of course they do not include an explicit screw axis.
    Even though a symmorphic space group does not have any explicit screw operations, it can still have screw axes. The explanation for that, however, will have to wait for another day.
  • The 68 polar space groups have at least one axis where the origin is not fixed by the symmetry of the group. The simplest example is space group $P1$ #1, which has the single element (x y z). Suppose we had several atoms in this crystal, at the points (x$_{i}$ y$_{i}$ z$_{i}$). We could shift the origin to (x$_{0}$ y$_{0}$ z$_{0}$), moving the atom locations to (x$_{i}$-x$_{0}$ y$_{i}$-y$_{0}$ z$_{i}$-z$_{0}$) without changing any property of the crystal. On the other hand, space group $P\overline{1}$ #2 has an inversion operation (-x -y -z) as well as (x y z). If there is an atom at (x$_{1}$ y$_{1}$ z$_{1}$) then there must be an identical atom at (-x$_{1}$ -y$_{1}$ -z$_{1}$). This can only happen if the origin of all three axes is fixed.
    Obviously polar crystals cannot be centrosymmetric, nor can they be cubic. Structures in these groups are capable of second-harmonic generation and may also be piezoelectric (Lima-de-Faria, 1990).
    Polar crystals can be symmorphic, and 21 of the 68 polar groups are, indeed, symmorphic, including, of course, $P1$.
  • Ninety-nine of the space groups contain a screw operation in their Hermann-Mauguin symbol. A total of 187 space groups have screw axes, and most of these have multiple screw axes. As we will see below, the presence of certain screw axes seems to be correlated with a large number of observed structures, while other screw axes, particularly $4_{2}$, suppress the number of structures found. We are investigating this phenomenon but have no definitive conclusions to draw at this point, so we will not tabulate the occupancy of space groups with screw axes here, except for special cases which we will discuss below.
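The overlaps between these properties can be checked with plain set arithmetic. The sketch below rests on a few assumptions. The 73 symmorphic groups were typed in by hand from the standard tables. The polar groups are taken to be those with point groups 1, 2, m, mm2, 4, 4mm, 3, 3m, 6 and 6mm. The Sohncke ranges follow the usual point-group grouping. It counts the symmorphic & polar groups, the symmorphic groups in each Sohncke class, and confirms that no polar group is cubic.

```python
def expand(ranges):
    """Turn (first, last) pairs into a set of space-group numbers."""
    return {n for first, last in ranges for n in range(first, last + 1)}


# The 73 symmorphic space groups (typed in from the standard tables)
SYMMORPHIC = {1, 2, 3, 5, 6, 8, 10, 12, 16, 21, 22, 23, 25, 35, 38, 42, 44,
              47, 65, 69, 71, 75, 79, 81, 82, 83, 87, 89, 97, 99, 107, 111,
              115, 119, 121, 123, 139, 143, 146, 147, 148, 149, 150, 155,
              156, 157, 160, 162, 164, 166, 168, 174, 175, 177, 183, 187,
              189, 191, 195, 196, 197, 200, 202, 204, 207, 209, 211, 215,
              216, 217, 221, 225, 229}

# Polar point groups: 1, 2, m, mm2, 4, 4mm, 3, 3m, 6, 6mm
POLAR = expand([(1, 1), (3, 9), (25, 46), (75, 80), (99, 110),
                (143, 146), (156, 161), (168, 173), (183, 186)])

# Sohncke groups (point groups 1, 2, 222, 4, 422, 3, 32, 6, 622, 23, 432)
SOHNCKE = expand([(1, 1), (3, 5), (16, 24), (75, 80), (89, 98), (143, 146),
                  (149, 155), (168, 173), (177, 182), (195, 199), (207, 214)])
SOHNCKE_II = {76, 78, 91, 92, 95, 96, 144, 145, 151, 152, 153, 154,
              169, 170, 171, 172, 178, 179, 180, 181, 212, 213}
SOHNCKE_III = SOHNCKE - SOHNCKE_II
CUBIC = expand([(195, 230)])

print(len(SYMMORPHIC), len(POLAR))                                  # 73 68
print(len(SYMMORPHIC & POLAR))                                      # 21
print(len(SYMMORPHIC & SOHNCKE_III), len(SYMMORPHIC & SOHNCKE_II))  # 24 0
print(POLAR & CUBIC)                                                # set()
```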

All of the above information is collected on the space group information page.

Experimental Distribution

The study of distributions of structures in space groups goes back to at least 1942, and early work in this field is summarized by Urusov and Nadezhina (2009). As far as we can tell (the early works being in Russian and not readily available), this consisted of simply counting the number of reports of structures in a given space group and compiling the results. This can certainly lead to an experimental bias, which we will talk about later. For now, though, let's just look at the raw numbers. For this we'll primarily use two sources: the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD).

Much of this data is freely available through the Cambridge Crystallographic Data Centre (CCDC) search engine. You can find the CSD or ICSD entry for a structure based on some identification of the structure, such as its name, or information about the publication of the data. If a given compound is not in the ICSD it may nevertheless have a CCDC entry, and we report these in the Encyclopedia when appropriate.

With all of that out of the way, let's look at the distribution of the structures in the CSD and the ICSD, simply counting the number of structure reports that occur in each space group. Start with the organic structures: Fig. 1 shows the distribution of entries taken from the 2024 Cambridge Structural Database (CSD). Each bar indicates the number of structures found in a given space group, and the colors indicate whether the group is centrosymmetric, chiral, or neither. The number of structures is plotted on a logarithmic scale, which compresses the differences between groups: over 75% of all the structures in the CSD are either triclinic or monoclinic. This may not be particularly surprising, as most organic systems are formed of molecules whose shapes do not easily stack into high-symmetry unit cells. Perhaps more surprising is that the first and fourth most populated space groups are centrosymmetric $P2_{1}/c$ #14 and non-centrosymmetric $P2_{1}2_{1}2_{1}$ #19, both of which have screw axes. We will look at this phenomenon more closely near the end of this article.

Figure 1: The distribution of entries in each space group from the 2024 Cambridge Structural Database. The vertical lines divide the space groups into triclinic (unlabeled, space groups 1 and 2), monoclinic (3-15), orthorhombic (16-74), tetragonal (75-142), trigonal (143-167), hexagonal (168-194), and cubic (195-230). The colored bars indicate space groups which are centrosymmetric (red), non-centrosymmetric but otherwise achiral (green), chiral Sohncke Class II (blue), and Sohncke Class III, which supports chiral structures (purple). The y-axis uses a logarithmic scale, so there are far more triclinic and monoclinic structures than there are structures with higher symmetry.

The CSD is mostly restricted to organic systems. In materials physics we're usually concerned with inorganic materials, so the CSD might not be the best source of data for our purposes. Instead we have the ICSD, where the distribution is shown in Fig. 2. Here we see that the inorganic systems are more heavily weighted toward high-symmetry structures. In fact, the highest occupation occurs in the orthorhombic group $Pnma$ #62, although monoclinic $P2_{1}/c$ #14 comes in second. As with the organics, screw axes are favored. Although it is not apparent from the short symbol, the full Hermann-Mauguin symbol for $Pnma$ is $P\,2_{1}/n\,2_{1}/m\,2_{1}/a$, showing three screw axes. In addition to that group and $P2_{1}/c$ (second place), the cubic space group $Fd\overline{3}m$ #227 (full symbol $F\,4_{1}/d\,\overline{3}\,2/m$), which includes diamond, comes in third. In fact, of the top four entries, only $Fm\overline{3}m$ #225 does not have an explicit screw operation, but the Hypertext Book shows that it has multiple $2_{1}$ axes, the four $3_{1}$ and $3_{2}$ axes found in every cubic system, and three $4_{2}$ axes.

The aforementioned gap at $P4_{2}22$ #93 is easily seen. There also appears to be a gap at $P6_{4}$ #172, but this is an artifact of the logarithmic scale: the group contains a single structure, and $\log 1 = 0$. This is somewhat unfair, as its enantiomorphic twin, $P6_{2}$ #171, contains all of five structures, each of which could just as easily have been measured in $P6_{4}$. If we combine enantiomorphic pairs this way, pairs such as $P4_{3}32$ #212 and $P4_{1}32$ #213 move up in the rankings, but they remain relatively unpopulated compared to the big hitters.
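Combining an enantiomorphic pair just means adding the counts of its two members. Here is a minimal Python sketch. It assumes the counts are stored in a dictionary keyed by space group number. The helper name and the string keys such as `'171/172'` are our own choices. The sample counts are the ICSD values quoted in the text.

```python
ENANTIOMORPHIC_PAIRS = [(76, 78), (91, 95), (92, 96), (144, 145), (151, 153),
                        (152, 154), (169, 170), (171, 172), (178, 179),
                        (180, 181), (212, 213)]

# Map each member of a pair to its mirror-image partner
PARTNER = {}
for a, b in ENANTIOMORPHIC_PAIRS:
    PARTNER[a], PARTNER[b] = b, a


def combine_enantiomorphs(counts: dict[int, int]) -> dict[str, int]:
    """Merge the counts of each enantiomorphic pair under a key like '171/172'."""
    merged: dict[str, int] = {}
    for group, n in counts.items():
        if group in PARTNER:
            first, second = sorted((group, PARTNER[group]))
            key = f"{first}/{second}"
        else:
            key = str(group)
        merged[key] = merged.get(key, 0) + n
    return merged


icsd_counts = {171: 5, 172: 1, 93: 0}  # ICSD counts quoted in the text
print(combine_enantiomorphs(icsd_counts))  # {'171/172': 6, '93': 0}
```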

Figure 2: The distribution of entries in each space group from the 2023 Inorganic Crystal Structure Database (ICSD). The classification scheme is the same as in Fig. 1, and the scale of the y-axis is again logarithmic. The gap at space group 89

Let's look at this data in a little more detail. Table 1 shows the distribution of structures by crystal system for the CSD and the ICSD.

Table 1: Distribution of structural entries from the CSD and ICSD by crystal system.

| Crystal system | Space groups | % of groups | CSD entries | CSD % | ICSD entries | ICSD % |
|---|---|---|---|---|---|---|
| Triclinic | 2 | 0.87 | 338,868 | 26.17 | 8,720 | 4.03 |
| Monoclinic | 13 | 5.65 | 666,959 | 51.51 | 36,204 | 16.73 |
| Orthorhombic | 59 | 25.65 | 218,603 | 16.88 | 45,286 | 20.92 |
| Tetragonal | 68 | 29.57 | 28,542 | 2.20 | 33,360 | 15.41 |
| Trigonal | 25 | 10.87 | 26,124 | 2.02 | 22,258 | 10.28 |
| Hexagonal | 27 | 11.74 | 7,197 | 0.56 | 23,968 | 11.07 |
| Cubic | 36 | 15.65 | 8,431 | 0.65 | 46,632 | 21.55 |

We see that the organic solids are mostly (78%) triclinic or monoclinic, while 79% of the inorganics have higher symmetry. There are very few tetragonal, trigonal, hexagonal, or cubic organic crystals.
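The 78% and 79% figures follow directly from the raw counts in Table 1. Here is a quick Python check. The grand totals are just the column sums, which the table does not list explicitly.

```python
# Entries per crystal system, copied from Table 1: (CSD, ICSD)
TABLE1 = {
    "Triclinic":    (338_868, 8_720),
    "Monoclinic":   (666_959, 36_204),
    "Orthorhombic": (218_603, 45_286),
    "Tetragonal":   (28_542, 33_360),
    "Trigonal":     (26_124, 22_258),
    "Hexagonal":    (7_197, 23_968),
    "Cubic":        (8_431, 46_632),
}
LOW_SYMMETRY = ("Triclinic", "Monoclinic")

csd_total = sum(csd for csd, _ in TABLE1.values())
icsd_total = sum(icsd for _, icsd in TABLE1.values())

# Share of CSD entries that are triclinic or monoclinic
csd_low = 100 * sum(TABLE1[s][0] for s in LOW_SYMMETRY) / csd_total
# Share of ICSD entries with higher than monoclinic symmetry
icsd_high = 100 - 100 * sum(TABLE1[s][1] for s in LOW_SYMMETRY) / icsd_total

print(csd_total, icsd_total)                          # 1294724 216428
print(f"CSD triclinic + monoclinic: {csd_low:.1f}%")  # 77.7%
print(f"ICSD higher symmetry: {icsd_high:.1f}%")      # 79.2%
```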

What about chirality, centrosymmetry, or the lack of either behavior? That's shown in Table 2. Somewhat surprisingly, about 80% of all reported structures, both organic and inorganic, fit into centrosymmetric space groups, even though those comprise only 40% of all groups. The organic systems tend to be more chiral, which is not surprising given the handedness of biological amino acids and sugars, but even there over 80% of all structures cannot be distinguished from their mirror images.

Table 2: Distribution of entries from the CSD and ICSD by chirality, centrosymmetry, or the lack of either property, which for lack of a better term we refer to as “achiral”. Sohncke Class II groups are the 22 chiral space groups forming 11 enantiomorphic pairs. Sohncke Class III space groups are the 43 groups that have no mirror operations of any kind, and so only support chiral structures.

| Class | Space groups | % of groups | CSD entries | CSD % | ICSD entries | ICSD % |
|---|---|---|---|---|---|---|
| Centrosymmetric | 92 | 40.00 | 1,014,301 | 78.34 | 177,507 | 82.02 |
| Achiral | 73 | 31.74 | 69,265 | 5.35 | 27,692 | 12.80 |
| Sohncke Class II | 22 | 9.57 | 13,851 | 1.07 | 2,410 | 1.11 |
| Sohncke Class III | 43 | 18.70 | 197,307 | 15.24 | 8,819 | 4.07 |

Now let's look at the distribution of structures which are either symmorphic or polar. These results are shown in Table 3.

Table 3: Distribution of structures in the CSD and ICSD which are either in symmorphic or polar space groups.

| Property | Space groups | % of groups | CSD entries | CSD % | ICSD entries | ICSD % |
|---|---|---|---|---|---|---|
| Symmorphic | 73 | 31.74 | 384,341 | 29.69 | 84,709 | 39.14 |
| Polar | 68 | 29.57 | 163,111 | 12.60 | 20,245 | 9.35 |
| Symmorphic & Polar | 21 | 9.13 | 28,121 | 2.17 | 5,961 | 2.75 |

Although symmorphic and polar space groups each comprise about 30% of the total number of groups, polar space groups are very underrepresented, with only about 10% of all structures (12.6% of the CSD and 9.4% of the ICSD) being in polar groups. It's even worse to be polar & symmorphic: while 9% of all space groups fall into this category, less than 3% of all structures are in these groups. This is comparable to the population of Sohncke Class II. Since the symmorphic & polar groups and the Sohncke II groups do not overlap, we've found that 43 space groups (18.70%) contain less than 4% of all crystal structures.
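No space group is both symmorphic & polar and Sohncke Class II, so the group counts and populations of the two categories can simply be added. Here is a small arithmetic check. The numbers come from Tables 2 and 3, and the totals are the column sums of Table 2.

```python
CSD_TOTAL, ICSD_TOTAL = 1_294_724, 216_428  # column sums of Table 2

# (number of space groups, CSD entries, ICSD entries) from Tables 2 and 3
SYMMORPHIC_AND_POLAR = (21, 28_121, 5_961)
SOHNCKE_II = (22, 13_851, 2_410)

groups = SYMMORPHIC_AND_POLAR[0] + SOHNCKE_II[0]
csd_pct = 100 * (SYMMORPHIC_AND_POLAR[1] + SOHNCKE_II[1]) / CSD_TOTAL
icsd_pct = 100 * (SYMMORPHIC_AND_POLAR[2] + SOHNCKE_II[2]) / ICSD_TOTAL

print(groups, f"{100 * groups / 230:.2f}%")          # 43 18.70%
print(f"CSD {csd_pct:.2f}%, ICSD {icsd_pct:.2f}%")   # CSD 3.24%, ICSD 3.87%
```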

Finally, let's look at the heavy-hitters and the candidates for waivers: the space groups with the largest and smallest populations. Start with the popular groups:

Table 4: Space groups with the largest number of structures in the CSD and the ICSD.

| CSD group | Population | ICSD group | Population |
|---|---|---|---|
| $P2_{1}/c$ #14 | 440,837 | $Pnma$ #62 | 16,424 |
| $P\overline{1}$ #2 | 325,946 | $P2_{1}/c$ #14 | 14,711 |
| $C2/c$ #15 | 106,626 | $Fd\overline{3}m$ #227 | 11,942 |
| $P2_{1}2_{1}2_{1}$ #19 | 90,094 | $Fm\overline{3}m$ #225 | 11,915 |
| $P2_{1}$ #4 | 67,053 | $I4/mmm$ #139 | 9,053 |
| $Pbca$ #61 | 41,436 | $P6_{3}/mmc$ #194 | 8,769 |
| $Pna2_{1}$ #33 | 17,606 | $P\overline{1}$ #2 | 8,113 |
| $Cc$ #9 | 13,493 | $C2/c$ #15 | 7,542 |
| $P1$ #1 | 12,922 | $C2/m$ #12 | 6,849 |
| $Pnma$ #62 | 12,905 | $Pm\overline{3}m$ #221 | 6,379 |

On the organic side the distribution is surprisingly narrow: the first two space groups, $P2_{1}/c$ and $P\overline{1}$, account for 59% of all the entries in the CSD. If we add $C2/c$ and $P2_{1}2_{1}2_{1}$ we account for nearly 3/4 of all the entries, and the top ten account for 87%. Inorganic structures are much more spread out, but even here the top ten space groups contain nearly half (47%) of all the entries in the ICSD.
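These concentration figures are cumulative shares of the sorted populations in Table 4. Here is a minimal sketch. The totals are the column sums of Table 1, and `top_n_share` is a name we made up for illustration.

```python
CSD_TOTAL, ICSD_TOTAL = 1_294_724, 216_428  # column sums of Table 1

# Populations of the ten most populated groups, copied from Table 4
CSD_TOP10 = [440_837, 325_946, 106_626, 90_094, 67_053,
             41_436, 17_606, 13_493, 12_922, 12_905]
ICSD_TOP10 = [16_424, 14_711, 11_942, 11_915, 9_053,
              8_769, 8_113, 7_542, 6_849, 6_379]


def top_n_share(populations, total, n):
    """Percentage of all entries held by the n most populated groups."""
    return 100 * sum(sorted(populations, reverse=True)[:n]) / total


for n in (2, 4, 10):
    print(f"CSD top {n}: {top_n_share(CSD_TOP10, CSD_TOTAL, n):.1f}%")
print(f"ICSD top 10: {top_n_share(ICSD_TOP10, ICSD_TOTAL, 10):.1f}%")
```

This prints 59.2%, 74.4% and 87.2% for the CSD and 47.0% for the ICSD, in line with the figures above.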

Table 4 also shows that the triclinic space groups $P1$ and $P\overline{1}$ make the top ten of the CSD, and $P\overline{1}$ is number seven in the ICSD ($P1$ ranks 66th). This is not terribly surprising, as it is not difficult to scramble a bunch of atoms or molecules. What is more surprising is that the top entry in the CSD, $P2_{1}/c$, has a screw axis, and that group ranks second in the ICSD. Indeed, six of the top ten CSD space groups, including $Pbca$ and $Pnma$, have explicitly named screw operations in their full Hermann-Mauguin symbols, while the ICSD has four space groups with screw operations in its top ten, including its top three. We will talk about this more later on, but since it is not exactly clear how to distinguish space groups with screw operations from space groups that have screw axes, we will save a full discussion for another day.
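Whether a group "explicitly names" a screw operation depends on which symbol you read. $Pbca$ and $Pnma$ show no screws in their short symbols, but their full symbols each show three. The sketch below counts screw operators in full Hermann-Mauguin symbols written in an ad hoc ASCII notation: `2_1` for $2_{1}$ and `-3` for $\overline{3}$. The full symbols were typed in by hand for this example. The notation and the function names are our own assumptions, not a standard format.

```python
import re

# Ad hoc ASCII notation: "2_1" is a 2_1 screw axis, "-3" is 3-bar.
SCREW = re.compile(r"(\d)_(\d)")

# Short symbol -> full symbol for the top ten groups of Table 4
CSD_TOP10 = {
    "P2_1/c": "P 1 2_1/c 1", "P-1": "P -1", "C2/c": "C 1 2/c 1",
    "P2_12_12_1": "P 2_1 2_1 2_1", "P2_1": "P 1 2_1 1",
    "Pbca": "P 2_1/b 2_1/c 2_1/a", "Pna2_1": "P n a 2_1",
    "Cc": "C 1 c 1", "P1": "P 1", "Pnma": "P 2_1/n 2_1/m 2_1/a",
}
ICSD_TOP10 = {
    "Pnma": "P 2_1/n 2_1/m 2_1/a", "P2_1/c": "P 1 2_1/c 1",
    "Fd-3m": "F 4_1/d -3 2/m", "Fm-3m": "F 4/m -3 2/m",
    "I4/mmm": "I 4/m 2/m 2/m", "P6_3/mmc": "P 6_3/m 2/m 2/c",
    "P-1": "P -1", "C2/c": "C 1 2/c 1", "C2/m": "C 1 2/m 1",
    "Pm-3m": "P 4/m -3 2/m",
}


def screw_operations(symbol: str) -> list[str]:
    """List the screw operators (e.g. '2_1', '4_2') named in a symbol."""
    return [f"{n}_{m}" for n, m in SCREW.findall(symbol)]


def groups_with_screws(table: dict[str, str], use_full: bool = True) -> list[str]:
    """Short names of the groups whose full (or short) symbol names a screw."""
    return [short for short, full in table.items()
            if screw_operations(full if use_full else short)]


print(len(groups_with_screws(CSD_TOP10)))                  # 6
print(len(groups_with_screws(CSD_TOP10, use_full=False)))  # 4
print(groups_with_screws(ICSD_TOP10))  # ['Pnma', 'P2_1/c', 'Fd-3m', 'P6_3/mmc']
```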

Table 5: Space groups with the fewest structures in the CSD and the ICSD. We have combined the 11 pairs of enantiomorphic space groups, since a structure in one of the pair can appear in the other.

| CSD group | Population | ICSD group | Population |
|---|---|---|---|
| $P4_{2}mc$ #105 | 3 | $P4_{2}22$ #93 | 0 |
| $P6mm$ #183 | 4 | $P432$ #207 | 2* (1) |
| $P\overline{4}m2$ #115 | 7 | $P422$ #89 | 3* (0) |
| $P4_{2}32$ #208 | 8 | $P6$ #168 | 3 |
| $P4mm$ #99 | 9 | $P4_{2}cm$ #101 | 4 |
| $P4_{2}cm$ #101 | 9 | $P622$ #177 | 4* (3) |
| $P\overline{4}2m$ #111 | 9 | $I432$ #211 | 5* (0) |
| $P4_{2}22$ #93 | 10 | $P6_{2}$ #171 / $P6_{4}$ #172 | 6 |
| $Pmm2$ #25 | 12 | $Pcc2$ #27 | 7* (5) |
| $Cmm2$ #35 | 14 | $P4_{1}32$ #210 | 7 |

Table 5 shows that $P4_{2}22$ has no entries in the ICSD. The one $P422$ entry we have in the Encyclopedia was found using the CCDC search engine. Some of the other counts are suspect: the three entries reported in the ICSD for space group $P422$ actually have the higher symmetry of space group $P4/mmm$, hence $P422$ really has no ICSD entries, and we write its population as 3* (0). We only noticed this because it was first pointed out by Frank Hoffmann as part of his space group list project, but it prompted us to look at all of the bottom ten groups, where we found another nine structures with higher symmetries than shown in the ICSD, and another space group that has no ICSD entries.

While the $P6_{2}$/$P6_{4}$ enantiomorphic pair has only 6 entries in the ICSD, only four other space groups in the CSD ($Pmm2$, $Cmm2$, $P4mm$, and $P6mm$) and one in the ICSD ($P6$) fall into the dreaded “enantiomorphic or symmorphic & polar” category. We do find that four of the CSD entries and two of the ICSD entries have explicitly named $4_{2}$ screw operations. This leads us to prepare one more table:

Table 6: Population of space groups which have an explicitly named $4_{2}$ screw operation in their Hermann-Mauguin symbol. The second row is the sum of all entries in space groups which are either Sohncke Class II, are both symmorphic & polar, or have a $4_{2}$ operation.

| Property | Space groups | % of groups | CSD entries | CSD % | ICSD entries | ICSD % |
|---|---|---|---|---|---|---|
| Named $4_{2}$ Screw | 18 | 7.83 | 3,422 | 0.26 | 3,561 | 1.65 |
| Sohncke II or Symmorphic & Polar or $4_{2}$ Screw | 61 | 26.52 | 45,394 | 3.51 | 11,932 | 5.51 |

This table shows that the space groups which:

  • are in Sohncke Class II, that is, explicitly chiral and one of an enantiomorphic pair, or
  • are both symmorphic and polar, or
  • have an explicitly named $4_{2}$ screw operation[V]
comprise over 1/4 of all the space groups and yet have only 3.5% of all the entries in the CSD and 5.5% of the entries in the ICSD. For whatever reason, nature does not seem to like these space groups.
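The 61-group total and the combined populations are plain sums, because the three categories share no space groups. The sketch below checks that. It assumes the $4_{2}$ groups are exactly those whose short symbol contains $4_{2}$. The three sets were typed in by hand, and the populations are copied from Tables 2, 3 and 6.

```python
# Short Hermann-Mauguin symbol names a 4_2 screw (typed in by hand)
NAMED_4_2 = {77, 84, 86, 93, 94, 101, 102, 105, 106,
             131, 132, 133, 134, 135, 136, 137, 138, 208}
# The 22 members of the 11 enantiomorphic pairs
SOHNCKE_II = {76, 78, 91, 92, 95, 96, 144, 145, 151, 152, 153, 154,
              169, 170, 171, 172, 178, 179, 180, 181, 212, 213}
# Groups that are both symmorphic and polar
SYMMORPHIC_POLAR = {1, 3, 5, 6, 8, 25, 35, 38, 42, 44, 75, 79,
                    99, 107, 143, 146, 156, 157, 160, 168, 183}

pairs = [(SOHNCKE_II, SYMMORPHIC_POLAR), (SOHNCKE_II, NAMED_4_2),
         (SYMMORPHIC_POLAR, NAMED_4_2)]
print(all(a.isdisjoint(b) for a, b in pairs))  # True

union = SOHNCKE_II | SYMMORPHIC_POLAR | NAMED_4_2
print(len(union), f"{100 * len(union) / 230:.2f}%")  # 61 26.52%

# (CSD, ICSD) populations from Tables 2, 3 and 6
populations = {"Sohncke II": (13_851, 2_410),
               "Symmorphic & Polar": (28_121, 5_961),
               "Named 4_2": (3_422, 3_561)}
csd = sum(c for c, _ in populations.values())
icsd = sum(i for _, i in populations.values())
print(csd, icsd)  # 45394 11932
```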

Prototype Distribution

While the distribution plots in Fig. 1 and Fig. 2 are informative, they count experimental reports rather than distinct structures, and so emphasize structures and compounds which are popular, useful, or have many possible chemical compositions. We would like to find a procedure which eliminates or at least minimizes this bias.

The problem can be seen by looking at space group $Fm\overline{3}m$ #225. The ICSD has 11,915 experimental entries in this group. More than one tenth of these entries (1,393) are reports of monatomic samples in the face-centered cubic (A1) structure. This includes multiple determinations of the lattice constant for 49 elements as well as many measurements on alloys. Another 3,894 entries are for compounds with the NaCl (halite) structure. In other words, 44% of the entries for $Fm\overline{3}m$ are taken up by two structures, significantly skewing the distribution. Similar behavior takes place in other space groups. For example, $Ia\overline{3}d$ #230 has 1,384 entries, but 571 of these are for the $S1_{4}$ form of garnet and 434 are for the Y$_{3}$Al$_{5}$O$_{12}$ form. Obviously the data is biased toward structures that are frequently found in nature and that can be formed by many different combinations of elements.
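The size of this skew follows directly from the counts just quoted. Here is a small arithmetic check; the variable names are ours.

```python
# Counts quoted in the text for Fm-3m #225 and Ia-3d #230
fm3m_total, fcc_a1, halite = 11_915, 1_393, 3_894
ia3d_total, garnet_s14, yag = 1_384, 571, 434

fm3m_share = 100 * (fcc_a1 + halite) / fm3m_total
ia3d_share = 100 * (garnet_s14 + yag) / ia3d_total

print(f"Fm-3m: {fm3m_share:.1f}% of entries in two structures")  # 44.4%
print(f"Ia-3d: {ia3d_share:.1f}% of entries in two structures")  # 72.6%
```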

What we'd like is to count unique structures, not thousands of samples of rock salt. Crystallographers have actually addressed this, introducing the concept of structure types (Lima-de-Faria, 1990). Briefly, two compounds belong to the same structure type if they are:

  • Isopointal, occupying the same space group (or its enantiomorphic pair) with the same occupied Wyckoff positions;
  • Isoconfigurational, having similar geometric configurations, including similar ratios of b/a, c/a, similar crystallographic angles, and atomic positions; and
  • Crystal-chemically isotypic, that is, having similar atoms in similar configurations.
These ideas were later incorporated into the entries listed in the ICSD (Allmann, 2007).

While this categorization into structure types is a useful definition, it causes some difficulties for the Encyclopedia. Many structure types include structures with different numbers of elements, a concept foreign to both AFLOW and the Encyclopedia. As an example, the ICSD lists Ho$_{11}$Ge$_{10}$ (AFLOW label A10B11_tI84_139_dehim_eh2n-001), Tb$_{11}$Si$_{4}$In$_{6}$ (A6B4C11_tI84_139_hm_dei_eh2n-001), and Sc$_{11}$Al$_{2}$Ge$_{8}$ (A2B8C11_tI84_139_h_deim_eh2n-001) under the Ho$_{11}$Ge$_{10}$ structure type. Electronic structure inputs (e.g., VASP POSCAR files) for each of these structures would look very different. One of the capabilities of the Encyclopedia is the generation of such inputs, so it was decided to break this structure type into three different prototypes. This follows the structure of the AFLOW prototype label (Mehl, 2017; Eckert, 2024), which distinguishes between monatomic, binary, ternary, …, compounds, so that each of the above structures has its own label, as noted.
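The labels quoted above have the form stoichiometry_Pearson symbol_space group number_Wyckoff letters for each species, followed by a suffix such as -001. Splitting a label into these fields shows why the binary and ternary members of the Ho$_{11}$Ge$_{10}$ structure type end up with different labels. The Python below is a minimal sketch. It assumes exactly this underscore-and-hyphen layout and does not handle any other label variants. `parse_label` and `n_species` are hypothetical helpers, not part of the AFLOW software.

```python
import re
from typing import NamedTuple


class PrototypeLabel(NamedTuple):
    stoichiometry: str         # e.g. "A10B11"; letters follow alphabetical element order
    pearson: str               # e.g. "tI84"
    space_group: int           # e.g. 139
    wyckoff: tuple[str, ...]   # one string of Wyckoff letters per species
    suffix: str                # e.g. "001"


def parse_label(label: str) -> PrototypeLabel:
    """Split a label such as 'A10B11_tI84_139_dehim_eh2n-001' into its fields."""
    body, _, suffix = label.partition("-")
    stoichiometry, pearson, space_group, *wyckoff = body.split("_")
    return PrototypeLabel(stoichiometry, pearson, int(space_group),
                          tuple(wyckoff), suffix)


def n_species(label: str) -> int:
    """Number of species: one capital letter per species in the stoichiometry."""
    return len(re.findall(r"[A-Z]", parse_label(label).stoichiometry))


for label in ("A10B11_tI84_139_dehim_eh2n-001",
              "A6B4C11_tI84_139_hm_dei_eh2n-001",
              "A2B8C11_tI84_139_h_deim_eh2n-001"):
    print(n_species(label), parse_label(label).wyckoff)
```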

Ideally we could use AFLOW-XtalFinder (Hicks, 2021) to compare each pair of ICSD structures having the same space group, placing all structures sufficiently close to one another in a unique prototype. In practice this would require millions, if not billions, of comparisons, a task well beyond the scope of this brief report. We can, however, determine the AFLOW prototype label for nearly every structure in the ICSD.[VI] We can then define a unique structural designation by combining the ICSD structure type and the AFLOW prototype label, and assume that ICSD entries having the same structure type and AFLOW prototype label are in fact in the same prototype. Some ICSD entries do not list a structure type, so we categorize them solely by their AFLOW prototype labels.

This procedure results in a substantial reduction in the number of entries. To use our previous examples, the fcc and NaCl structures are reduced from 1,393 and 3,894, respectively, to one prototype each. Space group $Ia\overline{3}d$ has 1,384 entries in the ICSD but only 22 prototypes.

One caveat to this scheme is that the AFLOW prototype label depends on the alphabetical order of the elements in the compound. Thus both Cu$_{3}$Au and CuAu$_{3}$ are in the $L1_{2}$ structure, but the former has the label AB3_cP4_221_a_c-001 and the latter the label A3B_cP4_221_c_a-001. This means that we will have some problems with duplication. Still, we have something much closer to a real index of prototypes with minimal effort. As an example, the ICSD has 1,267 entries in what it calls the “Auricupride#AuCu3” structure. This should reduce to one prototype, but in the quick-and-dirty scheme described here it has two entries.
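The grouping rule can be sketched in a few lines of Python. Entries with the same ICSD structure type and the same AFLOW label count as one prototype, and entries without a structure type are grouped by label alone. The record fields `icsd_id`, `structure_type` and `aflow_label` and the sample entries are hypothetical placeholders, not real ICSD data; only the labels and the structure type name are taken from the text. The output reproduces the duplication caveat: the two orderings of the $L1_{2}$ label split one structure type into two prototypes.

```python
from collections import defaultdict


def prototype_key(entry: dict) -> tuple:
    """(structure type, AFLOW label); structure type is None when none is listed."""
    return (entry.get("structure_type") or None, entry["aflow_label"])


def group_prototypes(entries: list[dict]) -> dict[tuple, list[str]]:
    """Map each prototype key to the IDs of the entries assigned to it."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for entry in entries:
        groups[prototype_key(entry)].append(entry["icsd_id"])
    return dict(groups)


# Hypothetical records: the IDs are made up for illustration
entries = [
    {"icsd_id": "hyp-1", "structure_type": "Auricupride#AuCu3",
     "aflow_label": "AB3_cP4_221_a_c-001"},   # a Cu3Au-like entry
    {"icsd_id": "hyp-2", "structure_type": "Auricupride#AuCu3",
     "aflow_label": "A3B_cP4_221_c_a-001"},   # a CuAu3-like entry
    {"icsd_id": "hyp-3", "structure_type": "Auricupride#AuCu3",
     "aflow_label": "AB3_cP4_221_a_c-001"},
    {"icsd_id": "hyp-4", "structure_type": None,
     "aflow_label": "A10B11_tI84_139_dehim_eh2n-001"},
]

for key, ids in group_prototypes(entries).items():
    print(key, ids)
```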

Unfortunately we do not have a list of structure types for the entries in the CSD, so we must restrict our study to the ICSD. The results are shown in Fig. 3, which is not that different from the full ICSD plot in Fig. 2. The major change is that the monoclinic space groups $P2_{1}/c$ and $C2/c$ and triclinic $P\overline{1}$ have more prototypes than $Pnma$. This is probably because the lower symmetry structures have more possible prototypes for a given set of Wyckoff positions.

Figure 3: The distribution of prototypes, as defined in the text, in each space group. The classification scheme is the same as in Fig. 1, and the scale of the y-axis is still logarithmic.

What about the prototypes that are actually listed in the Encyclopedia? Their distribution is shown in Fig. 4. At the time of its compilation (January, 2025) there were 2,014 entries, many of them chosen explicitly to populate small occupation space groups, because they had Strukturbericht labels, or because we were interested in the structures for our own research. This compilation bias means that the graph is substantially different from the previous ones. No effort was made to reproduce the distribution found in the ICSD. As the Encyclopedia expands the distribution will approach that of the ICSD, but that point is far in the future.

Figure 4: The distribution of entries in the Encyclopedia in each space group as of 5 January 2025. The classification scheme is the same as in Fig. 1, but now the y-axis scale is linear.

Table 7 shows the distribution of prototypes and Encyclopedia entries by crystal system. Even though space group $Pnma$ is not the most highly populated space group, we see that overall there is a substantial shift from higher symmetries into orthorhombic crystals.

Table 7: Distribution of prototypes (Fig. 3) and Encyclopedia entries (Fig. 4) by crystal system.

| Crystal system | Space groups | % of groups | Prototypes | Prototype % | Encyclopedia entries | Encyclopedia % |
|---|---|---|---|---|---|---|
| Triclinic | 2 | 0.87 | 6,457 | 9.76 | 28 | 1.39 |
| Monoclinic | 13 | 5.65 | 19,341 | 29.25 | 329 | 16.34 |
| Orthorhombic | 59 | 25.65 | 15,504 | 23.45 | 546 | 27.11 |
| Tetragonal | 68 | 29.57 | 7,442 | 11.25 | 370 | 18.37 |
| Trigonal | 25 | 10.87 | 6,451 | 9.76 | 243 | 12.07 |
| Hexagonal | 27 | 11.74 | 5,126 | 7.75 | 234 | 11.62 |
| Cubic | 36 | 15.65 | 5,803 | 8.78 | 264 | 13.11 |

Table 8 shows the distribution of prototypes and Encyclopedia entries by chirality and centrosymmetry. The prototype distribution is not much changed from the raw data. The Encyclopedia distribution is skewed toward Sohncke Class II, whose share of entries is two to three times that in the raw data (organic or inorganic) and the prototype data. This is undoubtedly because we emphasized finding structures in each space group, and the Class II groups are difficult to fill experimentally.

Table 8: Distribution of prototypes (Fig. 3) and Encyclopedia entries (Fig. 4) by chirality, centrosymmetry, or the lack of either property.

| Class | Space groups | % of groups | Prototypes | Prototype % | Encyclopedia entries | Encyclopedia % |
|---|---|---|---|---|---|---|
| Centrosymmetric | 92 | 40.00 | 50,903 | 76.98 | 1,385 | 68.77 |
| Achiral | 73 | 31.74 | 9,567 | 14.47 | 388 | 19.27 |
| Sohncke Class II | 22 | 9.57 | 897 | 1.36 | 66 | 3.28 |
| Sohncke Class III | 43 | 18.70 | 4,760 | 7.20 | 175 | 8.69 |

Table 9 shows the distribution of structures in polar, symmorphic, and named $4_{2}$ screw operation space groups, combining Table 3 and Table 6. Both the prototype and Encyclopedia entries show a larger fraction of polar systems than the raw ICSD data, but the fraction of systems with screw axes is very similar to what we find in the raw data, with just over 30% of all systems having screw axes.

Table 9: Distribution of prototypes (Fig. 3) and Encyclopedia entries (Fig. 4) in space groups which are symmorphic, polar, or have an explicitly named $4_{2}$ screw operation, as well as combinations of these properties.

| Property | Space groups | % of groups | Prototypes | Prototype % | Encyclopedia entries | Encyclopedia % |
|---|---|---|---|---|---|---|
| Symmorphic | 73 | 31.74 | 24,321 | 36.78 | 708 | 35.15 |
| Polar | 68 | 29.57 | 9,275 | 14.03 | 316 | 15.69 |
| Symmorphic & Polar | 21 | 9.13 | 2,918 | 4.41 | 100 | 4.97 |
| Named $4_{2}$ Screw | 18 | 7.83 | 4,662 | 1.28 | 63 | 3.13 |
| Sohncke II or Symmorphic & Polar or $4_{2}$ Screw | 61 | 26.52 | 4,662 | 7.05 | 228 | 11.32 |

Finally, let's look at the space groups with the largest and smallest populations among our prototypes and the Encyclopedia entries. The prototype list is not much changed from the full ICSD list. Two of the new additions, $P2_{1}/m$ and $Cmcm$, have a screw axis, so now five of the top ten have screw axes.

Table 10: Space groups with the largest number of prototype structures and Encyclopedia entries.

| Prototype group | Population | Encyclopedia group | Population |
|---|---|---|---|
| $P2_{1}/c$ #14 | 7,825 | $Pnma$ #62 | 105 |
| $P\overline{1}$ #2 | 5,993 | $P2_{1}/c$ #14 | 93 |
| $C2/c$ #15 | 3,833 | $C2/m$ #12 | 76 |
| $Pnma$ #62 | 3,366 | $P6_{3}/mmc$ #194 | 74 |
| $C2/m$ #12 | 3,278 | $C2/c$ #15 | 70 |
| $R\overline{3}m$ #166 | 1,461 | $R\overline{3}m$ #166 | 69 |
| $P6_{3}/mmc$ #194 | 1,457 | $Cmcm$ #63 | 56 |
| $P2_{1}/m$ #11 | 1,315 | $I4/mmm$ #139 | 50 |
| $Cmcm$ #63 | 1,254 | $P4/mmm$ #123 | 29 |
| $I4/mmm$ #139 | 1,162 | $P\overline{3}m1$ #164 | 29 |


Table 11: Space groups with the smallest number of prototype structures and Encyclopedia entries. The Encyclopedia currently has 21 space groups with only one entry each, so we arbitrarily chose the ones with the larger space group numbers, and so the higher symmetry, to display.

| Prototype group | Population | Encyclopedia group | Population |
|---|---|---|---|
| $P4_{2}22$ #93 | 0 | $I432$ #211 | 1 |
| $P432$ #207 | 2* (1) | $P31m$ #157 | 1 |
| $P422$ #89 | 3* (0) | $P4_{2}/nnm$ #134 | 1 |
| $P4cc$ #103 | 3 | $P\overline{4}b2$ #117 | 1 |
| $P6$ #168 | 3 | $P\overline{4}2c$ #112 | 1 |
| $I432$ #211 | 5* (0) | $I4_{1}cd$ #110 | 1 |
| $P4_{2}cm$ #101 | 4 | $P4_{2}bc$ #106 | 1 |
| $P622$ #177 | 4 | $P4_{2}mc$ #105 | 1 |
| $P6_{2}$ #171 / $P6_{4}$ #172 | 5 | $P\overline{4}$ #81 | 1 |
| $Pcc2$ #27 | 7* (5) | $I4_{1}$ #80 | 1 |

The prototype list has some of the usual suspects: two space groups with a $4_{2}$ screw operation and one enantiomorphic pair. The Encyclopedia entries are a little different: currently the Encyclopedia has 22 space groups with only one member. For Table 11 we arbitrarily list the ten with the largest space group number. The enantiomorphic pairs do not show up in the Encyclopedia list as we made an explicit effort to find at least one structure in every space group, so the total count for a pair is at least two. Only two pairs fall into this category: $P4_{1}22$/$P4_{3}22$ and $P6_{2}$/$P6_{4}$, with the other pairs having at least four entries between them.

Why Are There So Many Structures With Screw Axes?

As noted in Table 4, over one-third of the structures in the CSD are in space group $P2_{1}/c$, and another 14% are in space groups $P2_{1}2_{1}2_{1}$, $P2_{1}$, $Pna2_{1}$, or $Pnma$, all of which have screw operations mentioned in their Hermann-Mauguin symbols. The ICSD is not as biased, but its top three entries, $Pnma$, $P2_{1}/c$, and $Fd\overline{3}m$, all have screw operations. We might expect to find screw axes in organic systems, as they are often composed of molecules, and the twisting allowed by a screw axis may make it easier to form a compact structure (Cockcroft, 2016).

In fact, organic systems seem to prefer multiple screw axes, if possible. To see this, consider the first four orthorhombic space groups: $P222$ #16, $P222_{1}$ #17, $P22_{1}2_{1}$ #18, and $P2_{1}2_{1}2_{1}$ #19. As the space group notation shows, these have zero, one, two, and three screw axes, respectively. Otherwise they are quite similar: they all belong to Sohncke Class III, allowing chiral structures, and they are non-centrosymmetric and non-polar. As we can see in Fig. 5, the number of structures grows more or less exponentially with the number of screw axes.
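Reading screw axes off the notation, as done for space groups #16 to #19 above, is mechanical enough to sketch in code. The helper below is hypothetical and assumes symbols are written in the LaTeX-style form used on this page (for example `P2_{1}2_{1}2_{1}`). It only counts screws written explicitly in the symbol, so it cannot find screw axes that the symbol leaves implicit.

```python
import re

# An axis symbol 2, 3, 4 or 6 followed by a LaTeX-style subscript, e.g. "2_{1}".
SUBSCRIPTED_AXIS = re.compile(r"([2346])_\{(\d)\}")


def named_screw_axes(hm_symbol):
    """Screw operations written explicitly in a Hermann-Mauguin symbol.

    Only valid screws n_s with 1 <= s < n are kept. Screw axes implied by
    centring or by other symmetry elements are not detected.
    """
    return [
        f"{n}_{s}"
        for n, s in SUBSCRIPTED_AXIS.findall(hm_symbol)
        if 1 <= int(s) < int(n)
    ]


# Prints 0, 1, 2 and 3 named screw axes for space groups #16 to #19.
for symbol in ["P222", "P222_{1}", "P22_{1}2_{1}", "P2_{1}2_{1}2_{1}"]:
    print(symbol, len(named_screw_axes(symbol)))
```

The same function returns an empty list for the short symbol $Pnma$ but three entries for its full symbol $P2_{1}/n2_{1}/m2_{1}/a$, which is why screw axes cannot be counted reliably from short symbols alone.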

We see similar behavior in higher symmetry structures. Space group $Pnma$ #62 is the most populated entry in the ICSD and the most populated space group of orthorhombic or higher symmetry in our prototype list. $Pbca$ #61 has more organic entries than any other space group beyond #19, and it also has three screw axes. Apparently nature favors space groups with screw axes, and the more the better.

Figure 5: The distribution of entries in each of the first four orthorhombic space groups for all four of our sources.

From our figures, we see that $P2_{1}2_{1}2_{1}$ is the most heavily populated non-centrosymmetric space group in all distributions, accounting for 32% of all the non-centrosymmetric entries in the CSD. It appears that if you must be a non-centrosymmetric crystal with a screw axis, you want to have as many screw axes as possible.

The Benford Distribution

Finally, let's have a little fun. Benford's Law states that if we take our data, count the number of entries in each space group, and then ask how many of those space groups have a count beginning with the digit $d$, the leading digits will follow a logarithmic distribution, with the probability of a count beginning with the digit $d$ given by

$P(d) = \log_{10}(d+1) - \log_{10}(d)$  .  (1)
To see why this is plausible, look at the log-scale plot in Fig. 1 and compare the distance between the numbers 1-2, 10-20, 100-200, and so on with the distance between 2-3, 20-30, 200-300, ….
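As a short worked check of Eq. (1): consecutive terms telescope, so the nine probabilities sum to one, and a leading 1 is roughly six and a half times as likely as a leading 9. Scaling the curve to an observed first-digit count $N(1)$, as is done for the solid line in Fig. 6, turns the ratio $P(d)/P(1)$ into a base-2 logarithm:

$$\sum_{d=1}^{9} P(d) = \log_{10}10 - \log_{10}1 = 1, \qquad P(1) = \log_{10}2 \approx 0.301, \qquad P(9) = \log_{10}\tfrac{10}{9} \approx 0.046,$$

$$N(d) = N(1)\,\frac{P(d)}{P(1)} = N(1)\,\frac{\log_{10}(1 + 1/d)}{\log_{10}2} = N(1)\,\log_{2}\!\left(1 + \frac{1}{d}\right).$$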

In Fig. 6 we plot the number of space groups whose population begins with the digit 1, 2, 3, etc. That is, if we look at the data back in Fig. 1, 78 of the space groups have a population beginning with 1, from 101,754 in $C2/c$ down to 14 in $Cmm2$. There are 34 groups with a count starting with 2, 25 with 3, and so on. The solid line shows the ideal distribution (1), normalized to the CSD count for the first digit.
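The counting behind Fig. 6 is easy to reproduce. The Python sketch below is an illustration under stated assumptions rather than the script used for the figure: the function names are hypothetical, it takes a plain list of per-space-group populations, skips empty groups (a count of zero has no leading digit), and scales the ideal curve so that its value at $d = 1$ matches the observed count, as described for the solid line.

```python
import math
from collections import Counter


def leading_digit(n):
    """First significant digit of a positive integer."""
    return int(str(n)[0])


def benford_probability(d):
    """Benford probability P(d) = log10(d + 1) - log10(d), Eq. (1)."""
    return math.log10(d + 1) - math.log10(d)


def leading_digit_histogram(populations):
    """Number of space groups whose population begins with each digit 1-9.

    Empty space groups (count 0) have no leading digit and are skipped.
    """
    tally = Counter(leading_digit(n) for n in populations if n > 0)
    return [tally.get(d, 0) for d in range(1, 10)]


def benford_curve(first_digit_count):
    """Ideal curve for d = 1..9, scaled so its d = 1 value equals the observed count."""
    scale = first_digit_count / benford_probability(1)
    return [scale * benford_probability(d) for d in range(1, 10)]


# A handful of populations quoted in this article, plus one empty group.
print(leading_digit_histogram([101754, 14, 7825, 1162, 93, 0]))
# [3, 0, 0, 0, 0, 0, 1, 0, 1]
```

With the 78 first-digit groups quoted above for Fig. 1, benford_curve(78) predicts about 45.6 groups starting with 2 and 32.4 starting with 3, against the observed 34 and 25.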

Figure 6: A plot of the number of space groups with occupations beginning with the digit 1, 2, 3, …, 9 for each of our datasets. Benford's Law states that this number should be proportional to (1). The solid line plots that formula, normalized to agree with the CSD count for the first digit.

Benford's Law works best when the data span many orders of magnitude and are distributed more or less uniformly, on a logarithmic scale, across that range. The first condition is certainly true for the raw CSD and ICSD data; the reader will have to judge whether the second holds from a study of Figures 1-4.

What We Know

So what have we found? As we noted at the beginning, the details of structural distribution will depend on what structures are being studied. However, we can make some general statements:

  • Nature (or at least researchers) loves centrosymmetric crystals — well over 70% of all studied crystal structures have an inversion site.
  • Having a large number of screw axes, as with $P2_{1}2_{1}2_{1}$, $Pbca$, and $Pnma$, doesn't hurt.
  • Despite the fact that many biological molecules (e.g., amino acids and sugars) are chiral, only 15% of organic crystals are chiral, and Sohncke Class II crystals only account for about 1% of all systems.
  • In addition, space groups which are part of an enantiomorphic pair (Sohncke Class II), or have a $4_{2}$ screw axis, or are both symmorphic and polar are sparsely populated.
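The first bullet can be checked against any table of per-space-group counts once the centrosymmetric groups are identified. The sketch below assumes counts keyed by space-group number and uses the standard result that the 92 centrosymmetric space groups are exactly those belonging to the eleven Laue classes; the function names are illustrative, not part of any existing tool.

```python
# Inclusive space-group number ranges of the 92 centrosymmetric groups, one
# range per Laue class: -1, 2/m, mmm, 4/m, 4/mmm, -3, -3m, 6/m, 6/mmm, m-3, m-3m.
CENTROSYMMETRIC_RANGES = [
    (2, 2), (10, 15), (47, 74), (83, 88), (123, 142), (147, 148),
    (162, 167), (175, 176), (191, 194), (200, 206), (221, 230),
]


def is_centrosymmetric(sg):
    """True if space group number sg contains an inversion center."""
    return any(lo <= sg <= hi for lo, hi in CENTROSYMMETRIC_RANGES)


def centrosymmetric_fraction(counts):
    """Fraction of all structures that sit in centrosymmetric space groups.

    counts maps space-group number -> number of structures.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no structures to count")
    inside = sum(n for sg, n in counts.items() if is_centrosymmetric(sg))
    return inside / total


# Top ten prototype populations from the table of most populated groups.
top_prototypes = {14: 7825, 2: 5993, 15: 3833, 62: 3366, 12: 3278,
                  166: 1461, 194: 1457, 11: 1315, 63: 1254, 139: 1162}
print(centrosymmetric_fraction(top_prototypes))  # 1.0
```

Applied to the ten most populated prototype groups, the fraction is exactly 1.0: every one of them, from $P2_{1}/c$ to $I4/mmm$, has an inversion center.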
The bottom line seems to be that if you pick up a random crystal, it is likely to be centrosymmetric and quite possibly has a screw axis, as long as that axis isn't $4_{2}$.

Acknowledgments

We would like to thank Prof. Harold Stokes for enumerating all of the screw axes in the space group tables, not just the ones explicitly named in the Hermann-Mauguin symbols for the space groups.

Footnotes

I If you have a hallway in your department that needs pictures, check out the poster version, which shows a structure from each of the 230 space groups.

II We have used a number of structures from this list for the Encyclopedia.

III For historical reasons the Encyclopedia has two entries for this structure, with and without the hydrogen atoms.

IV The current edition of the ICSD has three entries that are claimed to be in space group $P422$ #89, but all three actually have higher tetragonal symmetry: AFLOW places them in $P4/mmm$ #123. Hoffmann reported that the CSD has 9 entries, but 6 have no coordinates and 2 are incorrectly assigned, leaving only (CH)17FeO4Pt, technically known as tetrakis(μ2-acetato)-tetrakis(μ2-ferrocenecarboxylato)-tetra-platinum toluene solvate. (The current CSD lists 18 entries, but we have not been able to analyze the new entries to see if they are indeed in $P422$.)

V At the moment we are only counting the 20 space groups with $4_{2}$ operations in the Hermann-Mauguin symbol. This omits 12 space groups which have $4_{2}$ screw axes that are not so labeled. Ten of those groups are also sparsely populated and so fit into this category. The other two, $I4/mmm$ #139 and $Fm\overline{3}m$ #225, contain over 20,000 ICSD entries between them. Rather than hand-wave as to why these two groups should be neglected, we will simply leave all 12 out of the calculation. As partial justification, we note that none of these twelve groups has a $4_{2}$ operation among its group elements, and four of them, including $I4/mmm$ and $Fm\overline{3}m$, are symmorphic.

VI There are several caveats to this statement:

  • Experimental reports of radical positions (OH-, H3O+, NH4+, etc.) often omit the positions of the hydrogen atoms and list the sites simply as “N” or “O”. If there are other nitrogen or oxygen atoms in the structure, the radicals will be lumped together with them, making the stoichiometry inaccurate.
  • If the ICSD CIF lists multiple atoms on the same site, we use the first element in the list to determine the composition of the crystal, even if the composition is 50-50.
  • Partially occupied sites are considered fully occupied.
  • Approximately 150 structures in the ICSD are so complicated that it takes AFLOW an unreasonable amount of time to determine the proper ordering of the Wyckoff labels. We are working on this problem.

References

Resources

This is a list of resources mentioned in the text: