Overview

Dataset statistics

Number of variables: 8
Number of observations: 541909
Missing cells: 136534
Missing cells (%): 3.1%
Duplicate rows: 5268
Duplicate rows (%): 1.0%
Total size in memory: 33.1 MiB
Average record size in memory: 64.0 B
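
The percentages in this table can be cross-checked from the raw counts. The sketch below is a plain arithmetic check, not the profiling tool's own code; the variable names are illustrative, and it assumes "Missing cells (%)" is taken over all cells (variables times observations).

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 8
n_observations = 541_909
missing_cells = 136_534
duplicate_rows = 5_268
total_size_mib = 33.1

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes (1 MiB = 1024 * 1024 bytes).
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size (B): {avg_record_bytes:.1f}")
```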

Variable types

Categorical: 5
Numeric: 3

Warnings

Dataset has 5268 (1.0%) duplicate rows [Duplicates]
InvoiceNo has a high cardinality: 25900 distinct values [High cardinality]
StockCode has a high cardinality: 4070 distinct values [High cardinality]
Description has a high cardinality: 4223 distinct values [High cardinality]
InvoiceDate has a high cardinality: 23260 distinct values [High cardinality]
CustomerID has 135080 (24.9%) missing values [Missing]
UnitPrice is highly skewed (γ1 = 186.5069717) [Skewed]
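
Warnings of this kind amount to threshold checks on per-column statistics. The following is a minimal sketch of that idea, not the profiling tool's implementation: the function name `derive_warnings` is hypothetical, and the three cut-offs are illustrative assumptions chosen only so that the result agrees with the list above.

```python
# Per-column figures copied from the "Variables" section of this report.
columns = {
    "InvoiceNo": {"kind": "categorical", "distinct": 25900, "missing": 0},
    "StockCode": {"kind": "categorical", "distinct": 4070, "missing": 0},
    "Description": {"kind": "categorical", "distinct": 4223, "missing": 1454},
    "Quantity": {"kind": "numeric", "distinct": 722, "missing": 0,
                 "skewness": -0.2640763071},
    "InvoiceDate": {"kind": "categorical", "distinct": 23260, "missing": 0},
    "UnitPrice": {"kind": "numeric", "distinct": 1630, "missing": 0,
                  "skewness": 186.5069717},
    "CustomerID": {"kind": "numeric", "distinct": 4372, "missing": 135080,
                   "skewness": 0.02983499005},
    "Country": {"kind": "categorical", "distinct": 38, "missing": 0},
}
N_ROWS = 541_909

# Assumed, illustrative cut-offs (not read from the report's config.yaml).
CARDINALITY_THRESHOLD = 50
SKEWNESS_THRESHOLD = 20.0
MISSING_FRACTION_THRESHOLD = 0.01


def derive_warnings(cols, n_rows):
    """Return (column, warning) pairs for columns that cross a cut-off."""
    found = []
    for name, stats in cols.items():
        if stats["kind"] == "categorical" and stats["distinct"] > CARDINALITY_THRESHOLD:
            found.append((name, "High cardinality"))
        if stats["missing"] / n_rows > MISSING_FRACTION_THRESHOLD:
            found.append((name, "Missing"))
        if abs(stats.get("skewness", 0.0)) > SKEWNESS_THRESHOLD:
            found.append((name, "Skewed"))
    return found


warnings_found = derive_warnings(columns, N_ROWS)
for column, label in warnings_found:
    print(f"{column}: {label}")
```

With these assumed cut-offs, Country (38 distinct values) and Description (0.3% missing) stay below the thresholds, which matches their absence from the warning list.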

Reproduction

Analysis started: 2021-11-29 09:08:12.098007
Analysis finished: 2021-11-29 09:08:16.494103
Duration: 4.4 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

InvoiceNo
Categorical

HIGH CARDINALITY

Distinct: 25900
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

573585: 1114
581219: 749
581492: 731
580729: 721
558475: 705
Other values (25895): 537889

Unique

Unique: 5841
Unique (%): 1.1%

Sample

1st row: 536365
2nd row: 536365
3rd row: 536365
4th row: 536365
5th row: 536365

Value                 Count    Frequency (%)
573585                1114     0.2%
581219                749      0.1%
581492                731      0.1%
580729                721      0.1%
558475                705      0.1%
579777                687      0.1%
581217                676      0.1%
537434                675      0.1%
580730                662      0.1%
538071                652      0.1%
Other values (25890)  534537   98.6%

StockCode
Categorical

HIGH CARDINALITY

Distinct: 4070
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

85123A: 2313
22423: 2203
85099B: 2159
47566: 1727
20725: 1639
Other values (4065): 531868

Unique

Unique: 233
Unique (%): < 0.1%

Sample

1st row: 85123A
2nd row: 71053
3rd row: 84406B
4th row: 84029G
5th row: 84029E

Value                Count    Frequency (%)
85123A               2313     0.4%
22423                2203     0.4%
85099B               2159     0.4%
47566                1727     0.3%
20725                1639     0.3%
84879                1502     0.3%
22720                1477     0.3%
22197                1476     0.3%
21212                1385     0.3%
20727                1350     0.2%
Other values (4060)  524678   96.8%

Description
Categorical

HIGH CARDINALITY

Distinct: 4223
Distinct (%): 0.8%
Missing: 1454
Missing (%): 0.3%
Memory size: 4.1 MiB

WHITE HANGING HEART T-LIGHT HOLDER: 2369
REGENCY CAKESTAND 3 TIER: 2200
JUMBO BAG RED RETROSPOT: 2159
PARTY BUNTING: 1727
LUNCH BAG RED RETROSPOT: 1638
Other values (4218): 530362

Unique

Unique: 308
Unique (%): 0.1%

Sample

1st row: WHITE HANGING HEART T-LIGHT HOLDER
2nd row: WHITE METAL LANTERN
3rd row: CREAM CUPID HEARTS COAT HANGER
4th row: KNITTED UNION FLAG HOT WATER BOTTLE
5th row: RED WOOLLY HOTTIE WHITE HEART.

Value                                Count    Frequency (%)
WHITE HANGING HEART T-LIGHT HOLDER   2369     0.4%
REGENCY CAKESTAND 3 TIER             2200     0.4%
JUMBO BAG RED RETROSPOT              2159     0.4%
PARTY BUNTING                        1727     0.3%
LUNCH BAG RED RETROSPOT              1638     0.3%
ASSORTED COLOUR BIRD ORNAMENT        1501     0.3%
SET OF 3 CAKE TINS PANTRY DESIGN     1473     0.3%
PACK OF 72 RETROSPOT CAKE CASES      1385     0.3%
LUNCH BAG BLACK SKULL.               1350     0.2%
NATURAL SLATE HEART CHALKBOARD       1280     0.2%
Other values (4213)                  523373   96.6%
(Missing)                            1454     0.3%

Quantity
Real number (ℝ)

Distinct: 722
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.552249547
Minimum: -80995
Maximum: 80995
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.1 MiB

Quantile statistics

Minimum: -80995
5-th percentile: 1
Q1: 1
median: 3
Q3: 10
95-th percentile: 29
Maximum: 80995
Range: 161990
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 218.0811579
Coefficient of variation (CV): 22.83034554
Kurtosis: 119769.16
Mean: 9.552249547
Median Absolute Deviation (MAD): 2
Skewness: -0.2640763071
Sum: 5176450
Variance: 47559.39141
Monotonicity: Not monotonic
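
Several Quantity statistics are derived from others in the same tables, so they can be checked against each other. The sketch below is only a consistency check using the standard definitions (range, IQR, coefficient of variation, variance, sum); the variable names are illustrative.

```python
# Figures copied from the Quantity tables in this report.
n_observations = 541_909   # Quantity has no missing values
minimum, maximum = -80995, 80995
q1, q3 = 1, 10
mean = 9.552249547
std = 218.0811579

value_range = maximum - minimum   # reported Range: 161990
iqr = q3 - q1                     # reported IQR: 9
cv = std / mean                   # reported CV: 22.83034554
variance = std ** 2               # reported Variance: 47559.39141
total = mean * n_observations     # reported Sum: 5176450

print(value_range, iqr)
print(f"CV = {cv:.8f}, variance = {variance:.5f}, sum = {total:.0f}")
```

The CV of about 22.8 says the standard deviation is roughly 23 times the mean, which fits a column whose median is 3 but whose extremes reach ±80995.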
Histogram with fixed size bins (bins=50)

Common values

Value               Count    Frequency (%)
1                   148227   27.4%
2                   81829    15.1%
12                  61063    11.3%
6                   40868    7.5%
4                   38484    7.1%
3                   37121    6.9%
24                  24021    4.4%
10                  22288    4.1%
8                   13129    2.4%
5                   11757    2.2%
Other values (712)  63122    11.6%

Minimum 5 values

Value    Count  Frequency (%)
-80995   1      < 0.1%
-74215   1      < 0.1%
-9600    2      < 0.1%
-9360    1      < 0.1%
-9058    1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
80995    1      < 0.1%
74215    1      < 0.1%
12540    1      < 0.1%
5568     1      < 0.1%
4800     1      < 0.1%

InvoiceDate
Categorical

HIGH CARDINALITY

Distinct: 23260
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

2011/10/31 14:41: 1114
2011/12/8 9:28: 749
2011/12/9 10:03: 731
2011/12/5 17:24: 721
2011/6/29 15:58: 705
Other values (23255): 537889

Unique

Unique: 4242
Unique (%): 0.8%

Sample

1st row: 2010/12/1 8:26
2nd row: 2010/12/1 8:26
3rd row: 2010/12/1 8:26
4th row: 2010/12/1 8:26
5th row: 2010/12/1 8:26

Value                 Count    Frequency (%)
2011/10/31 14:41      1114     0.2%
2011/12/8 9:28        749      0.1%
2011/12/9 10:03       731      0.1%
2011/12/5 17:24       721      0.1%
2011/6/29 15:58       705      0.1%
2011/11/30 15:13      687      0.1%
2011/12/8 9:20        676      0.1%
2010/12/6 16:57       675      0.1%
2011/12/5 17:28       662      0.1%
2010/12/9 14:09       652      0.1%
Other values (23250)  534537   98.6%
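
InvoiceDate is profiled as Categorical because its values are stored as text such as "2011/12/8 9:28". A minimal sketch of parsing those strings into timestamps follows, using only the standard library. The format string is inferred from the sample values shown in this section and is an assumption about the rest of the column; with pandas, passing the same format to `pandas.to_datetime` would be the usual route.

```python
from datetime import datetime

# Assumed layout, inferred from the samples: year/month/day hour:minute,
# with month, day and hour not zero-padded.
INVOICE_DATE_FORMAT = "%Y/%m/%d %H:%M"

# Values copied from the InvoiceDate section of this report.
samples = ["2010/12/1 8:26", "2011/10/31 14:41", "2011/12/8 9:28"]

# strptime accepts the missing leading zeros for %m, %d and %H.
parsed = [datetime.strptime(text, INVOICE_DATE_FORMAT) for text in samples]

for text, stamp in zip(samples, parsed):
    print(f"{text!r} -> {stamp.isoformat()}")
```

Once parsed, the column can be sorted and binned by time, which a plain string comparison would get wrong (for example, "2011/12/8" sorts after "2011/12/10" as text).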

UnitPrice
Real number (ℝ)

SKEWED

Distinct: 1630
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.611113626
Minimum: -11062.06
Maximum: 38970
Zeros: 2515
Zeros (%): 0.5%
Memory size: 4.1 MiB

Quantile statistics

Minimum: -11062.06
5-th percentile: 0.42
Q1: 1.25
median: 2.08
Q3: 4.13
95-th percentile: 9.95
Maximum: 38970
Range: 50032.06
Interquartile range (IQR): 2.88

Descriptive statistics

Standard deviation: 96.75985306
Coefficient of variation (CV): 20.98405307
Kurtosis: 59005.7191
Mean: 4.611113626
Median Absolute Deviation (MAD): 1.23
Skewness: 186.5069717
Sum: 2498803.974
Variance: 9362.469164
Monotonicity: Not monotonic
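
The skewness figure (γ1) measures how asymmetric the distribution is around its mean. The sketch below computes the bias-adjusted Fisher-Pearson coefficient in plain Python. Whether the report uses exactly this estimator is an assumption, the function name `sample_skewness` is hypothetical, and the toy list is illustrative data built from a few prices seen in this section, not the real column.

```python
import math


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (G1) of a sequence of numbers."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Toy data: ten common unit prices plus the reported maximum as an outlier.
toy_prices = [1.25, 1.65, 0.85, 2.95, 0.42, 4.95, 3.75, 2.1, 2.46, 2.08, 38970.0]

print(sample_skewness([1, 2, 3, 4, 5]))  # symmetric data
print(sample_skewness(toy_prices))       # one large positive outlier
```

A single very large value is enough to push the coefficient strongly positive, which is the pattern the UnitPrice maximum of 38970 against a median of 2.08 suggests.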
Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
1.25                 50496    9.3%
1.65                 38181    7.0%
0.85                 28497    5.3%
2.95                 27768    5.1%
0.42                 24533    4.5%
4.95                 19040    3.5%
3.75                 18600    3.4%
2.1                  17697    3.3%
2.46                 17091    3.2%
2.08                 17005    3.1%
Other values (1620)  283001   52.2%

Minimum 5 values

Value      Count  Frequency (%)
-11062.06  2      < 0.1%
0          2515   0.5%
0.001      4      < 0.1%
0.01       1      < 0.1%
0.03       3      < 0.1%

Maximum 5 values

Value     Count  Frequency (%)
38970     1      < 0.1%
17836.46  1      < 0.1%
16888.02  1      < 0.1%
16453.71  1      < 0.1%
13541.33  3      < 0.1%

CustomerID
Real number (ℝ≥0)

MISSING

Distinct: 4372
Distinct (%): 1.1%
Missing: 135080
Missing (%): 24.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 15287.69057
Minimum: 12346
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.1 MiB
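
The Distinct (%) value for CustomerID depends on which denominator is used. The short check below compares the two candidates; it only shows which one is consistent with the reported 1.1% and is not taken from the tool's source. The variable names are illustrative.

```python
# Figures copied from this report.
n_rows = 541_909
n_missing = 135_080   # CustomerID missing values
n_distinct = 4_372    # CustomerID distinct values

pct_of_all_rows = 100 * n_distinct / n_rows
pct_of_non_missing = 100 * n_distinct / (n_rows - n_missing)

print(f"Relative to all rows:         {pct_of_all_rows:.1f}%")
print(f"Relative to non-missing rows: {pct_of_non_missing:.1f}%")
```

Only the non-missing denominator reproduces 1.1%, so the distinct ratio here appears to exclude the 24.9% of rows without a CustomerID.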

Quantile statistics

Minimum: 12346
5-th percentile: 12626
Q1: 13953
median: 15152
Q3: 16791
95-th percentile: 17905
Maximum: 18287
Range: 5941
Interquartile range (IQR): 2838

Descriptive statistics

Standard deviation: 1713.600303
Coefficient of variation (CV): 0.1120902006
Kurtosis: -1.179982372
Mean: 15287.69057
Median Absolute Deviation (MAD): 1481
Skewness: 0.02983499005
Sum: 6219475867
Variance: 2936426
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count    Frequency (%)
17841                7983     1.5%
14911                5903     1.1%
14096                5128     0.9%
12748                4642     0.9%
14606                2782     0.5%
15311                2491     0.5%
14646                2085     0.4%
13089                1857     0.3%
13263                1677     0.3%
14298                1640     0.3%
Other values (4362)  370641   68.4%
(Missing)            135080   24.9%

Minimum 5 values

Value  Count  Frequency (%)
12346  2      < 0.1%
12347  182    < 0.1%
12348  31     < 0.1%
12349  73     < 0.1%
12350  17     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
18287  70     < 0.1%
18283  756    0.1%
18282  13     < 0.1%
18281  7      < 0.1%
18280  10     < 0.1%

Country
Categorical

Distinct: 38
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 MiB

United Kingdom: 495478
Germany: 9495
France: 8557
EIRE: 8196
Spain: 2533
Other values (33): 17650

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: United Kingdom
2nd row: United Kingdom
3rd row: United Kingdom
4th row: United Kingdom
5th row: United Kingdom

Value              Count    Frequency (%)
United Kingdom     495478   91.4%
Germany            9495     1.8%
France             8557     1.6%
EIRE               8196     1.5%
Spain              2533     0.5%
Netherlands        2371     0.4%
Belgium            2069     0.4%
Switzerland        2002     0.4%
Portugal           1519     0.3%
Australia          1259     0.2%
Other values (28)  8430     1.6%