Dataset statistics
Number of variables | 8 |
---|---|
Number of observations | 541909 |
Missing cells | 136534 |
Missing cells (%) | 3.1% |
Duplicate rows | 5268 |
Duplicate rows (%) | 1.0% |
Total size in memory | 33.1 MiB |
Average record size in memory | 64.0 B |
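The percentage and per-record figures in this table follow from the raw counts. A minimal sketch of that arithmetic, assuming missing cells are counted over the full rows × columns grid and that 1 MiB is 1024 × 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table above.
n_vars = 8
n_obs = 541_909
missing_cells = 136_534
duplicate_rows = 5_268
total_mib = 33.1

# Share of missing cells over the full rows x columns grid.
missing_pct = 100 * missing_cells / (n_obs * n_vars)
# Share of rows flagged as duplicates.
duplicate_pct = 100 * duplicate_rows / n_obs
# Average bytes per record, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_mib * 1024 * 1024 / n_obs

print(f"{missing_pct:.1f}% missing, {duplicate_pct:.1f}% duplicated, "
      f"{avg_record_bytes:.1f} B per record")
```

Rounded to one decimal place, these values should agree with the table.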
Variable types
Categorical | 5 |
---|---|
Numeric | 3 |
Warnings
Dataset has 5268 (1.0%) duplicate rows | Duplicates |
InvoiceNo has a high cardinality: 25900 distinct values | High cardinality |
StockCode has a high cardinality: 4070 distinct values | High cardinality |
Description has a high cardinality: 4223 distinct values | High cardinality |
InvoiceDate has a high cardinality: 23260 distinct values | High cardinality |
CustomerID has 135080 (24.9%) missing values | Missing |
UnitPrice is highly skewed (γ1 = 186.5069717) | Skewed |
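Warnings of this kind can be approximated with a few counts per column. The sketch below is not the profiler's implementation. It assumes a duplicate is any repeat of an earlier identical row, uses a hypothetical cardinality threshold, and runs on made-up toy rows rather than the profiled dataset:

```python
from collections import Counter

# Hypothetical threshold, chosen for illustration only; the report's
# actual setting is not stated in this section.
HIGH_CARDINALITY = 3

def basic_warnings(columns, rows):
    """Return (message, tag) pairs for duplicates, missing values and cardinality."""
    warnings = []
    n = len(rows)

    # Count every repeat of a row after its first occurrence.
    row_counts = Counter(rows)
    n_dup = sum(c - 1 for c in row_counts.values())
    if n_dup:
        warnings.append((f"Dataset has {n_dup} ({100 * n_dup / n:.1f}%) duplicate rows",
                         "Duplicates"))

    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        n_missing = sum(v is None for v in values)
        if n_missing:
            warnings.append((f"{name} has {n_missing} ({100 * n_missing / n:.1f}%) missing values",
                             "Missing"))
        # Distinct values are counted over non-missing entries only.
        n_distinct = len({v for v in values if v is not None})
        if n_distinct > HIGH_CARDINALITY:
            warnings.append((f"{name} has a high cardinality: {n_distinct} distinct values",
                             "High cardinality"))
    return warnings

# Toy rows (made up), not taken from the profiled dataset.
columns = ("InvoiceNo", "CustomerID")
rows = [("A1", 1), ("A1", 1), ("A2", None), ("A3", 2), ("A4", 3)]
for message, tag in basic_warnings(columns, rows):
    print(f"{message} | {tag}")
```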
Reproduction
Analysis started | 2021-11-29 09:08:12.098007 |
---|---|
Analysis finished | 2021-11-29 09:08:16.494103 |
Duration | 4.4 seconds |
Software version | pandas-profiling v2.11.0 |
Download configuration | config.yaml |
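A report like this one can be regenerated with the pandas-profiling package named above. The following is a sketch only: the file names are hypothetical, the source CSV is not identified in this report, and the default configuration is assumed rather than the downloadable config.yaml.

```python
def build_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so the function can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title=title)
    profile.to_file(html_path)
    return html_path

# Hypothetical usage; neither path comes from the report itself:
# build_report("transactions.csv", "transactions_profile.html")
```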
InvoiceNo
Categorical
Distinct | 25900 |
---|---|
Distinct (%) | 4.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.1 MiB |
573585 | 1114 |
---|---|
581219 | 749 |
581492 | 731 |
580729 | 721 |
558475 | 705 |
Other values (25895) | 537889 |
Unique
Unique | 5841 |
---|---|
Unique (%) | 1.1% |
Sample
1st row | 536365 |
---|---|
2nd row | 536365 |
3rd row | 536365 |
4th row | 536365 |
5th row | 536365 |
Value | Count | Frequency (%) |
573585 | 1114 | 0.2% |
581219 | 749 | 0.1% |
581492 | 731 | 0.1% |
580729 | 721 | 0.1% |
558475 | 705 | 0.1% |
579777 | 687 | 0.1% |
581217 | 676 | 0.1% |
537434 | 675 | 0.1% |
580730 | 662 | 0.1% |
538071 | 652 | 0.1% |
Other values (25890) | 534537 | 98.6% |
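A "top values plus other" table of this shape can be built by counting values and folding the tail into a single row. The sketch below illustrates the idea on made-up data. The function name is hypothetical, and missing values are not handled.

```python
from collections import Counter

def top_k_with_other(values, k=10):
    """Return (label, count, percent) rows for the k most common values,
    followed by one row that aggregates everything else."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(k)
    rows = [(str(v), c, 100 * c / total) for v, c in top]

    n_other = len(counts) - len(top)  # distinct values not listed individually
    if n_other > 0:
        other_count = total - sum(c for _, c in top)
        rows.append((f"Other values ({n_other})", other_count,
                     100 * other_count / total))
    return rows

# Toy values (made up), not taken from the profiled dataset.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 1 + ["D"] * 1
for label, count, pct in top_k_with_other(sample, k=2):
    print(f"{label} | {count} | {pct:.1f}% |")
```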
StockCode
Categorical
Distinct | 4070 |
---|---|
Distinct (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.1 MiB |
85123A | 2313 |
---|---|
22423 | 2203 |
85099B | 2159 |
47566 | 1727 |
20725 | 1639 |
Other values (4065) | 531868 |
Unique
Unique | 233 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 85123A |
---|---|
2nd row | 71053 |
3rd row | 84406B |
4th row | 84029G |
5th row | 84029E |
Value | Count | Frequency (%) |
85123A | 2313 | 0.4% |
22423 | 2203 | 0.4% |
85099B | 2159 | 0.4% |
47566 | 1727 | 0.3% |
20725 | 1639 | 0.3% |
84879 | 1502 | 0.3% |
22720 | 1477 | 0.3% |
22197 | 1476 | 0.3% |
21212 | 1385 | 0.3% |
20727 | 1350 | 0.2% |
Other values (4060) | 524678 | 96.8% |
Description
Categorical
Distinct | 4223 |
---|---|
Distinct (%) | 0.8% |
Missing | 1454 |
Missing (%) | 0.3% |
Memory size | 4.1 MiB |
WHITE HANGING HEART T-LIGHT HOLDER | 2369 |
---|---|
REGENCY CAKESTAND 3 TIER | 2200 |
JUMBO BAG RED RETROSPOT | 2159 |
PARTY BUNTING | 1727 |
LUNCH BAG RED RETROSPOT | 1638 |
Other values (4218) | 530362 |
Unique
Unique | 308 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | WHITE HANGING HEART T-LIGHT HOLDER |
---|---|
2nd row | WHITE METAL LANTERN |
3rd row | CREAM CUPID HEARTS COAT HANGER |
4th row | KNITTED UNION FLAG HOT WATER BOTTLE |
5th row | RED WOOLLY HOTTIE WHITE HEART. |
Value | Count | Frequency (%) |
WHITE HANGING HEART T-LIGHT HOLDER | 2369 | 0.4% |
REGENCY CAKESTAND 3 TIER | 2200 | 0.4% |
JUMBO BAG RED RETROSPOT | 2159 | 0.4% |
PARTY BUNTING | 1727 | 0.3% |
LUNCH BAG RED RETROSPOT | 1638 | 0.3% |
ASSORTED COLOUR BIRD ORNAMENT | 1501 | 0.3% |
SET OF 3 CAKE TINS PANTRY DESIGN | 1473 | 0.3% |
PACK OF 72 RETROSPOT CAKE CASES | 1385 | 0.3% |
LUNCH BAG BLACK SKULL. | 1350 | 0.2% |
NATURAL SLATE HEART CHALKBOARD | 1280 | 0.2% |
Other values (4213) | 523373 | 96.6% |
(Missing) | 1454 | 0.3% |
Quantity
Real number (ℝ)
Distinct | 722 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9.552249547 |
---|---|
Minimum | -80995 |
Maximum | 80995 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.1 MiB |
Quantile statistics
Minimum | -80995 |
---|---|
5th percentile | 1 |
Q1 | 1 |
Median | 3 |
Q3 | 10 |
95th percentile | 29 |
Maximum | 80995 |
Range | 161990 |
Interquartile range (IQR) | 9 |
Descriptive statistics
Standard deviation | 218.0811579 |
---|---|
Coefficient of variation (CV) | 22.83034554 |
Kurtosis | 119769.16 |
Mean | 9.552249547 |
Median Absolute Deviation (MAD) | 2 |
Skewness | -0.2640763071 |
Sum | 5176450 |
Variance | 47559.39141 |
Monotonicity | Not monotonic |
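Several of the Quantity figures above are derived from the others. As a rough consistency check, the sketch below recomputes them from the reported values. It assumes the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), and small differences are expected because the inputs are rounded.

```python
# Figures copied from the Quantity tables above.
n_obs = 541_909
minimum, maximum = -80_995, 80_995
q1, q3 = 1, 10
mean = 9.552249547
std = 218.0811579

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n_obs              # Sum (Quantity has no missing values)

print(value_range, iqr, round(cv, 4), round(variance, 2), round(total))
```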
Value | Count | Frequency (%) |
1 | 148227 | 27.4% |
2 | 81829 | 15.1% |
12 | 61063 | 11.3% |
6 | 40868 | 7.5% |
4 | 38484 | 7.1% |
3 | 37121 | 6.9% |
24 | 24021 | 4.4% |
10 | 22288 | 4.1% |
8 | 13129 | 2.4% |
5 | 11757 | 2.2% |
Other values (712) | 63122 | 11.6% |
Minimum 5 values
Value | Count | Frequency (%) |
-80995 | 1 | < 0.1% |
-74215 | 1 | < 0.1% |
-9600 | 2 | < 0.1% |
-9360 | 1 | < 0.1% |
-9058 | 1 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) |
80995 | 1 | < 0.1% |
74215 | 1 | < 0.1% |
12540 | 1 | < 0.1% |
5568 | 1 | < 0.1% |
4800 | 1 | < 0.1% |
InvoiceDate
Categorical
Distinct | 23260 |
---|---|
Distinct (%) | 4.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.1 MiB |
2011/10/31 14:41 | 1114 |
---|---|
2011/12/8 9:28 | 749 |
2011/12/9 10:03 | 731 |
2011/12/5 17:24 | 721 |
2011/6/29 15:58 | 705 |
Other values (23255) | 537889 |
Unique
Unique | 4242 |
---|---|
Unique (%) | 0.8% |
Sample
1st row | 2010/12/1 8:26 |
---|---|
2nd row | 2010/12/1 8:26 |
3rd row | 2010/12/1 8:26 |
4th row | 2010/12/1 8:26 |
5th row | 2010/12/1 8:26 |
Value | Count | Frequency (%) |
2011/10/31 14:41 | 1114 | 0.2% |
2011/12/8 9:28 | 749 | 0.1% |
2011/12/9 10:03 | 731 | 0.1% |
2011/12/5 17:24 | 721 | 0.1% |
2011/6/29 15:58 | 705 | 0.1% |
2011/11/30 15:13 | 687 | 0.1% |
2011/12/8 9:20 | 676 | 0.1% |
2010/12/6 16:57 | 675 | 0.1% |
2011/12/5 17:28 | 662 | 0.1% |
2010/12/9 14:09 | 652 | 0.1% |
Other values (23250) | 534537 | 98.6% |
UnitPrice
Real number (ℝ)
Distinct | 1630 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 4.611113626 |
---|---|
Minimum | -11062.06 |
Maximum | 38970 |
Zeros | 2515 |
Zeros (%) | 0.5% |
Memory size | 4.1 MiB |
Quantile statistics
Minimum | -11062.06 |
---|---|
5th percentile | 0.42 |
Q1 | 1.25 |
Median | 2.08 |
Q3 | 4.13 |
95th percentile | 9.95 |
Maximum | 38970 |
Range | 50032.06 |
Interquartile range (IQR) | 2.88 |
Descriptive statistics
Standard deviation | 96.75985306 |
---|---|
Coefficient of variation (CV) | 20.98405307 |
Kurtosis | 59005.7191 |
Mean | 4.611113626 |
Median Absolute Deviation (MAD) | 1.23 |
Skewness | 186.5069717 |
Sum | 2498803.974 |
Variance | 9362.469164 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1.25 | 50496 | 9.3% |
1.65 | 38181 | 7.0% |
0.85 | 28497 | 5.3% |
2.95 | 27768 | 5.1% |
0.42 | 24533 | 4.5% |
4.95 | 19040 | 3.5% |
3.75 | 18600 | 3.4% |
2.1 | 17697 | 3.3% |
2.46 | 17091 | 3.2% |
2.08 | 17005 | 3.1% |
Other values (1620) | 283001 | 52.2% |
Minimum 5 values
Value | Count | Frequency (%) |
-11062.06 | 2 | < 0.1% |
0 | 2515 | 0.5% |
0.001 | 4 | < 0.1% |
0.01 | 1 | < 0.1% |
0.03 | 3 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) |
38970 | 1 | < 0.1% |
17836.46 | 1 | < 0.1% |
16888.02 | 1 | < 0.1% |
16453.71 | 1 | < 0.1% |
13541.33 | 3 | < 0.1% |
CustomerID
Distinct | 4372 |
---|---|
Distinct (%) | 1.1% |
Missing | 135080 |
Missing (%) | 24.9% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 15287.69057 |
---|---|
Minimum | 12346 |
Maximum | 18287 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.1 MiB |
Quantile statistics
Minimum | 12346 |
---|---|
5th percentile | 12626 |
Q1 | 13953 |
Median | 15152 |
Q3 | 16791 |
95th percentile | 17905 |
Maximum | 18287 |
Range | 5941 |
Interquartile range (IQR) | 2838 |
Descriptive statistics
Standard deviation | 1713.600303 |
---|---|
Coefficient of variation (CV) | 0.1120902006 |
Kurtosis | -1.179982372 |
Mean | 15287.69057 |
Median Absolute Deviation (MAD) | 1481 |
Skewness | 0.02983499005 |
Sum | 6219475867 |
Variance | 2936426 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
17841 | 7983 | 1.5% |
14911 | 5903 | 1.1% |
14096 | 5128 | 0.9% |
12748 | 4642 | 0.9% |
14606 | 2782 | 0.5% |
15311 | 2491 | 0.5% |
14646 | 2085 | 0.4% |
13089 | 1857 | 0.3% |
13263 | 1677 | 0.3% |
14298 | 1640 | 0.3% |
Other values (4362) | 370641 | 68.4% |
(Missing) | 135080 | 24.9% |
Minimum 5 values
Value | Count | Frequency (%) |
12346 | 2 | < 0.1% |
12347 | 182 | < 0.1% |
12348 | 31 | < 0.1% |
12349 | 73 | < 0.1% |
12350 | 17 | < 0.1% |
Maximum 5 values
Value | Count | Frequency (%) |
18287 | 70 | < 0.1% |
18283 | 756 | 0.1% |
18282 | 13 | < 0.1% |
18281 | 7 | < 0.1% |
18280 | 10 | < 0.1% |
Country
Categorical
Distinct | 38 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.1 MiB |
United Kingdom | 495478 |
---|---|
Germany | 9495 |
France | 8557 |
EIRE | 8196 |
Spain | 2533 |
Other values (33) | 17650 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | United Kingdom |
---|---|
2nd row | United Kingdom |
3rd row | United Kingdom |
4th row | United Kingdom |
5th row | United Kingdom |
Value | Count | Frequency (%) |
United Kingdom | 495478 | 91.4% |
Germany | 9495 | 1.8% |
France | 8557 | 1.6% |
EIRE | 8196 | 1.5% |
Spain | 2533 | 0.5% |
Netherlands | 2371 | 0.4% |
Belgium | 2069 | 0.4% |
Switzerland | 2002 | 0.4% |
Portugal | 1519 | 0.3% |
Australia | 1259 | 0.2% |
Other values (28) | 8430 | 1.6% |