Probability and Statistics: Applying hypothesis testing to analyse Internet users [English version]


Business Statistics – ECON1193
Assignment – Individual Case Study

Content:
Part 1: Introduction
Part 2: A/ Probability, B/ Descriptive Measures
Part 3: Confidence Intervals
Part 4: Hypothesis Testing
Part 5: Overall Conclusion
References

Part 1: Introduction

A global description of individuals using the Internet:

In the early 1980s, personal computers became popular with the general public rather than being used only by experts and technologists (Sawyer 2004). In particular, the Apple I was released by Steve Jobs and Steve Wozniak in 1976, and the origins of the Internet are commonly traced to the research the United States funded after the Soviet Union launched Sputnik in 1957; both mark milestones in the development of technology (Briddock 2014). The resulting Internet revolution changed how people pursue education, communication, and commerce. For example, there has been a significant increase in the use of the Internet in education to connect teachers with students, classrooms with other classrooms, and global with national institutions (World Bank Blogs 2020). Furthermore, because of the spread of Covid-19, many universities worldwide have adopted online learning so that students can study by themselves at home; for instance, RMIT University used Collaborate Ultra as its main application. To sum up, the Internet is widely available worldwide.

Why it is important to monitor individuals using the Internet to achieve the United Nations SDG goal:

Economic growth: The Internet serves not only social needs but also economic purposes. Thanks to the development of technology, it helps companies and households increase profit, build reputation, and store data through online economic activities. For instance, Canada is a world leader in the percentage of businesses buying goods online, but it ranks lower in online selling (Bryan 2010). Canadian businesses can therefore gain more profit, improve their reputation, and store data easily. On
the other hand, the Internet also causes problems, because “The weakness of the internet is that it connects everything so easily” (Bennett 2017). For example, a thief can bypass a bank's firewall to steal bitcoin. As a result, individual Internet use should be monitored to support the SDG goal.

The relationship between individuals using the Internet and Gross National Income (GNI):

Gross National Income (GNI) is the total income generated by a country's firms and households from domestic and international production. There is therefore a close relationship between Internet users and GNI, especially the growth of the economy. This is evident when comparing developing and developed nations (Klein 2020). Figure 1 shows that developed countries had a high level of Internet access from 2005 to 2017, whereas the share of Internet users in developing nations was only about half that of developed ones. This leads to a large gap between rich nations with high Internet access and poor ones with low Internet usage.

Part 2:

A/ Probability:

                              Low-income (LI)   Middle-income (MI)   High-income (HI)   Total
Low usage of Internet (L)            4                  4                   0              8
High usage of Internet (H)           0                 11                   6             17
Total                                4                 15                   6             25

Table 1: Individuals using the Internet in low-income, middle-income, and high-income countries (Unit: country)

Calculation:

\[ P(L) = \frac{8}{25} = 0.32 = 32\% \]

\[ P(L \mid LI) = \frac{P(L \text{ and } LI)}{P(LI)} = \frac{4/25}{4/25} = 1 = 100\% \]

\[ P(L \mid MI) = \frac{P(L \text{ and } MI)}{P(MI)} = \frac{4/25}{15/25} \approx 0.27 = 27\% \]

\[ P(L \mid HI) = \frac{P(L \text{ and } HI)}{P(HI)} = \frac{0/25}{6/25} = 0 = 0\% \]

Because P(L | LI) = 1 ≠ P(L) = 0.32, P(L | MI) ≈ 0.27 ≠ P(L) = 0.32, and P(L | HI) = 0 ≠ P(L) = 0.32, the events are dependent.

There are 25 countries in total in this report: 17 with high Internet usage and 8 with low Internet usage. The 25 nations are divided into 3 groups: 4 low-income, 15 middle-income, and 6 high-income countries. From the calculation above, it is obvious that the relationship between
individuals using the Internet and income is statistically dependent, which means a country's level of Internet use is associated with its income category. In detail, the conditional probability of low Internet usage given a low-income nation is 100%, while the conditional probability of low Internet usage given a high-income country is zero. Low-income nations are less likely to access the Internet than developed countries with high income. For example, poor quality of healthcare is common in low-income countries (Roder-DeWan et al. 2019). A survey in the midwestern United States found that people on low incomes tend not to use Internet technology for health purposes (Jensen, King, Davis & Guntzviller 2010). Furthermore, Research ICT Africa found that “higher income users pay between US$3.17-9.52 per month, whereas low income users pay as low as US7 cents per month in Nigeria” (SyndiGate Media Inc 2017). This is evidence that people with high salaries use the Internet more than lower-income users.

B/ Descriptive Measures:

Central Tendency

            Low-income (LI)   Middle-income (MI)   High-income (HI)
Mean             15.10              51.61                84.69
Median           18.31              55.68                85.93
Mode            No mode            No mode              No mode

Table 1: Measures of central tendency for individuals using the Internet in low-income, middle-income, and high-income countries (Unit: % of population)

Firstly, the mode is not used in this case because no value repeats within any country category. Additionally, the outliers test below shows that there are no outliers. Therefore, between the mean and the median, the best measure is the mean because it uses every value in the data. The highest average of individuals using the Internet belongs to the high-income group, at nearly 85% of the population. This figure is far greater than those of the low-income and middle-income categories, at 15.10% and 51.61% of the population respectively. It is clearly
seen that Internet access is unevenly distributed across countries, which reflects the division of wealth arising from different GNI (Klein 2020).

Outliers Test (Individuals using the Internet)

                     Observed value   Calculated value
Smallest accepted         3.93          -11.11491122
Largest accepted         97.10          118.0604168

Table 2: Outliers test for individuals using the Internet in low-income, middle-income, and high-income countries (Unit: % of population)

Because the smallest and largest observed values lie within the calculated limits, there are no outliers in this data set.

Variation

                            Low-income (LI)   Middle-income (MI)   High-income (HI)
Range                            15.91              43.38                26.55
Interquartile range               4.44              18.88                 6.47
Variance                         55.99             173.40                78.19
Standard deviation                7.48              13.17                 8.84
Coefficient of variation          50%                26%                  10%

Table 3: Measures of variation for individuals using the Internet in low-income, middle-income, and high-income countries (Unit: % of population)

First of all, the range uses only two values, so it cannot be the best measure. Variance uses all values and is therefore better than the interquartile range. However, the standard deviation is the best measure in this case because there are no outliers and it is expressed in the same unit as the data. Furthermore, the main reason the standard deviation is the most important measure of variation is that it shows how the values are distributed around the mean.

The country categories clearly differ in their use of the Internet. Across all groups, the left side of each box is larger than the right side, and the left whiskers are longer than the right ones. These box plots are therefore left-skewed, consistent with the mean being less than the median in the low-, middle-, and high-income groups. For example, the median of MI nations is greater than their mean (55.68% versus 51.61%). The median of HI countries is slightly more than their mean (85.93% versus 84.69%). Lastly, the mean of the LI group is less than its median (15.10% versus 18.31%). A 95% confidence
interval is used for the final results.

Part 3: Confidence Intervals

a) and b) Because the data set covers only 25 countries rather than all countries of the world, the population standard deviation is unknown. In addition, the sample size is 25 nations, which is less than 30. Hence, the small-sample rule applies to this question: we assume the population is normally distributed, so the sampling distribution of the mean is also normal. As a result, we use the t-distribution (t table) to find the confidence interval by calculating the lower value (X1) and the upper value (X2).

Calculation and the formula:

\[ \bar{X} \pm t_{(1-\alpha),\, n-1} \cdot \frac{S}{\sqrt{n}} \]

o Sample mean (\bar{X}) = 53.71
o Sample standard deviation (S) = 24.81
o Sample size (n) = 25 countries
o Level of confidence (1-α) = 95%
o Degrees of freedom (df) = n-1 = 25-1 = 24

To find the lower value (X1), we apply the formula:

\[ X_1 = \bar{X} - t_{95\%,\, df=24} \cdot \frac{S}{\sqrt{n}} = 53.71 - 2.06 \cdot \frac{24.81}{\sqrt{25}} = 43.49 \ (\%\ \text{of population}) \]

The upper value:

\[ X_2 = \bar{X} + t_{95\%,\, df=24} \cdot \frac{S}{\sqrt{n}} = 53.71 + 2.06 \cdot \frac{24.81}{\sqrt{25}} = 63.93 \ (\%\ \text{of population}) \]

In conclusion, we are 95% confident that the world average of individuals using the Internet lies between 43.49% and 63.93% of the population. This range contains the mean of the middle-income group (51.61%), whereas the low-income mean (15.10%) and the high-income mean (84.69%) fall outside it.

c) Because the question states the world standard deviation of individuals using the Internet (% of population), the population standard deviation is known. Hence, instead of using the t table as in the calculation above, we apply the Z-distribution in this case. The sample standard deviation (S) is replaced by the population standard deviation (σ), and the formula for the confidence interval becomes:

\[ \bar{X} \pm Z_{(1-\alpha)} \cdot \frac{\sigma}{\sqrt{n}} \]

Firstly, the confidence interval depends on the level of confidence, in that a higher confidence
level leads to a wider interval estimate. Therefore, when we raise the confidence level from 95% to 99% in this case, we obtain a wider range of values. Secondly, the standard deviation also affects the confidence interval. Because it describes how spread out the values are, a higher standard deviation means the values are more scattered and the interval is wider. For example, there are two types of distribution, the normal population distribution and the normal sampling distribution. They have the same mean but different shapes, as shown in Figure 2. If we conducted a census of all countries in the world, the sample size would increase beyond 25 countries. This also leads to different distribution shapes for smaller and larger sample sizes (Figure 3). In reality, a sample is not a perfect representation of the population's underlying characteristics. It involves extra uncertainty because sample values vary from sample to sample. Nevertheless, there are many reasons why experts prefer a sample to a population when collecting data: it is less time-consuming, less costly, and less cumbersome to administer than a census. As a result, sampling is still the more practical method for drawing conclusions.

Part 4: Hypothesis Testing

a) Steps for the hypothesis testing:

Step 1: Because the data set includes 25 nations, the sample size is not greater than 30. The small-sample rule applies to this situation: we assume the population is normally distributed, so the sampling distribution of the mean is also normal.

Step 2: The World Health Organization reported that the world average of individuals using the Internet (% of population) was 44.7% in 2016. In addition, the result in part 3a shows a confidence interval between 43.49% and 63.93% of the population. From these figures, I argue that the world mean of individuals using the Internet will increase
in the future. As a result, the null hypothesis (H0) and the alternative hypothesis (H1) are:

- H0: the world average of individuals using the Internet is less than or equal to 44.7% of the population (μ ≤ 44.7)
- H1: the world average of individuals using the Internet is greater than 44.7% of the population (μ > 44.7)

Step 3: Because the level of confidence is not specified:

o I choose a 95% confidence level, so the level of significance is 0.05
o The degrees of freedom are 25 minus 1, equal to 24 (df = 24)
o The sample size is 25 (n = 25)
o The sample mean is 53.71 (\bar{X} = 53.71)
o The sample standard deviation is 24.81 (S = 24.81)
o The test is right-tailed

Step 4: Because the population standard deviation is unknown, we use Student's t-distribution to find the critical t value.

Step 5: With a level of significance of 0.05 and 24 degrees of freedom, the critical t value is 1.71.

Step 6: The t test value is calculated with the formula:

\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{53.71 - 44.7}{24.81 / \sqrt{25}} \approx 1.82 \]

Step 7: The critical t value is 1.71 and the t test value is 1.82. Comparing the two, the t test value is greater than the critical t value. This means the t test value is in the rejection region, so we reject the null hypothesis (H0) and accept the alternative hypothesis (H1).

Step 8: At the 5% level of significance, we conclude that the world mean of individuals using the Internet is greater than 44.7% of the population, which supports the view that it will gradually rise in the future thanks to the development of technology and the benefits of the Internet. For example, students who use the Internet for educational purposes achieve high learning results (American Institute of Physics 2019).

Step 9: Because we reject the null hypothesis (H0), we may have committed a Type I error, with a probability of 5%. In reality, although the Internet offers many advantages for educational purposes, it may also have negative impacts on users, particularly the young generation. Students use the Internet for entertainment such as playing games, watching
videos, and communicating on social networks (American Institute of Physics 2019). For that reason, adults will limit their children's Internet usage according to each child's personality and awareness (Bremer 2005). This can result in a lower average of Internet users.

b) When the number of countries in the dataset triples, the sample size changes from 25 to 75 nations. The critical t value and the t test value both shift because of the change in sample size, while the test remains right-tailed. However, the result of the hypothesis test remains unchanged. First of all, the t-distribution depends on the sample size, so when the sample size changes the t-distribution also changes. Basically, a bigger sample size results in a smaller critical t value. Secondly, looking at the formula for the t test value, and assuming the sample mean and sample standard deviation stay the same, only the sample size changes. Therefore, an increase in sample size leads to a rise in the t test value. To sum up, because of these two changes, the critical t value is still less than the t test value, so the t test value is still in the rejection region. In part 3c, I mentioned that a sample may contain some error compared to a population with more data. Therefore, if the sample size increases, the result of the hypothesis test will be more accurate.

Part 5: Overall Conclusion

In conclusion, the report has several notable findings. In part 2, we calculated the probability of low Internet usage for each country category (low-income, middle-income, and high-income) to test the relationship between them. The result shows that these are not independent, which suggests that Internet use is linked to a country's GNI. For example, in the dataset of assignment 2, Denmark, a developed nation, had the highest GNI per capita (current US$) and also the highest share of individuals using the Internet (% of the population), at $58,355 and
97.10% respectively. In part 3, the confidence interval for the world average of individuals using the Internet is from 43.49% to 63.93% of the population. This range contains the mean of the middle-income group (51.61%); however, only some nations have high Internet access. In part 4, we argue that the Internet will become more popular thanks to the convenience of personal computers and the benefits of the Internet to users. Hence, the hypothesis test supports the claim that the world average of individuals using the Internet will exceed 44.7% after 2016. However, limitations may still exist because of some negative impacts on children; for instance, they become addicted to playing games, watching trivial videos, and enjoying a fake life on social networks (American Institute of Physics 2019). To sum up, the Internet is a key factor influencing the economic growth of countries; on the other hand, it also negatively affects children who are not yet mature enough to distinguish good from bad (Bremer 2005).

References:

Sawyer, T 2004, 'Personal Computer (cover story)', ENR: Engineering News-Record, viewed 16 August 2020.

Briddock, D 2014, Tech Origins The Personal Computer, Dennis Publishing Ltd, London, viewed 16 August 2020, <https://search-proquest-com.ezproxy.lib.rmit.edu.au/docview/1518056131/594975B739384CE2PQ/2?accountid=13552>.

Al-Shahi, R, Sadler, M, Rees, G & Bateman, D 2002, 'The internet', Journal of Neurology, Neurosurgery and Psychiatry, vol. 73, no. 6, p. 619, viewed 16 August 2020.

World Bank Blogs 2020,
Using The Internet To Connect Students And Teachers Around The World For 'Virtual Exchanges', viewed 17 August 2020.

Klein, J 2020, 51% Of The World Is Online, But The Access Gap Between Rich And Poor Prevails, BREAKERMAG, viewed 19 August 2020.

Bryan, J 2010, Internet's benefits have just begun, Column, Don Mills, Ont, viewed 18 August 2020, <https://search-proquest-com.ezproxy.lib.rmit.edu.au/docview/459083673/47879E5D6D9940D2PQ/2?accountid=13552>.

Bennett, J 2017, Are the internet's benefits all they're cracked up to be?, New Plymouth, New Zealand, viewed 18 August 2020, <https://search-proquest-com.ezproxy.lib.rmit.edu.au/docview/1899569501/47879E5D6D9940D2PQ/8?accountid=13552>.

Roder-DeWan, S, Gage, A-D, Hirschhorn, L-R, Nana, A-Y-T, Liljestrand, J, Asante-Shongwe, K, Yahya, T & Kruk, M-E 2019, 'Expectations of healthcare quality: A cross-sectional study of internet users in 12 low- and middle-income countries', PLoS Medicine, vol. 16, no. 8, viewed 18 August 2020.

Jensen, J, King, A, Davis, L & Guntzviller, L 2010, Utilization of Internet Technology by Low-Income Adults,
Journal of Aging and Health, vol. 22, no. 6, pp. 804-826, viewed 19 August 2020.

SyndiGate Media Inc 2017, High Income Internet Users in Nigeria Spend N3,427 Monthly Report, Washington, viewed 18 August 2020.

American Institute of Physics 2019, The negative impact of the internet on the educational process, American Institute of Physics, Melville, viewed 19 August 2020.

Anthem Media Group 2011, Gaming, Internet Have Negative Impact on Teen Sleep, Anthem Media Group, Los Angeles, viewed 19 August 2020.

Bremer, J 2005, 'The internet and children: advantages and disadvantages', Child and Adolescent Psychiatric Clinics of North America, vol. 14, no. 3, pp. 405-28, viii, viewed 19 August 2020, <https://search-proquest-com.ezproxy.lib.rmit.edu.au/docview/67898214/2DBFFB3BE76643A6PQ/3?accountid=13552>.

Appendix:

Figure 1: The percentage of people online and offline in developing and developed countries
https://primo-direct-apac.hosted.exlibrisgroup.com/primo-explore/fulldisplay?docid=RMIT_ALMA5175422010001341&context=L&vid=RMITU&lang=en_US&search_scope=Books_articles_and_more&adaptor=Local%20Search%20Engine&tab=default_tab&query=any,contains,developed%20and%20developing%20countries&offset=0

Figure 2: The population and sampling distribution shapes with the same mean

Figure 3: The distribution shapes with different sample sizes
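
Worked check of Parts 3 and 4:

The confidence interval in Part 3 and the right-tailed t test in Part 4 can be recomputed from the summary statistics reported in this assignment. The Python sketch below is only an illustration under stated assumptions: the function and constant names are hypothetical, and the two critical values are hard-coded from the t table figures quoted in the report (2.06 and 1.71 for df = 24) rather than computed from the t-distribution.

```python
import math

# Summary statistics as reported in Parts 3 and 4 of this report.
N = 25               # sample size (countries)
SAMPLE_MEAN = 53.71  # sample mean (% of population)
SAMPLE_SD = 24.81    # sample standard deviation (% of population)
MU_0 = 44.7          # hypothesised world average (% of population)

# Critical values quoted in the report from a t table (df = 24).
# Assumption: they are hard-coded here, not derived in code.
T_CRIT_TWO_SIDED_95 = 2.06   # used for the 95% confidence interval
T_CRIT_RIGHT_TAIL_05 = 1.71  # used for the right-tailed test, alpha = 0.05


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: S / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_interval(mean: float, sd: float, n: int, t_crit: float):
    """Return (lower, upper) for mean ± t_crit * S / sqrt(n)."""
    margin = t_crit * standard_error(sd, n)
    return mean - margin, mean + margin


def t_statistic(mean: float, mu_0: float, sd: float, n: int) -> float:
    """One-sample t statistic: (mean - mu_0) / (S / sqrt(n))."""
    return (mean - mu_0) / standard_error(sd, n)


def reject_h0_right_tailed(t_value: float, t_crit: float) -> bool:
    """Right-tailed decision rule: reject H0 when t exceeds the critical value."""
    return t_value > t_crit


if __name__ == "__main__":
    lower, upper = confidence_interval(
        SAMPLE_MEAN, SAMPLE_SD, N, T_CRIT_TWO_SIDED_95
    )
    t_value = t_statistic(SAMPLE_MEAN, MU_0, SAMPLE_SD, N)
    print(f"95% CI: {lower:.2f} to {upper:.2f} (% of population)")
    print(f"t statistic: {t_value:.2f}")
    print("Reject H0:", reject_h0_right_tailed(t_value, T_CRIT_RIGHT_TAIL_05))
```

Keeping the critical values as named constants makes it easy to repeat the check for another confidence level or sample size by looking up the matching t table entry and changing a single line.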

Date posted: 10/02/2023, 14:24
