Finding Some Data

Data that is generated

Example 1: Climate change from 1991 - 2020 

          Clear citation of the data source: Climate change and the 1991-2020 U.S. Climate Normals | NOAA Climate.gov

What is the scope of the data?

The scope of the data is how the U.S. climate has changed relative to the 20th-century average. Each 30-year period since 1901 has a separate graph to represent that change.

 

How big is it? 

The data consists of only 10 graphs, but each graph covers a 30-year period.

How do we quantify it? 

One way to quantify these charts is to compare each graph's color map against a baseline taken from the earliest graph of the 20th century.
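The baseline comparison described above can be sketched in a few lines of Python. This is only a rough illustration: the period averages below are made-up placeholder values, not NOAA figures, and the function name `anomalies_from_baseline` is a hypothetical helper introduced for this example.

```python
# Hypothetical average temperatures (°F) for three 30-year periods.
# These numbers are placeholders for illustration, NOT NOAA values.
period_means_f = {
    "1901-1930": 51.8,
    "1931-1960": 52.3,
    "1961-1990": 52.1,
}


def anomalies_from_baseline(period_means, baseline_period):
    """Return each period's difference from the chosen baseline period."""
    baseline = period_means[baseline_period]
    return {
        period: round(value - baseline, 2)
        for period, value in period_means.items()
    }


# Use the earliest period as the baseline, as described in the text.
anomalies = anomalies_from_baseline(period_means_f, "1901-1930")

for period, delta in anomalies.items():
    # Positive differences would map to red on the graphs, negative to blue.
    if delta > 0:
        label = "warmer"
    elif delta < 0:
        label = "cooler"
    else:
        label = "baseline"
    print(f"{period}: {delta:+.1f} °F ({label})")
```

With real values read off the graphs, the same subtraction would turn each color on the map back into a number.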

Objective v. Subjective?

This data is objective since it is based on weather measurements taken over time.

 Quantitative v. Qualitative?

This data is quantitative since it measures the average temperature, and how it differs, for each graph of the USA.

Intrinsic v. Interpreted/Extrinsic?

This data should be intrinsic since it is the average temperature for each 30-year period since 1901, which is a given and does not change.

What is the origin / source of the data?

The data comes from an article by Rebecca Lindsey on the NOAA Climate.gov website.

What questions / insights / experiences might this data encourage?

One question this raises is why, over the past century alone, the temperature of the USA has risen by over 1 degree F every 30 years since 1901. This is interesting because the data suggests the change is not limited to the USA but is global, so we need to understand what is causing it and how to prevent it. A century is not a long time in the course of our existence or the earth's, and if the temperature is rising this drastically and this fast, we have to figure out why and how to fix it.

Does the location of this data matter?

The location of this data doesn't matter in the sense of which part of the earth it was taken on. Since the earth revolves around the sun, this points to a bigger problem that is global and not just in the USA.

Where else could it be? Is there abstraction at play?

One abstraction is the graphs' mapping of colors to numbers: instead of showing the values directly, they use blue for colder (-1) and red for hotter (+1). This is a good way to see how even a small temperature increase over a short time could become a serious problem over a timeframe longer than a century.


Example 2: Active X users

          Clear citation of the data source: Number of Twitter Users 2024: Demographics, Breakdowns & Predictions - Financesonline.com

What is the scope of the data? 

The scope of the data is the total number of users on X.

How big is it? 

In 2019, the data showed 330M active users per month.

How do we quantify it? 

We quantify it by the number of active users, for example 69.3M users per day in 2020.

Objective v. Subjective?

This is objective data since we know how many active users there are on X per month and per day for each year.

 Quantitative v. Qualitative? 

This is quantitative data: we can analyze the number of users each year and begin to find patterns in the data.

Intrinsic v. Interpreted/Extrinsic?

This is interpreted data. The number of users for each year or on any given day is not raw, because someone has to decide who actually counts as an active user.

What is the origin / source of the data?

The data originates from 17 research companies that studied data from X (formerly Twitter) and compiled different kinds of data based on active users.

What questions / insights / experiences might this data encourage?

Some questions this can raise are how different trends affect active users and, in possible later studies, how much time each active user spends on X per day.

Does the location of this data matter? 

The location of this data doesn't matter. It should all come from X (formerly Twitter), and most of it should start as raw user data that is then interpreted into other data.

Where else could it be? Is there abstraction at play? 

One abstraction that could be at play here is what determines an active user. Another interesting question is whether any of the researchers use X or have any affiliation with the company that could lead to favorable results from the research.

 

Data that is measured

Example 1: Stock market predictions

          Clear citation of the data source: Stock Market Forecast – Forbes Advisor

What is the scope of the data? 

The scope of the data is the USA's S&P 500 stock market data from 2023 and how researchers can use it to predict the 2024 stock market.

How big is it?

The data covers 500 companies, which is where the S&P 500 gets its name. The performance of these 500 leading companies is used to gauge how well stocks are doing in the USA and how the overall economy is doing.

How do we quantify it?

We can quantify it by how well these companies do and how the overall market performs each quarter, which tracks progress throughout the entire year.

Objective v. Subjective?

The data being derived is subjective. It is prediction data about the 2024 market, made before 2024 arrived, that estimates how the market would do based on the current market data.

 Quantitative v. Qualitative? 

This would be qualitative data. The real values in the real market of 2024 would reflect actual data points, while these predictions are subjective and changing, not fixed.

Intrinsic v. Interpreted/Extrinsic?

This data would be interpreted since it analyzes different companies' projections of earnings and debt compared to the USA's debt and GDP, interest rates, and other economic factors. There are many factors at play in trying to predict the 2024 stock market.

What is the origin / source of the data?

The data comes from the 2023 stock market for the S&P 500, since the 2024 market prediction is derived from that period in the market.

What questions / insights / experiences might this data encourage?

This data can raise the question of whether we can accurately predict future markets or whether there are too many factors at play that cannot be accounted for.

Does the location of this data matter? 

The location of this data does matter. In other countries, more or fewer factors might go into the stock market, making each country unique.

Where else could it be? Is there abstraction at play?

The abstraction is in deciding which factors matter and which do not when predicting the stock prices of 2024. There are also always real-world events that cannot be predicted or accounted for in stock market pricing each year.


Example 2: Gasoline use in the USA

          Clear citation of the data source:  Use of gasoline - U.S. Energy Information Administration (EIA)

What is the scope of the data? 

The scope is the amount of gasoline used in the USA in 2022.

How big is it? 

The data size is 135.73B gallons of gasoline.

How do we quantify it? 

We can quantify it by how many millions of barrels of gasoline each state uses per day, how many millions of gallons each state uses per day, and each state's percentage of total consumption.
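As a rough sketch of how these units relate, the Python below converts the 135.73B-gallon annual total into gallons per day and barrels per day, and shows how a state's percentage share would be computed. It assumes a 365-day year and the standard 42 U.S. gallons per barrel; `state_share_percent` is a hypothetical helper, and the state figure passed to it is a made-up placeholder, not an EIA value.

```python
GALLONS_PER_BARREL = 42   # standard U.S. petroleum barrel
DAYS_IN_YEAR = 365        # 2022 was not a leap year

# Annual U.S. gasoline use in 2022, as given in the text above.
annual_gallons = 135.73e9

# Convert the annual total into daily figures.
gallons_per_day = annual_gallons / DAYS_IN_YEAR
barrels_per_day = gallons_per_day / GALLONS_PER_BARREL


def state_share_percent(state_gallons_per_day, total_gallons_per_day):
    """Return one state's share of total daily consumption, in percent."""
    return 100 * state_gallons_per_day / total_gallons_per_day


# Placeholder state value for illustration only (NOT an EIA figure).
example_state_gallons_per_day = 30e6
share = state_share_percent(example_state_gallons_per_day, gallons_per_day)

print(f"U.S. total: {gallons_per_day / 1e6:.2f} million gallons per day")
print(f"U.S. total: {barrels_per_day / 1e6:.2f} million barrels per day")
print(f"Example state share: {share:.1f}% of total consumption")
```

Working from one annual total keeps the three measures consistent with each other, since barrels and percentages are both derived from the same gallons-per-day figure.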

Objective v. Subjective?

This is objective, since each state uses a measurable amount of gasoline per day.

 Quantitative v. Qualitative? 

This data is quantitative since it gives specific amounts of gasoline used per day in the USA.

Intrinsic v. Interpreted/Extrinsic?

This data is intrinsic because we can measure the total amount of gasoline used across all vehicles per day.

What is the origin / source of the data?

The data comes from the Monthly Energy Review.

What questions / insights / experiences might this data encourage?

This can help determine which states use the most gasoline and why. It could also help determine whether these states would benefit from public transportation infrastructure to reduce the amount of gasoline they use.

Does the location of this data matter?

Yes, the location of this data does matter since each state uses a different amount of gasoline per day.

Where else could it be? Is there abstraction at play?

Some abstractions could be at play here: there are many different types of gasoline, and different states have different roles in the greater economy, so it is worth asking whether this is accounted for.


Data that is synthesized

Example 1: DALL-E image generation

          Clear citation of the data source: What is DALL-E? How it works and how the system generates AI art (interestingengineering.com)

What is the scope of the data? 

The scope of the data is that this AI model can generate any image from text alone.

How big is it? 

The size is based on what the model is trained on, so potentially the entire internet plus any other sources fed into the training data.

How do we quantify it? 

We can quantify it by how many factors go into creating an image, how long generation takes for different text prompts, and whether the model aims for a set level of complexity.

Objective v. Subjective?

This is subjective data since it cannot be reproduced once created. Every image generated from a prompt will be new and original.

 Quantitative v. Qualitative? 

This is qualitative data because it is a form of art. All we can do is interpret what the image is based on the prompt.

Intrinsic v. Interpreted/Extrinsic?

This data is interpreted because the goal was set by a human, but the AI ultimately chose the sources, decided how to represent them, and created the image.

What is the origin / source of the data?

It was created by the company OpenAI. Exactly how an image is created is unknown. The source material is scraped from the internet and used to predict the best match for a prompt.

What questions / insights / experiences might this data encourage?

Some questions this raises are how reliable the data is and how important the sources are to the way the data is represented. If the data gets you the art you are looking for, does it matter how it got there?

Does the location of this data matter? 

The location matters in the sense that the data pulled will be different for every prompt. For the results, it does not necessarily matter, since finding the best match for a prompt is somewhat random.

Where else could it be? Is there abstraction at play?

Yes, there is a lot of abstraction at play in the way these models are trained. They are constantly updated and new information is constantly being produced, so we are unsure exactly how these models find and create data from scraping the internet.

Example 2: This Person Does Not Exist face generator

          Clear citation of the data source:  This Person Does Not Exist - Random Face Generator (this-person-does-not-exist.com)

What is the scope of the data? 

The scope of this data is creating very human-like faces that don't actually exist.

How big is it? 

Since the generator is based on filters, the filters are basically the limit on how much data it contains.

How do we quantify it? 

We can quantify it by the number of unique images it can create.

Objective v. Subjective?

The generated face images can be subjective. You can apply filters to see what someone could look like, but the result is not a real person, so it does not actually mean anything. It is just a generalization of someone's traits, not someone who can prove that they are that person.

 Quantitative v. Qualitative? 

This data is qualitative. The face images are generated by a neural network, which often makes mistakes that our own brains can catch as fakes. When it does not make mistakes, it is very hard to tell that a picture of a human face was generated.

Intrinsic v. Interpreted/Extrinsic?

This is interpreted data since it is based on traits used to generate a real-looking face.

What is the origin / source of the data?

The origin of the data is a neural network that generates faces based on the traits it has been trained on.

What questions / insights / experiences might this data encourage?

One concern is that when the generator does not make mistakes, the fakes are nearly impossible to detect. This means it could be very easy to feed real faces into a neural network and create deepfakes of people doing something that could be negative for their image, which is a common use of this type of data synthesis.

Does the location of this data matter?

The location of the data doesn't matter since the data comes from a neural network trained to create human-looking faces. However, it likely does not cover every single human trait, meaning it can be very inaccurate.

Where else could it be? Is there abstraction at play? 

Another abstraction that could be at play is that these images share a certain position and facial expression. This limits how realistic they look, since they all have a very similar pose and expression and differ only in traits.
