DATA BOOTCAMP FINAL PROJECT

Title: Analysis of H1B Petition from year 2008 to 2018

Data Source: United States Department of Labor - LCA Programs

Authors: Sarah Kim, Yilin Zhao, Jiaran Gu

Submission Date: May 15, 2020

TABLE OF CONTENT

1.Introduction

2.Data Cleaning

3.Data Analysis and Visualization

4.Descriptive Data Mining

5.Conclusion

1. Introduction

According to Workpermit.com, The US H-1B visa is a non-immigrant visa that allows US companies to employ graduate-level workers in specialty occupations that require theoretical or technical expertise in specialized fields such as in IT, finance, accounting, architecture, engineering, mathematics, science, medicine, etc.

As for this report, we would like to show the trend of the H1B petition over the years from 2008 to 2018 and potentia opportunities for international students to focus when they are applying for jobs in the United States.

Due to the Coronavirus, companies are laying off, and job searching is hard, but we still need to stay strong and be hopeful!

2. Data Cleaning

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
pd.set_option('display.max_columns', None)
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

Read in and Combine Data¶

fy2008 = pd.read_csv('H-1B_Case_Data_FY2008.csv')

fy2008 = fy2008[['CASE_NO','APPROVAL_STATUS','SUBMITTED_DATE','DOL_DECISION_DATE','NAME',
                                                     'CITY','OCCUPATIONAL_TITLE',
                                                      'STATE','POSTAL_CODE','JOB_CODE','JOB_TITLE',
                                                     'RATE_PER_1','PART_TIME_1',
                                                      'PREVAILING_WAGE_1']]

#Convert Part-time data to full-time to align with other data
fy2008['PART_TIME_1'] = fy2008['PART_TIME_1'].replace({'N': 'Y', 'Y': 'N'})

newnames08 = ['CASE_NO','CASE_STATUS','CASE_SUBMITTED', 'DECISION_DATE','EMPLOYER_NAME',
              'CITY','SOC_NAME','STATE','POSTAL_CODE','SOC_CODE','JOB_TITLE',
             'WAGE_UNIT','FULL_TIME','PREVAILING_WAGE']
fy2008.columns = newnames08

fy2009 = pd.read_csv('Icert_ LCA_ FY2009.csv',dtype='object')

fy2009 = fy2009[['LCA_CASE_NUMBER','STATUS','LCA_CASE_SUBMIT','Decision_Date', 
                 'LCA_CASE_EMPLOYER_NAME',  'LCA_CASE_EMPLOYER_CITY',
                 'LCA_CASE_EMPLOYER_STATE', 'LCA_CASE_EMPLOYER_POSTAL_CODE','LCA_CASE_SOC_CODE','LCA_CASE_SOC_NAME',
                'LCA_CASE_JOB_TITLE','FULL_TIME_POS', 'PW_1',
                 'PW_UNIT_1' ]]

newnames09 = ['CASE_NO','CASE_STATUS','CASE_SUBMITTED', 'DECISION_DATE','EMPLOYER_NAME',
              'CITY','STATE','POSTAL_CODE','SOC_CODE','SOC_NAME','JOB_TITLE',
             'FULL_TIME','PREVAILING_WAGE','WAGE_UNIT']
fy2009.columns = newnames09

fy2010 = pd.read_csv('H-1B_FY2010.csv',dtype='object')

fy2010 = fy2010[['LCA_CASE_NUMBER', 'STATUS', 'LCA_CASE_SUBMIT','DECISION_DATE','LCA_CASE_EMPLOYER_NAME',
                  'LCA_CASE_EMPLOYER_CITY','LCA_CASE_EMPLOYER_STATE', 'LCA_CASE_EMPLOYER_POSTAL_CODE','LCA_CASE_SOC_CODE','LCA_CASE_SOC_NAME',
                   'LCA_CASE_JOB_TITLE','PW_1','PW_UNIT_1' ]]
#Missing full_time information here

newnames10 = ['CASE_NO','CASE_STATUS','CASE_SUBMITTED', 'DECISION_DATE','EMPLOYER_NAME',
              'CITY','STATE','POSTAL_CODE','SOC_CODE','SOC_NAME','JOB_TITLE',
            'PREVAILING_WAGE','WAGE_UNIT']
fy2010.columns = newnames10

H1B08_10 =  pd.concat([fy2008,fy2009,fy2010],sort=False)

H1B11_14 = pd.concat(map(pd.read_csv, ['H-1B_iCert_LCA_FY2011_Q4 (1).csv','LCA_FY2012_Q4.csv','LCA_FY2013.csv','H-1B_FY2014.csv']))

C:\Data_Bootcamp\lib\site-packages\IPython\core\interactiveshell.py:3326: DtypeWarning:

Columns (22) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Data_Bootcamp\lib\site-packages\IPython\core\interactiveshell.py:3326: DtypeWarning:

Columns (22,26) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

H1B11_14 = H1B11_14[['LCA_CASE_NUMBER','STATUS','LCA_CASE_SUBMIT','LCA_CASE_JOB_TITLE','DECISION_DATE',
                 'LCA_CASE_EMPLOYER_NAME','LCA_CASE_EMPLOYER_CITY','LCA_CASE_EMPLOYER_STATE','LCA_CASE_EMPLOYER_POSTAL_CODE',
                 'FULL_TIME_POS','LCA_CASE_SOC_CODE','LCA_CASE_SOC_NAME','PW_1',
                 'LCA_CASE_WAGE_RATE_UNIT']]

newnames11 = ['CASE_NO','CASE_STATUS','CASE_SUBMITTED', 'JOB_TITLE','DECISION_DATE','EMPLOYER_NAME',
              'CITY','STATE','POSTAL_CODE', 'FULL_TIME','SOC_CODE','SOC_NAME',
            'PREVAILING_WAGE','WAGE_UNIT']
H1B11_14.columns = newnames11

H1B15_18 = pd.concat(map(pd.read_csv, ['FY2015.csv','FY2016.csv','FY2017.csv','FY2018.csv']))

C:\Data_Bootcamp\lib\site-packages\IPython\core\interactiveshell.py:3326: DtypeWarning:

Columns (14,15) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Data_Bootcamp\lib\site-packages\IPython\core\interactiveshell.py:3326: DtypeWarning:

Columns (14,15,16,24) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Data_Bootcamp\lib\site-packages\IPython\core\interactiveshell.py:3326: DtypeWarning:

Columns (15,16) have mixed types. Specify dtype option on import or set low_memory=False.

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

H1B15_18 = H1B15_18[['CASE_NUMBER','CASE_STATUS', 'CASE_SUBMITTED','JOB_TITLE','DECISION_DATE',
                 'EMPLOYER_NAME','EMPLOYER_CITY','EMPLOYER_STATE','EMPLOYER_POSTAL_CODE',
                'FULL_TIME_POSITION','SOC_CODE','SOC_NAME','PREVAILING_WAGE',
                 'WAGE_UNIT_OF_PAY']]

newnames15 = ['CASE_NO','CASE_STATUS','CASE_SUBMITTED', 'JOB_TITLE','DECISION_DATE','EMPLOYER_NAME',
              'CITY','STATE','POSTAL_CODE','FULL_TIME','SOC_CODE','SOC_NAME',
            'PREVAILING_WAGE','WAGE_UNIT']
H1B15_18.columns = newnames15

h1b =  pd.concat([H1B08_10,H1B11_14,H1B15_18])

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

h1b

h1b.dtypes

CASE_NO            object
CASE_STATUS        object
CASE_SUBMITTED     object
CITY               object
DECISION_DATE      object
EMPLOYER_NAME      object
FULL_TIME          object
JOB_TITLE          object
POSTAL_CODE        object
PREVAILING_WAGE    object
SOC_CODE           object
SOC_NAME           object
STATE              object
WAGE_UNIT          object
dtype: object

h1b.shape

(5118351, 14)

At this point we combined all the H1B data for 11 years, then we will start data cleaning.

Unifying the Case Status¶

h1b['CASE_STATUS'].value_counts()

CERTIFIED                                             4468166
CERTIFIED-WITHDRAWN                                    330444
DENIED                                                 171409
WITHDRAWN                                              148314
PENDING QUALITY AND COMPLIANCE REVIEW - UNASSIGNED         15
REJECTED                                                    2
INVALIDATED                                                 1
Name: CASE_STATUS, dtype: int64

h1b.drop(h1b.loc[h1b['CASE_STATUS'] == 'PENDING QUALITY AND COMPLIANCE REVIEW - UNASSIGNED'].index, inplace=True)
h1b.loc[h1b['CASE_STATUS']=='REJECTED','CASE_STATUS'] = 'DENIED'
h1b.loc[h1b['CASE_STATUS']=='INVALIDATED','CASE_STATUS'] = 'DENIED'

Unifying the Wage¶

h1b['WAGE_UNIT'].value_counts()

Year                4333031
yr                   369577
Hour                 367444
hr                    33441
Month                  6882
Week                   3172
mth                    1710
Bi-Weekly              1422
wk                      737
Select Pay Range        359
bi                      161
Name: WAGE_UNIT, dtype: int64

h1b.loc[h1b['WAGE_UNIT']=='yr','WAGE_UNIT'] = 'Year'
h1b.loc[h1b['WAGE_UNIT']=='hr','WAGE_UNIT'] = 'Hour'
h1b.loc[h1b['WAGE_UNIT']=='mth','WAGE_UNIT'] = 'Month'
h1b.loc[h1b['WAGE_UNIT']=='wk','WAGE_UNIT'] = 'Week'
h1b.loc[h1b['WAGE_UNIT']=='bi','WAGE_UNIT'] = 'Bi-Weekly'

h1b.loc[h1b['WAGE_UNIT']== 'Select Pay Range']

h1b.drop(h1b.loc[h1b['WAGE_UNIT']== 'Select Pay Range'].index, inplace=True)

Only save the yearly income.¶

h1b.drop(h1b.loc[h1b['PREVAILING_WAGE'] == '20-70'].index, inplace=True)

h1b.drop(h1b.loc[h1b['PREVAILING_WAGE'] == '.'].index, inplace=True)

h1b['PREVAILING_WAGE'] = h1b['PREVAILING_WAGE'].astype('str').str.replace(',','').str.replace('$','').str.replace(' ','')

# Change hourly wage into yearly wage
h1b.at[h1b['WAGE_UNIT'] == 'Hour', 'PREVAILING_WAGE'] = h1b.loc[h1b[
    'WAGE_UNIT'] == 'Hour']['PREVAILING_WAGE'].astype(float).apply(lambda x: x * 37.5 * 52)
# Change montly wage into yearly wage
h1b.at[h1b['WAGE_UNIT'] == 'Month', 'PREVAILING_WAGE'] = h1b.loc[h1b[
    'WAGE_UNIT'] == 'Month']['PREVAILING_WAGE'].astype(float).apply(lambda x: x * 12) 
# Change weekly into yearly wage
h1b.at[h1b['WAGE_UNIT'] == 'Week', 'PREVAILING_WAGE'] = h1b.loc[h1b[
    'WAGE_UNIT'] == 'Week']['PREVAILING_WAGE'].astype(float).apply(lambda x: x * 52)
# Change bi-weekly into yearly wage 
h1b.at[h1b['WAGE_UNIT'] == 'Bi-Weekly', 'PREVAILING_WAGE'] = h1b.loc[h1b[
    'WAGE_UNIT'] == 'Bi-Weekly']['PREVAILING_WAGE'].astype(float).apply(lambda x: x * 26) 
# Change all columns into float
h1b['PREVAILING_WAGE'] = h1b['PREVAILING_WAGE'].apply(lambda x: float(x.replace(',', '')) if isinstance(x, str) else x)
# Drop the wage unit column
h1b.drop(columns='WAGE_UNIT', inplace=True)

#drop wages not inbetween 10,000 and 2,000,000 as these were put with wrong wage units
h1b.drop(h1b.loc[(h1b['PREVAILING_WAGE'] > 2000000) | (h1b['PREVAILING_WAGE'] < 10000)].index, inplace=True)

Adding a Column with Full State Names.¶

h1b['STATE']

0         MA
1         PA
2         CA
3         NY
4         PA
          ..
654355    TX
654356    IL
654357    FL
654358    NC
654359    WA
Name: STATE, Length: 5018088, dtype: object

h1b['FSTATE'] = h1b['STATE']

usStateAbbr = {
    'AL': 'Alabama', 'AK': 'Alaska', 'AZ': 'Arizona', 'AR': 'Arkansas', 'CA': 'California',
    'CO': 'Colorado', 'CT': 'Connecticut', 'DE': 'Delaware', 'FL': 'Florida', 'GA': 'Georgia',
    'HI': 'Hawaii', 'ID': 'Idaho', 'IL': 'Illinois', 'IN': 'Indiana', 'IA': 'Iowa', 'KS': 'Kansas',
    'KY': 'Kentucky', 'LA': 'Louisiana', 'ME': 'Maine', 'MD': 'Maryland', 'MA': 'Massachusetts',
    'MI': 'Michigan', 'MN': 'Minnesota', 'MS': 'Mississippi', 'MO': 'Missouri', 'MT': 'Montana',
    'NE': 'Nebraska', 'NV': 'Nevada', 'NH': 'New Hampshire', 'NJ': 'New Jersey', 'NM': 'New Mexico',
    'NY': 'New York', 'NC': 'North Carolina', 'ND': 'North Dakota', 'OH': 'Ohio', 'OK': 'Oklahoma',
    'OR': 'Oregon', 'PA': 'Pennsylvania', 'RI': 'Rhode Island', 'SC': 'South Carolina',
    'SD': 'South Dakota', 'TN': 'Tennessee', 'TX': 'Texas', 'UT': 'Utah', 'VT': 'Vermont',
    'VA': 'Virginia', 'WA': 'Washington', 'WV': 'West Virginia', 'WI': 'Wisconsin', 'WY': 'Wyoming'}

h1b['FSTATE'] = h1b['FSTATE'].map(usStateAbbr).str.upper()

Cleaning the Date¶

h1b['DECISION_DATE'] = pd.to_datetime(h1b['DECISION_DATE'])
h1b['CASE_SUBMITTED'] = pd.to_datetime(h1b['CASE_SUBMITTED'])

h1b = h1b.dropna(subset=['DECISION_DATE','CASE_SUBMITTED'])

Cleaning the SOC_NAME¶

# Find one erro in SOC Name and cleaned here
h1b.loc[h1b['SOC_NAME'] == '15-1121', 'SOC_NAME'] = 'COMPUTER SYSTEMS ANALYSTS'

C:\Data_Bootcamp\lib\site-packages\pandas\core\indexing.py:494: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

h1b['SOC_NAME'].value_counts().head(20)

SOFTWARE DEVELOPERS, APPLICATIONS                  531434
COMPUTER SYSTEMS ANALYSTS                          363295
COMPUTER PROGRAMMERS                               260430
Computer Systems Analysts                          248901
COMPUTER OCCUPATIONS, ALL OTHER                    220031
Computer Programmers                               215482
OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING    179713
Software Developers, Applications                  133185
SOFTWARE DEVELOPERS, SYSTEMS SOFTWARE               83599
Computer Software Engineers, Applications           70784
MANAGEMENT ANALYSTS                                 52958
COMPUTER SYSTEMS ANALYST                            47798
ACCOUNTANTS AND AUDITORS                            44274
NETWORK AND COMPUTER SYSTEMS ADMINISTRATORS         41618
Computer Occupations, All Other                     38874
FINANCIAL ANALYSTS                                  38561
MECHANICAL ENGINEERS                                38277
OPERATIONS RESEARCH ANALYSTS                        37414
Management Analysts                                 36218
Financial Analysts                                  35654
Name: SOC_NAME, dtype: int64

h1b.loc[h1b['SOC_NAME']=='Computer Programmers','SOC_NAME'] = 'COMPUTER PROGRAMMERS'
h1b.loc[h1b['SOC_NAME']=='Computer Systems Analysts','SOC_NAME'] = 'COMPUTER SYSTEMS ANALYSTS'
h1b.loc[h1b['SOC_NAME']=='COMPUTER SYSTEMS ANALYST','SOC_NAME'] = 'COMPUTER SYSTEMS ANALYSTS'
h1b.loc[h1b['SOC_NAME']=='Computer Software Engineers, Applications','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='SOFTWARE DEVELOPERS, SYSTEMS SOFTWARE','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='Software Developers, Applications','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='SOFTWARE DEVELOPERS, APPLICATIONS ','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='Software Developers, Systems Software','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='Computer Software Engineers, Systems Software','SOC_NAME'] = 'SOFTWARE DEVELOPERS, APPLICATIONS'
h1b.loc[h1b['SOC_NAME']=='Computer Occupations, All Other','SOC_NAME'] = 'COMPUTER OCCUPATIONS, ALL OTHER'
h1b.loc[h1b['SOC_NAME']=='Financial Analysts','SOC_NAME'] = 'FINANCIAL ANALYSTS'

Final Dataset¶

h1b

h1b.dtypes

CASE_NO                    object
CASE_STATUS                object
CASE_SUBMITTED     datetime64[ns]
CITY                       object
DECISION_DATE      datetime64[ns]
EMPLOYER_NAME              object
FULL_TIME                  object
JOB_TITLE                  object
POSTAL_CODE                object
PREVAILING_WAGE           float64
SOC_CODE                   object
SOC_NAME                   object
STATE                      object
FSTATE                     object
dtype: object

3. Data Analysis and Visualization

Since 2017, the Trump administration has announced series reformation on H1B visa policy, it changes the landscape a lot. In the following analysis, we will compare the 2017 to 2018 data to visualize these changes.

H1B Processing Time¶

decisionDaysTook = h1b['DECISION_DATE'] - h1b['CASE_SUBMITTED']
decisionDaysTook = decisionDaysTook.dropna().dt.days

days_over_30 = len(decisionDaysTook[(decisionDaysTook > 30) & (decisionDaysTook < 90)])
days_over_90 = len(decisionDaysTook[(decisionDaysTook > 90)])
top3_dec = decisionDaysTook.value_counts().iloc[:3]
top5 = pd.Series(days_over_30, index =['30 to 90'])
top4 = pd.Series(days_over_90, index =['90 or more'])
top4n5 = top4.append(top5)
top_days = top3_dec.append(top4n5)

fig, ax = plt.subplots(figsize=(12, 4))

sns.barplot(y=top_days.index, x=top_days.values);
plt.title("Number of Days Taken for the Result", fontsize=15, fontweight='bold')
plt.ylabel("Days")
plt.xlabel("Number of Cases")
plt.legend(['Cases']);

Here we used case submitted date and decision date to calculate the number of days taken for the decision. The majority of decisions were made less than a week.

Final Application Decisions¶

h1b['YEAR'] = h1b['DECISION_DATE'].dt.year.astype('str')

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

H1B08_16 = h1b[~h1b["YEAR"].isin(["2017","2018"]) ]
H1B17_18 = h1b[h1b["YEAR"].isin(["2017","2018"]) ]

fig,ax = plt.subplots(nrows=1,ncols=3,sharey=False)
caseStatusSum = h1b.groupby(['CASE_STATUS'])[['CASE_NO']].count()
caseStatusb17 = H1B08_16.groupby(['CASE_STATUS'])[['CASE_NO']].count()
caseStatusa17 = H1B17_18.groupby(['CASE_STATUS'])[['CASE_NO']].count()
caseStatusSum.plot.pie(ax=ax[0],figsize=(15,10),y='CASE_NO', explode=(0.1, 0.1, 0.1, 0.1), autopct='%1.1f%%', shadow=True)
caseStatusb17.plot.pie(ax=ax[1],figsize=(15,10),y='CASE_NO', explode=(0.1, 0.1, 0.1, 0.1), autopct='%1.1f%%', shadow=True)
caseStatusa17.plot.pie(ax=ax[2],figsize=(15,10),y='CASE_NO', explode=(0.1, 0.1, 0.1, 0.1), autopct='%1.1f%%', shadow=True)
fig.tight_layout()
ax[0].set_title('Overall Decision on the Cases', fontsize=15, fontweight='bold')
ax[1].set_title('Before 2017 Decision on the Cases', fontsize=15, fontweight='bold')
ax[2].set_title('After 2017 Decision on the Cases', fontsize=15, fontweight='bold')
plt.legend(bbox_to_anchor=(1.4, 1), loc='upper right', ncol=1);

As the abrove pictures show, from the percentage perspective, there are no big changes before 2017, after 2017 and the overall data. The certified cases are around 88%, which is slightly increased after 2017.There is a 2% decrease shows in the denied cases and 1% increase on Certified-Withdrawn, which is interesting.

The results are different from our expection because we think the certified cases might decrease after the policy change in 2017. The next step, we will figure out if the case number is droped, which means less people apply for H1B visa per year after 2017.

H1B Applications Over Years¶

ycase = h1b.groupby('YEAR').agg('count')
ycase1 = ycase.iloc[:,:1]
ycase1
#Shows how many H1B cases was submitted through out the years

fig, ax = plt.subplots(figsize=(10,5))
ycase1.plot(ax=ax)
plt.title('H1B Applications Over Years',fontsize=15,fontweight='bold')
plt.ylabel('Number of Applications')
ax.legend(['Jobs'])

<matplotlib.legend.Legend at 0x21c9fb55a88>

In year 2013, the H1B application hit the lowest number which is 74897 and it's increasing after that. We can also see a tendency of dropping after year 2017, which matched with our expectation.

H1B Popular States¶

a = H1B08_10.groupby('STATE').agg('count')
b = a.iloc[:,:1].reset_index().rename(columns={'CASE_NO': 'Cases'})
fig = px.choropleth(b, 
                    locations="STATE", 
                    color="Cases", 
                    hover_name="STATE" , 
                    locationmode = 'USA-states') 
fig.update_layout(
    title_text = 'Year 2008 to 2010 H1B Job Rankings in States of US', 
    geo_scope='usa', 
    
)
fig.show()

a = h1b.groupby('STATE').agg('count')
b = a.iloc[:,:1].reset_index().rename(columns={'CASE_NO': 'Cases'})
fig = px.choropleth(b, 
                    locations="STATE", 
                    color="Cases", 
                    hover_name="STATE" , 
                    locationmode = 'USA-states') 
fig.update_layout(
    title_text = 'H1B Job Rankings in States of US', 
    geo_scope='usa', 
    
)
fig.show()

certified = h1b.loc[h1b['CASE_STATUS'] == 'CERTIFIED']
denied = h1b.loc[h1b['CASE_STATUS'] == 'DENIED']

certified18 = H1B17_18.loc[H1B17_18['CASE_STATUS'] == 'CERTIFIED']
denied18 = H1B17_18.loc[H1B17_18['CASE_STATUS'] == 'DENIED']

fig, ax = plt.subplots(2, 1, figsize=(13, 10))

sns.countplot(y=certified['FSTATE'], order=certified['FSTATE'].value_counts().iloc[:10].index, ax=ax[0])
ax[0].set_title('States with the Most Certified H1B Applicants', fontsize=15, fontweight='bold')
ax[0].set_xlabel('Number of Certified Cases')
ax[0].set_ylabel('City')
ax[0].legend(['Certified Cases'])

sns.countplot(y=certified18['FSTATE'], order=certified18['FSTATE'].value_counts().iloc[:10].index, ax=ax[1])
ax[1].set_title('2017 to 2018 States with the Most Certified H1B Applicants', fontsize=15, fontweight='bold')
ax[1].set_xlabel('Number of Certified Cases')
ax[1].set_ylabel('City')
ax[1].legend(['Certified Cases'])

plt.subplots_adjust(hspace=0.5);

fig, ax = plt.subplots(2, 1, figsize=(13, 10))

sns.countplot(y=denied['FSTATE'], order=denied['FSTATE'].value_counts().iloc[:10].index, ax=ax[0])
ax[0].set_title('States with the Most denied H1B Applicants', fontsize=15, fontweight='bold')
ax[0].set_xlabel('Number of denied Cases')
ax[0].set_ylabel('City')
ax[0].legend(['Denied Cases'])

sns.countplot(y=denied18['FSTATE'], order=denied18['FSTATE'].value_counts().iloc[:10].index, ax=ax[1])
ax[1].set_title('2017 to 2018 States with the Most denied H1B Applicants', fontsize=15, fontweight='bold')
ax[1].set_xlabel('Number of denied Cases')
ax[1].set_ylabel('City')
ax[1].legend(['Denied Cases'])

plt.subplots_adjust(hspace=0.5);

As the above pictures show, the number of H1B applications of states don't change that much. But we do see that Texas is poping up to offer more H1B visa and New York is slowing down it's visa application. According to a report from Business in Texas,Texas'IT services workforce is the second largest in the nation.

Number wise, California has the most number of applicants in both certified and denied. Interestingly, New Jersey has the second highest number of certified applicatns with the forth denied application. On the contrary, New York has higher denied rate with lower certified rate.

International students can focus on CA, NY, NJ and TX when looking for full-time jobs. Compare to the overall data with the data from 2017 to 2018,we suggest international students to pay extra attention on Texas and New Jersey are great opportunities.

H1B Popular Cities¶

fig, ax = plt.subplots(2,1, figsize=(13, 10))

sns.countplot(y=certified['CITY'], order=certified['CITY'].value_counts().iloc[:10].index, ax=ax[0])
ax[0].set_title('Most Certified H1B Cities', fontsize=15, fontweight='bold')
ax[0].set_xlabel('Number of Certified Cases')
ax[0].set_ylabel('City')
ax[0].legend(['Certified Cases'])

sns.countplot(y=denied['CITY'], order=denied['CITY'].value_counts().iloc[:10].index, ax=ax[1])
ax[1].set_title('Most denied H1B Cities', fontsize=15, fontweight='bold')
ax[1].set_xlabel('Number of denied Cases')
ax[1].set_ylabel('City')
ax[1].legend(['Denied Cases'])

plt.subplots_adjust(hspace=0.5);

From this chart, we can find that New York City is the highest numbers in certified and the most highest numbers in denied applications. And, Plano in Texas is a great opportunity with the second highest number of certified cases.

H1B Popular Occupations¶

fig, ax = plt.subplots(figsize=(12, 5))

sns.countplot(y=h1b['FULL_TIME'])
plt.title("Full/Part Time Position of Applicants", fontsize=15, fontweight='bold')
plt.xlabel("Numbers of Cases")
plt.ylabel("Position")
ax.set_yticklabels(['Full Time', 'Part Time'])
plt.legend(['Cases']);

Most of the applicants are working as a full-time rather than a part-time. Over 1.8 millions of applicatns were full-time.

fig, ax = plt.subplots(2,1, figsize=(13, 10))

sns.countplot(y=certified['SOC_NAME'], order=certified['SOC_NAME'].value_counts().iloc[:10].index, ax=ax[0])
ax[0].set_title('Overall Most Certified H1B Occupations', fontsize=15, fontweight='bold')
ax[0].set_xlabel('Number of Certified Cases')
ax[0].set_ylabel('Occupations')
ax[0].legend(['Certified Cases'])

sns.countplot(y=certified18['SOC_NAME'], order=certified18['SOC_NAME'].value_counts().iloc[:10].index, ax=ax[1])
ax[1].set_title('2017 to 2018 Most Certified H1B Occupations', fontsize=15, fontweight='bold')
ax[1].set_xlabel('Number of Certified Cases')
ax[1].set_ylabel('Occupations')
ax[1].legend(['Certified Cases'])

plt.subplots_adjust(hspace=0.5);

fig, ax = plt.subplots(2,1, figsize=(13, 10))

sns.countplot(y=denied['SOC_NAME'], order=denied['SOC_NAME'].value_counts().iloc[:10].index, ax=ax[0])
ax[0].set_title('Overall Most Denied H1B Occupations', fontsize=15, fontweight='bold')
ax[0].set_xlabel('Number of Denied Cases')
ax[0].set_ylabel('Occupations')
ax[0].legend(['Denied Cases'])

sns.countplot(y=denied18['SOC_NAME'], order=denied18['SOC_NAME'].value_counts().iloc[:10].index, ax=ax[1])
ax[1].set_title('2017 to 2018 Most Denied H1B Occupations', fontsize=15, fontweight='bold')
ax[1].set_xlabel('Number of Denied Cases')
ax[1].set_ylabel('Occupations')
ax[1].legend(['Denied Cases'])

plt.subplots_adjust(hspace=0.5);

The occupations with the highest certified rate are all in IT field and we do see diversified results of denied applications. Besides IT, there are also opportunities in analytics, accounting and engineering roles

Software developers have the highest certified rate and the highest denied rate.

H1B Popular Companies¶

fig, ax = plt.subplots(2,1,figsize=(13, 10))

sns.countplot(y=h1b['EMPLOYER_NAME'], order=h1b['EMPLOYER_NAME'].value_counts().iloc[:10].index, ax=ax[0]);
ax[0].set_title("Overall H1B Popular Comapnies", fontsize=15, fontweight='bold')
ax[0].set_ylabel("Companies")
ax[0].set_xlabel("Number")
ax[0].legend(['Cases'])

sns.countplot(y=H1B17_18['EMPLOYER_NAME'], order=H1B17_18['EMPLOYER_NAME'].value_counts().iloc[:10].index, ax=ax[1]);
ax[1].set_title("2017 to 2018 H1B Popular Comapnies", fontsize=15, fontweight='bold')
ax[1].set_ylabel("Companies")
ax[1].set_xlabel("Number")
ax[1].legend(['Cases'])

plt.subplots_adjust(hspace=0.5);

As we can see the trend, the companies with most H1B applicants are technology consulting firms. It's interesting to see all of the big tech firms, such as Apply, Google, Microsoft and Amazon were not in the list of top 10 sponsored companies.

top10_company = h1b['EMPLOYER_NAME'].value_counts().head(10).index.to_list()
Company_job = h1b[h1b['EMPLOYER_NAME'].isin(top10_company)].groupby(['EMPLOYER_NAME'])['JOB_TITLE'].value_counts(ascending=False).to_frame()
Company_job.rename(columns={'JOB_TITLE': 'Count'})
Company_job.drop(Company_job.loc[(Company_job['JOB_TITLE'] <2000)].index,inplace=True)

Company_job

fig, ax = plt.subplots(2, 1, figsize=(12, 8))

sns.countplot(y=certified['EMPLOYER_NAME'], order=certified['EMPLOYER_NAME'].value_counts().iloc[:10].index, ax=ax[0]);
ax[0].set_title("Comapnies with Most Certified H1B Applicants", fontsize=15, fontweight='bold')
ax[0].set_ylabel("Company")
ax[0].set_xlabel("Number")
ax[0].legend(['Cases']);

sns.countplot(y=denied['EMPLOYER_NAME'], order=denied['EMPLOYER_NAME'].value_counts().iloc[:10].index, ax=ax[1]);
ax[1].set_title("20017 Comapnies with Most Denied H1B Applicants", fontsize=15, fontweight='bold')
ax[1].set_ylabel("Company")
ax[1].set_xlabel("Number")
ax[1].legend(['Cases'])

plt.subplots_adjust(hspace=0.5);

Infosys limited, the one with the most H1B application has the highest certified cases. However, Deloitte consulting LLP has the highest denied cases. Also, those companies like IBM and Ernst&Young LLP, that relatively had lower numbers in H1B applications, has the higher denied cases that the companies like Infosys and Tata consulting services limited.

H1B Wages Aanalysis¶

h1b_agg1=h1b.groupby('EMPLOYER_NAME').agg({'PREVAILING_WAGE':[np.mean,np.median,np.min,np.max]})
h1b_agg1.columns = ['wage_mean','wage_median','wage_min','wage_max']
h1b_agg1.sort_values('wage_mean',ascending = False).head(10)

h1b_agg=h1b.groupby('FSTATE').agg({'PREVAILING_WAGE':[np.mean,np.median,np.min,np.max]})
h1b_agg.columns = ['wage_mean','wage_median','wage_min','wage_max']
h1b_agg.sort_values('wage_mean',ascending = False).head(10)

top10_state = h1b['FSTATE'].value_counts().head(10).index.to_list()
state_wage = h1b.groupby(['FSTATE','YEAR'],as_index=False)['PREVAILING_WAGE'].mean()

top10_state1 = H1B17_18['FSTATE'].value_counts().head(10).index.to_list()
state_wage1 = H1B17_18.groupby(['FSTATE','YEAR'],as_index=False)['PREVAILING_WAGE'].mean()

fig, ax = plt.subplots(2,1,figsize=(12,15))

for i in top10_state:
    tmp = state_wage.loc[state_wage['FSTATE']==i]
    
    ax[0].plot(tmp['YEAR'], tmp['PREVAILING_WAGE'], 
            label = i,
            linewidth = 2)
for i in top10_state1:
    tmp1 = state_wage1.loc[state_wage1['FSTATE']==i]
    
    ax[1].plot(tmp1['YEAR'], tmp1['PREVAILING_WAGE'], 
            label = i,
            linewidth = 2)


ax[0].set_title('Overall Trend of H1B Job Wages in Top 10 States',fontsize=15,fontweight='bold')
ax[1].set_title('2017 to 2018 Trend of H1B Job Wages in Top 10 States',fontsize=15,fontweight='bold')
ax[0].set_xlabel('Year')
ax[0].set_ylabel('Wages')
ax[0].legend(bbox_to_anchor=(1.1, 1.05),loc = 'best')
ax[1].set_xlabel('Year')
ax[1].set_ylabel('Wages')
ax[1].legend(bbox_to_anchor=(1.1, 1.05),loc = 'best')

plt.subplots_adjust(hspace=0.5);

The overall trend for the wages of H1B applicants are increasing, and we see a big increase in 2013.California and New York have relatively high wages than other states. Also, it's interesting to see that washington has high wages almost close to California in year 2017 and 2018

4.Descriptive Data Mining

pd.set_option('display.max_columns', None)
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression
import seaborn as sns
h1b =  pd.concat([H1B08_10,H1B11_14,H1B15_18])
h1b.dropna(how = 'all')

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:5: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

Logistic Regression¶

h1b02 = h1b[['CASE_STATUS','PREVAILING_WAGE']]

h1b02['PREVAILING_WAGE'] = h1b02['PREVAILING_WAGE'].astype('str').str.replace(',','').str.replace('$','').str.replace('-','')
h1b02['PREVAILING_WAGE']= pd.to_numeric(h1b02['PREVAILING_WAGE'],errors = 'coerce')
h1b02['CASE_STATUS']=h1b02['CASE_STATUS'].astype('str').str.replace('CERTIFIED-WITHDRAWN','DENIED')
h1b02['CASE_STATUS']=h1b02['CASE_STATUS'].astype('str').str.replace('DENIED','0').str.replace('CERTIFIED','1').str.replace('WITHDRAWN','0')
h1b02['CASE_STATUS']=h1b02['CASE_STATUS'].astype('str').str.replace('PENDING QUALITY AND COMPLIANCE REVIEW - UNASSIGNED','0')
h1b02['CASE_STATUS']=h1b02['CASE_STATUS'].astype('str').str.replace('REJECTED','0')
h1b02['CASE_STATUS']=h1b02['CASE_STATUS'].astype('str').str.replace('INVALIDATED','0')
h1b02.dropna(inplace=True)
h1b02

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:5: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:6: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:7: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

C:\Data_Bootcamp\lib\site-packages\ipykernel_launcher.py:8: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

from sklearn.linear_model import LogisticRegression
skl_reg1 = LogisticRegression().fit(h1b02[['PREVAILING_WAGE']].values,h1b02['CASE_STATUS'].values)
skl_reg1.score(h1b02[['PREVAILING_WAGE']].values,h1b02['CASE_STATUS'].values)

C:\Data_Bootcamp\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning:

Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.

0.8731155093984423

Then we got the result: 87.3%, which proves that 'Status' does have strong connection with income.

According to the H1b official report published in April 2017,there will be more H1b visas captured by STEM workers, and generally STEM workers who got H1b visas earned more more on average (US Dollar 76356) than American workers with a bachelor’s degree (US Dollar 67301), within the same age group and occupation. This is why H1b visas still go to workers with higher income, because most of them are able to get a job at high-tech companies such as Google and Facebook, and some of them are sponsored by consulting firms.

5.Conclusion

In this rerport, we analyzed the H1B petition data from year 2008 to year 2018. There are some interesting findings:

1.In year 2013, the case submission has a big drop.And the H1B wages has a slightly increase in the same year.

2.The Overall certified HIB rate is around 88% and the processing time is around a week.

3.California is the state that has the highest H1B case submission and average wage, which around 8k

4.New York, New Jerseay, and Texas are three states that have high H1B submission. Among these states,the application number and denied rate are increaing in New York, but Texsas is a growing places that deserve extra attention.

5.Besides, Washington has the highest average wages

6.IT is the most popular area that get H1B sponsored. The occupations are related to programming and software development. Beside IT, there are opportunities in accounting and analytics (financial, marketing, and management)

7.IT consulting firms are most welcome H1B sponsorship, but we do see the big tech companies such as Apply, Amazon, Google, and Microsoft are missing in the top 10 list

Thanks for reading. Hope you find it interesting and helpful!

#h1b.to_csv('Databootcamp_finalreport.csv')

	CASE_NO	CASE_STATUS	CASE_SUBMITTED	CITY	DECISION_DATE	EMPLOYER_NAME	FULL_TIME	JOB_TITLE	POSTAL_CODE	PREVAILING_WAGE	SOC_CODE	SOC_NAME	STATE	WAGE_UNIT
0	I-08090-4182485	CERTIFIED	2008/3/30	BURLINGTON	2008/3/30	SPI RESEARCH AND DEVELOPMENT SERVICES, LLC	Y	PROGRAMMER ANALYST	1803	44298	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	MA	yr
1	I-08090-4182487	CERTIFIED	2008/3/30	Horsham	2008/3/30	Nexus Software Solutions,Inc.	Y	JAVA/J2EE DEVELOPER	19044	43597	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	PA	yr
2	I-08090-4182493	CERTIFIED	2008/3/30	FREMONT	2008/3/30	GREYCELL INC	Y	COMPUTER PROGRAMMER	94536	52437	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	CA	yr
3	I-08090-4182495	CERTIFIED	2008/3/30	NEW YORK	2008/3/30	DATA PIONEER INC	Y	PROGRAMMER ANALYST	10005	60000	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	NY	yr
4	I-08090-4182497	CERTIFIED	2008/3/30	Horsham	2008/3/30	Nexus Software Solutions,Inc.	Y	ASP.NET DEVELOPER	19044	43597	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	PA	yr
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
654355	I-200-18080-658767	CERTIFIED	3/21/2018	IRVING	3/27/2018	CROCKETT TECHNICAL LLC	Y	ASSOCIATE SOFTWARE ENGINEER	75062	90,646.00	15-1132	SOFTWARE DEVELOPERS, APPLICATIONS	TX	Year
654356	I-200-18061-209836	CERTIFIED	3/5/2018	METTAWA	3/9/2018	BRUNSWICK CORPORATION	Y	MANAGER GLOBAL SUPPLY CHAIN BUSINESS ANALYTICS	60045	68,598.00	17-2112	INDUSTRIAL ENGINEERS	IL	Year
654357	I-201-17315-187517	CERTIFIED	11/13/2017	DORAL	11/17/2017	JOFEMAR USA INC.	Y	BUSINESS RESEARCH AND DEVELOPMENT	33172	65,936.00	13-1161	MARKET RESEARCH ANALYSTS AND MARKETING SPECIAL...	FL	Year
654358	I-200-18080-929219	CERTIFIED	3/21/2018	DURHAM	3/27/2018	IBM INDIA PRIVATE LIMITED	Y	PACKAGE SOLUTION CONSULTANT	27709	78,416.00	15-1121	COMPUTER SYSTEMS ANALYSTS	NC	Year
654359	I-200-18261-995519	CERTIFIED	9/18/2018	REDMOND	9/24/2018	MICROSOFT CORPORATION	Y	SOFTWARE ENGINEER	98052	133,806.00	15-1132	SOFTWARE DEVELOPERS, APPLICATIONS	WA	Year

	CASE_NO	CASE_STATUS	CASE_SUBMITTED	CITY	DECISION_DATE	EMPLOYER_NAME	FULL_TIME	JOB_TITLE	POSTAL_CODE	PREVAILING_WAGE	SOC_CODE	SOC_NAME	STATE	WAGE_UNIT
192	I-200-09205-336218	WITHDRAWN	2010/11/23	NaN	2010/11/23	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
301	I-200-09279-405989	WITHDRAWN	2011/2/10	NaN	2011/2/10	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
386	I-200-09301-178420	WITHDRAWN	2011/3/25	OKLAHOMA CITY	2011/3/25	SKY FOUNDATION	NaN	BUSINESS MANAGER	73106-	NaN	NaN	NaN	OK	Select Pay Range
411	I-200-09315-488549	WITHDRAWN	2011/3/25	SAN ANTONIO	2011/3/25	RIVERWALK FOUNDATION	Y	ASSISTANT PRINCIPAL	78209-	NaN	11-9032.00	Education Administrators, Elementary and Secon...	TX	Select Pay Range
415	I-200-09320-788518	WITHDRAWN	2011/3/25	NaN	2011/3/25	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
379159	I-200-12131-574691	WITHDRAWN	2013/7/24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
379162	I-200-12236-316830	WITHDRAWN	2013/7/24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
379164	I-200-12257-907806	WITHDRAWN	2013/7/24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
382387	I-200-13158-209616	WITHDRAWN	2013/7/26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range
422949	I-200-13252-421149	WITHDRAWN	2013/9/9	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Select Pay Range

	CASE_NO	CASE_STATUS	CASE_SUBMITTED	CITY	DECISION_DATE	EMPLOYER_NAME	FULL_TIME	JOB_TITLE	POSTAL_CODE	PREVAILING_WAGE	SOC_CODE	SOC_NAME	STATE	FSTATE	YEAR
0	I-08090-4182485	CERTIFIED	2008-03-30	BURLINGTON	2008-03-30	SPI RESEARCH AND DEVELOPMENT SERVICES, LLC	Y	PROGRAMMER ANALYST	1803	44298.0	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	MA	MASSACHUSETTS	2008
1	I-08090-4182487	CERTIFIED	2008-03-30	Horsham	2008-03-30	Nexus Software Solutions,Inc.	Y	JAVA/J2EE DEVELOPER	19044	43597.0	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	PA	PENNSYLVANIA	2008
2	I-08090-4182493	CERTIFIED	2008-03-30	FREMONT	2008-03-30	GREYCELL INC	Y	COMPUTER PROGRAMMER	94536	52437.0	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	CA	CALIFORNIA	2008
3	I-08090-4182495	CERTIFIED	2008-03-30	NEW YORK	2008-03-30	DATA PIONEER INC	Y	PROGRAMMER ANALYST	10005	60000.0	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	NY	NEW YORK	2008
4	I-08090-4182497	CERTIFIED	2008-03-30	Horsham	2008-03-30	Nexus Software Solutions,Inc.	Y	ASP.NET DEVELOPER	19044	43597.0	30	OCCUPATIONS IN SYSTEMS ANALYSIS AND PROGRAMMING	PA	PENNSYLVANIA	2008
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
654355	I-200-18080-658767	CERTIFIED	2018-03-21	IRVING	2018-03-27	CROCKETT TECHNICAL LLC	Y	ASSOCIATE SOFTWARE ENGINEER	75062	90646.0	15-1132	SOFTWARE DEVELOPERS, APPLICATIONS	TX	TEXAS	2018
654356	I-200-18061-209836	CERTIFIED	2018-03-05	METTAWA	2018-03-09	BRUNSWICK CORPORATION	Y	MANAGER GLOBAL SUPPLY CHAIN BUSINESS ANALYTICS	60045	68598.0	17-2112	INDUSTRIAL ENGINEERS	IL	ILLINOIS	2018
654357	I-201-17315-187517	CERTIFIED	2017-11-13	DORAL	2017-11-17	JOFEMAR USA INC.	Y	BUSINESS RESEARCH AND DEVELOPMENT	33172	65936.0	13-1161	MARKET RESEARCH ANALYSTS AND MARKETING SPECIAL...	FL	FLORIDA	2017
654358	I-200-18080-929219	CERTIFIED	2018-03-21	DURHAM	2018-03-27	IBM INDIA PRIVATE LIMITED	Y	PACKAGE SOLUTION CONSULTANT	27709	78416.0	15-1121	COMPUTER SYSTEMS ANALYSTS	NC	NORTH CAROLINA	2018
654359	I-200-18261-995519	CERTIFIED	2018-09-18	REDMOND	2018-09-24	MICROSOFT CORPORATION	Y	SOFTWARE ENGINEER	98052	133806.0	15-1132	SOFTWARE DEVELOPERS, APPLICATIONS	WA	WASHINGTON	2018

		JOB_TITLE
EMPLOYER_NAME	JOB_TITLE
ACCENTURE LLP	COMPUTER PROGRAMMER/CONFIGURER 2	6899
	COMPUTER SYSTEMS ANALYST 2	5229
	COMPUTER SPECIALIST/TESTING AND QUALITY ANALYST 2	4304
	COMPUTER SYSTEMS ANALYST 3	2900
CAPGEMINI AMERICA INC	SOFTWARE ENGINEER 2	2664
CAPGEMINI AMERICA INC	PROGRAMMER/DEVELOPER 2	2031
DELOITTE CONSULTING LLP	SENIOR CONSULTANT	16665
	CONSULTANT	14000
	MANAGER	6596
	SPECIALIST SENIOR	2342
ERNST & YOUNG U.S. LLP	ADVISORY SENIOR	6301
	ADVISORY MANAGER	3121
	ADVISORY STAFF	2601
	ASSURANCE STAFF	2203
HCL AMERICA, INC.	PROGRAMMER ANALYST - II	4476
HCL AMERICA, INC.	PROJECT MANAGER - III	2315
IBM INDIA PRIVATE LIMITED	APPLICATION DEVELOPER	4392
IBM INDIA PRIVATE LIMITED	IT SPECIALIST	2734
INFOSYS LIMITED	TECHNOLOGY LEAD - US	25095
	TECHNOLOGY ANALYST - US	19974
	SYSTEMS ENGINEER - US	7491
	CONSULTANT - US	6970
	PROJECT MANAGER - US	6036
	TECHNICAL TEST LEAD - US	4905
	TECHNOLOGY ARCHITECT - US	4628
	TECHNOLOGY LEAD - US - PRACTITIONER	4489
	TEST ANALYST - US	4172
	ASSOCIATE CONSULTANT - US	3675
	LEAD CONSULTANT - US	3547
	SENIOR PROJECT MANAGER - US	2602
MICROSOFT CORPORATION	SOFTWARE ENGINEER	7239
	SOFTWARE DEVELOPMENT ENGINEER	4384
	SOFTWARE DEVELOPMENT ENGINEER IN TEST	4348
	SENIOR SOFTWARE ENGINEER	2704
TATA CONSULTANCY SERVICES LIMITED	DEVELOPER	18874
	COMPUTER PROGRAMMER	12348
	DEVELOPER USER INTERFACE	11394
	ARCHITECT	5643
	WEB DEVELOPER	4520
	ANALYST	3283
	COMPUTER SYSTEMS ENGINEER	2775
WIPRO LIMITED	LEAD ENGINEER	12452
	PROGRAMMER ANALYST	9932
	PROJECT MANAGER	5493
	CONSULTANT	4490
	BUSINESS SYSTEM ANALYST	3463
	ARCHITECT LEVEL 2	3191
	SYSTEM ADMINISTRATOR	2799
	TEST ENGINEER LEVEL 2	2668

	wage_mean	wage_median	wage_min	wage_max
EMPLOYER_NAME
NEVADA ORTHOPEDIC AND SPINE CENTER, LLP	1996800.0	1996800.0	1996800.0	1996800.0
RENEGADE 83, LLC	1978782.0	1978782.0	1978782.0	1978782.0
STANTON OPTICAL FLORIDA, LLC	1960946.0	1960946.0	1960946.0	1960946.0
Shadowmachine, LLC	1954680.0	1954680.0	1954680.0	1954680.0
UNIVERSAL METAL PRODUCTS	1951196.0	1951196.0	1951196.0	1951196.0
george	1950000.0	1950000.0	1950000.0	1950000.0
MARTELL CAPITAL GROUP LLC	1946880.0	1946880.0	1946880.0	1946880.0
JLB PARTNERS LLC	1944176.0	1989598.0	1853332.0	1989598.0
ROSALYN YALOW CHARTER SCHOOL	1941472.0	1941472.0	1941472.0	1941472.0
SUN RIPE CO.	1940380.0	1940380.0	1940380.0	1940380.0

	CASE_NO
YEAR
2007	49669
2008	346735
2009	154337
2010	333679
2011	370950
2012	317824
2013	74897
2014	519375
2015	612865
2016	635996
2017	616599
2018	552474

	wage_mean	wage_median	wage_min	wage_max
FSTATE
WASHINGTON	86848.904402	84739.0	15966.0	1908480.0
NORTH DAKOTA	83574.505889	55020.0	16952.0	523713.0
CALIFORNIA	83414.816024	78811.0	11072.0	1995006.0
RHODE ISLAND	81873.326776	83098.0	13000.0	1300000.0
ARKANSAS	77551.986265	65686.0	10000.0	1781936.0
MAINE	77451.106632	60986.0	14137.5	558060.0
SOUTH DAKOTA	76587.626971	51235.0	15080.0	1551680.0
WEST VIRGINIA	74987.434427	53935.0	12780.0	414007.0
MINNESOTA	74616.088051	68518.0	13591.5	1973400.0
NEW YORK	74009.406464	65874.0	10144.0	1996644.0